Histogrammar 0.7.1 for Python

All aggregation primitives descend from two classes, Container and Factory. Container defines the methods a primitive uses to aggregate and hold data, while Factory has methods for making containers. (In Python, each primitive class inherits both roles; in other languages, the two roles are distinct.)
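The two roles can be pictured with a minimal pure-Python sketch. This is not Histogrammar's source: the base classes, the `empty` class method, and the use of `+` to merge two partial aggregations are illustrative assumptions. The sketch shows a single primitive class inheriting both roles.

```python
from abc import ABC, abstractmethod


class Factory(ABC):
    """Role 1 (hypothetical sketch): knows how to build containers."""

    @classmethod
    def empty(cls, *args, **kwargs):
        # Hypothetical constructor name: returns a fresh container ready for filling.
        return cls(*args, **kwargs)


class Container(ABC):
    """Role 2 (hypothetical sketch): aggregates data and merges partial results."""

    @abstractmethod
    def fill(self, datum, weight=1.0):
        """Incorporate one datum with the given weight."""

    @abstractmethod
    def __add__(self, other):
        """Combine two partial aggregations of the same kind (assumed merge operator)."""


class Count(Factory, Container):
    """One Python class plays both roles by inheriting from both bases."""

    def __init__(self, entries=0.0):
        self.entries = entries

    def fill(self, datum, weight=1.0):
        if weight > 0.0:  # sketch convention: non-positive weights are ignored
            self.entries += weight

    def __add__(self, other):
        return Count(self.entries + other.entries)


# Fill two partial counts independently, then merge them.
a = Count.empty()
b = Count.empty()
for w in (1.0, 2.0):
    a.fill(None, w)
b.fill(None, 0.5)
total = a + b
```

Keeping construction and filling on the same class means a user only needs to remember one name per primitive.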

The “functions” passed to these primitives may be Python lambda functions, normally defined functions (with def), or strings, which different back-ends may interpret in different ways. All primitives immediately wrap your functions as UserFcn objects, which are serializable (with pickle), may be cached (CachedFcn), and may have a name. Although the primitives wrap your functions automatically, you may wrap them yourself to add features such as caching or a name. See serializable, cached, and named.
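The wrapping idea can be illustrated with a small pure-Python sketch. `UserFcnSketch` and `CachedFcnSketch` are hypothetical stand-ins, not Histogrammar's `UserFcn` and `CachedFcn`. Two further choices are assumptions made for illustration: strings are treated as a Python expression over a variable named `datum`, and the cache remembers only the most recent datum, compared by identity. Unlike the real wrappers, this sketch does not make lambdas picklable.

```python
class UserFcnSketch:
    """Hypothetical wrapper: makes a lambda, def-function, or string callable, with an optional name."""

    def __init__(self, fcn, name=None):
        if isinstance(fcn, str):
            # Assumed back-end rule: the string is a Python expression in terms of `datum`.
            code = compile(fcn, "<quantity>", "eval")
            self.fcn = lambda datum: eval(code, {}, {"datum": datum})
        else:
            self.fcn = fcn
        self.name = name

    def __call__(self, datum):
        return self.fcn(datum)


class CachedFcnSketch(UserFcnSketch):
    """Hypothetical cached wrapper: reuses the last result when called again on the same datum object."""

    _unset = object()  # sentinel so that None can be a legitimate datum

    def __init__(self, fcn, name=None):
        super().__init__(fcn, name)
        self.last_datum = self._unset
        self.last_result = None
        self.evaluations = 0  # how many times the underlying function actually ran

    def __call__(self, datum):
        if datum is not self.last_datum:
            self.last_result = self.fcn(datum)
            self.last_datum = datum
            self.evaluations += 1
        return self.last_result


# Two aggregators sharing one cached quantity would evaluate it once per datum.
square = CachedFcnSketch(lambda d: d["x"] ** 2, name="x squared")
event = {"x": 3}
first, second = square(event), square(event)
shifted = UserFcnSketch("datum['x'] + 1")
```

Caching pays off when several primitives share an expensive quantity, since each would otherwise recompute it for the same datum.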

The primitive classes are listed below, grouped by kind. See the index for a list of all classes, members, and functions.

Zeroth kind: depend only on weights

Count: sum of weights
Count entries by accumulating the sum of all observed weights or a sum of transformed weights (e.g. sum of squares of weights).
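A minimal pure-Python sketch of this behavior follows. The class name `CountSketch`, its `transform` parameter, and the rule that non-positive weights are skipped are assumptions for illustration rather than Histogrammar's exact API.

```python
class CountSketch:
    """Hypothetical Count: accumulates transform(weight) over all filled data."""

    def __init__(self, transform=lambda w: w):
        self.transform = transform  # identity by default, i.e. a plain sum of weights
        self.entries = 0.0

    def fill(self, datum, weight=1.0):
        # The datum itself is ignored: this kind of primitive depends only on the weight.
        if weight > 0.0:  # assumption: non-positive weights are treated as filtered out
            self.entries += self.transform(weight)


weights = [1.0, 2.0, 0.5, 0.0]
plain = CountSketch()
squares = CountSketch(transform=lambda w: w * w)  # sum of squared weights
for w in weights:
    plain.fill(None, w)
    squares.fill(None, w)
```

Here `plain.entries` is 1 + 2 + 0.5 = 3.5 and `squares.entries` is 1 + 4 + 0.25 = 5.25.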

First kind: aggregate a quantity without sub-aggregators

Sum: sum of a given quantity
Accumulate the (weighted) sum of a given quantity, calculated from the data.
Average: mean of a quantity
Accumulate the weighted mean of a given quantity.
Deviate: mean and variance
Accumulate the weighted mean and weighted variance of a given quantity.
AbsoluteErr: mean-absolute-error
Accumulate the weighted mean absolute error (MAE) of a quantity around zero.
Minimize: minimum value
Find the minimum value of a given quantity. If no data are observed, the result is NaN.
Maximize: maximum value
Find the maximum value of a given quantity. If no data are observed, the result is NaN.
Quantile: such as median, quartiles, quintiles, etc.
Estimate a quantile, such as 0.5 for the median, (0.25, 0.75) for quartiles, or (0.2, 0.4, 0.6, 0.8) for quintiles.
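The first-kind primitives can be sketched in plain Python as follows. The class names mirror the list above, but the attributes (`sum`, `mean`, `variance`, `mae`, `min`, `max`), the skipping of non-positive weights, and the divide-by-total-weight variance are assumptions of this sketch, not a description of Histogrammar's internals. `Quantile` is omitted because a streaming quantile estimator needs more machinery than fits here.

```python
import math


class FirstKind:
    """Hypothetical base: evaluates a quantity per datum and tracks the total weight."""

    def __init__(self, quantity):
        self.quantity = quantity
        self.entries = 0.0

    def fill(self, datum, weight=1.0):
        if weight > 0.0:  # assumption: non-positive weights are skipped
            self.entries += weight  # updated first; Deviate relies on this order
            self._update(self.quantity(datum), weight)


class Sum(FirstKind):
    def __init__(self, quantity):
        super().__init__(quantity)
        self.sum = 0.0

    def _update(self, x, w):
        self.sum += w * x


class Average(FirstKind):
    def __init__(self, quantity):
        super().__init__(quantity)
        self.weighted_total = 0.0

    def _update(self, x, w):
        self.weighted_total += w * x

    @property
    def mean(self):
        return self.weighted_total / self.entries if self.entries > 0.0 else math.nan


class Deviate(FirstKind):
    """Weighted mean and variance via an incremental (West-style) update."""

    def __init__(self, quantity):
        super().__init__(quantity)
        self.mean = 0.0
        self._m2 = 0.0  # running sum of w * (x - old mean) * (x - new mean)

    def _update(self, x, w):
        delta = x - self.mean
        self.mean += delta * w / self.entries
        self._m2 += w * delta * (x - self.mean)

    @property
    def variance(self):
        # Divides by the total weight (population-style variance).
        return self._m2 / self.entries if self.entries > 0.0 else math.nan


class AbsoluteErr(Average):
    """Weighted mean of |x|, i.e. the mean absolute error around zero."""

    mae = Average.mean  # reuse the weighted-mean property under a clearer name

    def _update(self, x, w):
        super()._update(abs(x), w)


class Minimize(FirstKind):
    def __init__(self, quantity):
        super().__init__(quantity)
        self.min = math.nan  # stays NaN until the first datum arrives

    def _update(self, x, w):
        if math.isnan(self.min) or x < self.min:
            self.min = x


class Maximize(FirstKind):
    def __init__(self, quantity):
        super().__init__(quantity)
        self.max = math.nan  # stays NaN until the first datum arrives

    def _update(self, x, w):
        if math.isnan(self.max) or x > self.max:
            self.max = x


def identity(x):
    return x


s, a, d, mn, mx = (cls(identity) for cls in (Sum, Average, Deviate, Minimize, Maximize))
for x in [1.0, 2.0, 3.0, 4.0]:
    for agg in (s, a, d, mn, mx):
        agg.fill(x)
mae = AbsoluteErr(identity)
for x in [-1.0, 2.0, -3.0]:
    mae.fill(x)
empty = Minimize(identity)  # never filled, so its minimum is NaN
```

For the data 1, 2, 3, 4 this gives a sum of 10, a mean of 2.5, and a variance of 1.25; the absolute values 1, 2, 3 give a mean absolute error of 2.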

Second kind: group data by a quantity and pass to sub-aggregators

Bin: regular binning for histograms
Split a quantity into equally spaced bins between a low and high threshold and fill exactly one bin per datum.
SparselyBin: ignore zeros
Split a quantity into equally spaced bins, creating them whenever their entries would be non-zero. Exactly one sub-aggregator is filled per datum.
CentrallyBin: irregular but fully partitioning
Split a quantity into bins defined by irregularly spaced bin centers, with exactly one sub-aggregator filled per datum (the closest one).
AdaptivelyBin: for unknown distributions
Adaptively partition a domain into bins and fill them at the same time using a clustering algorithm. Each input datum contributes to exactly one final bin.
Categorize: string-valued bins, bar charts
Split a given quantity by its categorical value and fill only one category per datum.
Fraction: efficiency plots
Accumulate two aggregators, one containing only entries that pass a given selection (numerator) and another that contains all entries (denominator).
Stack: cumulative filling
Accumulate a suite of aggregators, each filtered with a tighter selection on the same quantity.
Partition: exclusive filling
Accumulate a suite of aggregators, each between two thresholds, filling exactly one per datum.
Select: apply a cut
Filter or weight data according to a given selection.
Limit: keep detail until the number of entries is large
Accumulate an aggregator until its number of entries reaches a predefined limit.
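Several second-kind primitives are sketched below in plain Python to show how each datum is routed to sub-aggregators. Parameter names such as `num`, `low`, `high`, `binWidth`, `origin`, `value`, and `cut` are assumptions for illustration. Each `value` argument is a hypothetical factory that builds one empty sub-aggregator, and a tiny `Count` stands in for any sub-aggregator.

```python
import math


class Count:
    """Stand-in sub-aggregator for this sketch: a plain sum of weights."""
    def __init__(self):
        self.entries = 0.0

    def fill(self, datum, weight=1.0):
        if weight > 0.0:  # assumption: non-positive weights are skipped
            self.entries += weight


class Bin:
    """num equal-width bins on [low, high) plus underflow, overflow, and NaN counters."""
    def __init__(self, num, low, high, quantity, value=Count):
        self.num, self.low, self.high, self.quantity = num, low, high, quantity
        self.values = [value() for _ in range(num)]  # `value` builds one empty sub-aggregator
        self.underflow, self.overflow, self.nanflow = Count(), Count(), Count()

    def fill(self, datum, weight=1.0):
        x = self.quantity(datum)
        if math.isnan(x):
            self.nanflow.fill(datum, weight)
        elif x < self.low:
            self.underflow.fill(datum, weight)
        elif x >= self.high:
            self.overflow.fill(datum, weight)
        else:
            index = int(self.num * (x - self.low) / (self.high - self.low))
            self.values[min(index, self.num - 1)].fill(datum, weight)  # guard float round-up


class SparselyBin:
    """Equal-width bins keyed by integer index, created only when first filled."""
    def __init__(self, binWidth, quantity, value=Count, origin=0.0):
        self.binWidth, self.quantity, self.value, self.origin = binWidth, quantity, value, origin
        self.bins = {}

    def fill(self, datum, weight=1.0):
        if weight > 0.0:  # NaN quantities are not handled in this sketch
            index = math.floor((self.quantity(datum) - self.origin) / self.binWidth)
            if index not in self.bins:
                self.bins[index] = self.value()
            self.bins[index].fill(datum, weight)


class Categorize:
    """One sub-aggregator per distinct value (e.g. a string) of the quantity."""
    def __init__(self, quantity, value=Count):
        self.quantity, self.value, self.bins = quantity, value, {}

    def fill(self, datum, weight=1.0):
        if weight > 0.0:
            key = self.quantity(datum)
            if key not in self.bins:
                self.bins[key] = self.value()
            self.bins[key].fill(datum, weight)


class Fraction:
    """The denominator sees every datum; the numerator sees only those passing the selection."""
    def __init__(self, selection, value=Count):
        self.selection = selection  # returns a bool or a non-negative weight factor
        self.numerator, self.denominator = value(), value()

    def fill(self, datum, weight=1.0):
        self.denominator.fill(datum, weight)
        self.numerator.fill(datum, weight * self.selection(datum))


class Select:
    """Fill one sub-aggregator with weight * selection(datum); a factor of 0 drops the datum."""
    def __init__(self, selection, cut=Count):
        self.selection, self.cut, self.entries = selection, cut(), 0.0

    def fill(self, datum, weight=1.0):
        if weight > 0.0:
            self.entries += weight  # total weight seen before the cut
            self.cut.fill(datum, weight * self.selection(datum))


data = [0.5, 1.5, 1.7, 9.0, -2.0, float("nan")]
hist = Bin(5, 0.0, 5.0, lambda x: x)
sparse = SparselyBin(1.0, lambda x: x)
efficiency = Fraction(lambda x: x > 1.0)
above_one = Select(lambda x: x > 1.0)
for x in data:
    hist.fill(x)
for x in data[:5]:  # leave out the NaN, which SparselyBin here does not handle
    for agg in (sparse, efficiency, above_one):
        agg.fill(x)
colors = Categorize(lambda d: d["color"])
for d in [{"color": "red"}, {"color": "blue"}, {"color": "red"}]:
    colors.fill(d)
```

Passing a factory such as `value=Count` rather than a filled instance lets each bin start from its own empty sub-aggregator, which is what allows these primitives to nest.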

Third kind: pass to all sub-aggregators

Label: directory with string-based keys
Accumulate any number of aggregators of the same type and label them with strings. Every sub-aggregator is filled with every input datum.
UntypedLabel: directory of different types
Accumulate any number of aggregators of any type and label them with strings. Every sub-aggregator is filled with every input datum.
Index: list with integer keys
Accumulate any number of aggregators of the same type in a list. Every sub-aggregator is filled with every input datum.
Branch: tuple of different types
Accumulate aggregators of different types, indexed by i0 through i9. Every sub-aggregator is filled with every input datum.
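The third-kind primitives differ mainly in how sub-aggregators are keyed and whether their types must match; all of them pass every datum to every sub-aggregator. The pure-Python sketch below is illustrative only. The attribute names `pairs` and `values`, the helper `_require_single_type`, and the stand-in `Count` and `Sum` classes are hypothetical, and the type check is this sketch's way of expressing "same type".

```python
class Count:
    """Stand-in sub-aggregator: sum of weights."""
    def __init__(self):
        self.entries = 0.0

    def fill(self, datum, weight=1.0):
        if weight > 0.0:  # assumption: non-positive weights are skipped
            self.entries += weight


class Sum:
    """Stand-in sub-aggregator of a different type: weighted sum of a quantity."""
    def __init__(self, quantity):
        self.quantity, self.sum = quantity, 0.0

    def fill(self, datum, weight=1.0):
        if weight > 0.0:
            self.sum += weight * self.quantity(datum)


def _require_single_type(aggregators):
    if len({type(agg) for agg in aggregators}) > 1:
        raise TypeError("all sub-aggregators must have the same type")


class UntypedLabel:
    """String keys mapped to sub-aggregators of any type."""
    def __init__(self, **pairs):
        self.pairs = pairs

    def fill(self, datum, weight=1.0):
        for agg in self.pairs.values():  # every sub-aggregator sees every datum
            agg.fill(datum, weight)


class Label(UntypedLabel):
    """Like UntypedLabel, but all sub-aggregators share one type."""
    def __init__(self, **pairs):
        _require_single_type(pairs.values())
        super().__init__(**pairs)


class Index:
    """A list of same-type sub-aggregators addressed by integer position."""
    def __init__(self, *values):
        _require_single_type(values)
        self.values = list(values)

    def fill(self, datum, weight=1.0):
        for agg in self.values:
            agg.fill(datum, weight)


class Branch:
    """A tuple of sub-aggregators of any types, reachable as i0 through i9."""
    def __init__(self, *values):
        self.values = tuple(values)

    def __getattr__(self, name):
        # Called only for names not found normally, such as "i0" ... "i9".
        values = self.__dict__.get("values", ())
        if len(name) == 2 and name[0] == "i" and name[1] in "0123456789" and int(name[1]) < len(values):
            return values[int(name[1])]
        raise AttributeError(name)

    def fill(self, datum, weight=1.0):
        for agg in self.values:
            agg.fill(datum, weight)


lab = Label(one=Count(), two=Count())
mixed = UntypedLabel(n=Count(), total=Sum(lambda x: x))
idx = Index(Sum(lambda x: x), Sum(lambda x: x * x))
br = Branch(Count(), Sum(lambda x: x))
for x in [1.0, 2.0, 3.0]:
    for agg in (lab, mixed, idx, br):
        agg.fill(x)
```

Constructing `Label(a=Count(), b=Sum(lambda x: x))` raises `TypeError` in this sketch, while the same pair is accepted by `UntypedLabel`.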

Fourth kind: collect sets of raw data

Bag: accumulate values for scatter plots
Accumulate raw numbers, vectors of numbers, or strings, with identical values merged.
Sample: reservoir sampling
Accumulate raw numbers, vectors of numbers, or strings, randomly replacing them by reservoir sampling when the number of values exceeds a limit.
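The fourth-kind behavior can be sketched in plain Python. `Bag` merges identical values by summing their weights. For `Sample`, this sketch substitutes a specific weighted reservoir-sampling technique, Efraimidis and Spirakis's A-Res, which keys each datum by u^(1/w) for uniform random u and keeps the `limit` largest keys. Whether Histogrammar weights its reservoir this way is an assumption, and the attribute names, the `seed` parameter, and the list-to-tuple conversion for vectors are hypothetical.

```python
import heapq
import itertools
import random
from collections import defaultdict


class Bag:
    """Identical values are merged into one entry carrying their summed weight."""

    def __init__(self, quantity):
        self.quantity = quantity
        self.values = defaultdict(float)  # value -> total weight

    def fill(self, datum, weight=1.0):
        if weight > 0.0:  # assumption: non-positive weights are skipped
            x = self.quantity(datum)
            if isinstance(x, list):
                x = tuple(x)  # vectors must be hashable to be merged
            self.values[x] += weight


class Sample:
    """Keep at most `limit` values using weighted reservoir sampling (A-Res)."""

    def __init__(self, limit, quantity, seed=None):
        self.limit, self.quantity = limit, quantity
        self.rng = random.Random(seed)
        self.heap = []  # min-heap of (key, tie-breaker, value); smallest key is evicted first
        self.tiebreak = itertools.count()
        self.entries = 0.0

    def fill(self, datum, weight=1.0):
        if weight <= 0.0:
            return
        self.entries += weight
        key = self.rng.random() ** (1.0 / weight)  # heavier data tend to get larger keys
        item = (key, next(self.tiebreak), self.quantity(datum))
        if len(self.heap) < self.limit:
            heapq.heappush(self.heap, item)
        elif key > self.heap[0][0]:
            heapq.heapreplace(self.heap, item)

    @property
    def values(self):
        return [value for _, _, value in self.heap]


bag = Bag(lambda s: s)
for s in ["a", "b", "a"]:
    bag.fill(s)
vectors = Bag(lambda v: v)
for v in ([1.0, 2.0], [1.0, 2.0]):
    vectors.fill(v)
sample = Sample(3, lambda x: x, seed=42)
for x in range(100):
    sample.fill(x)
small = Sample(10, lambda x: x, seed=0)
for x in range(5):
    small.fill(x)  # fewer values than the limit, so all are kept
```

The tie-breaker counter keeps `heapq` from ever comparing the stored values themselves, so values of any type can be sampled.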