# List of aggregators

The aggregators are similar to functions, but they benefit from a richer syntax with options. Below, we list the aggregators supported by Envision.

## Aggregator usage

There are two main usages for aggregators, whether a join between tables is performed or not.

### Simple aggregation

When there is no join, the general usage for an aggregator `agg`

is:

`u = agg(v) by [w1, w2, w3] sort [w4, w5] if c or x over [a .. b]`

Where:

`agg`

is the name of the aggregator.`by`

is an option indicating the groups to be used when aggregating. It is possible to aggregate against one or more values.`sort`

is an option indicating the sorting order to be used when aggregating. It is possible to sort against one or more values.`if`

is an aggregator filter. When present, all lines that are`false`

are omitted from the aggregation.`or`

is an option indicating the value to return if the group is empty.`over`

is an option used to define an aggregation time-span, with boundaries relative to the date of the target line.

When `by`

is not specified, the grouping happens according to the context, typically using the left-side of the statement in the case of an assignment. It is possible to list multiple variables after `by`

; if only one variable is used, then the parentheses are not required.

The `sort`

option has typically no impact on most aggregators, e.g. `max`

, because those operators do not depend on any sort order in the first place. However, we suggest to leverage this option with `first`

and `last`

aggregators.

When `if`

is not specified, all lines are included that are equivalent to `if true`

.

When `or`

is not specified, default value of each aggregator/type is returned when the grouping is empty. The default values for each aggregator are defined below.

The `over`

option is available when aggregating data into a calendar table (see also Calendar tables). The boundaries of the time interval are expressed in days, weeks or months and they are inclusive. For example, considering a table Day, the option `over [-5 .. 0]`

will perform an aggregation over 6 days, namely the day of the target line and the 5 previous days.

### Aggregation with table joins

The general syntax for this second case is similar but different to the one described above:

`u = agg(v) by [w1, w2, w3] sort [w4, w5] if c at [z1, z2, z3] or x`

The `at`

option indicates the groups used when joining the tables. Note that there must be a one-to-one correspondence between these groups and the groups used in the `by`

option. As a consequence, the `by`

option is mandatory and the number of entries specified in the `by`

and in the `at`

options must be identical.

## Basic aggregators

The most commonly used aggregators are:

`aresame`

: is defined as`aresame(x) == distinct(x) <= 1`

but applies to all types supported by`same()`

whereas`distinct()`

only supports simple types. Aggregator`aresame()`

returns`true`

if and only if`same()`

would not fail on that group.`avg`

: numeric argument only, returns the average line value. The`avg`

of an empty group is`0`

.`count`

: counts the number of lines that are either`true`

(Booleans) or non-zero (numbers) or non-empty (text). The`count`

of an empty group is`0`

.`distinct`

: counts the number of distinct values. The`distinct`

of an empty group is`0`

.`first`

: returns the value associated to the earliest line, as defined per the`sort`

option.`last`

: returns the value associated to the latest line, as defined per the`sort`

option.`max`

: returns the highest value. The`max`

of an empty group is`0`

.`median`

: behaves like the MEDIAN function in Excel. The`median`

of an empty group is`0`

.`min`

: returns the lowest value. The`min`

of an empty group is`0`

.`product`

: numeric argument only, returns the product of line values. The`product`

of an empty group is`1`

.`ratio`

: returns the percentage (between`0`

and`1`

) of lines that are either`true`

(Booleans) or non-zero (numbers) or non-empty (text). The`ratio`

of an empty group is`1`

.`same`

: expects all the grouped values to be identical (fails otherwise) and returns this value.`sum`

: numeric argument only, returns the sum of line values. The`sum`

of an empty group is`0`

.

The `first`

, `last`

and `same`

aggregators of an empty group return the default value accordingly to the input data type.

## Logical aggregators

The aggregators below returns a Boolean value:

`all`

: returns`true`

if all the input lines are`true`

and returns`false`

otherwise.`any`

: returns`false`

if all the input lines are`false`

and returns`true`

otherwise.`exists`

: returns`true`

if there is at least one line passed to the aggregator.

Any logical aggregator of an empty group returns `false`

.

## Temporal aggregators

These aggregators apply only to tables that benefit from a `Date`

column:

`latest`

(cross-table, special case): returns the most recent value observed.`latest`

is not compatible with any of the available options.

## Statistical aggregators

The aggregators below provide statistical calculations:

`entropy`

: returns the Shannon Entropy of the group. The value is returned expressed in*shannons*. The`entropy`

of an empty group is`0`

.`mode`

: similar to the MODE function in Excel. In the case where more than one value have the highest number of occurrences, the aggregator returns the smallest of these values (different from) Excel (see Ordering of values). The`mode`

of an empty group returns the default value accordingly to the input data type.`percentile`

: similar to the PERCENTILE.INC function in Excel. It requires two arguments: a vector of data and a number representing the percentile target (included between`0`

and`1`

).`stdev`

: similar to the STDEV function in Excel. The`stdev`

of an empty group is`1`

.`stdevp`

: similar to the STDEV.P function in Excel. The`stdevp`

of an empty group is`1`

.

## Random variables

Ranvars can be generated through aggregation (see also algebra of ranvars):

`ranvar(v)`

: returns the empirical probability distribution of the associated random variable, using the numeric observations provided in the input vector.`ranvar(v1, v2)`

: two-argument aggregator that returns the empirical probability distribution of the associated random variable, using the numeric observations provided in the input vector`v1`

along with the weights contained in vector`v2`

.`sumr`

: returns the sum of additive convolutions of random variables. Just like`sum()`

is the aggregator associated to the`+`

operator, the aggregator`sumr()`

is associated to the`+*`

operator.