# List of aggregators

The aggregators are similar to functions, but they benefit from a richer syntax with options. Below, we list the aggregators supported by Envision.

## Aggregator usage

There are two main usages for aggregators, whether a join between tables is performed or not.

### Simple aggregation

When there is no join, the general usage for an aggregator `agg`

is:

`u = agg(v) by [w1, w2, w3] sort [w4, w5] if c or x over [a .. b]`

Where:

`agg`

is the name of the aggregator.`by`

is an option indicating the groups to be used when aggregating. It is possible to aggregate against one or more values.`sort`

is an option indicating the sorting order to be used when aggregating. It is possible to sort against one or more values.`if`

is an aggregator filter. When present, all lines that are`false`

are omitted from the aggregation.`or`

is an option indicating the value to return if the group is empty.`over`

is an option used to define an aggregation time-span, with boundaries relative to the date of the target line.

When `by`

is not specified, the grouping happens according to the context, typically using the left-side of the statement in the case of an assignment. It is possible to list multiple variables after `by`

; if only one variable is used, then the parentheses are not required.

The `sort`

option has typically no impact on most aggregators, e.g. `max`

, because those operators do not depend on any sort order in the first place. However, we suggest to leverage this option with `first`

and `last`

aggregators.

When `if`

is not specified, all lines are included that are equivalent to `if true`

.

When `or`

is not specified, default values are returned when the grouping is empty: `0`

(zero) for number, `false`

for Booleans, the empty string for text and the zero distribution for distributions.

The `over`

option is available when aggregating data into a calendar table (see also Calendar tables). The boundaries of the time interval are expressed in days, weeks or months and they are inclusive. For example, considering a table Day, the option `over [-5 .. 0]`

will perform an aggregation over 6 days, namely the day of the target line and the 5 previous days.

### Aggregation with table joins

The general syntax for this second case is similar but different to the one described above:

`u = agg(v) by [w1, w2, w3] sort [w4, w5] if c at [z1, z2, z3] or x`

The `at`

option indicates the groups used when joining the tables. Note that there must be a one-to-one correspondence between these groups and the groups used in the `by`

option. As a consequence, the `by`

option is mandatory and the number of entries specified in the `by`

and in the `at`

options must be identical.

## Basic aggregators

The most commonly used aggregators are:

`avg`

: numeric argument only, returns the average line value.`count`

: counts the number of lines that are either`true`

(Booleans) or non-zero (numbers) or non-empty (text).`distinct`

: counts the number of distinct values.`first`

: returns the value associated to the earliest line, as defined per the`sort`

option.`last`

: returns the value associated to the latest line, as defined per the`sort`

option.`max`

: returns the highest value.`median`

: behaves like the MEDIAN function in Excel.`min`

: returns the lowest value.`product`

: numeric argument only, returns the product of line values.`ratio`

: returns the percentage (between 0 and 1) of Booleans that are true, or of numbers that are non-zero. The ratio of an empty group is 1.`same`

: expects all the grouped values to be identical (fails otherwise) and returns this value.`sum`

: numeric argument only, returns the sum of line values.

## Logical aggregators

The aggregators below returns a Boolean value:

`all`

: returns`true`

if all the input lines are`true`

and returns`false`

otherwise.`any`

: returns`false`

if all the input lines are`false`

and returns`true`

otherwise.`exists`

: returns`true`

if there is at least one line passed to the aggregator.

## Temporal aggregators

These aggregators apply only to tables that benefit from a `Date`

column:

`latest`

(cross-table, special case): returns the most recent value observed.`latest`

is not compatible with any of the available options.

## Statistical aggregators

The aggregators below provide statistical calculations:

`entropy`

: returns the Shannon Entropy of the group. The value is returned expressed in*shannons*.`mode`

: similar to the MODE function in Excel. In the case where more than one value have the highest number of occurrences, the aggregator returns the smallest of these values (different from Excel). If the input values are Booleans, “false” is taken as the smallest value.`percentile`

: similar to the PERCENTILE.INC function in Excel. It requires two arguments: a vector of data and a number representing the percentile target (included between 0 and 1).`stdev`

: similar to the STDEV function in Excel.`stdevp`

: similar to the STDEV.P function in Excel.

## Random variables

Distributions can be generated through aggregation (see also algebra of distributions):

`ranvar(T.X)`

: returns the empirical distribution of the associated random variable, using the numeric observations provided in the input vector.`sumr`

: returns the sum of additive convolutions of random variables. Just like`sum()`

is the aggregator associated to the`+`

operator, the aggregator`sumr()`

is associated to the`+*`

operator.