Index expressions
There are several situations in Envision where one or more vectors are provided with the intent to group, sort or look up data. These are:
- The
by
andat
grouping options when aggregating. - The
sort
andscan
ordering options when aggregating or scanning. - The
scan
ordering option of for and each loops. - The
order by
ordering option of a show statement or write statement. - The
group by
grouping option of a show statement. - The
by
grouping option when creating a table. - The keys in a lookup.
The universal syntax for these situations is to have a tuple of columns surrounded by square brackets []
(the brackets can be omitted if only one column is provided).
Ordering index expressions
Supported types
When used for ordering, all vectors in the index expression must be of a type that can be sorted. These types are text
, number
, boolean
, date
, week
and month
, as well as the internal types ordinal
, double
and long
.
table T = extend.range(5)
T.M = T.N mod 2
show table "Values" with
T.M
T.N
order by [T.M, T.N]
This example produces the following output:
M | N |
---|---|
0 | 2 |
0 | 4 |
1 | 1 |
1 | 3 |
1 | 5 |
Complex types, such as markdown
, ranvar
or zedfunc
cannot be ordered.
Enum values
While enums could be ordered, the order of the enum is optimized for performance and subject to change even within a single script, and is almost always different from the order of the text values. By design, Envision will report an error whenever a text value cannot be safely replaced with an enum value, which is why it is not allowed to use an enum directly in an order by
, sort
or scan
. If the order of the enum is acceptable, it can be used by using the hash
function:
table enum T = "A", "B", "C", "D", "E"
show table "Values" with
T.Label
order by hash(T.Value)
As of October 2024, this example will produce the following output:
Label |
---|
C |
D |
A |
B |
E |
Ascending and descending
By default, ordering is done in ascending lexicographical order. However, it is possible to reverse the order on individual vectors by adding the desc
keyword after the vector:
table T = extend.range(5)
T.M = T.N mod 2
show table "Values" with
T.M
T.N
order by [T.M desc, T.N]
This example produces the following output:
M | N |
---|---|
1 | 1 |
1 | 3 |
1 | 5 |
0 | 2 |
0 | 4 |
Tie breakers
A tie is the situation in which several values have the same ordering key. Envision explicitly leaves unspecified the order in which tied lines are sorted, and this order may change arbitrarily over time, whether by the release of new Envision versions, or by changes made to the script causing the optimizer to re-use indices.
In particular, the order created by a scan 1
or sort 1
is arbitrary, and it is therefore not recommended. In a situation where the order truly irrelevant, there are usually ways to make that irrelevance more explicit. For example, replacing first
and last
with whichever
:
table T = extend.range(5)
A = first(T.N) sort T.N // will be 1
B = first(T.N) sort [T.N desc] // will be 5
C = first(T.N) sort 1 // value not specified
D = whichever(T.N) // value not specified
The scan auto
is a shortcut that uses the primary dimension of the table as the scan order. If there are several primary dimensions (as for a cross table), the dimensions are all used in left-to-right order. If the primary dimension is an enum, the order of iteration will be arbitrary.
Tuples
Ordering operations do not currently support tuples. It is necessary to deconstruct the tuple ahead of time and use its individual components in the ordering index expression instead.
Grouping index expressions
This includes both the group-creation options by
and at
, and the keys of a lookup expression.
Supported types
When used for grouping, all vectors in the index expression must be of a type that can be compared for equality with good performance. These types are text
, number
, boolean
, date
, week
, month
, and flagset
, all enums, as well as the internal types ordinal
, double
and long
.
Tuples
It is possible to use tuples in index expressions, so long as all their components are of supported types.
If the index expression is the entire tuple, then the tuple variable should be used directly without surrounding []
:
table T = extend.range(5)
T.M = T.N mod 2
T.K = (T.M, T.N)
table U = single by T.K
If the index expression contains other columns in addition to the tuple, then the tuple variable should be unpacked as part of the index expression:
table T = extend.range(10)
T.M2 = T.N mod 2
T.M3 = T.N mod 3
T.M = (T.M2, T.M3)
table U = single by [...T.M, T.N]
Additional considerations
Whether for grouping, ordering or lookups, index expressions always require the construction of an index data structure, which can be relatively costly.
Because of this, an index expression used in for, each, autodiff or montecarlo blocks is a potential performance issue, since an index data structure would need to be constructed millions of times, once for every iteration !
In those blocks, Envision rejects index expressions that include data computed inside the block. By being dependent only on data available before the block, the index data structure can be constructed only once before the block execution begins, instead of once per iteration.