Index expressions

There are several situations in Envision where one or more vectors are provided with the intent to group, sort or look up data. These are:

The universal syntax for these situations is to have a tuple of columns surrounded by square brackets [] (the brackets can be omitted if only one column is provided).

Ordering index expressions

Supported types

When used for ordering, all vectors in the index expression must be of a type that can be sorted. These types are text, number, boolean, date, week and month, as well as the internal types ordinal, double and long.

table T = extend.range(5)
T.M = T.N mod 2
show table "Values" with 
  T.M
  T.N
  order by [T.M, T.N]

This example produces the following output:

M N
0 2
0 4
1 1
1 3
1 5

Complex types, such as markdown, ranvar or zedfunc cannot be ordered.

Enum values

While enums could be ordered, the order of the enum is optimized for performance and subject to change even within a single script, and is almost always different from the order of the text values. By design, Envision will report an error whenever a text value cannot be safely replaced with an enum value, which is why it is not allowed to use an enum directly in an order by, sort or scan. If the order of the enum is acceptable, it can be used by using the hash function:

table enum T = "A", "B", "C", "D", "E"
show table "Values" with 
  T.Label
  order by hash(T.Value)

As of October 2024, this example will produce the following output:

Label
C
D
A
B
E

Ascending and descending

By default, ordering is done in ascending lexicographical order. However, it is possible to reverse the order on individual vectors by adding the desc keyword after the vector:

table T = extend.range(5)
T.M = T.N mod 2
show table "Values" with 
  T.M
  T.N
  order by [T.M desc, T.N]

This example produces the following output:

M N
1 1
1 3
1 5
0 2
0 4

Tie breakers

A tie is the situation in which several values have the same ordering key. Envision explicitly leaves unspecified the order in which tied lines are sorted, and this order may change arbitrarily over time, whether by the release of new Envision versions, or by changes made to the script causing the optimizer to re-use indices.

In particular, the order created by a scan 1 or sort 1 is arbitrary, and it is therefore not recommended. In a situation where the order truly irrelevant, there are usually ways to make that irrelevance more explicit. For example, replacing first and last with whichever:

table T = extend.range(5)

A = first(T.N) sort T.N        // will be 1
B = first(T.N) sort [T.N desc] // will be 5
C = first(T.N) sort 1          // value not specified
D = whichever(T.N)             // value not specified

The scan auto is a shortcut that uses the primary dimension of the table as the scan order. If there are several primary dimensions (as for a cross table), the dimensions are all used in left-to-right order. If the primary dimension is an enum, the order of iteration will be arbitrary.

Tuples

Ordering operations do not currently support tuples. It is necessary to deconstruct the tuple ahead of time and use its individual components in the ordering index expression instead.

Grouping index expressions

This includes both the group-creation options by and at, and the keys of a lookup expression.

Supported types

When used for grouping, all vectors in the index expression must be of a type that can be compared for equality with good performance. These types are text, number, boolean, date, week, month, and flagset, all enums, as well as the internal types ordinal, double and long.

Tuples

It is possible to use tuples in index expressions, so long as all their components are of supported types.

If the index expression is the entire tuple, then the tuple variable should be used directly without surrounding []:

table T = extend.range(5)
T.M = T.N mod 2
T.K = (T.M, T.N)

table U = single by T.K

If the index expression contains other columns in addition to the tuple, then the tuple variable should be unpacked as part of the index expression:

table T = extend.range(10)
T.M2 = T.N mod 2
T.M3 = T.N mod 3
T.M  = (T.M2, T.M3)

table U = single by [...T.M, T.N]

Additional considerations

Whether for grouping, ordering or lookups, index expressions always require the construction of an index data structure, which can be relatively costly.

Because of this, an index expression used in for, each, autodiff or montecarlo blocks is a potential performance issue, since an index data structure would need to be constructed millions of times, once for every iteration !

In those blocks, Envision rejects index expressions that include data computed inside the block. By being dependent only on data available before the block, the index data structure can be constructed only once before the block execution begins, instead of once per iteration.

User Contributed Notes
0 notes + add a note