Table sizes
Envision supports very large tables, up to several billion lines, but this comes at a performance cost: a script left unsupervised while its input dataset steadily grows can end up consuming significant amounts of resources. In addition, several operations do not support arbitrarily large tables. To avoid runtime failures, where the system discovers that a table is too large for a given operation, Envision lets the script author state the constraint that “this table will be small enough”. Envision also supports stating that a table will be “big enough”.
Maximum table size
In table definitions, Envision provides the `max <number>` syntax after the table name in order to limit the number of lines in that table. This is supported in three contexts:
// In a 'read' statement
read "/Items.csv" as Items[id] max 5500 with
  "Id" as id : text
  Label : text
  Category : text

read "/Orders.csv" as SalesOrders expect[id,date] with
  "Id" as id : text
  "Date" as date : date
  Quantity : number

// In any 'table' statement
table Categories[cat] max 100 = by Items.Category

// As a standalone statement
expect table Week max 104
This limit is applied at runtime: as soon as the number of lines in the table has been determined, an error interrupts the script if that number exceeds the limit. As such, the main purpose of the `max` statement is to serve as a safeguard against tables growing beyond the limits that were deemed reasonable when the script was originally written.
In particular, it is recommended to add `expect table` constraints to the `Day`, `Week` or `Month` tables, which have a tendency to grow over time as the order history deepens.
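As a minimal sketch of this recommendation, the script below caps the three calendar tables using the standalone statement. The `read` statement is reused from the example above, and the limits are illustrative assumptions sized for roughly ten years of history with some margin, not recommended values.

```envision
read "/Orders.csv" as SalesOrders expect[id,date] with
  "Id" as id : text
  "Date" as date : date
  Quantity : number

// Illustrative limits (assumed): about ten years of history plus margin
expect table Day max 3700
expect table Week max 530
expect table Month max 125
```

If the order history ever extends beyond these bounds, the script stops with an error instead of silently processing ever larger calendar tables.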
Abbreviated number suffixes
For very high limits, Envision provides helpful numeric suffixes:
Unit | Abbreviation | Expansion |
---|---|---|
Thousands | `max 1.2k` | `max 1200` |
Millions | `max 31.7m` | `max 31700000` |
Billions | `max 9b` | `max 9000000000` |
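For illustration, a suffix can be used wherever a plain number would appear after `max`. The sketch below reuses the `SalesOrders` table from the earlier example; the limit itself is an arbitrary value taken from the table above.

```envision
read "/Orders.csv" as SalesOrders expect[id,date] with
  "Id" as id : text
  "Date" as date : date
  Quantity : number

// Same limit as 'max 31700000', written with the millions suffix
expect table SalesOrders max 31.7m
```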
Minimum table size
Similar to `max <number>`, Envision also supports `min <number>` as a means to ensure the minimum size of a table at runtime. This is supported in the same three contexts:
// In a 'read' statement
read "/Items.csv" as Items[id] min 5500 with
  "Id" as id : text
  Label : text
  Category : text

read "/Orders.csv" as SalesOrders expect[id,date] with
  "Id" as id : text
  "Date" as date : date
  Quantity : number

// In any 'table' statement
table Categories[cat] min 100 = by Items.Category

// As a standalone statement
expect table Week min 104
If the minimum is not zero, the table is examined at runtime to check that it contains at least the minimum number of lines; otherwise, an error is produced.
By default, `read` statements, non-partitioned path schemas, and table comprehensions have size `min 1`, which can be overridden with an explicit `min 0` if the table is allowed to be empty. Other tables have `min 0` as a default.
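As a sketch of overriding the default, the `read` statement below accepts an empty input. The file name `/Returns.csv`, the table name `Returns` and its columns are hypothetical and only serve the illustration.

```envision
// Hypothetical file that may legitimately contain no lines:
// 'min 0' replaces the default 'min 1' of 'read' statements
read "/Returns.csv" as Returns min 0 with
  "Id" as id : text
  "Date" as date : date
  Quantity : number
```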
Sizes inside `where` and `when` filters
The `expect table` statement can be used to enforce a size limit within a filter. For example:
table PriceBreaks = with
  [| as Qty, as Price |]
  [| 1, 10 |]
  [| 10, 9 |]
  [| 100, 7.5 |]

where arglast(PriceBreaks.Qty <= 8) sort PriceBreaks.Qty
  // The filter has left at most one line in table PriceBreaks
  expect table PriceBreaks max 1
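The same idea can be sketched for a `when` filter. The example below assumes that the `Day` table is restricted to the filtered date range inside the block; the date window (calendar year 2024, a leap year with 366 days) is an arbitrary choice for illustration.

```envision
read "/Orders.csv" as SalesOrders expect[id,date] with
  "Id" as id : text
  "Date" as date : date
  Quantity : number

// Hypothetical window: calendar year 2024 (366 days)
when date >= date(2024, 1, 1) and date <= date(2024, 12, 31)
  // Assumption: inside the filter, Day holds only the days of the window
  expect table Day max 366
```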
Small tables
If a table is flagged as small, it will support additional operations not available to normal tables. This includes:
- Appearing on the right side of a `cross`,
- Being a `kept` variable in an `each` loop,
- Being a `params` variable in an `autodiff` block.
To flag the table as small, use the `small` keyword in place of `max` when defining a maximum table size:
// In a 'read' statement
read "/Items.csv" as Items[id] small 5500 expect [supplier] with
  Id : text
  Label : text

// In any 'table' statement
table Categories[cat] small 100 = by Items.Category

// As a standalone statement
expect table Week small 104
The table will then count as small, and the Envision compiler will verify that you respect all the rules for small tables. These are:
- A small table may not have more than 100 million lines.
- A small table that contains at least one text vector may not have more than 2.75 million lines.
- A small table that contains at least one ranvar or zedfunc vector may not have more than 1 million lines.
If a table does not qualify as small due to a vector, the Envision compiler will report which vector.
For your convenience, any table with no more than 1 million lines automatically counts as small, whether that limit was enforced by a `max 1m` statement, or because by design the table cannot have more than 1 million lines, such as `Scalar`, `Day`, `Week`, `Month`, `Slices` and `Files`.
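To illustrate the first of the additional operations listed above, the sketch below places a small table on the right side of a cross. It assumes the `cross(Left, Right)` table form; the table name `ItemCategories` is hypothetical, and the limit of 100 is reused from the earlier example.

```envision
read "/Items.csv" as Items[id] with
  "Id" as id : text
  Label : text
  Category : text

// Flag the table as small (illustrative limit)
table Categories[cat] small 100 = by Items.Category

// Assumed syntax: Categories appears on the right side of the cross
table ItemCategories = cross(Items, Categories)
```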