Filtering
Filtering a table means identifying the lines that should be kept and those that should be removed. For the readers that are familiar with SQL, the filtering capabilities are those associated with the WHERE
keyword. Envision provides several filtering mechanisms, the most notable ones are reviewed below.
Table of contents
where
blocks
The first filtering mechanism is the where
block. The keyword where
is followed by a condition, i.e. an expression that evaluates to the Boolean type. The where
statement is followed by a block, which requires an extra level of indentation, as illustrated with:
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
where Orders.Quantity < 5 // Level 0 of indent
show table "Products" with // Level 1 of indent!
Orders.Pid // Level 2 of indent!
Orders.OrderDate
Orders.Quantity
This script displays the following table, given below, where (as expected) the second line of the original Orders
table has been filtered.
Pid | OrderDate | Quantity |
---|---|---|
apple | Apr 15, 2020 | 3 |
orange | Apr 16, 2020 | 2 |
The where
filter is active until the block terminates. The following script adds a second tile within the filtered block, and a third tile after exiting the block:
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
where Orders.Quantity < 5
show table "Products" with
Orders.Pid
Orders.OrderDate
Orders.Quantity
show scalar "Filtered Total (5)" with // Level 1 of indent
sum(Orders.Quantity) // Level 2 of indent
show scalar "Unfiltered Total (12)" with // Level 0 of indent
sum(Orders.Quantity) // Level 1 of indent
Exiting the (filtering) block is done by decreasing the indent from one level. Thus, the last two tiles display different results. The second tile displays 5
as it operates over the filtered table. The third tile displays 12
as it operates over the unfiltered table.
Any table can be filtered except the scalar table in general circumstances. Indeed, filtering the scalar table would lead to counterintuitive behaviors, not aligned with what practitioners would expect from scalar variables. However, there are situations where even the scalar table can be filtered, most notably user-defined function and each
blocks, discussed later.
Nested where
blocks
For complex filtering strategies, it is possible to use complex Boolean expressions, but Envision also supports nesting where
blocks. For example, the Boolean expression below that uses an and
operator:
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
where Orders.Quantity >= 0 and Orders.Quantity < 6
show scalar "Filtered Total (5)" with
sum(Orders.Quantity)
The latter part of this script is strictly equivalent to:
where Orders.Quantity >= 0 // Level 0 of indent
where Orders.Quantity < 6 // Level 1 of indent
show scalar "Filtered Total (5)" with // Level 2 of indent
sum(Orders.Quantity)
Which can even be simplified with:
where Orders.Quantity >= 0 // Level 0 of indent
where Orders.Quantity < 6 // Level 0 of indent
show scalar "Filtered Total (5)" with // Level 1 of indent
sum(Orders.Quantity)
As Envision does not require extra indentation when multiple where
filters are used.
In the example above, opting for a complex Boolean expression or for two where
statements was an arbitrary choice, however, when several tables are involved, several where
statements need to be used as well.
No-nesting filters with keep where
It’s not always convenient to introduce an extra level of indentation when filtering data. Thus, Envision provides a keyword modifier keep
that enforces the filtering until the end of the current scope. For example, when keep where
is used at the level zero of indentation, the filter applies until the end of the script, as illustrated with:
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
keep where Orders.Quantity < 6
show scalar "Filtered Total (5)" with // Level 0 of indent
sum(Orders.Quantity)
However, using keep where
does not imply that the filter remains active until the end of the script, only the end of the scope. For example:
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
where Orders.Quantity >= 0 // Level 0 of indent
keep where Orders.Quantity < 6 // Level 1 of indent
show scalar "Filtered Total (5)" with // Level 1 of indent
sum(Orders.Quantity)
show scalar "Unfiltered Total (12)" with // Level 0 of indent
sum(Orders.Quantity)
In this script, the scope of the keep where
filter ends with its own containing scope.
Suffix inline filters
As filters are ubiquitous in scripts, sometimes there is only a single expression that one seeks to filter. In this situation, Envision offers a more compact syntax as an alternative to the where
block as illustrated by:
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
filtered = sum(Orders.Quantity) where Orders.Quantity < 6
show scalar "Filtered Total (5)" with filtered
The where
is suffixed to the aggregation expression. Aggregators will be revisited later in greater detail, however please note that the suffixed where
is a pure syntactic sugar replacing the nested where
blocks. The where
filter does not directly interact with the aggregator sum
itself.
This behavior becomes more evident when considering a filtered assignment, as illustrated by:
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
Orders.Quantity = 10 where Orders.Quantity == 7
show scalar "Unfiltered Total (15)" with
sum(Orders.Quantity)
In the script above, the variable Orders.Quantity
is overwritten with the value 10
, but only for lines that verifies the condition Orders.Quantity == 7
.
Inferred filtering
When filtering one table, the intent is frequently to filter other tables as well. For example, if one filters a Products
table then one would expect that the Orders
table - where each order line is associated to a given product - to be filtered as well. Envision provides this behavior, as illustrated by the following example :
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
table Products[pid] = by Orders.Pid
where Products.pid == "apple"
show scalar "Filtered Total (10)" with
sum(Orders.Quantity)
Here above, the filter is applied to the Products
table - we are spelling out the Products.pid
variable for the sake of clarity, but the short name pid
could have been used instead - and yet, as illustrated by the scalar tile below, the Orders
table gets filtered as well.
Filter inference helps with keeping Envision scripts concise, as a single filter can be used to filter an arbitrary large number of related tables. In particular, as we will see in the section dedicated to calendar elements, this mechanism is particularly useful to select a time range of interest, and get all the tables accordingly filtered time-wise.
Filtered assignments
Envision preserves all its capabilities when operating under one or several filters. This implies that new variables can be defined and that variables defined prior to the filter scope can be re-assigned. In this section, we clarify what happens when assignments are made within where
. Let’s revisit an example that we introduced earlier:
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
where Orders.Quantity == 7
Orders.Quantity = 10
show scalar "Unfiltered Total (15)" with
sum(Orders.Quantity)
In the script above, the variable Orders.Quantity
is defined prior to the where
block. This variable gets partially overwritten - due to the presence of the filter - and the results remain persistent after exiting the where
block.
It is also possible to define a new variable in a where
block, as illustrated by:
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
where Orders.Quantity == 7
Orders.DefinedInBlock = 10
show table "This works! (10)" with Orders.DefinedInBlock
The variable Orders.DefinedInBlock
is defined within the where
block, and used within the same block.
However, if we were to attempt using the variable Orders.DefinedInBlock
outside its originating scope, as done by:
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
where Orders.Quantity == 7
Orders.DefinedInBlock = 10
show table "" with Orders.DefinedInBlock // WRONG! Undefined variable 'Orders.DefinedInBlock'.
The script above does not compile in Envision. Indeed, the variable Orders.DefinedInBlock
has only been defined for a filtered version of the Orders
table. This variable has no defined values for non-filtered filtered lines, and thus Envision rejects the use of the Orders.DefinedInBlock
variable at the last line above, considering that this variable is undefined outside its scope.
Nevertheless, this restriction only applies to assignments related to filtered tables. Variables defined with unfiltered tables can be used outside of a where
scope :
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
table Products[pid] = by Orders.Pid
where Orders.Quantity == 7
Products.DefinedInBlock = 10
show table "This works!" with Products.DefinedInBlock
The script above introduces a variable Products.DefinedInBlock
within the where
block. However, as this variable belongs to the Products
table, which remains unfiltered, this variable can be used after exiting the where
block, as illustrated by the last line of the script.
Roadmap: We plan to allow the usage of partially defined variables. For example, this would allow the above variable Orders.DefinedInBlock
to be used outside of its block. Variables would become nullables with null values on lines that are not defined. However, we do not plan to repeat the mistake that many languages made regarding null values. Our design will involve some guarding mechanisms to ensure that null values are never actually processed as such.
Filter tables
A table can be created by filtering another table. This is implicitly what is happening in Envision when a where
block is used. However, for convenience, within a where
block the filtered table keeps the name of its original unfiltered parent. Nevertheless, Envision offers the possibility to name a filtered table. This feature is referred to as filter tables.
This mechanism is supported by Envision, as illustrated by:
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
table FewOrders = where Orders.Quantity < 6 // filter table
show table "Values 3 and 2" with FewOrders.Quantity
The script above does two things of interest. First, the filter table FewOrders
is declared using two keywords: table
and where
. Second, the variable FewOrders.Quantity
is displayed without ever being explicitly assigned, it’s actually the data originating from Orders.Quantity
.
When a filter table is created, it copies all the vectors found in its source table. Under the hood however, no actual data copy takes place, it’s a pure referencing mechanism in Envision. Creating a filter table costs no more than introducing a where
block.
The source table can be broadcast, by default, to any of its filter tables. Thus it is possible to assign values from the source table directly to the filter table, as illustrated with:
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
table FewOrders = where Orders.Quantity < 6
FewOrders.QuantityBis = Orders.Quantity
show table "Values 3 and 2" with FewOrders.QuantityBis
This assignment works because, by construction, every single line of FewOrders
has exactly one originating line in Orders
. However, the converse is not true, and thus, some lines in Orders.Quantity
do not get assigned to anything in FewOrders.QuantityBis
.
Variables from the original (non-filter) table can be assigned freely to the filter table. However, doing the reverse assignment, from the filter table back to the original table, requires a bit more work. Indeed, as the filter table has fewer lines than its original counterpart, Envision must be told what value to use when the value is missing. This process is illustrated in:
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
table FewOrders = where Orders.Quantity < 6
FewOrders.Quantity = 12
Orders.Quantity = whichever(FewOrders.Quantity) default Orders.Quantity
show table "Updated values" with
Orders.Pid
Orders.OrderDate
Orders.Quantity
The displayed tile table is:
Pid | OrderDate | Quantity |
---|---|---|
apple | Apr 15, 2020 | 12 |
apple | Apr 16, 2020 | 7 |
orange | Apr 16, 2020 | 12 |
This script diverges from the previous one. The variable FewOrders.Quantity
is defined with a number literal, which gets broadcast to all the lines of the table FewOrders
. Then, Orders.Quantity
is overwritten using either the line from FewOrders
when it exists or the original Orders.Quantity
value otherwise.
The keyword whichever
is an Envision aggregator to be detailed later. For now, suffice to say that whichever
returns any matching value found in the FewOrders
table (at most one value in the present example, so there isn’t any ambiguity) with a fallback value specified as Orders.Quantity
if no matching value is found.
Also, it is possible to declare a filter table within a filtered scope, that is, within a where
block. In this case, the ambient filters are applied to the filter table. For example, the script above is strictly equivalent to the following:
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
where Orders.Quantity < 6
table FewOrders = where true into Orders
FewOrders.Quantity = 12
Orders.Quantity = whichever(FewOrders.Quantity) default Orders.Quantity
show table "Updated values" with
Orders.Pid
Orders.OrderDate
Orders.Quantity
The script above leverages an explicit broadcast (as seen previously) with the keyword into
as it’s the Orders
table that we seek to filter, not the Scalar
one. The Boolean literal true
belongs like all literals to the Scalar
table.
Clone tables
A clone table is a special case of filter table, where no line has actually been filtered. This feature is intended as a scoping mechanism when writing Envision scripts. Working over a cloned table ensures that all its vectors remain contained in this cloned tabled without polluting the namespace of the source table. This can be handy when dealing with complex situations and avoid accidental reuse of, for example, very short names introduced for the sake of readability when dealing with complex mathematical expressions.
The canonical syntax to create a clone table is illustrated by:
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
table CloneOrders = where Orders.*
show table "Cloned!" with
CloneOrders.Pid
CloneOrders.OrderDate
CloneOrders.Quantity
The where Orders.*
is a syntactic sugar to replace Orders.true
(this shorthand can be used anywhere). This script can also be rewritten in the slightly more verbose but equivalent syntax:
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
table CloneOrders = where true into Orders
show table "Cloned!" with
CloneOrders.Pid
CloneOrders.OrderDate
CloneOrders.Quantity
However, we recommend to preferably use the where Orders.*
syntax when the intent is to clone a table for the sake of clarity.
Filter inference on filter tables
As detailed previously, Envision has a filter inference that propagates the effect of a filter from a table to its related tables. This mechanism also applies to filter tables. For example:
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 4, 15), 3 |]
[| "apple", date(2020, 4, 16), 7 |]
[| "orange", date(2020, 4, 16), 2 |]
table CloneOrders = where Orders.*
where Orders.Quantity < 6
show table "Filtered!" with
CloneOrders.Pid
CloneOrders.OrderDate
CloneOrders.Quantity
Which displays:
Pid | OrderDate | Quantity |
---|---|---|
apple | Apr 16, 2020 | 3 |
orange | Apr 16, 2020 | 2 |
The script above does not explicitly filter the table CloneOrders
but the table Orders
instead, and yet, the table CloneOrders
ends up filtered. This illustrates Envision’s filter inference mechanism. This mechanism applies for all filter tables, not just clone tables.
in
operator
While Envision offers mechanisms to set up and enforce natural joins between tables, input data might be partially inconsistent. For example, the order history may refer to product identifiers that don’t exist anymore. In order to support this class of scenarios, Envision offers the operator in
which allows to check whether the values found in a vector have a counterpart - or not - in another vector.
Let’s illustrate the in
operator with the following script:
table Products = with
[| as Product, as Price |]
[| "apple", 1.25 |]
[| "banana", 0.75 |]
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 8, 15), 3 |]
[| "apple", date(2020, 8, 16), 7 |]
[| "orange", date(2020, 8, 16), 2 |]
where Orders.Pid in Products.Product
Orders.Amount = Orders.Quantity * single(Products.Price) \
by Products.Product at Orders.Pid
show table "Orders" a1c3 with
Orders.Pid
Orders.OrderDate
Orders.Amount
The resulting table gets displayed:
product | OrderDate | Amount |
---|---|---|
apple | Aug 15, 2020 | 3.75 |
apple | Aug 16, 2020 | 8.75 |
In the above script, the expression Orders.Pid in Products.Product
results in a Boolean vector associated with the table Orders
which is true
if the value of the line is found in the vector Products.Product
.
It would also have been possible to instead write:
Orders.Amount = Orders.Quantity * single(Products.Price) \
by Products.Product at Orders.Pid
where Orders.Pid in Products.Product
show table "Orders" a1c3 with
Orders.Pid
Orders.OrderDate
Orders.Amount
However, the computation of Orders.Amount
takes the needless risk of defaulting when facing empty groups. Placing the definition Orders.Amount
within the block filtered via the in
operator ensures that this problem does not happen.
Filtered dimension assignment
The approach outlined by the in
operator can be refined further with regards of the dimensions and of the natural joins that exist between tables. This syntax, and its alternatives, is rediscussed in greater details in the section about secondary dimensions. Let’s illustrate this operation with the following script, revising the one of the previous section, that creates a natural join between two initially independent tables:
table Products[product] = with
[| as Product, as Price |]
[| "apple", 1.25 |]
[| "banana", 0.75 |]
table Orders = with
[| as Pid, as OrderDate, as Quantity |]
[| "apple", date(2020, 8, 15), 3 |]
[| "apple", date(2020, 8, 16), 7 |]
[| "orange", date(2020, 8, 16), 2 |]
where Orders.product = Orders.Pid
Orders.Amount = Products.Price * Orders.Quantity
show table "Orders" a1c3 with
product
Orders.OrderDate
Orders.Amount
The resulting table gets displayed:
product | OrderDate | Amount |
---|---|---|
apple | Aug 15, 2020 | 3.75 |
apple | Aug 16, 2020 | 8.75 |
In the above script, the expression Orders.product = Orders.Pid
results in:
- a Boolean vector associated with the table
Orders
which istrue
if the value of the line is found in the vectorproduct
- just like thein
operator. - the association of
product
as a foreign dimension to the filtered tableOrders
. This allows the later calculation ofOrders.Amount
withoutby/at
.
In particular, it would not have been possible to instead write:
Orders.Amount = Products.Price * Orders.Quantity // WRONG!
where Orders.product = Orders.Pid
show table "Orders" a1c3 with
product
Orders.OrderDate
Orders.Amount
Indeed, outside the filtered scope, Envision can’t prove that every Orders.Pid
has a counterpart in product
. As a matter of fact, this isn’t even the case.
More generally, the filtered assignment offers the possibility to bind a dimension with the vector of another table. The expression T.dim = T.expr
returns a Boolean vector in T
where the line is true if and only if the corresponding value in T.expr
appears anywhere in the entire dimension dim
. In addition, the foreign dimension dim
becomes a foreign dimension of the table T
. It could be later referred as T.dim
, but the simpler syntax dim
is prefered.
In a later section, we will see how read
statements can be used to create tables from input files while setting-up the dimensions and their relationships via the keyword expect
. However, when dealing with raw input files whose integrity can’t be fully trusted, enforcing relations at the read
level isn’t practical because the Envision execution stops at the first foreign dimension error encountered - i.e. an Orders.Pid
value that doesn’t have a counterpart in product
. Instead, the combination of the in
operator and of the filtered dimension assignement offer a more flexible approach (but slightly more verbose and slightly less performant as well) that allows to process both correct and incorrect input data.