Data filtering is the bread and butter of supply chain analytics. Envision provides extensive support to achieve this. Filtering is supported by the
when conditions, which can alternatively be used to filter either large blocks of script at once, or to filter individual statements. This section provides a hands-on introduction to these concepts.
- An illustrating script
- Filter blocks
- Composing a condition
- Filtering tables by date
- Filtering items
- Filtering arbitrary tables
- Syntactic sugars for filtering
An illustrating script
Once again, let’s start with the sample dataset which should be accessible from the path
/sample in your Lokad account. The script below is moderately complex and illustrates some of the filtering patterns available in Envision. If you have not done so already, we advise you to read Making calculations with Envision first, before working with the information in this section.
read "/sample/Lokad_Items.tsv" read "/sample/Lokad_Orders.tsv" as O show label "Filtering data" a1f1 tomato oend := max(O.Date) when date > oend - 365 LastYearQty = sum(O.Quantity) where StockOnHand + StockOnOrder > LastYearQty show table "Overstocked items, +1 year of stock" a2f3 tomato with Id Name StockOnHand + StockOnOrder as "Stock" LastYearQty where O.NetAmount > 1000 show table "Large transactions over $1000" a4f5 tomato with Id Name O.Date O.Quantity O.NetAmount lastDay := monday(oend) firstDay := lastDay - 52 * 7 when date >= firstDay & date < lastDay Week.sold := sum(O.NetAmount) show linechart "Sold by week" a6f7 tomato unit:"$" with Week.sold
We suggest to start by copy-pasting this script into your Lokad account, and running it once in order to observe the resulting dashboard. If everything works correctly, you should see the following dashboard:
When one begins working on, for example, all historical data no older than last year, then most probably, there will be many calculations involved. In such a situation, the last year only filter should be applied to all such calculations. For this reason, with Envision, data filtering mostly happens through blocks of code. The script fragment below illustrates two nested filter blocks. A first block that starts on line 1 with a
when condition, followed by a second block that starts on line 3 with a
when date > oend - 365 LastYearQty = sum(O.Quantity) where StockOnHand + StockOnOrder > LastYearQty show table "Overstocked items, +1 year of stock" a2f3 tomato with Id Name StockOnHand + StockOnOrder as "Stock" LastYearQty
Envision is a language that is sensitive to whitespaces found at the beginning of each line. We refer to a sequence of lines that begin with the same count of whitespaces as a code block; and code blocks can be nested, i.e. a block can contain another block.
If it’s the first time you’re encountering such a programming pattern, this might feel a bit disconcerting. However, in practice, the Envision script editor gives you a lot of built-in support: when you press Enter at the end of a script line that contains a filter condition, the next line gets an automatic indentation with two extra whitespaces.
Developer’s note. Envision adopts a whitespacing pattern quite similar to the one used in Python. This approach fits well with Envision which also intends to provide a concise syntax to handle data. Since Envision has no loop and no branch, the degree of indentation rarely exceeds 3 levels, which makes these whitespaces very manageable in practice.
A filter block starts with a condition – more details on that below – which can be true or false for every line of every table. Within the filter block, only the lines where the condition is true remain present, and all the other lines get filtered. As the name suggests, the
when condition involves a temporal filtering, and it applies to all the tables indexed with a
Date column. The
where condition is usually used to filters the items, and when it does, the filter is applied to all the tables indexed with an
Once a block starts within an Envision script, it is as if all the table lines satisfying the filter were kept. Otherwise, everything else remains the same, and every piece of script that was possible to write outside the block can be moved within the block. In particular, it is possible to define another filter block within a block. This pattern is called nesting. In the example above, the
where block starting at line 3 is nested within the
when block that starts at line 1.
Composing a condition
A condition is an expression that can be evaluated as true or false. The notion of filtering is based on the composition of conditions, which are evaluated with the input data and possibly with results obtained from intermediate calculations. Envision offers the possibility to compose rich conditions, as illustrated by the following script:
lastDay := monday(oend) firstDay := lastDay - 52 * 7 when date >= firstDay & date < lastDay Week.sold := sum(O.NetAmount) show linechart "Sold by week" a6f7 tomato unit:"$" with Week.sold
In this snippet, the
and operator indicates that both expressions left and right of the operator need to be true. The logical operators supported by Envision are:
not. In addition, numbers can be compared with the numeric operators
<= (lower than or equal),
>= (greater than),
< (lower than),
> (greater than). The following snippet illustrates how these operators are combined and evaluated.
a := 1 > 10 // false b := not a // true c := a | b // true d := a & b // false e := 10 >= 3 | 5 > 7 // true show table "Conditions" with a, b, c, d, e
When a filter block is nested within another filter bock, it is as if an
and operator was used with the two conditions being placed on the left and right of the
In the example above, you might have noticed the presence of the
monday(oend) syntax. This is a function call to the function named
monday. For any date, this function returns the last Monday that happens no later (inclusive) than the date passed as argument. Hence, if a Monday is passed as argument, the function returns the same date.
monday() function is used to define a time window composed of whole weeks (instead of partial weeks). When performing a weekly aggregation, only whole weeks should be considered, otherwise, the first and the last data points might display puzzlingly low values, which merely reflect that only a partial week was used.
Filtering tables by date
Envision offers extensive support to filter tables against temporal conditions. The script we just saw filters all the lines that happen to be older than 1 year. Let’s have a closer look at the relevant piece of script.
oend := max(O.Date) when date > oend - 365 LastYearQty = sum(O.Quantity)
Envision internally treats dates as the integral number of days since January 1st, 2001. This offers numerous advantages including the possibility to perform arithmetic operations on dates. For example, subtracting 7 from a date gives the previous week, while subtracting 365 gives an (approximate) last year date.
date keyword also comes with a special behavior. Implicitly, the
date variable refers to all the tables that contain a
Date column. As there is only one table involved here, namely the
O table, it is as if we had written
O.Date > oend - 365 instead. However, the
date syntax is not only more concise, but it also applies to all the relevant tables at once. If instead of looking for the quantities sold, we would have been interested in the purchased quantities, this could have been written with:
// upload of PurchaseOrders read "/sample/Lokad_PurchaseOrders.tsv" as PO // snipped when date > oend - 365 LastYearQty = sum(PO.Quantity)
Beyond filtering lines by date, Envision also offers the possibility of filtering the items, which is another frequent need whenever one seeks to exclude certain products, locations, categories, etc. The script we saw at the beginning of this section illustrates how the scope can be restricted to items that are most qualified to be referred to as dead inventory. The relevant lines are specified below.
where StockOnHand + StockOnOrder > LastYearQty show table "Overstocked items, +1 year of stock" a2f3 tomato with Id Name StockOnHand + StockOnOrder as "Stock" LastYearQty
where keyword is followed by a condition. The condition applies to three vectors that implicitly belong to the
Items table. Indeed, this is the only table that can be referred to without having the table name as a prefix in the variable name. For example, one needs to use
O.NetAmount and not just
NetAmount when referring to the lines in the orders table
O. Here, the condition can be read as: include only the items where the sum of the stock-on-hand plus the stock-on-order is greater than the number of units sold last year.
Once a condition is defined for the
Items table, every table that includes an
Id column – that is, an identifier referring to the items in line with the Envision conventions – gets filtered similarly. This behavior is identical to temporal filtering introduced previously. As one begins to filter items, it does not make sense to keep sales order lines or purchase order lines that are no longer attached to any item, as these items have been filtered out of the scope.
Filtering items impacts the definition scope of newly computed vectors. Let’s illustrate this with a tiny piece of script.
where StockOnHand > 5 GreaterThanFive = "yes" show table "Hello" with Name GreaterThanFive // CORRECT! // from this line, we are outside the filter block show table "Hello" with Name GreaterThanFive // WRONG!
Line 10 is incorrect because the
GreaterThanFive vector has only been defined for the lines where the
StockOnHand > 5 condition was true. Thus, while this vector is correctly defined within the block, and thus can be used as illustrated by line 5, this vector cannot be used outside the filter block, because some its values would be left undefined. This situation can be fixed by making sure that the vector is properly defined for all its item values, as illustrated below.
GreaterThanFive = "no" where StockOnHand > 5 GreaterThanFive = "yes" show table "Hello" with Name GreaterThanFive // CORRECT! // from this line, we are outside the filter block show table "Hello" with Name GreaterThanFive // CORRECT!
This snippet begins on line 1 with a proper definition of the vector
GreaterThanFive across all items. This definition is revised on line 4 for a subset of the items. However, this revision does not change the fact that the
GreaterThanFive vector is explicitly defined for all items, and as a result, the display on line 12 is now correct.
Filtering arbitrary tables
While filtering dates or items is very useful, sometimes you need to filter one specific table precisely. Such a task can also be accomplished with the
where keyword. Let’s review the lines that illustrate this capability of Envision.
where O.NetAmount > 1000 show table "Large transactions over $1000" a4f5 tomato with Id Name O.Date O.Quantity O.NetAmount O.Client
Here, we are filtering the
O table to exclude all the table lines that are lower than $1000. The lines that are not filtered out are displayed through
show table on lines 2 and 3. This example illustrates how a single table can be filtered. This filter only impacts the
O table and all the other tables are left untouched by such a condition.
If a vector associated to the
O table is calculated within a filter block, then the access to this vector is restricted to the block itself. We have already observed this behavior for the items, now let’s take a look at how it applies to arbitrary tables as well.
where O.NetAmount > 1000 O.LargeTxn = "yes" show table "Large transactions" with Name O.LargeTxn // CORRECT! // last block line // indentation decreased, we are outside the block show table "Large transactions" with Name O.LargeTxn // WRONG!
O.LargeTxn vector is not defined for all the lines of the
O table, only line 5 is correct, while line 10 is incorrect. Like we did for the previous example, this script can be fixed by properly defining a
LargeTxn value for the entire
O table. This can be achieved with the script below.
O.LargeTxn = "no" where O.NetAmount > 1000 O.LargeTxn = "yes" show table "Large transactions" with Name O.LargeTxn // CORRECT! // last block line // indentation decreased, we are outside the block show table "Large transactions" with Name O.LargeTxn // CORRECT!
As a rule of thumb, Envision tries to allow block “leaks” as much as possible: a vector calculated within a block can be used outside the block, as long as this usage does not violate the rule that all the vector values are explicitly defined when the vector appears on the right side of an assignment.
Syntactic sugars for filtering
The syntax pattern used by Envision filter-and-indent for filtering is concise and readable, but when multiple filters are involved, the indentation may become a bit hard to decipher. Thus, Envision provides a couple of syntactic sugars, that is, alternative syntaxes which happen to require fewer indentation. This syntax is reviewed in this section.
Skipping indentations when multiple filters are used
Envision only requires 1 extra level of indentation per filter if filters are intended to be used separately. If only the most inner scope is of interest, then a single indentation level is needed, as illustrated by:
// each 'where' filter brings // its own indentation level where O.Quantity > 10 where StockOnHand < 100 show table "Filtered orders" with O.Quantity // but when multiple filters are used, // a single indentation is required where O.Quantity > 10 where StockOnHand < 100 // no indent here! show table "Filtered orders" with O.Quantity
The second block has the same semantics as the first one, but it requires only a single indentation. More generally, the syntax is:
where A when B where C show table "Filtered by A, B and C" with X // same as where A when B where C show table "Filtered by A, B and C" with X
Merging filters with the keyword
We have already seen that, in Envision, the Boolean operator AND was represented with the symbol
&. However, Envision also offers a keyword
and, which provides a slightly different semantic:
// the two nested 'where' filters where O.NetAmount > 1000 where StockOnHand > 10 show table "Filtered transactions" with Name O.Quantity // can be rewritten as a single filter with 'and' where O.NetAmount > 1000 and StockOnHand > 10 show table "Filtered transactions" with Name O.Quantity
and keyword is strictly equivalent to nesting
where filters. Through the
and keyword, it is possible to introduce several filters in sequence, with only a single indentation level. More generally, we have:
where A where B where C // snipped // can be rewritten where A and B and C // snipped
In practice, the
and keyword offers the possibility to merge multiple filters which are not intended to be used separately.
No-indent with the keyword
A frequent coding pattern consists of introducing filters at the very top of the script, in order to restrict the data analysis to a specific scope. While the filter syntax of Envision works relatively well for this scenario, the entire Envision script ends-up written with one or two level of indentations. The keyword
keep offers a way to remove these indentations:
// the 'where' filter introduces an indented block where O.Quantity > 10 // beginning of block show table "Inside the filter" with sum(O.Quantity) // end of block show table "Outside the filter" with sum(O.Quantity) // but when 'keep' is used, // the filter applies without indentation keep where O.Quantity > 10 show table "Inside the filter" with sum(O.Quantity)
keep should be placed before
when, and indicates that the filter takes effect without an indentation. The filter keeps being active until the end of the scope.
where A keep where B show table "Filtered by A and B" with X // end of A and B filters show table "Not filtered" with X
Thus, if the
keep is placed on a script line without any indentation, then the filter applies until the very end of the Envision script.
Suffixed inline filters
So far, all the filters that we have observed were written as filter blocks. However, Envision also provides an alternative syntax, more compact, which is known as condition suffixes. Let’s go back to the calculation of the quantities sold last year.
when date > oend - 365 LastYearQty = sum(O.Quantity)
This script can be rewritten as:
LastYearQty = sum(O.Quantity) when date > oend - 365
For readers familiar with relational databases, this syntax might appear to be somewhat closer to the way the
where conditions are written in SQL. Within Envision, this syntax is primarily a syntactic sugar to avoid introducing 1-line blocks when there is only a single statement to be written within the block. Both
when can be “suffixed” on the rightmost side of an assignment.
The aggregators can be filtered inline with
where conditions. Envision also offers the possibility to add an
else modifier on the condition. For example, it is incorrect to write:
oend := max(O.Date) lastDay := monday(oend) Week.sold := sum(O.NetAmount) when date < lastDay show linechart "Sold by week" with Week.sold // WRONG
Week.sold is not defined on the entire scope, as it has been filtered at the line above. Yet, by adding an
else option, we are correctly defining
Week.sold everywhere, which can then be displayed:
oend := max(O.Date) lastDay := monday(oend) Week.sold := sum(O.NetAmount) when date < lastDay else 0 show linechart "Sold by week" with Week.sold // CORRECT