# Filtering data

Data filtering is the bread and butter of supply chain analytics. Envision provides extensive support to achieve this. Filtering is supported by the where and when conditions, which can alternatively be used to filter either large blocks of script at once, or to filter individual statements. This section provides a hands-on introduction to these concepts.

## An illustrating script

Once again, let’s start with the sample dataset which should be accessible from the path /sample in your Lokad account. The script below is moderately complex and illustrates some of the filtering patterns available in Envision. If you have not done so already, we advise you to read Making calculations with Envision first, before working with the information in this section.

read "/sample/Lokad_Items.tsv" as Items with
Id : text
Name : text
StockOnHand : number
StockOnHand : number

Id : text
Date : date
Quantity : number
NetAmount : number

show label "Filtering data" a1f1 tomato

oend = max(Orders.Date)

when date > oend - 365
Items.LastYearQty = sum(Orders.Quantity)
where Items.StockOnHand + Items.StockOnOrder > Items.LastYearQty
show table "Overstocked items, +1 year of stock" a2f3 tomato with
Id
Items.Name
Items.StockOnHand + Items.StockOnOrder as "Stock"
Items.LastYearQty

where Orders.NetAmount > 1000
show table "Large transactions over $1000" a4f5 tomato with Id Items.Name Orders.Date Orders.Quantity Orders.NetAmount lastDay = monday(oend) firstDay = lastDay - 52 * 7 when date >= firstDay and date < lastDay Week.sold = sum(Orders.NetAmount) show linechart "Sold by week" a6f7 tomato unit:"$" with
Week.sold


We suggest to start by copy-pasting this script into your Lokad account, and running it once in order to observe the resulting dashboard. If everything works correctly, you should see the following dashboard:

## Filter blocks

When one begins working on, for example, all historical data no older than last year, then most probably, there will be many calculations involved. In such a situation, the last year only filter should be applied to all such calculations. For this reason, with Envision, data filtering mostly happens through blocks of code. The script fragment below illustrates two nested filter blocks. A first block that starts on line 1 with a when condition, followed by a second block that starts on line 3 with a where condition.

when date > oend - 365
Items.LastYearQty = sum(Orders.Quantity)
where Items.StockOnHand + Items.StockOnOrder > Items.LastYearQty
show table "Overstocked items, +1 year of stock" a2f3 tomato with
Id
Items.Name
Items.StockOnHand + Items.StockOnOrder as "Stock"
Items.LastYearQty


Envision is a language that is sensitive to whitespaces found at the beginning of each line. We refer to a sequence of lines that begin with the same count of whitespaces as a code block; and code blocks can be nested, i.e. a block can contain another block.

If it’s the first time you’re encountering such a programming pattern, this might feel a bit disconcerting. However, in practice, the Envision script editor gives you a lot of built-in support: when you press Enter at the end of a script line that contains a filter condition, the next line gets an automatic indentation with two extra whitespaces.

Developer’s note. Envision adopts a whitespacing pattern quite similar to the one used in Python. This approach fits well with Envision which also intends to provide a concise syntax to handle data. Since Envision has no loop and no branch, the degree of indentation rarely exceeds 3 levels, which makes these whitespaces very manageable in practice.

A filter block starts with a condition – more details on that below – which can be true or false for every line of every table. Within the filter block, only the lines where the condition is true remain present, and all the other lines get filtered. As the name suggests, the when condition involves a temporal filtering, and it applies to all the tables indexed with a Date column. The where condition is usually used to filters the items, and when it does, the filter is applied to all the tables indexed with an Id column.

Once a block starts within an Envision script, it is as if all the table lines satisfying the filter were kept. Otherwise, everything else remains the same, and every piece of script that was possible to write outside the block can be moved within the block. In particular, it is possible to define another filter block within a block. This pattern is called nesting. In the example above, the where block starting at line 3 is nested within the when block that starts at line 1.

## Composing a condition

A condition is an expression that can be evaluated as true or false. The notion of filtering is based on the composition of conditions, which are evaluated with the input data and possibly with results obtained from intermediate calculations. Envision offers the possibility to compose rich conditions, as illustrated by the following script:

lastDay = monday(oend)
firstDay = lastDay - 52 * 7
when date >= firstDay and date < lastDay
Week.sold = sum(Orders.NetAmount)
show linechart "Sold by week" a6f7 tomato unit:"$" with Week.sold  In this snippet, the and operator indicates that both expressions left and right of the operator need to be true. The logical operators supported by Envision are: and, or and not. In addition, numbers can be compared with the numeric operators == (equality), != (inequality), <= (lower than or equal), >= (greater than), < (lower than), > (greater than). The following snippet illustrates how these operators are combined and evaluated. a = 1 > 10 // false b = not a // true c = a or b // true d = a and b // false e = 10 >= 3 or 5 > 7 // true show table "Conditions" with a, b, c, d, e  When a filter block is nested within another filter bock, it is as if an and operator was used with the two conditions being placed on the left and right of the and operator. In the example above, you might have noticed the presence of the monday(oend) syntax. This is a function call to the function monday(). For any date, this function returns the last Monday that happens no later (inclusive) than the date passed as argument. Hence, if a Monday is passed as argument, the function returns the same date. This monday() function is used to define a time window composed of whole weeks (instead of partial weeks). When performing a weekly aggregation, only whole weeks should be considered, otherwise, the first and the last data points might display puzzlingly low values, which merely reflect that only a partial week was used. ## Filtering tables by date Envision offers extensive support to filter tables against temporal conditions. The script we just saw filters all the lines that happen to be older than 1 year. Let’s have a closer look at the relevant piece of script. oend = max(Orders.Date) when date > oend - 365 Items.LastYearQty = sum(Orders.Quantity)  Envision internally treats dates as the integer number of days since January 1st, 2001. This offers numerous advantages including the possibility to perform arithmetic operations on dates. For example, subtracting 7 from a date gives the previous week, while subtracting 365 gives an (approximate) last year date. The date keyword also comes with a special behavior. Implicitly, the date variable refers to all the tables that contain a Date column. As there is only one table involved here, namely the Orders table, it is as if we had written Orders.Date > oend - 365 instead. However, the date syntax is not only more concise, but it also applies to all the relevant tables at once. If instead of looking for the quantities sold, we would have been interested in the purchased quantities, this could have been written with: // upload of PurchaseOrders read "/sample/Lokad_PurchaseOrders.tsv" as PO expect [Id, Date] with Id : text Date : date Quantity : number // snipped when date > oend - 365 Items.LastYearQty = sum(PO.Quantity)  ## Filtering items Beyond filtering lines by date, Envision also offers the possibility of filtering the items, which is another frequent need whenever one seeks to exclude certain products, locations, categories, etc. The script we saw at the beginning of this section illustrates how the scope can be restricted to items that are most qualified to be referred to as dead inventory. The relevant lines are specified below.  where StockOnHand + StockOnOrder > LastYearQty show table "Overstocked items, +1 year of stock" a2f3 tomato with Id Items.Name Items.StockOnHand + Items.StockOnOrder as "Stock" Items.LastYearQty  The where keyword is followed by a condition. The condition applies to three vectors that implicitly belong to the Items table. Indeed, this is the only table that can be referred to without having the table name as a prefix in the variable name. For example, one needs to use Orders.NetAmount and not just NetAmount when referring to the lines in the orders table Orders. Here, the condition can be read as: include only the items where the sum of the stock-on-hand plus the stock-on-order is greater than the number of units sold last year. Once a condition is defined for the Items table, every table that includes an Id column – that is, an identifier referring to the items in line with the Envision conventions – gets filtered similarly. This behavior is identical to temporal filtering introduced previously. As one begins to filter items, it does not make sense to keep sales order lines or purchase order lines that are no longer attached to any item, as these items have been filtered out of the scope. Filtering items impacts the definition scope of newly computed vectors. Let’s illustrate this with a tiny piece of script. where Items.StockOnHand > 5 Items.GreaterThanFive = "yes" show table "Hello" with Items.Name Items.GreaterThanFive // CORRECT! // from this line, we are outside the filter block show table "Hello" with Items.Name Items.GreaterThanFive // WRONG!  Line 10 is incorrect because the GreaterThanFive vector has only been defined for the lines where the StockOnHand > 5 condition was true. Thus, while this vector is correctly defined within the block, and thus can be used as illustrated by line 5, this vector cannot be used outside the filter block, because some its values would be left undefined. This situation can be fixed by making sure that the vector is properly defined for all its item values, as illustrated below. Items.GreaterThanFive = "no" where Items.StockOnHand > 5 Items.GreaterThanFive = "yes" show table "Hello" with Items.Name Items.GreaterThanFive // CORRECT! // from this line, we are outside the filter block show table "Hello" with Items.Name Items.GreaterThanFive // CORRECT! //equivalent to show table "Hello" with Items.Name Items.StockOnHand > 5 ? "yes" : "no"  This snippet begins on line 1 with a proper definition of the vector GreaterThanFive across all items. This definition is revised on line 4 for a subset of the items. However, this revision does not change the fact that the GreaterThanFive vector is explicitly defined for all items, and as a result, the display on line 12 is now correct. ## Filtering arbitrary tables While filtering dates or items is very useful, sometimes you need to filter one specific table precisely. Such a task can also be accomplished with the where keyword. Let’s review the lines that illustrate this capability of Envision.  where Orders.NetAmount > 1000 show table "Large transactions over$1000" a4f5 tomato with
Id
Items.Name
Orders.Date
Orders.Quantity
Orders.NetAmount
Orders.Client


Here, we are filtering the Orders table to exclude all the table lines that are lower than \$1000. The lines that are not filtered out are displayed through show table on lines 2 and 3. This example illustrates how a single table can be filtered. This filter only impacts the Orders table and all the other tables are left untouched by such a condition.

If a vector associated to the Orders table is calculated within a filter block, then the access to this vector is restricted to the block itself. We have already observed this behavior for the items, now let’s take a look at how it applies to arbitrary tables as well.

where Orders.NetAmount > 1000
Orders.LargeTxn = "yes"
show table "Large transactions" with
Items.Name
Orders.LargeTxn // CORRECT!
// last block line
// indentation decreased, we are outside the block
show table "Large transactions" with
Items.Name
Orders.LargeTxn // WRONG!


Since the Orders.LargeTxn vector is not defined for all the lines of the Orders table, only line 5 is correct, while line 10 is incorrect. Like we did for the previous example, this script can be fixed by properly defining a LargeTxn value for the entire Orders table. This can be achieved with the script below.

Orders.LargeTxn = "no"
where Orders.NetAmount > 1000
Orders.LargeTxn = "yes"
show table "Large transactions" with
Items.Name
Orders.LargeTxn // CORRECT!
// last block line
// indentation decreased, we are outside the block
show table "Large transactions" with
Items.Name
Orders.LargeTxn // CORRECT!


As a rule of thumb, Envision tries to allow block “leaks” as much as possible: a vector calculated within a block can be used outside the block, as long as this usage does not violate the rule that all the vector values are explicitly defined when the vector appears on the right side of an assignment.

## Syntactic sugars for filtering

The syntax pattern used by Envision filter-and-indent for filtering is concise and readable, but when multiple filters are involved, the indentation may become a bit hard to decipher. Thus, Envision provides a couple of syntactic sugars, that is, alternative syntaxes which happen to require fewer indentation. This syntax is reviewed in this section.

### Skipping indentations when multiple filters are used

Envision only requires 1 extra level of indentation per filter if filters are intended to be used separately. If only the most inner scope is of interest, then a single indentation level is needed, as illustrated by:

// each 'where' filter brings
// its own indentation level
where Orders.Quantity > 10
where Items.StockOnHand < 100
show table "Filtered orders" with
Orders.Quantity

// but when multiple filters are used,
// a single indentation is required
where Orders.Quantity > 10
where Items.StockOnHand < 100 // no indent here!
show table "Filtered orders" with Orders.Quantity


The second block has the same semantics as the first one, but it requires only a single indentation. More generally, the syntax is:

where A
when B
where C
show table "Filtered by A, B and C" with X
// same as
where A
when B
where C
show table "Filtered by A, B and C" with X


### No-indent with the keyword keep

A frequent coding pattern consists of introducing filters at the very top of the script, in order to restrict the data analysis to a specific scope. While the filter syntax of Envision works relatively well for this scenario, the entire Envision script ends-up written with one or two level of indentations. The keyword keep offers a way to remove these indentations:

// the 'where' filter introduces an indented block
where Orders.Quantity > 10
// beginning of block
show scalar "Inside the filter" with
sum(Orders.Quantity)
// end of block
show scalar "Outside the filter" with
sum(Orders.Quantity)

// but when 'keep' is used,
// the filter applies without indentation
keep where Orders.Quantity > 10
show scalar "Inside the filter" with
sum(Orders.Quantity)


The keyword keep should be placed before where or when, and indicates that the filter takes effect without an indentation. The filter keeps being active until the end of the scope.

where A
keep where B
show table "Filtered by A and B" with X
// end of A and B filters
show table "Not filtered" with X


Thus, if the keep is placed on a script line without any indentation, then the filter applies until the very end of the Envision script.

### Suffixed inline filters

So far, all the filters that we have observed were written as filter blocks. However, Envision also provides an alternative syntax, more compact, which is known as condition suffixes. Let’s go back to the calculation of the quantities sold last year.

when date > oend - 365
Items.LastYearQty = sum(Orders.Quantity)


This script can be rewritten as:

Items.LastYearQty = sum(Orders.Quantity) when date > oend - 365


For readers familiar with relational databases, this syntax might appear to be somewhat closer to the way the where conditions are written in SQL. Within Envision, this syntax is primarily a syntactic sugar to avoid introducing 1-line blocks when there is only a single statement to be written within the block. Both where and when can be “suffixed” on the rightmost side of an assignment.

The aggregators can be filtered inline with when or where conditions. Envision also offers the possibility to add an else modifier on the condition. For example, it is incorrect to write:

oend = max(Orders.Date)
lastDay = monday(oend)
Week.sold = sum(Orders.NetAmount) when date < lastDay
show linechart "Sold by week" with
Week.sold // WRONG


Because Week.sold is not defined on the entire scope, as it has been filtered at the line above. Yet, by adding an else option, we are correctly defining Week.sold everywhere, which can then be displayed:

oend = max(Orders.Date)
lastDay = monday(oend)
Week.sold = sum(Orders.NetAmount) when date < lastDay else 0
show linechart "Sold by week" with
Week.sold // CORRECT