Most business operations of any given commerce can be accurately represented with time-series: historical sales, historical purchase orders, historical prices, etc. Since working with time-series is of primary importance for companies, Envision provides extensive native support for this type of scenario. In particular, Envision allows to aggregate data per day, week and month. Envision also supports more complex time-series analysis such as lagging data, or calculating moving averages. These features are illustrated and documented in this page.
An illustrating example
Once more, we use our sample dataset to illustrate the capabilities of Envision. As this section represents a slightly more advanced usage of Envision, we suggest reading Making calculations and Aggregating data before proceeding to this section.
read "/sample/Lokad_Items.tsv" as Items[Id] with Id : text read "/sample/Lokad_Orders.tsv" as Orders expect [Id, Date] with Id : text Date : date Quantity : number NetAmount : number read "/sample/Lokad_PurchaseOrders.tsv" as PO expect [Id, Date] with Id : text Date : date Quantity : number NetAmount : number show label "Time-series calculations" a1f1 tomato oend = max(Orders.date) lastMonday = monday(oend) Week.sold = sum(Orders.NetAmount) when date >= oend - 52 * 7 show linechart "Weekly sales" a2f4 tomato unit:"$" with Week.sold as "This year" Week.sold[-52] as "Last year" Week.ma = sum(Orders.NetAmount / 4) over [-3 .. 0] when date >= oend - 52 * 7 show linechart "Weekly sales with 4 weeks moving average" a5f7 tomato unit:"$" with Week.ma as "This year" Week.ma[-52] as "Last year" Day.cashFlow = sum(Orders.NetAmount) - sum(PO.NetAmount) Day.balance = avg(Day.cashFlow) over [-13 .. 0] when date >= lastMonday - 6 * 7 and date < lastMonday show linechart "Cash flow over the last 6 weeks" a8f10 tomato unit:"$" with Day.balance as "Balance" PO.Price = PO.NetAmount / PO.Quantity Orders.PurchasePrice = latest(PO.Price) Orders.Cost = Orders.PurchasePrice * Orders.Quantity Orders.Profit = Orders.NetAmount - Orders.Cost Week.profitability = sum(Orders.Profit) / sum(Orders.Cost) default 1 when date >= lastMonday - 13 * 7 and date < lastMonday show linechart "Profitability over the last 13 weeks" a11f13 tomato unit:"%" with Week.profitability
After this script is run with the sample dataset, the following dashboard is obtained.
Virtual calendar tables
Since daily, weekly and monthly aggregations are ubiquitous, Envision was designed to natively support these periodic calendar patterns. More specifically, Envision benefits from three virtual tables named
Month respectively. These tables are referred to as “virtual” because they do not have tabular file counterparts, or in other words, these tables only exist for the duration of the script’s execution. The script above leverages these virtual tables in order to display the line charts. Let’s review the relevant lines re-copied below.
oend = max(Orders.date) lastMonday = monday(oend) Week.sold = sum(Orders.NetAmount) when date >= oend - 52 * 7 show linechart "Weekly sales" a2f4 tomato unit:"$" with Week.sold as "This year" Week.sold[-52] as "Last year"
At line 4, the content of the
Orders table is summed into the
Week table. There is a single value computed per week. At line 6, a filter is defined to exclude data that is more than 52 weeks old. Finally at lines 7 to 9, a line chart with two time-series displayed is defined. The second time-series
Week.sold[-52] comes with a lag operator, which is reviewed in the next section. This script can be easily modified for a daily or a monthly aggregation instead. For example, it is possible to add the following lines at the very end of the script:
Day.sold = sum(Orders.NetAmount) show linechart "Daily sales" a14f16 tomato unit:"$" with Day.sold Month.sold = sum(Orders.NetAmount) show linechart "Monthly sales" a17f19 tomato unit:"$" with Month.sold
This block of code displays two more line charts, featuring time-series aggregated at the daily and monthly levels respectively. The variable
Day.sold can be interpreted as the content of the column named
sold within the
Day table, but it can also be interpreted as an equally spaced time-series of the “1 day” period – as opposed to the regular Envision tables that contain a
Date column such as
Orders and that can be interpreted as unevenly spaced time-series.
Displaying lagged time-series
In time-series analysis, “lagging” refers to the operation which consists of moving time-wise the elements of the time-series. One of the most elementary purposes of time-series lagging consists of comparing two distinct periods of time. Envision supports a lag operator precisely targeted at dealing with such situations. Let’s revisit the previous code snippet.
Week.sold = sum(Orders.NetAmount) when date >= oend - 52 * 7 show linechart "Weekly sales" a2f4 tomato unit:"$" with Week.sold as "This year" Week.sold[-52] as "Last year"
We have two time-series defined at lines 4 and 5. The first time-series
Week.sold is the original weekly aggregated sales. The second series comes with an extra suffix
[-52]. This suffix is the lag operator itself. Please remark that the argument must be a literal integer. In this example, the data that goes back 52 weeks in the past is moved forward and finally displayed in the line chart. When the lag operator is applied to the
Week table, then the lag argument is an integer expressed in weekly units. Similarly, the applicable units are days and months for the
Month tables respectively.
The lag operator benefits from the smart cooperative behavior of the filter operator
when. In fact, without this cooperative behavior, the filter
when would have already excluded all the data older than 52 weeks, and as a result, lagging the time-series of 52 weeks would have resulted in moving zeroes forward. Yet, as illustrated by the screenshot of the dashboard seen previously, the lag operator is correctly moving the actual one-year old data forward, and not merely the zeroes. This behavior is achieved through the built-in cooperation between the
when filter and the lag operator.
If you hover your mouse over the line chart within your dashboard, you should notice that dates and values get displayed. In particular, with the script at the beginning of this section, the dates reported for the time-series named “Last year” are indeed short of one year compared to the dates of the time-series named “This year”. However, it is important to note that the lag operator does not really preserve the original dates of the time-series. Instead, Envision simply uses a convention to make it work: if the lag operator is defined within the declaration of a line chart, then, and only then, the original dates remain preserved.
Let’s modify the script in order to apply the lag operator outside the linechart. This can be done by introducing a variable named
Week.sold = sum(Orders.NetAmount) Week.lastYear = Week.sold[-52] when date >= oend - 52 * 7 show linechart "Weekly sales" a2f4 tomato unit:"$" with Week.sold as "This year" Week.lastYear as "Last year"//date display issue
If you run this modified script, and if you hover your mouse over the time-series dots, you should notice that both time-series are reported with the same dates. In this case, there is no non-ambiguous semantic for the dates to be displayed. For example, the
Week.lastYear time-series could have actually been interpreted as a year-to-year forecast, and as a result, having the same dates for the two time-series would have been what we were actually looking for in this case. In conclusion, if you wish to preserve the original dates in your line chart for side-by-side comparison of lagged time-series, then the lag operator should be defined within the
Aggregating data over a time window
In one of our previous guides, we have reviewed how to aggregate data with Envision. With time-series, there is another type of aggregation that is very desirable: the aggregation over a time window. The script from the beginning of this section illustrates how to compute a moving average of the sales over 4 weeks. The relevant lines of script are re-copied below for the sake of clarity.
Week.ma = sum(Orders.NetAmount / 4) over [-3 .. 0] when date >= oend - 52 * 7 show linechart "Weekly sales with 4 weeks moving average" a5f7 tomato unit:"$" with Week.ma as "This year" Week.ma[-52] as "Last year"
At line 1, an aggregation with the
sum() aggregator is performed, and this aggregation comes with a statement starting with the keyword
over at the end. At line 2, data is filtered, specifically taking the last 52 weeks of data. Finally, at lines 3 to 5, two time-series are displayed within the line chart. Both time-series are “smoothed” as they are averaged over 4 weeks.
over is used to define the applicable time window, and is expected to be written as
[a .. b] where
b are integers with
a being lower than or equal to
b. The unit used for
b depends on the expression on the left side of the assignment. In the present case, we have the
Week table on the left side of the assignment, and as result,
0 are expressed in weeks.
Gotcha: between the week of index -3 and the week of index 0, there are 4 weeks, not 3 weeks. As a result, we have the following weeks of indices -3, -2, -1 and 0.
over can be used with all aggregators. When this option is used, the left side of the assignment is usually a virtual calendar table such as
Month. However, this is not a requirement, and any table can be used as long as it is indexed by a
Date. Also, by definition, when calendar tables are used, using
over [0 .. 0] gives the same results as the default aggregation:
Week.sold = sum(Orders.NetAmount) // same result! Week.same = sum(Orders.NetAmount) over [0 .. 0]
More complex time-series aggregations
The syntax of Envision offers the possibility to perform more elaborate time-series calculations. For example, it is possible to compute time-series and then, to make further calculations based on those initial time-series. The third code block in the script at the top of this page illustrates this. The relevant lines of script are copied below.
Day.cashFlow = sum(Orders.NetAmount) - sum(PO.NetAmount) Day.balance = avg(Day.cashFlow) over [-13 .. 0] when date >= lastMonday - 6 * 7 and date < lastMonday show linechart "Cash flow over the last 6 weeks" a8f10 tomato unit:"$" with Day.balance as "Balance"
At line 1, the time-series
Day.cashFlow is defined as the difference between the total of sales and the total of purchases. At line 2, the time-series
Day.balance is calculated as the moving average over 14 days of
Day.cashFlow. At line 3, a filter with two specific conditions is defined: data should not be more than 7 weeks old – counting from last Monday, and data should not be “fresher” than last Monday. Here the use of the
monday() function ensures that the scope includes exactly 6 whole weeks. Finally, the time-series
Day.balance is displayed as the sole time-series of the line chart defined at lines 4 and 5.
The aggregation taking place at line 2 leverages the
over option that we have detailed in the previous section. Here, the time-series is averaged over a time window of 14 days – from the index -13 to index 0 there are 14 distinct indices. This aggregation is somewhat unlike the previous aggregations we have seen so far, because the table
Day appears both on the left and the right sides of the assignment, while the more usual aggregations that do not have an
over suffix typically aggregate data from one table into another table.
It is also possible to re-aggregate daily time-series into weekly time-series. The script below illustrates how this can be done to calculate a weekly cash flow instead of a daily one.
Day.cashFlow = sum(Orders.NetAmount) - sum(PO.NetAmount) Week.balance = sum(Day.cashFlow / 2) over [-1 .. 0] when date >= lastMonday - 6 * 7 and date < lastMonday show linechart "Cash flow over the last 6 weeks" a8f10 tomato unit:"$" with Week.balance as "Balance"
Dealing with event-driven data
Event-driven data refers to a certain representation of the historical data that focuses on “changes” . For example, instead of having all the historical prices for every single day of the data history, it is much more practical to only collect the list of historical price changes: each price change gives a new price and a date, and it is assumed that the new price remains unchanged until a new price change is observed. Envision supports scenarios where the historical data is represented as a list of changes.
The sample dataset does not include a table with the historical prices, however, it is possible to approximate this data by looking at the past purchase transactions, and assuming that prices stay unchanged until the next transaction is observed. This is exactly what is taking place in the last block of script in the example at the start of this section. Let’s have a closer look at the relevant lines copied below.
PO.Price = PO.NetAmount / PO.Quantity Orders.PurchasePrice = latest(PO.Price) Orders.Cost = Orders.PurchasePrice * Orders.Quantity Orders.Profit = Orders.NetAmount - Orders.Cost Week.profitblty = sum(Orders.Profit) / sum(Orders.Cost) default 1 when date >= lastMon - 13 * 7 and date < lastMon show linechart "Profitability over the last 13 weeks" a11f13 tomato unit:"%" with Week.profitblty
At line 1, the purchase price per unit is computed for every line of the
PO table through a vector operation in the usual Envision style. At line 2, the
latest function is used, and this function has a rather specific behavior: for every line of the
Orders table – the table being assigned to – the
latest function looks up the most recent
PO.Price line available, no more recent that the
Orders line being considered, and copies this value on the left side of the assignment. At lines 3 and 4, we have more calculations involving vectors and aggregations. Finally, at line 5, a filter is defined to restrict the scope of operations to the last 13 whole weeks. Within this filter block, a line chart is displayed, leveraging the
Week.profitability time-series that has been previously calculated at line 4.
latest function is the specific ingredient that supports the calculation being considered here. This function is precisely intended to capture the semantics of an event stream, where a value is assumed to be constant until it is overridden by a new event. In particular, it is also possible to “densify” purchase prices, calculating a price for every single day of the history with:
Day.PurchasePrice = latest(PO.Price)
In practice, the
latest function can be used to deal with many situations such as stock-outs, promotions, product lifecycles, etc.