List of virtual tables

The “virtual” tables are generated by the Envision runtime itself, instead of reflecting tabular input files. Here, we list the virtual tables supported by Envision.

Table of contents

Calendar tables

In the order to facilitate time-series analysis, Envision provides several calendar tables:

These tables facilitate calendar-based aggregations. Below, an illustrating script for a weekly aggregation:

read "/sample/Lokad_Items.tsv" as Items[Id] with
  Id : text
  Name : text
read "/sample/Lokad_Orders.tsv" as Orders expect [Id, Date] with
  Id : text
  Date : date
  NetAmount : number

Week.sold = sum(Orders.NetAmount)
show linechart "Weekly sales" a1f4 {unit: "$"} with Week.sold

These tables exist in two versions, the scalar version of type [Date] (as illustrated in the script above), and the vector version of type [Id, Date].

These tables are all populated with the same date range that matches the full range of the input data, as defined by the read statements that specify a Date column. If a filter when is applied, these tables are filtered accordingly.

Files table

The Files table contains the list of all the files that have been captured by the read statements in the Envision script. This table is intended to support the design data integrity checks where, for example, files are tested against conditions related to their expected sizes.

show table "My Files" with
  Files.Path
  Files.ModifiedDate
  Files.ModifiedHour
  Files.ModifiedMinute
  Files.Alias
  Files.Bytes
  Files.Success
  Files.RawLines
  Files.BadLines
  Files.BadDates
  Files.BadNumbers
  Files.MissingValues

The fields are defined as follow:

Troubleshooting incorrect data

The Files.Path field can be used to pinpoint issues in files, when a table is read from multiple underlying files using the wildcard * in the read statement. Envision preserves the relationship between every single line being read in any table and its originating file. The following script illustrates this capability:

read "/foo/Orders*.tsv" as MyOrders with
  Id : text
  Date : date
  Quantity : number

// The following line is implicit for every table.
// MyOrders.Path = Files.Path

where MyOrders.Quantity < 0
  show table "Files with negative order quantities" with
      MyOrders.Path
      group by MyOrders.Path

Files.file is a primary dimension, whereas, e.g., Orders.file is a foreign dimension. This relationship makes a broadcasting from Files to Orders.file and allows to perform a direct assigment, i.e.:

MyOrders.ModifiedDate = Files.ModifiedDate

As a syntactic sugar, the column Path is special-cased to be accessible without an explicit assigment MyOrders.Path = Files.Path.

More generally, any filter identifying data issues can be turned into a list of files with concerns; facilitating data pipeline troubleshooting.

Extracting data from file names

The Files.Path field allows to extract data from the filenames. This approch also relies on the relationship between the Files table and all the other tables originating from Files (see previous sub-section).

Considering a collection of stock level snapshots where each file contains the stock position at the beginning of the day. The following script consolidates a unified view of the historical data into a single table:

read "/foo/Snaphot_Stocks*.tsv" as MyStocks with
  Id : text
  StockOnHand  : number

// Ad-hoc extraction of the date from the file name
Files.SnapshotDate = parsedate(replace(field.r(Files.Path,"_",0),".tsv",""),"yyyy-MM-dd")

// 'MyStocks' is an inflation table with respect to 'Files'
MyStocks.SnapshotDate = Files.SnapshotDate