List of virtual tables
The “virtual” tables are generated by the Envision runtime itself, instead of reflecting tabular input files. Here, we list the virtual tables supported by Envision.
Calendar tables
To facilitate time-series analysis, Envision provides several calendar tables:
Day
: per day
Week
: per week
Month
: per month
These tables facilitate calendar-based aggregations. Below is an illustrative script performing a weekly aggregation:
read "/sample/Lokad_Items.tsv" as Items[Id] with
Id : text
Name : text
read "/sample/Lokad_Orders.tsv" as Orders expect [Id, Date] with
Id : text
Date : date
NetAmount : number
Week.sold = sum(Orders.NetAmount)
show linechart "Weekly sales" a1f4 {unit: "$"} with Week.sold
These tables exist in two versions: the scalar version of type [Date] (as illustrated in the script above), and the vector version of type [Id, Date].

These tables are all populated with the same date range, which matches the full range of the input data as defined by the read statements that specify a Date column. If a when filter is applied, these tables are filtered accordingly.
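Below, a minimal sketch building on the weekly script above; the today() call and the 365-day window are illustrative assumptions, not part of the original example:

// Restrict all calendar tables to the last 365 days (illustrative threshold)
when date >= today() - 365
  Month.sold = sum(Orders.NetAmount)
  show linechart "Monthly sales, last 12 months" a1f4 {unit: "$"} with Month.sold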
Files table
The Files table contains the list of all the files captured by the read statements of the Envision script. This table is intended to support the design of data integrity checks where, for example, files are tested against conditions on their expected sizes (see the sketch after the field list below).
show table "My Files" with
Files.Path
Files.ModifiedDate
Files.ModifiedHour
Files.ModifiedMinute
Files.Alias
Files.Bytes
Files.Success
Files.RawLines
Files.BadLines
Files.BadDates
Files.BadNumbers
Files.MissingValues
The fields are defined as follows:
Files.Path (text)
: the original path of the file
Files.ModifiedDate (date)
: the “last modified” date of the file
Files.ModifiedHour (number)
: the “last modified” hour of the file, in the UTC+00 time zone
Files.ModifiedMinute (number)
: the “last modified” minute of the file
Files.Alias (text)
: the namespace associated with the file
Files.Bytes (number)
: the original file size, in bytes
Files.Success (Boolean)
: whether the file was successfully loaded
Files.RawLines (number)
: the number of lines in the file, including those that were dropped (e.g. missing id or date values)
Files.BadLines (number)
: the number of lines dropped, so RawLines - BadLines is the number of lines actually processed
Files.BadDates (number)
: the number of bad date errors
Files.BadNumbers (number)
: the number of bad number errors
Files.MissingValues (number)
: the number of missing value errors
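Below, a hedged sketch of such an integrity check; the 1024-byte threshold and the a1c4 tile placement are illustrative assumptions:

// Flag input files smaller than an expected minimal size (illustrative threshold)
where Files.Bytes < 1024
  show table "Suspiciously small input files" a1c4 with
    Files.Path
    Files.Bytes
    Files.Success
    Files.BadLines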
Troubleshooting incorrect data
The Files.Path field can be used to pinpoint issues in files when a table is read from multiple underlying files using the wildcard * in the read statement. Envision preserves the relationship between every line read into a table and its originating file. The following script illustrates this capability:
read "/foo/Orders*.tsv" as MyOrders with
Id : text
Date : date
Quantity : number
// The following line is implicit for every table.
// MyOrders.Path = Files.Path
where MyOrders.Quantity < 0
show table "Files with negative order quantities" with
MyOrders.Path
group by MyOrders.Path
Files.file is a primary dimension, whereas, e.g., MyOrders.file is a foreign dimension. This relationship enables broadcasting from Files to MyOrders and allows a direct assignment, i.e.:

MyOrders.ModifiedDate = Files.ModifiedDate

As syntactic sugar, the column Path is special-cased to be accessible without the explicit assignment MyOrders.Path = Files.Path.
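Below, a hedged sketch combining this broadcast with a filter; the one-day freshness threshold and the a1c4 tile placement are illustrative assumptions:

read "/foo/Orders*.tsv" as MyOrders with
  Id : text
  Date : date
  Quantity : number

// Broadcast the file metadata onto every order line
MyOrders.ModifiedDate = Files.ModifiedDate

// Keep only the lines originating from files refreshed within the last day
where MyOrders.ModifiedDate >= today() - 1
  show table "Order lines from recently refreshed files" a1c4 with
    MyOrders.Id
    MyOrders.Date
    MyOrders.Quantity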
More generally, any filter identifying data issues can be turned into a list of problematic files, which facilitates the troubleshooting of data pipelines.
Extracting data from file names
The Files.Path field makes it possible to extract data from the file names. This approach also relies on the relationship between the Files table and all the other tables read from files (see the previous subsection).
Consider a collection of stock level snapshots where each file contains the stock position at the beginning of the day. The following script consolidates the historical data into a single unified table:
read "/foo/Snaphot_Stocks*.tsv" as MyStocks with
Id : text
StockOnHand : number
// Ad-hoc extraction of the date from the file name
Files.SnapshotDate = parsedate(replace(field.r(Files.Path,"_",0),".tsv",""),"yyyy-MM-dd")
// 'MyStocks' is an inflation table with respect to 'Files'
MyStocks.SnapshotDate = Files.SnapshotDate
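As a hedged continuation of the script above (the a1b6 tile placement and the chosen aggregation are assumptions), the consolidated snapshots can then be inspected per snapshot date:

// Total stock on hand for each daily snapshot
show table "Total stock per snapshot" a1b6 with
  MyStocks.SnapshotDate
  sum(MyStocks.StockOnHand) as "Stock on hand"
  group by MyStocks.SnapshotDate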