Data Modeling in Envision

This page gathers the core data-modeling tutorials in one place so you can move from raw files to clean, reusable tables. If you are starting here, open a session with the 1-echelon 2017 sample dataset; otherwise, keep your current session with the dataset already loaded.

Table of contents

  1. Read TSV and CSV Files into Tables
  2. Model Tables with Primary Dimensions
  3. Join Orders to Items Without Writing a SQL JOIN
  4. Filter Orders to a Time Window
  5. Aggregate Orders into Weekly and Monthly Buckets
  6. Build a Top-N Ranking Table (Top Sellers)
  7. Build a Two-Axis KPI Table with a Cross Table
  8. Create Derived Tables with Table Comprehensions

Read TSV and CSV Files into Tables

Paste the script, run it, and look for quick proof that both input files, a plain TSV and a gzipped TSV, are in memory.

  1. Keep the default /sample folder in your session.
  2. Run the script.
  3. You should now see a summary tile with row counts plus a table showing the last five order lines.
read "/sample/Lokad_Items.tsv" as Items[id] with
  "Id" as id : text
  Name : text
  Category : text
  SellPrice : number
  SellCurrency : text

read "/sample/Lokad_Orders.tsv.gz" as Orders expect [id, date] with
  "Id" as id : text
  "Date" as date : date
  Quantity : number
  NetAmount : number
  Currency : text

itemsCount = count(Items.*)
Orders.Row = rank() scan [Orders.date desc]

show summary "Loaded files" with
  itemsCount as "Items"
  count(Orders.*) as "Order lines"
  min(Orders.date) as "Earliest order date"
  max(Orders.date) as "Latest order date"

where Orders.Row <= 5
  show table "Last 5 orders" with
    Orders.date
    Orders.id
    Orders.Quantity
    Orders.NetAmount { unit: "#(Orders.Currency)" }
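
The read statement is not specific to TSV: compression is handled transparently (the .gz file above needs no special treatment), and the delimiter is typically inferred from the file itself, so a CSV file loads with the same syntax. A sketch, using a hypothetical /sample/Lokad_Suppliers.csv file:

read "/sample/Lokad_Suppliers.csv" as Suppliers with
  Supplier : text
  Country : text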

Model Tables with Primary Dimensions

This script sets primary dimensions on both tables and gives you quick checks that the keys are behaving as expected.

  1. Keep the default /sample folder.
  2. Run the script.
  3. You should now see a summary confirming counts and a sample of the keyed order lines.
read "/sample/Lokad_Items.tsv" as Items[id] with
  "Id" as id : text
  Name : text
  Category : text

read "/sample/Lokad_Orders.tsv.gz" as Orders expect [id, date] with
  "Id" as id : text
  "Date" as date : date
  Quantity : number
  NetAmount : number
  Currency : text

Orders.Row = rank() scan [Orders.date desc]

itemsCount = count(Items.*)
ordersCount = count(Orders.*)

show summary "Primary dimensions ready" with
  itemsCount as "Items (id)"
  ordersCount as "Order lines (id, date)"

where Orders.Row <= 5
  show table "Sample keyed orders" with
    Orders.date
    Orders.id
    Orders.Quantity
    Orders.NetAmount { unit:"#(Orders.Currency)" }

If the counts look right, you now have primary dimensions set for both tables and can move on to combining them safely.
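
As a last check: because id is the primary dimension of Items and a secondary dimension of Orders, item attributes flow onto order lines with a plain assignment, no lookup needed. A sketch to append to the script above:

Orders.Category = Items.Category   // broadcast through the shared id dimension

where Orders.Row <= 5
  show table "Orders with category" with
    Orders.id
    Orders.Category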

Join Orders to Items Without Writing a SQL JOIN

Orders already carry id as a secondary dimension (id is the primary dimension of Items), so you can pull item attributes directly without an explicit join.

  1. Run the script.
  2. You should now see a table where each order line carries the item name and revenue.
  3. The summary tile confirms the total revenue aggregates across both tables.
read "/sample/Lokad_Items.tsv" as Items[id] with
  "Id" as id : text
  Name : text
  SellPrice : number
  SellCurrency : text

read "/sample/Lokad_Orders.tsv.gz" as Orders expect [id, date] with
  "Id" as id : text
  "Date" as date : date
  Quantity : number
  Currency : text

Orders.ItemName = Items.Name              // broadcast from Items via the shared id
Orders.UnitPrice = Items.SellPrice
Orders.SellCurrency = Items.SellCurrency  // needed for the unit styling below
Orders.Revenue = Orders.Quantity * Items.SellPrice
Orders.Row = rank() scan [Orders.date desc]

show summary "Joined revenue" with
  sum(Orders.Revenue) as "Total revenue" { unit:"#(Orders.Currency)" }

where Orders.Row <= 6
  show table "Orders with item attributes" with
    Orders.date
    Orders.id
    Orders.ItemName
    Orders.Quantity
    Orders.UnitPrice { unit:"#(Orders.SellCurrency)" }
    Orders.Revenue { unit:"#(Orders.Currency)" }

If you can see names and prices without any SQL, the dimension-driven join is working.
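
The shared dimension works in both directions: attributes broadcast down from Items into Orders, and order lines aggregate up from Orders into Items. A sketch to append to the script above:

Items.UnitsSold = sum(Orders.Quantity)   // roll order lines up into Items
Items.Revenue = sum(Orders.Revenue)

show table "Revenue per item" with
  Items.Name
  Items.UnitsSold
  Items.Revenue { unit: "#(Items.SellCurrency)" }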

Filter Orders to a Time Window

Work from the sample dataset and keep the orders table, but slice it to a fixed window so downstream aggregations stay focused.

  1. Run the script.
  2. You should now see a summary tile with the total order lines versus the filtered slice.
  3. The table shows only orders inside the chosen window.
read "/sample/Lokad_Items.tsv" as Items[id] with
  "Id" as id : text
  Name : text

read "/sample/Lokad_Orders.tsv.gz" as Orders expect [id, date] with
  "Id" as id : text
  "Date" as date : date
  Quantity : number
  NetAmount : number
  Currency : text

startDate = date(2017, 7, 1)
endDate = date(2017, 12, 31)

Orders.InWindow = Orders.date >= startDate and Orders.date <= endDate
Orders.Row = rank() scan [Orders.date desc]

ordersTotal = count(Orders.*)
ordersInWindow = sum(if Orders.InWindow then 1 else 0)

show summary "Windowed orders" with
  ordersTotal as "Total lines"
  ordersInWindow as "Lines in window"

where Orders.InWindow and Orders.Row <= 8
  show table "Orders between July and Dec 2017" with
    Orders.date
    Orders.id
    Orders.Quantity
    Orders.NetAmount { unit:"#(Orders.Currency)" }

Change startDate and endDate to practice slicing different periods and watch the counts adjust.
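
In production scripts you will usually anchor the window on the data rather than on hard-coded dates. A sketch of a rolling window, with an arbitrary 90-day span, to swap in for the two scalar dates above:

lastDate = max(Orders.date)                     // most recent order line
Orders.InWindow = Orders.date > lastDate - 90   // rolling 90-day slice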

Aggregate Orders into Weekly and Monthly Buckets

Aggregate demand from daily orders into weekly and monthly buckets. Declaring the date dimension with expect [date] gives the script the built-in calendar tables Day, Week, and Month, so daily totals can be rolled upward step by step.

  1. Run the script.
  2. You should now see weekly and monthly tables with total units and revenue.
read "/sample/Lokad_Orders.tsv.gz" as Orders expect [date] with
  "Date" as date : date
  Quantity : number
  NetAmount : number
  Currency : text

Day.Units = sum(Orders.Quantity)
Day.Revenue = sum(Orders.NetAmount)

Week.Units = sum(Day.Units)
Week.Revenue = sum(Day.Revenue)

Month.Units = sum(Day.Units)
Month.Revenue = sum(Day.Revenue)

show table "Weekly totals" with
  Week.week
  Week.Units
  Week.Revenue { unit:"$" }

show table "Monthly totals" with
  Month.month
  Month.Units
  Month.Revenue { unit:"$" }
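
The hop through Day is for illustration; since Orders carries the date dimension, it aggregates straight into Week or Month just as well. This one-liner should match Week.Units above:

Week.UnitsDirect = sum(Orders.Quantity)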

Build a Top-N Ranking Table (Top Sellers)

Rank items by revenue to get a compact top sellers list.

  1. Run the script.
  2. You should now see the top 10 SKUs ranked by revenue.
read "/sample/Lokad_Items.tsv" as Items[id] with
  "Id" as id : text
  Name : text

read "/sample/Lokad_Orders.tsv.gz" as Orders expect [id, date] with
  "Id" as id : text
  "Date" as date : date
  Quantity : number
  NetAmount : number

table SkuStats[sku] = by Orders.id
Orders.Name = Items.Name              // broadcast from Items, aggregated below
SkuStats.Name = same(Orders.Name)     // one name per sku, by construction
SkuStats.Revenue = sum(Orders.NetAmount)
SkuStats.Units = sum(Orders.Quantity)
SkuStats.Rank = rank() scan [SkuStats.Revenue desc]

where SkuStats.Rank <= 10
  show table "Top sellers" with
    SkuStats.Rank
    SkuStats.sku
    SkuStats.Name
    SkuStats.Units
    SkuStats.Revenue
    order by SkuStats.Rank
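
A common follow-up is to collapse everything outside the top 10 into a single catch-all line. A sketch reusing the rank computed above (the Label column and Buckets table are names introduced here):

SkuStats.Label = if SkuStats.Rank <= 10 then SkuStats.Name else "Others"

table Buckets[label] = by SkuStats.Label
Buckets.Revenue = sum(SkuStats.Revenue)

show table "Top sellers plus Others" with
  Buckets.label
  Buckets.Revenue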

Build a Two-Axis KPI Table with a Cross Table

Build a small KPI matrix that compares two dimensions side by side.

  1. Run the script.
  2. You should now see a cross table of KPIs by channel and region.
table Channels[channel] = with
  [| as channel |]
  [| "Online" |]
  [| "Store" |]

table Regions[reg] = with
  [| as reg |]
  [| "North" |]
  [| "South" |]

table Kpis = cross(Channels, Regions)      // one line per (channel, region) pair
Kpis.Units = random.poisson(40 into Kpis)
Kpis.Revenue = Kpis.Units * random.uniform(20 into Kpis, 40)

show table "KPI matrix" with
  Channels.channel
  Regions.reg
  Kpis.Units
  Kpis.Revenue { unit:"$" }
  order by [Channels.channel, Regions.reg]
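
Because the cross table sits downstream of both of its axes, totals roll back up to either axis with a plain aggregation. A sketch to append to the script above:

Channels.Revenue = sum(Kpis.Revenue)   // per-channel totals across regions

show table "Revenue by channel" with
  Channels.channel
  Channels.Revenue { unit:"$" }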

Create Derived Tables with Table Comprehensions

Create a derived table so you can reuse a filtered or enriched dataset.

  1. Run the script.
  2. You should now see a derived table with only premium items.
table Items = with
  [| as Sku, as Category, as Price |]
  [| "A-100", "Core", 12.5 |]
  [| "B-200", "Core", 18.0 |]
  [| "C-300", "Premium", 35.0 |]
  [| "D-400", "Premium", 42.0 |]

// Table comprehension: keep only the premium items from Items.
table Premium = with
  [| Items.Sku as Sku, Items.Price as Price |]
  where Items.Category == "Premium"

show table "Premium items" with
  Premium.Sku
  Premium.Price { unit:"$" }

You now have a small derived table that can power downstream dashboards without re-filtering.
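
Since Premium is an ordinary table from here on, it aggregates like any other. A quick summary tile, as a sketch:

premiumCount = count(Premium.*)
premiumValue = sum(Premium.Price)

show summary "Premium catalog" with
  premiumCount as "Premium items"
  premiumValue as "Combined price" { unit:"$" }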
