read

The keyword read offers versatile mechanisms to read data from files.

table Products = with
  [| as Product, as Color, as Price |]
  [| "shirt", "white,grey", 10.50 |]
  [| "pants", "blue", 15.00 |]
  [| "hat", "red", 5.25 |]

write Products as "/sample/products.csv" with
  Product = Products.Product
  Color = Products.Color
  Price = Products.Price

Then,

read "/sample/products.csv" as Products with
  Product : text
  Color : text
  Price : number

show table "My Products" with
  Products.Product
  Products.Color
  Products.Price

A single script is limited to 5000 input files.

All read blocks must appear at the top of the Envision script, above any statement that cannot be processed at compile time.

// only compile-time constants can be placed above 'read'
const path = "/mytable.csv" 

read (path) as T with // parentheses are required around the expression
  ColumnA : text
  ColumnB : number

The order of fields inside a read block does not matter, and fields can be omitted if they are not needed by the script.
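
For instance, a sketch re-reading the same file with the fields reordered and Color omitted:

read "/sample/products.csv" as Products with
  Price : number
  Product : text

show table "My Products" with
  Products.Product
  Products.Price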

Table size constraints

max, min, and small can appear after the table name to enforce size limits on the loaded table. These are distinct from the file pattern filters described below. When no explicit max is provided, read defaults to a maximum of 1 million lines. This default can be overridden on the read statement or on the matching path schema.

read "/sample/products.csv" as Products max 100 with
  Product : text
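
As a sketch, assuming min accepts the same placement as max, both bounds can be combined so that the run fails if the file is empty or unexpectedly large:

read "/sample/products.csv" as Products min 1 max 100 with
  Product : text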

Calendar auto-unfiltering

When a column is declared as date and expect [date] is used, Envision auto-creates the Day, Week, and Month tables and unfilters the date dimension over the observed range.

read "/sample/orders.csv" as Orders expect [date] with
  "OrderDate" as date : date
  Quantity : number

Day.Qty = sum(Orders.Quantity)
show linechart "Daily Orders" with Day.Qty

If multiple tables are read with date, the unfiltered range spans the earliest to latest dates across all those tables.
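
As a sketch, with a hypothetical second file /sample/returns.csv, the Day table below spans from the earliest date found in either file to the latest:

read "/sample/orders.csv" as Orders expect [date] with
  "OrderDate" as date : date
  Quantity : number

read "/sample/returns.csv" as Returns expect [date] with
  "ReturnDate" as date : date
  Quantity : number

Day.Net = sum(Orders.Quantity) - sum(Returns.Quantity)
show linechart "Daily Net Orders" with Day.Net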

File patterns and filters

Wildcards can read multiple files into one table:

read "/sample/products-*.csv" as Products with
  Product : text
  Price : number

The max, min, and latest filters restrict the captured files:

read max 2 "/sample/products-*.csv" as Products with
  Product : text
  Price : number

max and min sort by file name, while latest sorts by last update time. The count defaults to 1 when omitted. The wildcard * matches any character except /, and multiple wildcards are allowed in the same pattern.
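
For instance, since the count defaults to 1, the sketch below captures only the most recently updated file matching the pattern:

read latest "/sample/products-*.csv" as Products with
  Product : text
  Price : number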

Pattern unions and column inference

Multiple read statements can target the same table. If the formats match, later read blocks can omit columns and rely on inference.

read "/sample/products-1.csv" as Products with
  Product : text
  Price : number

read "/sample/products-*.csv" as Products

Column inference is the only ordering-sensitive aspect of read statements. If a file is captured multiple times, its content is loaded multiple times.

Pattern expressions

The file pattern can be a compile-time constant expression:

const myFolder = "/sample"
const myPath = "\{myFolder}/products.csv"

read (myPath) as Products with
  Product : text
  Price : number

Parameterized paths

Parameterized paths require a matching path schema. The expression inside \{...} must be a compile-time constant (literal, const, list, or bounds). Use .. to match any value for the parameter.

schema '/clean/Orders\{OrderMonth}.ion' with
  OrderMonth : month
  Id : text

const m = month(2023, 7)
read '/clean/Orders\{m}.ion' as Orders
read '/clean/Orders\{..}.ion' as Orders

Constant column

A constant column can be introduced through an assignment at the column level within the read block:

read "/sample/products.csv" as Products with
  Product : text
  Color : text
  Price : number
  VAT : number = 0.2 // constant column

show table "My Products" with
  Products.Product
  Products.Color
  Products.Price
  Products.VAT

Renamed column

A column can be renamed through an assignment and the use of the read function at the column level within the read block:

read "/sample/products.csv" as Products with
  Product : text
  Colour : text = read("Color")
  Price : number

show table "My Products (UK)" with
  Products.Product
  Products.Colour
  Products.Price

The function read is intended to cope with raw column names that are not valid Envision variable names.

Reading by column position is also possible but brittle:

read "/sample/products.csv" as Products with
  Product : text = read(1)
  Price : number = read(2)

Read aliasing

A read alias introduces the same column under a new name, typically to bind primary dimensions.

read "/sample/products.csv" as Products[id] with
  Product : text
  Color : text
  Price : number
  id = Products.Product

Prefer a post-read assignment unless the alias is needed for dimensions or schemas.
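
A minimal sketch of the post-read alternative, introducing the same values as a regular assignment after the read block:

read "/sample/products.csv" as Products with
  Product : text
  Color : text
  Price : number

Products.id = Products.Product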

split column-level option

The option split splits the content of a text cell according to the specified delimiter, producing one line for each value obtained from the split.

A B
1 X,Y
2 Z

Applying split:"," to B yields:

A B
1 X
1 Y
2 Z

The script below creates a TSV file with 2 lines:

table T = with
  [| as A, as B  |]
  [| 1,    "X,Y" |]
  [| 2,    "Z"   |]

write T as "/sample/split.tsv" with
  A = T.A
  B = T.B

Then, the script below re-reads the TSV file, generating a table with 3 lines:

read "/sample/split.tsv" as T with
  A : number
  split:"," as B : text

show table "T" with T.A, T.B

The option split can only appear once within a read block.

Transpose column-level option

When multiple values are passed to the read function within a read block, the corresponding columns, as found in the file, are transposed into as many lines. Columns that are not transposed have their values repeated. For example, transposing (B1, B2, B3) in:

A B1 B2 B3
X X1 X2 X3
Y Y1 Y2 Y3

We obtain:

A B
X X1
X X2
X X3
Y Y1
Y Y2
Y Y3

The following script creates a TSV file with 2 lines:

table T = with
  [| as A, as B1, as B2, as B3 |]
  [| "x",  "x1",  "x2",  "x3"  |]
  [| "y",  "y1",  "y2",  "y3"  |]

write T as "/sample/transpose.tsv" with
  A = T.A
  B1 = T.B1
  B2 = T.B2
  B3 = T.B3

Then, the script below re-reads the TSV file, generating a table with 6 lines:

read "/sample/transpose.tsv" as T with
  A : text
  B : text = read("B1", "B2", "B3")

show table "T" with T.A, T.B

Discarded table

Prefixing the table name with an underscore (_) means that the table is discarded. However, the captured files are accessible from the Files built-in table.

read "/sample/*.csv" as _Samples

show table "Files" with 
  Files.Alias // '_Samples'
  Files.Path

A table marked as discarded cannot be used in the script; as a result, it does not trigger a 'not used' error at compile time when it is left unused.

Using a stand-alone discard symbol (_) is also possible:

read "/sample/*.csv" as _

show table "Files" with 
  Files.Alias // '_'
  Files.Path

This feature is typically intended for scripts that analyze the presence or absence of files while ignoring their content.

Read options and formats

Table-level read options control parsing. Values must be literals.

read "/sample/products.csv"
    fileformat:"csv" quotes:false skip:0 separator:"," as Products with
  Product : text
  Price : number

Options include fileformat, skip, encoding, headers, quotes, separator, spreadsheet, date, and number. fileformat overrides extension-based format detection.

When overriding headers, pair headers with skip:1 to ignore the original header line.

read "/manualfc" headers:"Id,M1,M2" skip:1 as ManualFC with
  "Id" as Id : text
  Values : number = read("M1", "M2")

For .xlsx files, the spreadsheet option selects a sheet by name (case-insensitive). When omitted, the first sheet is read. An unknown sheet name produces a runtime error.

read "/sample/items.xlsx" spreadsheet:"Sheet2" as Items with
  Id : number
  Label : text

File formats

Formats are detected by file extension. Supported formats include: .csv, .tsv, .txt, .csv.gz, .tsv.gz, .xlsx, .ion, and .zip. ZIP archives must contain exactly one file.

For flat text files, Envision auto-detects separators and number/date formats unless overridden. Header-free files are supported but fragile; prefer headers.

The encoding option specifies the text encoding of the input files.

Date formats use tokens like d, dd, M, MM, MMM, MMMM, yy, yyyy. Number formats use 0 and separators, for example "0,0" or "0.0".
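
As a sketch, assuming a hypothetical file where dates appear as 21/07/2023 and numbers use a comma as the decimal separator:

read "/sample/orders.csv" date:"dd/MM/yyyy" number:"0,0" as Orders with
  "OrderDate" as date : date
  Quantity : number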

Sanitizing input values

When input values are inconsistent, read them as text and parse in-script:

read "/sample/products.csv" as Products with
  Product : text
  SizeRaw : text = read("Size")

Products.IsSizeDefined, Products.Size =
  tryParseNumber(Products.SizeRaw, "", ",")

Filesystem atomicity

Reads observe a snapshot of the filesystem. Writes happen at script end and are not visible during the same run. Two write blocks cannot target the same path in a single script.
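
The sketch below illustrates the snapshot behavior; the write targets a hypothetical new path, and its output only becomes visible to reads in later runs:

read "/sample/products.csv" as Products with
  Product : text
  Price : number

// Written at the end of the run; not observable by any 'read' in this run.
write Products as "/sample/products-copy.csv" with
  Product = Products.Product
  Price = Products.Price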

Datatype read/write support

The following table lists the data types that can be read or written, depending on the intended file format.

Type       TSV read/write   Ionic read/write
boolean    yes              yes
date       yes              yes
embedding  no               yes
markdown   no               no
month      yes              yes
number     yes              yes
ranvar     no               yes
text       yes              yes
week       yes              yes
zedfunc    no               yes

Only .ion supports ranvar, zedfunc, and embedding.
