# List of read options

The read statement in Envision offers the possibility to load one or more files into a table. Below, we gather the different options relating to the read statement.

Table of contents

## Syntax overview

The usual syntax for reading a file in Envision is as follows:

read "/sample/Lokad_Orders.tsv" as Orders expect [Id, Date] with
Id : text
Date : date
"Quantity" as Qty : number


The text that comes just after the read keyword is referred to as the path. The path is intended to match the files that exist in the file storage associated with a given Lokad account. The columns are not required to be explicitly listed after the with keyword. In this case, the columns are available within the Envision script as long as the column names are valid variable names (for example, no spaces allowed in the column name). The primary purpose of the keyword as is to rename both tables and columns.

If the as option that follows the read is omitted, then Envision reads this file as part of the Items table.

It is useful to remark that the file path of a read statement can contain text constants and, as it will be discussed in the following sections, also variables defined in a form tile or through the today() function:

channel = "WebSite"
read "/sample/Lokad_Orders-\{channel}.tsv" as Orders expect [Id, Date] with
Id : text
Date : date
"Quantity" as Qty : number


## Reading multiple files

It is possible to read multiple files at once through the use of the wildcard (*) in the specified path. The wildcard can replace any string of characters, except those containing a /. It may be used in the file name as well as in a folder name, and may include the file format. An empty string would also match the wildcard. Finally, multiple wildcards within the same path are supported. As an example,

read "/sample/Lokad_Orders*" as Orders expect [Id, Date] with
Id : text
Date : date
"Quantity" as Qty : number


could match “/sample/Lokad_Orders-2018-07-25.csv.gz” as well as “/sample/Lokad_Orders.tsv”.

### Filter one file among many

When reading files using a wildcard or another allowed variable in the file path, it is possible to select one or more files, using one of the syntaxes described in the following.

In order to restrict the set of files originally captured by the wildcard to a single file, the read statement supports 3 distinct modifiers:

• min: only the first file within the set is retained, ordering is based on the file names.
• max: only the last file within the set is retained, ordering is based on the file names.
• latest: only the most recently updated file within the set is retained.

In addition, Envision supports the possibility of reading a specified number of files. This can be expressed by adding a literal positive integer just after the wildcard filter (e.g. max 3).

Using the above modifiers, filtering can hence be done as follows:

read max 3 "/sample/Lokad_Orders*.gz" as Orders with
Id : text
Date : date
"Quantity" as Qty : number


In a list of files containing a date in their path, a file or subset of files can be selected everyday using a sliding temporal window defined through the today() function, as in:

todayDate = today(+1) - 30
fileDate = "\{todayDate:yyyy-MM-dd}"
read "/clean/StockHistory-\{fileDate}-*.ion" as Stock


Finally, the file(s) to be read can be selected manually by the user, through a form tile displayed in the dashboard:

show form "Please select a country" with
Form.country as "Country"

read "/sample/Lokad_Orders-\{Form.country}-*.tsv" as Orders expect [Id, Date] with
Id : text
Date : date
"Quantity" as Qty : number


## Reading a single file as several ones

One can read a particular file, e.g., Lokad_Orders.tsv, as two distinct ones, e.g., Orders and NewOrders. Various variables can be read.

read "/sample/Lokad_Orders.tsv" as Orders expect [Id, Date] with
Id : text
Date : date
Quantity : number
NetAmount : number
Client : text
Currency : text
and as NewOrders expect [Id, Date] with
Id : text
Date : date
"Client" as CustomerID : text
"Quantity" as QtyOrdered : number


## Table-level options

Multiple options are supported in relation to the read statement:

• fileformat (text): transforms the unrecognizable file format of a file into the recognizable one. The format can be any string that would be a recognizable extension, e.g., csv.gz, tsv, zip. If present, fileformat overrides the automatic inference based on the file name. The option should be a compile-time constant.
• skip (number): indicates how many lines should be skipped at the beginning of the file while the file is being read. If omitted, the default value is zero.

The following are supported for the text-based file formats (csv, tsv, csv.gz, tsv.gz, zip and 7z):

• separator (text): Indicates the separator between cell values to be used when parsing the file. If omitted, Envision will use auto-detection heuristics to determine the most likely separator.
• quotes (Boolean): if true is used, the extreme quotes found in a cell token are considered to be part of the token and not escaped. If omitted, the default value is false and Envision escapes extreme quotes, as is usually done with CSV (comma-separated values) files.
• encoding (text): the text encoding to be used while parsing the file. The accepted encoding values are: UTF-8, UTF-16 and latin1. If omitted, Envision will use auto-detection heuristics to determine the most likely encoding.
• headers (text): allows to specify column headers that are missing in the input file; the data type of such columns can be indicated, as usual, after the with keyword, using the assigned header.

The script below illustrates how the above options can be specified:

read "/sample/Lokad_Orders.tsv" skip:0 separator:"\t" quotes:false \
encoding:"UTF-8" headers:"Id, Date, Quantity" as Orders with
"Id" as Ref : text
"Date" as OrderDate : date
Quantity : number

read "/foo/mybadfile.part.gzip" fileformat: "csv.gz" as MyFile with
//Comment: The fileformat read option parses the files with non-matching extension,
//e.g. .part.gzip as .csv.gz files.



### Table type options

Envision offers the possibility to define the primary keys for a table, by listing them between parentheses right after specifying the table name. There are only 5 primary key combinations that are allowed:

read "a.csv" as A with ..
read "b.csv" as C expect [Id] with ..
read "c.csv" as E expect [Date] with ..
read "d.csv" as G expect [Id, Date] with ..


If the expect option is omitted, then Envision does not expect anything from the table.

## Column-level options

Multiple options are also supported for each column to be read:

• The data type of the column, which can be text, number, date, boolean, ranvar or zedfunc. The data type is introduced after a colon (:) that follows the column name.
• date: the date format string; beware the keyword date is going to appear twice, first to specify the format, second to specify the data type.
• number: the number format string
• const: attributes a constant value (of type text, boolean, number or date) to a column that does not exist in the file; this is particularly helpful when reading multiple files as one single table (see also Read and write files)

The following script illustrates the usage of the column options:

read "/sample/Lokad_Orders.tsv" as Orders[Id] with
date:"dd-MM-yyyy" as OrderDate : date
number:"1,000.00" as Qty : number
const(true) as IsClosed


Every column later used in the Envision script must be declared in the read statement. Yet, only the column name and its data type are mandatory. The other elements, e.g. format, are optional.

Ranvars and zedfuncs are both advance data types that cannot be represented in plain text. In order to write (resp. read) those data types to (resp. from) files, Lokad’s Ionic file format has to be used.

Both dates and numbers benefit from a format option intended to remove ambiguities that could exist when the data is represented as plain text. The Lokad Ionic file format is strong-typed, and thus, those options are typically only required when reading flat text files.

For example the plain-text date 9/7/2019 can be interpreted both as July 9th, 2019 or Sep 7th, 2019. The following script removes this ambiguity enforcing the Sep 7th, 2019 semantic:

read "/sample/Lokad_Orders.tsv" as Orders expect [Id] with
Id : text
date: "M/d/yyyy" as OrderDate : date
Qty : number


Following up on this example, to additionally rename the “OrderDate” column and use it as “Date”, the script reads as follows:

read "/sample/Lokad_Orders.tsv" as Orders expect [Id,Date] with
Id : text
"OrderDate" date: "M/d/yyyy" as Date : date
Qty : number


### Number options

The number format string imitates the actual desired output for the numbers:

• The character after the initial 1 is the thousand separator. It can be omitted.
• The character before the last zeroes is the decimal separator. It can be omitted.

### Date options

The format string for the date option can be built from the following tokens:

• d: day of the month from 1 through to 31
• dd: same as d but 0 prefixed
• ddd: abbreviated day of the week (i.e. Mon, Tue, …)
• dddd: full day of the week (Monday, …)
• MM: month number from 1 through to 12
• MMM: abbreviated name of the month (Jun)
• MMMM: full name of the month (June)
• yy: year number from 00 to 99; ignore the hundreds and thousands
• yyyy: year with four digits

### Default values

When missing values are identified, the parser replaces empty values according to the declared data type of the column as follows:

• 0 (zero) for number data type
• false for Boolean data type
• January 1st, 2001 for date data type
• the empty string for text data type
• the Dirac delta at 0 for ranvars or the constant function at 0 for zedfuncs