Table schemas
A table schema is a list of named and typed fields, optionally documented. Schemas can be used when reading tables from files or writing tables to files, without having to repeat the list of fields every time.
Schema overview
Schemas are intended as a means to stabilize and document the format of the tabular files either written or read by Envision scripts. Schemas help to avoid entire classes of programming mistakes rooted in the file-based interactions between scripts. For example, changing the name of a field in an output file can inadvertently break another script, executed downstream, as an expected field isn’t going to be found anymore. Moreover, if the same file is read in multiple scripts, then the use of a schema can significantly reduce the verbosity of those scripts.
Let’s introduce our first schema, and use it to produce a more concise write block:
schema Products with
  Product : text
  Color : text
  Price : number

table Products = with
  [| as Product, as Color, as Price |]
  [| "shirt", "white,grey", 10.50 |]
  [| "pants", "blue", 15.00 |]
  [| "hat", "red", 5.25 |]

write Products as "/sample/products.csv" with
  schema Products
In the above script, the schema Products is defined with three named and typed fields. The write block refers to the schema and thus omits the fields entirely.

A schema definition is introduced by the keyword schema, followed by the name of the schema and by the keyword with, which opens a block. The expected fields are listed, with an extra level of indentation, within the schema block.
The name of the schema does not have to match the name of the table. However, as a convention, when a schema exactly matches a table, it is suggested to use the same name for both.
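For instance, the following sketch (the schema name Catalog is illustrative) writes the table Products through a schema bearing a different name:

schema Catalog with
  Product : text
  Price : number

table Products = with
  [| as Product, as Price |]
  [| "shirt", 10.50 |]
  [| "hat", 5.25 |]

write Products as "/sample/products.csv" with
  schema Catalog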
It is also possible, although optional, to document every field of the schema:
schema Products with
  /// The identifier of the product.
  Product : text
  Color : text
  /// The unit price without tax.
  Price : number

table Products = with
  [| as Product, as Color, as Price |]
  [| "shirt", "white,grey", 10.50 |]
  [| "pants", "blue", 15.00 |]
  [| "hat", "red", 5.25 |]

write Products as "/sample/products.csv" with
  schema Products
In the above script, the two fields Product and Price benefit from inline documentation. The comments are introduced by /// and are placed on the line above the field declaration they document.
Conversely, we can use the schema to produce a more concise read block:
schema Products with
  /// The identifier of the product.
  Product : text
  Color : text
  /// The unit price without tax.
  Price : number

read "/sample/products.csv" as Products with
  schema Products

show table "My Products" a1b3 with
  Products.Product
  Products.Color
  Products.Price
In the script above, the columns of the read block are omitted and replaced by the reference to the schema named Products. The inline documentation introduced in the schema can be observed by hovering the mouse over the inner lines of the show block.
Read and write statements
Schemas are also accepted in read and write statements, as opposed to read and write blocks. This form is more concise. The write block introduced in the previous section can be rewritten as a write statement as follows:
schema Products with
  /// The identifier of the product.
  Product : text
  Color : text
  /// The unit price without tax.
  Price : number

table Products = with
  [| as Product, as Color, as Price |]
  [| "shirt", "white,grey", 10.50 |]
  [| "pants", "blue", 15.00 |]
  [| "hat", "red", 5.25 |]

write Products as "/sample/products.csv" with schema Products
Similarly, the read block also introduced in the previous section can be rewritten as a read statement as follows:
schema Products with
  /// The identifier of the product.
  Product : text
  Color : text
  /// The unit price without tax.
  Price : number

read "/sample/products.csv" as Products with schema Products

show table "My Products" a1b3 with
  Products.Product
  Products.Color
  Products.Price
Modules and schemas
A schema is typically intended to be used multiple times: at the very least twice, once to write the table and once to read it. In order to achieve this code reuse, schemas can be defined in modules and consumed in scripts.
Let’s create a module named /sample/my-module with:
export schema Products with
  /// The identifier of the product.
  Product : text
  Color : text
  /// The unit price without tax.
  Price : number
The module contains the schema definition. The definition is prefixed by the keyword export in order to make the schema accessible outside the module itself. This schema can then be imported from a script:
import "/sample/my-module" as MyModule
read "/sample/products.csv" as Products with
schema MyModule.Products
show table "My Products" a1b3 with
Products.Product
Products.Color
Products.Price
In the above script, the module is imported and referenced as MyModule. The declaration schema MyModule.Products refers to the Products schema found in the module referred to by the namespace MyModule.
Roadmap: Envision does not yet support an alternative, more concise syntax to reference schemas found in modules. The module has to be explicitly referenced, and schema references require the module reference to be provided as a prefix. However, we are planning to introduce a prefix-free alternative syntax in the future.
Composing schemas
A schema can include another schema, and both read and write blocks offer the possibility to interleave field references and schema references. Those composition capabilities are intended to address complex scenarios where persisted tables may share subsets of fields.
schema JustProduct with
  Product : text

schema JustColor with
  Color : text

schema Products with
  schema JustProduct
  schema JustColor
  Price : number

table Products = with
  [| as Product, as Color, as Price |]
  [| "shirt", "white,grey", 10.50 |]
  [| "pants", "blue", 15.00 |]
  [| "hat", "red", 5.25 |]

write Products as "/sample/products.csv" with
  schema Products
In the above script, the schema definition block of Products includes the two other schemas JustProduct and JustColor, along with the field Price listed directly in the block.
The definition block of a schema can interleave schema-lines and field-lines. However, no duplicate field names are allowed; in particular, this prevents redefining a field through successive schema definitions.
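As a minimal sketch of this restriction (the schema names are illustrative), the following definition would be rejected because the field Color ends up declared twice:

schema BaseColor with
  Color : text

schema ExtraColor with
  Color : text

// rejected: the field 'Color' would be defined twice
schema Products with
  schema BaseColor
  schema ExtraColor
  Price : number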
This composition mechanism is also available - under a fairly similar syntax - both for read blocks and write blocks. A read block can be composed with a mix of schema-lines and field-lines:
schema JustProduct with
  Product : text

schema JustColor with
  Color : text

read "/sample/products.csv" as Products with
  schema JustProduct
  schema JustColor
  Price : number

show table "My Products" a1b3 with
  Products.Product
  Products.Color
  Products.Price
In the above script, in the read block, the fields Product and Color are referred to through the schemas, while the last field Price is explicitly listed on its own line.
Conversely, a write block can also be composed with a mix of schema-lines and field-lines:
schema JustProduct with
  Product : text

schema JustColor with
  Color : text

table Products = with
  [| as Product, as Color, as Price |]
  [| "shirt", "white,grey", 10.50 |]
  [| "pants", "blue", 15.00 |]
  [| "hat", "red", 5.25 |]

write Products as "/sample/products.csv" with
  schema JustProduct
  schema JustColor
  Price = Products.Price
Duplicate fields are not allowed in read blocks or in write blocks either. This restriction extends the one defined for schema definition blocks.
Read overrides
An existing file may diverge, field-wise, from the expectations set by a schema. This can happen whenever a schema is modified (but the relevant files are not), or because the file is produced by a third-party. For those situations, the schema-lines within a read block can be turned into inner schema blocks specifying overrides.
schema PartialProducts with
  ProductId : text
  Color : text
  Size : text

read "/sample/products.csv" as Products with
  schema PartialProducts with
    ProductId = read "Product"
    Size = "extra large"
  Price : number

show table "My Products" a1c3 with
  Products.ProductId
  Products.Color
  Products.Price
  Products.Size
In the above script, the schema specifies two fields, ProductId and Size, that are not going to be found in the file /sample/products.csv as produced in a previous section. Within the read block, the field ProductId is bound to the file field "Product", which requires prefixing the concrete field name with the keyword read. Below, the field Size is assigned a constant text literal, "extra large".
Turning a schema-line within a read block into an inner schema block offers the possibility to define field overrides. The use of the read keyword indicates that the field should be looked up in the file under a different name. In the absence of the read keyword, the value specified on the right side of the assignment is used as a constant literal.
In-schema overrides
The binding override can also be specified within the schema itself. The purpose remains the same: binding a vector to a column that bears a different name. The syntax is aligned with the read override.
schema PartialProducts with
  ProductId : text = read "Product"
  Color : text
  Size : text

read "/sample/products.csv" as Products with
  schema PartialProducts
  Price : number

show table "My Products" a1c3 with
  Products.ProductId
  Products.Color
  Products.Price
  Products.Size
In the above script, the keyword read is used inside the schema block in order to bind the ProductId field to the field named "Product".
When a binding override is specified inside the schema via the keyword read, the override governs the write behavior as well as the read behavior (the write behavior is detailed in the next section). This ensures that a file written by the schema can be read again through the same schema.
In-schema overrides are typically intended to cope with field names that are incompatible with the variable naming rules of Envision. For example, such a binding can handle a column named "Product Id" (notice the whitespace) within the flat files. In-schema overrides also facilitate refactoring of the Envision code, removing the need to immediately rewrite the files when the corresponding variable names change.
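As a minimal sketch (the file path and the column name "Product Id" are illustrative), an in-schema override can bind a field to such a column:

schema LegacyProducts with
  /// Bound to the file column "Product Id", which is not a valid Envision variable name.
  ProductId : text = read "Product Id"
  Color : text

read "/sample/legacy-products.csv" as Products with
  schema LegacyProducts

show table "Legacy Products" a1b3 with
  Products.ProductId
  Products.Color

Since the override also governs the write behavior, writing through LegacyProducts would again produce the column "Product Id", keeping the file readable through the same schema.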
Write overrides
Write statements performed with a schema can also be overridden, not in the sense of producing a file that wouldn’t match the schema, but in the sense of re-binding an arbitrary expression to a field of the schema:
schema PartialProducts with
  Product : text
  Color : text

table Products = with
  [| as Name, as Color, as Price |]
  [| "shirt", "white,grey", 10.50 |]
  [| "pants", "blue", 15.00 |]
  [| "hat", "red", 5.25 |]

write Products as "/sample/products.csv" with
  schema PartialProducts with
    Product = Products.Name
  Price = Products.Price
In the above script, the schema PartialProducts is referenced within the write block. However, the table Products does not contain a vector named Product; instead, this vector is named Name. The assignment Product = Products.Name overrides the automatic binding of a vector that would otherwise be expected to be named Products.Product.
In most situations, the write overrides can be avoided altogether by performing the relevant assignment before the write block, as illustrated by:
schema PartialProducts with
  Product : text
  Color : text

table Products = with
  [| as Name, as Color, as Price |]
  [| "shirt", "white,grey", 10.50 |]
  [| "pants", "blue", 15.00 |]
  [| "hat", "red", 5.25 |]

Products.Product = Products.Name

write Products as "/sample/products.csv" with
  schema PartialProducts
  Price = Products.Price
However, such a (re)assignment may not be possible if the vector is already assigned with a conflicting data type:
schema PartialProducts with
  Product : text
  Color : text

table Products = with
  [| as Name, as Color, as Price |]
  [| "shirt", "white,grey", 10.50 |]
  [| "pants", "blue", 15.00 |]
  [| "hat", "red", 5.25 |]

Products.Product = 42
// not possible, mismatching type, number != text
// Products.Product = Products.Name

write Products as "/sample/products.csv" with
  schema PartialProducts with
    Product = Products.Name
  Price = Products.Price
The intent of the write overrides is to allow the use of a schema in write blocks even if one of the fields of the schema happens to collide with an existing vector in the script that does not have the correct semantics with regard to the expectations of the schema. As a guideline, we suggest avoiding those situations: it’s better if the vectors manipulated in the script are consistent with the fields exported through the schema. However, the write override offers a local fix when this is not the case.