schema, keyword

Schemas bind a list of typed fields to a named schema or to a path. They reduce repetition and stabilize file formats across scripts. They can also be used to shape the output of union tables.

Path schemas

A path schema binds a file path to a list of fields. The path literal uses single quotes.

schema '/sample/products.csv' with
  Product : text
  Color : text
  Price : number

table Products = with
  [| as Product, as Color, as Price |]
  [| "shirt", "white,grey", 10.50 |]
  [| "pants", "blue", 15.00 |]
  [| "hat", "red", 5.25 |]

write Products as '/sample/products.csv'

The same schema can be used on the read side:

schema '/sample/products.csv' with
  Product : text
  Color : text
  Price : number

read '/sample/products.csv' as Products

show table "My Products" with
  Products.Product
  Products.Color
  Products.Price

Field documentation

Triple-slash comments are attached to fields and surfaced in the editor:

schema '/sample/products.csv' with
  /// The product identifier.
  Product : text
  /// The 3-letter color code.
  Color : text
  /// The VAT-included unit price.
  Price : number

Field renaming

Use read("Column Name") to bind a schema field to a raw column name:

schema '/sample/products.csv' with
  VAT : number = read("value added tax")

table Products = with
  [| as VAT |]
  [| 0.2 |]

write Products as '/sample/products.csv'
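
The same binding should apply when the file is read back. A minimal sketch, assuming the file written above exists and carries a column named "value added tax"; the tile title is illustrative:

```envision
schema '/sample/products.csv' with
  VAT : number = read("value added tax")

read '/sample/products.csv' as Products

// Products.VAT is populated from the "value added tax" column.
show table "VAT rates" with
  Products.VAT
```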

Field rebinding on write

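Rebinding overrides the default binding of a path schema field on write. Below, Size is absent from the table and is supplied as a constant, while Color is written in uppercase: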

schema '/sample/products.csv' with
  Product : text
  Color : text
  Size : text
  Price : number

table Products = with
  [| as Product, as Color, as Price |]
  [| "shirt", "white,grey", 10.50 |]
  [| "pants", "blue", 15.00 |]

write Products as '/sample/products.csv' with
  Size = "XL"
  Color = uppercase(Products.Color)

Path schemas do not support isolated rebinding on read; use a named schema for read-side rebinding.

Named schemas

Named schemas define field lists without a path. They can be embedded inside path schemas or used directly in read and write blocks.

schema Products with
  Product : text
  Color : text
  Price : number

schema '/sample/products.csv' with
  schema Products
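
Once the named schema is embedded, the path schema can presumably be used like any other path schema. A sketch, assuming the embedded fields behave exactly as if they had been declared inline; the tile title is illustrative:

```envision
schema Products with
  Product : text
  Color : text
  Price : number

schema '/sample/products.csv' with
  schema Products

// The fields come from the embedded named schema.
read '/sample/products.csv' as Products

show table "My Products" with
  Products.Product
  Products.Color
  Products.Price
```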

Named schemas can be used stand-alone:

schema Products with
  Product : text
  Color : text
  Price : number

read "/sample/products.csv" as Products with
  schema Products

Named schemas are incomplete by default: read/write blocks may add extra fields.

schema Products with
  Product : text
  Color : text

read "/sample/products.csv" as Products with
  schema Products
  Price : number

Composing schemas

Schemas can include other schemas. Duplicate field names are rejected.

schema JustProduct with
  Product : text

schema JustColor with
  Color : text

schema Products with
  schema JustProduct
  schema JustColor
  Price : number
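
For contrast, here is a sketch of a composition that is expected to be rejected, because two included schemas provide the same field. The schema ColorAndSize is hypothetical, and the exact error reported is not documented here:

```envision
schema JustColor with
  Color : text

schema ColorAndSize with
  Color : text
  Size : text

// Expected to be rejected: Color is provided by both included schemas.
schema Products with
  schema JustColor
  schema ColorAndSize
```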

Field rebinding on read

Named schemas can be rebound inside a read block:

schema PartialProducts with
  ProductId : text
  Color : text
  Size : text

read "/sample/products.csv" as Products with
  schema PartialProducts with
    ProductId = read("Product")
    Size = "extra large"
  Price : number

Field rebinding on write

Named schemas can likewise be rebound inside a write block:

schema PartialProducts with
  Product : text
  Color : text

table Products = with
  [| as Name, as Color, as Price |]
  [| "shirt", "white,grey", 10.50 |]

write Products as "/sample/products.csv" with
  schema PartialProducts with
    Product = Products.Name
  Price = Products.Price

Path literals and prefixing

Path literals are single-quoted. Prefixing uses \{..} at the start of the path, where the braces enclose a constant such as myFolder below.

schema '/sample/products.csv' with
  Product : text
  Color : text
  Price : number

const myFolder = '/sample'
read '\{myFolder}/products.csv' as Products

Path schema cloning

Cloning creates a new path schema that reuses a path schema definition:

schema '/production/products.csv' with
  Product : text
  Color : text
  Price : number

schema '/sample/products.csv' = '/production/products.csv'

Prefix cloning applies to all schemas under a folder:

schema '/production/products.csv' with
  Product : text
  Color : text
  Price : number

schema '/sample' = '/production'
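
A sketch of how the cloned prefix might then be used, assuming a cloned schema applies on read exactly like a directly declared one:

```envision
schema '/production/products.csv' with
  Product : text
  Color : text
  Price : number

schema '/sample' = '/production'

// '/sample/products.csv' inherits the fields of '/production/products.csv'.
read '/sample/products.csv' as Products
```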

Parameterized paths and partitions

Path parameters allow partitioned reads and writes. This mechanism is intended for internal partitioned flows, not for raw multi-file extractions from third-party systems.

schema '/sample/products-\{Bucket}.csv' with
  Bucket : number
  Product : text
  Color : text
  Price : number

table Products = with
  [| as Product, as Color, as Price, as Bucket |]
  [| "shirt", "white", 10.50, 1 |]
  [| "pants", "blue", 15.00, 2 |]

write Products partitioned as '/sample/products-\{..}.csv'

Reading uses the same parameterized path:

schema '/sample/products-\{Bucket}.csv' with
  Bucket : number
  Product : text
  Color : text
  Price : number

read '/sample/products-\{..}.csv' as Products

Bounds restrict the captured range:

schema '/sample/products-\{Bucket}.csv' with
  Bucket : number
  Product : text
  Color : text
  Price : number

const lowerIncl = 1
const higherIncl = 2
read '/sample/products-\{lowerIncl..higherIncl}.csv' as Products

Each path parameter must be named exactly once in the path and must appear as a schema field. Supported parameter types are text, number, date, week, and month. Multiple parameters are allowed when delimited by separators.
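
As a sketch of a path with two parameters, the hypothetical layout below separates Category and Bucket with a folder separator and a file-name prefix. Both the path and the use of one \{..} per parameter on read are assumptions, not taken from the examples above:

```envision
schema '/archives/\{Category}/stock-\{Bucket}.csv' with
  Category : text
  Bucket : number
  Id : text
  Quantity : number

// Each parameter is named once and is also declared as a field.
read '/archives/\{..}/stock-\{..}.csv' as Stock
```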

Parameterized reads can also bind constants or lists:

schema '/archives/\{Category}/Stock.ion' with
  Category : text
  Id : text
  Quantity : number

read '/archives/\{"Electronics", "Home"}/Stock.ion' as Stock
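
The example above binds a list. A sketch of the constant case, assuming a text constant is interpolated in the same way as the bounds and prefixes shown earlier; the name category is illustrative:

```envision
schema '/archives/\{Category}/Stock.ion' with
  Category : text
  Id : text
  Quantity : number

const category = "Electronics"

// Only the file under /archives/Electronics/ is read.
read '/archives/\{category}/Stock.ion' as Stock
```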

Partitioned writes use \{..} to inject the value of the schema parameter taken from each row. Use write ... partitioned as for parametric paths:

schema '/archives/\{Category}/Stock.ion' with
  Category : text
  Id : text
  Quantity : number

table Stock = with
  [| as Category, as Id, as Quantity |]
  [| "Electronics", "A", 1 |]

write Stock partitioned as '/archives/\{..}/Stock.ion'

For writes, parameter values may be runtime scalars. Use write ... as when the parameter is scalar; use partitioned as only when it varies per row.

schema '/Items_\{Max}.ion' with
  Cat : text
  Max : text

table Items = with
  [| as Cat |]
  [| "A" |]
  [| "B" |]
  [| "C" |]

Max = max(Items.Cat)
Items.Max = Max
write Items as '/Items_\{Max}.ion'

Path schema collisions are rejected at compile time. Path parameters act as wildcards within their segment, so /archives/\{A}/Stock.ion collides with /archives/foo/Stock.ion and /archives/foo\{C}/Stock.ion.
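
A sketch of the first collision described above. The field lists are illustrative, and the script is expected to fail at compile time:

```envision
schema '/archives/\{A}/Stock.ion' with
  A : text
  Id : text

// Expected compile-time rejection: the segment "foo" is matched by \{A}.
schema '/archives/foo/Stock.ion' with
  Id : text
```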

Partitioned writes overwrite all captured files and delete empty ones. Files outside bounded paths remain untouched.
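
A sketch of a bounded partitioned write, assuming bounds are accepted on writes with the same syntax as on reads (only bounded reads are shown above). Under that assumption, a file such as products-3.csv falls outside the bounds and would be left untouched:

```envision
schema '/sample/products-\{Bucket}.csv' with
  Bucket : number
  Product : text
  Color : text
  Price : number

table Products = with
  [| as Product, as Color, as Price, as Bucket |]
  [| "shirt", "white", 10.50, 1 |]
  [| "pants", "blue", 15.00, 2 |]

const lowerIncl = 1
const higherIncl = 2

// Only files for buckets 1 to 2 are captured, overwritten, or deleted if empty.
write Products partitioned as '/sample/products-\{lowerIncl..higherIncl}.csv'
```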

Size caps

Use max on a path schema to cap the number of lines:

schema '/sample/products.csv' max 10 with
  Product : text
  Color : text
  Price : number

You can also specify a warning threshold with max warn..error. Below, 2 is the warning threshold and 4 is the error threshold:

schema '/Items.ion' max 2..4 with
  Id : text

Enum downcast

Fields declared as text in a schema can be downcast to enums when reading:

schema '/sample/products.csv' with
  Product : text
  Color : text
  Price : number

read '/sample/products.csv' as Products with
  Color : table enum Colors

Aliasing on read

Aliases avoid name conflicts with schema fields:

schema '/sample/products.csv' with
  Color : text

read '/sample/products.csv' as Products with
  ColorAlias = Products.Color

This is also the way to expose a schema field that is hidden by a primary dimension of the same name (for example, dim2 = Tbl.dim).

Modules

Schemas can be exported from modules and reused:

// In "/sample/my-module"
export schema '/sample/products.csv' with
  Product : text
  Color : text
  Price : number

// In the consuming script
import "/sample/my-module" as M
read '/sample/products.csv' as Products
