File I/O and schemas
Envision is file-centric by design. That can feel unusual if you are used to a database or a live API. This page explains the rationale: file I/O is not an implementation detail but a modeling choice that supports scale, reproducibility, and operational safety.
Files are contracts, not just storage
A file in the Lokad platform is a contract between systems. It is a promise about structure and meaning, not just a container for rows. This contract is why Envision asks you to specify schemas explicitly and why it treats the filesystem as immutable during execution.
Immutability is a defensive design. When you run a script, you want to be sure that its input data does not change halfway through. Envision guarantees exactly that: all reads come from a fixed snapshot, and all writes appear atomically at the end. The result is that you can reason about data quality and run consistency without worrying about hidden race conditions.
Reads are declarations
A read block is not an instruction to fetch data right now; it is a declaration of how the script
expects data to look. This is why read blocks must appear at the top of a script. By enforcing that
ordering, the compiler can validate schemas before any other logic runs.
// Write an output file from a table comprehension.
table Products = with
  [| as Sku, as Color, as Price |]
  [| "A-01", "white", 10.5 |]
  [| "B-02", "blue", 15.0 |]
  [| "C-03", "red", 5.25 |]

write Products as "/demo/products.csv" with
  Sku = Products.Sku
  Color = Products.Color
  Price = Products.Price
This script is a pure declaration: it says which table to export, which columns to include, and how those columns are computed. The filesystem update happens after the script succeeds, which is why a single script cannot read the file it just wrote.
Reads and relationships travel together
In Envision, a read block can also declare dimensions. That matters because it aligns tables at the
moment they are ingested. When you declare dimensions early, you minimize the risk of accidental joins
later.
read "/demo/orders.csv" as Orders expect [date] with
"Sku" as Sku : text
"OrderDate" as date : date
Qty : number
table Items[sku] = by Orders.Sku
expect Orders.sku = Orders.Sku
Day.Qty = sum(Orders.Qty)
Items.TotalQty = sum(Orders.Qty)
show linechart "Daily orders" with
Day.Qty
show table "Items" with
Items.sku
Items.TotalQty
This script shows two ideas at once. First, the expect [date] declaration binds Orders to the special calendar tables, so Day exists as a direct product of the read statement rather than as a data structure you must build manually. Second, the by statement both creates the sku dimension and attaches it to Orders, so Items.TotalQty can aggregate without an explicit join. The result is a predictable pipeline from files to calendar analytics.
Schemas are reusable contracts
Named schemas exist to remove duplication and to stabilize file interfaces. They are useful when a pipeline spans multiple scripts or when several teams rely on the same file structure.
schema ProductSchema with
  Sku : text
  Price : number

read "/demo/items.csv" as Items with
  schema ProductSchema
  Category : text

write Items as "/demo/items-out.csv" with
  schema ProductSchema
  Category = Items.Category
Using a named schema communicates intent: the script cares about the file as a contract, not just as a source of columns. This is especially valuable when files are shared across systems or when external partners provide the data.
Types are explicit to avoid ambiguity
Envision does not infer data types. That can feel verbose, but it removes a class of silent bugs. In
most data pipelines, type ambiguity shows up late: a numeric field suddenly receives a blank, or a date
is formatted in a new locale. By forcing type declarations in read blocks, Envision makes those
issues visible at compile time instead of at runtime.
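As an illustration, consider a read block that spells out every type it depends on. The path and column names below are hypothetical; the point is that a mismatch, such as a text value arriving in a numeric column, is reported when the file is parsed rather than deep inside downstream logic.
// Illustrative read block: every column carries an explicit type.
// A text value arriving in 'OnHand' fails at parse time, loudly.
read "/demo/stock.csv" as Stock with
  Sku : text
  OnHand : number
  LastCount : date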
Explicit types also matter for performance. They allow the runtime to choose efficient storage and computations without guessing. This is especially important for large files where small inefficiencies scale into large costs.
File formats are part of the modeling choice
The platform supports CSV and TSV variants for interoperability, but it also provides the Ionic format for performance and for richer data types such as probability distributions. This is another example of file I/O as a modeling decision. If you use Ionic, you are implicitly stating that the file is produced and consumed inside the Lokad platform, where performance and richer types matter more than broad compatibility.
If you choose CSV, you are optimizing for integration with other systems. Both are valid, but they carry different expectations about size, speed, and downstream tooling.
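As a sketch, assuming a hypothetical /demo path and that the .ion extension selects the Ionic format: richer types such as ranvars (probability distributions) can be persisted to an Ionic file, which a plain CSV export could not represent faithfully.
table Skus = with
  [| as Sku |]
  [| "A-01" |]
  [| "B-02" |]

// A ranvar column: Ionic can store the full distribution, CSV cannot.
Skus.Demand = poisson(3)

write Skus as "/demo/demand.ion" with
  Sku = Skus.Sku
  Demand = Skus.Demand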
Path schemas expose location and structure together
A path schema binds a schema to a path. It is useful when a file is known and stable enough to be identified by its location. That can happen with reference data such as currency tables or with versioned exports from an ERP. Path schemas also make it obvious which files are inputs versus outputs.
The trade-off is flexibility. A path schema is more rigid than a named schema, so it should be reserved for files that are expected to remain stable and where the clarity is worth the constraint.
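A minimal sketch of the idea, using a hypothetical currency file (the exact path-schema syntax shown here is an assumption; see the schema reference pages): the schema is declared against the path itself, and a later read of that path does not repeat the column list.
// Hypothetical path schema: structure and location declared together.
schema "/demo/currencies.csv" with
  Currency : text
  Rate : number

// The read inherits the columns declared above for this exact path.
read "/demo/currencies.csv" as Currencies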
Patterns and partitions make scale explicit
Supply chain data often arrives as a set of files rather than a single file. Envision supports wildcard patterns and partitioned writes to make that explicit. This is not just convenience; it is also a way to signal that data volume is large enough to deserve special attention.
When you read a wildcard pattern, you are declaring that all matching files are semantically equivalent and should be treated as a single dataset. When you write partitioned files, you are declaring that consumers should not expect a single monolithic export. These are modeling choices, and they shape how future scripts will reason about data size and freshness.
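For instance, a wildcard read (hypothetical paths below) gathers every matching file into a single table; the declaration only makes sense if all those files share the same structure.
// Every file matching the pattern is ingested as one Orders dataset.
read "/demo/orders/*.csv" as Orders with
  Sku : text
  Qty : number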
Paths are part of the interface
File paths in the Lokad filesystem are case-sensitive and follow Unix-style conventions. That choice is deliberate: it keeps path handling consistent across systems and discourages hidden assumptions about case. Treat paths as part of the interface. When a path changes, the contract changes. This is another reason to keep path schemas or named schemas in a small number of well-known files, rather than copying path strings across many scripts.
You can think of path naming as a form of documentation. A clear path hierarchy, such as
/inputs/erp/ or /outputs/forecast/, makes it easier for a reader to understand what is internal
versus external, and what is expected to be stable.
Atomic writes are operational guarantees
An Envision script either writes all outputs or writes none. This is not just a filesystem detail; it is an operational contract that prevents half-updated pipelines. If you are exporting replenishment orders and a summary report, you do not want a downstream system to ingest one without the other. Atomic writes make the pipeline reliable without adding custom orchestration logic in your scripts.
This behavior also explains why a script cannot read its own outputs. The data is not visible until the script completes. That constraint feels restrictive, but it prevents subtle errors where a pipeline accidentally mixes old and new data in the same run.
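As an illustration with hypothetical paths, a script that emits two files commits them together; a downstream consumer never observes one without the other, and neither file is visible until the run succeeds.
table Orders = with
  [| as Sku, as Qty |]
  [| "A-01", 5 |]
  [| "B-02", 3 |]

// Both outputs appear atomically once the script completes;
// a failure anywhere means neither file is written.
write Orders as "/outputs/replenishment.csv" with
  Sku = Orders.Sku
  Qty = Orders.Qty

write Orders as "/outputs/audit-copy.csv" with
  Sku = Orders.Sku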
User inputs and forms are files too
User inputs and forms might look like a separate feature, but they are still files at heart. A form is just a structured file that can be edited through the UI. Treating it as a file reinforces the idea that Envision is built around explicit data contracts rather than implicit UI state.
If you want to understand those mechanics in detail, see the reference pages on the form tile and the upload tile. The key point for this page is that the same schema rules apply: a form is not free-form text; it is data with declared types.
Evolving schemas without breaking pipelines
File schemas evolve. New columns appear, old columns vanish, and formats change. Envision handles this by making schemas explicit and allowing you to ignore fields you do not need. If a supplier adds a new column, your script can keep working as long as the columns you rely on remain stable. If a column is removed or changes type, the script fails loudly rather than silently misreading data.
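Concretely, a read block names only the columns it consumes (hypothetical example below); a new column added by the supplier is simply ignored, while a removed or retyped column fails the run at once.
// Only Sku and LeadTime are loaded; extra columns in the file are
// ignored, so additive schema changes do not break this script.
read "/demo/suppliers.csv" as Suppliers with
  Sku : text
  LeadTime : number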
This is another reason why the file-centric model is stable for long-running supply chain initiatives. Failures are explicit and early, which is preferable to subtle drift that only shows up in the output.
Why the file-centric design matters
The file-centric approach is sometimes criticized as old-fashioned. In practice, it aligns with how large enterprises share data: through exports, batch pipelines, and scheduled transfers. It also provides low operating cost and stable performance, which matters more than elegance when you need to run a daily decision pipeline.
By treating files as immutable inputs and schemas as contracts, Envision keeps data alignment explicit. The language trades a bit of convenience for transparency, which is the right trade when mistakes can propagate into purchase orders, production schedules, or pricing decisions.