# Loops and iterations

While the design of Envision intentionally omits *arbitrary* loops, it does feature multiple mechanisms to iterate. The intent behind this design is to avoid classes of problems associated with arbitrary loops, such as indeterminate termination and/or indeterminate memory consumption.

**Loop blocks** are a feature for repeating arbitrary sets of operations up to 10 times.

**Iteration blocks** are a feature for repeating *complex* operations over all lines of a table. These are divided into three categories:

`each`

blocks apply the operation independently to every line of the table, thus ensuring good performance through parallelization.`each .. scan`

blocks apply the operation one line at a time, in the requested order, and can remember data from one line to the next.`autodiff`

blocks describe a function to minimize for every line, then perform*gradient descent*to minimize that value. This type of block is detailed in a later section.

Unlike most programming languages, in Envision, explicit loops and iterations are considered as moderately advanced language features. As a rule of thumb, these features should be avoided whenever the same effect can be obtained by leveraging the relational algebra.

**Table of contents**

`loop`

blocks

A `loop`

block repeats a block of Envision code. The number of iterations is passed as an integer literal after the `loop`

keyword. The value of the integer must be between 2 and 10 (inclusive). The following script illustrates the `loop`

block:

```
a = 1
loop 3
b = a + 1
a = 2 * b
show summary "" with a, b // 22, 11
```

Unlike most blocks in Envision, a `loop`

block does not involve any scoping. Hence, in the script above, the variable `b`

remains accessible after exiting the `loop`

block.

This script could have been equivalently written:

```
a = 1
b = a + 1 // iteration 1
a = 2 * b
b = a + 1 // iteration 2
a = 2 * b
b = a + 1 // iteration 3
a = 2 * b
show summary "" with a, b // 22, 11
```

Loop blocks cannot include any tile declaration (i.e. `show`

blocks), but they can include table declarations under certain conditions, as illustrated by the following example:

```
a = 10
b = 0
table T = extend.range(10)
loop 10
a = a - 1
table T = where T.N < a
b = b + sum(T.N)
show summary "" with a, b // 0, 120
```

In the above script, the repeated declarations of the table `T`

are valid because they are filters iteratively applied over the same table.

**Roadmap**: the constraints on the `loop`

argument will be relaxed to allow a compile-time constant to be used instead of a literal.

`each`

blocks

The purpose of the `each`

block is to extend what can be done via the relational algebra beyond stand-alone expressions. As the name suggests, `each`

blocks offer the possibility to repeat a *block* of independent operations for each line of a table.

An `each`

block operates over a table referred to as the **observation table** - named immediately after the `each`

keyword - and produces one or more vectors in that table. The following script illustrates the `each`

block:

```
table Obs = with
[| as N |]
[| 1 |]
[| 2 |]
[| 3 |]
Obs.Cpy = each Obs
cpy = Obs.N + 1
return cpy
show table "" a1b3 with Obs.N, Obs.Cpy
```

However, the script above could be rewritten in a simpler way through the relational algebra with a stand-alone expression:

```
table Obs = with
[| as N |]
[| 1 |]
[| 2 |]
[| 3 |]
Obs.Cpy = Obs.N + 1
show table "" a1b3 with Obs.N, Obs.Cpy
```

As there are no dependencies between operations performed for every line of the observation table, the execution of the `each`

block is automatically parallelized by the Envision runtime.

Inside the block, vectors from the observation table can be assigned to scalar variables, while vectors in tables that are unrelated to the observation table can be manipulated as full vectors:

```
table Products = with
[| "Hat" as Label, 37 as PurchaseQty, 15.00 as UnitPrice |]
[| "Shirt" , 112 , 55.00 |]
table Discounts = with
[| 1 as qty, 0 as Discount |]
[| 10 , 0.1 |]
[| 100 , 0.2 |]
Products.Amount = each Products
Quantity = Products.PurchaseQty
Discount = last(Discounts.Discount) if(Discounts.Qty <= Quantity) sort Discounts.Qty
return Products.UnitPrice * (1 - Discount)
show table "" a1c2 with
Products.Label
Products.PurchaseQty
Products.UnitPrice
Products.Amount
```

In the above example, for each product, the entire table `Discounts`

is traversed to find the discount associated with the quantity being purchased. Without `each`

, the above logic could be implemented with `cross`

:

```
table Products = with
[| "Hat" as Label, 37 as PurchaseQty, 15.00 as UnitPrice |]
[| "Shirt" , 112 , 55.00 |]
table Discounts = with
[| 1 as qty, 0 as Discount |]
[| 10 , 0.1 |]
[| 100 , 0.2 |]
Products.Amount = with
Products.Discount = last(Discounts.Discount)
if (Discounts.Qty < Products.PurchaseQty)
cross (Products, Discounts)
sort Discounts.Qty
return Products.UnitPrice * (1 - Products.Discount)
show table "" a1c2 with
Products.Label
Products.PurchaseQty
Products.UnitPrice
Products.Amount
```

However the `each`

block is more readable and more performant than a `cross`

aggregation, especially once the logic grows so complex that several `cross`

aggregations are required to represent it. It is therefore recommended to use `each`

instead of `cross`

(this does not apply to `cross .. by .. at`

).

If the observation table happens to be the left component of a cross-table, then vectors from that cross-table can also be loaded into the `each`

block:

```
table Products = with
[| "Hat" as Label, 15.00 as UnitPrice |]
[| "Shirt" , 55.00 |]
table Colors = with
[| "blue" as Color, 1.1 as PriceFactor |]
[| "red" , 1.2 |]
[| "white" , 0.9 |]
// 'Products' is the left component of 'Variants'
table Variants = cross(Products, Colors)
Variants.Price = Products.UnitPrice * Colors.PriceFactor
// Must be computed outside of the 'each'
Variants.MedProductPrice = median(Variants.Price) by Products.Label
Products.MedOfMed = each Products
// Right cross table can import cross table value
// when each is performed on the left cross table
Colors.MedProductPrice = Variants.MedProductPrice
return median(Colors.MedProductPrice)
show table "" a1c2 with
Products.Label
Products.MedOfMed
```

### Table diagram and limitations

The behaviour of tables in an `each`

block depends on that table’s relationship with the observation table. The tables are classified as follows:

The

**observation table**appears after the`each`

keyword. The body of the block is executed once for each line in the observation table.An

**upstream table**is any table from which one can broadcast (directly or indirectly) into the observation table. Like the observation table, vectors from upstream tables can be assigned to scalars.A

**downstream table**is any table into which one can broadcast (directly or indirectly) from the observation table.A

**full table**is any non-cross table that is neither upstream nor downstream from the observation table. Vectors from full tables can be manipulated as full vectors.An

**upstream-cross table**is any cross table where the left component is either the observation table or an upstream table, and the right component is a full table.

Tables which do not fit any of the above criteria cannot be used in the `each`

block.

Also while upstream and downstream tables allow indirect broadcasting, the chain of broadcast must be *unique* for the table to be available, otherwise the ambiguity is reported as a compilation error.

In order to reliably achieve good performance even for complex operations, an `each`

block has several strict limitations:

Full tables and upstream-cross tables must be

*small*. This allows Envision to keep the complete vectors in-memory.Vectors cannot be broadcast or aggregated from one full table into another. Broadcasting from a scalar to a full table is permitted, and so is aggregating from a full table to a scalar.

Expression Allowed? `Day.X = X`

Yes `Day.X = Week.X`

*No*`X = sum(Day.X)`

Yes `Week.X = sum(Day.X)`

*No*`Week.X = sum(Week.X) by Week.C`

*No*By extension,

`by`

,`cross`

or`over`

aggregation, or lookups cannot be used.Cannot

`sort`

by a value that was computed during the`each`

. However, values from outside the`each`

can be used to sort.Only maps and aggregations can be used. Functions which operate on full vectors (such as

`argfirst`

) are not allowed. In particular,`scan`

is not supported.

**Roadmap**: Downstream tables cannot currently be used in `each`

block, however we intend to make this feature available in the future. The filters `when`

and `where`

are not supported either but are likely to be supported in the future.

### Exercise

What is the classification of each of those tables in the three `each`

blocks below ?

```
read ".." as Category[category]
read ".." as Product[sku] expect [category]
read ".." as Channel[channel]
read ".." as Orders expect [channel, sku, date]
table CategoryWeek = cross(Category, Week)
table ProductWeek = cross(Product, Week)
table ChannelWeek = cross(Channel, Week)
Product.X = each Product
// Here ?
Week.X = each Week
// Here ?
Category.X = each Category
// Here ?
```

#### Answers

Table | `each Product` | `each Week` | `each Category` |
---|---|---|---|

`Product` | Observation | Full | Downstream |

`Week` | Full | Observation | Full |

`Category` | Upstream | Full | Observation |

`Channel` | Full | Full | Full |

`Orders` | Downstream | Downstream | Downstream |

`CategoryWeek` | Upstream-Cross | Unavailable | Upstream-Cross |

`ProductWeek` | Upstream-Cross | Unavailable | Unavailable |

`ChannelWeek` | Unavailable | Unavailable | Unavailable |

`each .. scan`

blocks

The `each .. scan`

blocks allow you to keep values from one iteration to the next. However, this extra capability implies that the Envision runtime can’t parallelize the iterations of an `each .. scan`

block. Thus, this variant should only be favored when values need to be kept from one iteration to the next. A simple example is given below:

```
table Obs = with
[| date(2021, 1, 1) as Date, 13 as Quantity |]
[| date(2021, 2, 1) , 11 |]
[| date(2021, 3, 1) , 17 |]
[| date(2021, 4, 1) , 18 |]
[| date(2021, 5, 1) , 16 |]
Best = 0
Obs.BestSoFar = each Obs scan Obs.Date
keep Best
NewBest = max(Best, Obs.Quantity)
Best = NewBest
return NewBest
show table "" a1b4 with Obs.Date, Obs.BestSoFar
```

In the above script, `scan Obs.Date`

specifies the order in which the lines of the observation table are to be traversed. The statement `keep Best`

specifies that the variable `Best`

must retain its value from one observation line to the next. Finally, `Best = NewBest`

assigns a new value to the variable ; it will be the one available on the next observation line.

Lines of the observation table are processed in the ascender order. However, the option `desc`

can be used to specify the descending order, as illustrated by:

```
table Obs = with
[| date(2021, 1, 1) as Date, 13 as Quantity |]
[| date(2021, 2, 1) , 11 |]
[| date(2021, 3, 1) , 17 |]
[| date(2021, 4, 1) , 18 |]
[| date(2021, 5, 1) , 16 |]
Best = 0
Obs.BestSoFar = each Obs scan Obs.Date desc
keep Best
NewBest = max(Best, Obs.Quantity)
Best = NewBest
return NewBest
show table "" a1b4 with Obs.Date, Obs.BestSoFar
```

The `each .. scan`

block comes with a short series of syntactic constraints relative to the `keep`

statements. The block requires at least one `keep`

statement. All the `keep`

statements must be made at the very beginning of the `each .. scan`

block. The `keep`

statements must refer to variables that have already been defined, prior to the `each .. scan`

block. A variable marked with `keep`

is modified by the execution of the `each .. scan`

block. Its last value remains available after exiting the `each .. scan`

block.

`keep`

vectors must be from small tables in order to be kept in-memory, and must be scalars, full-table or upstream-table vectors.

As a rule of thumb, user-defined processes should be preferred to `each .. scan`

blocks whenever possible. The `each .. scan`

block should be used when the logic grows too complex, or involves keeping non-scalar variables.

### Return-less blocks

It may happens that an `each .. block`

is introduced for the sole purpose of getting the last value held by a `keep`

variable. Thus, the `return`

statement may be omitted altogether as illustrated by the following script:

```
table Currencies = with
[| "EUR" as Code |]
[| "JPY" |]
[| "USD" |]
Sep = ""
List = ""
each Currencies scan Currencies.Code
keep Sep
keep List
List = "\{List}\{Sep}\{Currencies.Code}"
Sep = ", "
show scalar "" with List
```

In the above script, the variable `List`

is built through iterative concatenations. However, as only the final form is of interest, a return-less `each .. block`

is used.

In practice, however, the above script could be rewritten in simpler way leveraging the built-in `join`

aggregator as illustrated by:

```
table Currencies = with
[| "EUR" as Code |]
[| "JPY" |]
[| "USD" |]
show scalar "" with join(Currencies.Code; ", ") sort Currencies.Code
```

`auto`

ordering in `scan`

The ordering of the `scan`

follows the primary dimension of the table being enumerated through the use of the keyword `auto`

:

```
table T = extend.range(6)
x = 0
T.X = each T scan auto
keep x
x = T.N - x
return x
show table "T" a1b5 with T.N, T.X
```

The above script is logically identical to the one below:

```
table T[t] = extend.range(6)
x = 0
T.X = each T scan t
keep x
x = T.N - x
return x
show table "T" a1b5 with T.N, T.X
```

### Any-order blocks

While persisting variables from one line to the next might be needed, the specific ordering might not matter. Envision provides a syntax to deal with those situations as illustrated by:

```
table Obs = with
[| 42 as X |]
[| 41 |]
[| 45 |]
myMin = 1B
myMax = -(1B)
each Obs scan Obs.*
keep myMin
keep myMax
myMin = min(myMin, Obs.X)
myMax = max(myMax, Obs.X)
show summary "" a1b2 with myMin, myMax
```

In the above script, the `scan Values.*`

indicates that an arbitrary order is taken.

As a rule of thumb, this feature should be considered as fringe and sparingly used. Indeed, the Envision compiler does not rely on any proof that ordering does not matter. Hence, if accidentally ordering does matter, the ambiguity might be resolved in non-predictable ways by the Envision runtime.

`each .. when`

blocks

Iterations can be filtered. The `each .. when`

block only executes its body on lines where the condition specified by `when`

is `true`

.

```
table T = extend.range(5)
s = 0
each T scan auto when T.N mod 2 == 1
keep s
s = s + T.N
show scalar "odd sum" with s // 9
```

In the above script, the filter `when T.N mod 2 == 1`

is applied to every line of the table `T`

. It filters out every line where `T.N`

is even.

The `each .. when`

block cannot return a vector, via the keyword `return`

as lines would be missing. Instead, variables marked as `keep`

must be used to extract information from the iteration.

**Roadmap:** the `when`

condition cannot yet reference a variable marked as `keep`

. We plan to lift this limitation in the future. This will allow dynamic filtering with regards to the iteration process itself.

## Illustration: k-means clustering

The k-means method is a clustering algorithm that partitions the observations into $k$ clusters. It can be implemented in Envision using the `loop`

block. The following script illustrates a simple k-means problem with points distributed over the unit circle.

```
numberOfClusters = 10
numberOfPoints = 1000
table C = extend.range(numberOfCluster)
table C[c] = by C.N
table enum D[d] = "x", "y"
table T[t] = extend.range(numberOfPoints)
table TD = cross(T, D)
table CD = cross(C, D)
table TC = cross(T, C)
// Randomly generated data
TD.XY = random.uniform(-0.5 into TD, 0.5)
TD.XY = TD.XY / sqrt(sum(TD.XY^2) into T) // unit circle
// Cluster initialization
CD.XY = random.uniform(-0.5 into CD, 0.5)
CD.XY = CD.XY / sqrt(sum(CD.XY^2) into C) // unit circle
loop 10
// Compute the distance between each observation and each cluster.
TC.Distance = each T
C.Distance = sum((CD.XY - TD.XY)^2)
return C.Distance
// Find the closest cluster for each observation.
T.ClosestC = argmin(TC.Distance, TC.c)
TD.ClosestC = T.ClosestC
// Update the cluster values to be equal to the average value of its associated observations.
CD.XY = avg(TD.XY) by [TD.ClosestC, TD.d] at [CD.c, CD.d]
// Display points and centroids
T.X = TD.XY[d:"x"]
T.Y = TD.XY[d:"y"]
C.X = CD.XY[d:"x"]
C.Y = CD.XY[d:"y"]
table U = with
[| T.X as X, T.Y as Y, true as IsPoint|]
[| C.X, C.Y, false |]
U.Color = if U.IsPoint then "blue" else "red"
show scatter "Points (blue) and Cendroids (red)" a1d8 with
U.X
U.Y { color: #[U.Color] }
```