Iterating with 'each'
An `each` loop is a statement used to iterate over the lines of a table, executing the same set of operations once per line. It is more complex but more flexible than a `for` loop.
Because the `each` loop is an advanced feature, this article assumes in-depth knowledge of `for` loops and describes `each` in terms of its differences from `for` loops.
In terms of syntax, the only difference lies in the loop header, where `for Expr in Expr` is replaced with `each Table`. For comparison, here is an example of the `each` loop syntax, followed by the equivalent `for` loop:
table T = extend.range(10)
A = 0
T.M = each T scan auto
  keep A
  A = A + T.N
  return A

T.M = for N in T.N scan auto
  keep A
  A = A + N
  return A
When both a `for` loop and an `each` loop would work, it is strongly recommended to use the `for` loop, as it is significantly simpler to understand. `each` loops should be used only to achieve things that are impossible with `for` loops (see "Reasons to use each" below).
Table diagram
Both the `each` and the `for` loops execute their body once for every line of data in the iteration table. At the start of each iteration, the `for` loop loads the input data into scalar variables as described by the `.. in ..` pairs in its header. The `each` loop instead loads the input data based on the table diagram. All the tables defined in the script are classified into one of the following categories:
- The iteration table appears after the `each` keyword. The body of the loop is executed once for every line in the iteration table.
- An upstream table is any table from which one can broadcast (directly or indirectly) into the iteration table.
- A downstream table is any table into which one can broadcast (directly or indirectly) from the iteration table. Downstream tables cannot be used in an `each` loop.
- An upstream-cross table is any cross table where either component is the iteration table or an upstream table. Upstream-cross tables can only be used in an `each` loop under the following conditions:
  - The right table must be a full table.
  - It must be small, or have the iteration table as its left table.
- Any other table is classified as a full table. Full tables can only be used in an `each` loop if they are small.

The position of a table `T` in the diagram determines what happens when reading a variable `T.X`, writing a variable `T.X`, or using `into T`:
Diagram | Read variable | Write variable | into |
---|---|---|---|
Iteration table (`Iteration`) | Returns the scalar value on the line corresponding to the current iteration. Example: `X = Iteration.X` | Writing to the iteration table is forbidden. | Broadcasting into the iteration table is forbidden. |
Upstream table (`Upstream`) | Identifies the line that would be broadcast into the line of the iteration table that corresponds to the current iteration, and returns the scalar value on that line. Example: `X = Upstream.X` | Allowed only for `keep` variables. Assigns a scalar to the line corresponding to the current iteration, leaving the rest unchanged. Example: `Upstream.X = X` | Broadcasting into an upstream table is forbidden. |
Upstream-cross table (`UpstreamFull`) | Identifies the line of the left table that corresponds to the current iteration, and returns the vector in the `Full` table that corresponds to that line. Example: `Full.X = UpstreamFull.X` | Writing to an upstream-cross table is forbidden. | Broadcasting into an upstream-cross table is forbidden. |
Full table | Normal behavior. | Normal behavior. | Normal behavior. |
Exercise
What is the classification of each of the tables in the three `each` blocks below?
read ".." as Category[category]
read ".." as Product[sku] expect [category]
read ".." as Channel[channel]
read ".." as Orders expect [channel, sku, date]
table CategoryWeek = cross(Category, Week)
table ProductWeek = cross(Product, Week)
table ChannelWeek = cross(Channel, Week)

Product.X = each Product
  // Here?

Week.X = each Week
  // Here?

Category.X = each Category
  // Here?
Answers
Table | `each Product` | `each Week` | `each Category` |
---|---|---|---|
Product | Iteration | Full | Downstream |
Week | Full | Iteration | Full |
Category | Upstream | Full | Iteration |
Channel | Full | Full | Full |
Orders | Downstream | Downstream | Downstream |
CategoryWeek | Upstream-Cross | Unavailable | Upstream-Cross |
ProductWeek | Upstream-Cross | Unavailable | Unavailable |
ChannelWeek | Unavailable | Unavailable | Unavailable |
each .. scan blocks
Like `for .. scan`, `each .. scan` blocks allow you to keep values from one iteration to the next. However, because each iteration may depend on the previous one, the Envision runtime can’t parallelize the iterations of an `each .. scan` block. Thus, this variant should be used only when values need to be kept from one iteration to the next. A simple example is given below:
table Obs = with
  [| as Date, as Quantity |]
  [| date(2021, 1, 1), 13 |]
  [| date(2021, 2, 1), 11 |]
  [| date(2021, 3, 1), 17 |]
  [| date(2021, 4, 1), 18 |]
  [| date(2021, 5, 1), 16 |]

Best = 0
Obs.BestSoFar = each Obs scan Obs.Date
  keep Best
  NewBest = max(Best, Obs.Quantity)
  Best = NewBest
  return NewBest

show table "" a1b4 with Obs.Date, Obs.BestSoFar
In the above script, `scan Obs.Date` specifies the order in which the lines of the iteration table are to be traversed. The statement `keep Best` specifies that the variable `Best` must retain its value from one iteration to the next. Finally, `Best = NewBest` assigns a new value to the variable; this is the value that will be available on the next iteration.
Lines of the iteration table are processed in ascending order by default. However, the option `desc` can be used to specify descending order, as illustrated by:
table Obs = with
  [| as Date, as Quantity |]
  [| date(2021, 1, 1), 13 |]
  [| date(2021, 2, 1), 11 |]
  [| date(2021, 3, 1), 17 |]
  [| date(2021, 4, 1), 18 |]
  [| date(2021, 5, 1), 16 |]

Best = 0
Obs.BestSoFar = each Obs scan Obs.Date desc
  keep Best
  NewBest = max(Best, Obs.Quantity)
  Best = NewBest
  return NewBest

show table "" a1b4 with Obs.Date, Obs.BestSoFar
The `each .. scan` block comes with a short series of syntactic constraints relative to the `keep` statements. The block requires at least one `keep` statement. All the `keep` statements must be made at the very beginning of the `each .. scan` block. The `keep` statements must refer to variables that have already been defined prior to the `each .. scan` block. A variable marked with `keep` is modified by the execution of the `each .. scan` block. Its last value remains available after exiting the `each .. scan` block.
`keep` vectors must belong to small tables so that they can be kept in memory, and must be scalars, full-table vectors, or upstream-table vectors.
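
The constraints on `keep` can be illustrated with a minimal sketch that reuses a shortened version of the `Obs` table from the examples above; the tile title is an arbitrary choice made for this illustration. `Best` is defined before the block, the `keep` statement comes first inside the block, and `Best` is read again once the block has completed:

```envision
table Obs = with
  [| as Date, as Quantity |]
  [| date(2021, 1, 1), 13 |]
  [| date(2021, 2, 1), 11 |]
  [| date(2021, 3, 1), 17 |]

Best = 0 // must be defined before the block

Obs.BestSoFar = each Obs scan Obs.Date
  keep Best // keep statements come first in the block
  NewBest = max(Best, Obs.Quantity)
  Best = NewBest
  return NewBest

// Best retains its last value after the block: the largest quantity, 17
show scalar "Best after the loop" with Best
```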
As a rule of thumb, user-defined processes should be preferred to `each .. scan` blocks whenever possible. The `each .. scan` block should be used when the logic grows too complex, or involves keeping non-scalar variables.
Return-less blocks
It may happen that an `each .. scan` block is introduced for the sole purpose of getting the last value held by a `keep` variable. In that case, the `return` statement may be omitted altogether, as illustrated by the following script:
table Currencies = with
  [| as Code |]
  [| "EUR" |]
  [| "JPY" |]
  [| "USD" |]

Sep = ""
List = ""
each Currencies scan Currencies.Code
  keep Sep
  keep List
  List = "\{List}\{Sep}\{Currencies.Code}"
  Sep = ", "

show scalar "" with List
In the above script, the variable `List` is built through iterative concatenations. However, as only the final form is of interest, a return-less `each .. scan` block is used.
In practice, however, the above script could be rewritten in a simpler way by leveraging the built-in `join` aggregator, as illustrated by:
table Currencies = with
  [| as Code |]
  [| "EUR" |]
  [| "JPY" |]
  [| "USD" |]

show scalar "" with join(Currencies.Code; ", ") sort Currencies.Code
auto ordering in scan
With the keyword `auto`, the scan follows the ordering of the primary dimension of the table being enumerated:
table T = extend.range(6)
x = 0
T.X = each T scan auto
  keep x
  x = T.N - x
  return x

show table "T" a1b5 with T.N, T.X
The above script is logically identical to the one below:
table T[t] = extend.range(6)
x = 0
T.X = each T scan t
  keep x
  x = T.N - x
  return x

show table "T" a1b5 with T.N, T.X
Any-order blocks
While persisting variables from one line to the next might be needed, the specific ordering might not matter. Envision provides a syntax to deal with those situations as illustrated by:
table Obs = with
  [| as X |]
  [| 42 |]
  [| 41 |]
  [| 45 |]

myMin = 1B
myMax = -(1B)
each Obs scan Obs.*
  keep myMin
  keep myMax
  myMin = min(myMin, Obs.X)
  myMax = max(myMax, Obs.X)

show summary "" a1b2 with myMin, myMax
In the above script, `scan Obs.*` indicates that the lines are traversed in an arbitrary order.
As a rule of thumb, this feature should be considered fringe and used sparingly. Indeed, the Envision compiler does not verify that the result is independent of the ordering. Hence, if the ordering does turn out to matter, the ambiguity might be resolved in unpredictable ways by the Envision runtime.
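
As a minimal sketch of the distinction, assuming the `.*` form behaves on an `extend.range` table the same way it does on `Obs` above: the accumulation below does not depend on the traversal order, because adding these small integers gives the same total in any order. By contrast, a body such as `x = T.N - x` (used in the `auto` ordering section) depends on the traversal order and should keep an explicit ordering. The variable name and tile title are illustrative choices.

```envision
table T = extend.range(6)

mySum = 0
each T scan T.*
  keep mySum
  mySum = mySum + T.N // same total in any order: 1 + 2 + ... + 6 = 21

show scalar "sum" with mySum
```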
each .. when blocks
Like `for .. when` blocks, `each` loops can be filtered. The `each .. when` block only executes its body on lines where the condition specified by `when` is `true`.
table T = extend.range(5)
s = 0
each T scan auto when T.N mod 2 == 1
  keep s
  s = s + T.N

show scalar "odd sum" with s // 9
In the above script, the filter `when T.N mod 2 == 1` is applied to every line of the table `T`. It filters out every line where `T.N` is even.
The `each .. when` block cannot return a vector via the keyword `return`, as the filtered-out lines would have no value. Instead, variables marked as `keep` must be used to extract information from the iteration.
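
As a minimal sketch of extracting several results from a filtered loop, the script below mirrors the example above but keeps the even lines and uses two `keep` variables; the variable names and the tile title are illustrative choices.

```envision
table T = extend.range(5)

evenCount = 0
evenSum = 0
each T scan auto when T.N mod 2 == 0
  keep evenCount
  keep evenSum
  evenCount = evenCount + 1 // body runs only for T.N = 2 and T.N = 4
  evenSum = evenSum + T.N

show summary "even lines" a1b2 with evenCount, evenSum // 2 and 6
```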
Reasons to use each
The following features are available in `each` loops but not in `for` loops, and are thus proper reasons to use `each`:
- Having a `keep` in an upstream table.
- Reading values from upstream-cross tables.

If an `each` loop uses neither of the above, consider changing it to a `for` loop.