process
The contextual keyword process specifies the type of a user-defined function.
‘def process funname (params) with’, process definition
The modifier process indicates that the function sequentially processes its vector arguments, all while maintaining internal state.
Call options such as by, at, sort, and scan are only available on process calls; pure functions do not accept call options. The by option partitions the work by group, while sort and scan enforce a sequential evaluation order.
```envision
table T = with
  [| as N, as X |]
  [| 0, 1 |]
  [| 1, 2 |]
  [| 2, -1 |]

def process sumOfSquares(x : number) with
  keep sum = 0
  sum = sum + x * x
  return sum

T.CumulativeSum = sumOfSquares(T.X) scan T.N
total = sumOfSquares(T.X) sort T.N

show table "" with T.N, T.X, T.CumulativeSum
// 6
show scalar "" with total
```
The first show statement results in the following table:
| N | X | CumulativeSum |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 5 |
| 2 | -1 | 6 |
In our example, sumOfSquares is invoked twice: once with scan and once with sort. The scan keyword feeds the T.X values to sumOfSquares in the order of T.N, namely: first 1, then 2, and finally -1. As a result, we obtain a vector containing every step of the computation:
```
sum = 0 + 1 * 1 = 1
sum = 1 + 2 * 2 = 5
sum = 5 + (-1) * (-1) = 6
```
At each step, the internal state sum (introduced by the keep keyword) has a definite value: it is 0 at the beginning, 1 after the first step, 5 after the second step, and 6 after the third step.
The sort keyword traverses T.X in the same order as scan, but instead of returning a vector of intermediate results, it returns only the final value of sum, which is 6. Therefore, total is not a vector but a scalar, which we print with show scalar.
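To make the difference between scan and sort concrete, here is a small Python model of the semantics described above. It is not Envision: `run_scan`, `run_sort`, and `sum_of_squares_step` are hypothetical names, and the model assumes a process is fully described by an initial state plus a step function whose result is both the new state and the returned value. It also assumes that scan results are written back to each line's original position.

```python
from typing import Callable, List, Sequence

Step = Callable[[float, float], float]


def run_scan(step: Step, init: float, values: Sequence[float], keys: Sequence[float]) -> List[float]:
    """Model of `scan`: visit lines in ascending key order and record the
    state reached at each line, aligned with the original line positions."""
    out: List[float] = [init] * len(values)
    state = init
    for i in sorted(range(len(values)), key=lambda i: keys[i]):
        state = step(state, values[i])
        out[i] = state
    return out


def run_sort(step: Step, init: float, values: Sequence[float], keys: Sequence[float]) -> float:
    """Model of `sort`: same traversal, but only the final state is returned."""
    state = init
    for i in sorted(range(len(values)), key=lambda i: keys[i]):
        state = step(state, values[i])
    return state


def sum_of_squares_step(acc: float, x: float) -> float:
    # Mirrors `sum = sum + x * x`; the result is the new state.
    return acc + x * x


ns = [0, 1, 2]   # plays the role of T.N
xs = [1, 2, -1]  # plays the role of T.X
print(run_scan(sum_of_squares_step, 0, xs, ns))  # [1, 5, 6]
print(run_sort(sum_of_squares_step, 0, xs, ns))  # 6
```

Both functions share the same traversal; the only design difference is whether the intermediate states are kept.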
Every process call must specify an order with sort or scan (including scan auto). The compiler enforces ordering even if the process logic is commutative.
```envision
table T = extend.range(3)

def process runningCount(x : number) with
  keep c = 0
  c = c + 1
  return c

T.C = runningCount(T.N) scan auto
show table "" with T.N, T.C
```
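One plausible reason ordering is required even for commutative logic can be seen in a Python model (not Envision; `scan_running_count` is a hypothetical name). A running count always ends at the same total, but the value attached to each individual line under scan depends on the traversal order. The model makes no claim about which order scan auto actually picks.

```python
from typing import List, Sequence


def scan_running_count(keys: Sequence[int]) -> List[int]:
    """Model of a running-count process under `scan`: lines are visited in
    ascending key order, and each line receives the count reached at its turn."""
    out = [0] * len(keys)
    count = 0
    for i in sorted(range(len(keys)), key=lambda i: keys[i]):
        count += 1  # mirrors `c = c + 1`
        out[i] = count
    return out


ascending = scan_running_count([1, 2, 3])   # lines visited in table order
descending = scan_running_count([3, 2, 1])  # same lines, reverse order
print(ascending)   # [1, 2, 3]
print(descending)  # [3, 2, 1]
print(max(ascending) == max(descending))  # True: only the final count is order-independent
```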
Multiple vector arguments are separated by commas. For example, to accumulate the squares of two numbers per line, we would define the function as follows:
```envision
table T = with
  [| as N, as X, as Y |]
  [| 0, 1, 4 |]
  [| 1, 2, 5 |]
  [| 2, 3, 6 |]

def process sumOfSquares(x : number, y : number) with
  keep sum = 0
  sum = sum + x * x + y * y
  return sum

total = sumOfSquares(T.X, T.Y) sort T.N
// 91
show scalar "" with total
```
The computation would then proceed as follows:
```
sum = 0 + 1 * 1 + 4 * 4 = 17
sum = 17 + 2 * 2 + 5 * 5 = 46
sum = 46 + 3 * 3 + 6 * 6 = 91
```
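In a Python model of this call (not Envision; `sum_of_squares2` is a hypothetical name), the two vector arguments are read line by line in lockstep, in ascending order of the sort key:

```python
from typing import Sequence


def sum_of_squares2(xs: Sequence[float], ys: Sequence[float], keys: Sequence[float]) -> float:
    """Model of `sumOfSquares(T.X, T.Y) sort T.N`: both vectors are consumed
    together, one line at a time, in ascending key order."""
    total = 0  # mirrors `keep sum = 0`
    for i in sorted(range(len(keys)), key=lambda i: keys[i]):
        total = total + xs[i] * xs[i] + ys[i] * ys[i]  # mirrors the process body
    return total


print(sum_of_squares2([1, 2, 3], [4, 5, 6], [0, 1, 2]))  # 91
```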
When a by clause is specified, the process state is initialized once per group and reset between groups. With scan, the process emits a value for each line in each group.
```envision
table T = with
  [| as Group, as N |]
  [| "A", 1 |]
  [| "A", 2 |]
  [| "B", 3 |]

def process runningSum(x : number) with
  keep s = 0
  s = s + x
  return s

T.S = runningSum(T.N) by T.Group scan T.N
show table "" with T.Group, T.N, T.S
```
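The per-group reset can be sketched in Python as one independent state per group key (a model of the described behavior, not Envision; `running_sum_by` is a hypothetical name). Under this model, group A yields 1 then 3, and group B starts again from 0 and yields 3.

```python
from typing import Dict, List, Sequence


def running_sum_by(groups: Sequence[str], values: Sequence[float], keys: Sequence[float]) -> List[float]:
    """Model of `runningSum(T.N) by T.Group scan T.N`: each group gets its own
    state starting at 0, and every line receives its group's running sum."""
    out: List[float] = [0] * len(values)
    state: Dict[str, float] = {}  # one `keep s = 0` per group
    for i in sorted(range(len(values)), key=lambda i: keys[i]):
        g = groups[i]
        state[g] = state.get(g, 0) + values[i]  # mirrors `s = s + x`
        out[i] = state[g]
    return out


print(running_sum_by(["A", "A", "B"], [1, 2, 3], [1, 2, 3]))  # [1, 3, 3]
```

Because the groups never share state, visiting all lines in global key order gives the same per-group results as processing each group separately.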
To initialize sum with a specific value instead, we pass it as an additional argument after a semicolon (;):
```envision
table T = with
  [| as N, as X |]
  [| 0, 1 |]
  [| 1, 2 |]
  [| 2, -1 |]

def process sumOfSquares(x : number; seed : number) with
  keep sum = seed
  sum = sum + x * x
  return sum

total = sumOfSquares(T.X; 5) sort T.N
// 11
show scalar "" with total
```
This will be computed as follows:
```
sum = 5 + 1 * 1 = 6
sum = 6 + 2 * 2 = 10
sum = 10 + (-1) * (-1) = 11
```
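As a Python model (not Envision; `sum_of_squares_seeded` is a hypothetical name), the seed only replaces the initial state. It is not consumed line by line like the vector argument:

```python
from typing import Sequence


def sum_of_squares_seeded(xs: Sequence[float], keys: Sequence[float], seed: float) -> float:
    """Model of `sumOfSquares(T.X; seed) sort T.N`: the argument after the
    semicolon sets the initial state; the loop only reads the vector."""
    total = seed  # mirrors `keep sum = seed`
    for i in sorted(range(len(xs)), key=lambda i: keys[i]):
        total = total + xs[i] * xs[i]  # mirrors `sum = sum + x * x`
    return total


print(sum_of_squares_seeded([1, 2, -1], [0, 1, 2], 5))  # 11
```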
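Arguments passed after the semicolon are group arguments: they are aligned with the grouped table and remain constant for the duration of a group. In the following example, each product provides its own separator: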
```envision
table Items = with
  [| as Product, as Color, as Sep |]
  [| "shirt", "pink", ", " |]
  [| "shirt", "white", ", " |]
  [| "socks", "green", " - " |]
  [| "socks", "yellow", " - " |]

table Products[Product] = by Items.Product
Products.Sep = same(Items.Sep)

def process joinColors(c : text; sep : text) with
  keep acc = ""
  if acc == ""
    acc = c
  else
    acc = "\{acc}\{sep}\{c}"
  return acc

Products.Colors =
  joinColors(Items.Color; Products.Sep)
  sort Items.Color
show table "" with Product, Products.Colors
```
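The following Python model (not Envision; `join_colors` is a hypothetical name) mimics this call under the assumption that text sorting matches Python's string ordering for these lowercase values. Items are grouped by product, colors are visited in ascending order, and each group reads a single separator that stays fixed for the whole group.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence


def join_colors(products: Sequence[str], colors: Sequence[str], sep_by_product: Mapping[str, str]) -> Dict[str, str]:
    """Model of `joinColors(Items.Color; Products.Sep) sort Items.Color`."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for p, c in zip(products, colors):
        groups[p].append(c)
    result: Dict[str, str] = {}
    for p, cs in groups.items():
        sep = sep_by_product[p]  # group argument: one value per Products line
        acc = ""  # mirrors `keep acc = ""`, reset for each group
        for c in sorted(cs):
            acc = c if acc == "" else f"{acc}{sep}{c}"
        result[p] = acc
    return result


print(join_colors(
    ["shirt", "shirt", "socks", "socks"],
    ["pink", "white", "green", "yellow"],
    {"shirt": ", ", "socks": " - "},
))  # {'shirt': 'pink, white', 'socks': 'green - yellow'}
```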
Of course, it is also possible to use multiple vector arguments and multiple initialization arguments at the same time, as the following example demonstrates:
```envision
table T = with
  [| as N, as X, as Y |]
  [| 0, 1, 4 |]
  [| 1, 2, 5 |]
  [| 2, 3, 6 |]

def process sumOfSquares(x : number, y : number; seedX : number, seedY : number) with
  keep sum = seedX + seedY
  sum = sum + x * x + y * y
  return sum

total = sumOfSquares(T.X, T.Y; 5, 10) sort T.N
// 106
show scalar "" with total
```
The computation proceeds similarly to the previous examples:
```
sum = 5 + 10 + 1 * 1 + 4 * 4 = 32
sum = 32 + 2 * 2 + 5 * 5 = 61
sum = 61 + 3 * 3 + 6 * 6 = 106
```
Finally, what happens if the vector arguments are empty? In this case, the process returns the default value of its return type, which is 0 for number. For example:
```envision
table T = with
  [| as N, as X |]
  [| 0, 1 |]
  [| 1, 2 |]
  [| 2, -1 |]

def process sumOfSquares(x : number) with
  keep sum = 0
  sum = sum + x * x
  return sum

where T.X > 100
  total = sumOfSquares(T.X) sort T.N
// 0
show scalar "" with total
```
Since no value of T.X exceeds 100, the filter where T.X > 100 removes every line, so the process receives empty vectors and returns 0.

However, it is also possible to specify the default return value explicitly. In the following example, it is set to 42:
```envision
table T = with
  [| as N, as X |]
  [| 0, 1 |]
  [| 1, 2 |]
  [| 2, -1 |]

def process sumOfSquares(x : number) default 42 with
  keep sum = 0
  sum = sum + x * x
  return sum

where T.X > 100
  total = sumOfSquares(T.X) sort T.N
// 42
show scalar "" with total
```
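The empty-input rule can be modeled in Python as a fold that never executes its step, falling back to a default (not Envision; `sum_of_squares_or_default` is a hypothetical name, and the list comprehension stands in for the where filter):

```python
from typing import Sequence


def sum_of_squares_or_default(xs: Sequence[float], keys: Sequence[float], default: float = 0) -> float:
    """Model of a `sort` call with an optional `default`: when no lines remain,
    the process body never runs and the default value is returned."""
    if len(xs) == 0:
        return default
    total = 0  # mirrors `keep sum = 0`
    for i in sorted(range(len(xs)), key=lambda i: keys[i]):
        total += xs[i] * xs[i]
    return total


xs, ns = [1, 2, -1], [0, 1, 2]
kept = [i for i in range(len(xs)) if xs[i] > 100]  # mirrors `where T.X > 100`
fx = [xs[i] for i in kept]
fn = [ns[i] for i in kept]
print(sum_of_squares_or_default(fx, fn))      # 0  (no default given)
print(sum_of_squares_or_default(fx, fn, 42))  # 42 (like `default 42`)
```

Note that the default only matters for empty input: with the unfiltered data, the same call returns 6 regardless of the default.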
All keep statements, including keep process, must appear before any non-keep statement in a process body.
Process instances
A process instance, declared with keep process, separates access to a process's state from updates to that state.
```envision
table T = with
  [| as N |]
  [| 2 |]
  [| 4 |]
  [| 8 |]

def process sumPlusOne(x : number) with
  keep process acc = sum(number)
  state = acc + 1
  updated = acc(x)
  return (state, updated)

T.State, T.Updated = sumPlusOne(T.N) scan T.N
show table "" with T.N, T.State, T.Updated
```
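Here is a rough Python analogy for the instance above. It is not Envision. `SumInstance` and `sum_plus_one` are hypothetical names, and the model assumes two things: reading acc returns the current state without changing it, and calling acc(x) adds x and returns the new state. Under these assumptions, the lines 2, 4, 8 produce State values 1, 3, 7 and Updated values 2, 6, 14.

```python
from typing import List, Sequence, Tuple


class SumInstance:
    """Hypothetical stand-in for `keep process acc = sum(number)`."""

    def __init__(self) -> None:
        self.value = 0  # current state of the instance

    def __call__(self, x: int) -> int:
        # Calling the instance feeds it a value and returns the updated state.
        self.value += x
        return self.value


def sum_plus_one(ns: Sequence[int]) -> Tuple[List[int], List[int]]:
    acc = SumInstance()  # created once, persists across lines like `keep`
    states = [0] * len(ns)
    updated = [0] * len(ns)
    for i in sorted(range(len(ns)), key=lambda i: ns[i]):  # `scan T.N`
        states[i] = acc.value + 1  # `state = acc + 1`: read only, no update
        updated[i] = acc(ns[i])    # `updated = acc(x)`: update, then return
    return states, updated


print(sum_plus_one([2, 4, 8]))  # ([1, 3, 7], [2, 6, 14])
```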