# Algebra of Distributions

Mathematical distributions are a powerful tool for modelling many business situations, especially those where uncertainty exists. Envision treats *distributions* as first-class citizens and supports a wide range of operations on them. These operations are collectively referred to as the *algebra of distributions*. In this section, we introduce the distribution data type and review the various operators and functions that apply to it.

- The distribution data type
- Plotting a distribution
- Point-wise operations
- Generating distributions
- Extending a distribution into a table
- Convoluting distributions of probabilities
- Historical developments

## The distribution data type

Mathematical distributions are objects that generalize the notion of functions. Within Envision, our ambition is more modest, and what we call distributions are actually functions $f: \mathbb{Z} \to \mathbb{R}$. We refer to these (mathematical) functions as *distributions* because the most frequent use case in Envision is to handle *probability distributions*, that is, non-negative distributions that have a mass equal to 1.

Within Envision, distributions are materialized through a special data type named *distribution*. Other data types include *number* or *text*. The distribution data type exhibits relatively complex behaviors precisely because it is a function rather than a single value. For example, below, we generate a Dirac, that is, a discrete function with a value of 0 everywhere except at the point 42, where it is valued at 1.

    d := dirac(42)
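To make the idea of "a function rather than a single value" concrete, here is a minimal Python sketch (not Envision code). It assumes a hypothetical representation where a distribution is a dictionary from integer points to values, with every absent point implicitly worth 0. The helper names `dirac`, `value_at` and `mass` are illustrative only.

```python
# Hypothetical model: a distribution is a dict {integer point: value};
# points absent from the dict implicitly carry the value 0.

def dirac(n):
    """Discrete function worth 1 at n and 0 everywhere else."""
    return {n: 1.0}


def value_at(dist, k):
    """Evaluate the distribution at the integer point k."""
    return dist.get(k, 0.0)


def mass(dist):
    """Total mass, i.e. the sum of all the values."""
    return sum(dist.values())


d = dirac(42)
print(value_at(d, 42), value_at(d, 41), mass(d))  # 1.0 0.0 1.0
```

Under this representation, a probability distribution is simply a dictionary whose values are non-negative and sum to 1.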

Distributions can be exported into a file using the Ionic data files. However, distributions cannot be exported *as such* into CSV or Excel files.

Envision offers many more ways to generate distributions. They will be reviewed in the following sections.

## Plotting a distribution

Distributions can be visualized with histograms. For example, a simple Poisson distribution can be plotted in Envision with the single-liner detailed below:

    show histogram "My first distribution!" tomato with poisson(21)

The `histogram` tile expects a single scalar distribution to be provided after the `with` keyword.

## Point-wise operations

The simplest operations on distributions are known as *point-wise* operations. For example, let $f$ and $g$ represent two distributions $\mathbb{Z} \to \mathbb{R}$. Then, we can define the addition as:

$f+g: k \to f(k) + g(k)$

From Envision’s perspective, assuming that both `X` and `Y` are distribution vectors, the same operation can similarly be written as:

    Z = X + Y

It must be noted that even when dealing with distributions, Envision remains a *vector* language. Hence, we are typically not processing a single distribution at a time, but a whole vector of distributions at once. The same operation can be performed from a scalar perspective using the following:

    Z := X + Y

In this and the following sections, whenever we use `X` and `Y` in script examples, we assume that these two variables are actual distributions.

Then, the point-wise multiplication and subtraction are defined with:

$f \times g: k \to f(k) \times g(k)$

$f-g: k \to f(k)-g(k)$

which translates quite transparently into the following Envision syntax:

    Z = X * Y
    Z = X - Y

From the perspective that a number $\alpha$ can be implicitly assimilated to a constant function $f_{\alpha}: k \to \alpha$, Envision allows you to combine numbers and distributions, but only if the resulting distribution is compact, that is, non-zero on only a finite number of points.

    // OK, it's compact
    Z = 2 * X
    // not dividing by zero is OK
    Z = X / 2
    // incorrect, not a compact distribution
    Z = X + 1
    // incorrect, Y is compact hence has zero values
    Z = X / Y
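The point-wise definitions can be sketched in Python (not Envision code), assuming a hypothetical dictionary representation where each distribution maps integer points to values and absent points are worth 0. The helper names `pointwise` and `scale` are illustrative only.

```python
def pointwise(f, g, op):
    """Apply op point by point; points missing from a dict count as 0."""
    points = set(f) | set(g)
    result = {k: op(f.get(k, 0.0), g.get(k, 0.0)) for k in points}
    # Drop zeros so that the support stays explicit and finite.
    return {k: v for k, v in result.items() if v != 0.0}


def scale(f, alpha):
    """Multiply every value by the number alpha; zeros stay zeros."""
    return {k: alpha * v for k, v in f.items() if alpha * v != 0.0}


X = {0: 0.5, 1: 0.5}
Y = {1: 0.25, 2: 0.75}

added = pointwise(X, Y, lambda a, b: a + b)       # {0: 0.5, 1: 0.75, 2: 0.75}
multiplied = pointwise(X, Y, lambda a, b: a * b)  # {1: 0.125}
subtracted = pointwise(X, Y, lambda a, b: a - b)  # {0: 0.5, 1: 0.25, 2: -0.75}
doubled = scale(X, 2)                             # {0: 1.0, 1: 1.0}
```

The sketch also hints at why adding a number is rejected: `X + 1` would be worth 1 at every integer outside the support of `X`, so no finite dictionary could hold it. Likewise, `X / Y` would divide by 0 at every point where `Y` is zero.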

The distributions can also be shifted. The shift operator is typically written as:

$f_{n}: k \to f(k+n)$

The corresponding Envision syntax is:

    Z = X << n // left shift
    Z = X >> n // right shift

Naturally, if `n` is negative, then the shift operators keep working, but the left shift becomes a right shift, and *vice versa*.
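The formula $f_{n}: k \to f(k+n)$ can be sketched in Python (not Envision code), again assuming a hypothetical dictionary representation of a distribution. The function name `shift` is illustrative; the sketch only implements the formula and makes no claim about Envision internals.

```python
def shift(f, n):
    """Return f_n with f_n(k) = f(k + n): the value stored at p moves to p - n."""
    return {p - n: v for p, v in f.items()}


d = {42: 1.0}
print(shift(d, 2))   # {40: 1.0}
print(shift(d, -2))  # {44: 1.0}
```

With this definition, a positive `n` moves the mass toward lower points, and a negative `n` moves it toward higher points, which is the sign reversal described above.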

## Generating distributions

There are multiple ways to create distributions. Lokad’s forecasting engine generates distributions for future lead times or future demand. When these distributions have been serialized as a grid (*), it is possible to regenerate the distribution through the `distrib()` function. The relevant syntax is:

    Demand = distrib(Id, G.Probability, G.Min, G.Max)

The resulting `Demand` variable is a distribution. When the original grid includes segments that are longer than 1, `distrib()` uniformly spreads the mass across the segment. The mass of the distribution is preserved by the `distrib()` function.
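The uniform spreading can be sketched in Python (not Envision code). The sketch assumes a hypothetical grid given as `(probability, min, max)` rows with inclusive integer bounds, and a dictionary representation of the resulting distribution. The name `from_grid` is illustrative only.

```python
def from_grid(rows):
    """Rebuild a distribution from (probability, lo, hi) rows, bounds inclusive."""
    dist = {}
    for probability, lo, hi in rows:
        width = hi - lo + 1
        for k in range(lo, hi + 1):
            # Spread the mass of the segment uniformly over its points.
            dist[k] = dist.get(k, 0.0) + probability / width
    return dist


grid = [(0.5, 0, 0), (0.25, 1, 2), (0.25, 3, 6)]
demand = from_grid(grid)
print(demand[0], demand[1], demand[4])  # 0.5 0.125 0.0625
print(sum(demand.values()))             # 1.0
```

The total mass of `demand` equals the sum of the grid probabilities, which mirrors the mass-preservation property stated above.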

(*) The serialization of a distribution is the process of turning the distribution data into a regular tabular format which can be stored as a flat file. In order to handle the distribution as an actual distribution, and not as a table, we need to de-serialize the table first. This is exactly what is being done above with the `distrib()` function.

In addition, Envision also offers the possibility to generate a distribution directly from a set of observed numeric values. This is the purpose of the `ranvar()` aggregator:

    X = ranvar(Orders.Quantity)

The `ranvar()` aggregator returns a *random variable* that matches the frequency observed in the aggregation groups. When there is nothing to aggregate, `ranvar()` returns `dirac(0)`.
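The behavior described above can be sketched in Python (not Envision code) for a single aggregation group. The sketch assumes integer observations and a hypothetical dictionary representation of the distribution; the Python function `ranvar` merely borrows the name for readability.

```python
from collections import Counter


def ranvar(values):
    """Empirical distribution of the observed integer values of one group."""
    values = list(values)
    if not values:
        return {0: 1.0}  # nothing to aggregate: same as dirac(0)
    counts = Counter(values)
    return {k: c / len(values) for k, c in counts.items()}


quantities = [1, 1, 2, 5]
print(ranvar(quantities))  # {1: 0.5, 2: 0.25, 5: 0.25}
print(ranvar([]))          # {0: 1.0}
```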

Finally, it is possible to generate a distribution from a *time-series*, using the `ranvar.segment()` aggregator.

    D = ranvar.segment(
      // first date for each item
      start: Items.Start
      // last date (inclusive) for each item
      end: Items.End
      // length of period for each item
      horizon: Items.Horizon
      // integer for skipping elements
      step: Items.Step
      // the date of each event
      date: Orders.Date
      // the quantity of each event
      quantity: Orders.Quantity)

It computes, for each item, the distribution of the sum of event quantities over periods of `horizon` length that lie entirely between the first and last date for that item. Typically, the horizon length would be the lead time of an item.
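The following Python sketch (not Envision code) illustrates this computation for a single item. It relies on assumptions that the text above does not confirm: the horizon is counted in days, `step` is read as the number of days between the starts of two consecutive periods, and an item without any complete period yields `dirac(0)`. The name `ranvar_segment` is illustrative only.

```python
from collections import Counter
from datetime import date, timedelta


def ranvar_segment(start, end, horizon, step, events):
    """Distribution of the summed quantities over periods of `horizon` days.

    events is a list of (date, quantity) pairs; only periods lying
    entirely within [start, end] (inclusive) are considered.
    """
    if horizon < 1 or step < 1:
        raise ValueError("horizon and step must be at least 1")
    sums = []
    period_start = start
    while period_start + timedelta(days=horizon - 1) <= end:
        period_end = period_start + timedelta(days=horizon - 1)
        sums.append(sum(q for d, q in events if period_start <= d <= period_end))
        period_start += timedelta(days=step)
    if not sums:
        return {0: 1.0}  # assumption: no complete period behaves like dirac(0)
    counts = Counter(sums)
    return {k: c / len(sums) for k, c in counts.items()}


events = [(date(2024, 1, 1), 3), (date(2024, 1, 3), 1), (date(2024, 1, 4), 2)]
D = ranvar_segment(date(2024, 1, 1), date(2024, 1, 6), 2, 1, events)
print(D)  # {3: 0.4, 1: 0.2, 2: 0.2, 0: 0.2}
```

In this example, the five two-day periods sum to 3, 1, 3, 2 and 0, hence a probability of 0.4 for a demand of 3 and 0.2 for each of the other values.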

## Extending a distribution into a table

In the previous section, we have seen how a table could be aggregated into a distribution. The reverse process, i.e. extending a distribution into table lines, is also possible. In this section, we review the `extend.distrib()` function which does precisely this. The syntax is illustrated as follows:

    X = poisson(1)
    table G = extend.distrib(X)
    G.Probability = int(X, G.Min, G.Max)
    show table "My Grid" with
      Id
      G.Min
      G.Max
      G.Probability

where `X` is the distribution vector generated on line 1 as a Poisson distribution. On line 2, the distributions are inflated into a table named `G` (for *grid*). This table has an affinity `(Id, *)`, and as illustrated on lines 3 to 7, the table is auto-populated with the numeric columns `G.Min` and `G.Max`. Both `G.Min` and `G.Max` are inclusive boundaries.
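The extension can be sketched in Python (not Envision code) for the simplest case of buckets of size 1. The sketch assumes a hypothetical dictionary representation of the distribution. The helper `integral` plays the role that `int()` plays in the script above, and `extend_unit` is an illustrative name.

```python
def integral(dist, lo, hi):
    """Sum of the values of dist over the inclusive range [lo, hi]."""
    return sum(v for k, v in dist.items() if lo <= k <= hi)


def extend_unit(dist):
    """Rows (Min, Max, Probability), one per point between the extreme points."""
    return [(k, k, integral(dist, k, k)) for k in range(min(dist), max(dist) + 1)]


X = {0: 0.5, 1: 0.25, 2: 0.25}
for row in extend_unit(X):
    print(row)
# (0, 0, 0.5)
# (1, 1, 0.25)
# (2, 2, 0.25)
```

Because both boundaries are inclusive, the probabilities of the rows add up to the mass of the distribution.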

When extending relatively compact distributions, the resulting table typically contains lines of +1 increments, i.e. `G.Min` and `G.Max` increase by 1 from one line to the next. However, if we were to consider the extension of high-valued distributions, for example `dirac(1000000)`, then it would be extremely inefficient to generate millions of lines. Thus, the function `extend.distrib()` aggregates large distributions into thicker buckets. This explains why we have both `G.Min` and `G.Max`, which represent the inclusive boundaries of the bucket.

In order to gain more control over the granularity of the buckets generated, the function `extend.distrib()` offers a first overload:

    table G = extend.distrib(X, S)

where `S` is a number vector. The resulting table provides buckets aligned with the segments `[0;0]`, `[1;S]`, `[S+1;S+M]`, `[S+M+1;S+2*M]`, … where `M` is the default bucket size, also called the *multiplier*. This overload is typically used when the demand above the *total stock* needs to be considered.

Finally, the second overload of `extend.distrib()` provides even more control with:

    table G = extend.distrib(X, S, M)

where `M` is a mandatory bucket size. If `M` is zero, then the extension reverts to the default bucket size, auto-adjusted by Envision. This second overload is particularly useful when *lot multipliers* are involved in the ordering process, as the demand needs to be batched into buckets of a specific size.

Beware that `extend.distrib(X, S, M)` may fail depending on the capacity allocated to your Lokad account if you try to extend a high-valued distribution while forcing a low multiplier.
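The segment layout `[0;0]`, `[1;S]`, `[S+1;S+M]`, … can be sketched in Python (not Envision code). The function `bucket_bounds` and its `top` parameter (the highest value to cover) are hypothetical. The sketch requires an explicit `M` of at least 1 and does not model the automatic adjustment that Envision applies when `M` is zero.

```python
def bucket_bounds(top, s, m):
    """Inclusive (Min, Max) pairs [0;0] [1;S] [S+1;S+M] ... covering 0..top."""
    if m < 1:
        raise ValueError("this sketch needs an explicit bucket size m >= 1")
    bounds = [(0, 0)]
    if top >= 1 and s >= 1:
        bounds.append((1, s))
    lo = s + 1
    while lo <= top:
        bounds.append((lo, lo + m - 1))
        lo += m
    return bounds


print(bucket_bounds(20, 5, 6))
# [(0, 0), (1, 5), (6, 11), (12, 17), (18, 23)]

# A high-valued distribution with a low multiplier yields a huge table.
print(len(bucket_bounds(1_000_000, 0, 1)))     # 1000001
print(len(bucket_bounds(1_000_000, 0, 1000)))  # 1001
```

The last two lines illustrate the warning above: forcing a bucket size of 1 on a distribution that reaches 1,000,000 produces about a million lines, while a bucket size of 1000 keeps the table near a thousand lines.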

## Convoluting distributions of probabilities

Convolutions represent a more advanced class of operations on distributions. The prime use cases of convolutions involve *random variables*. Unlike point-wise operations, convolutions have probabilistic interpretations such as summing or multiplying independent random variables. Convolutions can be recognized in Envision by their two-character operators ending in `*`, namely:

    // additive convolution
    Z = X +* Y
    // subtractive convolution, same as X +* reflect(Y)
    Z = X -* Y
    // multiplicative convolution
    Z = X ** Y
    // convolution power
    Z = X ^* Y

The additive (resp. subtractive) convolution can be interpreted as the distribution of the sum $X+Y$ (resp. the difference $X-Y$) of two independent random variables. The multiplicative convolution, also known as the Dirichlet convolution, can be interpreted as the product of two independent random variables.
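These probabilistic interpretations can be sketched in Python (not Envision code), assuming independent random variables stored as hypothetical dictionaries from integer values to probabilities. The helper `convolve` is illustrative and enumerates every pair of outcomes.

```python
def convolve(x, y, combine):
    """Distribution of combine(A, B) for independent A ~ x and B ~ y."""
    z = {}
    for a, pa in x.items():
        for b, pb in y.items():
            k = combine(a, b)
            # Independence: the probability of the pair is the product.
            z[k] = z.get(k, 0.0) + pa * pb
    return z


X = {0: 0.5, 1: 0.5}
Y = {1: 0.5, 2: 0.5}

additive = convolve(X, Y, lambda a, b: a + b)        # {1: 0.25, 2: 0.5, 3: 0.25}
subtractive = convolve(X, Y, lambda a, b: a - b)     # {-1: 0.5, -2: 0.25, 0: 0.25}
multiplicative = convolve(X, Y, lambda a, b: a * b)  # {0: 0.5, 1: 0.25, 2: 0.25}
```

Unlike the point-wise operations, each result is again a probability distribution: its values sum to 1 whenever both inputs do.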

The convolution power is more complex and represents the sum of a random number $Y$ of independent copies of $X$:

$$X ^ Y = \sum_{k=0}^{\infty} X^k \mathbf{P}[Y=k] \text{ where } X^k = X + \dots + X \text{ ($k$ independent copies)}$$

This last operation is of interest because of its relationship to the process leading to an integrated demand forecast, where $X$ represents the daily demand - assumed stationary - and where $Y$ represents the probabilistic lead times.
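The formula can be sketched in Python (not Envision code), assuming hypothetical dictionary representations and a `Y` supported on non-negative integers. The sketch takes $X^0$ to be a Dirac at 0, since a sum over zero days is 0. The names `add_conv` and `conv_power` are illustrative only.

```python
def add_conv(x, y):
    """Distribution of the sum of two independent random variables."""
    z = {}
    for a, pa in x.items():
        for b, pb in y.items():
            z[a + b] = z.get(a + b, 0.0) + pa * pb
    return z


def conv_power(x, y):
    """Mixture over k of the k-fold sums of x, weighted by P[Y = k]."""
    result = {}
    x_k = {0: 1.0}  # X^0: a sum over zero terms is always 0
    for k in range(max(y) + 1):
        weight = y.get(k, 0.0)
        if weight:
            for point, p in x_k.items():
                result[point] = result.get(point, 0.0) + weight * p
        x_k = add_conv(x_k, x)  # move from X^k to X^(k+1)
    return result


daily_demand = {0: 0.5, 1: 0.5}  # X: demand per day
lead_time = {1: 0.5, 2: 0.5}     # Y: lead time in days
print(conv_power(daily_demand, lead_time))  # {0: 0.375, 1: 0.5, 2: 0.125}
```

Here the lead time is 1 or 2 days with equal probability, so the result is the average of the one-day demand and the two-day demand distributions.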

See also our page on convolution power.

## Historical developments

Lokad’s forecasting engine started delivering *quantile grids* in early 2015. These grids were not *exactly* probability distributions yet, merely interpolated quantile forecasts, but we were getting fairly close. Working with our clients, we started to realize the massive potential of applying probabilistic analysis to quantitative supply chain optimization. However, our *grids* were just that: big tables listing all the probabilities. And while these grids represented a breakthrough both for our clients and for ourselves, we quickly realized that processing probabilities represented in the form of lists was no easy task.

The *algebra of distributions* represents Lokad’s broad technological answer to supply chain challenges that involve unknown futures. Indeed, those situations do not merely require a single median forecast, but a complete risk analysis for all possibilities. Envision embraces the idea that *all scenarios* should be considered, instead of just focusing on a handful of scenarios. For this purpose, random variables can be introduced within Envision scripts and manipulated through operations specifically tailored for random variables, such as the *convolutions* detailed in the sections above. In practice, the algebra of distributions is an elegant way to model complex supply chain situations where both the future demand and the future lead time are uncertain.