# Algebra of Ranvars

In Envision, a ranvar is a $\mathbb{Z} \to \mathbb{R}^+$ generalized function representing the probability distribution $\mathbf{P}$ of the values that an integer-valued random variable $X$ can assume; that is, for $X: \Omega \to \mathbb{Z}$, the function $k \mapsto \mathbf{P}[X = k]$.

Ranvars are powerful and useful mathematical objects for the modelling of many business situations, especially those where uncertainty exists. Envision treats ranvars as first-class citizens, and can handle a wide range of operations over them. All these operations are collectively referred to as the algebra of ranvars supported by Envision. In this section, we introduce the concept of ranvar data type and review the available options to generate ranvars and the various pertinent operations.

## The ranvar data type

Within Envision, ranvars are materialized through a special data type named distribution. Other data types include number and text.

read "/clean/Demand-Forecast.ion" as Demand[Id,*] with
  Id : text
  Rank : number
  Demand : distribution

The ranvar data type exhibits relatively complex behaviours precisely because it is a function rather than a single value. For example, below, we generate a Dirac delta ranvar (closely related to the Dirac delta function), that is, a discrete function equal to 0 everywhere except at the given point (here, 42), where it is equal to 1.

R := dirac(42)
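To make the idea of "a function rather than a single value" concrete, here is a minimal Python sketch written outside Envision. It assumes a hypothetical representation of a ranvar as a mapping from integers to probabilities; the helper names `dirac` and `prob` are illustrative only and say nothing about how Envision stores ranvars internally.

```python
# Illustrative model only: a ranvar as a {integer: probability} mapping.
# This is an assumption for the example, not Envision's internal format.

def dirac(n):
    """Ranvar with all its mass on the single integer n."""
    return {n: 1.0}

def prob(ranvar, k):
    """P[X = k]; integers absent from the mapping have probability 0."""
    return ranvar.get(k, 0.0)

R = dirac(42)
print(prob(R, 42))  # 1.0
print(prob(R, 41))  # 0.0
```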

Envision offers many more ways to generate ranvars, as we will review in the following sections.

A ranvar can be exported into a file only through the Ionic file format: CSV and Excel files do not support the ranvar data type. However, a text representation of a ranvar, readable in a text editor, can be exported through the function spark.

## Plotting a ranvar

Ranvars can be visualized in Envision with histograms. Let’s consider a simple ranvar following a Poissonian law. This can be plotted by:

show histogram "My first distribution!" tomato with poisson(21)

Please note that the histogram tile expects a single scalar ranvar to be provided after the with keyword.

## Generating ranvars

Envision offers the possibility of generating several parametric ranvars, thanks to specific built-in functions (see example above). A full list of the available probability laws is provided in section Parametric ranvars.

On the other hand, non-parametric ranvars can be generated from a table. The dataset can consist of a single column of observed values or a grid.

### Generating a ranvar from a vector of observed values

The ranvar(v1) aggregator returns a ranvar that matches the observed frequencies provided in the input vector v1:

R := ranvar(Orders.Quantity)
QuantityByCategory = ranvar(Orders.Quantity) by Orders.Category

The ranvar(v1,v2) aggregator returns the empirical distribution of the associated random variable, using the numeric observations provided in the input vector v1 along with the weights contained in vector v2.

When there is nothing to aggregate, ranvar() returns dirac(0).
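The behaviour described above can be sketched in Python, outside Envision. This is a hedged illustration under stated assumptions: observations are taken to be integers, a ranvar is modelled as a `{value: probability}` dict, and the function name `ranvar` merely mirrors the Envision aggregator.

```python
from collections import defaultdict

def ranvar(values, weights=None):
    """Empirical distribution of integer observations, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(values)
    total = sum(weights)
    if total == 0:
        # Nothing to aggregate: fall back to dirac(0), as stated in the text.
        return {0: 1.0}
    masses = defaultdict(float)
    for v, w in zip(values, weights):
        masses[v] += w / total
    return dict(masses)

print(ranvar([1, 1, 2, 5]))            # {1: 0.5, 2: 0.25, 5: 0.25}
print(ranvar([1, 2], weights=[3, 1]))  # {1: 0.75, 2: 0.25}
print(ranvar([]))                      # {0: 1.0}
```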

The ranvar.segment() aggregator generates a ranvar from a time series:

D = ranvar.segment(
  start: Items.Start // first date for each item
  end: Items.End // last date (inclusive) for each item
  step: Items.Step // integer for skipping elements
  horizon: Items.Horizon // length of period for each item
  date: Orders.Date // the date of each event
  quantity: Orders.Quantity) // the quantity of each event

It computes, for each item, the ranvar of the sum of event quantities for periods of horizon length entirely between the first and last date. Typically, the horizon length of interest is the lead time of an item.
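The following Python sketch illustrates this computation for a single item. It is written outside Envision and rests on assumptions that the text does not confirm: windows are taken to start at `start`, `start + step`, and so on; a window is kept only if it lies entirely within the inclusive range `[start, end]`; and an item with no valid window yields dirac(0). The function name `ranvar_segment` and the `events` list are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

def ranvar_segment(start, end, step, horizon, events):
    """Empirical ranvar of the quantity summed over windows of `horizon` days.

    `events` is a list of (date, quantity) pairs for one item.
    """
    totals = []
    window_start = start
    while window_start + timedelta(days=horizon - 1) <= end:
        window_end = window_start + timedelta(days=horizon - 1)
        totals.append(sum(q for d, q in events if window_start <= d <= window_end))
        window_start += timedelta(days=step)
    if not totals:
        return {0: 1.0}  # assumption: no window means dirac(0)
    counts = Counter(totals)
    return {k: c / len(totals) for k, c in counts.items()}

events = [(date(2024, 1, 1), 2), (date(2024, 1, 3), 1), (date(2024, 1, 4), 4)]
D = ranvar_segment(date(2024, 1, 1), date(2024, 1, 4), step=1, horizon=2, events=events)
print(D)  # three windows with totals 2, 1 and 5, each with probability 1/3
```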

### Generating a ranvar from a grid

In Envision, a grid is a table containing a list of buckets and a value for each one of them. The default columns of a grid are the item Id, the inclusive boundaries of each bucket Min and Max, and, of course, the value associated with the bucket. Typically, grids are generated by expanding Lokad’s forecasts for future lead times or future demand, using the function extend.distrib(). Once these ranvars have been serialized as a grid G, it is possible to regenerate the ranvar using the function distrib(). The relevant syntax is:

Demand = distrib(Id, G.Value, G.Min, G.Max)

The resulting variable Demand is a ranvar (mass is normalized to 1, if not already the case). As a final remark, when the original grid includes segments that are longer than 1, distrib() uniformly spreads the mass across the segment, to preserve the unitary mass.
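As an illustration of the uniform spreading and normalization just described, here is a small Python sketch for a single item, written outside Envision. The dict-based ranvar and the three-argument `distrib` helper (without the Id column) are simplifications assumed for the example.

```python
def distrib(values, mins, maxs):
    """Rebuild a ranvar from grid buckets [min, max] (inclusive) and their values."""
    total = sum(values)
    ranvar = {}
    for value, lo, hi in zip(values, mins, maxs):
        width = hi - lo + 1
        for k in range(lo, hi + 1):
            # Spread the bucket mass uniformly, normalizing the total mass to 1.
            ranvar[k] = ranvar.get(k, 0.0) + value / width / total
    return ranvar

demand = distrib(values=[2.0, 4.0, 2.0], mins=[0, 1, 3], maxs=[0, 2, 4])
print(demand)  # {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.125, 4: 0.125}
```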

## Extending a ranvar into a table

In the previous section, we have seen how a table can be aggregated into a ranvar. The reverse process, i.e. extending a ranvar into table lines, is also possible thanks to the extend.distrib() function. The syntax is illustrated as follows:

R = poisson(1)
table G = extend.distrib(R)
G.Probability = int(R, G.Min, G.Max)
show table "My Grid" with
  Id
  G.Min
  G.Max
  G.Probability

where R is the ranvar vector generated on line 1 as a Poisson ranvar into the Items table. On line 2, the ranvars are inflated into a table named G (for grid). This table has an affinity (Id, *), and the value associated with each bucket of the ranvar (for every Id) can be easily calculated by integrating R between the bucket’s inclusive boundaries G.Min and G.Max.
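The integration between inclusive bucket boundaries can be pictured with the Python sketch below, written outside Envision. The helpers `poisson` and `integrate` are hypothetical stand-ins for the Envision functions poisson and int; the Poisson ranvar is truncated once the remaining tail is negligible, which is an assumption of this example.

```python
import math

def poisson(lam, tail=1e-12):
    """Poisson ranvar as {k: probability}, truncated once the tail is negligible."""
    ranvar, k, p, cumulative = {}, 0, math.exp(-lam), 0.0
    while cumulative < 1.0 - tail:
        ranvar[k] = p
        cumulative += p
        k += 1
        p *= lam / k  # P[X = k] = P[X = k - 1] * lam / k
    return ranvar

def integrate(ranvar, lo, hi):
    """Mass of the ranvar on the inclusive integer range [lo, hi]."""
    return sum(p for k, p in ranvar.items() if lo <= k <= hi)

R = poisson(1)
for lo, hi in [(0, 0), (1, 1), (2, 2), (3, 5)]:
    print(lo, hi, round(integrate(R, lo, hi), 4))
```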

When extending relatively compact ranvars, the resulting table typically contains lines of +1 increments: G.Min is equal to G.Max on each line, and both increase by 1 from one line to the next. However, if we were to extend into a table a ranvar with a large support, for example dirac(1000000), it would be extremely inefficient to generate millions of lines. Thus, the function extend.distrib() aggregates large ranvars into thicker buckets, the amplitude of the buckets being 1 when G.Min is close to zero and slowly increasing with G.Min. This explains why both G.Min and G.Max are native columns in a grid.

In order to gain more control over the granularity of the buckets generated through extend.distrib(), it is possible to specify several optional parameters:

table G = extend.distrib(R, S, M, A)

where S, M and A are number vectors.

The first optional parameter, S, is the amplitude of the very first bucket (the one coming after the special bucket [0;0]). When only S is provided, the resulting table contains the buckets [0;0], [1;S], [S+1;S+1], [S+2;S+2], … The second parameter, M, is the default bucket size, also called the multiplier. When S and M are provided, the extended table contains the buckets [0;0], [1;S], [S+1;S+M], [S+M+1;S+2*M], … Finally, A is the minimum value of G.Max that is required to exist in the grid. Indeed, extend.distrib() creates as many buckets in the grid as needed to represent the full ranvar (that is, to cover its support). However, it may be necessary to have more units available in the grid. The A parameter forces the creation of such additional lines.
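The bucket layout induced by S, M and A can be sketched as follows in Python, outside Envision. The function `grid_buckets` and its argument `support_max` (the largest value carried by the ranvar) are hypothetical names; the sketch assumes S and M are at least 1 and ignores the automatic thickening of buckets applied to large ranvars.

```python
def grid_buckets(support_max, s=1, m=1, a=0):
    """Inclusive (min, max) buckets: [0;0], [1;s], [s+1;s+m], [s+m+1;s+2m], ...

    Buckets are added until they cover both the largest value of the ranvar
    (support_max) and the required reach a.
    """
    target = max(support_max, a)
    buckets = [(0, 0)]
    if target >= 1:
        buckets.append((1, s))
    upper = s
    while upper < target:
        buckets.append((upper + 1, upper + m))
        upper += m
    return buckets

print(grid_buckets(support_max=12, s=3, m=5))
# [(0, 0), (1, 3), (4, 8), (9, 13)]
print(grid_buckets(support_max=4, s=3, m=5, a=20))
# [(0, 0), (1, 3), (4, 8), (9, 13), (14, 18), (19, 23)]
```

In the second call, the ranvar itself only needs the buckets up to [4;8], and the extra lines exist solely because of the A value.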

Typically, while expanding Lokad’s demand forecast into a grid for the calculation of a purchase suggestion list, S represents the stock available (only additional units are of interest), M is a lot multiplier (the item can be purchased only by batches of M units) and A is an MOQ (Minimal Order Quantity that can be bought, imposed by the supplier).

Beware that extend.distrib(R, S, M, A) may fail, depending on the capacity allocated to your Lokad account, if you try to extend a high-valued ranvar while forcing a low multiplier.

## Transformation of ranvars

As explained in the List of Functions reference, Envision supports several functions that transform ranvars into other ranvars. Here we provide a graphical representation of such operations. Transformations can also be applied to two or more ranvars, as shown in the following example, where we consider the additional ranvars Q and R (respectively, a Poisson ranvar centred on 5 and a Poisson ranvar shifted by 15, as described below in Convoluting ranvars).

If we consider again the Poisson ranvar centred on 21, the available transformations are:

### Stretching ranvars

As illustrated in the image above, the Envision function transform(P,a) stretches or shrinks a ranvar along its support. The resulting ranvar preserves the shape of the original one, given as the input, by approximating $k \mapsto P(k / a)$ through interpolation. In this section we address the details of this calculation.

Since we are working with discrete ranvars $P : \mathbb{Z} \to \mathbb{R}$ and $k / a$ is generally not an integer, the mapping $k \mapsto P(k / a)$ is not formally defined and has to be approximated. Let us consider separately the cases $a<1$ and $a>1$.

If $a<1$, the ranvar is shrunk. This is calculated by considering the ranvar as a step function $\mathbb{R} \to \mathbb{R}$, where a step is centred on the centre of each bucket of the ranvar and extends by 0.5 beyond the bucket boundaries: if the bucket is bounded by $\text{Inf}$ and $\text{Sup}$, the associated step covers the segment $[\text{Inf}- 0.5, \text{Sup} + 0.5]$ (e.g. $[1.5, 2.5]$ for bucket $2$). Once the contraction is applied to this step function, a ranvar is computed by integrating the step function over $[\text{Inf}- 0.5, \text{Sup} + 0.5]$ for each bucket.

If $a>1$, the ranvar is dilated. To prevent the probability mass of the $0$ bucket from spreading over the neighbouring buckets, the value of the step centred on $0$ is set to $0$. After the expansion of the step function, the original value of the $0$ bucket is restored in the reconstructed ranvar.

This approach preserves the relation a*mean(P) = mean(transform(P,a)) and the character of the ranvar: the dirac(0) ranvar cannot extend into anything else than dirac(0).
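The step-function procedure above can be sketched in Python, outside Envision. This is an interpretation of the description under stated assumptions (unit-width buckets, a positive factor, a dict-based ranvar), not Lokad's actual implementation. In this simplified sketch the mean relation holds exactly for the factors used below (0.5 and 2) and only approximately for arbitrary factors.

```python
import math

def transform(ranvar, a):
    """Stretch (a > 1) or shrink (0 < a < 1) a {k: probability} ranvar."""
    result = {}
    for k, p in ranvar.items():
        if k == 0 and a > 1:
            # Dilation: the mass at 0 is set aside and restored unchanged.
            result[0] = result.get(0, 0.0) + p
            continue
        # The step of bucket k covers [k - 0.5, k + 0.5]; scale it by a.
        lo, hi = a * (k - 0.5), a * (k + 0.5)
        # Integrate the scaled step over each target bucket [j - 0.5, j + 0.5].
        for j in range(math.floor(lo + 0.5), math.ceil(hi - 0.5) + 1):
            overlap = min(hi, j + 0.5) - max(lo, j - 0.5)
            if overlap > 0:
                result[j] = result.get(j, 0.0) + p * overlap / (hi - lo)
    return result

def mean(ranvar):
    return sum(k * p for k, p in ranvar.items())

P = {0: 0.2, 1: 0.3, 2: 0.5}
print(mean(transform(P, 2)), 2 * mean(P))      # both close to 2.6
print(mean(transform(P, 0.5)), 0.5 * mean(P))  # both close to 0.65
print(transform({0: 1.0}, 3))                  # {0: 1.0}
```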

### Smoothing ranvars

Many ranvars are produced empirically by merely piling up the observations. This is what the ranvar() aggregator does. This leads to overfitting issues, usually observed as excessively spiky ranvars. The map function smooth returns a smooth variant of the original ranvar and mitigates this overfitting issue.

SmoothedRanvar := smooth(Ranvar) // smoothing a ranvar, e.g., a demand distribution

Note: The smooth function is relatively expensive compute-wise, similar to a convolution (detailed in the following section). Thus, it should be used sparingly.

## Convoluting ranvars

Convolutions represent a more advanced class of operations on ranvars. Unlike point-wise operations, convolutions have probabilistic interpretations: the ranvar of the sum of two independent variables corresponds to the convolution of their ranvars.

Convolutions are coded in Envision as the two-character operators ending in *, namely:

S = Q +* R // additive convolution
S = Q -* R // subtractive convolution, same as Q +* reflect(R)
S = Q ** R // multiplicative convolution
S = Q ^* R // convolution power

where $Q$ and $R$ are the ranvars respectively associated with the two independent random variables $X$ and $Y$. A few examples are provided in the following image:

The additive (resp. the subtractive) convolution can be interpreted as the ranvar of the sum (resp. the difference) of the two independent random variables $X$ and $Y$. Please note that the additive convolution with a Dirac delta ranvar can be used to apply a rigid translation over a ranvar:

S = Q +* dirac(15) // returns Q shifted by +15
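For readers who want to see the arithmetic behind the additive and subtractive convolutions, here is a minimal Python sketch written outside Envision. It assumes a dict-based ranvar; `convolve_add` and `reflect` are illustrative helpers standing in for the `+*` operator and the Envision function reflect.

```python
def convolve_add(q, r):
    """Ranvar of X + Y for independent X ~ q and Y ~ r."""
    s = {}
    for i, p in q.items():
        for j, w in r.items():
            s[i + j] = s.get(i + j, 0.0) + p * w
    return s

def reflect(r):
    """Ranvar of -Y."""
    return {-k: p for k, p in r.items()}

Q = {0: 0.5, 1: 0.5}
R = {0: 0.25, 2: 0.75}
print(convolve_add(Q, R))           # {0: 0.125, 2: 0.375, 1: 0.125, 3: 0.375}
print(convolve_add(Q, reflect(R)))  # subtractive convolution of Q and R
print(convolve_add(Q, {15: 1.0}))   # {15: 0.5, 16: 0.5}, i.e. Q shifted by +15
```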

The multiplicative convolution, also known as the Dirichlet convolution, can be interpreted as the ranvar of the product of the two independent random variables $X$ and $Y$. Similarly, the convolution power of a ranvar with exponent $k \in \mathbb{N}$ represents the ranvar of the sum of $k$ independent and identically distributed random variables: $Q^k = Q * \dots * Q \text{ (}k \text{ times})$. The convolution power of two independent random variables is more complex and represents:

$$Q ^ R = \sum_{k=0}^{\infty} Q^k R(k)$$

where $R(k) = \mathbf{P}[Y = k]$. See also our page on convolution power.
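The formula above can be evaluated directly when $R$ has a finite support on the non-negative integers. The Python sketch below, written outside Envision with hypothetical helper names, does so and checks the result against the known identity that the mean of such a random sum equals the product of the two means (Wald's identity).

```python
def convolve_add(q, r):
    """Ranvar of X + Y for independent X ~ q and Y ~ r."""
    s = {}
    for i, p in q.items():
        for j, w in r.items():
            s[i + j] = s.get(i + j, 0.0) + p * w
    return s

def power(q, k):
    """Q^k: ranvar of the sum of k independent draws from q (Q^0 is dirac(0))."""
    result = {0: 1.0}
    for _ in range(k):
        result = convolve_add(result, q)
    return result

def power_ranvar(q, r):
    """Q^R = sum over k of Q^k * R(k), for r supported on non-negative integers."""
    result = {}
    for k, weight in r.items():
        for value, p in power(q, k).items():
            result[value] = result.get(value, 0.0) + weight * p
    return result

def mean(ranvar):
    return sum(k * p for k, p in ranvar.items())

Q = {0: 0.5, 1: 0.3, 2: 0.2}  # e.g. a daily demand
R = {2: 0.5, 3: 0.5}          # e.g. a lead time in days
S = power_ranvar(Q, R)
print(mean(S), mean(Q) * mean(R))  # both close to 1.75
```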

This last operation is of particular interest because of its relationship to the process leading to an integrated demand forecast, where $X$ represents the daily demand - assumed stationary - and where $Y$ represents the probabilistic lead time.

As a final remark, we provide here some examples of equivalent writings, that follow from the different commutative, associative and distributive properties of convolutions:

S = (Q ^* a) ^* b // is equal to
S = Q ^* (a * b)

M1 = mean(Q +* R)  // is equal to
M1 = mean(Q) + mean(R)

V1 = variance(Q +* R)  // is equal to
V1 = variance(Q) + variance(R)

M2 = mean(Q ^* 3)  // is equal to
M2 = mean(Q) * 3

V2 = variance(Q ^* 3)  // is equal to
V2 = variance(Q) * 3
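Several of these identities can be checked numerically on small examples. The Python sketch below, written outside Envision with hypothetical dict-based helpers, verifies the additivity of the mean and of the variance under additive convolution, the mean of a convolution power, and the composition rule for integer exponents.

```python
def convolve_add(q, r):
    """Ranvar of X + Y for independent X ~ q and Y ~ r."""
    s = {}
    for i, p in q.items():
        for j, w in r.items():
            s[i + j] = s.get(i + j, 0.0) + p * w
    return s

def power(q, k):
    """Ranvar of the sum of k independent draws from q."""
    result = {0: 1.0}
    for _ in range(k):
        result = convolve_add(result, q)
    return result

def mean(ranvar):
    return sum(k * p for k, p in ranvar.items())

def variance(ranvar):
    m = mean(ranvar)
    return sum(p * (k - m) ** 2 for k, p in ranvar.items())

def close(x, y, tol=1e-9):
    return abs(x - y) < tol

Q = {0: 0.5, 1: 0.3, 2: 0.2}
R = {1: 0.4, 4: 0.6}

print(close(mean(convolve_add(Q, R)), mean(Q) + mean(R)))              # True
print(close(variance(convolve_add(Q, R)), variance(Q) + variance(R)))  # True
print(close(mean(power(Q, 3)), 3 * mean(Q)))                           # True
lhs, rhs = power(power(Q, 2), 3), power(Q, 6)
print(all(close(lhs.get(k, 0.0), rhs.get(k, 0.0)) for k in set(lhs) | set(rhs)))  # True
```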

The choice of one among theoretically equivalent writings is important in terms of accuracy of the result. For example:

M = sum(mean(T.Ranvars))  // is more accurate than
M = mean(sumr(T.Ranvars))

## Historical developments

Lokad’s forecasting engine started delivering quantile grids in early 2015. These grids were not exactly probability distributions yet - merely interpolated quantile forecasts - but we were getting fairly close. Working with our clients, we started to realize the massive potential of applying probabilistic analysis to quantitative supply chain optimization. However, our grids were just that: big tables listing all the probabilities. And as these grids represented a breakthrough both for our clients and for ourselves, we quickly realized that processing probabilities represented in the form of lists was no easy task.

The algebra of ranvars represents Lokad’s broad technological answer to supply chain challenges that involve unknown futures. Indeed, those situations do not merely require a single median forecast, but a complete risk analysis for all possibilities. Envision embraces the idea that all scenarios should be considered, instead of just focusing on a handful of scenarios. For this purpose, random variables can be introduced within Envision scripts and manipulated through specifically tailored operations, such as convolutions, as detailed in the sections above. In practice, the algebra of ranvars is an elegant way to model complex supply chain situations where both the future demand and the future lead time are uncertain.