# Probabilistic demand forecasting

Probabilistic demand forecasts are essential to inventory optimization. Demand forecasts are said to be *integrated* over the lead time when the forecast values match the total demand over the lead time, as opposed to the classic forecasting perspective, where forecasts are periodic (typically per day, per week or per month) and agnostic of the lead times. Lokad’s forecasting engine delivers integrated probabilistic demand forecasts that take probabilistic lead times as an input. The engine natively supports the statistical patterns found in business data, such as seasonality, trends and product lifecycles. Distortions of demand, such as stock-outs or promotions, are also taken into account. Lokad’s forecasts represent the expected probabilities for every single unit of future demand. In this section, we detail the syntax used for computing Lokad’s integrated demand forecasts.

## General syntax

The forecasting engine has a function dedicated to probabilistic forecasts. The syntax is the following:

```
Demand = forecast.demand(
  category: C1, C2, C3, C4
  hierarchy: H1, H2, H3, H4
  label: PlainText
  demandStartDate: LaunchDate
  demandEndDate: EndDate
  horizon: Leadtime
  offset: 0
  present: (max(Orders.Date) by 1) + 1
  demandDate: Orders.Date
  demandValue: Orders.Quantity
  // 'SO' for stock-outs
  censoredDemandDate: SO.StartDate, SO.EndDate
  inflatedDemandDate: TVAds.StartDate, TVAds.EndDate
  // 'Promos' for promotions
  promotionDate: Promos.StartDate, Promos.EndDate
  promotionDiscount: Promos.Discount
  promotionCategory: Promos.Type)
```

Unlike regular functions, call functions have *named* arguments instead of *positional* arguments. Named arguments are better suited to complex functions because they make the source code much more readable, at the cost of some extra verbosity. These arguments behave just like regular function arguments, so Envision expressions can be used as their values.

The function returns a vector `Demand` that is of type *distribution* (see also Algebra of Distributions). Distributions are an advanced data type that represents functions $p: \mathbb{Z} \to \mathbb{R}$. More specifically, the forecasting engine returns *random variables*, that is, distributions that are *positive* and have a *mass* equal to 1. In the present case, $p(k)$ represents the probability associated with a demand of $k$ units. Each item (in the Envision sense) becomes associated with its own demand distribution.
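To make the notion of a random variable concrete, the following sketch represents a demand distribution as a plain Python dictionary that maps a demand of $k$ units to its probability $p(k)$, and checks the two properties stated above. The dictionary representation, the helper name `is_random_variable` and the numbers are assumptions made for illustration; they are not part of Envision.

```python
import math

def is_random_variable(p, tol=1e-9):
    """Return True if p (dict: units -> probability) is non-negative with total mass 1."""
    non_negative = all(prob >= 0 for prob in p.values())
    unit_mass = math.isclose(sum(p.values()), 1.0, abs_tol=tol)
    return non_negative and unit_mass

# Hypothetical demand distribution for one item: p(k) for k = 0..3 units.
demand = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}

# Expected demand: sum of k * p(k) over all demand levels.
mean_demand = sum(k * prob for k, prob in demand.items())
```

A forecast of this kind carries more information than a single number: besides the mean, it gives the probability of every demand level.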

The full `forecast.demand` syntax includes many arguments; however, only four of them are mandatory:

- `present`: a scalar date value
- `demandDate`: a date vector with an *item* affinity
- `demandValue`: a number vector with an *item* affinity
- `horizon`: a distribution vector

The `present` value is the date intended as the first day to be forecast, following the assumption that data is complete up to the day before. Some businesses may be closed on Sundays, for example, and if the most recent date found in the dataset is a Saturday, there is an ambiguity as to whether the forecast should start on Sunday or Monday. In the illustrative syntax above, we use `(max(Orders.Date) by 1) + 1`, assuming that orders are observed every day, and that the input data is fresh from the day before.
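The ambiguity described above can be illustrated outside Envision with a short Python sketch. The order dates, the helper `next_open_day` and the choice of Sunday as the closed day are hypothetical; which reading is appropriate depends on the business.

```python
from datetime import date, timedelta

# Hypothetical order dates; the most recent observation is Saturday 2023-07-01.
order_dates = [date(2023, 6, 29), date(2023, 6, 30), date(2023, 7, 1)]

# Default reading: the forecast starts the day after the last observed date.
present = max(order_dates) + timedelta(days=1)

def next_open_day(day, closed_weekdays=(6,)):
    """Skip forward past closed weekdays (Monday=0 ... Sunday=6)."""
    while day.weekday() in closed_weekdays:
        day += timedelta(days=1)
    return day

# Alternative reading for a business closed on Sundays: start on the Monday.
present_if_closed_on_sundays = next_open_day(present)
```

With the default reading the forecast starts on Sunday July 2nd; with the alternative reading it starts on Monday July 3rd.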

The `demandDate` and `demandValue` are expected to belong to the same table, which exhibits an item affinity, that is `[Id, *]` in Envision terminology. The dates indicate when the demand was observed in the past. The values represent the magnitude of the demand, which is typically counted in *units* or *eaches*. Fractional demand values are not supported. This table contains the demand history being forecast by the forecasting engine. Ideally, the history should be as long as possible, although in practice there are limited benefits to exceeding 5 years’ worth of demand. The forecasting engine accommodates short and long demand histories alike; when the history is long, older data points simply fade into statistical irrelevance.

The `horizon` represents the probabilistic lead time to be used when forecasting the demand. While the lead time is treated as an input when computing an integrated demand forecast, the lead time is typically also a forecast in itself. The forecasting engine offers the possibility to forecast lead times. The lead time forecast is decoupled from the demand forecast itself, because this offers the possibility to perform ad-hoc adjustments on the lead time distributions before feeding them into the forecasting engine.

Beyond these mandatory arguments, the accuracy of the forecasts can be greatly improved by providing more data to the forecasting engine. The following sections explain this in more detail.

## Formal definition

In this section, we briefly detail the formal definition of the statistical operation performed by the forecasting engine when computing an integrated demand forecast.

Let $y(t)$ be the demand function and $t$ the time. Let the **integrated demand** $D$, associated with the random variable $\Lambda$ representing the lead times, be defined as follows:

$$ D : (y,\Lambda,t_0) \to \int_0^{\infty} \mathbf{P}[\Lambda=\lambda] \left( \int_{t_0}^{t_0+\lambda} y(t) \, dt \right) d\lambda $$

where $\mathbf{P}[\Lambda=\lambda]$ represents the probability for the lead time random variable $\Lambda$ to be equal to $\lambda$. The demand is qualified as *integrated* because it is an *integration over a probabilistic lead time*.

If $t_0$ represents the present date, then the demand is known, because it has been observed, up to the time $t_0$ but unknown afterwards. The purpose of the forecasting engine is to compute $\hat{D}(y, \Lambda)$, a probabilistic estimate of this future demand expressed as a random variable.
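As a rough illustration of the definition, the following Python sketch computes a discrete analogue of $D$: the integrals become sums over days, and the lead time takes a few integer values. The function name `integrated_demand` and all the numbers are assumptions made for the example. The sketch also assumes that $y$ is fully known, whereas the forecasting engine has to estimate the future demand and returns a full distribution rather than a single value.

```python
def integrated_demand(y, lead_time_probs, t0):
    """Discrete analogue of D(y, Lambda, t0).

    y: sequence of daily demand values, indexed by day
    lead_time_probs: dict mapping a lead time in days to its probability
    t0: index of the first day of the window
    """
    total = 0.0
    for lam, prob in lead_time_probs.items():
        # Inner integral: total demand over the window [t0, t0 + lam).
        demand_over_lead_time = sum(y[t0:t0 + lam])
        # Outer integral: weight by the probability of this lead time.
        total += prob * demand_over_lead_time
    return total

daily_demand = [2, 3, 1, 4, 2, 5, 3, 2, 1, 4]   # hypothetical demand per day
lead_time = {2: 0.25, 3: 0.5, 4: 0.25}          # hypothetical P[Lambda = lambda]
result = integrated_demand(daily_demand, lead_time, t0=0)
```

Here the three possible windows hold 5, 6 and 10 units, so the weighted total is $0.25 \times 5 + 0.5 \times 6 + 0.25 \times 10 = 6.75$ units.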

## Categories, hierarchy and label

Categories, hierarchy and plain-text labels play a very similar role from the forecasting engine perspective: they help the forecasting engine cope with sparse historical data.

See Forecasting with categories and a hierarchy.

## New product forecasting

From a forecasting perspective, a *new product* is a product that has not yet been sold. This represents a rather specific forecasting challenge because, by definition, there is no historical data associated with this new product. Our forecasting engine supports new product forecasting through the `demandStartDate` argument. When historical start dates are known, it is advised to provide this information to the forecasting engine as it contributes to improving the forecasting accuracy, for both new and old products.

The `demandStartDate` argument expects a date to be provided for each item. This date is intended to represent the first day when the demand becomes effective for this item. This date is in the past for items that have already been sold, and it remains in the future for items that are yet to be launched.

There are two distinct benefits to providing the `demandStartDate` argument. Obviously, the first benefit consists of forecasting new items. In this case, it is usually also important to specify the `offset` argument. Indeed, if the offset is kept at zero (its default value), then the period covered by the forecast might not overlap the *active* period for the item.

**Example.** Today is July 1st. The forecasting horizon is a Dirac distribution at 7 days, that is, a constant lead time of 7 days. Product A is launched on July 15th, its start date. If the forecast is carried out today, then the forecast distribution for Product A is a Dirac at zero, because the horizon ends prior to the start date of Product A. In order to forecast the first week of demand for Product A, the offset for Product A should be set at 14 days.
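The date arithmetic of this example can be checked with a short Python sketch. It assumes that the offset simply shifts the start of the forecast window by the given number of days, which is the reading suggested by the example; the helper names and the year are hypothetical.

```python
from datetime import date, timedelta

def forecast_window(present, offset_days, horizon_days):
    """Return the half-open window [start, end) covered by the forecast."""
    start = present + timedelta(days=offset_days)
    return start, start + timedelta(days=horizon_days)

def overlaps_active_period(window, start_date):
    """True if at least one day of the window falls on or after the start date."""
    _, end = window
    return end > start_date

present = date(2023, 7, 1)   # the year is arbitrary
launch = date(2023, 7, 15)   # start date of Product A

no_offset = forecast_window(present, offset_days=0, horizon_days=7)
with_offset = forecast_window(present, offset_days=14, horizon_days=7)
```

Without an offset, the window runs from July 1st to July 7th and never reaches the launch date; with an offset of 14 days, it runs from July 15th to July 21st and covers the first week of demand.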

The second benefit of specifying the `demandStartDate` is to increase the forecasting accuracy for *all* items, and not just the items that are yet to be launched. Observing the first unit sold on the start date for a given item is not the same thing as observing the first unit sold six months after its launch. While the former case hints at steady upcoming sales, the latter hints at a very limited demand of only a handful of units per year. The forecasting engine leverages the `demandStartDate` argument to refine the demand forecasts for *all* the items.

## Censored and inflated demand

The intent is to forecast the *demand*. Yet, frequently, historical data only *approximates* the real demand, thus creating distortions (willingly or not). For example, historical data might be represented by historical sales. However, in the case of stock-outs, sales volumes drop while the demand itself might remain steady. Lokad’s forecasting engine is natively designed to take care of these distortions, and this is the very purpose of both the `censoredDemandDate` and `inflatedDemandDate` arguments. Both arguments expect a date vector of item affinity, that is `(Id, Date)` in Envision terminology.

When a date for a given item is marked as *censored* through the `censoredDemandDate` argument, the forecasting engine assumes that the demand is greater than or equal to the observed value. The engine makes no assumptions on *how high* the demand would have been on this particular day, as this value can never be known. Yet, by pinpointing the bias, the engine can roll out entire classes of optimization tailored around this case. In practice, the most common cause of censored demand is *stock-outs*, as observed through sales data that does not capture information on prospects who silently walk away while goods are missing.
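One textbook way to express "the demand is at least the observed value" is a censored likelihood, where $\mathbf{P}[D = k]$ is replaced by $\mathbf{P}[D \geq k]$ on the flagged days. The Python sketch below illustrates this idea with a Poisson demand model. Both the Poisson assumption and the function names are chosen for the example only; nothing here describes how the forecasting engine is implemented internally.

```python
import math

def poisson_pmf(k, rate):
    """P[D = k] for a Poisson demand with the given mean rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def likelihood(observed, rate, censored=False):
    """Likelihood of one daily observation under a Poisson demand model.

    Uncensored: the demand equals the observed value, P[D = observed].
    Censored:   the demand is at least the observed value, P[D >= observed].
    """
    if not censored:
        return poisson_pmf(observed, rate)
    return 1.0 - sum(poisson_pmf(k, rate) for k in range(observed))

# Zero sales observed on a day, for an item with a mean demand of 5 units.
stockout_day = likelihood(0, rate=5.0, censored=True)
regular_day = likelihood(0, rate=5.0, censored=False)
```

On an uncensored day, zero sales would be strong evidence against a mean demand of 5 units; on a day flagged as a stock-out, the same observation is compatible with any demand level and carries no such evidence.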

Similarly, demand can be *inflated*. The `inflatedDemandDate` argument offers the possibility to pinpoint the dates and the items where the real demand should be considered as lower than or equal to the observed demand. Again, the real demand remains unknowable, but pinpointing the bias is already very helpful to the forecasting engine. In practice, demand is inflated when there are temporary non-recurrent market boosts: for example, the exceptional victory of a local sports team in a national championship may boost the sales of local supermarkets for a few days.

The two arguments `inflatedDemandDate` and `censoredDemandDate` can take one or two vectors as input. If two date vectors are provided, then the pairs *(start, end)* are treated as inclusive segments, with the first date being the start of the segment, and the second date being the end of the segment. If only one date vector is provided, then segments are considered to be 1-day long; the dates flag the exact days to be considered as inflated or censored.
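The two input conventions can be summarized by the following Python sketch, which expands one or two date vectors into the set of flagged days. The helper `flagged_days` and the dates are hypothetical and only mirror the rule stated above.

```python
from datetime import date, timedelta

def flagged_days(starts, ends=None):
    """Expand date vectors into the set of flagged days.

    With one vector, each date flags a single day.
    With two vectors, each (start, end) pair is an inclusive segment.
    """
    if ends is None:
        ends = starts
    days = set()
    for start, end in zip(starts, ends):
        current = start
        while current <= end:
            days.add(current)
            current += timedelta(days=1)
    return days

# One vector: two isolated days are flagged.
single = flagged_days([date(2023, 3, 1), date(2023, 3, 5)])

# Two vectors: March 1st to March 3rd inclusive, that is, three days.
segment = flagged_days([date(2023, 3, 1)], [date(2023, 3, 3)])
```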

If demand censorship or inflation is recurrent (per year, per week, etc.), then there is no need to mark the demand as such, as the forecasting engine handles such patterns automatically.

## Forecasting promotions

The forecasting engine offers native support for promotions. Providing data about promotions is optional. However, when promotions data is provided, promotions are expected to be specified both in the past and in the future. At the very least, the argument `promotionDate` can be provided alone. The argument `promotionDate` follows the same usage pattern as `censoredDemandDate`: when a single date vector is provided, promotional periods are considered to be 1-day long; if two date vectors are provided, the first vector represents inclusive start dates, while the second represents inclusive end dates.

The `promotionDiscount` argument is optional, and can be provided in order to help the forecasting engine gain insights about the *intensity* of a given promotion. A number vector is expected for this argument, and the forecasting engine treats this data as *ordinal* values: the greater the discount, the greater the expected promotional impact. In practice, it is the forecasting engine that computes the expected demand uplift based on uplifts observed for past promotions.

The `promotionCategory` argument is also optional, and can be provided as a classification of promotional events. When provided, this argument is leveraged by the forecasting engine to test the affinities between promotional events, and to detect whether events marked within the same category achieve similar demand uplifts. This argument is very similar in spirit to the `category` argument, except that it is applied to promotions instead of being applied to items.

**Caveat lector.** Promotions are notoriously difficult to forecast even with excellent historical data. Lokad’s experience indicates that most companies do not have *high accuracy* promotional data readily available. That being said, such data can be obtained through careful preparation at the later stages of a project. As a rule of thumb, gathering promotional data that is good enough to actually improve a forecast’s accuracy requires a significant amount of effort. Feeding the forecasting engine with approximate promotional data only decreases the resulting accuracy.

When promotions data is provided, the periods relating to promotional activity should typically not be flagged through the argument `inflatedDemandDate`. Flagging a period through both arguments `promotionDate` and `inflatedDemandDate` carries a subtle meaning: it indicates that the promotional uplift has been inflated beyond what would be reasonably expected from a promotion, and the promotion itself would be considered as biased.