# Probabilistic demand forecasting

Probabilistic demand forecasts are essential for inventory optimization. A demand forecast is said to be integrated over the lead time when the forecast value matches the total demand over the lead time. This contrasts with the classic forecasting perspective, where forecasts are periodic (typically per day, per week or per month) and agnostic of lead times. Lokad’s forecasting engine delivers integrated probabilistic demand forecasts that take probabilistic lead times as an input. The engine natively supports the statistical patterns found in business data, such as seasonality, trends and product lifecycles. Distortions of demand, such as stock-outs or promotions, are also taken into account. Lokad’s forecasts assign a probability to every possible level of future demand, unit by unit. In this section, we detail the syntax used for computing Lokad’s integrated demand forecasts.

## General syntax

The forecasting engine has a function dedicated to probabilistic forecasts. The syntax is the following:

```envision
Demand = forecast.demand(
  category: C1, C2, C3, C4
  hierarchy: H1, H2, H3, H4
  label: PlainText
  demandStartDate: LaunchDate
  demandEndDate: EndDate
  // 'Leadtime' stands for the lead time ranvar (mandatory argument)
  horizon: Leadtime
  offset: 0
  present: (max(Orders.Date) by 1) + 1
  demandDate: Orders.Date
  demandValue: Orders.Quantity
  // 'SO' for stock-outs
  censoredDemandDate: SO.StartDate, SO.EndDate
  // 'Promos' for promotions
  promotionDate: Promos.StartDate, Promos.EndDate
  promotionDiscount: Promos.Discount
  promotionCategory: Promos.Type)
```

Unlike regular functions, call functions take named arguments instead of positional arguments. Named arguments are better suited to complex functions because they make the source code much more readable, at the cost of some extra verbosity. These arguments otherwise behave just like regular function arguments, so any Envision expression can be passed as a value.

The function returns a vector Demand that is of type distribution. Distributions are an advanced data type that represents functions $p: \mathbb{Z} \to \mathbb{R}$. More specifically, the forecasting engine returns ranvars, that is, distributions that are non-negative and have a mass equal to 1 (see also Algebra of ranvars). In the present case, $p(k)$ represents the probability associated with a demand of $k$ units. Each item - in the Envision sense - becomes associated with its own demand ranvar.
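To make the definition of a ranvar concrete, here is a minimal Python sketch (not Envision, and not how Lokad stores distributions). It represents a distribution as a mapping from integers to weights, with every missing integer assumed to carry a weight of zero, and checks the two properties stated above. The function name `is_ranvar` and the sample values are illustrative assumptions.

```python
from math import isclose

def is_ranvar(p: dict[int, float], tol: float = 1e-9) -> bool:
    """Return True if p is non-negative everywhere and has a total mass of 1."""
    non_negative = all(weight >= 0 for weight in p.values())
    unit_mass = isclose(sum(p.values()), 1.0, abs_tol=tol)
    return non_negative and unit_mass

# Hypothetical demand ranvar: demand[k] is the probability of a demand of k units.
demand = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}

# A distribution that is not a ranvar: its total mass is 2 instead of 1.
not_a_ranvar = {0: 1.0, 1: 1.0}

print(is_ranvar(demand), is_ranvar(not_a_ranvar))
```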

The full forecast.demand syntax includes many arguments; however, only four of them are mandatory:

• present: a scalar date value
• demandDate: a date vector with an item affinity
• demandValue: a number vector with an item affinity
• horizon: a ranvar

The present value is the date intended as the first day to be forecast, following the assumption that data is complete up to the day before. Some businesses may be closed on Sundays, for example, and if the most recent date found in the dataset is a Saturday, there is an ambiguity as to whether the forecast should start on Sunday or Monday. In the illustrative syntax above, we use (max(Orders.Date) by 1) + 1, assuming that orders are observed every day, and that the input data is fresh from the day before.
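The rule "latest observed date plus one day" can be sketched in Python as follows. This is only an illustration of the date arithmetic, with made-up order dates; the helper name `present_from_orders` is hypothetical and the snippet is not Envision code.

```python
from datetime import date, timedelta

def present_from_orders(order_dates: list[date]) -> date:
    """First day to forecast: the day after the latest observed order date."""
    return max(order_dates) + timedelta(days=1)

# Made-up order history whose most recent date, 2023-07-01, is a Saturday.
orders = [date(2023, 6, 29), date(2023, 6, 30), date(2023, 7, 1)]
present = present_from_orders(orders)

print(present, present.strftime("%A"))
```

With this data, the computed present falls on a Sunday. A business that is closed on Sundays would have to decide whether to keep that date or move it to the following Monday, which is exactly the ambiguity described above.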

The demandDate and demandValue are expected to belong to the same table, which exhibits an item affinity, that is [Id, *] in Envision terminology. The dates indicate when the demand was observed in the past. The values represent the scale of the demand, which is typically counted in units or eaches. Fractional demand values are not supported. This table contains the demand history being forecast by the forecasting engine. Ideally, the history should be as long as possible, although in practice there are limited benefits to exceeding 5 years’ worth of demand. The forecasting engine accommodates short and long demand histories alike: when the history is long, older data points simply fade into statistical irrelevance.

The horizon represents the probabilistic lead time to be used when forecasting the demand. While the lead time is treated as an input when computing an integrated demand forecast, the lead time is typically also a forecast in itself. The forecasting engine offers the possibility to forecast lead times. The lead time forecast is decoupled from the demand forecast because this makes it possible to perform ad-hoc adjustments on the lead time ranvar before feeding it into the forecasting engine.

Beyond these mandatory arguments, the accuracy of the forecasts can be greatly improved by providing more data to the forecasting engine. The following sections explain this in more detail.

## Formal definition

In this section, we briefly detail the formal definition of the statistical operation performed by the forecasting engine when computing an integrated demand forecast.

Let $d(t)$ be the demand on day $t$. Let the integrated demand $D$, associated with the random variable $\Lambda$ representing the lead time, be defined as follows:

$$D : (d,\Lambda,t_1) \mapsto \sum_{j=1}^\infty {\mathbf{P}[\Lambda=j]} \left( \sum_{i=1}^{j} d(t_i) \right)$$

where $\mathbf{P}[\Lambda=j]$ represents the probability for the lead time random variable $\Lambda$ to be equal to $j$ days, and $t_i$ denotes the $i$-th day of the period starting at $t_1$. The demand is qualified as integrated because it is summed over a probabilistic lead time.
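The formula can be evaluated directly once the daily demand values and the lead time probabilities are known. The Python sketch below does so for made-up numbers; it illustrates the definition of $D$ only, not the estimate $\hat{D}$ produced by the forecasting engine. The function name `integrated_demand` and the data are assumptions.

```python
def integrated_demand(daily_demand: list[float], lead_time: dict[int, float]) -> float:
    """Evaluate sum_j P[Lambda=j] * sum_{i=1..j} d(t_i).

    daily_demand[i - 1] holds d(t_i); lead_time maps j (in days) to P[Lambda=j].
    """
    if max(lead_time) > len(daily_demand):
        raise ValueError("daily_demand must cover the longest possible lead time")
    total = 0.0
    for j, probability in lead_time.items():
        # Demand cumulated over the first j days, weighted by P[Lambda=j].
        total += probability * sum(daily_demand[:j])
    return total

d = [2, 0, 3, 1, 4]                      # d(t_1) .. d(t_5), made-up values
lead_time = {2: 0.25, 3: 0.5, 4: 0.25}   # hypothetical lead time ranvar
result = integrated_demand(d, lead_time)

# A Dirac lead time at 3 days reduces to a plain sum over 3 days.
constant_case = integrated_demand(d, {3: 1.0})

print(result, constant_case)
```

Here the cumulated demand is 2, 5 and 6 units over 2, 3 and 4 days respectively, so the integrated demand is $0.25 \times 2 + 0.5 \times 5 + 0.25 \times 6 = 4.5$.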

If $t_0$ represents the present date, then the demand is known - because observed - until the day $t_0$ but unknown afterwards. The purpose of the forecasting engine is to compute $\hat{D}(d, \Lambda,t_1)$, a probabilistic estimate of this future demand $D(d, \Lambda,t_1)$, expressed as a probability distribution (ranvar) of all possible demand scenarios.

## Categories, hierarchy and label

Categories, hierarchy and plain-text labels play a very similar role from the forecasting engine perspective: they help the forecasting engine cope with sparse historical data.

## New product forecasting

From a forecasting perspective, a new product is a product that has not yet been sold. This represents a rather specific forecasting challenge because, by definition, there is no historical data associated with this new product. Our forecasting engine supports new product forecasting through the demandStartDate argument. When historical start dates are known, it is advised to provide this information to the forecasting engine as it contributes to improving the forecasting accuracy, for both new and old products.

The demandStartDate argument expects a date to be provided for each item. This date is intended to represent the first day when the demand becomes effective for this item. This date is in the past for items that have already been sold, and it remains in the future for items that are yet to be launched.

There are two distinct benefits to providing the demandStartDate argument. Obviously, the first benefit consists of forecasting new items. In this case, it is usually also important to specify the offset argument. Indeed, if the offset is kept at zero - its default value - then the period covered by the forecast might not overlap the active period for the item.

Example. Today is July 1st. The forecasting horizon is a Dirac delta ranvar at 7 days, that is, a constant lead time of 7 days. Product A is launched on July 15th, its start date. If the forecast is carried out today, then the forecast probability distribution for Product A is a Dirac delta ranvar at zero, because the horizon ends prior to the start date of Product A. In order to forecast the first week of demand for Product A, the offset for Product A should be set at 14 days.
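The date arithmetic behind this example can be checked with a short Python sketch. It assumes that the forecast covers the days from present + offset up to present + offset + horizon - 1, consistent with the present date being the first day forecast; the year 2024 and the helper name `forecast_window` are arbitrary choices made for illustration.

```python
from datetime import date, timedelta

def forecast_window(present: date, offset: int, horizon: int) -> tuple[date, date]:
    """Inclusive first and last day covered, assuming a constant lead time."""
    first = present + timedelta(days=offset)
    last = first + timedelta(days=horizon - 1)
    return first, last

present = date(2024, 7, 1)   # "today" in the example; the year is arbitrary
launch = date(2024, 7, 15)   # start date of Product A

no_offset = forecast_window(present, offset=0, horizon=7)
with_offset = forecast_window(present, offset=14, horizon=7)

# The window reaches the active period only if it ends on or after the launch.
print(no_offset, no_offset[1] >= launch)
print(with_offset, with_offset[1] >= launch)
```

Without an offset, the window runs from July 1st to July 7th and never reaches the launch date. With an offset of 14 days, it runs from July 15th to July 21st, which is the first week of demand for Product A.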

The second benefit of specifying the demandStartDate is to increase the forecasting accuracy for all items, and not just the items that are yet to be launched. Observing the first unit sold on the start date for a given item is not the same thing as observing the first unit sold six months after its launch. While the former case hints at steady upcoming sales, the latter hints at a very limited demand of only a handful of units per year. The forecasting engine leverages the demandStartDate argument to refine the demand forecasts for all the items.

## Censored and inflated demand

The intent is to forecast the demand. Yet, frequently, historical data only approximates the real demand, thus creating distortions (willingly or not). For example, historical data might be represented by historical sales. However, in the case of stock-outs, sales volumes drop while the demand itself might remain steady. Lokad’s forecasting engine is natively designed to take care of these distortions, and this is the very purpose of both the censoredDemandDate and inflatedDemandDate arguments. Both arguments expect a date vector of item affinity, that is (Id, Date) in Envision terminology.

When a date for a given item is marked as censored through the censoredDemandDate argument, the forecasting engine assumes that the demand is greater than or equal to the observed value. The engine makes no assumption about how high the demand would have been on this particular day, as this value can never be known. Yet, by pinpointing the bias, the engine can roll out entire classes of optimization tailored around this case. In practice, the most common cause of censored demand is stock-outs, as observed through sales data that does not capture information on prospects who silently walk away while goods are missing.
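One textbook way to exploit a "greater than or equal to" observation is a censored likelihood. The Python sketch below illustrates that general statistical idea under a Poisson demand assumption; it is not a description of how Lokad's forecasting engine works internally, and the function names and sales figures are made up.

```python
from math import exp, factorial, log

def poisson_pmf(k: int, rate: float) -> float:
    """Probability of observing exactly k units under a Poisson demand."""
    return exp(-rate) * rate**k / factorial(k)

def log_likelihood(rate: float, observations: list[tuple[int, bool]]) -> float:
    """observations holds (units_sold, censored) pairs, one per day."""
    total = 0.0
    for units, censored in observations:
        if censored:
            # Censored day: we only know that demand >= units.
            tail = 1.0 - sum(poisson_pmf(k, rate) for k in range(units))
            total += log(tail)
        else:
            total += log(poisson_pmf(units, rate))
    return total

def best_rate(observations: list[tuple[int, bool]], candidates: list[float]) -> float:
    """Candidate daily rate with the highest log-likelihood."""
    return max(candidates, key=lambda rate: log_likelihood(rate, observations))

sales = [5, 6, 4, 1, 0, 5]                 # days 4 and 5 are stock-out days
grid = [x / 10 for x in range(10, 101)]    # candidate rates from 1.0 to 10.0

naive_rate = best_rate([(s, False) for s in sales], grid)
flagged = [(5, False), (6, False), (4, False), (1, True), (0, True), (5, False)]
flagged_rate = best_rate(flagged, grid)

print(naive_rate, flagged_rate)
```

When the two stock-out days are treated as ordinary observations, the low sales pull the estimated daily rate down. Once they are flagged as censored, they no longer drag the estimate toward zero, and the estimated rate is higher.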

Similarly, demand can be inflated. The inflatedDemandDate argument offers the possibility to pinpoint the dates and the items where the demand should be considered as lower than or equal to the observed demand. Again, the real demand remains unknowable, but pinpointing the bias is already very helpful to the forecasting engine. In practice, demand is inflated when there are temporary non-recurrent market boosts: for example, the exceptional victory of a local sports team in a national championship may boost the sales of local supermarkets for a few days.

The two arguments inflatedDemandDate and censoredDemandDate can take one or two vectors as input. If two date vectors are provided, then the pairs (start, end) are treated as inclusive segments, with the first date being the start of the segment and the second date being the end of the segment. If only one date vector is provided, then segments are considered to be 1-day long: the dates flag the exact days to be considered as inflated or censored.
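The one-vector and two-vector conventions can be illustrated with a small Python sketch that expands the flags into the set of affected days. The helper `flagged_days` is hypothetical and only mirrors the semantics described above; it is not part of Envision.

```python
from datetime import date, timedelta
from typing import Optional

def flagged_days(starts: list[date], ends: Optional[list[date]] = None) -> set[date]:
    """Expand flags into a set of days.

    With starts only, each date is a 1-day segment. With ends as well,
    each (start, end) pair is an inclusive segment.
    """
    if ends is None:
        ends = starts
    if len(starts) != len(ends):
        raise ValueError("starts and ends must have the same length")
    days: set[date] = set()
    for start, end in zip(starts, ends):
        current = start
        while current <= end:
            days.add(current)
            current += timedelta(days=1)
    return days

# One vector: two isolated days are flagged.
single = flagged_days([date(2024, 3, 1), date(2024, 3, 5)])

# Two vectors: one inclusive segment from March 1st to March 3rd.
segment = flagged_days([date(2024, 3, 1)], [date(2024, 3, 3)])

print(sorted(single))
print(sorted(segment))
```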

If demand censorship or inflation is recurrent (per year, per week, etc.), then there is no need to mark the demand as such, as the forecasting engine handles such patterns automatically.

## Forecasting promotions

The forecasting engine offers native support for promotions. Providing data about promotions is optional. However, when promotions data is provided, promotions are expected to be specified both in the past and in the future. At the very least, the argument promotionDate can be provided alone. The argument promotionDate follows the same usage pattern as censoredDemandDate: when a single date vector is provided, promotional periods are considered to be 1-day long; when two date vectors are provided, the first represents inclusive start dates, while the second represents inclusive end dates.

The promotionDiscount argument is optional, and can be provided in order to help the forecasting engine gain insights about the intensity of a given promotion. A number vector is expected for this argument, and the forecasting engine treats this data as ordinal values: the greater the discount, the greater the expected promotional impact. In practice, it is the forecasting engine that computes the expected demand uplift based on uplifts observed for past promotions.

The promotionCategory argument is also optional, and can be provided as a classification of promotional events. When provided, this argument is leveraged by the forecasting engine to test the affinities between promotional events, and to detect whether events marked within the same category achieve similar demand uplifts. This argument is very similar in spirit to the category argument, except that it is applied to promotions instead of being applied to items.

Caveat lector. Promotions are notoriously difficult to forecast, even with excellent historical data. Lokad’s experience indicates that most companies do not have high-accuracy promotional data readily available. That being said, such data can be obtained through careful preparation at the later stages of a project. As a rule of thumb, gathering promotional data that is good enough to actually improve a forecast’s accuracy requires a significant amount of effort. Feeding the forecasting engine with approximate promotional data only decreases the resulting accuracy.

When promotions data is provided, the periods relating to promotional activity should typically not be flagged through the argument inflatedDemandDate. Flagging a period through both promotionDate and inflatedDemandDate has subtle semantics: it indicates that the promotional uplift has been inflated beyond what would be reasonably expected from a promotion, and that the promotion itself should be considered as biased.