Stock reward function

The stock reward function asks a concrete question: what do we gain, in expected economic terms, by holding one more unit in stock? It turns demand uncertainty into a unit-by-unit return curve, which is exactly what prioritized ordering policies need. The point is not just service level, but the full balance of margin, stock-out penalties, and carrying cost.


Economic variables and the decision focus

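Demand probabilities alone do not drive decisions. Two items with the same forecast can still require different stock because their economics differ. The core variables are:

- M, the gross margin earned on each unit sold.
- S, the stock-out penalty for each unit of demand that cannot be served, expressed as a negative value.
- C, the carrying cost charged per period for each unit left unsold, also expressed as a negative value.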

These three variables encode the trade-off between upside and downside. Without them, any policy is blind to the cost of overstock and the cost of not serving.

The recursive definition

The stock reward is not just about the next period. Unsold units survive into later periods, which means rewards and costs accumulate. A simplified definition is:

$$ R(t, k)= \begin{cases} kM+(y_t-k)S & \text{if } y_t \geq k \\ y_tM+(k-y_t)C + \alpha R^{*}(t+1, k-y_t) & \text{if } y_t < k \end{cases} $$

Here $y_t$ is the demand in period $t$ and $k$ is the number of units in stock at the start of that period. The factor $\alpha$ discounts future rewards, and $R^{*}$ is the same function with $S = 0$, so the current decision is not held responsible for future stock-outs beyond its lead time window.

In practice we estimate $\hat{R}$ by weighting all demand outcomes with their probabilities. Envision implements this estimate with stockrwd.
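To make the recursion concrete, here is a minimal sketch that evaluates the two-case definition above along a single known demand path. It is written in Python rather than Envision so that it runs standalone. The function name `stock_reward`, the `charge_stockout` flag standing in for $R^{*}$, and the choice to return 0 once the path is exhausted are assumptions of this sketch, not part of stockrwd.

```python
def stock_reward(demand, k, M, S, C, alpha, t=0, charge_stockout=True):
    """Path-wise reward R(t, k) for a known demand sequence.

    demand: list of per-period demands y_t (a hypothetical sample path)
    k: units in stock at the start of period t
    charge_stockout=False gives R*, i.e. the same function with S = 0.
    """
    if t >= len(demand):
        # Assumption of this sketch: nothing happens after the known horizon.
        return 0.0
    y = demand[t]
    s = S if charge_stockout else 0.0
    if y >= k:
        # All k units sell; the unserved demand (y - k) pays the penalty.
        return k * M + (y - k) * s
    # y units sell; the k - y leftovers pay carrying cost now and are
    # rewarded later through the discounted R* term.
    future = stock_reward(demand, k - y, M, S, C, alpha, t + 1,
                          charge_stockout=False)
    return y * M + (k - y) * C + alpha * future


# Four units against demands of 2, 1 and 5 over three periods.
print(stock_reward([2, 1, 5], 4, M=12, S=-8, C=-1, alpha=0.9))
# Four units against a single demand of 6: 4 * 12 + 2 * (-8) = 32.
print(stock_reward([6], 4, M=12, S=-8, C=-1, alpha=0.9))
```

In the first call, two units sell in the first period, one in the second and the last one in the third, so the value works out to 22 + 0.9 × (11 + 0.9 × 12) = 41.62. Averaging such path values over the demand distribution is what the estimate does.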

Marginal reward components

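stockrwd returns a zedfunc over stock positions. It is linear in M, S, and C, so it is exposed as three components, stockrwd.m, stockrwd.s, and stockrwd.c, each of which is scaled by its economic variable and then summed. In the script below, AM and AC are the discount factors passed to the margin and carrying cost components.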

table Params = with
  [| as DemandMean, as M, as S, as C, as AM, as AC |]
  [| 4, 12, -8, -1, 0.3, 0.98 |]

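// Demand over the period, modeled as a Poisson distribution.
Params.Demand = poisson(Params.DemandMean)

// Each component is a zedfunc of the stock position, scaled by its economic variable.
Params.RM = stockrwd.m(Params.Demand, Params.AM) * Params.M
Params.RS = stockrwd.s(Params.Demand) * Params.S
Params.RC = stockrwd.c(Params.Demand, Params.AC) * Params.C
// Total marginal reward per stock position.
Params.R = Params.RM + Params.RS + Params.RC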

table Pos = extend.range(10 into Params)
Pos.RM = valueAt(Params.RM, Pos.N)
Pos.RS = valueAt(Params.RS, Pos.N)
Pos.RC = valueAt(Params.RC, Pos.N)
Pos.R = valueAt(Params.R, Pos.N)

show table "Marginal reward by stock position" with
  Pos.N as "k"
  Pos.RM
  Pos.RS
  Pos.RC
  Pos.R

Example output:

k RM RS RC R
1 11.84529 7.853472 -0.01865078 19.68011
2 11.22306 7.26736 -0.09461749 18.3958
3 9.964832 6.095135 -0.2521049 15.80786
4 8.250221 4.532169 -0.4773048 12.30509
5 6.460741 2.969204 -0.7341862 8.69576
6 4.907142 1.718832 -0.9934734 5.632502
7 3.704208 0.8852501 -1.244492 3.344967
8 2.817865 0.4089179 -1.488946 1.737838
9 2.16367 0.1707516 -1.731039 0.6033821
10 1.668751 0.06489992 -1.97279 -0.2391396

The first units are strongly positive. The marginal reward then declines as the margin and stock-out components fade, while the carrying cost component keeps growing in magnitude. In this example the total turns negative at k = 10.

Backorders shift the demand mass

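Backorders are guaranteed demand. They shift the demand distribution to the right by the backordered quantity. This is why dirac() is used to encode them: it puts all of its probability mass on the backordered quantity, so adding it to the demand moves every outcome by that amount.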

table Params = with
  [| as DemandMean, as Backorder |]
  [| 4, 3 |]

Params.Demand = poisson(Params.DemandMean)
Params.DemandBO = Params.Demand + dirac(Params.Backorder)

Params.Cdf = cdf(Params.Demand)
Params.CdfBO = cdf(Params.DemandBO)

table Pos = extend.range(10 into Params)
Pos.Cdf = valueAt(Params.Cdf, Pos.N)
Pos.CdfBO = valueAt(Params.CdfBO, Pos.N)

show table "Demand shift" with
  Pos.N as "k"
  Pos.Cdf
  Pos.CdfBO

Example output:

k Cdf CdfBO
1 0.09158002 0
2 0.2381081 0
3 0.4334788 0.018316
4 0.6288494 0.09158003
5 0.7851461 0.238108
6 0.8893437 0.4334788
7 0.9488853 0.6288494
8 0.9786561 0.7851461
9 0.9918875 0.8893437
10 0.9971801 0.9488853

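The shift is visible: CdfBO at k equals Cdf at k - 3 (0.4334788 appears at k = 3 in the first column and at k = 6 in the second), so the backorder distribution is the original distribution translated by three units.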
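The same shift can be reproduced in a few lines of Python. This is a sketch of the underlying arithmetic only, with hypothetical helper names, and does not describe how Envision represents ranvars internally: summing a demand with a Dirac mass at b amounts to prepending b zero-probability outcomes to the probability mass function.

```python
import math


def poisson_pmf(mean, n_max):
    """P(D = n) for n = 0..n_max under a Poisson law."""
    return [math.exp(-mean) * mean ** n / math.factorial(n)
            for n in range(n_max + 1)]


def shift_by_backorder(pmf, backorder):
    """Distribution of D + backorder: every probability moves right."""
    return [0.0] * backorder + list(pmf)


def cdf(pmf):
    """Running sum of the probabilities: out[k] = P(D <= k)."""
    total, out = 0.0, []
    for p in pmf:
        total += p
        out.append(total)
    return out


demand = poisson_pmf(4, 10)
cdf_plain = cdf(demand)
cdf_bo = cdf(shift_by_backorder(demand, 3))

for k in range(1, 11):
    print(k, round(cdf_plain[k], 7), round(cdf_bo[k], 7))
```

Small differences in the last printed digits relative to the example output are expected, since this sketch uses double-precision arithmetic.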

Visual intuition and support

Figure: Stock reward analysis.

The stock reward curve is built from the same demand mass. When backorders are present, the demand mass is shifted, and so are the reward components. The support of the calculation must expand to include shifted demand and any ordering constraints (such as MOQs) that push inventory above the high-demand tail. This is why support extension matters even when demand probabilities are already near zero.

Practical limits and contexts

The model is minimal by design, but it is not universal. If purchase history is truncated, stock valuation skews toward newer costs. If stock-outs are replayed from raw histories, small data gaps often create spurious residuals. For serial-tracked items, FIFO-style approximations are too coarse. The stock reward remains a strong default, but it still needs domain-specific values for M, S, and C.
