Stock reward function
The stock reward function asks a concrete question: what do we gain, in expected economic terms, by holding one more unit in stock? It turns demand uncertainty into a unit-by-unit return curve, which is exactly what prioritized ordering policies need. The point is not just service level, but the full balance of margin, stock-out penalties, and carrying cost.
Economic variables and the decision focus
Demand probabilities alone do not drive decisions. Two items with the same forecast can still require different stock because their economics differ. The core variables are:
- M: gross margin for one unit sold.
- S: stock-out penalty for one unit missed (negative).
- C: carrying cost for one unit left in stock (negative).
These three variables encode the trade-off between upside and downside. Without them, any policy is blind to the cost of overstock and the cost of not serving.
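To make the point concrete, here is a minimal Python sketch, not Envision code. It assumes a single-period simplification with no discounting and no carry-over: the k-th unit earns M and avoids the penalty S when demand reaches k, and incurs C otherwise. The function names are hypothetical, and the Poisson mean of 4 is borrowed from the scripts on this page.

```python
import math

def poisson_tail(mean, k):
    """P(D >= k) for Poisson demand with the given mean."""
    pmf, cdf = math.exp(-mean), 0.0
    for y in range(k):
        cdf += pmf              # accumulates P(D <= y)
        pmf *= mean / (y + 1)   # moves on to P(D = y + 1)
    return 1.0 - cdf

def marginal_reward_one_period(mean, k, M, S, C):
    """Expected reward of the k-th unit over one period (simplifying assumption).

    S and C are negative, following the sign convention of this page.
    """
    served = poisson_tail(mean, k)
    return served * (M - S) + (1.0 - served) * C

def last_profitable_unit(mean, M, S, C, k_max=100):
    """Largest k whose marginal reward is still positive."""
    k = 0
    while k < k_max and marginal_reward_one_period(mean, k + 1, M, S, C) > 0:
        k += 1
    return k

# Same forecast, different economics.
high_margin = last_profitable_unit(4, M=12, S=-8, C=-1)
low_margin = last_profitable_unit(4, M=2, S=0, C=-1)
```

With the same Poisson(4) forecast, the high-margin item stays profitable up to 8 units while the low-margin item stops at 5, which is the sense in which probabilities alone do not settle the decision.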
The recursive definition
The stock reward is not just about the next period. Unsold units survive into later periods, which means rewards and costs accumulate. A simplified definition is:
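One way to write it, offered here as a reconstruction rather than a quotation: let $R(k, y)$ be the reward of holding $k$ units when the demand over the period turns out to be $y$ (this notation is an assumption). The form below is consistent with the description that follows and with the first rows of the example output in the next section.

$$
R(k, y) =
\begin{cases}
yM + (k - y)C + \alpha R^*(k - y) & \text{if } y < k \\
kM & \text{if } y = k \\
kM + (y - k)S & \text{if } y > k
\end{cases}
$$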
Here α is the discount factor applied to future rewards, and R* is the same
function with S = 0, so the current decision is not held responsible for future
stock-outs beyond its lead time window.
In practice we estimate $\hat{R}$ by weighting all demand outcomes with their
probabilities. Envision implements this estimate with stockrwd.
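Spelled out as a sketch, writing $R(k, y)$ for the reward of holding $k$ units when demand turns out to be $y$ and $D$ for the random demand (both notations are assumptions), the estimate and the marginal reward of the $k$-th unit would read:

$$
\hat{R}(k) = \sum_{y \ge 0} \mathbf{P}[D = y] \, R(k, y),
\qquad
\Delta \hat{R}(k) = \hat{R}(k) - \hat{R}(k - 1)
$$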
Marginal reward components
stockrwd returns a zedfunc over stock positions. Because the reward is linear
in M, S, and C, it is computed as three components (stockrwd.m, stockrwd.s,
and stockrwd.c), each scaled by its economic variable and then summed.
// Economic variables and discount factors for one illustrative item.
table Params = with
  [| as DemandMean, as M, as S, as C, as AM, as AC |]
  [| 4, 12, -8, -1, 0.3, 0.98 |]

Params.Demand = poisson(Params.DemandMean)

// One zedfunc per component, scaled by its economic variable.
Params.RM = stockrwd.m(Params.Demand, Params.AM) * Params.M
Params.RS = stockrwd.s(Params.Demand) * Params.S
Params.RC = stockrwd.c(Params.Demand, Params.AC) * Params.C
Params.R = Params.RM + Params.RS + Params.RC

// Evaluate each zedfunc at stock positions 1 to 10.
table Pos = extend.range(10 into Params)
Pos.RM = valueAt(Params.RM, Pos.N)
Pos.RS = valueAt(Params.RS, Pos.N)
Pos.RC = valueAt(Params.RC, Pos.N)
Pos.R = valueAt(Params.R, Pos.N)

show table "Marginal reward by stock position" with
  Pos.N as "k"
  Pos.RM
  Pos.RS
  Pos.RC
  Pos.R
Example output:
| k | RM | RS | RC | R |
|---|---|---|---|---|
| 1 | 11.84529 | 7.853472 | -0.01865078 | 19.68011 |
| 2 | 11.22306 | 7.26736 | -0.09461749 | 18.3958 |
| 3 | 9.964832 | 6.095135 | -0.2521049 | 15.80786 |
| 4 | 8.250221 | 4.532169 | -0.4773048 | 12.30509 |
| 5 | 6.460741 | 2.969204 | -0.7341862 | 8.69576 |
| 6 | 4.907142 | 1.718832 | -0.9934734 | 5.632502 |
| 7 | 3.704208 | 0.8852501 | -1.244492 | 3.344967 |
| 8 | 2.817865 | 0.4089179 | -1.488946 | 1.737838 |
| 9 | 2.16367 | 0.1707516 | -1.731039 | 0.6033821 |
| 10 | 1.668751 | 0.06489992 | -1.97279 | -0.2391396 |
The first units are strongly positive. The marginal reward then declines, because the margin and stock-out components fade toward zero while the carrying cost keeps growing with each extra unit, and it turns negative at k = 10.
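As a cross-check outside Envision, the Python sketch below recomputes the three components. It is not the stockrwd implementation. It assumes that leftover units carry into the next period, where they face the same Poisson demand, with future margin discounted by AM and future carrying cost by AC, and that the stock-out component covers the current period only. All function names are hypothetical. Under these assumptions it reproduces the first two rows of the example output to roughly four decimal places.

```python
import math

def marginal_components(mean, alpha_m, alpha_c, k_max):
    """Per-unit marginal factors (k, m, s, c) for stock positions 1..k_max."""
    # pmf[y] = P(D = y) and tail[j] = P(D >= j) for Poisson demand.
    pmf = [math.exp(-mean)]
    for y in range(1, k_max + 1):
        pmf.append(pmf[-1] * mean / y)
    tail = [1.0]
    for y in range(k_max):
        tail.append(tail[-1] - pmf[y])

    rm = [0.0]  # rm[k]: expected discounted units sold out of k units
    rc = [0.0]  # rc[k]: expected discounted unit-periods carried out of k units
    for k in range(1, k_max + 1):
        sold_now = sum(tail[1:k + 1])                       # E[min(D, k)]
        left_now = sum(pmf[y] * (k - y) for y in range(k))  # E[max(k - D, 0)]
        later_m = sum(pmf[y] * rm[k - y] for y in range(1, k))
        later_c = sum(pmf[y] * rc[k - y] for y in range(1, k))
        # A zero-demand period leaves all k units, so rm[k] and rc[k]
        # appear on both sides of the recursion; solve for them.
        rm.append((sold_now + alpha_m * later_m) / (1.0 - alpha_m * pmf[0]))
        rc.append((left_now + alpha_c * later_c) / (1.0 - alpha_c * pmf[0]))

    # The stock-out factor is negative: multiplied by a negative S,
    # it yields the penalty avoided by holding the k-th unit.
    return [(k, rm[k] - rm[k - 1], -tail[k], rc[k] - rc[k - 1])
            for k in range(1, k_max + 1)]

def reward_table(mean, M, S, C, alpha_m, alpha_c, k_max):
    """Rows of (k, RM, RS, RC, R), mirroring the columns of the table above."""
    rows = []
    for k, m, s, c in marginal_components(mean, alpha_m, alpha_c, k_max):
        rm, rs, rc = m * M, s * S, c * C
        rows.append((k, rm, rs, rc, rm + rs + rc))
    return rows

rows = reward_table(4, 12, -8, -1, 0.3, 0.98, 10)
for row in rows:
    print("%2d  %9.5f  %9.5f  %9.5f  %9.5f" % row)
```

The sign handling is the part worth noticing: RS comes out positive because both the stock-out factor and S are negative, so the term is a penalty avoided rather than a cost.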
Backorders shift the demand mass
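Backorders are guaranteed demand. They shift the demand distribution to the
right by the backordered quantity. This is why dirac() is used to encode them:
dirac(b) puts all its probability mass on b, so adding it to the demand moves
every outcome up by exactly b units.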
table Params = with
  [| as DemandMean, as Backorder |]
  [| 4, 3 |]

Params.Demand = poisson(Params.DemandMean)
Params.DemandBO = Params.Demand + dirac(Params.Backorder)
Params.Cdf = cdf(Params.Demand)
Params.CdfBO = cdf(Params.DemandBO)

table Pos = extend.range(10 into Params)
Pos.Cdf = valueAt(Params.Cdf, Pos.N)
Pos.CdfBO = valueAt(Params.CdfBO, Pos.N)

show table "Demand shift" with
  Pos.N as "k"
  Pos.Cdf
  Pos.CdfBO
Example output:
| k | Cdf | CdfBO |
|---|---|---|
| 1 | 0.09158002 | 0 |
| 2 | 0.2381081 | 0 |
| 3 | 0.4334788 | 0.018316 |
| 4 | 0.6288494 | 0.09158003 |
| 5 | 0.7851461 | 0.238108 |
| 6 | 0.8893437 | 0.4334788 |
| 7 | 0.9488853 | 0.6288494 |
| 8 | 0.9786561 | 0.7851461 |
| 9 | 0.9918875 | 0.8893437 |
| 10 | 0.9971801 | 0.9488853 |
The shift is visible: the backorder distribution is the original distribution translated by three units. For example, CdfBO at k = 6 equals Cdf at k = 3 (0.4334788).
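The same shift can be reproduced outside Envision. The Python sketch below represents distributions as plain lists of probabilities and treats the sum of two independent variables as a convolution. The helper names mirror the Envision functions but are hypothetical stand-ins, and the Poisson distribution is truncated at 40, which is an assumption of this sketch.

```python
import math

def poisson_pmf(mean, n_max):
    """P(D = y) for y = 0..n_max (truncated Poisson, tail mass ignored)."""
    pmf = [math.exp(-mean)]
    for y in range(1, n_max + 1):
        pmf.append(pmf[-1] * mean / y)
    return pmf

def dirac(n):
    """Distribution with all its mass on the single value n."""
    return [0.0] * n + [1.0]

def convolve(p, q):
    """Distribution of the sum of two independent integer variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def cdf(pmf):
    """Cumulative sums: out[k] = P(X <= k)."""
    total, out = 0.0, []
    for p in pmf:
        total += p
        out.append(total)
    return out

demand = poisson_pmf(4, 40)
demand_bo = convolve(demand, dirac(3))  # three backordered units
cdf_plain, cdf_bo = cdf(demand), cdf(demand_bo)

for k in range(1, 11):
    print(k, round(cdf_plain[k], 6), round(cdf_bo[k], 6))
```

Convolving with a point mass only relabels the outcomes, so cdf_bo[k + 3] equals cdf_plain[k] for every k, and the shifted CDF is zero below the backordered quantity.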
Visual intuition and support
The stock reward curve is built from the same demand mass. When backorders are present, the demand mass is shifted, and so are the reward components. The support of the calculation, meaning the range of stock positions over which the reward is evaluated, must expand to include the shifted demand and any ordering constraints, such as minimum order quantities (MOQs), that push inventory above the high-demand tail. This is why support extension matters even when demand probabilities are already near zero.
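A rough way to size the support is sketched below in Python. This is an illustration under stated assumptions, not how Envision sizes its zedfuncs: the demand tail is cut at a chosen quantile level, the backordered quantity is added to it, and the result is compared with the stock position that an MOQ would force. The function names, the parameters, and the cut-off levels are all hypothetical.

```python
import math

def poisson_quantile(mean, level):
    """Smallest k such that P(D <= k) >= level, for Poisson demand."""
    pmf = math.exp(-mean)
    cdf, k = pmf, 0
    while cdf < level:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k

def support_upper_bound(mean, backorder, stock_on_hand, moq, level=0.999):
    """Highest stock position the reward calculation should cover."""
    demand_tail = poisson_quantile(mean, level) + backorder  # shifted demand
    forced_stock = stock_on_hand + moq                       # position after one MOQ
    return max(demand_tail, forced_stock)

# Demand alone would stop at 12, but an MOQ of 20 on top of 5 units
# in stock lands on position 25, so the reward is needed up to there.
bound_without_moq = support_upper_bound(4, 3, 0, 0, level=0.99)
bound_with_moq = support_upper_bound(4, 3, 5, 20, level=0.99)
```

The second case is the one the paragraph warns about: the demand probabilities beyond position 12 are negligible, yet the marginal reward of units 13 to 25 still has to be evaluated to judge whether the forced purchase is worthwhile.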
Practical limits and contexts
The model is minimal by design, but it is not universal. If purchase history is
truncated, stock valuation skews toward newer costs. If stock-outs are replayed
from raw histories, tiny data gaps often create spurious residuals. For
serial-tracked items, FIFO-style approximations are too coarse. The stock reward
remains a strong default, but it still needs domain-specific values for M,
S, and C.