noGrad
noGrad(a: number) 🡒 number, const autodiff pure function
Returns the specified number, but without propagating any gradient through this usage.
autodiff Scalar epochs: 100 with
  params A auto(1, 0) // A starts at '1.0'
  Loss = A - noGrad(A)
  return Loss

show scalar "A" with A // A ends at '0.0'
The loss of this autodiff block is equal to zero, but the gradient of the parameter A is equal to 1.
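In other words, noGrad acts as the identity on values but as a constant for differentiation:

$$\frac{\partial}{\partial A}\big(A - \text{noGrad}(A)\big) = 1 - 0 = 1, \qquad \text{while} \qquad A - \text{noGrad}(A) = 0$$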
Examples
A simple linear regression models the relationship between the dependent variable $y$ and the co-variate $x$ as a straight line:

$$y = slope \cdot x + intercept$$
In the following example, the Mass $y$ (grams) of a chemical is related to the Time $x$ (seconds) for which the chemical reaction has been taking place. Given the data in the table Data, the expected results are slope ~ 12 and intercept ~ 12 (Data source).
To later show the impact of the noGrad operation on the gradient flow, we first fit a line to the data as a control case.
table Data = with
  [| 5 as Time, 40 as Mass |]
  [| 7, 120 |]
  [| 12, 180 |]
  [| 16, 210 |]
  [| 20, 240 |]

epochs = 1000
slope = 0
intercept = 1

autodiff Data epochs: epochs learningRate: 0.01 with
  params slope
  params intercept
  pred = slope * Data.Time + intercept
  return (Data.Mass - pred)^2

Data.Control_Pred = slope * Data.Time + intercept
show scalar "Slope" with slope
show scalar "Intercept" with intercept
show chart "Linear Regression" with
scatter Data.Time
Data.Mass {color: "gray"}
plotxy Data.Time
Data.Control_Pred as "Pred" {color: "blue"}
order by Data.Time
The loss function should decline gracefully to the lowest possible value. It cannot reach zero because a straight line is not the best model for this example; the unmodelled dynamics lead to an irreducible error. The results should be in line with the expected ones.
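As a sanity check, the expected values can also be derived in closed form with ordinary least squares. Below is a minimal sketch reusing the table Data above; the olsSlope and olsIntercept names are illustrative, not part of the original example.

// Closed-form OLS estimates for 'Mass = slope * Time + intercept'.
xMean = avg(Data.Time)
yMean = avg(Data.Mass)
olsSlope = sum((Data.Time - xMean) * (Data.Mass - yMean)) / sum((Data.Time - xMean)^2)
olsIntercept = yMean - olsSlope * xMean

show scalar "OLS Slope" with olsSlope // ~12.2
show scalar "OLS Intercept" with olsIntercept // ~11.5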
Now imagine that we know the intercept should be located around a certain value, ~12 in this case. We can use the noGrad operation to block the gradient once the value of the parameter exceeds the desired value.
slope = 0
intercept = 1

autodiff Data epochs: epochs learningRate: 0.01 with
  params slope
  params intercept
  currIntercept = if intercept < 12 then intercept else noGrad(intercept)
  pred = slope * Data.Time + currIntercept
  return (Data.Mass - pred)^2

Data.NoGrad_Pred = slope * Data.Time + intercept
show scalar "Slope" with slope
show scalar "Intercept" with intercept
show chart "Linear Regression" with
scatter Data.Time
Data.Mass {color: "gray"}
plotxy Data.Time
Data.NoGrad_Pred as "Pred" {color: "red"}
order by Data.Time
The slope remains practically unchanged, while the intercept settles around the desired value.
show chart "Residual" with
scatter Data.Time
Data.Mass {color: "gray"}
plotxy Data.Time
(Data.Mass - Data.Control_Pred) as "Control" {color: "blue"}
(Data.Mass - Data.NoGrad_Pred) as "Pred" {color: "red"}
order by Data.Time
From a residual point of view, no major differences are visible. The loss should still decrease gracefully, settling at a value slightly higher than the control, since constraining the intercept cannot improve the fit. An irreducible error is still present.
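To put a number on that comparison, one could compute the root-mean-square error of both fits; a minimal sketch, assuming the Control_Pred and NoGrad_Pred vectors computed above (the *Rmse names are illustrative):

// RMSE of the control fit vs. the noGrad-constrained fit.
controlRmse = sqrt(avg((Data.Mass - Data.Control_Pred)^2))
noGradRmse = sqrt(avg((Data.Mass - Data.NoGrad_Pred)^2))

show scalar "Control RMSE" with controlRmse
show scalar "NoGrad RMSE" with noGradRmse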
A simple linear regression is generally used as a base model for the trend component of a time series forecast. However, we also need to take seasonal changes into account.
In a basic seasonal regression model, we see the dependent variable $y$ as the interaction between a trend and a season:

$$y = f(trend, season)$$

where $f(trend, season)$ identifies the way in which the two components interact. An additive model has the form

$$y = trend + season$$

while a multiplicative one has

$$y = trend \cdot season$$

We can treat the two cases together using a log-transformation: since $\log(trend \cdot season) = \log(trend) + \log(season)$, the multiplicative case becomes additive on the log scale, and we always deal with an additive model

$$\log(y) = trend + season$$
In the following example, Company XYZ's quarterly revenues for 2012 through 2015 are stored in the table QuarterlyData. The indicator variables $Q1$, $Q2$, $Q3$ have already been introduced; they will come in handy later when defining the model (Data source). Note that a 4th-quarter row has all three indicators at zero, so the 4th quarter serves as the baseline.
As we did for the simple linear regression above, we fit the model to the given dataset as a control case.
table QuarterlyData = with
  [| 2012 as Year, 1 as Quarter, 10.5 as Revenue, 1 as t, 1 as Q1, 0 as Q2, 0 as Q3 |]
  [| 2012, 2,  9.2,  2, 0, 1, 0 |]
  [| 2012, 3, 13.1,  3, 0, 0, 1 |]
  [| 2012, 4, 16.0,  4, 0, 0, 0 |]
  [| 2013, 1, 13.6,  5, 1, 0, 0 |]
  [| 2013, 2, 12.2,  6, 0, 1, 0 |]
  [| 2013, 3, 15.6,  7, 0, 0, 1 |]
  [| 2013, 4, 19.4,  8, 0, 0, 0 |]
  [| 2014, 1, 15.9,  9, 1, 0, 0 |]
  [| 2014, 2, 14.7, 10, 0, 1, 0 |]
  [| 2014, 3, 18.3, 11, 0, 0, 1 |]
  [| 2014, 4, 20.5, 12, 0, 0, 0 |]
  [| 2015, 1, 16.6, 13, 1, 0, 0 |]
  [| 2015, 2, 15.7, 14, 0, 1, 0 |]
  [| 2015, 3, 20.0, 15, 0, 0, 1 |]
  [| 2015, 4, 23.3, 16, 0, 0, 0 |]
table QuarterlyParams = by QuarterlyData.Quarter

slope = 0.5
intercept = 0

autodiff QuarterlyData epochs: epochs learningRate: 0.005 with
  params QuarterlyParams.Q abstract auto(0.5, 1)
  params q1 = QuarterlyParams.Q[1]
  params q2 = QuarterlyParams.Q[2]
  params q3 = QuarterlyParams.Q[3]
  params slope
  params intercept
  trend = slope * QuarterlyData.t + intercept
  season = QuarterlyData.Q1 * q1 + QuarterlyData.Q2 * q2 + QuarterlyData.Q3 * q3
  pred = trend + season
  return (log(QuarterlyData.Revenue) - pred)^2

QuarterlyData.bQ1 = QuarterlyParams.Q * QuarterlyData.Q1
QuarterlyData.bQ2 = QuarterlyParams.Q * QuarterlyData.Q2
QuarterlyData.bQ3 = QuarterlyParams.Q * QuarterlyData.Q3

QuarterlyData.Control_LogPred =
  slope * QuarterlyData.t + intercept
  + QuarterlyData.Q1 * QuarterlyData.bQ1
  + QuarterlyData.Q2 * QuarterlyData.bQ2
  + QuarterlyData.Q3 * QuarterlyData.bQ3
///
/// Recall that by exponentiating the predicted value of log(y), the untransformed value
/// provides an estimate of the median response at the given values of the co-variates.
/// In contrast, standard regression provides an estimate of the mean response at given
/// values of the co-variates.
/// If a prediction of the mean is desired, then an adjustment factor must be applied to
/// the untransformed prediction.
/// The adjustment is accomplished by multiplying the untransformed prediction by
/// exp(s^2/2), where s is the regression standard error from the log-based regression.
///
QuarterlyData.Control_LogAdjustment =
  (log(QuarterlyData.Revenue) - QuarterlyData.Control_LogPred)^2

logAdjustment =
  (
    sum(QuarterlyData.Control_LogAdjustment)
    / ((last(QuarterlyData.t) sort QuarterlyData.t) - 1)
  ) / 2

QuarterlyData.Control_Pred = exp(QuarterlyData.Control_LogPred) * exp(logAdjustment)
show scalar "Slope" with slope
show scalar "Intercept" with intercept
show table "Control Season" with
QuarterlyData.Quarter
QuarterlyData.Revenue
QuarterlyData.t
QuarterlyData.bQ1
QuarterlyData.bQ2
QuarterlyData.bQ3
show table "Quartertly Params" with
QuarterlyParams.Q
show plot "Seasonal Regression" with
QuarterlyData.t
QuarterlyData.Revenue {color: "gray" ; seriesPattern: dotted}
QuarterlyData.Control_Pred as "Pred" {color: "blue"}
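For reference, the mean adjustment applied above can be summarized as

$$\hat{y} = \exp\big(\widehat{\log y}\big) \cdot \exp\!\left(\frac{s^2}{2}\right)$$

where $s^2$ is estimated from the mean squared residual of the log-based regression; this is what the logAdjustment variable computes, with the factor $\frac{1}{2}$ already folded in.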
Now imagine that we need to optimize the 2nd quarter factor in one case, or to take the average of the 1st and 3rd quarter factors otherwise. In the following we branch on the epoch: for the first $\frac{1}{4}$ of the epochs we optimize, for the remaining $\frac{3}{4}$ we average; other conditions could have been used.
We keep the gradient flowing through the branching introduced by the if ... then ... else. For $\frac{3}{4}$ of the training we take the average of $Q1$ and $Q3$ as the value for $Q2$, which implies that $Q1$ and $Q3$ will be influenced by 2nd quarter data.
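Concretely, the chain rule routes half of each 2nd-quarter gradient contribution onto $Q1$ and $Q3$:

$$q_2 = \frac{q_1 + q_3}{2} \implies \frac{\partial q_2}{\partial q_1} = \frac{\partial q_2}{\partial q_3} = \frac{1}{2}$$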
slope = 0.5
intercept = 0

autodiff QuarterlyData epochs: epochs learningRate: 0.005 with
  params QuarterlyParams.Q abstract auto(0.5, 1)
  params q1 = QuarterlyParams.Q[1]
  params q2 = QuarterlyParams.Q[2]
  params q3 = QuarterlyParams.Q[3]
  params slope
  params intercept
  trend = slope * QuarterlyData.t + intercept
  q2 = if (epoch < (epochs / 4)) then q2 else ((q1 + q3) / 2)
  season = QuarterlyData.Q1 * q1 + QuarterlyData.Q2 * q2 + QuarterlyData.Q3 * q3
  pred = trend + season
  return (log(QuarterlyData.Revenue) - pred)^2
QuarterlyData.bQ1 = QuarterlyParams.Q * QuarterlyData.Q1
QuarterlyData.bQ2 = QuarterlyParams.Q * QuarterlyData.Q2
QuarterlyData.bQ3 = QuarterlyParams.Q * QuarterlyData.Q3

QuarterlyData.AvgQ1Q3_LogPred =
  slope * QuarterlyData.t + intercept
  + QuarterlyData.Q1 * QuarterlyData.bQ1
  + QuarterlyData.Q2 * QuarterlyData.bQ2
  + QuarterlyData.Q3 * QuarterlyData.bQ3

QuarterlyData.AvgQ1Q3_LogAdjustment =
  (log(QuarterlyData.Revenue) - QuarterlyData.AvgQ1Q3_LogPred)^2

logAdjustment =
  (
    sum(QuarterlyData.AvgQ1Q3_LogAdjustment)
    / ((last(QuarterlyData.t) sort QuarterlyData.t) - 1)
  ) / 2

QuarterlyData.AvgQ1Q3_Pred = exp(QuarterlyData.AvgQ1Q3_LogPred) * exp(logAdjustment)
show scalar "Slope" with slope
show scalar "Intercept" with intercept
show table "Season" with
QuarterlyData.Quarter
QuarterlyData.Revenue
QuarterlyData.t
QuarterlyData.bQ1
QuarterlyData.bQ2
QuarterlyData.bQ3
show table "Quartertly Params" with
QuarterlyParams.Q
show plot "Seasonal Regression" with
QuarterlyData.t
QuarterlyData.Revenue {color: "gray" ; seriesPattern: dotted}
QuarterlyData.AvgQ1Q3_Pred as "Pred" {color: "red"}
Comparing the final parameter results, only the $Q2$ factor is off due to the averaging process. Inspecting the loss function, we should be able to identify a small, almost imperceptible bump around epoch 250 (that is, epochs/4), when we switch from optimization to averaging. Letting the gradient flow, the system was able to recover the right values for the non-constrained model parameters.
We keep the same scenario: for $\frac{1}{4}$ of the epochs we optimize $Q2$, for the remaining $\frac{3}{4}$ we average $Q2 = \frac{Q1 + Q3}{2}$. This time we use the noGrad operation to block the gradient flow through the second branch.
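Wrapped in noGrad, the derivatives seen above vanish, so 2nd-quarter observations no longer contribute to any parameter update during the averaging phase:

$$q_2 = \text{noGrad}\!\left(\frac{q_1 + q_3}{2}\right) \implies \frac{\partial q_2}{\partial q_1} = \frac{\partial q_2}{\partial q_3} = 0$$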
slope = 0.5
intercept = 0

autodiff QuarterlyData epochs: epochs learningRate: 0.005 with
  params QuarterlyParams.Q abstract auto(0.5, 1)
  params q1 = QuarterlyParams.Q[1]
  params q2 = QuarterlyParams.Q[2]
  params q3 = QuarterlyParams.Q[3]
  params slope
  params intercept
  trend = slope * QuarterlyData.t + intercept
  q2 = if (epoch < (epochs / 4)) then q2 else noGrad((q1 + q3) / 2)
  season = QuarterlyData.Q1 * q1 + QuarterlyData.Q2 * q2 + QuarterlyData.Q3 * q3
  pred = trend + season
  return (log(QuarterlyData.Revenue) - pred)^2
QuarterlyData.bQ1 = QuarterlyParams.Q * QuarterlyData.Q1
QuarterlyData.bQ2 = QuarterlyParams.Q * QuarterlyData.Q2
QuarterlyData.bQ3 = QuarterlyParams.Q * QuarterlyData.Q3

QuarterlyData.NoGradAvgQ1Q3_LogPred =
  slope * QuarterlyData.t + intercept
  + QuarterlyData.Q1 * QuarterlyData.bQ1
  + QuarterlyData.Q2 * QuarterlyData.bQ2
  + QuarterlyData.Q3 * QuarterlyData.bQ3

QuarterlyData.NoGradAvgQ1Q3_LogAdjustment =
  (log(QuarterlyData.Revenue) - QuarterlyData.NoGradAvgQ1Q3_LogPred)^2

logAdjustment =
  (
    sum(QuarterlyData.NoGradAvgQ1Q3_LogAdjustment)
    / ((last(QuarterlyData.t) sort QuarterlyData.t) - 1)
  ) / 2

QuarterlyData.NoGradAvgQ1Q3_Pred =
  exp(QuarterlyData.NoGradAvgQ1Q3_LogPred) * exp(logAdjustment)
show scalar "Slope" with slope
show scalar "Intercept" with intercept
show table "Season" with
QuarterlyData.Quarter
QuarterlyData.Revenue
QuarterlyData.t
QuarterlyData.bQ1
QuarterlyData.bQ2
QuarterlyData.bQ3
show table "Quartertly Params" with
QuarterlyParams.Q
show plot "Seasonal Regression" with
QuarterlyData.t
QuarterlyData.Revenue {color: "gray" ; seriesPattern: dotted}
QuarterlyData.NoGradAvgQ1Q3_Pred as "Pred" {color: "green"}
Comparing the final parameter results, we should see that all the parameters are somewhat off. With the gradient halted, the system was not able to recover from the missed optimization of Q2.
show plot "Residuals" with
QuarterlyData.t
(QuarterlyData.Revenue - QuarterlyData.Control_Pred) as "Control" {color: "blue"; seriesPattern: dashed}
(QuarterlyData.Revenue - QuarterlyData.AvgQ1Q3_Pred) as "avg(Q1, Q3)" {color: "red"; seriesPattern: dotted}
(QuarterlyData.Revenue - QuarterlyData.NoGradAvgQ1Q3_Pred) as "noGrad(avg(Q1, Q3))" {color: "green"; seriesPattern: dashed}
Looking at the residuals, what we are describing is even clearer. On top of the systematic error on the 2nd quarter, which appears when we average but let the gradient flow, a systematic error on the 4th quarter is added; the latter is related to the error on the intercept parameter.