noGrad

noGrad(a: number) 🡒 number, const autodiff pure function

Returns the specified number but without propagating gradient through this usage.

autodiff Scalar epochs:100 with
  params A auto(1,0)  // A starts at '1.0'
  Loss = A - noGrad(A)
  return Loss

show scalar "A" with A // A ends at '0.0'

The loss of this autodiff block is always equal to zero, yet the gradient of the parameter A is equal to 1; the optimizer therefore keeps pushing A downward at every epoch, which is why A ends at 0.0.
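To see why, here is a minimal sketch of the differentiation, treating noGrad(A) as a constant whose derivative is zero:

$$\text{Loss} = A - \text{noGrad}(A) = 0, \qquad \frac{\partial\,\text{Loss}}{\partial A} = \frac{\partial A}{\partial A} - \underbrace{\frac{\partial\,\text{noGrad}(A)}{\partial A}}_{=\,0} = 1$$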

Examples

A simple linear regression models the relationship between a dependent variable $y$ and a co-variate $x$ as a straight line, $y = \text{slope} \cdot x + \text{intercept}$. In the following example, the Mass $y$ (grams) of a chemical is related to the Time $x$ (seconds) for which the chemical reaction has been taking place. Given the data in table Data, the expected results are slope ~ 12 and intercept ~ 12 (data source).
To later show the impact of the noGrad operation on the gradient flow, we first fit a line to the data as a control case.
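For reference, the ordinary least-squares estimates that the fit below should approach can be computed directly from the data in table Data (a sanity check, not part of the script):

$$\widehat{\text{slope}} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} \approx 12.2, \qquad \widehat{\text{intercept}} = \bar{y} - \widehat{\text{slope}} \cdot \bar{x} \approx 11.5$$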

table Data = with
  [| 5 as Time, 40 as Mass |]
  [| 7, 120 |]
  [| 12, 180 |]
  [| 16, 210 |]
  [| 20, 240 |]

epochs = 1000

slope = 0
intercept = 1

autodiff Data epochs:epochs learningRate: 0.01 with
  params slope
  params intercept

  pred = slope * Data.Time + intercept

  return (Data.Mass - pred)^2

Data.Control_Pred = slope * Data.Time + intercept

show scalar "Slope" with slope
show scalar "Intercept" with intercept

show chart "Linear Regression" with
  scatter Data.Time
    Data.Mass {color: "gray"}
  plotxy Data.Time
    Data.Control_Pred as "Pred" {color: "blue"}
    order by Data.Time

The loss function should decline gracefully towards its lowest possible value. It cannot reach zero because a straight line is not the best model for the current example, hence the un-modelled dynamics lead to an irreducible error. The results should be in line with the expected ones.

Now imagine that we know the intercept should be located around a certain value, ~12 in this case. We can use the noGrad operation to block the gradient whenever the value of the parameter exceeds the desired value.
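In gradient terms, and using the threshold of 12 from the script below, the intercept only receives a gradient while it sits below the threshold:

$$\frac{\partial\,\text{Loss}}{\partial\,\text{intercept}} = \begin{cases} \dfrac{\partial\,\text{Loss}}{\partial\,\text{currIntercept}} & \text{if intercept} < 12 \\ 0 & \text{otherwise} \end{cases}$$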

slope = 0
intercept = 1

autodiff Data epochs:epochs learningRate: 0.01 with
  params slope
  params intercept

  currIntercept = if intercept < 12 then intercept else noGrad(intercept)
  pred = slope * Data.Time + currIntercept

  return (Data.Mass - pred)^2

Data.NoGrad_Pred = slope * Data.Time + intercept

show scalar "Slope" with slope
show scalar "Intercept" with intercept

show chart "Linear Regression" with
  scatter Data.Time
    Data.Mass {color: "gray"}
  plotxy Data.Time
    Data.NoGrad_Pred as "Pred" {color: "red"}
    order by Data.Time

The slope remains practically unchanged, while the intercept settles around the desired value.

show chart "Residual" with
  scatter Data.Time
    Data.Mass {color: "gray"}
  plotxy Data.Time
    (Data.Mass - Data.Control_Pred) as "Control" {color: "blue"}
    (Data.Mass - Data.NoGrad_Pred) as "Pred" {color: "red"}
    order by Data.Time

From a residual standpoint, no significant change is noticeable. The loss should still decrease gracefully, reaching a value slightly lower than the control's. An irreducible error is still present.

A simple linear regression is generally used as a base model for the trend component of a time series forecast. However, we know well that seasonal changes should also be taken into account.
In a basic seasonal regression model, the dependent variable $y$ is seen as the interaction between trend and season, $y = f(\text{trend}, \text{season})$, where $f(\text{trend}, \text{season})$ identifies the way in which the two components interact.
An additive model has the form $y = \text{trend} + \text{season}$, while a multiplicative one has $y = \text{trend} \cdot \text{season}$. The two cases can be treated together by applying a log-transformation and always dealing with an additive model, $\log(y) = \text{trend} + \text{season}$, as detailed below. In the following example, Company XYZ's quarterly revenues for 2012 through 2015 are stored in the table QuarterlyData. The indicator variables $Q1$, $Q2$, $Q3$ have already been introduced; they will come in handy later when defining the model (data source).
As we did for the simple linear regression above, we fit the model to the given dataset as a control case.
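A short sketch of why the log-transformation covers the multiplicative case (with trend and season re-expressed on the log scale):

$$y = \text{trend} \cdot \text{season} \;\Longrightarrow\; \log(y) = \log(\text{trend}) + \log(\text{season})$$

so fitting an additive model on $\log(y)$ also handles a multiplicative interaction on $y$.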

table QuarterlyData = with
  [| 2012 as Year, 1 as Quarter, 10.5 as Revenue, 1 as t, 1 as Q1, 0 as Q2, 0 as Q3 |]
  [| 2012        , 2           , 9.2            , 2     , 0      , 1      , 0       |]
  [| 2012        , 3           , 13.1           , 3     , 0      , 0      , 1       |]
  [| 2012        , 4           , 16             , 4     , 0      , 0      , 0       |]
  [| 2013        , 1           , 13.6           , 5     , 1      , 0      , 0       |]
  [| 2013        , 2           , 12.2           , 6     , 0      , 1      , 0       |]
  [| 2013        , 3           , 15.6           , 7     , 0      , 0      , 1       |]
  [| 2013        , 4           , 19.4           , 8     , 0      , 0      , 0       |]
  [| 2014        , 1           , 15.9           , 9     , 1      , 0      , 0       |]
  [| 2014        , 2           , 14.7           , 10    , 0      , 1      , 0       |]
  [| 2014        , 3           , 18.3           , 11    , 0      , 0      , 1       |]
  [| 2014        , 4           , 20.5           , 12    , 0      , 0      , 0       |]
  [| 2015        , 1           , 16.6           , 13    , 1      , 0      , 0       |]
  [| 2015        , 2           , 15.7           , 14    , 0      , 1      , 0       |]
  [| 2015        , 3           , 20             , 15    , 0      , 0      , 1       |]
  [| 2015        , 4           , 23.3           , 16    , 0      , 0      , 0       |]

table QuarterlyParams = by QuarterlyData.Quarter

slope = 0.5
intercept = 0

autodiff QuarterlyData epochs:epochs learningRate: 0.005 with
  params QuarterlyParams.Q abstract auto(0.5, 1)
  params q1 = QuarterlyParams.Q[QuarterlyData.Quarter]
  params q2 = QuarterlyParams.Q[QuarterlyData.Quarter]
  params q3 = QuarterlyParams.Q[QuarterlyData.Quarter]
  params slope
  params intercept

  trend = slope * QuarterlyData.t + intercept
  season = QuarterlyData.Q1 * q1 + QuarterlyData.Q2 * q2 + QuarterlyData.Q3 * q3
  pred = trend + season

  return (log(QuarterlyData.Revenue) - pred)^2

QuarterlyData.bQ1 = QuarterlyParams.Q * QuarterlyData.Q1
QuarterlyData.bQ2 = QuarterlyParams.Q * QuarterlyData.Q2
QuarterlyData.bQ3 = QuarterlyParams.Q * QuarterlyData.Q3
QuarterlyData.Control_LogPred =
    slope * QuarterlyData.t + intercept
    + QuarterlyData.Q1 * QuarterlyData.bQ1
    + QuarterlyData.Q2 * QuarterlyData.bQ2
    + QuarterlyData.Q3 * QuarterlyData.bQ3

///
/// Recall that by exponentiating the predicted value of log(y), the untransformed value
/// provides an estimate of the median response at the given values of the co-variates.
/// In contrast, standard regression provides an estimate of the mean response at given
/// values of the co-variates.
/// If a prediction of the mean is desired, then an adjustment factor must be applied to
/// the untransformed prediction.
/// The adjustment is accomplished by multiplying the untransformed prediction by
/// exp(s^2/2), where s is the regression standard error from the log-based regression.
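/// (If log(y) is normally distributed with mean mu and standard deviation s, then exp(mu)
/// is the median of y while exp(mu + s^2/2) is its mean, hence the exp(s^2/2) factor.)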
///

QuarterlyData.Control_LogAdjustment =
    (log(QuarterlyData.Revenue) - QuarterlyData.Control_LogPred)^2
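// s^2/2, where s^2 = SSR / (n - 1); 'last(QuarterlyData.t) sort QuarterlyData.t' yields n = 16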
logAdjustment =
    (
      sum(QuarterlyData.Control_LogAdjustment)
      / ((last(QuarterlyData.t) sort QuarterlyData.t) - 1)
    ) / 2
QuarterlyData.Control_Pred = exp(QuarterlyData.Control_LogPred) * exp(logAdjustment)

show scalar "Slope" with slope
show scalar "Intercept" with intercept

show table "Control Season" with
  QuarterlyData.Quarter
  QuarterlyData.Revenue
  QuarterlyData.t
  QuarterlyData.bQ1
  QuarterlyData.bQ2
  QuarterlyData.bQ3

show table "Quartertly Params" with
  QuarterlyParams.Q

show plot "Seasonal Regression" with
  QuarterlyData.t
  QuarterlyData.Revenue {color: "gray" ; seriesPattern: dotted}
  QuarterlyData.Control_Pred as "Pred" {color: "blue"}

Now imagine that we need to either optimize the 2nd quarter factor, or use the average of the 1st and 3rd quarter factors instead. In the following we discriminate based on the epoch (for the first $\frac{1}{4}$ of the epochs we optimize, for the remaining $\frac{3}{4}$ we average), but other conditions could have been used.
We first keep the gradient flowing through the branching introduced by the if ... then ... else. For $\frac{3}{4}$ of the training, we take the average of $Q1$ and $Q3$ as the value for $Q2$; this implies that $Q1$ and $Q3$ will be influenced by the 2nd quarter data.
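Concretely, when the else branch is active, the chain rule redistributes onto $Q1$ and $Q3$ the gradient that would have reached $Q2$ (a sketch, with $q_2 = \frac{q_1 + q_3}{2}$):

$$\frac{\partial q_2}{\partial q_1} = \frac{\partial q_2}{\partial q_3} = \frac{1}{2}$$

so $q_1$ and $q_3$ each receive half of the gradient flowing into $q_2$.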

slope = 0.5
intercept = 0

autodiff QuarterlyData epochs:epochs learningRate: 0.005 with
  params QuarterlyParams.Q abstract auto(0.5, 1)
  params q1 = QuarterlyParams.Q[QuarterlyData.Quarter]
  params q2 = QuarterlyParams.Q[QuarterlyData.Quarter]
  params q3 = QuarterlyParams.Q[QuarterlyData.Quarter]
  params slope
  params intercept

  trend = slope * QuarterlyData.t + intercept
  q2 = if (epoch < (epochs/4)) then q2 else ((q1 + q3) / 2)
  season = QuarterlyData.Q1 * q1 + QuarterlyData.Q2 * q2 + QuarterlyData.Q3 * q3
  pred = trend + season

  return (log(QuarterlyData.Revenue) - pred)^2

QuarterlyData.bQ1 = QuarterlyParams.Q * QuarterlyData.Q1
QuarterlyData.bQ2 = QuarterlyParams.Q * QuarterlyData.Q2
QuarterlyData.bQ3 = QuarterlyParams.Q * QuarterlyData.Q3
QuarterlyData.AvgQ1Q3_LogPred =
    slope * QuarterlyData.t + intercept
    + QuarterlyData.Q1 * QuarterlyData.bQ1
    + QuarterlyData.Q2 * QuarterlyData.bQ2
    + QuarterlyData.Q3 * QuarterlyData.bQ3
QuarterlyData.AvgQ1Q3_LogAdjustment =
    (log(QuarterlyData.Revenue) - QuarterlyData.AvgQ1Q3_LogPred)^2
logAdjustment =
    (
      sum(QuarterlyData.AvgQ1Q3_LogAdjustment)
      / ((last(QuarterlyData.t) sort QuarterlyData.t) - 1)
    ) / 2
QuarterlyData.AvgQ1Q3_Pred = exp(QuarterlyData.AvgQ1Q3_LogPred) * exp(logAdjustment)

show scalar "Slope" with slope
show scalar "Intercept" with intercept

show table "Season" with
  QuarterlyData.Quarter
  QuarterlyData.Revenue
  QuarterlyData.t
  QuarterlyData.bQ1
  QuarterlyData.bQ2
  QuarterlyData.bQ3

show table "Quartertly Params" with
  QuarterlyParams.Q

show plot "Seasonal Regression" with
  QuarterlyData.t
  QuarterlyData.Revenue {color: "gray" ; seriesPattern: dotted}
  QuarterlyData.AvgQ1Q3_Pred as "Pred" {color: "red"}

Comparing the final parameter results, only the $Q2$ factor is off, due to the averaging process. Inspecting the loss function, we should be able to identify a small, almost imperceptible bump around epoch 250 (one quarter of the training), when we switch from optimization to averaging. By letting the gradient flow, the system was able to recover the right values for the non-constrained model parameters.

We keep the same scenario: for $\frac{1}{4}$ of the epochs we optimize $Q2$, for the remaining $\frac{3}{4}$ we average, $Q2 = \frac{Q1 + Q3}{2}$. This time we use noGrad to block the gradient flow through the second branch.
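With noGrad wrapped around the average, the derivative of the second branch with respect to every parameter is zero, so during the averaging phase the 2nd quarter observations no longer contribute any gradient through the season term:

$$\frac{\partial}{\partial q_1}\,\text{noGrad}\!\left(\frac{q_1 + q_3}{2}\right) = \frac{\partial}{\partial q_3}\,\text{noGrad}\!\left(\frac{q_1 + q_3}{2}\right) = 0$$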

slope = 0.5
intercept = 0

autodiff QuarterlyData epochs:epochs learningRate: 0.005 with
  params QuarterlyParams.Q abstract auto(0.5, 1)
  params q1 = QuarterlyParams.Q[QuarterlyData.Quarter]
  params q2 = QuarterlyParams.Q[QuarterlyData.Quarter]
  params q3 = QuarterlyParams.Q[QuarterlyData.Quarter]
  params slope
  params intercept

  trend = slope * QuarterlyData.t + intercept
  q2 = if (epoch < (epochs/4)) then q2 else noGrad((q1 + q3) / 2)
  season = QuarterlyData.Q1 * q1 + QuarterlyData.Q2 * q2 + QuarterlyData.Q3 * q3
  pred = trend + season

  return (log(QuarterlyData.Revenue) - pred)^2

QuarterlyData.bQ1 = QuarterlyParams.Q * QuarterlyData.Q1
QuarterlyData.bQ2 = QuarterlyParams.Q * QuarterlyData.Q2
QuarterlyData.bQ3 = QuarterlyParams.Q * QuarterlyData.Q3
QuarterlyData.NoGradAvgQ1Q3_LogPred =
    slope * QuarterlyData.t + intercept
    + QuarterlyData.Q1 * QuarterlyData.bQ1
    + QuarterlyData.Q2 * QuarterlyData.bQ2
    + QuarterlyData.Q3 * QuarterlyData.bQ3
QuarterlyData.NoGradAvgQ1Q3_LogAdjustment =
    (log(QuarterlyData.Revenue) - QuarterlyData.NoGradAvgQ1Q3_LogPred)^2
logAdjustment =
    (
      sum(QuarterlyData.NoGradAvgQ1Q3_LogAdjustment)
      / ((last(QuarterlyData.t) sort QuarterlyData.t) - 1)
    ) / 2
QuarterlyData.NoGradAvgQ1Q3_Pred =
    exp(QuarterlyData.NoGradAvgQ1Q3_LogPred) * exp(logAdjustment)

show scalar "Slope" with slope
show scalar "Intercept" with intercept

show table "Season" with
  QuarterlyData.Quarter
  QuarterlyData.Revenue
  QuarterlyData.t
  QuarterlyData.bQ1
  QuarterlyData.bQ2
  QuarterlyData.bQ3

show table "Quartertly Params" with
  QuarterlyParams.Q

show plot "Seasonal Regression" with
  QuarterlyData.t
  QuarterlyData.Revenue {color: "gray" ; seriesPattern: dotted}
  QuarterlyData.NoGradAvgQ1Q3_Pred as "Pred" {color: "green"}

Comparing the final parameter results, we should see that all the parameters are somewhat off. By halting the gradient, the system was not able to recover from the missed optimization of $Q2$.

show plot "Residuals" with
  QuarterlyData.t
  (QuarterlyData.Revenue - QuarterlyData.Control_Pred) as "Control" {color: "blue"; seriesPattern: dashed}
  (QuarterlyData.Revenue - QuarterlyData.AvgQ1Q3_Pred) as "avg(Q1, Q3)" {color: "red"; seriesPattern: dotted}
  (QuarterlyData.Revenue - QuarterlyData.NoGradAvgQ1Q3_Pred) as "noGrad(avg(Q1, Q3))" {color: "green"; seriesPattern: dashed}

Looking at the residuals, what we have described becomes even clearer. On top of the systematic error on the 2nd quarter (already present when we average but let the gradient flow), a systematic error on the 4th quarter appears; the latter is related to the error on the intercept parameter.
