Forecast stability measurement
When judging the quality of a forecast, one aspect that is generally ignored is its stability. As explained in my last blog, stability measures how much the forecast changes from one forecast cycle to the next. This blog gives a brief overview of an intuitive stability metric and how to apply it.
Note on terminology: "item" refers to a combination of product and location, at any level of aggregation, unless a specific aggregation is indicated. A time series is the combination of an item across multiple consecutive periods.
Stability in forecasting means the forecast for any item in any given period does not change dramatically from one forecasting cycle to the next. Its opposite, instability, measures the amount of change between cycles. The Cycle-over-Cycle Change (COCC) is a straightforward metric for instability, much as WAPE is for error. Its formula is identical to that of WAPE, with two differences:
- Instead of comparing a forecast to an actual, it is compared to a prior forecast.
- It ignores values that do not exist in both forecasts.
$$\mathrm{COCC} = \frac{\sum_{t}\sum_{i} I_{p,c}\,\left|F_{t,i,c} - F_{t,i,p}\right|}{\sum_{t}\sum_{i} I_{p,c}\,F_{t,i,p}}$$

where t indexes the periods measured, i the items (series) measured, c and p denote the current forecasting cycle and any given prior forecasting cycle, respectively, F is the forecast value, and Ip,c is an indicator which is 1 if a value exists in both the current and prior cycles and 0 otherwise.
In essence, it is a sum of absolute changes divided by the sum of the prior forecast, giving an estimate of the relative change. The forecasts compared within the absolute operator are always for the same item and period; what differs between them is the lag. For example, a forecast for item A for March 2022 generated in February will be compared with a forecast for item A for March 2022 generated in January. One is lag 1, the other lag 2. A forecast for March would never be compared to a forecast for another month.
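As a sketch of the definition, a minimal Python implementation of COCC between two cycles might look like this (the function name, item keys, and numbers are illustrative, not from any real system):

```python
# Minimal sketch of Cycle-over-Cycle Change (COCC) between two cycles.
def cocc(current, prior):
    """COCC between two forecast cycles.

    `current` and `prior` map (item, period) keys to forecast values.
    Only keys present in both cycles are compared, which plays the
    role of the indicator I_{p,c} in the formula.
    """
    shared = current.keys() & prior.keys()          # I_{p,c} = 1
    abs_change = sum(abs(current[k] - prior[k]) for k in shared)
    prior_total = sum(prior[k] for k in shared)
    return abs_change / prior_total

# Forecasts for item A made in January (lag 2) and February (lag 1),
# both targeting the same future periods.
jan = {("A", "2022-03"): 100, ("A", "2022-04"): 120}
feb = {("A", "2022-03"): 110, ("A", "2022-04"): 100, ("A", "2022-05"): 90}

result = cocc(feb, jan)  # (10 + 20) / (100 + 120) ≈ 0.136
```

Note that the May forecast, which exists only in the February cycle, is excluded on both sides of the division, exactly as the indicator demands.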
The above gives a percentage change between two forecasting cycles (any two; they do not need to be consecutive). The same logic can be extended to cover many successive cycles by setting p = c-1 and letting c iterate through as many cycles as desired. Each of those iterations can provide its own percentage to be tabulated, or those percentages can be averaged for one overall verdict.
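Extending this to successive cycles with p = c-1 can be sketched as follows (again with made-up numbers; the two-cycle COCC is repeated as a helper):

```python
# COCC between two cycles, as in the formula: shared keys act as I_{p,c}.
def cocc(current, prior):
    shared = current.keys() & prior.keys()
    return sum(abs(current[k] - prior[k]) for k in shared) / sum(prior[k] for k in shared)

def average_cocc(cycles):
    """Per-transition COCC with p = c-1, plus the average across transitions.

    `cycles` is a list of forecast dicts ordered oldest to newest.
    """
    changes = [cocc(cycles[c], cycles[c - 1]) for c in range(1, len(cycles))]
    return changes, sum(changes) / len(changes)

cycle1 = {("A", "2022-03"): 100}
cycle2 = {("A", "2022-03"): 110}
cycle3 = {("A", "2022-03"): 99}

per_transition, overall = average_cocc([cycle1, cycle2, cycle3])
# Each transition changes the forecast by 10% of the prior value.
```

Each entry in `per_transition` could be tabulated on its own, while `overall` serves as the single verdict described above.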
For troubleshooting, the exact percentages can be calculated at any level of aggregation, allowing a drill-down into the detail where changes are gravest. If more than two forecasting cycles are involved, the averaging needs to be performed separately at each such level of aggregation for the values to be meaningful.
Note: extremely high values are expected for intermittent demand at the detail level, so drilling down will only be helpful up to a certain depth. At the most granular levels, only the numerator may need to be considered to assess impact fairly. This is similar to evaluating errors, where a WAPE at some level of aggregation may need to be replaced by MAE in the detail for meaningful insights.
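A toy illustration of why the denominator misleads for intermittent series, and why reporting only the numerator can be fairer at the most granular level (all values invented):

```python
# Intermittent demand: weekly forecasts for one item that flip between 0 and 1.
prior   = {("A", "w1"): 0, ("A", "w2"): 1, ("A", "w3"): 0}
current = {("A", "w1"): 1, ("A", "w2"): 0, ("A", "w3"): 0}

shared = current.keys() & prior.keys()
abs_change_total = sum(abs(current[k] - prior[k]) for k in shared)
prior_total = sum(prior[k] for k in shared)

# The relative COCC is 2 / 1 = 200%, dominated by the tiny denominator.
# The numerator alone (total absolute change of 2 units) is the fairer
# impact measure here, analogous to using MAE instead of WAPE in detail.
```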
How to apply
Like a MAPE or WAPE, COCC will be sensitive to the level of aggregation where the absolute operator is applied. Rather than a problem, this is an opportunity. With the same formula, you could measure the total change over a whole year, the average weekly change within a year, the average monthly change within a year, or the change between two specific cycles.
- To compare impact per period across two cycles, determine the appropriate level of item aggregation, keep time in the granularity of the forecast, and apply the formula to that period only. The first summation (over t) disappears.
- To compare average change per period across multiple periods, include the absolute values for all the applicable periods and sum over those periods. For an average monthly change, use monthly granularity; for an average weekly change, weekly granularity. You can do this over any horizon, such as a typical lead time or a whole year.
- For a total change across a whole year, the period is sized to a year (or any 12-month or 52-week span), and the absolute value is taken by item for that single period. Here, a slight deviation from the formula is also possible: the periods included in the two year totals need not be the same. For example, you could compare the total for Jan-Dec to the total for Feb-Jan. This is not perfect but can provide quick insights.
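The difference between an average per-period change and a total change over the same span can be sketched like this (illustrative numbers; note how offsetting shifts cancel in the total but not in the per-period average):

```python
# Two cycles of monthly forecasts for one item; the revision shifts volume
# from February into January without changing the overall total.
prior   = {("A", "2022-01"): 100, ("A", "2022-02"): 100}
current = {("A", "2022-01"): 130, ("A", "2022-02"): 70}
shared = current.keys() & prior.keys()

# Average monthly change: absolute value taken per (item, month), then summed.
monthly = sum(abs(current[k] - prior[k]) for k in shared) / sum(prior[k] for k in shared)

# Total change: aggregate each cycle to the item-year first, then take |.|.
prior_year = sum(prior[k] for k in shared)
current_year = sum(current[k] for k in shared)
yearly = abs(current_year - prior_year) / prior_year

# monthly is 0.3 (30% average monthly churn); yearly is 0.0 (totals match).
```

The same forecast revision thus looks highly unstable at monthly granularity and perfectly stable at yearly granularity, which is why the level where the absolute operator is applied must match how the forecast is used.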
The level of item aggregation can also be chosen depending on how the forecast is used. If budget or capacity gets reserved three months out by month, brand, and region based on the forecast, change should be measured at that granularity. In this case, we would sum across three months after taking the absolute values by month, brand, and region. For most other purposes, the appropriate item granularity will likely be the same granularity where the accuracy of the forecast is determined.
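With tabular data, this kind of aggregation-specific COCC is naturally expressed with pandas; the column names and figures below are hypothetical:

```python
import pandas as pd

# Hypothetical forecasts at month x brand x region granularity from a prior cycle.
prior = pd.DataFrame({
    "month":    ["2022-03", "2022-04", "2022-05", "2022-03"],
    "brand":    ["X", "X", "X", "Y"],
    "region":   ["EU", "EU", "EU", "EU"],
    "forecast": [100, 120, 80, 50],
})
# Current cycle: same granularity, revised numbers.
current = prior.assign(forecast=[90, 140, 80, 60])

# The inner join keeps only rows present in both cycles (the indicator I_{p,c}).
merged = prior.merge(current, on=["month", "brand", "region"],
                     suffixes=("_prior", "_current"))
merged["abs_change"] = (merged["forecast_current"] - merged["forecast_prior"]).abs()

# Absolute values taken by month/brand/region, then summed across the horizon.
cocc = merged["abs_change"].sum() / merged["forecast_prior"].sum()
```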
The gap between consecutive forecasts may also be considered. For example, if accuracy matters at lags W+2 and W+4, we may want to compare lag W+2 with lag W+4 for the same periods rather than measure change between consecutive weekly cycles. In this case, current forecast c in the formula will be the lag W+2 forecast, while prior forecast p will be the lag W+4 forecast.
Stability is an aspect of forecast quality that is often ignored. But as I hope to have illustrated here, it is remarkably simple to calculate. There is no practical reason not to measure it. The only thing requiring some thought is at which granularity and across which time horizon to calculate it—precisely the same as you would do to determine forecast accuracy.
Not all change is bad. If information changes, so should the forecast. However, it is surprisingly common for forecasts to change much more dramatically than any change in the underlying data would warrant. This signals that overfitting is likely occurring and contributes to the bullwhip effect in the supply chain. Both are unwelcome, and both can be identified by measuring the COCC of your forecast and then taking corrective action.