
URL: https://towardsdatascience.com/introduction-to-hierarchical-time-series-forecasting-part-i-88a116f2e2/



Introduction to hierarchical time series forecasting – part I

Problem definition and overview of different approaches

Photo by Edvard Alexander Rølvaag on Unsplash

Most articles on time series forecasting focus on a single level of aggregation. The challenge appears when we can drill down into the aggregated data and observe the same series at a more granular level. In such a case, we often find that the forecasts for the lower levels do not add up to the aggregated forecast, that is, they are not coherent. To make sure that is not the case, we can employ an approach called hierarchical time series (HTS) forecasting.

Theoretical introduction

We should start the introduction with the data. When speaking about aggregated and disaggregated time series, we can distinguish two scenarios. It’s easiest to understand them by analyzing an example, so let’s assume we are an online retailer selling different sorts of products in many markets (like Amazon).

The first scenario involves a clear, hierarchical structure of the data, in which the lower levels are uniquely nested within the higher-level groups. The easiest example would be geographical splitting. As the retailer, we can look at the total level of sales over all markets, and then break down the aggregated series by country. If necessary, we can dive deeper into the sales per region (like states in the US, etc.). When our data follows such a structure, we are dealing with hierarchical time series.

An example of hierarchical time series. Image by the author.

The second scenario involves time series in which the levels are crossed, not nested. As a retailer, we can have multiple levels of detail: product category, price range, our own products as opposed to the ones sold by third-party sellers, etc. With such splits, there is no single "correct" way of aggregating. In such a case, we work with grouped time series.

Example of grouped time series. Image by author.
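
To illustrate why grouped series have no single aggregation path, here is a minimal sketch in plain Python. The `sales` records, the two grouping attributes, and the `aggregate` helper are made up for illustration. The same bottom-level numbers can be rolled up by product category or by seller type, and both routes lead to the same grand total.

```python
from collections import defaultdict

# Hypothetical bottom-level sales for one period, described by two crossed attributes.
sales = [
    {"category": "books", "seller": "own", "units": 30},
    {"category": "books", "seller": "third_party", "units": 20},
    {"category": "toys", "seller": "own", "units": 10},
    {"category": "toys", "seller": "third_party", "units": 40},
]


def aggregate(records, key):
    """Sum the units of all records that share the same value of `key`."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["units"]
    return dict(totals)


# Two equally valid ways of aggregating the same data.
by_category = aggregate(sales, "category")
by_seller = aggregate(sales, "seller")

# Both groupings add up to the same grand total.
grand_total = sum(record["units"] for record in sales)
```

Neither `by_category` nor `by_seller` is nested within the other, which is what separates this case from a strict hierarchy.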

Naturally, the hierarchical and grouped time series can mix into an even more complex structure, when we analyze, for example, geographical location and product category jointly.

The entire challenge of hierarchical time series forecasting (this name also includes grouped and mixed cases, just to be clear) is to generate forecasts that are coherent across the entire aggregation structure. By coherent, I mean forecasts that add up in a manner that is consistent with the underlying aggregation structure. For example, the forecasts of all regions should add up to the country-level forecasts, all the country-level forecasts to the next level up, etc. Alternatively, when forecasts are produced independently for each series and turn out to be incoherent, they can be reconciled to make them coherent.
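
One way to make coherence concrete is to encode the aggregation structure as a matrix. The sketch below uses NumPy and a made-up hierarchy of two countries with two regions each. The matrix `S`, the `is_coherent` helper, and all numbers are illustrative assumptions. The helper also assumes that the bottom-level series are stored last in the forecast vector.

```python
import numpy as np

# Hypothetical hierarchy: total -> {US, DE}; US -> {CA, NY}; DE -> {BE, BY}.
# Each row is one series, each column one bottom-level region (CA, NY, BE, BY).
# A 1 means that the region contributes to the series in that row.
S = np.array([
    [1, 1, 1, 1],  # total
    [1, 1, 0, 0],  # US
    [0, 0, 1, 1],  # DE
    [1, 0, 0, 0],  # CA
    [0, 1, 0, 0],  # NY
    [0, 0, 1, 0],  # BE
    [0, 0, 0, 1],  # BY
])


def is_coherent(y, S, tol=1e-8):
    """Check whether every series in y equals the sum of its bottom-level parts.

    Assumes the last S.shape[1] entries of y are the bottom-level series.
    """
    n_bottom = S.shape[1]
    bottom = y[-n_bottom:]
    return bool(np.allclose(y, S @ bottom, atol=tol))


# Forecasts ordered as: total, US, DE, CA, NY, BE, BY.
coherent = np.array([100.0, 60.0, 40.0, 35.0, 25.0, 15.0, 25.0])
incoherent = np.array([110.0, 60.0, 40.0, 35.0, 25.0, 15.0, 25.0])
```

Here `is_coherent(coherent, S)` holds, while `incoherent` fails the check because its total of 110 differs from the sum of the regions (100).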

One more thing that requires clarification is that hierarchical time series forecasting is not a methodology of time series forecasting (such as ARIMA, ETS, or Prophet) per se. Instead, it is a collection of different techniques that make the forecasts coherent across the given hierarchy of individual time series.

Below, we go over the main approaches to hierarchical time series forecasting.

The bottom-up approach

In the bottom-up approach we forecast the most granular level of the hierarchy and then aggregate the forecasts to create the estimates for the higher levels. Coming back to the initial example of an online retailer, we would forecast the sales in each of the regions and then sum those up to create the forecasts for the respective countries. We might sum again to arrive at continent/area level and then ultimately arrive at the grand total.
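
The aggregation step can be sketched in a few lines of plain Python. The region forecasts below are made-up numbers and `bottom_up` is a hypothetical helper. In practice, each value would come from a forecasting model fitted to the corresponding regional series.

```python
# Hypothetical one-step-ahead forecasts for the most granular level,
# keyed by (country, region).
region_forecasts = {
    ("US", "CA"): 35.0,
    ("US", "NY"): 25.0,
    ("DE", "BE"): 15.0,
    ("DE", "BY"): 25.0,
}


def bottom_up(region_forecasts):
    """Sum region forecasts into country forecasts and a grand total."""
    country_forecasts = {}
    for (country, _region), value in region_forecasts.items():
        country_forecasts[country] = country_forecasts.get(country, 0.0) + value
    total_forecast = sum(country_forecasts.values())
    return country_forecasts, total_forecast


country_forecasts, total_forecast = bottom_up(region_forecasts)
```

Because every higher level is a plain sum of the level below it, the resulting forecasts are coherent by construction.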

Advantages:

  • no information is lost due to aggregation, as the forecasts are obtained at the lowest level.

Disadvantages:

  • the relationships between the series (for example, between different regions) are not taken into account,
  • tends to perform poorly on highly aggregated data,
  • computationally intensive (depends on the task and the number of series in the lower levels),
  • more noise in the data at the most granular levels results in worse overall accuracy of the forecast.

The top-down approach

The top-down approach involves forecasting the top level of the hierarchy, and then splitting the forecast into the more granular series. Most commonly, historical proportions are used for determining the split. To give an example, we would forecast the grand total level. Then, looking at past data, we could infer that the US accounts for 50% of the sales, Europe for 40%, etc. Then, we can iterate and break the series down into more granular levels.
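
A minimal sketch of the splitting step in plain Python follows. The history and the total forecast are made-up numbers, and both helpers are hypothetical. The shares are computed as each series' part of the summed historical sales, which is only one of several possible ways to define historical proportions.

```python
def historical_proportions(history):
    """Share of each series in the total of all past observations."""
    sums = {name: sum(values) for name, values in history.items()}
    grand_total = sum(sums.values())
    return {name: series_sum / grand_total for name, series_sum in sums.items()}


def top_down(total_forecast, proportions):
    """Split a top-level forecast using fixed proportions."""
    return {name: total_forecast * share for name, share in proportions.items()}


# Hypothetical past sales per market (three periods each).
history = {
    "US": [50.0, 55.0, 45.0],
    "Europe": [40.0, 42.0, 38.0],
    "Other": [10.0, 8.0, 12.0],
}

proportions = historical_proportions(history)  # US 50%, Europe 40%, Other 10%
market_forecasts = top_down(200.0, proportions)
```

The same two helpers can be applied again within each market to break the forecasts down into countries and regions.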

Advantages:

  • the simplest approach,
  • reliable forecast for the higher level(s) of the hierarchy,
  • only a single forecast is required.

Disadvantages:

  • less accurate forecasts at the lower levels due to loss of information (via historical proportions).

The middle-out approach

The middle-out approach is a combination of the two methods described above and it can only be used for strictly hierarchical time series. In this approach, we select the middle level and forecast it directly. Then, for all the levels above the selected level, we use the bottom-up approach – we sum the levels up the hierarchy. For the levels below the middle one, we use the top-down approach.

As it is a compromise between the two different approaches, the resulting forecasts do not lose that much information, and the computation time does not explode as in the case of the bottom-up approach.
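
Below is a minimal sketch of the idea in plain Python, assuming the country level is the chosen middle level. The country forecasts, the regional shares, and the `middle_out` helper are all made up for illustration.

```python
def middle_out(country_forecasts, region_shares):
    """Aggregate middle-level forecasts upward and split them downward."""
    # Bottom-up part: the grand total is the sum of the country forecasts.
    total = sum(country_forecasts.values())
    # Top-down part: each country forecast is split by its regional shares.
    regions = {
        (country, region): country_forecasts[country] * share
        for country, shares in region_shares.items()
        for region, share in shares.items()
    }
    return total, regions


# Hypothetical forecasts made directly at the middle (country) level.
country_forecasts = {"US": 60.0, "DE": 40.0}

# Hypothetical historical shares of each region within its country.
region_shares = {
    "US": {"CA": 0.6, "NY": 0.4},
    "DE": {"BE": 0.25, "BY": 0.75},
}

total, regions = middle_out(country_forecasts, region_shares)
```

Only the country-level series need a fitted model here. Everything above and below is derived from those forecasts.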

The optimal reconciliation approach

The three approaches described above focus on forecasting the time series on a single level and then using those forecasts to infer the rest of the levels. As opposed to them, in the optimal reconciliation method, we forecast each of the levels using all the information and relationships the given hierarchy can offer.

In this approach, we assume that the base forecasts (for each of the series for all the levels) approximately satisfy the hierarchical structure. This means that the forecasts should be reasonably accurate, so that they do not distort the balance. Then, we use a linear regression model to reconcile the individual forecasts. Effectively, the coherent forecasts are a weighted sum of all the base forecasts from all the levels. To find the weights, we need to solve a system of equations to ensure that the hierarchical relationship between the different levels is preserved.
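
To give a feel for the regression step, here is a minimal sketch with NumPy. It uses the simplest variant, ordinary least squares with equal weight on every base forecast, which is an assumption made for illustration. Other weighting schemes exist. The tiny hierarchy (total = A + B), the base forecasts, and the `reconcile_ols` helper are made up.

```python
import numpy as np

# Summing matrix for a hypothetical hierarchy: total = A + B.
# Rows: total, A, B. Columns: the bottom-level series A and B.
S = np.array([
    [1.0, 1.0],
    [1.0, 0.0],
    [0.0, 1.0],
])


def reconcile_ols(base_forecasts, S):
    """Reconcile base forecasts by regressing them on the summing matrix.

    Solves the normal equations (S'S) beta = S'y for the implied
    bottom-level values, then aggregates them back up with S.
    """
    beta = np.linalg.solve(S.T @ S, S.T @ base_forecasts)
    return S @ beta


# Independent base forecasts for total, A, B. They are incoherent: 55 + 40 != 100.
base = np.array([100.0, 55.0, 40.0])
reconciled = reconcile_ols(base, S)
```

In this example the discrepancy of 5 is spread over all three series: the total moves down and both bottom-level forecasts move up, so that the reconciled values add up exactly. Each reconciled value is a weighted sum of all three base forecasts.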

Advantages:

  • more accurate forecasts,
  • unbiased forecasts at all levels with minimal loss of information,
  • takes the relationships between time series into account,
  • as each forecast is created independently, the approach allows for using a different forecasting method (ARIMA, ETS, Prophet, etc.) at each level. Additionally, different levels can use different feature sets, as some variables might not be available at a given level of granularity.

Disadvantages:

  • the most complex method,
  • can be computationally intensive, as it does not scale well to a large number of series.

There are also other reconciliation approaches available, for example the MinT (Minimum Trace) optimal reconciliation approach, which is covered in Forecasting: principles and practice (see References).

Conclusions

In this article, I provided a brief introduction to hierarchical time series forecasting and described the most popular approaches used for tackling that challenge. An obvious question would be which of the approaches to use. And as you might have guessed, the answer is: it depends.

The first three approaches tend to be biased toward the level they are forecasting, which intuitively makes sense. So when getting an accurate forecast for a particular level is most important and we want to obtain the rest as a by-product, we might want to start with one of the simpler approaches and see if we are satisfied.

Otherwise, we might look into the optimal reconciliation approaches that tend to be quite accurate on all levels of the hierarchy. Ideally, we can just try all the different approaches, while employing some kind of time series cross-validation scheme to assess the performance of each of them and select the one that works best for our problem.
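
As a rough illustration of such a scheme, the sketch below generates expanding-window splits in plain Python. The `expanding_window_splits` helper and its parameters are hypothetical. Each candidate approach would be fitted on the training indices and scored on the test indices of every split, at every level of the hierarchy.

```python
def expanding_window_splits(n_obs, initial_train, horizon, step=1):
    """Yield (train_indices, test_indices) pairs that respect time order.

    The training window always starts at 0 and grows by `step` observations
    per split. The test window covers the next `horizon` observations.
    """
    train_end = initial_train
    while train_end + horizon <= n_obs:
        train_idx = list(range(train_end))
        test_idx = list(range(train_end, train_end + horizon))
        yield train_idx, test_idx
        train_end += step


# Ten observations, start with six for training, forecast two steps ahead.
splits = list(expanding_window_splits(n_obs=10, initial_train=6, horizon=2, step=2))
```

With these settings we get two splits: the first trains on observations 0 to 5 and tests on 6 and 7, the second trains on 0 to 7 and tests on 8 and 9.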

For an example of how to implement hierarchical time series forecasting in Python, please have a look at the second part of the article.

As always, any constructive feedback is welcome. You can reach out to me on Twitter or in the comments.


References

  • Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. OTexts.com/fpp3.
  • Hyndman, R.J., Ahmed, R.A., Athanasopoulos, G., & Shang, H.L. (2011) Optimal combination forecasts for hierarchical time series. Computational Statistics and Data Analysis 55(9), 2579–2589.

Written By

Eryk Lewinson
