Introduction to hierarchical time series forecasting – part II
Example in Python using scikit-hts
In the first part of this article, I provided an introduction to hierarchical time series forecasting, described different types of hierarchical structures, and went over the most popular approaches to forecasting such time series. In the second part, I present an example of how to approach such a task in Python using the scikit-hts library.
Setup
As always, we start with the setup. First, we need to install scikit-hts using pip. Then, we import the following libraries.
Note that scikit-hts is imported simply as hts.
Data
In this article, we use the Australian tourism data set, which was also used in Forecasting: Principles and Practice (you can read my opinion about the book here). The data set is natively available in the R package called tsibble, but you can also download it from Kaggle or my GitHub.
The data set contains the quarterly number of trips to Australia between 1998 and 2016. Furthermore, it comes with a geographical breakdown (per state and region) and the purpose of the trip (business, holiday, visiting, other). So we could create either a hierarchical or grouped time series forecast. For this article, we will focus on the strictly hierarchical one (though scikit-hts can handle the grouped variant as well).
The original data set contains 8 states, while the one we will be working on only contains 7. The difference is Tasmania, which was apparently dropped in the Kaggle data set.
In the following snippet, we carry out some preprocessing steps. Two noteworthy steps are aggregating over the purpose (as we will be working with a strictly hierarchical structure) and renaming the states of Australia to their abbreviations. In the very end, we also create a concatenation of state and region, to create a unique identifier which we will then use for creating the hierarchical structure.
After all the steps, the DataFrame looks as follows:
Using this one-liner we can inspect which regions are within each of the states in our data set.
# inspect all the regions per state
df.groupby("state")["region"].apply(set).to_frame()
The next step is to create a DataFrame, which we can feed to the scikit-hts models. The DataFrame should contain each time series as a column and the rows indicate the observations in a given time period (quarter in this case).
Using the snippet above, we create three DataFrames, one for each level of the hierarchy:
- bottom level – this is simply a pivot that transforms the initial DataFrame in the long format to the wide format,
- middle level – before pivoting the DataFrame, we first sum over the states,
- total level – the highest level, which is the sum of all states.
After preparing the DataFrames, we join them using the index and print the number of unique series in each level of the hierarchy:
Number of time series at the bottom level: 77
Number of time series at the middle level: 7
Adding the grand total level, we arrive at 85 unique time series.
Creating the hierarchy
The estimators we are going to use also require a clear definition of the hierarchical structure. The easiest way to prepare it is to build the hierarchical tree as a dictionary. Each node (identified as the key in the dict) has a list of children as the corresponding value in the dict. Then, each of the children can also be a key in the dict and have children of its own.
The easiest way to understand it is by looking at an example. In the following snippet, we create the hierarchy of our time series.
As you can see, we benefited from creating the unique state-region names, because now we can easily identify the state’s children by checking if the regions’ names start with the state’s abbreviation. Below we show a piece of the hierarchy dictionary.
Alternatively, we can use the HierarchyTree class to represent the underlying structure of our series. We must provide the DataFrame and the dictionary as inputs and then we can directly pass the instantiated object to the estimator (without the individual components of the HierarchyTree). An additional benefit is a slightly nicer representation of the tree’s structure, which you can see below.
Visualizing the data
Using the prepared hierarchy, we can quickly visualize the data. We start with the grand total level. In the image, we can observe the entire history of data available for this exercise.
hierarchy_df["total"].plot(title="Trips - total level");
As the second step, let’s plot all the state-level series. To do so, we can simply fetch all columns which are the children of the total level in the already defined hierarchy.
ax = hierarchy_df[hierarchy['total']].plot(title="Trips - state level")
ax.legend(bbox_to_anchor=(1.0, 1.0));
Lastly, let’s plot the lowest level for the Western Australia state. I chose this one as it has the fewest regions, so the plot is still easy to read.
ax = hierarchy_df[hierarchy['WA']].plot(title="Trips - regions of Western Australia")
ax.legend(bbox_to_anchor=(1.0, 1.0));
Hierarchical time series forecasting
Finally, we can focus on the modeling part. In this article, I just want to highlight the functionalities of scikit-hts. That is why I present simplified examples, in which I use the entire data set for training and then forecast 4 steps (a year) into the future. Naturally, in a real-life scenario we would employ an appropriate time-series cross-validation scheme and try to tune the hyperparameters of the model to obtain the best fit.
The main class of the library is the HTSRegressor. The usage of the library will be familiar to anyone who has used scikit-learn before (initialize the estimator -> fit to data -> predict). The two arguments we should pay attention to are model and revision_method.
model determines the underlying type of model that will be used to forecast the individual time series. Currently, the library supports:
auto_arima– from thepmdarimalibrary,- SARIMAX – from
statsmodels, - Holt-Winters exponential smoothing – also from
statsmodels, - Facebook’s Prophet.
The revision_method argument is responsible for the approach to hierarchical time series forecasting. We can choose from:
BU– the bottom-up approach,AHP– average historical proportions (top-down approach),PHA– proportions of historical averages (top-down approach),FP– the forecasted proportions (top-down approach),OLS– the optimal combination using OLS,WLSS– optimal combination using structurally weighted OLS,WLSV– optimal combination using variance-weighted OLS.
For a more detailed understanding of these methods, I refer you to Forecasting: Principles and Practice.
The bottom-up approach
I tried to start off with the bottom-up approach, however, I kept on encountering weird errors. The same syntax worked for other revision methods, so I do assume there is some kind of a bug in the library.
But we can still look at the syntax. First, I instantiate the estimator— in this case, I am using auto-ARIMA and the bottom-up revision method. Then, I fit the model using the familiar fit method. As arguments, I pass the DataFrame with all the time series and the dictionary containing the hierarchy. Lastly, I obtain 4 out-of-sample predictions using the predict method. The resulting DataFrame contains n + 4 rows, where n is the number of rows in the original DataFrame. We get the fitted values (n) and the 4 predictions on top of those.
The top-down approach
After failing to run the previous example, I moved to the top-down approach using average historical proportions. This time, the code ran successfully.
Below, you can see the fitted values and predictions for each of the three levels – the grand total level, state, and region levels. For brevity, I only show a single example of the lower levels. In the Notebook, you can find a convenience function used for generating the plots for any level of the hierarchy.
As we can see, the models are quite off at the very begging of the time series, but they stabilize quite quickly. Of course, this is quite an unscientific evaluation of the performance, but as mentioned before, we focus purely on obtaining the hierarchical forecasts, and tuning them is beyond of the scope of this article. Another observation is that the lower levels are more erratic, that is, they contain more noise.
While playing around with the library, I noticed a few quirks:
- unlike the
scikit-learnAPI, we cannot usen_jobs=-1to indicate using all the available cores. And when using numbers other than 0 (as in the examples provided by the author of the library), I was getting weird results. - in the available examples, the author suggests assigning an output while calling the
fitmethod. - the results are way off when using Prophet as the underlying model. You can see that in the image below.
The optimal combination using OLS
Lastly, we use the optimal reconciliation using OLS. The code only requires slight adjustments.
We present the plots of the very same levels as for the top-down approach to enable easy comparison.
The difference is mostly visible at the lowest level, where the fit using the OLS approach is much better, even though the series itself is quite noisy. That should not come as a surprise, as the optimal reconciliation approach is known to provide the most accurate forecasts (for more information about its advantages, please see the previous article).
There is also one thing that we should be aware of – the OLS approach created a negative fitted value for the first observation. And as we know, all the values in the series are non-negative. However, given it’s only the fitted value and just for the first observation in the series (when the algorithm has very little information), this should not be an issue.
Conclusions
In this article, I showed how to use scikit-hts for hierarchical time series forecasting in Python. The library offers an API similar to scikit-learn and is quite easy to start playing around with.
However, the library is in the alpha version. After spending quite some time trying to make the example work (and not succeeding for all the revision methods), I must say that there is still a long way to go for the Python library to catch up to the R’s equivalent – fable. Unfortunately, the documentation is also quite far from complete and not that easy to understand (the same goes for the examples in the repo).
Having said that, I would give it some extra thought if it makes sense to use scikit-hts for a serious project. I would either consider creating a custom solution (especially if using the simpler approaches such as the bottom-up one) or maybe move to R for the particular forecast.
You can find the code used for this article on my GitHub. If you managed to overcome the issues I mentioned in the article, I would be very happy to hear how you did it. Also, any constructive feedback is welcome. You can reach out to me on Twitter or in the comments.
Liked the article? Become a Medium member to continue learning by reading without limits. If you use this link to become a member, you will support me at no extra cost to you. Thanks in advance and see you around!
You might also be interested in one of the following articles:
The best book to start learning about time series forecasting
References
- Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. OTexts.com/fpp3.
- data: https://www.kaggle.com/luisblanche/quarterly-tourism-in-australia
- https://scikit-hts.readthedocs.io/en/latest/readme.html
Share This Article
Towards Data Science is a community publication. Submit your insights to reach our global audience and earn through the TDS Author Payment Program.
Write for TDS