Time series data records observations at successive points in time. Analyzing such data helps to recognize patterns, make predictions and draw informative insights. The Box-Jenkins method is a forecasting approach used to model a time series and forecast it over a specified period of time.
In this article we take a dive into the Box-Jenkins method for ARIMA modelling and see how it helps us analyze and forecast time series data.
Let us first review what an ARIMA model is, so that the rest of the process is easier to follow.
ARIMA (Autoregressive Integrated Moving Average) is a time series analysis and forecasting method. An ARIMA model combines three components used in the modelling of time series: autoregression, differencing and moving average. Let's break it down and discuss the different components one by one:
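Autoregressive (AR): the current value of the series is modelled as a linear combination of its own p previous values plus an error term:

$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \varepsilon_t$$

Where:
- $Y_t$ is the value of the time series at time t.
- c is a constant value.
- $\phi_1, \dots, \phi_p$ are the autoregressive coefficients.
- $\varepsilon_t$ is the error at time t.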
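To make the autoregressive recursion concrete, here is a minimal NumPy sketch that simulates an AR(p) series term by term. The function name `simulate_ar`, the coefficient values and the choice of Gaussian errors are illustrative assumptions, not part of the original article.

```python
import numpy as np


def simulate_ar(phi, c=0.0, n=500, sigma=1.0, seed=0):
    """Simulate Y_t = c + phi_1*Y_{t-1} + ... + phi_p*Y_{t-p} + eps_t.

    Hypothetical helper for illustration; errors are assumed Gaussian.
    """
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    eps = rng.normal(0.0, sigma, size=n)
    y = np.zeros(n)
    for t in range(n):
        # The p most recent values, newest first; missing history counts as 0.
        lags = np.array([y[t - k] if t - k >= 0 else 0.0 for k in range(1, p + 1)])
        y[t] = c + phi @ lags + eps[t]
    return y


# An AR(2) example with assumed coefficients phi_1 = 0.6, phi_2 = -0.2.
y = simulate_ar(phi=[0.6, -0.2], c=1.0)
print(y[:5])
```

Setting `sigma=0.0` removes the noise, which makes it easy to check the recursion by hand.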
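Integrated (I): the series is differenced d times to remove trend and make it stationary. A single difference is $Y'_t = Y_t - Y_{t-1}$.

Moving Average (MA): the current value of the series is modelled as a linear combination of the current and q previous error terms:

$$Y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}$$

Where:
- $Y_t$ is the value of the time series at time t.
- c is a constant.
- $\varepsilon_t, \varepsilon_{t-1}, \dots, \varepsilon_{t-q}$ are the noise terms or the error terms.
- $\theta_1, \dots, \theta_q$ are the moving average coefficients.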
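The moving average component can be sketched in the same spirit. The snippet below simulates an MA(q) series as a weighted sum of current and past shocks and then looks at the sample autocorrelation, which for an MA(1) process should be clearly non-zero at lag 1 and close to zero at lag 2. The helper names `simulate_ma` and `sample_autocorr` and the value 0.8 are assumptions made for illustration.

```python
import numpy as np


def simulate_ma(theta, c=0.0, n=500, sigma=1.0, seed=0):
    """Simulate Y_t = c + eps_t + theta_1*eps_{t-1} + ... + theta_q*eps_{t-q}."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    q = len(theta)
    # Draw q extra shocks so that the first observation has a full history.
    eps = rng.normal(0.0, sigma, size=n + q)
    # Weight 1 on eps_t and theta_k on eps_{t-k}.
    weights = np.concatenate(([1.0], theta))
    # 'valid' convolution gives sum_k weights[k] * eps[t-k] for every t with full history.
    return c + np.convolve(eps, weights, mode="valid")


def sample_autocorr(y, lag):
    """Sample autocorrelation of y at a positive integer lag."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    return float(np.dot(y[lag:], y[:-lag]) / np.dot(y, y))


y = simulate_ma(theta=[0.8], n=5000)
print("lag 1:", sample_autocorr(y, 1))
print("lag 2:", sample_autocorr(y, 2))
```

This cut-off in the autocorrelation after lag q is what makes the ACF plot useful for suggesting the MA order.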
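An ARIMA(p, d, q) model combines the AR, I and MA components. Writing $Y'_t$ for the series after it has been differenced d times, $\phi_i$ for the autoregressive coefficients, $\theta_j$ for the moving average coefficients and $\varepsilon_t$ for the error terms, its general form is given by:

$$Y'_t = c + \phi_1 Y'_{t-1} + \dots + \phi_p Y'_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}$$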
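The differencing ("integrated") part of the model can be illustrated with `numpy.diff`. In this small sketch, one difference turns a linear trend into a constant, and two differences do the same for a quadratic trend. The wrapper name `difference` and the example series are assumptions for illustration.

```python
import numpy as np


def difference(y, d=1):
    """Apply first differencing d times (each pass computes Y_t - Y_{t-1})."""
    return np.diff(np.asarray(y, dtype=float), n=d)


t = np.arange(10)
linear = 3.0 + 2.0 * t          # linear trend with slope 2
quadratic = 1.0 + 0.5 * t**2    # quadratic trend

print(difference(linear, d=1))     # every entry equals the slope
print(difference(quadratic, d=2))  # every entry equals the constant second difference
```

Each round of differencing shortens the series by one observation, which is why d is normally kept as small as possible.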
The general ARIMA forecasting process involves selecting appropriate values for p, d, and q, estimating the model parameters, and using the model to make predictions. The Box-Jenkins methodology is often used for identifying and fitting ARIMA models to time series data.
Let's discuss the Box-Jenkins method in detail now.
The Box-Jenkins method is a methodology for analyzing and forecasting time series data. It comprises five steps: identification, estimation, diagnostic checking, model refinement and forecasting. The method is iterative: steps 1 to 4, from identification to model refinement, are often repeated until a suitable and well-diagnosed model is obtained. It is important to note that the method assumes the underlying time series is generated by a linear process that is stationary, or that can be made stationary by differencing. The different stages of the Box-Jenkins method are:
Identification is the first step of the Box-Jenkins method. It determines the orders of the autoregressive (AR), differencing (I) and moving average (MA) components that are appropriate for a given time series, that is, the values of p, d and q. The key activities in this phase are checking the series for stationarity (for example with the Augmented Dickey-Fuller test), differencing it if it is not stationary, and inspecting the ACF and PACF plots to suggest candidate values of p and q.
Estimation is the second stage in the Box-Jenkins methodology for ARIMA modeling. In this stage, the parameters of the identified model, namely the autoregressive and moving average coefficients and the constant, are estimated from historical time series data, typically by maximum likelihood. The differencing order d is not estimated here; it is fixed during identification. The primary goal is to fit the chosen ARIMA model to the observed data. Candidate models can also be compared at this stage using information criteria such as AIC and BIC.
Diagnostic checking is an important step in the Box-Jenkins methodology for ARIMA modeling. It evaluates the adequacy of the fitted ARIMA model by examining the residuals, which are the differences between the observed and fitted values. The goal is to ensure that the residuals are random and do not contain any remaining pattern or structure. Typical checks include testing the residuals for stationarity and applying the Ljung-Box test for leftover autocorrelation.
The model refinement stage in the Box-Jenkins method involves a thorough evaluation of the estimated ARIMA model to ensure that it meets the required statistical assumptions and adequately captures the patterns in the time series data. If the diagnostics reveal problems, the model is refined by changing the autoregressive, differencing or moving average orders, or by including factors that were not considered earlier. After the model has been re-specified, the diagnostic checks are performed again.
Once a satisfactory model has been identified and validated, it can be used to forecast future values of the time series. Now let's discuss the application of the Box-Jenkins method.
Here we use Apple stock data from yfinance and apply the Box-Jenkins method to analyze it. The steps are explained one by one below.
The walk-through relies on the following libraries: yfinance for downloading stock price data, pandas for data manipulation, matplotlib.pyplot for plotting, statsmodels for time series analysis and ARIMA modeling, and warnings to suppress warnings during execution.
Next, we define helper functions that check stationarity using the Augmented Dickey-Fuller (ADF) test and plot the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF).
Stock price data for Apple Inc. (AAPL) is downloaded using yfinance. The data is collected from the start of 2015 to the start of 2023. Log returns are calculated to stabilize variance and make the time series more suitable for modeling.
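The log-return calculation can be sketched as follows. `to_log_returns` is a hypothetical helper that works on any price series, and `download_log_returns` is a thin wrapper around `yfinance.download`. It is assumed here that the closing price is stored in a `Close` column, which may come back as a one-column DataFrame depending on the yfinance version.

```python
import numpy as np
import pandas as pd


def to_log_returns(close):
    """Compute r_t = ln(P_t / P_{t-1}); the first row has no predecessor and is dropped."""
    close = pd.Series(close, dtype=float)
    return np.log(close / close.shift(1)).dropna()


def download_log_returns(ticker="AAPL", start="2015-01-01", end="2023-01-01"):
    """Download daily prices with yfinance and convert them to log returns."""
    import yfinance as yf  # imported lazily so to_log_returns also works offline

    data = yf.download(ticker, start=start, end=end)
    close = data["Close"]
    # Assumption: some yfinance versions return a one-column DataFrame here.
    if isinstance(close, pd.DataFrame):
        close = close.iloc[:, 0]
    return to_log_returns(close)


# Offline example on made-up prices; each step is a 10% increase.
print(to_log_returns([100.0, 110.0, 121.0]))
```

With network access, `download_log_returns()` would produce the series analyzed in the rest of the walk-through.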
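The stationarity of the log returns is checked before and after differencing, and ACF and PACF plots are created for the differenced series to help determine the ARIMA orders. Note that in the output below the first ADF result, for the undifferenced log returns, already rejects the unit-root hypothesis well beyond the 1% critical value, so the extra differencing is not strictly needed here. This is consistent with d = 0 in the order selected later.
Output (first block: log returns; second block: differenced log returns):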
ADF Statistic: -13.869148958528394
p-value: 6.51329302121344e-26
Critical Values: {'1%': -3.4336173133865064, '5%': -2.86298332472282, '10%': -2.5675383641200633}
ADF Statistic: -14.058039719328459
p-value: 3.091971442666415e-26
Critical Values: {'1%': -3.433648628001351, '5%': -2.8629971502062155, '10%': -2.5675457254979093}
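The code iterates through different values of p, d and q, fits an ARIMA model for each combination and looks for the one with the lowest AIC and BIC values. The two criteria need not agree, because BIC penalizes additional parameters more heavily than AIC; when they differ, one of them has to be chosen as the deciding criterion.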
Output:
Best AIC: -10277.232291010881, Best BIC: -10260.410146733962, Best Order: (0, 0, 1)
The ARIMA model is fitted using the order obtained from the AIC and BIC selection process. Diagnostics are then performed on the residuals, including a stationarity check. The Ljung-Box test is conducted to assess whether any autocorrelation remains in the residuals; large p-values indicate residuals that are consistent with white noise.
Output:
ADF Statistic: -13.478138873971695
p-value: 3.2812344010002946e-25
Critical Values: {'1%': -3.4336189466940414, '5%': -2.8629840458358933, '10%': -2.5675387480760885}
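Ljung-Box test statistics: lb_stat
Ljung-Box p-values: lb_pvalue
Note that the last two lines show the column labels lb_stat and lb_pvalue rather than numeric values. This is most likely because recent versions of statsmodels return the Ljung-Box result as a DataFrame, and unpacking a DataFrame into two variables yields its column names. The values have to be read from the lb_stat and lb_pvalue columns instead.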
Finally, the observed log returns and the fitted values from the ARIMA model are plotted to visualize the model's performance.
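A minimal plotting sketch is shown below. The function name `plot_observed_vs_fitted` and the styling are assumptions, and the two series in the example are made-up arrays. With a fitted statsmodels model, the second argument would be its `fittedvalues` attribute.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_observed_vs_fitted(observed, fitted, title="Observed vs. fitted log returns"):
    """Overlay the observed series and the model's fitted values; return the figure."""
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(observed, label="Observed", linewidth=0.8)
    ax.plot(fitted, label="Fitted", linewidth=0.8)
    ax.set_title(title)
    ax.set_xlabel("Time")
    ax.set_ylabel("Log return")
    ax.legend()
    fig.tight_layout()
    return fig


# Made-up data: a noisy series and a smoothed version standing in for fitted values.
rng = np.random.default_rng(5)
observed = rng.normal(0.0, 0.02, size=200)
fitted = np.convolve(observed, np.ones(5) / 5, mode="same")
fig = plot_observed_vs_fitted(observed, fitted)
```

For return series, the fitted values of a low-order ARIMA model are usually much less variable than the observed returns, so the fitted line tends to hug zero.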
Output: a line plot of the observed log returns with the ARIMA fitted values overlaid.
The walk-through above is a complete example of applying the Box-Jenkins methodology to stock returns, covering stationarity checks, differencing, order selection, model fitting, diagnostics and result visualization. Depending on the diagnostic results, the model orders and parameters may need to be adjusted and the steps repeated.