SARIMA (Seasonal Autoregressive Integrated Moving Average) extends the traditional ARIMA model to time series data with seasonal patterns. ARIMA works well for non-seasonal data, while SARIMA adds seasonal autoregressive, differencing and moving average terms to handle periodic fluctuations, which usually gives better forecasts for seasonal data.
Understanding the Components of SARIMA
SARIMA consists of several components that help capture both short-term and long-term dependencies within a time series:
Seasonal Component: Represents the repeating patterns or cycles in the data at regular intervals like yearly, monthly, daily, etc. This allows SARIMA to model seasonality effectively.
Autoregressive (AR) Component: Models the relationship between current and past observations. It captures the autocorrelation of the data over time.
Integrated (I) Component: Addresses non-stationarity by differencing the data until it is stationary, which the AR and MA components require.
Moving Average (MA) Component: Models the relationship between current observations and past residual errors. It helps in capturing short-term fluctuations.
SARIMA Notation
The SARIMA model is represented as:
SARIMA(p, d, q)(P, D, Q, s)
Parameters:
p: Autoregressive order
d: Number of non-seasonal differences
q: Moving average order
P: Seasonal autoregressive order
D: Seasonal differencing order
Q: Seasonal moving average order
s: Length of the seasonal period (e.g., 12 for monthly data)
Seasonal differencing is often required to make seasonal data stationary. It subtracts from each observation the value from the same season in the previous cycle, that is, y_t - y_{t-s}. This removes the repeating seasonal pattern from the data, enabling more accurate forecasting. The number of times it is applied is the seasonal differencing order D.
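As a minimal sketch of seasonal differencing, the example below uses pandas `diff` with a lag equal to the seasonal period. The noise-free synthetic series (a linear trend plus a 12-step sine cycle) is an assumption chosen so the effect is easy to see.

```python
import numpy as np
import pandas as pd

s = 12  # seasonal period (monthly data with a yearly cycle)
t = np.arange(48)

# Illustrative series: linear trend (slope 0.5) plus a repeating 12-step cycle
series = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / s))

# Seasonal difference: y_t - y_{t-s}; the first s values have no
# predecessor one cycle back, so they become NaN and are dropped
seasonal_diff = series.diff(s).dropna()

print(seasonal_diff.head())
```

The seasonal cycle cancels out, and what remains is the trend's change over one full cycle (0.5 × 12 here). With real data, noise would remain as well.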
Understanding the Mathematical Representation of SARIMA
The SARIMA model can be expressed mathematically as:
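One common way to write it uses the backshift operator B, where B y_t = y_{t-1}. The form below is the usual textbook notation and is given here as a reconstruction. The Greek symbol names are conventional assumptions, while p, d, q, P, D, Q and s are the parameters defined above.

```latex
\[
\phi_p(B)\,\Phi_P(B^s)\,(1 - B)^d\,(1 - B^s)^D\, y_t
  = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t
\]
\[
\begin{aligned}
\phi_p(B)     &= 1 - \phi_1 B - \dots - \phi_p B^p
  && \text{(non-seasonal AR polynomial)} \\
\Phi_P(B^s)   &= 1 - \Phi_1 B^s - \dots - \Phi_P B^{Ps}
  && \text{(seasonal AR polynomial)} \\
\theta_q(B)   &= 1 + \theta_1 B + \dots + \theta_q B^q
  && \text{(non-seasonal MA polynomial)} \\
\Theta_Q(B^s) &= 1 + \Theta_1 B^s + \dots + \Theta_Q B^{Qs}
  && \text{(seasonal MA polynomial)}
\end{aligned}
\]
```

Here y_t is the observed series and ε_t is a white noise error term. The factor (1 - B)^d applies non-seasonal differencing d times, and (1 - B^s)^D applies seasonal differencing D times.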
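Before applying SARIMA, we need to check whether the data is stationary. A stationary series has a mean and variance that stay constant over time, which is a key assumption for SARIMA. We use the Augmented Dickey-Fuller (ADF) test for this. Its null hypothesis is that the series has a unit root (is non-stationary), so a p-value below a chosen significance level such as 0.05 suggests the series is stationary.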
adfuller(timeseries, autolag='AIC'): Performs the Augmented Dickey-Fuller test for stationarity.
result[1]: Extracts the p-value from the ADF test to check for stationarity.
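We can identify candidate SARIMA model parameters (p, d, q, P, D, Q, s) using Autocorrelation (ACF) and Partial Autocorrelation (PACF) plots. Significant spikes in the PACF suggest the autoregressive orders (p and P), and significant spikes in the ACF suggest the moving average orders (q and Q). Spikes at multiples of the seasonal period s point to the seasonal terms. The differencing orders d and D are the number of non-seasonal and seasonal differences needed to make the series stationary.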
plot_acf(): Plots the Autocorrelation Function (ACF) to visualize correlations between lags.
plot_pacf(): Plots the Partial AutoCorrelation Function (PACF) to visualize partial correlations between lags.