StatsModels is a comprehensive Python library for statistical modeling that offers robust tools for time series analysis. Its time series analysis module provides a wide range of models, from basic autoregressive processes to advanced state-space frameworks, enabling rigorous analysis of temporal data patterns. The library emphasizes statistical rigor, with integrated hypothesis testing and diagnostics.
Key Components of StatsModels' Time Series Module
Core Models and Functions
ARIMA/SARIMAX: For non-seasonal and seasonal autoregressive integrated moving average models.
The loading step sets the date column as the index and parses it as datetime, which is crucial for time series analysis.
The data is structured so that each row represents a time point (e.g., daily, monthly).
Ensuring the index is a datetime object allows StatsModels to recognize time ordering and frequencies.
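As a minimal sketch of the points above, the snippet below parses a date column into a datetime index and attaches an explicit monthly frequency. The in-memory CSV, the column names "date" and "value", and the month-start frequency "MS" are assumptions made for illustration, not part of the original example.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = """date,value
2020-01-01,10
2020-02-01,12
2020-03-01,11
2020-04-01,15
"""

# parse_dates converts the column to datetime; index_col makes it the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# Attach an explicit month-start frequency so models do not have to infer it.
df = df.asfreq("MS")

print(type(df.index).__name__)  # DatetimeIndex
print(df.index.freqstr)         # MS
```

Setting the frequency explicitly is a design choice: StatsModels can often infer it from a regular index, but stating it up front avoids frequency warnings and surfaces gaps in the data early.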
Here, the code:
Loads the AirPassengers dataset from a CSV file containing monthly totals of airline passengers from 1949 to 1960, with the "Month" column parsed as a datetime index and the "Passengers" column renamed to "value".
Plots the time series to visually display how the number of airline passengers changes over time, allowing you to observe overall trends, seasonality, and any anomalies in the data.
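The loading and plotting steps described above can be sketched as follows. The helper name load_passengers and the file name "AirPassengers.csv" are hypothetical, and a few rows in the dataset's layout are read from memory so that the snippet runs without the file.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd


def load_passengers(source):
    """Read a Month/Passengers CSV and return a frame with a 'value' column."""
    df = pd.read_csv(source, parse_dates=["Month"], index_col="Month")
    return df.rename(columns={"Passengers": "value"})


# A few rows in the dataset's layout. With the real file, pass its path
# instead, e.g. load_passengers("AirPassengers.csv") (hypothetical path).
sample = io.StringIO(
    "Month,Passengers\n1949-01,112\n1949-02,118\n1949-03,132\n1949-04,129\n"
)
data = load_passengers(sample)

# Draw the series on an explicit axes object so labels can be set directly.
fig, ax = plt.subplots(figsize=(10, 4))
data["value"].plot(ax=ax, title="Monthly airline passengers")
ax.set_xlabel("Month")
ax.set_ylabel("Passengers")
# In an interactive session, call plt.show() to display the figure.
```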
The next step performs the Augmented Dickey-Fuller (ADF) test on the time series to check whether it is stationary. Specifically:
1. The function adfuller(data['value']) tests for the presence of a unit root, which would indicate non-stationarity (i.e., the mean and variance change over time). The null hypothesis of the test is that a unit root is present.
2. The output includes an ADF test statistic and a p-value.
If the p-value is less than 0.05, you can reject the null hypothesis and conclude the series is stationary.
If the p-value is greater than 0.05, you fail to reject the null and the series is likely non-stationary.
This step applies first-order differencing to the time series: it subtracts the previous value from each value (y_t - y_{t-1}) to remove trends and stabilize the mean. It then runs the Augmented Dickey-Fuller (ADF) test again on the differenced data to check whether the series has become stationary (i.e., its statistical properties no longer depend on time).
This code uses seasonal decomposition to break down your time series into three separate components: trend (long-term movement), seasonality (regular repeating patterns), and residuals (random noise). The 'multiplicative' model is chosen, meaning the components are multiplied together, which is appropriate when seasonal effects increase or decrease with the trend.
It fits a SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors) model to your time series data, then prints a statistical summary of the results.
SARIMAX(data, order=(1,1,1), seasonal_order=(1,1,1,12)) creates a model that accounts for:
Autoregression (AR): Uses past values to predict current values.
Integration (I): Applies differencing to make the series stationary.
Moving Average (MA): Models the relationship between an observation and past errors.