Autoregressive (AR) models are a core concept in time series analysis and forecasting. They capture the relationship between an observation and a number of lagged observations, i.e. values from previous time steps. The idea is that the current value of a time series can be expressed as a linear combination of its past values plus random noise.
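Mathematically, an autoregressive model of order p, denoted AR(p), can be expressed as:

$$X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \varepsilon_t$$

Where:
- $X_t$ is the value of the series at time $t$
- $c$ is a constant term
- $\phi_1, \dots, \phi_p$ are the coefficients of the lagged values
- $\varepsilon_t$ is white noise (the random error) at time $t$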
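To make the equation concrete, here is a minimal NumPy sketch that generates data directly from the AR(p) recursion and then recovers $c$ and $\phi_1, \dots, \phi_p$ by ordinary least squares on the lagged values. The helper names `simulate_ar` and `fit_ar_ols`, the coefficients and the seed are illustrative assumptions, not part of the original article.

```python
import numpy as np


def simulate_ar(c, phi, n, sigma=1.0, burn_in=200, seed=0):
    """Generate n values from X_t = c + phi[0]*X_{t-1} + ... + phi[p-1]*X_{t-p} + eps_t."""
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    x = np.zeros(n + burn_in)
    eps = rng.normal(0.0, sigma, size=n + burn_in)
    for t in range(p, n + burn_in):
        lagged = x[t - p:t][::-1]  # [X_{t-1}, X_{t-2}, ..., X_{t-p}]
        x[t] = c + phi @ lagged + eps[t]
    return x[burn_in:]  # discard the start-up transient


def fit_ar_ols(x, p):
    """Estimate (c, phi) by regressing X_t on a constant and its p lags."""
    n = len(x)
    # Column i (i = 1..p) holds X_{t-i} for every target X_t, t = p..n-1
    lags = [x[p - i:n - i] for i in range(1, p + 1)]
    design = np.column_stack([np.ones(n - p)] + lags)
    coef, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    return coef[0], coef[1:]


x = simulate_ar(c=0.5, phi=[0.6, -0.3], n=5000)
c_hat, phi_hat = fit_ar_ols(x, p=2)
print(f"c_hat = {c_hat:.3f}, phi_hat = {np.round(phi_hat, 3)}")
```

The estimates should land close to the true values 0.5, 0.6 and -0.3. This conditional least-squares regression is essentially what statsmodels' `AutoReg` fits under the hood.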
The Autocorrelation Function (ACF) measures the correlation between a time series and lagged copies of itself. It works as follows:
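A lag is the number of time steps by which the series is shifted. For example, lag 1 compares each value with the one immediately before it, and lag 2 compares it with the value two steps earlier.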
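The autocorrelation coefficient at a given lag quantifies this relationship. It ranges from -1 to 1: values near 1 indicate a strong positive relationship, values near -1 a strong negative one, and values near 0 little linear relationship.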
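A common way to compute the sample autocorrelation at lag $k$ for a series $X_1, \dots, X_T$ with mean $\bar{X}$ is:

$$r_k = \frac{\sum_{t=k+1}^{T} (X_t - \bar{X})(X_{t-k} - \bar{X})}{\sum_{t=1}^{T} (X_t - \bar{X})^2}$$

The short NumPy sketch below implements this formula directly; `sample_acf` is an illustrative helper name. It uses the same estimator as `statsmodels.tsa.stattools.acf` with its default `adjusted=False`.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Return r_0, ..., r_max_lag using the formula above."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()                   # deviations from the overall mean
    denom = np.sum(d ** 2)             # shared denominator for every lag
    return np.array([np.sum(d[k:] * d[:len(d) - k]) / denom
                     for k in range(max_lag + 1)])


r = sample_acf([1, 2, 3, 4, 5], max_lag=2)
print(r)  # r_0 = 1.0, r_1 = 0.4, r_2 = -0.1
```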
To analyze autocorrelation patterns, we use an ACF plot, which displays the autocorrelation value at each lag. Patterns in this plot, such as how quickly the values decay or whether spikes recur at regular intervals, reveal the underlying temporal structure of the data and guide the choice of lag order for an AR model.
The ACF helps determine how many past time steps (lags) to include in the model. Typically, you look at the ACF plot together with a Partial Autocorrelation Function (PACF) plot to choose the lag order: for a pure AR(p) process, the PACF tends to cut off sharply after lag p, while the ACF decays more gradually.
The ACF also helps assess whether a time series is stationary, meaning its statistical properties such as mean and variance stay constant over time. In a stationary series, autocorrelations typically drop off quickly toward zero as the lag increases. If they stay high or decay only slowly, the series is likely non-stationary and may need a transformation, such as differencing, before modeling.
Autoregressive models vary based on the number of past values (lags) they use. The two most common types are:
1. AR(1) Model: This is an autoregressive model of order 1 and the simplest form of autoregressive model. The current value of the time series depends only on its immediately preceding value, plus a constant and random noise. This model is particularly useful when the data shows strong autocorrelation at lag 1 and the PACF has a significant spike only at lag 1. It is expressed as: $X_t = c + \phi_1 X_{t-1} + \varepsilon_t$
2. AR(p) Model: This is the general form of the autoregressive model, in which the current value depends on the previous p values. Choosing the correct order p is a crucial step and typically involves analyzing the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots.
We load the necessary libraries: numpy, pandas, matplotlib, statsmodels and scikit-learn.
We read the CSV into a pandas DataFrame, convert the "Date" column to datetime format and interpolate any missing temperature values. You can download the dataset from here.
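A minimal pandas sketch of this loading step, run on an in-memory toy CSV so it works without the download. The temperature column name `Temp` and all values are assumptions for illustration; in practice you would pass the downloaded file's path to `pd.read_csv`.

```python
import io

import pandas as pd

# Toy CSV standing in for the downloaded file. The column name "Temp" and the
# values are hypothetical; adjust them to match the real dataset.
csv_text = """Date,Temp
2020-01-01,10.0
2020-01-02,12.0
2020-01-03,
2020-01-04,16.0
"""

df = pd.read_csv(io.StringIO(csv_text))
df["Date"] = pd.to_datetime(df["Date"])    # string -> datetime64
df = df.set_index("Date")
df["Temp"] = df["Temp"].interpolate()       # linear fill between neighbours
print(df)
```

The missing value on 2020-01-03 is filled with 14.0, halfway between its neighbours.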
We apply the Augmented Dickey–Fuller (ADF) test to check whether the temperature series is stationary, which is a requirement for AR modeling. The test's null hypothesis is that the series has a unit root (is non-stationary), so the ADF statistic and p-value tell us whether we need to difference the series.
Output:
ADF statistic = -1.096, p-value = 0.716
Since the ADF p-value (0.716) is above 0.05, we cannot reject the null hypothesis of a unit root, so we treat the series as non-stationary. We take a first difference, $X_t - X_{t-1}$, to remove the trend and achieve stationarity.
We plot the autocorrelation (ACF) and partial autocorrelation (PACF) of the differenced series to identify an appropriate lag order for the AR model.
We split the differenced data into a training set (80%) and a test set (20%), keeping the time order, to evaluate the model's performance.
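Because observations are ordered in time, the split should be chronological rather than random, so the model is never trained on values from after the test period. A minimal sketch; `chronological_split` is an illustrative helper, and `np.arange` stands in for the differenced series. scikit-learn's `train_test_split` with `shuffle=False` produces the same ordering.

```python
import numpy as np


def chronological_split(series, train_frac=0.8):
    """Split a time-ordered sequence without shuffling; the earliest part is for training."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]


values = np.arange(100.0)  # placeholder for the differenced series
train, test = chronological_split(values)
print(len(train), len(test))  # 80 20
```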
Using the lag order suggested by the PACF plot (in this example, p = 13), we fit an autoregressive (AR) model on the training data.
In this step we use the fitted AR model to produce both in-sample fitted values on the training set and out-of-sample forecasts for the test set. We call the predict method of the AutoRegResults object, specifying:
- start: the first index or timestamp at which to begin predicting.
- end: the last index or timestamp at which to predict.
- dynamic=False: within the training sample, each prediction uses actual past values rather than the model's own earlier predictions (one-step-ahead prediction). Beyond the end of the training data the model has no actual values to use, so out-of-sample forecasts are built recursively from its own predictions.

We compute the root-mean-square error (RMSE) and mean absolute error (MAE) on the differenced scale to quantify forecast accuracy.
Output:
1.3502853217579163
1.064373117847641
To compare the AR model's output with actual temperatures, you must turn the predicted daily changes ($\Delta T_t$) back into temperature values ($T_t$). The model was trained on the differenced series (day-to-day changes), so its forecasts are changes rather than temperatures. To get forecasts in the original units, we "undo" the differencing by adding the predicted changes back onto the last known temperature, since $T_t = T_{t-1} + \Delta T_t$.
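For a multi-step forecast starting after the last training observation $T_n$, inverting the first difference means accumulating the predicted changes: $\hat{T}_{n+h} = T_n + \sum_{i=1}^{h} \widehat{\Delta T}_{n+i}$. A minimal NumPy sketch; the function name `undo_first_difference` and the numbers are illustrative.

```python
import numpy as np


def undo_first_difference(last_level, predicted_diffs):
    """Turn forecast changes into levels: T_hat_{n+h} = T_n + cumulative sum of changes."""
    return last_level + np.cumsum(predicted_diffs)


levels = undo_first_difference(20.0, [0.5, -1.0, 0.25])
print(levels)  # levels: 20.5, 19.5, 19.75
```

For one-step in-sample predictions you would instead add each predicted change to the previous actual value rather than to a running sum.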
Finally, we plot the full observed temperature series alongside the AR model forecast, marking the train/test split point.
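A matplotlib sketch of the final plot. Both series are synthetic placeholders (an assumption): `observed` plays the role of the temperature series and `forecast` the reconstructed AR forecast, here just a flat line so the example stays short.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data: synthetic observed levels and a stand-in forecast for the last 20%
dates = pd.date_range("2020-01-01", periods=200, freq="D")
rng = np.random.default_rng(5)
observed = pd.Series(15 + np.cumsum(rng.normal(0, 0.5, len(dates))), index=dates)
split = int(len(observed) * 0.8)
forecast = pd.Series(observed.iloc[split - 1], index=dates[split:])  # flat stand-in forecast

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(observed.index, observed.values, label="Observed")
ax.plot(forecast.index, forecast.values, linestyle="--", label="Forecast")
ax.axvline(dates[split], color="gray", linestyle=":", label="Train/test split")
ax.set_xlabel("Date")
ax.set_ylabel("Temperature")
ax.legend()
fig.tight_layout()
```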
Autoregressive models are useful tools for forecasting time series whose values depend on their own recent past. In this article we applied an AR model to daily temperature data to make predictions.