India's Most Futuristic AI Conference Is Back – Bigger, Sharper, Bolder

Reading list

Understanding Time Series Univariate and Multivariate Time Series with Examples

Stationary and Non Stationary Time Series Analyzing Time Series Data using Python

What is Time Series Forecasting?Machine Learning for Time Series Forecasting Feature Engineering Techniques for Time Series Data

Simple Exponential Smoothing Holt Winters Algorithm Time Series in Dashboards

Introduction to ARIMA How to Check Stationary?Implement ARIMA

Prophet Implement Prophet

Time Series Forecasting using Deep Learning

Performing Time Series Analysis using ARIMA Model in R

👁 De-Yu

De-Yu Last Updated : 22 Nov, 2021

7 min read

This article was published as a part of the Data Science Blogathon.

Welcome to the World of Time Series Analysis!

From this article, you will learn how to perform time series analysis using the ARIMA model (with code!). The dataset used in this article can be downloaded here. The usage time series data consist of the number of users connected to the internet through a server. The data are collected at a time interval of one minute and there are 100 observations. I will fit an appropriate ARIMA model for it using R codes.

Stationary Time Series

A key role in time series analysis is played by processes whose properties, or some of them, do not vary over time. Such a property is illustrated in the following important concept, stationarity. We then introduce the most commonly used stationary linear time series models–the autoregressive integrated moving average (ARIMA) models. These models have assumed great importance in modeling real-world processes.

ARIMA in Time Series Analysis

An autoregressive integrated moving average – ARIMA model is a generalization of a simple autoregressive moving average – ARMA model. Both of these models are used to forecast or predict future points in the time-series data. ARIMA is a form of regression analysis that indicates the strength of a dependent variable relative to other changing variables.

The final objective of the model is to predict future time series movement by examining the differences between values in the series instead of through actual values. ARIMA models are applied in the cases where the data shows evidence of non-stationarity. In time series analysis, non-stationary data are always transformed into stationary data.

The common causes of non-stationary in time series data are the trend and the seasonal components. The way to transformed non-stationary data to stationary is to apply the differencing step. It is possible to apply one or more times of differencing steps to eliminate the trend component in the data. Similarly, to remove the seasonal components in data, seasonal differencing could be applied. [1]

According to the name, we can split the model into smaller components as follow:

AR: an AutoregRegressive model which represents a type of random process. The output of the model is linearly dependent on its own previous value i.e. some number of lagged data points or the number of past observations [2].
MA: a Moving Average model which output is dependent linearly on the current and various past observations of a stochastic term [3].
I: integrated here means the differencing step to generate stationary time series data, i.e. removing the seasonal and trend components [1].

ARIMA model is generally denoted as ARIMA(p, d, q) and parameter p, d, q are defined as follow:

p: the lag order or the number of time lag of autoregressive model AR(p)
d: degree of differencing or the number of times the data have had subtracted with past value
q: the order of moving average model MA(q)

Read the dataset

wwwusage = scan("wwwusage.txt", skip=1)

Time series plot

# x-axis show values from 0 to 120 while y-axis show values from 80 to 250 according to min, max values of dataset
plot(1:100, wwwusage, xlim = c(0, 120), ylim=c(80, 250))
#connect the datapoints with line
lines(1:100, wwwusage, type="l")

👁 time series plot

Figure 1: series usage

Interpretation of sample ACF and PACF plot

ACF: The autocorrelation coefficient function, define how the data points in a time series are related to the preceding data points.

PACF: The partial autocorrelation coefficient function, like the autocorrelation function, conveys vital information regarding the dependence structure of a stationary process.

Now let’s explain how to select a model based on ACF and PACF plots.

Given observations of a time series, one approach to the fitting of a model to the data is to match the sample ACF and sample PACF of the data with the ACF and PACF of the model, respectively. We can use the sample ACF plot to see if the specific time series data is stationary or not. Here is an example of how the sample ACF plot will look like if the time series is not stationary.

👁 Image

For stationary time-series data, the sample ACF will die down very quickly or even cuts off. Otherwise, the time series is not stationary. Cut off here means the ACF value is less than the indicated confidence interval or inside the blue dotted line.

If we confirm the data is stationary, we can decide which parameters q and p should be used for the model based on at which lag the sample ACF and PACF value cut off. In this case, parameter p will be decided by sample PACF cut off time and parameter q will be decided by sample ACF cut off time. If we know the time series is not stationary, will try to perform the higher-order differencing to ensure the data is stationary.

Sample ACF and PACF plot

acf(wwwusage, lag.max=100)
pacf(wwwusage, lag.max=100)

👁 sample acf and pacf plot

Figure 2: Sample ACF and PACF plot for series wwwusage

From both time series and sample ACF plot, we get the conclusion that the raw data is not stationary. Sample ACF got the periodical pattern, which very likely contains seasonal components. If data contains trend and seasonal components, it is said to be non-stationary.

Sample PACF cut off after lag 2, however, we can’t use it to decide what model to use since the data is not stationary.

Take first-order differencing

zt = wwwusaget – wwwusaget-1 = (1-B) wwwusaget

z = diff(wwwusage, 1, 1)
plot(1:99, z, xlim = c(0, 120), ylim=c(-15, 15))
lines(1:99, z, type="l" )

👁 first order diffrencing

Figure 3: series z (first-order differencing on series wwwusage)

acf(z, lag.max=100)
pacf(z, lag.max=100)

👁 sample acf and pacf plot

Figure 4: Sample ACF and PACF plot for series z

From sample ACF plot, we still can see some periodical pattern, so decided to apply higher-order differencing i.e. second-order differencing on original data. The purpose is to check if higher-order differencing can get a better result.

Take second-order differencing

zt = wwwusaget – 2wwwusaget-1 + wwwusaget-2 = (1-2B+B2) wwwusaget

z = diff(wwwusage, 1, 2)
plot(1:98, z, xlim = c(0, 120), ylim=c(-15, 15))
lines(1:98, z, type="l" )

👁 second order differencing

Figure 5: series z (second-order differencing on series wwwusage)

acf(z, lag.max=100)
pacf(z, lag.max=100)

👁 acf and pacf plot

Figure 6: Sample ACF and PACF plot for series z

We can see sample ACF decay very fast, so the data is stationary. It is acceptable to have 5% of time lags lie outside the dotted blue boundary. Therefore, we can ignore around 5-time lags (5% x 100) at lag 8, 23, 26, 44, and 52.

Since sample ACF cut off after lag 2, we can use ARIMA(0,2,2) for wwwusage. Similarly, for sample PACF, it cut off after lag 2 we can use ARIMA(2,2,0) for wwwusage.

Model 1: ARIMA(0,2,2)

fit1 = arima(wwwusage, order = c(0,2,2))
tsdiag(fit1)

👁 arima model 1

Figure 7: ACF of residuals (model 1)

Thus, model 1 is adequate, i.e. there is no autocorrelation in the residuals.

👁 Image

Write down fitted model 1

The fitted model is: (1-2B+B2) wwwusaget = zt + 0.1318zt-1 – 0.3590zt-2

Model 2: ARIMA(2,2,0)

fit2 = arima(wwwusage, order = c(2,2,0))
tsdiag(fit2)

👁 arima model 3

Figure 8: ACF of residuals (model 2)

Thus, model 2 is adequate, i.e. there is no autocorrelation in the residuals.

👁 Image

Write down fitted model 2

The fitted model is: (1-2B+B2) wwwusaget = 0.2579 wwwusaget-1 – 0.4407 wwwusaget-2 + zt

Model 3: Auto ARIMA

library(forecast)
fit3 = auto.arima(wwwusage)
tsdiag(fit3)

👁 auto arima model 3

Figure 9: ACF of residuals (model 3)

Thus, model 3 is adequate, i.e. there is no autocorrelation in the residuals.

👁 arima model 3

Write down fitted model 3

Evaluation based on AIC

Model	AIC criterion
Model 1: ARIMA(0,2,2)	517.21
Model 2: ARIMA(2,2,0)	511.46
Model 3: ARIMA(1,1,1)	514.3

From the diagnostic checking, all the 3 models are adequate. AIC criterion allowed us to compare the fit of different models if the models are adequate. The smaller the AIC criterion, the better the model. Therefore, a final fitted model chosen for wwwusage data is ARIMA(2,2,0) which give the lowest AIC (511.46) among all models from the table above.

Conclusion

Final fitted model ARIMA(2,2,0) with fitted values and forecast for 10 steps ahead

# fitted values
plot(1:100, wwwusage, xlim = c(0, 120), ylim=c(80, 330))
lines(1:100, wwwusage, type="l" )
lines(1:100, wwwusage-fit2$residuals, type="l", col="red")

# forecast for 10 steps ahead
forecast = predict(fit2, n.ahead=10)
lines(101:110, forecast$pred, type="o", col="red")
lines(101:110, forecast$pred-1.96*forecast$se, col="blue")
lines(101:110, forecast$pred+1.96*forecast$se, col="blue")

👁 final fitted arima model

Figure 10: the black dots are the observation and the red dots are the predictions

References

[1] Autoregressive integrated moving average – Wikipedia

[2] Autoregressive model – Wikipeida

[3] Moving average model -Wikipedia

About the Author

Here is my profile on LinkedIn.

Thanks for giving your time!

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion

👁 De-Yu

De-Yu

R Time Series

Login to continue reading and enjoy expert-curated content.

Free Courses

👁 Generative AI
4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

👁 Generative AI
4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

👁 Generative AI
4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

👁 Generative AI
4.6

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

👁 Generative AI
4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

Responses From Readers

Cancel reply

Become an Author

Share insights, grow your voice, and inspire the data community.

Reach a Global Audience
Share Your Expertise with the World
Build Your Brand & Audience

Join a Thriving AI Community
Level Up Your AI Game
Expand Your Influence in Genrative AI

👁 imag

Flagship Programs

GenAI Pinnacle Program| GenAI Pinnacle Plus Program| AI/ML BlackBelt Program| Agentic AI Pioneer Program

Free Courses

Popular Categories

Generative AI Tools and Techniques

Popular GenAI Models

AI Development Frameworks

Data Science Tools and Techniques

👁 Av Logo White

Continue your learning for FREE

👁 Av Logo White

Enter OTP sent to

Edit

Wrong OTP.

Enter the OTP

Resend OTP

Resend OTP in 45s

👁 Popup Banner

👁 AI Popup Banner

URL: https://www.analyticsvidhya.com/blog/2021/11/performing-time-series-analysis-using-arima-model-in-r/

⇱ Time Series Analysis using Arima Model - Analytics Vidhya

Reading list

Performing Time Series Analysis using ARIMA Model in R

Welcome to the World of Time Series Analysis!

Stationary Time Series

ARIMA in Time Series Analysis

Read the dataset

Time series plot

Interpretation of sample ACF and PACF plot

Sample ACF and PACF plot

Take first-order differencing

Take second-order differencing

Model 1: ARIMA(0,2,2)

Model 2: ARIMA(2,2,0)

Model 3: Auto ARIMA

Evaluation based on AIC

Conclusion

Login to continue reading and enjoy expert-curated content.

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Become an Author

Flagship Programs

Free Courses

Popular Categories

Generative AI Tools and Techniques

Popular GenAI Models

AI Development Frameworks

Data Science Tools and Techniques

Continue your learning for FREE

Enter email address to continue

Enter OTP sent to

Enter the OTP