
URL: https://www.analyticsvidhya.com/blog/2021/04/how-to-check-stationarity-of-data-in-python/




How to check Stationarity of Data in Python

Nishtha Last Updated : 19 Oct, 2024
This article was published as a part of the Data Science Blogathon.

Introduction

Hello readers!

In everyday life, we come across many quantities that fluctuate over time. Prominent examples are environmental factors such as temperature, humidity, and rainfall, which can change many times during a single day.

Apart from these, the shops we visit sell many kinds of items and keep a record of their sales, either daily or monthly.

What do all these kinds of data have in common, and how should such data be pre-processed before a suitable model is applied to it?

In this article, we will answer these questions, look at the criteria such datasets follow, and see how these criteria can be checked in Python.

Table of Contents

  • Objective
  • Overview
  • Data Stationarity
  • Method to check Stationarity of data
  • Implementation of code
  • Observation
  • Conclusion

Objective: To examine the stationarity of time series data in Python by comparing two different datasets with two tests, rolling statistics and the Augmented Dickey-Fuller test.

Overview

Time series data: A set of observations collected at regular intervals of time forms time series data. It records the magnitude of a quantity over a period of time. For example, data from a mobile store describing the total number of mobile phones sold per day, or the amount of rainfall per day at a particular place, is time series data, where one of the variables is time.

The observations in time series data must be taken at equal intervals of time, such as a day, a week, a month, or a decade.
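To make the idea of equal intervals concrete, here is a minimal pandas sketch. The sales figures are made up for illustration, and `pd.infer_freq` is used as one convenient way (not the only one) to confirm that the timestamps are evenly spaced.

```python
import pandas as pd

# Made-up monthly sales figures, one value per month (illustration only)
sales = [120, 135, 128, 150, 160, 155]

# pd.date_range builds equally spaced timestamps; "MS" means month start
dates = pd.date_range(start="2020-01-01", periods=len(sales), freq="MS")
ts = pd.Series(sales, index=dates, name="sales")

# pd.infer_freq returns a frequency string for an evenly spaced index
print(pd.infer_freq(ts.index))  # MS

# Dropping one month breaks the regular spacing, so no frequency is inferred
irregular = ts.drop(ts.index[2])
print(pd.infer_freq(irregular.index))  # None
```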

Uses:

  • Time series data is used to predict future values with the help of past data.
  • It helps forecast future business opportunities by analyzing previous sales data, past trends, and past behavior.
  • It helps evaluate current performance.

Patterns in time series

There are three types of patterns that are usually observed in time series data:

1) Trend: The long-term movement of the data values, either higher or lower, over a long period of time. If the values move upward, it is known as an upward trend; if they move downward, it is known as a downward trend; and if they stay at a roughly constant level, it is known as a horizontal trend.

2) Seasonality: Rises and falls that repeat themselves after a fixed interval of time, such as every year.

3) Irregularity: Fluctuations that have no systematic pattern, occur only for short periods of time, and do not repeat after a fixed interval. This component is also known as noise or residual.
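The three patterns can be illustrated with a synthetic series built as trend + seasonality + irregularity. This is a minimal sketch with made-up parameters (a slope of 0.5 per month, a 12-month sine wave of amplitude 10, and Gaussian noise); averaging each year cancels the seasonal cycle and leaves the upward trend visible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n = 48  # four years of monthly observations
t = np.arange(n)

trend = 0.5 * t                                # steady upward trend
seasonality = 10 * np.sin(2 * np.pi * t / 12)  # pattern repeating every 12 months
irregular = rng.normal(loc=0, scale=2, size=n) # random noise (the residual)

dates = pd.date_range("2018-01-01", periods=n, freq="MS")
series = pd.Series(trend + seasonality + irregular, index=dates)

# Averaging over each full year cancels the seasonal cycle,
# so the yearly means mainly reflect the trend
yearly_mean = series.groupby(series.index.year).mean()
print(yearly_mean.round(2))
```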

Stationarity

Many time series models assume that the data is stationary. A stationary series satisfies the following three criteria:

1) It has a constant mean.

2) It has a constant variance.

3) Its autocovariance does not depend on time: the covariance between two observations depends only on the lag between them.

*Mean: the average value of all the data points.

*Variance: the average of the squared differences between each data point and the mean.

*Autocovariance: the covariance between values of the series separated by a given time lag.
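These three quantities can be estimated directly with NumPy. The sketch below is an informal illustration, not a formal test: it compares the mean, variance, and lag-1 autocovariance of the first and second halves of two synthetic series, white noise (stationary) and a random walk (non-stationary). The helper `autocovariance` is hypothetical, written only for this example.

```python
import numpy as np

def autocovariance(x, lag):
    """Sample autocovariance at a given lag (divides by n, the usual biased estimator)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean = x.mean()
    return np.sum((x[: n - lag] - mean) * (x[lag:] - mean)) / n

rng = np.random.default_rng(seed=1)
white_noise = rng.normal(size=1000)             # stationary by construction
random_walk = np.cumsum(rng.normal(size=1000))  # non-stationary by construction

for name, x in [("white noise", white_noise), ("random walk", random_walk)]:
    first, second = x[:500], x[500:]
    print(name)
    # Variance = average squared deviation from the mean (np.var uses ddof=0)
    print("  mean      :", round(first.mean(), 2), "vs", round(second.mean(), 2))
    print("  variance  :", round(first.var(), 2), "vs", round(second.var(), 2))
    print("  autocov(1):", round(autocovariance(first, 1), 2), "vs",
          round(autocovariance(second, 1), 2))
```

For white noise the two halves give similar numbers; for the random walk they typically differ noticeably, because its level wanders over time.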

Methods to check the stationarity of time series data

In this article, we use two methods in Python to check data stationarity:

1) Rolling statistics:

This method gives a visual representation of the data to assess its stationarity. A moving average or moving variance (or moving standard deviation) is plotted, and we observe whether it varies with time. A window of a chosen size (based on our needs, e.g. 10 or 12 observations) slides along the series, and the mean of the observations inside the window is taken as the value at the current point.
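A minimal pandas sketch of the rolling computation, on a made-up monthly series with an upward trend. `Series.rolling(window=12)` is the same call used later in this article; the series itself is an assumption for illustration. The first 11 values are NaN because the window is not yet full.

```python
import numpy as np
import pandas as pd

# Made-up monthly series: a linear upward trend plus noise
rng = np.random.default_rng(seed=2)
dates = pd.date_range("2015-01-01", periods=60, freq="MS")
series = pd.Series(np.arange(60) * 1.0 + rng.normal(scale=3, size=60), index=dates)

window = 12  # one year of monthly observations per window
rolling_mean = series.rolling(window=window).mean()
rolling_std = series.rolling(window=window).std()

# The first window-1 values are NaN because the window is not yet full
print(rolling_mean.head(13))
# With an upward trend, the rolling mean keeps rising over time
print(rolling_mean.dropna().iloc[[0, -1]])
```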

2) Augmented Dickey-Fuller (ADF) test:

In this test, the null hypothesis is that the data is non-stationary (the series has a unit root). The test returns a test statistic, a p-value, and critical values at the 1%, 5%, and 10% significance levels that help decide on stationarity. If the test statistic is less than the critical value, we can reject the null hypothesis and say that the series is stationary.

Implementation of Code

Dataset Description

Dataset 1: Monthly electricity production

Dataset 2: Monthly sunspots

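#Importing modules

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter notebook magic: display plots inline
%matplotlib inline
#Loading Datasets

# Adjust the paths to where the CSV files are saved on your machine
data = pd.read_csv(r'C:\Users\nishtha\Desktop\elec.csv')

data1 = pd.read_csv(r'C:\Users\nishtha\Desktop\monthly-sunspots.csv')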

Converting the date columns into datetime format

#Dataset 1

# Parse the DATE column as datetime and use it as the index
data['DATE'] = pd.to_datetime(data['DATE'])
index = data.set_index('DATE')
index.head()

#Dataset 2

# Parse the Month column as datetime and use it as the index
data1['Month'] = pd.to_datetime(data1['Month'])
index1 = data1.set_index('Month')
index1.head()
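As an alternative, `pd.read_csv` can parse the date column and set it as the index in one step with its `parse_dates` and `index_col` parameters. The inline CSV below mimics the two-column layout of the sunspots file, but its values are made up for illustration.

```python
import io
import pandas as pd

# Made-up rows in the same two-column layout as the sunspots file
csv_text = """Month,Sunspots
2000-01,10.0
2000-02,12.5
2000-03,9.0
"""

# parse_dates converts the column while reading; index_col makes it the index
sunspots = pd.read_csv(io.StringIO(csv_text), parse_dates=["Month"], index_col="Month")
print(sunspots.head())
print(type(sunspots.index).__name__)
```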

Plotting the graphs

#Dataset 1

plt.xlabel("DATE")
plt.ylabel("Electric Production")
plt.plot(index)
plt.show()
πŸ‘ Image

Dataset 2:

# Self-contained version for the sunspots dataset
import pandas as pd
import matplotlib.pyplot as plt

data1 = pd.read_csv('monthly-sunspots.csv')

data1['Month'] = pd.to_datetime(data1['Month'])
index1 = data1.set_index('Month')
print(index1.head())

plt.xlabel("Month")
plt.ylabel("Sunspots")
plt.plot(index1)
plt.show()
πŸ‘ Stationarity plot

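#Augmented Dickey-Fuller test

#Dataset 1

from statsmodels.tsa.stattools import adfuller
print("Observations of Dickey-fuller test")
# autolag='AIC' chooses the number of lags that minimises the AIC
dftest = adfuller(data['prod'], autolag='AIC')
# The first four results are the statistic, p-value, lags used and observations used
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#lags used', 'number of observations used'])
# dftest[4] is a dict of critical values at the 1%, 5% and 10% levels
for key, value in dftest[4].items():
    dfoutput['critical value (%s)' % key] = value
print(dfoutput)

Output: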
Observations of Dickey-fuller test
Test Statistic -2.256990
p-value 0.186215
#lags used 15.000000
number of observations used 381.000000
critical value (1%) -3.447631
critical value (5%) -2.869156
critical value (10%) -2.570827
dtype: float64

#Dataset 2

from statsmodels.tsa.stattools import adfuller
print("Observations of Dickey-fuller test")
# Same procedure as for dataset 1, applied to the Sunspots column
dftest = adfuller(data1['Sunspots'], autolag='AIC')
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#lags used', 'number of observations used'])
for key, value in dftest[4].items():
    dfoutput['critical value (%s)' % key] = value
print(dfoutput)

Output:
Observations of Dickey-fuller test
Test Statistic -9.567668e+00
p-value 2.333452e-16
#lags used 2.700000e+01
number of observations used 2.792000e+03
critical value (1%) -3.432694e+00
critical value (5%) -2.862576e+00
critical value (10%) -2.567321e+00
dtype: float64

#Rolling Statistics Test

#Dataset 1

# 12-month rolling mean and standard deviation
rmean = index.rolling(window=12).mean()
rstd = index.rolling(window=12).std()
print(rmean, rstd)
plt.plot(index, color='black', label='Original')
plt.plot(rmean, color='red', label='Rolling Mean')
plt.plot(rstd, color='blue', label='Rolling Standard Deviation')
plt.legend(loc='best')
plt.title("Rolling mean and standard deviation")
plt.show(block=False)
πŸ‘ Stationarity rolling mean

#Dataset 2

# 12-month rolling mean and standard deviation
rmean1 = index1.rolling(window=12).mean()
rstd1 = index1.rolling(window=12).std()
print(rmean1, rstd1)
plt.plot(index1, color='black', label='Original')
plt.plot(rmean1, color='red', label='Rolling Mean')
plt.plot(rstd1, color='blue', label='Rolling Standard Deviation')
plt.legend(loc='best')
plt.title("Rolling mean and standard deviation")
plt.show(block=False)
πŸ‘ Stationarity standard deviation

Observation

We observed the following on the two datasets after applying both tests.

Augmented Dickey-Fuller test:

The result of the Dickey-Fuller test consists of values such as the test statistic, the p-value, and the critical values. For dataset 1, the test statistic (-2.26) is not less than the critical values at the 1%, 5%, and 10% levels (-3.45, -2.87, -2.57), and the p-value (0.19) is well above 0.05. In this case, we cannot reject the null hypothesis, so we treat the data as non-stationary.

For dataset 2, the test statistic (-9.57) is less than the critical values at all three levels (-3.43, -2.86, -2.57), and the p-value is close to zero. In this case, we can reject the null hypothesis and conclude that the data is stationary.
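The comparison above can be written as a short check. The statistics, p-values, and critical values below are copied from the two ADF outputs earlier in the article; `reject_null` is a hypothetical helper that applies the "statistic less than critical value" rule at each level.

```python
# Statistics, p-values and critical values copied from the two ADF outputs above
adf_results = {
    "Dataset 1 (electricity)": (-2.256990, 0.186215,
                                {"1%": -3.447631, "5%": -2.869156, "10%": -2.570827}),
    "Dataset 2 (sunspots)": (-9.567668, 2.333452e-16,
                             {"1%": -3.432694, "5%": -2.862576, "10%": -2.567321}),
}

def reject_null(stat, critical_values):
    """Return, per significance level, whether the ADF null hypothesis
    (non-stationarity) is rejected: True when stat < critical value."""
    return {level: stat < cv for level, cv in critical_values.items()}

for name, (stat, pvalue, crit) in adf_results.items():
    print(name, reject_null(stat, crit), "p-value < 0.05:", pvalue < 0.05)
# Dataset 1 (electricity) {'1%': False, '5%': False, '10%': False} p-value < 0.05: False
# Dataset 2 (sunspots) {'1%': True, '5%': True, '10%': True} p-value < 0.05: True
```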

Rolling Statistics Test:

The rolling statistics test gives a visual representation of the dataset.

For the first dataset, the rolling mean and rolling standard deviation are not constant over time, which shows that the first dataset is not stationary. For the second dataset, the rolling mean and rolling standard deviation stay roughly constant, which indicates that the second dataset is stationary.

Conclusions

Both tests can be used to check the stationarity of data. The rolling statistics test gives a pictorial representation, while the Dickey-Fuller test gives numerical values (a test statistic, a p-value, and critical values) that help determine whether the data is stationary.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author's discretion.
