
URL: https://www.geeksforgeeks.org/data-analysis/panel-data-analysis-in-statsmodels/

Panel Data Analysis in StatsModels - GeeksforGeeks



Panel Data Analysis in StatsModels

Last Updated : 30 Jun, 2025

Panel data (also known as longitudinal or cross-sectional time-series data) consists of observations on multiple entities (such as individuals, firms, or states) tracked over time. This data structure allows analysts to:

  • Control for unobserved individual heterogeneity
  • Study dynamic behaviors and trends
  • Improve the efficiency of econometric estimates

Panel data analysis is widely used in economics, social sciences, and business research for its ability to provide richer information compared to purely cross-sectional or time-series data.

Types of Panel Data Models

The main models used in panel data analysis are:

  • Pooled OLS Regression: Ignores the panel structure and treats all observations as independent.
  • Fixed Effects Model (FE): Controls for time-invariant characteristics by using entity-specific intercepts.
  • Random Effects Model (RE): Assumes entity-specific effects are random and uncorrelated with regressors.
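In equation form, the three specifications can be summarized as follows. The notation is assumed here rather than taken from the original: $y_{it}$ is the outcome for entity $i$ in period $t$, $x_{it}$ is a regressor, and $\varepsilon_{it}$ is an idiosyncratic error.

```latex
\begin{aligned}
\text{Pooled OLS:}\quad     & y_{it} = \alpha + \beta x_{it} + \varepsilon_{it} \\
\text{Fixed effects:}\quad  & y_{it} = \alpha_i + \beta x_{it} + \varepsilon_{it} \\
\text{Random effects:}\quad & y_{it} = \alpha + \beta x_{it} + u_i + \varepsilon_{it},
  \qquad u_i \sim (0, \sigma_u^2), \quad \operatorname{Cov}(u_i, x_{it}) = 0
\end{aligned}
```

The entity intercepts $\alpha_i$ in the fixed-effects model may be correlated with $x_{it}$; the random-effects model instead requires $u_i$ to be uncorrelated with $x_{it}$, which is what allows it to use variation between entities as well as within them.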

Panel Data Analysis with StatsModels

While StatsModels does not have a dedicated high-level panel data API, it supports panel analysis through:

  • Pooled OLS: Standard OLS regression
  • Fixed Effects: By including entity and/or time dummy variables in an OLS regression (the least-squares dummy variable approach)
  • Random Effects: Using MixedLM for random intercepts

Step-by-Step Implementation

1. Import Required Libraries

  • import pandas as pd: for data manipulation and DataFrame operations.
  • import numpy as np: for numerical operations and random number generation.
  • import statsmodels.api as sm: for the array-based model interface (such as sm.OLS).
  • import statsmodels.formula.api as smf: for formula-based models (such as smf.ols and smf.mixedlm).

2. Simulate Panel Data

A balanced panel dataset is created with 5 states observed for 10 years each (50 rows), containing income (independent variable) and violent (dependent variable):
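The article's exact simulation is not reproduced here, so the sketch below uses an assumed data-generating process: each state gets a fixed, unobserved shift (`effects`), income is drawn independently of that shift, and violent depends on both. The state labels (State_1 to State_5), the years 2010 to 2019, the seed, and the coefficients (intercept 20, slope 0.5) are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data-generating process (not the article's exact one):
#   violent_it = 20 + effect_i + 0.5 * income_it + noise_it
rng = np.random.default_rng(42)                 # fixed seed for reproducibility
states = [f"State_{i}" for i in range(1, 6)]    # 5 hypothetical states
n_years = 10                                    # years 2010..2019

# One time-invariant shift per state: the unobserved heterogeneity
effects = dict(zip(states, rng.normal(0, 5, size=len(states))))

df = pd.DataFrame({
    "state": np.repeat(states, n_years),        # State_1 x10, State_2 x10, ...
    "year": np.tile(np.arange(2010, 2010 + n_years), len(states)),
    "income": rng.normal(50, 10, size=len(states) * n_years),
})
df["violent"] = (20 + df["state"].map(effects) + 0.5 * df["income"]
                 + rng.normal(0, 2, size=len(df)))

print(df.shape)   # (50, 4): 5 states x 10 years, 4 columns
print(df.head())
```

Because income is generated independently of the state shifts, this particular dataset satisfies the random-effects assumption by construction.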

3. Set Panel Structure

Set a MultiIndex of (state, year) to organize the data for panel analysis. This is not strictly required for modeling, but it is good practice:
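A sketch of this step with pandas `set_index`, placing the entity level first and the time level second. The small stand-in panel at the top (hypothetical state names, seed, and values) is only there so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in panel: 5 states x 10 years
rng = np.random.default_rng(42)
states = [f"State_{i}" for i in range(1, 6)]
df = pd.DataFrame({
    "state": np.repeat(states, 10),
    "year": np.tile(np.arange(2010, 2020), 5),
    "income": rng.normal(50, 10, size=50),
})
df["violent"] = 20 + 0.5 * df["income"] + rng.normal(0, 2, size=50)

# Entity level first, time level second: the usual (i, t) ordering
panel = df.set_index(["state", "year"]).sort_index()

print(list(panel.index.names))
print(panel.loc["State_1"].head())                      # one state's time series
print(panel.groupby(level="state")["income"].mean())    # per-state averages
```

Since the statsmodels formula functions work on ordinary columns, the model fits can use `df` directly (or `panel.reset_index()`).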

4. Pooled OLS Regression (Baseline)

This model ignores the panel structure and treats all observations as independent:
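A minimal pooled OLS sketch with `smf.ols`, fitted on the same hypothetical simulated panel (assumed state names, seed, and coefficients). The clustered standard errors in the second half are an optional addition, not part of the original: rows from the same state share one unobserved effect, so they are not truly independent. With only 5 clusters, clustered standard errors are themselves imprecise.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: 5 states x 10 years with a state-specific shift
rng = np.random.default_rng(42)
states = [f"State_{i}" for i in range(1, 6)]
effects = dict(zip(states, rng.normal(0, 5, size=5)))
df = pd.DataFrame({
    "state": np.repeat(states, 10),
    "year": np.tile(np.arange(2010, 2020), 5),
    "income": rng.normal(50, 10, size=50),
})
df["violent"] = (20 + df["state"].map(effects) + 0.5 * df["income"]
                 + rng.normal(0, 2, size=50))

# Pooled OLS: one common intercept, every row treated as independent
pooled = smf.ols("violent ~ income", data=df).fit()
print(pooled.summary())

# Optional: standard errors clustered by state (integer group codes)
pooled_cl = smf.ols("violent ~ income", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": pd.factorize(df["state"])[0]}
)
print(pooled_cl.bse)
```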

Output

Pooled OLS Regression

5. Fixed Effects Model (Entity Dummies Approach)

This model controls for unobserved, time-invariant differences between entities (states) by adding state dummy variables:
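A sketch of the entity-dummy (least-squares dummy variable) approach using the formula API, on the same hypothetical simulated panel. `C(state)` expands `state` into dummy variables and drops one state as the reference category. The second part is an addition: it demeans both variables within each state (the "within" transformation) and checks that the slope matches the dummy-variable estimate, which is a standard algebraic result (Frisch-Waugh-Lovell).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: 5 states x 10 years with a state-specific shift
rng = np.random.default_rng(42)
states = [f"State_{i}" for i in range(1, 6)]
effects = dict(zip(states, rng.normal(0, 5, size=5)))
df = pd.DataFrame({
    "state": np.repeat(states, 10),
    "year": np.tile(np.arange(2010, 2020), 5),
    "income": rng.normal(50, 10, size=50),
})
df["violent"] = (20 + df["state"].map(effects) + 0.5 * df["income"]
                 + rng.normal(0, 2, size=50))

# LSDV: common slope on income plus one intercept shift per state
fe = smf.ols("violent ~ income + C(state)", data=df).fit()
print(fe.params)

# Within estimator: subtract each state's mean, then regress without an intercept
cols = ["income", "violent"]
demeaned = df[cols] - df.groupby("state")[cols].transform("mean")
within = smf.ols("violent ~ income - 1", data=demeaned).fit()

print("LSDV slope:  ", fe.params["income"])
print("Within slope:", within.params["income"])
```

The two slopes agree, but the naive within regression reports slightly smaller standard errors because it does not count the 5 state means it absorbed as estimated parameters; the dummy-variable fit gets the degrees of freedom right.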

Output

Fixed Effects Model

6. Random Effects Model (Mixed Linear Model)

This model treats the state effects as random draws from a common distribution and assumes they are uncorrelated with the regressors:
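A minimal random-intercept sketch using `smf.mixedlm`, with `groups` set to the state column so that each state receives its own random intercept. The data come from the same hypothetical simulation (assumed names, seed, and coefficients), in which income is drawn independently of the state effects. With only five groups, the variance of the random intercepts is estimated imprecisely, and statsmodels may emit a convergence warning.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: 5 states x 10 years with a state-specific shift
rng = np.random.default_rng(42)
states = [f"State_{i}" for i in range(1, 6)]
effects = dict(zip(states, rng.normal(0, 5, size=5)))
df = pd.DataFrame({
    "state": np.repeat(states, 10),
    "year": np.tile(np.arange(2010, 2020), 5),
    "income": rng.normal(50, 10, size=50),
})
df["violent"] = (20 + df["state"].map(effects) + 0.5 * df["income"]
                 + rng.normal(0, 2, size=50))

# Random intercept per state; income enters with a common (fixed) slope
re_fit = smf.mixedlm("violent ~ income", data=df, groups=df["state"]).fit()

print(re_fit.summary())
print(re_fit.fe_params)   # fixed part: Intercept and income slope
print(re_fit.cov_re)      # estimated variance of the state random intercepts
```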

Output

Random Effects Model

You can download the complete source code here: Panel Data Analysis in StatsModels
