Python | Box-Cox Transformation

Last Updated : 25 Aug, 2025

Many statistical methods assume that data is normally distributed, yet real-world data is often skewed. The Box-Cox transformation, introduced by statisticians George Box and David Cox, offers a systematic way to address this problem. It transforms non-normal data into a form that better approximates normality, helps stabilize variance and makes methods that rely on normality assumptions more applicable.

The Box-Cox transformation belongs to a family of power transformations that adjust the shape of a dataset's distribution. It is particularly useful for positively skewed data such as financial metrics, biological measurements or time-to-event data.

Mathematical Definition

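The transformation is mathematically defined as:

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\ln(y), & \lambda = 0
\end{cases}
$$

Where:

  • y is the original data point, which must be strictly positive.
  • λ is the transformation parameter.

For λ = 0, the transformation is equivalent to the natural logarithm. For other values of λ, it performs a power transformation that is shifted by 1 and scaled by λ.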
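The two branches of the definition can be checked with a minimal sketch. The helper name box_cox_manual and the sample values are illustrative assumptions, not part of the original article; the comparison uses scipy.stats.boxcox with a fixed lmbda, in which case it returns only the transformed array.

```python
import numpy as np
from scipy import stats


def box_cox_manual(y, lam):
    """Apply the Box-Cox formula element-wise for a fixed lambda (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return np.log(y)             # lambda = 0 branch: natural logarithm
    return (y**lam - 1.0) / lam      # lambda != 0 branch: scaled power transform


values = np.array([0.5, 1.0, 2.0, 4.0])   # illustrative values only

manual_half = box_cox_manual(values, 0.5)
scipy_half = stats.boxcox(values, lmbda=0.5)   # fixed lambda: no estimation step
manual_log = box_cox_manual(values, 0.0)
scipy_log = stats.boxcox(values, lmbda=0.0)

# Both comparisons check the hand-written formula against SciPy's result
print(np.allclose(manual_half, scipy_half))
print(np.allclose(manual_log, scipy_log))
```

Writing the formula out by hand is only meant to make the definition concrete; in practice the library function is the more convenient choice.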

Selecting the Optimal λ

The objective is to find the value of λ that makes the transformed data as close to normal as possible. This is done by maximizing the log-likelihood function of the transformed data under a normal model, a search that computational tools automate.

Python's scipy.stats module provides the boxcox function, which:

  • Applies the Box-Cox transformation.
  • Estimates the optimal λ using numerical optimization.
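To make the likelihood-based selection concrete, the sketch below evaluates the Box-Cox log-likelihood on a grid of candidate λ values with scipy.stats.boxcox_llf and compares the best grid point with the estimate returned by boxcox. The grid range of -2 to 2, the step size and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

np.random.seed(0)
data = np.random.exponential(scale=2, size=1000)   # right-skewed, strictly positive

# Evaluate the Box-Cox log-likelihood on a coarse grid of candidate lambdas
candidates = np.linspace(-2, 2, 401)               # assumed search range, step 0.01
llf = np.array([stats.boxcox_llf(lam, data) for lam in candidates])
lambda_grid = candidates[np.argmax(llf)]

# With lmbda left unset, boxcox estimates lambda by maximizing the same objective
_, lambda_opt = stats.boxcox(data)

print(f"Grid-search lambda: {lambda_grid:.2f}")
print(f"boxcox estimate:    {lambda_opt:.4f}")
```

The grid search is only a teaching device; the optimizer used by boxcox reaches a comparable value without scanning every candidate.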

Applying the Box-Cox Transformation in Python

We will use a sample dataset from an exponential distribution (which is right-skewed) to demonstrate the process.

1. Import Required Libraries

Here we import the required Python libraries:

  • numpy: Generates sample data.
  • scipy.stats: Provides the boxcox function for the transformation.
  • matplotlib: Used to visualize the distributions before and after transformation.
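A minimal sketch of the imports for this walkthrough; the aliases np, stats and plt are conventional choices assumed here rather than something the article prescribes.

```python
import numpy as np                 # sample data generation
from scipy import stats            # stats.boxcox performs the transformation
import matplotlib.pyplot as plt    # histograms before and after the transformation
```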

2. Generate Right-Skewed Data

  • np.random.seed(0): Ensures reproducibility of results.
  • np.random.exponential(scale=2, size=1000): Generates 1000 data points from an exponential distribution with scale parameter = 2.
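The two calls above can be combined as in the following sketch. The variable name data is an assumption, and the printed checks are only there to confirm that the sample is strictly positive and right-skewed (mean above median).

```python
import numpy as np

np.random.seed(0)                                   # reproducible draws
data = np.random.exponential(scale=2, size=1000)    # 1000 right-skewed values

print(data.shape)
print(data.min() > 0)                 # Box-Cox needs strictly positive input
print(data.mean() > np.median(data))  # a mean above the median suggests right skew
```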

3. Apply the Box-Cox Transformation and Estimate λ

  • transformed_data: The transformed version of the original dataset.
  • lambda_opt: The estimated optimal λ value.
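A sketch of this step, assuming the sample is stored in a variable named data as in the previous step. When no lmbda argument is passed, scipy.stats.boxcox returns both the transformed array and the fitted λ.

```python
import numpy as np
from scipy import stats

np.random.seed(0)
data = np.random.exponential(scale=2, size=1000)

# No lmbda given: boxcox fits lambda and returns (transformed array, lambda)
transformed_data, lambda_opt = stats.boxcox(data)

print(transformed_data.shape)   # same length as the input
```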

4. Print the Optimal λ

Displays the value of λ that best normalizes the dataset.
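A self-contained sketch of the print step, repeating the earlier steps so that it runs on its own; the label text in the print call is an assumption modeled on the output shown in this section.

```python
import numpy as np
from scipy import stats

np.random.seed(0)
data = np.random.exponential(scale=2, size=1000)
transformed_data, lambda_opt = stats.boxcox(data)

# Report the fitted transformation parameter
print("Optimal lambda:", lambda_opt)
```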

Output:

Optimal lambda: 0.24201319421740217

5. Visualize the Original and Transformed Data

  • Original Data (left): Shows a strong right skew.
  • Transformed Data (right): Displays a more symmetric, bell-shaped distribution after transformation.
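One way to draw the two panels is sketched below with side-by-side histograms. The figure size, bin count, colors and axis labels are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(0)
data = np.random.exponential(scale=2, size=1000)
transformed_data, lambda_opt = stats.boxcox(data)

# Two panels: original data on the left, transformed data on the right
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(12, 5))

ax_left.hist(data, bins=30, color="steelblue", edgecolor="black")
ax_left.set_title("Original Data")
ax_left.set_xlabel("Value")
ax_left.set_ylabel("Frequency")

ax_right.hist(transformed_data, bins=30, color="seagreen", edgecolor="black")
ax_right.set_title("Transformed Data")
ax_right.set_xlabel("Value")
ax_right.set_ylabel("Frequency")

fig.tight_layout()
plt.show()
```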

Output:

Figure: Box-Cox plot

Limitations of Box-Cox Transformation

  • Works only with positive data: The transformation cannot be applied to zero or negative values. A common solution is to add a positive constant to shift all values into the positive range.
  • Sensitive to outliers: Extreme values can heavily influence the optimal λ, leading to less effective transformations.
  • Reduced interpretability: The transformed data may lose the meaning of the original units, making it harder to interpret results.
  • No guarantee of perfect normality: It reduces skewness but datasets with multiple modes or highly irregular patterns may still deviate from a normal distribution.
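The shifting workaround from the first limitation can be sketched as follows. The sample values and the choice of shift (moving the minimum to exactly 1) are hypothetical; the size of the constant is a judgment call, and it affects the λ that gets fitted.

```python
import numpy as np
from scipy import stats

raw = np.array([-1.5, 0.0, 0.3, 1.2, 2.8, 7.4, 15.0])   # illustrative values only

# boxcox rejects input that contains zero or negative values
try:
    stats.boxcox(raw)
except ValueError as exc:
    print("boxcox failed:", exc)

# Hypothetical choice of constant: shift so the smallest value becomes 1
shift = 1.0 - raw.min()
shifted = raw + shift

transformed, lam = stats.boxcox(shifted)
print("Shift applied:", shift)
print("Fitted lambda on shifted data:", lam)
```

Because the shift is part of the transformation, the same constant has to be recorded and reused whenever new data is transformed or results are mapped back to the original scale.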