
URL: https://www.analyticsvidhya.com/blog/2021/04/how-to-reduce-memory-usage-in-python-pandas/

How to reduce memory usage in Python (Pandas)?



Vishesh Last Updated : 24 Apr, 2021
4 min read
This article was published as a part of the Data Science Blogathon.

Introduction

Python is one of the most widely used programming languages for Data Science, Data Analytics, and Machine Learning. It is popular because it is easy for beginners to pick up, has a large online community of learners, and offers powerful data-centric libraries (such as Pandas, NumPy, and Matplotlib) that make it easier to manage and manipulate large amounts of data. Python has become a go-to language for Data Scientists and Data Analysts.

The Pandas library in Python lets us store tabular data in a data structure called a dataframe. A pandas dataframe can hold a large amount of tabular data and makes it easy to access that data using row and column indices. We can store data with hundreds of columns (fields) and thousands of rows (records).

When dealing with a large amount of data, we have to be careful about how we use memory. Running short of memory is a common problem with large datasets. If the available RAM is used up, the program can crash with a MemoryError, which can be tricky to handle. Limiting memory usage becomes important in this case. Reducing memory usage can also speed up computation and save time.

The info() method in Pandas tells us how much memory a particular dataframe takes up. To get an accurate figure, we can pass memory_usage="deep" to the info() method. This gives us the total memory taken up by the pandas dataframe, including the strings stored in object columns.
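As a minimal sketch of this call, the snippet below builds a tiny stand-in dataframe (the values and the variable names df, buffer, and report are made up for illustration; they are not the article's dataset) and captures the info() report as text. The string columns are created with an explicit object dtype so the behaviour matches the description above.

```python
import io

import pandas as pd

# Hypothetical stand-in data, not the dataset used in the article.
df = pd.DataFrame({
    "age": pd.Series([24, 53, 23, 33], dtype="int64"),
    "gender": pd.Series(["M", "F", "M", "F"], dtype="object"),
    "occupation": pd.Series(
        ["technician", "other", "writer", "executive"], dtype="object"
    ),
})

# memory_usage="deep" asks info() to measure the strings held in object
# columns rather than estimating from the column dtypes alone.
buffer = io.StringIO()
df.info(memory_usage="deep", buf=buffer)
report = buffer.getvalue()

# The last line of the report states the total memory usage.
print(report)
```

Calling df.info(memory_usage="deep") without the buf argument prints the same report directly; the buffer is only used here so the text can be inspected afterwards.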

However, the info() method only reports the total memory used by the dataframe, not a breakdown. For a more detailed overview, we can use the memory_usage() method, which returns a Pandas series listing the space taken up by each column (and by the index) in bytes. Passing deep=True to memory_usage() makes it count the actual contents of object columns, so the reported figures reflect the full memory usage of each column.
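The difference between the default and deep=True can be seen in a short sketch like the one below. The dataframe is again a made-up stand-in, and the names shallow, per_column, and total_bytes are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in data, not the dataset used in the article.
df = pd.DataFrame({
    "age": pd.Series([24, 53, 23, 33], dtype="int64"),
    "gender": pd.Series(["M", "F", "M", "F"], dtype="object"),
})

# Default: object columns are counted as pointers only (8 bytes each on
# a 64-bit build), so string contents are not included.
shallow = df.memory_usage()

# deep=True also counts the Python string objects the pointers refer to.
per_column = df.memory_usage(deep=True)

# Summing the series gives the total for the whole dataframe, in bytes.
total_bytes = per_column.sum()

print(shallow)
print(per_column)
print("total bytes:", total_bytes)
```

The numeric column reports the same size either way; only the object column grows when deep=True is used.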

In general, columns with the object datatype (gender, occupation, and zip code in our data) take up a lot of space because they store strings, which need more memory than integers and floating-point numbers. Having columns with the object datatype can increase memory usage significantly.

To get around this, we can change the datatype of certain object columns to category. For instance, the gender column in our data takes only 2 values, either M or F. Thus, it makes sense to change the datatype of the gender column from object to category. This reduces the space taken up by the gender column.

When the datatype of the gender column is changed to category, the gender records are stored as integer codes instead of strings. These integer codes in turn refer to the string values, either M or F. Since integers take up less space than strings, the memory usage comes down significantly. The dataframe may look the same on the surface, but the way it stores data internally has changed. The space taken up by the gender column goes down from 58,466 bytes to 1,147 bytes, a 98% reduction.
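A small sketch of this conversion is shown below, using a made-up gender column of 1,000 rows rather than the article's data, so the byte counts will differ from the figures quoted above. The variable names are illustrative.

```python
import pandas as pd

# Hypothetical gender column with only two distinct values.
gender = pd.Series(["M", "F"] * 500, dtype="object")
before = gender.memory_usage(deep=True)

# Convert to category: each row now holds a small integer code that
# points into a short list of distinct string values.
gender_cat = gender.astype("category")
after = gender_cat.memory_usage(deep=True)

print("categories:", gender_cat.cat.categories.tolist())
print("first codes:", gender_cat.cat.codes.head().tolist())
print("codes dtype:", gender_cat.cat.codes.dtype)
print("bytes before:", before, "bytes after:", after)
```

The values read back from gender_cat are the same strings as before; only the internal storage has changed.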

Similarly, we can change the data type of other object columns in our dataframe. This works best for columns that have few distinct values relative to the number of rows. It can reduce memory usage to a large extent and help avoid a MemoryError in our program.
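One possible way to apply this to several columns at once is sketched below. The helper convert_low_cardinality and its max_unique_ratio threshold are hypothetical choices made for this example, not part of pandas or of the original article, and the data is made up.

```python
import pandas as pd


def convert_low_cardinality(df, max_unique_ratio=0.5):
    """Return a copy in which repetitive object columns become category.

    max_unique_ratio is an arbitrary illustrative threshold: a column is
    converted only if its distinct values make up less than this share
    of the rows.
    """
    out = df.copy()
    if len(out) == 0:
        return out
    for col in out.select_dtypes(include="object").columns:
        if out[col].nunique() / len(out) < max_unique_ratio:
            out[col] = out[col].astype("category")
    return out


# Hypothetical stand-in data: gender repeats, zip_code does not.
users = pd.DataFrame({
    "gender": pd.Series(["M", "F", "M", "F", "M", "F"], dtype="object"),
    "zip_code": pd.Series(
        ["85711", "94043", "32067", "43537", "15213", "98101"],
        dtype="object",
    ),
})

small = convert_low_cardinality(users)
print(small.dtypes)
```

A column whose values are almost all unique, such as zip_code here, is left alone because a category column has to store every distinct string anyway and would save little or nothing.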

Another way to reduce the memory used by columns that store only numerical values is to choose the data type according to the range of values. For example, in our data, the minimum and maximum values of age are 7 and 73 respectively. This range can easily be represented by an 8-bit binary number. So, instead of storing age as a 64-bit integer, which is the default in most newer versions of Pandas, we can store it as an 8-bit integer. Because fewer bits are needed to store each value, the memory usage comes down.

A signed 8-bit integer can range between -128 and +127 (in 2's complement representation), which is sufficient for the age column in our dataframe. This results in a significant reduction in the memory taken up by the age column.

When the datatype of the age column is converted from int64 to int8, the space taken up by the column goes down from 7,544 bytes to 943 bytes, an 87.5% reduction.
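The conversion can be sketched as follows, with a range check before the cast so that out-of-range values are caught rather than silently wrapped. The four ages are made-up sample values, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical ages spanning the same range as the article's data (7 to 73).
age = pd.Series([7, 24, 53, 73], dtype="int64")

# Confirm every value fits in int8 before converting.
limits = np.iinfo(np.int8)
if age.min() < limits.min or age.max() > limits.max:
    raise ValueError("age values do not fit in int8")

age_small = age.astype("int8")

# Each value now takes 1 byte instead of 8.
print(age.nbytes, age_small.nbytes)  # 32 4
```

The ratio of 8 to 1 is the same 87.5% reduction described above, independent of the number of rows.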

We can also change the datatype from int64 to int16 or int32. While int16 supports a range of -32,768 to +32,767, int32 supports a much larger range of numbers, from -2,147,483,648 to +2,147,483,647. We can choose int8, int16, or int32 depending on the range of values.

The table below lists the entire range of values that can be represented by the different integer data types:

Data type   Maximum value          Minimum value
int8        127                    -128
int16       32767                  -32768
int32       2147483647             -2147483648
int64       9223372036854775807    -9223372036854775808
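These limits can be checked with NumPy's iinfo, and pandas can pick the smallest signed integer type that fits a column through the downcast option of pd.to_numeric. The snippet below is a short sketch with made-up values; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Print the representable range of each signed integer type.
for dtype in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(dtype)
    print(f"{info.dtype}: {info.min} to {info.max}")

# Hypothetical values; the largest one is too big for int16.
values = pd.Series([7, 300, 70000], dtype="int64")

# downcast="integer" selects the smallest signed integer dtype that can
# hold every value in the series.
smallest = pd.to_numeric(values, downcast="integer")
print(smallest.dtype)  # int32, because 70000 does not fit in int16
```

Letting pandas choose the type avoids having to look up the range by hand, at the cost of the dtype depending on whatever values happen to be in the data.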

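Similarly, we can change the data type of columns holding floating-point numbers. A change in datatype from float64 to float32 or float16 results in a significant reduction in space, but it also reduces precision, so it should only be used when the values can tolerate the rounding.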
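A quick way to judge whether a smaller float type is acceptable is to convert and measure the rounding error, as in the sketch below. The values and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical float column; the last value is large enough to expose
# the limited precision of float16.
ratings = pd.Series([3.5, 4.25, 0.1, 12345.678], dtype="float64")

as32 = ratings.astype("float32")
as16 = ratings.astype("float16")

# Largest absolute difference from the original float64 values.
err32 = (as32.astype("float64") - ratings).abs().max()
err16 = (as16.astype("float64") - ratings).abs().max()

print("bytes:", ratings.nbytes, as32.nbytes, as16.nbytes)
print("max error float32:", err32)
print("max error float16:", err16)
```

float16 uses a quarter of the memory of float64, but its error on the large value is far bigger than that of float32, which is why float32 is often the safer first step.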

In this blog post, we have learned about 2 methods in pandas that tell us about the memory taken up by a dataframe: the info() method and the memory_usage() method. We also looked at two ways to reduce the memory used by a pandas dataframe. The first way is to change the data type of an object column to category when the column holds categorical data. This does not affect the way the dataframe looks but reduces the memory usage significantly.

The second way is to change the data type of numerical columns in a dataframe based on the range of values. This works for columns storing either integers or floating-point numbers.

You can also refer to the YouTube video linked below for a deeper look at the same methods for reducing the memory taken up by a pandas dataframe.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author's discretion.
