URL: https://www.analyticsvidhya.com/blog/2020/02/join-dataframes-in-python/

How to Join Multiple Dataframes in Python

Gyan Prakash Tripathi Last Updated : 14 Jun, 2020
4 min read

The Challenge of Merging Multiple Dataframes in Python

Here’s a scenario that trips up almost every fresher and aspiring data scientist:

You are working on a project where data is being collected from several sources. Before you can get to the exploring and model-building part, you would need to first join these multiple datasets (in the form of tables, dataframes, etc.). How can you do this without losing any information?

This might sound like a simple scenario but it can be intimidating for a lot of newcomers, especially those who are unfamiliar with Python programming.

Drilling down further into this, I can broadly classify this into two scenarios:

  1. First, the data with similar attributes may be distributed into multiple files. For example, suppose you are provided with multiple files each of which stores the information of sales that occurred in a particular week of the year. Thus, you will have 52 files for the whole year. Each file will have the same number and names of the columns.
  2. Second, you may require combining information from multiple sources. For example, let’s say you want to get the contact information of people who have bought your products. Here you have two files – the first one with sales information and a second one with information about the customers.

I will show you how to work with both scenarios and join multiple dataframes in Python.

Understanding the Problem at Hand

I’ll take a popular and easy-to-understand example for the purpose of this article.

Let’s consider the example of examinations in a particular school. There are various subjects being taught with different teachers assigned to each subject. They update their own files regarding the student marks and overall performance. We’re talking about multiple files here!

For this article, we will use two such files that I have created to demonstrate the working of functions in Python. The first file contains data about class 12th students and the other one has data for class 10th. We will also use a third file that stores the names of students along with their Student ID.

Note: While these datasets are created from scratch, I encourage you to apply what you’ll learn on a dataset of your choice.

Step-by-Step Process for Merging Dataframes in Python

Here’s how we’ll approach this problem:

  1. Load the Datasets in Python
  2. Combine Two Similar Dataframes (Append)
  3. Combine Information from Two Dataframes (Merge)

Step 1: Loading the Datasets in Python

We will use three separate datasets in this article. First, we need to load these files into separate dataframes.

The first two dataframes contain the percentage scored by each student along with their Student ID. The first dataframe holds the marks for class 10 students, while the second holds the marks for students in the 12th standard. The third dataframe contains the names of the students along with their respective Student IDs.

We can use the ‘head’ function to check the first few rows of each dataframe:

[Image: sample dataframes]
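Since the original files are not included here, the following is only a minimal sketch that builds small stand-in dataframes in memory. The column names ‘Student ID’ and ‘Exam points’ and the dataframe name ‘IDandName’ appear in this article; the ‘Class’ and ‘Name’ columns, the variable names class10 and class12, and all the values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the three files. With real data you would
# typically load each file with pd.read_csv(...) instead.
class10 = pd.DataFrame({
    "Student ID": [1001, 1002, 1003],
    "Class": [10, 10, 10],
    "Exam points": [62, 100, 45],
})
class12 = pd.DataFrame({
    "Student ID": [2001, 2002, 2003],
    "Class": [12, 12, 12],
    "Exam points": [88, 37, 100],
})
IDandName = pd.DataFrame({
    "Student ID": [1001, 1002, 1003, 2001, 2002, 2003],
    "Name": ["Asha", "Bilal", "Chen", "Divya", "Emeka", "Farah"],
})

# head() returns the first five rows by default; pass n to change that.
for frame in (class10, class12, IDandName):
    print(frame.head(2))
```

Checking head() on every dataframe right after loading is a cheap way to confirm that the key column (here ‘Student ID’) is spelled the same way in each of them.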

Step 2: Combining Two Similar Dataframes (Append)

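Let’s combine the files of class 10th and 12th in order to find the average marks scored by the students. Here, we will use the ‘append’ function from the Pandas library. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, pd.concat does the same job: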

Output: ((50,3),(50,3),(100,3))

As you can see from the output, the append function adds the two dataframes vertically.

The resultant dataframe is allMarks. The shapes of all three dataframes are compared above.
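As a rough sketch of this step on current pandas versions, pd.concat can stand in for ‘append’. The two tiny dataframes below are made up for illustration (the ‘Class’ column and the names class10 and class12 are assumptions), so the shapes differ from the output shown above.

```python
import pandas as pd

# Small hypothetical dataframes with identical columns.
class10 = pd.DataFrame({
    "Student ID": [1001, 1002],
    "Class": [10, 10],
    "Exam points": [62, 100],
})
class12 = pd.DataFrame({
    "Student ID": [2001, 2002],
    "Class": [12, 12],
    "Exam points": [88, 37],
})

# Stack the rows of both dataframes vertically.
# ignore_index=True renumbers the resulting rows 0..n-1.
allMarks = pd.concat([class10, class12], ignore_index=True)

# Compare the shapes of the inputs and the result.
print(class10.shape, class12.shape, allMarks.shape)  # (2, 3) (2, 3) (4, 3)
```

On pandas versions older than 2.0, class10.append(class12, ignore_index=True) gives the same result.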

Next, let’s have a look at the content of ‘allMarks’ and calculate the mean:

Output: 49.74
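The mean can be computed directly on the marks column. This is a sketch on made-up values (the ‘Class’ column and the numbers are assumptions), so the result differs from the 49.74 reported above.

```python
import pandas as pd

# Hypothetical combined dataframe of both classes.
allMarks = pd.DataFrame({
    "Student ID": [1001, 1002, 2001, 2002],
    "Class": [10, 10, 12, 12],
    "Exam points": [60, 100, 80, 40],
})

# Select the marks column and take its arithmetic mean.
average = allMarks["Exam points"].mean()
print(average)  # 70.0
```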

Step 3: Combining Information from Two Dataframes (Merge)

Now, let’s say we want to find the name of the student who came first across both batches. Here, we do not need to add the dataframes vertically. Instead, we have to extend the dataframe horizontally by adding one more column for the names of the students.

To do this, we will find the maximum marks scored:

Output: 100
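A minimal sketch of finding the highest score, again on made-up values (only the ‘Exam points’ and ‘Student ID’ column names and the allMarks name come from this article):

```python
import pandas as pd

# Hypothetical combined marks for both classes.
allMarks = pd.DataFrame({
    "Student ID": [1001, 1002, 2001, 2002],
    "Exam points": [62, 100, 88, 37],
})

# max() returns the largest value in the column.
top_score = allMarks["Exam points"].max()
print(top_score)  # 100
```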

The maximum marks achieved by a student are 100. Now, we will use the ‘merge’ function to find the name of this student:

Finally, the resultant dataframe has names of students mapped along with their marks.

The merge function needs a key column (or columns) shared by the two dataframes, and we pass its name in the ‘on’ argument. If ‘on’ is left out, pandas joins on the columns that the two dataframes have in common.
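Here is a small sketch of such a merge on ‘Student ID’. The values, the ‘Class’ column, and the ‘Name’ column are assumptions for illustration; the dataframe names allMarks and IDandName follow the ones used on this page.

```python
import pandas as pd

# Hypothetical marks and name lookups sharing the "Student ID" key.
allMarks = pd.DataFrame({
    "Student ID": [1001, 1002, 2001],
    "Class": [10, 10, 12],
    "Exam points": [62, 100, 88],
})
IDandName = pd.DataFrame({
    "Student ID": [1001, 1002, 2001],
    "Name": ["Asha", "Bilal", "Divya"],
})

# Join the two dataframes side by side on the shared key column.
merged = pd.merge(allMarks, IDandName, on="Student ID")
print(merged)
```

Each row of allMarks now carries the matching name, which is what lets us look up who scored the top marks.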

Another important argument of merge is ‘how’. This specifies the type of join you want to perform on the dataframes. Here are the different join types you can perform (SQL users will be very familiar with this):

  • Inner join (performed by default if you don’t provide any argument)
  • Outer join
  • Right join
  • Left join

We can also sort the result by the join keys by setting the ‘sort’ argument to True. Together, ‘on’, ‘how’ and ‘sort’ are the most commonly used arguments while merging two dataframes.
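To make the join types concrete, the sketch below merges two made-up dataframes whose keys only partly overlap and counts the rows each ‘how’ option returns. All names and values here are hypothetical.

```python
import pandas as pd

# IDs 2 and 3 appear in both dataframes; 1 is only in marks, 4 only in names.
marks = pd.DataFrame({"Student ID": [1, 2, 3], "Exam points": [70, 85, 100]})
names = pd.DataFrame({"Student ID": [2, 3, 4], "Name": ["Bilal", "Chen", "Divya"]})

row_counts = {}
for how in ("inner", "outer", "left", "right"):
    # sort=True orders the result by the join key.
    joined = pd.merge(marks, names, on="Student ID", how=how, sort=True)
    row_counts[how] = len(joined)

print(row_counts)  # {'inner': 2, 'outer': 4, 'left': 3, 'right': 3}
```

An inner join keeps only the keys present on both sides, an outer join keeps every key, and left and right joins keep all keys from the first and second dataframe respectively. Rows without a match get NaN in the columns coming from the other side.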

Now, we will see the rows where the dataframe contains 100 ‘Exam points’:

Three students scored 100 marks, two of whom are in class 10th. Well done!
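Filtering for the top scorers can be sketched with a boolean mask. The merged dataframe below is made up so that it mirrors the result described above (three students at 100, two of them in class 10); the ‘Class’ and ‘Name’ columns and all values are assumptions.

```python
import pandas as pd

# Hypothetical merged dataframe of marks and names.
merged = pd.DataFrame({
    "Student ID": [1001, 1002, 1003, 2001, 2002],
    "Class": [10, 10, 10, 12, 12],
    "Exam points": [100, 55, 100, 100, 72],
    "Name": ["Asha", "Bilal", "Chen", "Divya", "Emeka"],
})

# Keep only the rows whose marks equal the highest score.
toppers = merged[merged["Exam points"] == merged["Exam points"].max()]
print(toppers)

# Count how many of the toppers come from each class.
print(toppers["Class"].value_counts())
```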

End Notes

Pretty straightforward, right? No need to trip yourself up over this anymore! You can go ahead and apply this to any dataset of your choice. My recommendation is to pick up the food forecasting challenge that contains 3 different files.

If you are a newcomer to Python for data science, you can enroll in this free course.

Responses From Readers

Chris

Hi Gyan, isn't the IDandName DataFrame missing from your merge operation?

Gyan Prakash Tripathi

Hi Chris, thanks for reading the article and bringing this to my attention. I missed it by mistake and have updated the article.
