Merge/Join Two Dataframes on Multiple Columns in Pandas

Last Updated : 23 Jul, 2025

When working with large datasets, it's common to combine multiple DataFrames based on multiple columns to extract meaningful insights. Pandas provides the merge() function, which enables efficient and flexible merging of DataFrames based on one or more keys. This guide will explore different ways to merge DataFrames on multiple columns, including inner, left, right and outer joins.

Example: Merging DataFrames on Multiple Columns with Different Names

Sometimes, the common columns are present but have different names. Instead of renaming them manually, we can specify the column names separately for each DataFrame using left_on and right_on.
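A minimal sketch of this case follows. The sample values and the non-key columns units_sold and unit_price are hypothetical, chosen only to illustrate how left_on and right_on pair up differently named keys.

```python
import pandas as pd

# Hypothetical sample data; only the key column names come from the text.
df1 = pd.DataFrame({
    "product_code": ["A1", "A2", "A3"],
    "store_location": ["NY", "LA", "SF"],
    "units_sold": [10, 20, 30],
})
df2 = pd.DataFrame({
    "code": ["A1", "A2", "A4"],
    "store": ["NY", "LA", "SF"],
    "unit_price": [2.5, 4.0, 3.0],
})

# Keys are paired by position: product_code <-> code, store_location <-> store.
merged = pd.merge(
    df1,
    df2,
    left_on=["product_code", "store_location"],
    right_on=["code", "store"],
    how="inner",
)
print(merged)
```

Because the key names differ, the result keeps both sets of key columns. If the duplicates are not needed, they can be dropped afterwards, for example with merged.drop(columns=["code", "store"]).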

Output (figure): Different Column Names

Explanation: The common columns are product_code in df1 and code in df2, as well as store_location in df1 and store in df2. The inner join returns only rows where both columns match in both DataFrames.

Understanding the merge() function

The merge() function in Pandas is used to combine two DataFrames based on one or more keys. The general syntax is:

import pandas as pd
merged_df = pd.merge(df1, df2, on=['column1', 'column2'], how='type_of_join')

Parameters:

df1, df2: The DataFrames to be merged.
on: A column name or a list of column names to merge on. These columns must exist in both DataFrames.
how: Specifies the type of join.
'inner' (default): Returns only matching rows.
'left': All rows from df1, with matching rows from df2.
'right': All rows from df2, with matching rows from df1.
'outer': Returns all rows from both DataFrames, filling values that are missing on either side with NaN.
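To make the effect of the how parameter concrete, here is a small sketch that runs the same two-key merge with each join type and records the number of rows returned. The column names k1, k2, x and y and all values are hypothetical.

```python
import pandas as pd

# Hypothetical data: two key pairs match, one row is unique to each side.
df1 = pd.DataFrame({"k1": [1, 2, 3], "k2": ["a", "b", "c"], "x": [10, 20, 30]})
df2 = pd.DataFrame({"k1": [1, 2, 4], "k2": ["a", "b", "d"], "y": [0.1, 0.2, 0.4]})

row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    result = pd.merge(df1, df2, on=["k1", "k2"], how=how)
    row_counts[how] = len(result)

print(row_counts)
# {'inner': 2, 'left': 3, 'right': 3, 'outer': 4}
```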

Examples

Example 1: Joining DataFrames on Multiple Matching Columns

If the column names match in both DataFrames, we can pass a list of column names in the on parameter.
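A minimal sketch of this pattern is shown below. The key columns ID and Order follow the explanation for this example; the Amount and Status columns and all values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: ID 3 appears in both frames but with a different Order.
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Order": ["A", "B", "C"],
    "Amount": [100, 200, 300],
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Order": ["A", "B", "D"],
    "Status": ["shipped", "pending", "cancelled"],
})

# how defaults to 'inner', so only rows matching on BOTH ID and Order remain.
merged = pd.merge(df1, df2, on=["ID", "Order"])
print(merged)
```

In this sketch the row with ID 3 is dropped, because its Order value differs between the two DataFrames even though the ID matches.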

Output (figure): Joining DataFrames on Multiple Columns Using Matching Column Names

Explanation: Since ID and Order exist in both DataFrames, they are used together as the join keys. The inner join keeps only the rows where both ID and Order match in both DataFrames.

Example 2: Adding Suffixes for Overlapping Column Names

When merging, other columns may have the same names in both DataFrames. Pandas automatically appends _x and _y to distinguish them. We can customize the suffix names using the suffixes parameter.
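The following sketch contrasts the default suffixes with custom ones. The overlapping non-key column Value and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical data: 'Value' exists in both frames but is not a join key.
df1 = pd.DataFrame({"ID": [1, 2], "Order": ["A", "B"], "Value": [10, 20]})
df2 = pd.DataFrame({"ID": [1, 2], "Order": ["A", "B"], "Value": [100, 200]})

# Default behaviour: overlapping columns become Value_x and Value_y.
default = pd.merge(df1, df2, on=["ID", "Order"])

# Custom suffixes make the origin of each column explicit.
custom = pd.merge(df1, df2, on=["ID", "Order"], suffixes=("_df1", "_df2"))

print(list(default.columns))
print(list(custom.columns))
```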

Output (figure): Adding Suffixes for Overlapping Column Names

Explanation: Passing suffixes=('_df1', '_df2') differentiates the overlapping column names, which prevents confusion when both DataFrames have non-key columns with identical names.

Types of joins

The examples below show how merge() behaves with each join type when merging on multiple columns.

1. Inner Join on Multiple columns

Consider two DataFrames that share the columns product_code and store_location. We merge them on both columns using an inner join.
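A minimal sketch of such an inner join is given below. The key column names come from the text; the units_sold and stock columns and all values are hypothetical.

```python
import pandas as pd

# Hypothetical data: P2 exists in both frames, but at different locations.
df1 = pd.DataFrame({
    "product_code": ["P1", "P2", "P3"],
    "store_location": ["Delhi", "Mumbai", "Pune"],
    "units_sold": [5, 8, 12],
})
df2 = pd.DataFrame({
    "product_code": ["P1", "P2", "P3"],
    "store_location": ["Delhi", "Pune", "Pune"],
    "stock": [40, 25, 60],
})

# Rows survive only if product_code AND store_location both match.
merged = pd.merge(df1, df2, on=["product_code", "store_location"], how="inner")
print(merged)
```

Here P2 is excluded even though its product_code appears in both DataFrames, since the store_location values differ.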

Output (figure): Inner Join on Multiple Columns

Explanation: In this case, we merge two DataFrames (df1 and df2) using the common columns product_code and store_location. The inner join keeps only the rows where both columns match in both DataFrames.

2. Left Join on multiple columns

Consider two DataFrames that share the columns EmpID and Dep. We merge them on these two columns using a left join.
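A minimal sketch of this left join follows. The two DataFrames are an assumption: they are constructed to be consistent with the output shown for this example rather than taken from the original code.

```python
import pandas as pd

# Assumed input data, reconstructed from the output of this example.
df1 = pd.DataFrame({
    "EmpID": [101, 102, 103],
    "Dep": ["HR", "Finance", "IT"],
    "Salary": [70000, 80000, 90000],
})
df2 = pd.DataFrame({
    "EmpID": [101, 102, 104],
    "Dep": ["HR", "Finance", "IT"],
    "Bonus": [5000, 6000, 7000],
})

# Keep every row of df1; Bonus is NaN where df2 has no matching (EmpID, Dep).
merged = pd.merge(df1, df2, on=["EmpID", "Dep"], how="left")
print(merged)
```

Note that Bonus becomes a float column in the result, because NaN cannot be stored in a plain integer column.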

Output

   EmpID      Dep  Salary   Bonus
0    101       HR   70000  5000.0
1    102  Finance   80000  6000.0
2    103       IT   90000     NaN

Explanation: Here, we merge df1 and df2 on EmpID and Dep using a left join. This means all rows from the left DataFrame (df1) are retained, and if a match is found in df2, the corresponding values are added. If there is no match in df2, the new columns contain NaN (missing values).

3. Right Join on multiple columns

Consider the same kind of DataFrames, sharing the columns EmpID and Dep. We merge them on these two columns using a right join.
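A minimal sketch of this right join follows. As before, the input DataFrames are an assumption, built to be consistent with the output shown for this example.

```python
import pandas as pd

# Assumed input data, reconstructed from the output of this example.
df1 = pd.DataFrame({
    "EmpID": [101, 102, 103],
    "Dep": ["HR", "Finance", "IT"],
    "Salary": [70000, 80000, 90000],
})
df2 = pd.DataFrame({
    "EmpID": [101, 102, 104],
    "Dep": ["HR", "Finance", "IT"],
    "Bonus": [5000, 6000, 7000],
})

# Keep every row of df2; Salary is NaN where df1 has no matching (EmpID, Dep).
merged = pd.merge(df1, df2, on=["EmpID", "Dep"], how="right")
print(merged)
```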

Output

   EmpID      Dep   Salary  Bonus
0    101       HR  70000.0   5000
1    102  Finance  80000.0   6000
2    104       IT      NaN   7000

Explanation: Similar to the left join, but this time a right join is performed. All rows from the right DataFrame (df2) are retained, and only matching rows from df1 are included. If there is no match in df1, the missing values are filled with NaN.

4. Outer Join on multiple columns

Consider the same kind of DataFrames, sharing the columns EmpID and Dep. We merge them on these two columns using an outer join.
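A minimal sketch of this outer join follows. The input DataFrames are again an assumption, built to be consistent with the output shown for this example.

```python
import pandas as pd

# Assumed input data, reconstructed from the output of this example.
df1 = pd.DataFrame({
    "EmpID": [101, 102, 103],
    "Dep": ["HR", "Finance", "IT"],
    "Salary": [70000, 80000, 90000],
})
df2 = pd.DataFrame({
    "EmpID": [101, 102, 104],
    "Dep": ["HR", "Finance", "IT"],
    "Bonus": [5000, 6000, 7000],
})

# Keep rows from both frames; unmatched sides are filled with NaN.
merged = pd.merge(df1, df2, on=["EmpID", "Dep"], how="outer")
print(merged)
```

Employees 103 and 104 both belong to the IT department, but they remain separate rows because the join requires EmpID and Dep to match together.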

Output

   EmpID      Dep   Salary   Bonus
0    101       HR  70000.0  5000.0
1    102  Finance  80000.0  6000.0
2    103       IT  90000.0     NaN
3    104       IT      NaN  7000.0

Explanation: The outer join keeps all rows from both DataFrames (df1 and df2), merging based on common columns EmpID and Dep. If a match is found, corresponding values are added. If a row is present in only one DataFrame, NaN is used for missing values in the other DataFrame.

Benefits of joining dataframes on multiple columns

Merging DataFrames based on multiple columns has several benefits:

Improves Data Integrity: Ensures accurate matching when a single column is insufficient.
Reduces Incorrect Matches: Using multiple keys prevents rows that agree on only one key from being paired, which avoids duplicated or mismatched records.
Enables Complex Data Analysis: Merging on multiple columns allows deeper insights.
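The data integrity point can be illustrated with a short sketch. The orders and payments DataFrames and their columns are hypothetical; the example only shows how a single non-unique key can pair unrelated rows, while a second key restores one-to-one matching.

```python
import pandas as pd

# Hypothetical data: one customer with two orders on different dates.
orders = pd.DataFrame({
    "customer_id": [1, 1],
    "order_date": ["2024-01-05", "2024-02-10"],
    "amount": [50, 75],
})
payments = pd.DataFrame({
    "customer_id": [1, 1],
    "order_date": ["2024-01-05", "2024-02-10"],
    "paid": [50, 75],
})

# Single key: every order pairs with every payment of the same customer.
single = pd.merge(orders, payments, on="customer_id")

# Two keys: each order pairs only with the payment from the same date.
multi = pd.merge(orders, payments, on=["customer_id", "order_date"])

print(len(single), len(multi))
# 4 2
```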


URL: https://www.geeksforgeeks.org/pandas/merge-join-two-dataframes-on-multiple-columns-in-pandas/