The merge() function is designed to merge two DataFrames based on one or more columns with matching values. The basic idea is to identify columns that contain common data between the DataFrames and use them to align rows.
This article walks through joining two pandas DataFrames with merge(), covering the key concepts, the main parameters, and practical examples of each join type.
If the column names are the same in both tables, you just need to use on to specify that column name. For example:
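A minimal sketch of such a merge, assuming two small hypothetical tables that share an ID column. The input values (including the unmatched rows) are illustrative assumptions; only the merged result shown after the code comes from the example itself.

```python
import pandas as pd

# Hypothetical input tables; ID is the only column they share.
df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Emily", "Jack", "Sara"]})
df2 = pd.DataFrame({"ID": [1, 2, 4], "Age": [24, 27, 31]})

# on="ID" names the key column present in both DataFrames.
# how defaults to "inner", so only IDs found in both tables are kept.
merged = pd.merge(df1, df2, on="ID")
print(merged)
```

IDs 3 and 4 appear in only one of the assumed tables, so they are dropped from the result.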
ID Name Age
0 1 Emily 24
1 2 Jack 27
This example performs an inner join, resulting in a DataFrame that includes only the rows with matching ID values.
The core idea behind merge() is simple: it lets you specify how the rows from two DataFrames should be aligned based on one or more keys (columns or indexes). The result is a new DataFrame that contains data from both original DataFrames.
Syntax:
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None)
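Parameters:
left, right: the two DataFrames to merge.
how: the type of join. The values covered here are 'inner' (the default), 'outer', 'left', and 'right'.
on: the column name(s) to join on; they must be present in both DataFrames.
left_on, right_on: the column name(s) to join on in the left and right DataFrame respectively, for cases where the key columns have different names.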
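As a brief illustration of the left_on and right_on parameters from the signature above, here is a sketch in which the key columns have different names. The tables and column names (emp_id, person_id) are hypothetical and not part of the original examples.

```python
import pandas as pd

# Hypothetical tables whose key columns are named differently.
employees = pd.DataFrame({"emp_id": [1, 2], "name": ["Emily", "Jack"]})
ages = pd.DataFrame({"person_id": [2, 1], "age": [27, 24]})

# left_on / right_on pair up emp_id (left) with person_id (right).
merged = pd.merge(
    employees, ages, how="inner", left_on="emp_id", right_on="person_id"
)
print(merged)
```

Both key columns are kept in the result when their names differ, so one of them can be dropped afterwards if it is redundant.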
An inner join keeps rows from both DataFrames where there is a match in the specified column(s).
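The following sketch should reproduce the output shown next. The DataFrame contents are taken from that output; the variable names df1, df2, and merged_df are assumptions.

```python
import pandas as pd

# Two tables sharing the "fruit" column (values taken from the output below).
df1 = pd.DataFrame({
    "fruit": ["apple", "banana", "avocado"],
    "market_price": [21, 14, 35],
})
df2 = pd.DataFrame({
    "fruit": ["banana", "apple", "avocado"],
    "wholesaler_price": [65, 68, 75],
})

print("First DataFrame:")
print(df1)
print("Second DataFrame:")
print(df2)

# Inner join: keep only fruits that appear in both DataFrames.
merged_df = pd.merge(df1, df2, on="fruit", how="inner")
print("Merged DataFrame:")
print(merged_df)
```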
Output
First DataFrame:
fruit market_price
0 apple 21
1 banana 14
2 avocado 35
Second DataFrame:
fruit wholesaler_price
0 banana 65
1 apple 68
2 avocado 75
Merged DataFrame:
fruit market_price wholesaler_price
0 apple 21 68
1 banana 14 65
2 avocado 35 75
An outer join includes all rows from both DataFrames. With how="outer", the result contains every key found in either df1 or df2; where a row has no match in the other DataFrame, the columns coming from that DataFrame are filled with NaN.
Output
fruit market_price wholesaler_price
0 apple 21 68
1 avocado 35 75
2 banana 14 65
A left join keeps all rows from the left DataFrame, adding only matching rows from the right.
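A sketch of a left join on the same fruit data, which should reproduce the output shown next. The variable names are assumptions.

```python
import pandas as pd

df1 = pd.DataFrame({
    "fruit": ["apple", "banana", "avocado"],
    "market_price": [21, 14, 35],
})
df2 = pd.DataFrame({
    "fruit": ["banana", "apple", "avocado"],
    "wholesaler_price": [65, 68, 75],
})

# Left join: every row of df1 is kept, in df1's order;
# matching wholesaler_price values are pulled in from df2.
merged_df = pd.merge(df1, df2, on="fruit", how="left")
print(merged_df)
```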
Output
fruit market_price wholesaler_price
0 apple 21 68
1 banana 14 65
2 avocado 35 75
A right join keeps all rows from the right DataFrame, adding only matching rows from the left.
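A sketch of a right join on the same fruit data, which should reproduce the output shown next. The variable names are assumptions.

```python
import pandas as pd

df1 = pd.DataFrame({
    "fruit": ["apple", "banana", "avocado"],
    "market_price": [21, 14, 35],
})
df2 = pd.DataFrame({
    "fruit": ["banana", "apple", "avocado"],
    "wholesaler_price": [65, 68, 75],
})

# Right join: every row of df2 is kept, in df2's order;
# matching market_price values are pulled in from df1.
merged_df = pd.merge(df1, df2, on="fruit", how="right")
print(merged_df)
```

The row order follows the right DataFrame (banana, apple, avocado), while the column order still places the left DataFrame's columns first.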
Output
fruit market_price wholesaler_price
0 banana 14 65
1 apple 21 68
2 avocado 35 75
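Here are the main points to remember when joining two DataFrames using merge(): use on when the key column has the same name in both DataFrames, and left_on and right_on when the names differ. The default join is inner, which keeps only rows with matching keys. An outer join keeps all rows from both DataFrames and fills the gaps with NaN, while left and right joins keep all rows from the left or right DataFrame respectively.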