Have you ever encountered a dataset in which a column stores a list in every cell? Pandas can hold such data, but most operations, such as filtering, grouping and aggregation, expect a single value per cell, so the lists usually have to be split out first. In this article, we will discuss how to unnest, or explode, multiple list columns in a Pandas data frame.
Pandas is an open-source data manipulation and analysis tool built on top of the Python programming language. It provides powerful data structures, such as DataFrame and Series, that allow users to easily manipulate and analyze data.
Nested list columns are columns in a DataFrame where each cell contains a list of values, rather than a single scalar value. This occurs when the data is structured hierarchically, with each cell representing a collection of related sub-values.
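As a quick illustration of what such a column looks like, here is a minimal sketch. The frame and its column names (id, tags) are hypothetical and are not part of the article's dataset.

```python
import pandas as pd

# Hypothetical two-row frame; 'tags' is a nested list column.
df = pd.DataFrame({'id': [1, 2], 'tags': [['a', 'b'], ['c']]})

# Each cell in 'tags' holds a whole Python list rather than a scalar.
cell = df.loc[0, 'tags']

# A simple way to check that every cell in the column is a list.
is_list_column = df['tags'].map(lambda v: isinstance(v, list)).all()

print(type(cell).__name__, is_list_column)
```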
Decoupling multiple list columns in a data frame is useful because it leaves one value per cell, which is the shape that filtering, grouping and aggregation operations expect.
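The explode function flattens nested Series objects and DataFrame columns by splitting each list into multiple rows, one row per element. In this method, we will see how to unnest multiple list columns using DataFrame.explode. Passing a list of columns requires pandas 1.3.0 or later, and within each row the lists being exploded must have the same number of elements.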
Syntax:
df = df.explode(['column-1', 'column-2']).reset_index(drop=True)
Here,
- column-1, column-2: These are the columns that you want to unnest.
- df: It is the data frame that has those nested columns.
- reset_index(drop=True): It replaces the repeated index labels that explode leaves behind with a fresh index starting at 0.
In this example, we have created a dataset, which has three columns, Name, Favourite Ice-cream and Favourite Soft-Drink, out of which Favourite Ice-cream and Favourite Soft-Drink columns are nested. We have unnested those columns using the explode function.
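The code for this example can be sketched as follows. The dataset is reconstructed from the output shown for this method, so treat the exact construction as an assumption; the explode call itself follows the syntax above.

```python
import pandas as pd

# Dataset reconstructed from the printed output (an assumption).
df = pd.DataFrame({
    'Name': ['Arun', 'Aniket', 'Ishita', 'Raghav', 'Vinayak'],
    'Favourite Ice-cream': [['Strawberry', 'Choco-chips'],
                            ['Vanilla', 'Black Currant'],
                            ['Butterscotch', 'Chocolate'],
                            ['Mango', 'Choco-chips'],
                            ['Kulfi', 'Kaju-Kishmish']],
    'Favourite Soft-Drink': [['Coca Cola', 'Lemonade'],
                             ['Thumbs Up', 'Sprite'],
                             ['Mountain Dew', 'Fanta'],
                             ['Mirinda', 'Maaza'],
                             ['7Up', 'Sprite']],
})
print('Actual dataframe:')
print(df)

# Explode both list columns together (needs pandas 1.3.0 or later),
# then renumber the rows from 0.
df = df.explode(['Favourite Ice-cream', 'Favourite Soft-Drink']).reset_index(drop=True)
print('Dataframe after unnesting:')
print(df)
```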
Output:
Actual dataframe:
      Name        Favourite Ice-cream   Favourite Soft-Drink
0     Arun  [Strawberry, Choco-chips]  [Coca Cola, Lemonade]
1   Aniket   [Vanilla, Black Currant]    [Thumbs Up, Sprite]
2   Ishita  [Butterscotch, Chocolate]  [Mountain Dew, Fanta]
3   Raghav       [Mango, Choco-chips]       [Mirinda, Maaza]
4  Vinayak     [Kulfi, Kaju-Kishmish]          [7Up, Sprite]
Dataframe after unnesting:
      Name  Favourite Ice-cream  Favourite Soft-Drink
0     Arun           Strawberry             Coca Cola
1     Arun          Choco-chips              Lemonade
2   Aniket              Vanilla             Thumbs Up
3   Aniket        Black Currant                Sprite
4   Ishita         Butterscotch          Mountain Dew
5   Ishita            Chocolate                 Fanta
6   Raghav                Mango               Mirinda
7   Raghav          Choco-chips                 Maaza
8  Vinayak                Kulfi                   7Up
9  Vinayak        Kaju-Kishmish                Sprite
The pandas.Series.explode function splits a Series containing list-like values into multiple rows, one for each element in the list. It is available from pandas 0.25.0. In this method, we will see how to unnest multiple list columns by applying pandas.Series.explode to every list column.
Syntax:
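df = df.set_index(['column-3']).apply(pd.Series.explode).reset_index()
Here,
- column-3: It is the column that holds plain values and does not need unnesting. It is moved into the index so that apply only processes the list columns.
- df: It is the data frame that has those nested columns.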
In this example, we have created a dataset, which has three columns, Name, Favourite Ice-cream and Favourite Soft-Drink, out of which Favourite Ice-cream and Favourite Soft-Drink columns are nested. We have unnested those columns using the pandas.Series.explode function.
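A minimal sketch of this example is given below. As before, the dataset is reconstructed from the printed output and is an assumption; Name plays the role of column-3 from the syntax.

```python
import pandas as pd

# Dataset reconstructed from the printed output (an assumption).
df = pd.DataFrame({
    'Name': ['Arun', 'Aniket', 'Ishita', 'Raghav', 'Vinayak'],
    'Favourite Ice-cream': [['Strawberry', 'Choco-chips'],
                            ['Vanilla', 'Black Currant'],
                            ['Butterscotch', 'Chocolate'],
                            ['Mango', 'Choco-chips'],
                            ['Kulfi', 'Kaju-Kishmish']],
    'Favourite Soft-Drink': [['Coca Cola', 'Lemonade'],
                             ['Thumbs Up', 'Sprite'],
                             ['Mountain Dew', 'Fanta'],
                             ['Mirinda', 'Maaza'],
                             ['7Up', 'Sprite']],
})
print('Actual dataframe:')
print(df)

# Move the scalar column into the index, explode each remaining
# column as a Series, then bring 'Name' back as a regular column.
df = df.set_index(['Name']).apply(pd.Series.explode).reset_index()
print('Dataframe after unnesting:')
print(df)
```

This form relies on every list column having the same number of elements in a given row, so that the exploded columns line up when they are combined again.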
Output:
Actual dataframe:
      Name        Favourite Ice-cream   Favourite Soft-Drink
0     Arun  [Strawberry, Choco-chips]  [Coca Cola, Lemonade]
1   Aniket   [Vanilla, Black Currant]    [Thumbs Up, Sprite]
2   Ishita  [Butterscotch, Chocolate]  [Mountain Dew, Fanta]
3   Raghav       [Mango, Choco-chips]       [Mirinda, Maaza]
4  Vinayak     [Kulfi, Kaju-Kishmish]          [7Up, Sprite]
Dataframe after unnesting:
      Name  Favourite Ice-cream  Favourite Soft-Drink
0     Arun           Strawberry             Coca Cola
1     Arun          Choco-chips              Lemonade
2   Aniket              Vanilla             Thumbs Up
3   Aniket        Black Currant                Sprite
4   Ishita         Butterscotch          Mountain Dew
5   Ishita            Chocolate                 Fanta
6   Raghav                Mango               Mirinda
7   Raghav          Choco-chips                 Maaza
8  Vinayak                Kulfi                   7Up
9  Vinayak        Kaju-Kishmish                Sprite
A lambda function is an anonymous function that can take any number of arguments but can have only one expression. In this method, we will see how to unnest multiple list columns by applying pandas.Series inside a lambda function: each list is first expanded into separate columns and then stacked back into rows.
Syntax:
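df = df.set_index('column-3').apply(lambda x: x.apply(pd.Series).stack()).reset_index().drop(columns='level_1')
Here,
- column-3: It is the column that holds plain values and does not need unnesting.
- df: It is the data frame that has those nested columns.
- level_1: It is the helper column that reset_index creates from the position of each element within its list. It is dropped from the result.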
In this example, we have created a dataset, which has three columns, Name, Favourite Ice-cream and Favourite Soft-Drink, out of which Favourite Ice-cream and Favourite Soft-Drink columns are nested. We have unnested those columns using pandas.Series with a lambda function.
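A sketch of this example follows, again on a dataset reconstructed from the printed output (an assumption). It keeps the set_index, stack and reset_index steps from the syntax, with one deliberate swap: the lambda builds the intermediate wide frame with pd.DataFrame(col.tolist(), index=col.index) rather than col.apply(pd.Series). For lists of equal length the two give the same frame, and the swap avoids constructing one Series per cell, which tends to be slow.

```python
import pandas as pd

# Dataset reconstructed from the printed output (an assumption).
df = pd.DataFrame({
    'Name': ['Arun', 'Aniket', 'Ishita', 'Raghav', 'Vinayak'],
    'Favourite Ice-cream': [['Strawberry', 'Choco-chips'],
                            ['Vanilla', 'Black Currant'],
                            ['Butterscotch', 'Chocolate'],
                            ['Mango', 'Choco-chips'],
                            ['Kulfi', 'Kaju-Kishmish']],
    'Favourite Soft-Drink': [['Coca Cola', 'Lemonade'],
                             ['Thumbs Up', 'Sprite'],
                             ['Mountain Dew', 'Fanta'],
                             ['Mirinda', 'Maaza'],
                             ['7Up', 'Sprite']],
})
print('Actual dataframe:')
print(df)

df = (
    df.set_index('Name')
      # Spread each list over columns 0, 1, ... and stack them into rows.
      # Stands in for col.apply(pd.Series) from the syntax line.
      .apply(lambda col: pd.DataFrame(col.tolist(), index=col.index).stack())
      .reset_index()
      # 'level_1' holds each element's position in its list; not needed.
      .drop(columns='level_1')
)
print('Dataframe after unnesting:')
print(df)
```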
Output:
Actual dataframe:
      Name        Favourite Ice-cream   Favourite Soft-Drink
0     Arun  [Strawberry, Choco-chips]  [Coca Cola, Lemonade]
1   Aniket   [Vanilla, Black Currant]    [Thumbs Up, Sprite]
2   Ishita  [Butterscotch, Chocolate]  [Mountain Dew, Fanta]
3   Raghav       [Mango, Choco-chips]       [Mirinda, Maaza]
4  Vinayak     [Kulfi, Kaju-Kishmish]          [7Up, Sprite]
Dataframe after unnesting:
      Name  Favourite Ice-cream  Favourite Soft-Drink
0     Arun           Strawberry             Coca Cola
1     Arun          Choco-chips              Lemonade
2   Aniket              Vanilla             Thumbs Up
3   Aniket        Black Currant                Sprite
4   Ishita         Butterscotch          Mountain Dew
5   Ishita            Chocolate                 Fanta
6   Raghav                Mango               Mirinda
7   Raghav          Choco-chips                 Maaza
8  Vinayak                Kulfi                   7Up
9  Vinayak        Kaju-Kishmish                Sprite