When working with datasets, we often need to remove unnecessary columns to simplify the analysis. In Python, the Pandas library provides several simple ways to drop one or more columns from a DataFrame.
Below is the sample dataframe we will be using in this article:
A B C
0 A1 B1 C1
1 A2 B2 C2
2 A3 B3 C3
3 A4 B4 C4
4 A5 B5 C5
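A minimal sketch of how this sample DataFrame could be built. The variable name `data` follows the explanations later in the article; the construction from a dictionary is an assumption, since any equivalent constructor would do.

```python
import pandas as pd

# Build the 5x3 sample DataFrame shown above.
data = pd.DataFrame({
    "A": ["A1", "A2", "A3", "A4", "A5"],
    "B": ["B1", "B2", "B3", "B4", "B5"],
    "C": ["C1", "C2", "C3", "C4", "C5"],
})
print(data)
```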
Let's explore different methods to remove one or multiple columns from a pandas DataFrame.
The most common method to remove columns is DataFrame.drop(). You can drop single or multiple columns by specifying their names.
To drop a single column, use the drop() method with the column’s name.
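A short sketch of dropping column B from the sample data. The result name `df` is an assumption for illustration; `data` is rebuilt here so the block runs on its own.

```python
import pandas as pd

data = pd.DataFrame({
    "A": ["A1", "A2", "A3", "A4", "A5"],
    "B": ["B1", "B2", "B3", "B4", "B5"],
    "C": ["C1", "C2", "C3", "C4", "C5"],
})

# axis=1 tells drop() that "B" is a column label, not a row label.
# drop() returns a new DataFrame; `data` itself is left unchanged.
df = data.drop("B", axis=1)
print(df)
```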
Output
A C
0 A1 C1
1 A2 C2
2 A3 C3
3 A4 C4
4 A5 C5
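Explanation: Passing the column name B to drop() with axis=1 removes that column. By default, drop() returns a new DataFrame rather than modifying the original.
To drop multiple columns, pass a list of column names to drop().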
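Dropping several columns at once works the same way with a list of names. The sketch below removes B and C, which leaves only A; the result name `df` is an assumption for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "A": ["A1", "A2", "A3", "A4", "A5"],
    "B": ["B1", "B2", "B3", "B4", "B5"],
    "C": ["C1", "C2", "C3", "C4", "C5"],
})

# Pass a list of labels to remove more than one column in a single call.
df = data.drop(["B", "C"], axis=1)
print(df)
```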
Output
A
0 A1
1 A2
2 A3
3 A4
4 A5
Note: axis=1 is used for columns, while axis=0 is for rows.
If you know the index positions of the columns to remove, you can use them instead of names, which is useful in automated processes.
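A sketch of position-based removal, assuming the same sample data. Indexing `data.columns` with a list of positions yields the matching labels, which are then handed to drop(); the result name `df` is illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    "A": ["A1", "A2", "A3", "A4", "A5"],
    "B": ["B1", "B2", "B3", "B4", "B5"],
    "C": ["C1", "C2", "C3", "C4", "C5"],
})

# Positions 0 and 2 correspond to the labels "A" and "C".
df = data.drop(data.columns[[0, 2]], axis=1)
print(df)
```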
Output
B
0 B1
1 B2
2 B3
3 B4
4 B5
Explanation: data.columns[[0, 2]] selects the 1st and 3rd columns (A and C) for removal.
The loc[] indexer lets you select columns by label, including a slice of labels. The selected column names can then be passed to drop(), which is useful for deleting specific columns without relying on their positions.
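A sketch of this approach on the sample data. The helper name `cols_to_drop` and the result name `df` are assumptions for illustration. Note that label slices in loc[] include both endpoints.

```python
import pandas as pd

data = pd.DataFrame({
    "A": ["A1", "A2", "A3", "A4", "A5"],
    "B": ["B1", "B2", "B3", "B4", "B5"],
    "C": ["C1", "C2", "C3", "C4", "C5"],
})

# Select every column from "B" through "C" (inclusive) and take their labels.
cols_to_drop = data.loc[:, "B":"C"].columns

# Drop the selected labels, leaving only column "A".
df = data.drop(cols_to_drop, axis=1)
print(df)
```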
Output
A
0 A1
1 A2
2 A3
3 A4
4 A5
Explanation: data.loc[:, 'B':'C'].columns selects all columns from B to C, which are then dropped.
pop() removes a specified column and returns it as a Series, allowing you to use that column’s data separately.
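A sketch of pop() on the sample data. The name `popped` is an assumption for illustration. Unlike drop(), pop() modifies the DataFrame in place, so `data` no longer contains B afterwards.

```python
import pandas as pd

data = pd.DataFrame({
    "A": ["A1", "A2", "A3", "A4", "A5"],
    "B": ["B1", "B2", "B3", "B4", "B5"],
    "C": ["C1", "C2", "C3", "C4", "C5"],
})

# pop() removes column "B" from `data` and returns it as a Series.
popped = data.pop("B")
print(popped)

# `data` now holds only the remaining columns.
print(data.columns.tolist())
```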
Output
0 B1
1 B2
2 B3
3 B4
4 B5
Name: B, dtype: object
When a column has too many missing values, it may not be useful for analysis. In such cases, we can remove those columns by setting a limit (threshold) for how many missing values are allowed.
Example: The following code drops columns having more than 50% missing values using a threshold condition.
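One way this could look, as a sketch rather than the only approach. The values in column B are assumed for illustration (three of four missing), as are the names `df`, `threshold` and `result`. The code computes the fraction of missing values per column and keeps only the columns at or below the threshold.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [np.nan, np.nan, np.nan, 4],  # assumed values: 75% missing
    "C": [1, 2, 3, 4],
})

threshold = 0.5  # maximum allowed fraction of missing values per column

# isna().mean() gives the fraction of missing values in each column.
missing_fraction = df.isna().mean()

# Keep columns whose missing fraction does not exceed the threshold.
result = df.loc[:, missing_fraction <= threshold]
print(result)
```

An alternative is `dropna(axis=1, thresh=...)`, where `thresh` is the minimum number of non-missing values a column needs in order to be kept.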
Output
A C
0 1.0 1
1 2.0 2
2 NaN 3
3 4.0 4
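Explanation: Column A has one missing value out of four (25%), which is below the 50% limit, so it is kept along with C. Only columns with more than half of their values missing are removed.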