The goal is to remove leading and trailing whitespace and format each product name so that its first letter is uppercase and the rest are lowercase (e.g., UMbreLla -> Umbrella).
Let's consider a DataFrame with product names that are not formatted properly:
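As a minimal sketch, a DataFrame like this could be built as follows. The column values are taken from the output shown here, but the stray spaces around some product names are an assumption added for illustration, since printed output does not reveal the exact whitespace.

```python
import pandas as pd

# Sample data matching the table in this article.
# The leading/trailing spaces in 'Product' are assumed for illustration.
df = pd.DataFrame({
    'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
    'Product': [' UMbreLla', '  maTtress', 'BaDmintoN ', 'Shuttle'],
    'Updated_Price': [1250, 1450, 1550, 400],
    'Discount': [10, 8, 15, 10],
})

print(df)
```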
Output
Date Product Updated_Price Discount
0 10/2/2011 UMbreLla 1250 10
1 11/2/2011 maTtress 1450 8
2 12/2/2011 BaDmintoN 1550 15
3 13/2/2011 Shuttle 400 10
Now let's see different methods to clean product names in a DataFrame.
Pandas provides a set of vectorized string functions accessible via the .str accessor. These functions operate on an entire column at once, so you can clean text data without writing a loop.
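One way this might look in practice is to chain `.str.strip()` and `.str.capitalize()` on the `Product` column. The sample data below, including the extra spaces, is assumed for illustration.

```python
import pandas as pd

# Sample data; the stray spaces in 'Product' are an illustrative assumption.
df = pd.DataFrame({
    'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
    'Product': [' UMbreLla', '  maTtress', 'BaDmintoN ', 'Shuttle'],
    'Updated_Price': [1250, 1450, 1550, 400],
    'Discount': [10, 8, 15, 10],
})

# Strip outer whitespace, then uppercase the first letter and
# lowercase the rest, for every value in the column at once.
df['Product'] = df['Product'].str.strip().str.capitalize()

print(df)
```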
Output
Date Product Updated_Price Discount
0 10/2/2011 Umbrella 1250 10
1 11/2/2011 Mattress 1450 8
2 12/2/2011 Badminton 1550 15
3 13/2/2011 Shuttle 400 10
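Explanation: .str.strip() removes leading and trailing whitespace from every value in the Product column, and .str.capitalize() makes the first letter uppercase and the rest lowercase; the result is assigned back to the column.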
The apply() function runs a custom operation on each element of a column individually. Combined with a lambda function, it can remove unwanted spaces and adjust the capitalization of each string.
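A short sketch of this approach, using the same assumed sample data (the extra spaces in the product names are illustrative):

```python
import pandas as pd

# Sample data; the stray spaces in 'Product' are an illustrative assumption.
df = pd.DataFrame({
    'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
    'Product': [' UMbreLla', '  maTtress', 'BaDmintoN ', 'Shuttle'],
    'Updated_Price': [1250, 1450, 1550, 400],
    'Discount': [10, 8, 15, 10],
})

# Call Python's built-in str.strip() and str.capitalize() on each value.
df['Product'] = df['Product'].apply(lambda x: x.strip().capitalize())

print(df)
```

Note that the lambda calls plain string methods, so it would raise an error on non-string values such as NaN, whereas the .str accessor passes missing values through unchanged.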
Output
Date Product Updated_Price Discount
0 10/2/2011 Umbrella 1250 10
1 11/2/2011 Mattress 1450 8
2 12/2/2011 Badminton 1550 15
3 13/2/2011 Shuttle 400 10
Explanation: df['Product'].apply(lambda x: x.strip().capitalize()) removes the outer whitespace from each product string x and capitalizes it; the results are assigned back to the column.
An explicit loop lets you access and modify each row of the DataFrame manually. By iterating over the rows and updating the column values with string operations such as strip() and capitalize(), you can clean the data step by step.
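The loop could be written in several ways; the sketch below iterates over the index labels and uses `df.at` for scalar reads and writes, which is one reasonable choice rather than the only one. The sample data, including the extra spaces, is assumed for illustration.

```python
import pandas as pd

# Sample data; the stray spaces in 'Product' are an illustrative assumption.
df = pd.DataFrame({
    'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
    'Product': [' UMbreLla', '  maTtress', 'BaDmintoN ', 'Shuttle'],
    'Updated_Price': [1250, 1450, 1550, 400],
    'Discount': [10, 8, 15, 10],
})

# Visit each row label, clean the product name, and write it back.
for idx in df.index:
    name = df.at[idx, 'Product']
    df.at[idx, 'Product'] = name.strip().capitalize()

print(df)
```

This is generally slower than the vectorized .str methods on large DataFrames, but it makes each step easy to inspect.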
Output
Date Product Updated_Price Discount
0 10/2/2011 Umbrella 1250 10
1 11/2/2011 Mattress 1450 8
2 12/2/2011 Badminton 1550 15
3 13/2/2011 Shuttle 400 10
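Explanation: the loop visits each row in turn, applies strip() and capitalize() to that row's Product value, and writes the cleaned string back to the DataFrame.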