Iterating over rows in a Pandas DataFrame means accessing each row one by one to perform operations or calculations. For example, if you have a DataFrame of employees' salaries and bonuses and want to calculate each employee's total compensation, efficient row-wise operations are essential.
Let’s consider this DataFrame:
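The construction code is not shown here, so the following is only a minimal sketch of one way to build it. It assumes the values are typed in by hand, copied from the output that follows; the original may well have generated them randomly.

```python
import pandas as pd

# Values copied from the output table; hard-coding them is an assumption
# made so that the printed frame matches the one shown.
df = pd.DataFrame({
    "A": [2, 7, 14, 2, 16, 18, 7, 12, 15, 13],
    "B": [21, 21, 27, 29, 21, 10, 28, 21, 11, 24],
    "C": ["X", "X", "X", "X", "Z", "Y", "Z", "Z", "X", "Z"],
})

print(df)
```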
Output
A B C
0 2 21 X
1 7 21 X
2 14 27 X
3 2 29 X
4 16 21 Z
5 18 10 Y
6 7 28 Z
7 12 21 Z
8 15 11 X
9 13 24 Z
Now, let’s explore the most efficient methods one by one.
itertuples() returns each row as a lightweight named tuple. Unlike iterrows(), which builds a Series for every row, it preserves each column's data type and uses less memory. It is ideal for large datasets when you need structured row-wise access.
Example: In this example, we compute a new column Result based on column C. If C is 'X', we multiply A and B; otherwise, we add them.
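A minimal sketch of that rule with itertuples() might look like the following. The input values are hard-coded from the A, B and C columns of the output that follows, which is an assumption; the `results` list is an illustrative name.

```python
import pandas as pd

# Input values copied from the output table (assumed, not randomly generated).
df = pd.DataFrame({
    "A": [11, 10, 11, 17, 9, 13, 2, 10, 5, 17],
    "B": [12, 24, 28, 22, 20, 15, 27, 18, 14, 25],
    "C": ["X", "Y", "Y", "Z", "Z", "Z", "Y", "Z", "Y", "X"],
})

results = []
# index=False drops the index from each tuple; columns become attributes.
for row in df.itertuples(index=False):
    if row.C == "X":
        results.append(row.A * row.B)  # multiply when C is 'X'
    else:
        results.append(row.A + row.B)  # otherwise add

df["Result"] = results
print(df)
```

Collecting the values in a list and assigning the column once avoids writing to the DataFrame inside the loop, which tends to be much slower.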
Output
A B C Result
0 11 12 X 132
1 10 24 Y 34
2 11 28 Y 39
3 17 22 Z 39
4 9 20 Z 29
5 13 15 Z 28
6 2 27 Y 29
7 10 18 Z 28
8 5 14 Y 19
9 17 25 X 425
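Explanation: itertuples() yields each row as a named tuple, so the column values are available as row.A, row.B and row.C. Rows where C is 'X' get A * B (for example, 11 * 12 = 132 in row 0), and all other rows get A + B (10 + 24 = 34 in row 1).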
apply() applies a custom function to each row (axis=1) or each column (axis=0). It is flexible for complex logic that depends on multiple columns, but row-wise apply() is slower than itertuples().
Example: In this example, a custom function calculates Result based on column C.
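The function itself is not shown here. Judging from the output that follows, it appears to return A * 2 when C is 'X' and B * 3 otherwise, so the sketch below assumes that rule. The name `compute_result` is hypothetical, and the input values are copied from the output table.

```python
import pandas as pd

# Input values copied from the output table (assumed, not randomly generated).
df = pd.DataFrame({
    "A": [16, 2, 5, 16, 16, 9, 17, 15, 3, 9],
    "B": [23, 26, 24, 22, 28, 10, 16, 11, 27, 10],
    "C": ["X", "X", "Z", "X", "X", "Z", "Z", "Y", "Z", "Y"],
})

def compute_result(row):
    # Hypothetical helper: the rule is inferred from the output, not from
    # the original code.
    if row["C"] == "X":
        return row["A"] * 2
    return row["B"] * 3

# axis=1 passes each row to the function as a Series.
df["Result"] = df.apply(compute_result, axis=1)
print(df)
```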
Output
A B C Result
0 16 23 X 32
1 2 26 X 4
2 5 24 Z 72
3 16 22 X 32
4 16 28 X 32
5 9 10 Z 30
6 17 16 Z 48
7 15 11 Y 33
8 3 27 Z 81
9 9 10 Y 30
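Explanation: With axis=1, apply() passes each row to the function as a Series and collects the return values into the new column. Judging from the output, the function returns A * 2 when C is 'X' (16 * 2 = 32 in row 0) and B * 3 otherwise (24 * 3 = 72 in row 2).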
Vectorized operations perform calculations on entire columns at once without explicit iteration. They are the fastest method for large datasets and should be preferred when possible.
Example: In this example, Result is computed using np.where for conditional vectorized operations.
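A minimal sketch of the np.where approach, assuming the same rule as in the itertuples() example (A * B when C is 'X', A + B otherwise), which is consistent with the output that follows. The input values are copied from that output.

```python
import numpy as np
import pandas as pd

# Input values copied from the output table (assumed, not randomly generated).
df = pd.DataFrame({
    "A": [5, 18, 6, 11, 15, 9, 14, 8, 2, 11],
    "B": [28, 29, 17, 19, 10, 20, 17, 16, 22, 27],
    "C": ["X", "X", "X", "Y", "X", "Y", "X", "X", "Y", "X"],
})

# np.where(condition, value_if_true, value_if_false) is evaluated for the
# whole column at once, with no Python-level loop over rows.
df["Result"] = np.where(df["C"] == "X", df["A"] * df["B"], df["A"] + df["B"])
print(df)
```

Note that both branches are computed for every row before np.where selects between them, which is fine for cheap arithmetic like this.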
Output
A B C Result
0 5 28 X 140
1 18 29 X 522
2 6 17 X 102
3 11 19 Y 30
4 15 10 X 150
5 9 20 Y 29
6 14 17 X 238
7 8 16 X 128
8 2 22 Y 24
9 11 27 X 297
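Explanation: np.where evaluates the condition on column C for every row at once and picks the matching value. Rows where C is 'X' get A * B (5 * 28 = 140 in row 0), and all other rows get A + B (11 + 19 = 30 in row 3).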