Iterating over rows means processing each row one by one to apply some calculation or condition. For example, given a DataFrame of students' marks with columns Math and Science, you might want to calculate the total score for each student, row by row.
Let’s consider this DataFrame:
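A minimal sketch of how this DataFrame could be built, assuming pandas is imported as `pd` and the values match the output shown (the variable name `df` is an assumption used throughout):

```python
import pandas as pd

# Sample data: two numeric columns and one categorical column
df = pd.DataFrame({
    "A": [5, 7, 3, 9, 2],
    "B": [10, 20, 30, 40, 50],
    "C": ["X", "Y", "X", "Z", "Y"],
})

print(df)
```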
Output
A B C
0 5 10 X
1 7 20 Y
2 3 30 X
3 9 40 Z
4 2 50 Y
Vectorized operations act on whole columns at once, with no Python-level loop. They are usually the fastest and most memory-efficient option for column-wise transformations.
Example: In this example, compute Result = A*B when C == 'X', otherwise Result = A + B, using np.where.
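One way this could look, as a sketch that rebuilds the sample DataFrame so it runs on its own (the variable name `df` is an assumption):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [5, 7, 3, 9, 2],
    "B": [10, 20, 30, 40, 50],
    "C": ["X", "Y", "X", "Z", "Y"],
})

# np.where(condition, value_if_true, value_if_false) works on whole columns:
# multiply where C == 'X', add everywhere else
df["Result"] = np.where(df["C"] == "X", df["A"] * df["B"], df["A"] + df["B"])

print(df)
```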
Output
A B C Result
0 5 10 X 50
1 7 20 Y 27
2 3 30 X 90
3 9 40 Z 49
4 2 50 Y 52
itertuples() yields each row as a named tuple. It’s faster and lighter than iterrows() and preserves dtypes, which makes it a good choice when you need Python-level row access but care about performance.
Example: In this example, compute the same Result using itertuples() and collect results in a list.
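A possible version of this loop, sketched with the same sample data (the names `df` and `results` are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "A": [5, 7, 3, 9, 2],
    "B": [10, 20, 30, 40, 50],
    "C": ["X", "Y", "X", "Z", "Y"],
})

results = []
# index=False drops the index from each tuple; columns become attributes
for row in df.itertuples(index=False):
    if row.C == "X":
        results.append(row.A * row.B)
    else:
        results.append(row.A + row.B)

# Assign the collected values once, after the loop
df["Result"] = results

print(df)
```

Collecting values in a list and assigning the column once avoids modifying the DataFrame while iterating over it.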
Output
A B C Result
0 5 10 X 50
1 7 20 Y 27
2 3 30 X 90
3 9 40 Z 49
4 2 50 Y 52
.apply() runs a function on each row (with axis=1) or on each column (the default, axis=0). It’s readable and well suited to complex row-level logic that is hard to vectorize. It’s usually slower than itertuples(), but complex rules are easier to express.
Example: In this example, use apply() with a small function returning the same Result.
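A sketch of what such a function might look like; `compute_result` is a hypothetical helper name, and `axis=1` tells apply() to pass one row at a time:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [5, 7, 3, 9, 2],
    "B": [10, 20, 30, 40, 50],
    "C": ["X", "Y", "X", "Z", "Y"],
})

def compute_result(row):
    """Return A*B when C is 'X', otherwise A + B (row is a Series)."""
    if row["C"] == "X":
        return row["A"] * row["B"]
    return row["A"] + row["B"]

# axis=1 applies the function to each row instead of each column
df["Result"] = df.apply(compute_result, axis=1)

print(df)
```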
Output
A B C Result
0 5 10 X 50
1 7 20 Y 27
2 3 30 X 90
3 9 40 Z 49
4 2 50 Y 52
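iterrows() yields each row as an (index, Series) pair. It’s easy to use but slow, and because each row is packed into a single Series, values may be upcast to a common dtype (for example, integers can become floats or objects). Avoid it for large data.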
Example: In this example, compute Result with iterrows() and print each row’s total.
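A minimal sketch of this approach, assuming the results are collected in a list and assigned as a new column afterwards (the names `df` and `results` are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "A": [5, 7, 3, 9, 2],
    "B": [10, 20, 30, 40, 50],
    "C": ["X", "Y", "X", "Z", "Y"],
})

results = []
# Each iteration yields the index label and the row as a Series
for idx, row in df.iterrows():
    if row["C"] == "X":
        total = row["A"] * row["B"]
    else:
        total = row["A"] + row["B"]
    results.append(total)

df["Result"] = results

print(df)
```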
Output
A B C Result
0 5 10 X 50
1 7 20 Y 27
2 3 30 X 90
3 9 40 Z 49
4 2 50 Y 52