DataFrame.sum() function in Pandas allows users to compute the sum of values along a specified axis. It can be used to sum values along either the index (rows) or columns, while also providing flexibility in handling missing (NaN) values. Example:
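A minimal sketch of this example, assuming a three-column DataFrame whose values are inferred from the sums printed below; the variable names `df`, `col_sums` and `row_sums` are illustrative.

```python
import pandas as pd

# Values chosen to match the printed sums (an assumption, not the original listing)
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

col_sums = df.sum()        # axis=0 is the default: one total per column
row_sums = df.sum(axis=1)  # axis=1: one total per row

print("Column-wise sum:")
print(col_sums)
print("Row-wise sum:")
print(row_sums)
```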
Output:

    Column-wise sum:
    A     6
    B    15
    C    24
    dtype: int64
    Row-wise sum:
    0    12
    1    15
    2    18
    dtype: int64
Explanation: This code creates a DataFrame from a dictionary and calculates sums along both axes. With the default axis=0, sum() adds the values down each column and returns one total per column. With axis=1, it adds the values across each row and returns one total per row. Missing values, if present, are skipped by default (skipna=True).
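To make the missing-value behaviour concrete, here is a small sketch that contrasts the default `skipna=True` with `skipna=False`. The DataFrame and the names `ignored` and `propagated` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Column A contains one missing value (illustrative data)
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, 6.0]})

ignored = df.sum()                 # skipna=True by default: NaN is skipped
propagated = df.sum(skipna=False)  # any NaN in a column makes its sum NaN

print(ignored)
print(propagated)
```

With `skipna=False`, only column A is affected, because column B has no missing values.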
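Syntax: DataFrame.sum(axis=0, skipna=True, numeric_only=False, min_count=0, **kwargs)

Parameters:
- axis: the axis to sum along. 0 or 'index' returns one total per column (default); 1 or 'columns' returns one total per row.
- skipna: if True (default), NaN values are excluded from the sum.
- numeric_only: if True, only float, int and boolean columns are included. Default is False.
- min_count: the minimum number of non-NaN values required to perform the sum. If fewer are present, the result is NaN. Default is 0.
- **kwargs: additional keyword arguments passed to the function.

Note: pandas releases before 2.0 listed several of these defaults as None and also accepted a level argument for MultiIndex axes. The level argument was removed in pandas 2.0.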
Returns: A Series or scalar containing the summed values along the specified axis.
Let's use the Pandas sum() function to total the values down the index axis (axis=0) of a DataFrame, which gives one sum per column. In this example, we'll exclude NaN values while calculating the sum. Dataset Link: nba.csv
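A sketch of this step, under stated assumptions: the small DataFrame below is a hypothetical stand-in for nba.csv, and its column names and values are invented for illustration. With the real file you would load it with `pd.read_csv("nba.csv")` instead. Passing `numeric_only=True` is a choice made here to keep text columns out of the result.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for nba.csv (columns and values are assumptions)
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Age": [25.0, 27.0, np.nan],
    "Salary": [1000000.0, np.nan, 2500000.0],
})

# Sum down the index axis, skipping NaN and ignoring text columns
totals = df.sum(axis=0, skipna=True, numeric_only=True)
print(totals)
```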
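Explanation: The sum() function adds up all the values in each column. If a column has missing values (NaN), they are ignored. Non-numeric columns (like text) are not converted to numbers: pandas applies the + operator to them, which concatenates strings and can raise a TypeError when text is mixed with missing values. Pass numeric_only=True to restrict the sum to numeric columns.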
Now, let's calculate the sum across the column axis (axis=1) using the sum() function, which gives one sum per row. Again, we'll ensure that NaN values are excluded from the sum.
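A sketch of the row-wise case, again using a hypothetical stand-in for nba.csv with invented column names and values. It selects the numeric columns with `select_dtypes` before summing, which is one way to keep text columns out of the calculation.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for nba.csv (columns and values are assumptions)
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Age": [25.0, 27.0, np.nan],
    "Weight": [180.0, np.nan, 205.0],
})

# Keep only numeric columns, then sum across each row, skipping NaN
numeric = df.select_dtypes(include="number")
row_totals = numeric.sum(axis=1, skipna=True)
print(row_totals)
```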
Explanation: Instead of adding up values in columns, this example sums up values in each row. First, it selects only numeric columns to avoid errors. Missing values are ignored during the summation.
The min_count parameter ensures that the sum operation is performed only if at least a certain number of non-NaN values are present. Otherwise, the result will be NaN.
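A minimal sketch of `min_count`, assuming a DataFrame whose values are chosen to be consistent with the output shown next: column A has two valid values, column B has only one, and column C has none missing.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, chosen to match the printed result)
df = pd.DataFrame({
    "A": [1, np.nan, 3],
    "B": [np.nan, 5, np.nan],
    "C": [7, 8, 9],
})

# Require at least two non-NaN values per column before summing
result = df.sum(min_count=2)

print("Sum with min_count=2:")
print(result)
```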
Output:

    Sum with min_count=2:
    A     4.0
    B     NaN
    C    24.0
    dtype: float64
Explanation: This example sets a rule: a column is summed only if it contains at least two non-missing values. Column B has fewer than two, so its result is NaN instead of a number.