In real-world datasets, missing values (NaN) are common and can cause errors in calculations or visualizations. One practical solution is to replace NaN values with the average of their respective columns. Below are different methods, arranged from most efficient to least efficient.
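The first approach is fully vectorized: it uses NumPy functions to compute the column means while ignoring NaNs, then writes those means into the NaN positions of the original array. Because it avoids Python-level loops and modifies the array in place, it is fast and memory-efficient.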
Example: Here we compute the column averages and replace all NaN values with the corresponding column mean using np.take.
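A minimal sketch of this approach follows. The input array and the variable names are assumptions: the array was chosen so that the result matches the output reported for this method. `np.nanmean` is one NumPy function that computes means while skipping NaNs; the text only names `np.take` explicitly.

```python
import numpy as np

# Assumed input, chosen so the result matches the article's reported output.
arr = np.array([[1.3, 2.5, 3.6, np.nan],
                [2.6, 3.3, np.nan, 5.5],
                [2.1, 3.2, 5.4, 6.5]])

# Column means, computed while skipping NaN entries.
col_mean = np.nanmean(arr, axis=0)

# Row and column indices of every NaN in the array.
rows, cols = np.where(np.isnan(arr))

# For each NaN, take the mean of its column and assign it in place.
arr[rows, cols] = np.take(col_mean, cols)

print(arr)
```

Note that a column consisting entirely of NaNs has no defined mean, so `np.nanmean` returns NaN for it (with a warning) and those entries stay NaN.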
Output: [[1.3 2.5 3.6 6. ] [2.6 3.3 4.5 5.5] [2.1 3.2 5.4 6.5]]
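The second method creates a masked array in which NaNs are excluded from the column means, then uses np.where to build a new array with each NaN replaced by its column average. Unlike the first approach, it leaves the original array unchanged.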
Example: Replace NaNs by computing column averages on a masked array and filling in missing values using np.where.
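One way to sketch the masked-array approach is shown here. The input array is an assumption, chosen to match the output reported for this method, and `np.ma.masked_invalid` is just one of several ways to build the mask (`np.ma.array(arr, mask=np.isnan(arr))` would work as well).

```python
import numpy as np

# Assumed input, chosen so the result matches the article's reported output.
arr = np.array([[1.3, 2.5, 3.6, np.nan],
                [2.6, 3.3, np.nan, 5.5],
                [2.1, 3.2, 5.4, 6.5]])

# Mask NaN (and inf) entries so they are excluded from the mean.
masked = np.ma.masked_invalid(arr)

# Column means over the unmasked values; convert back to a plain ndarray.
col_mean = masked.mean(axis=0).filled(np.nan)

# Build a new array: column mean where arr is NaN, original value elsewhere.
# col_mean (shape (4,)) broadcasts across the rows of arr (shape (3, 4)).
result = np.where(np.isnan(arr), col_mean, arr)

print(result)
```

Because `np.where` returns a new array, `arr` still contains its NaNs afterwards, which is useful when the raw data must be preserved.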
Output: [[1.3 2.5 3.6 6. ] [2.6 3.3 4.5 5.5] [2.1 3.2 5.4 6.5]]
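The third method iterates over the NaN positions and replaces them one at a time with the column average. It still relies on NumPy to locate the NaNs and compute the means, and it is slower than the vectorized approaches, but each replacement step is explicit.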
Example: We find NaN positions using np.where, then replace each NaN with the column mean using a loop.
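The loop-based variant might look like the sketch below. The input array and variable names are assumptions, chosen so the result matches the output reported for this method, and `np.nanmean` is an assumed choice for computing the column means.

```python
import numpy as np

# Assumed input, chosen so the result matches the article's reported output.
arr = np.array([[1.3, 2.5, 3.6, np.nan],
                [2.6, 3.3, np.nan, 5.5],
                [2.1, 3.2, 5.4, 6.5]])

# Column means, computed once up front while skipping NaNs.
col_mean = np.nanmean(arr, axis=0)

# Locate the NaNs: two parallel index arrays (row indices, column indices).
rows, cols = np.where(np.isnan(arr))

# Replace each NaN individually with the mean of its column.
for r, c in zip(rows, cols):
    arr[r, c] = col_mean[c]

print(arr)
```

The loop runs once per missing value, so the cost grows with the number of NaNs rather than with the size of the array.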
Output: [[1.3 2.5 3.6 6. ] [2.6 3.3 4.5 5.5] [2.1 3.2 5.4 6.5]]
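For plain Python lists (not NumPy arrays), where missing values are represented by None, you can compute the column averages and substitute them using list comprehensions. This is the slowest option because all of the work happens in Python-level loops.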
Example: Compute column averages and replace None values with the corresponding mean for each row.
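A pure-Python sketch of this idea is given below. The input list and the helper `column_mean` are hypothetical, introduced only for illustration; the data was chosen so the result matches the output reported for this method.

```python
# Assumed input, chosen so the result matches the article's reported output.
data = [[1.3, 2.5, 3.6, None],
        [2.6, 3.3, None, 5.5],
        [2.1, 3.2, 5.4, 6.5]]


def column_mean(column):
    """Mean of the non-None values (hypothetical helper).

    Assumes the column has at least one non-None value.
    """
    values = [v for v in column if v is not None]
    return sum(values) / len(values)


# zip(*data) transposes the rows into columns.
col_means = [column_mean(col) for col in zip(*data)]

# Rebuild each row, substituting the column mean wherever the value is None.
result = [[col_means[j] if v is None else v for j, v in enumerate(row)]
          for row in data]

print(result)
```

The check uses `is None` rather than truthiness so that legitimate zeros in the data are not treated as missing.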
Output: [[1.3, 2.5, 3.6, 6.0], [2.6, 3.3, 4.5, 5.5], [2.1, 3.2, 5.4, 6.5]]