Remove all columns where the entire column is null in PySpark DataFrame

Last Updated : 23 Jul, 2025

In this article, we'll learn how to drop every column of a PySpark DataFrame in which all values are null.

Creating a Spark DataFrame with null columns:

A DataFrame can be created with the pyspark.sql.SparkSession.createDataFrame() method.

Syntax

pyspark.sql.SparkSession.createDataFrame(data, schema=None, samplingRatio=None, verifySchema=True)

Parameters:

data: An RDD of any kind of SQL data representation (e.g. Row, tuple, int, boolean, etc.), or a list, or a pandas.DataFrame.
schema: A datatype string or a list of column names. Default is None.
samplingRatio: The sample ratio of rows used for inferring the schema.
verifySchema: Verify the data types of every row against the schema. Enabled by default.

Returns: DataFrame
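
The code that builds the sample DataFrame is not included here, so the following is a minimal sketch of how it might be created. The names `data`, `schema`, `create_sample_df`, and `main` are illustrative assumptions rather than part of the original article. The schema is passed as a datatype string because Spark cannot infer a type for a column that contains only None.

```python
# Sample rows: middlename is None in every row, lastname is None in one row.
data = [
    ("James", None, "Bond", "M", 6000),
    ("Michael", None, None, "M", 4000),
    ("Robert", None, "Pattinson", "M", 4000),
    ("Natalie", None, "Portman", "F", 4000),
    ("Julia", None, "Roberts", "F", 1000),
]

# Explicit schema as a datatype (DDL) string; type inference would fail
# on middlename because it holds no non-null value.
schema = (
    "firstname string, middlename string, lastname string, "
    "gender string, salary int"
)


def create_sample_df(spark):
    """Build the sample DataFrame from an existing SparkSession."""
    return spark.createDataFrame(data=data, schema=schema)


def main():
    # Imported here so the data definitions above load without Spark installed.
    from pyspark.sql import SparkSession

    # The application name is arbitrary (an assumption for this sketch).
    spark = SparkSession.builder.appName("null-columns-example").getOrCreate()
    df = create_sample_df(spark)
    df.show(truncate=False)
```

Calling `main()` on a machine with Spark available should print a table like the one under Output. The exact rendering of null values can differ between Spark versions.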

Output:

+---------+----------+---------+------+------+
|firstname|middlename|lastname |gender|salary|
+---------+----------+---------+------+------+
|James    |null      |Bond     |M     |6000  |
|Michael  |null      |null     |M     |4000  |
|Robert   |null      |Pattinson|M     |4000  |
|Natalie  |null      |Portman  |F     |4000  |
|Julia    |null      |Roberts  |F     |1000  |
+---------+----------+---------+------+------+

Remove all columns where the entire column is null in PySpark DataFrame

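Here we want to drop every column whose values are all null. In the DataFrame above, middlename is the only such column; lastname contains a single null, so it is kept. Counting the nulls in each column and comparing each count with the total number of rows identifies the columns to drop. The output below shows the null count per column, the list of columns to drop, and the DataFrame after they are dropped.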
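
The code for this step is not included here, so the following is one possible sketch of it, not necessarily the approach originally used. It counts the nulls per column in a single aggregation and drops the columns whose null count equals the row count. The function names `all_null_columns` and `drop_all_null_columns` are hypothetical, and `df` is assumed to be an existing PySpark DataFrame.

```python
from typing import Dict, List


def all_null_columns(null_counts: Dict[str, int], total_rows: int) -> List[str]:
    """Return the columns whose null count equals the total row count."""
    return [name for name, n in null_counts.items() if n == total_rows]


def drop_all_null_columns(df):
    """Drop every column of a PySpark DataFrame that contains only nulls."""
    # Imported here so the pure helper above can be used without Spark installed.
    from pyspark.sql import functions as F

    total_rows = df.count()

    # when(...) yields a non-null value only where the column is null,
    # and count(...) counts non-null values, so this counts the nulls.
    counts_row = df.select(
        [F.count(F.when(F.col(c).isNull(), c)).alias(c) for c in df.columns]
    ).collect()[0]
    null_counts = counts_row.asDict()
    print(null_counts)

    to_drop = all_null_columns(null_counts, total_rows)
    print(to_drop)

    # Return the DataFrame unchanged when nothing needs to be dropped.
    return df.drop(*to_drop) if to_drop else df
```

For the sample DataFrame, `drop_all_null_columns(df)` would be expected to drop only `middlename`. On an empty DataFrame every null count equals the row count of zero, so this sketch would drop all columns; add a guard if that case matters.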

{'firstname': 0, 'middlename': 5, 'lastname': 1, 'gender': 0, 'salary': 0}
['middlename']
+---------+---------+------+------+
|firstname|lastname |gender|salary|
+---------+---------+------+------+
|James    |Bond     |M     |6000  |
|Michael  |null     |M     |4000  |
|Robert   |Pattinson|M     |4000  |
|Natalie  |Portman  |F     |4000  |
|Julia    |Roberts  |F     |1000  |
+---------+---------+------+------+


URL: https://www.geeksforgeeks.org/python/remove-all-columns-where-the-entire-column-is-null-in-pyspark-dataframe/
