Renaming columns for PySpark DataFrames Aggregates

Last Updated : 19 Dec, 2021

In this article, we will discuss how to rename the columns produced by aggregate operations on a PySpark DataFrame.


In PySpark, groupBy() collects rows with identical values into groups so that aggregate functions can be applied to each group. The aggregate functions used in this article, sum(), avg(), min(), max(), count() and mean(), are available in the pyspark.sql.functions module.

Method 1: Using alias()

We can use alias() to give the aggregated column a new name at the point where the aggregation is defined.

Syntax:

dataframe.groupBy('column_name_group').agg(aggregate_function('column_name').alias("new_column_name"))

where,

dataframe is the input DataFrame
column_name_group is the column to group by
aggregate_function is one of the aggregate functions from pyspark.sql.functions
column_name is the column on which the aggregation is performed
new_column_name is the new name for the aggregated column

Example 1: Grouping by the DEPT column, aggregating the FEE column with sum() and avg(), and renaming the result to Total Fee


Example 2: Grouping by the DEPT column, aggregating the FEE column with min(), count(), mean() and max(), and renaming the results


Method 2: Using withColumnRenamed()

This method renames a column after the aggregation has been performed. By default, the aggregated column is named aggregate_operation(old_column), for example sum(FEE), so we pass that generated name to withColumnRenamed() together with the new name that should replace it.

Syntax:

dataframe.groupBy("column_name_group").agg({"column_name":"aggregate_operation"}).withColumnRenamed("aggregate_operation(column_name)", "new_column_name")

Example: Grouping by the DEPT column, aggregating the FEE column with sum(), and renaming the result to Total Fee



URL: https://www.geeksforgeeks.org/python/renaming-columns-for-pyspark-dataframes-aggregates/