
URL: https://www.geeksforgeeks.org/python/renaming-columns-for-pyspark-dataframes-aggregates/

Renaming columns for PySpark DataFrames Aggregates - GeeksforGeeks



Renaming columns for PySpark DataFrames Aggregates

Last Updated : 19 Dec, 2021

In this article, we will discuss how to rename the columns produced by aggregate operations on a PySpark DataFrame.

Dataframe in use:


In PySpark, groupBy() collects rows that share the same value into groups so that aggregate functions can be applied to each group. The aggregate functions used in this article, sum(), avg(), mean(), min(), max() and count(), are available in the pyspark.sql.functions module.

Method 1: Using alias()

The alias() method sets the name of the column produced by an aggregate function.

Syntax:

dataframe.groupBy('column_name_group').agg(aggregate_function('column_name').alias("new_column_name"))

where,

  • dataframe is the input dataframe
  • column_name_group is the column to group by
  • aggregate_function is one of the aggregate functions from the functions module
  • column_name is the column on which the aggregation is performed
  • new_column_name is the new name for the aggregated result column

Example 1: Grouping by the DEPT column, aggregating FEE with sum() and avg(), and renaming the resulting columns (for example, the sum to Total Fee)


Example 2: Grouping by the DEPT column, aggregating FEE with min(), count(), mean() and max(), and renaming each resulting column with alias()


Method 2: Using withColumnRenamed()

This method renames a column of the aggregated result. After aggregation, each result column is named aggregate_operation(old_column) by default, for example sum(FEE), so we pass that default name to withColumnRenamed() together with the new name that should replace it.

Syntax:

dataframe.groupBy("column_name_group").agg({"column_name":"aggregate_operation"}).withColumnRenamed("aggregate_operation(column_name)", "new_column_name")

Example: Grouping by the DEPT column, summing FEE, and renaming the result to Total Fee
