URL: https://www.geeksforgeeks.org/python/udf-to-sort-list-in-pyspark/

UDF to sort list in PySpark

Last Updated : 23 Jul, 2025

A user-defined function (UDF) is a Spark SQL feature that lets you wrap a Python function so it can be reused on data frame columns in PySpark. A PySpark column can hold strings, integers, arrays, and other types. Sometimes a data frame has an ArrayType column and you need to sort the list in each row of that column. There are several ways to do this, and a UDF is one straightforward option. In this article, we walk through two examples of sorting a list column with a UDF.

Example 1:

In this example, we create a data frame with four columns: 'Full_Name', 'Date_Of_Birth', 'Gender', and 'Fees'. The 'Full_Name' column is an ArrayType column in which each row holds a list made up of 'First_Name', 'Middle_Name', and 'Last_Name'.

Then, we define a user-defined function that sorts the ArrayType column 'Full_Name' in ascending order, and call it to store the sorted values in a new data frame column, 'Sorted_Full_Name'.
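The steps described above can be sketched roughly as follows. This is a minimal sketch, not the article's original code: the sample rows, the helper names `sort_ascending` and `show_sorted_names`, and the application name are assumptions made for illustration. The PySpark imports sit inside the demo function so that the plain sorting helper can be tried without a Spark installation.

```python
def sort_ascending(values):
    """Return a new list sorted in ascending order; pass None through unchanged."""
    if values is None:
        return None
    return sorted(values)


def show_sorted_names():
    """Build a small data frame and add a 'Sorted_Full_Name' column via a UDF."""
    # Imported here so sort_ascending can be used without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, StringType

    spark = SparkSession.builder.appName("sort_list_udf").getOrCreate()

    # Illustrative rows (assumed values, not taken from the article).
    data = [
        (["Vinayak", "Kumar", "Rai"], "2000-02-21", "Male", 13000),
        (["Ishita", "Devi", "Pundir"], "2004-01-06", "Female", 10000),
    ]
    columns = ["Full_Name", "Date_Of_Birth", "Gender", "Fees"]
    df = spark.createDataFrame(data, columns)

    # Wrap the Python function as a UDF that returns an array of strings.
    sort_udf = udf(sort_ascending, ArrayType(StringType()))

    # Apply the UDF row by row and store the result in a new column.
    result = df.withColumn("Sorted_Full_Name", sort_udf(df["Full_Name"]))
    result.show(truncate=False)

    spark.stop()
```

Calling `show_sorted_names()` in an environment with PySpark installed prints the data frame with the extra 'Sorted_Full_Name' column. The return type passed to `udf` should match the element type of the list being sorted, which is why `ArrayType(StringType())` is used for names.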


Example 2:

In this example, we create a data frame with two columns, 'name' and 'marks'. The 'marks' column is an ArrayType column in which each row holds a list of marks.

Then, we define a user-defined function that sorts the ArrayType column 'marks' in descending order, and call it to store the sorted values in a new data frame column, 'Sorted_Marks'.
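A descending sort along the lines described above might look like the sketch below. As before, the sample rows, the helper names `sort_descending` and `show_sorted_marks`, and the application name are assumptions for illustration rather than the article's original code.

```python
def sort_descending(values):
    """Return a new list sorted in descending order; pass None through unchanged."""
    if values is None:
        return None
    return sorted(values, reverse=True)


def show_sorted_marks():
    """Build a small data frame and add a 'Sorted_Marks' column via a UDF."""
    # Imported here so sort_descending can be used without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, IntegerType

    spark = SparkSession.builder.appName("sort_marks_udf").getOrCreate()

    # Illustrative rows (assumed values, not taken from the article).
    data = [
        ("Arun", [85, 92, 78]),
        ("Meera", [67, 99, 81]),
    ]
    df = spark.createDataFrame(data, ["name", "marks"])

    # Wrap the Python function as a UDF that returns an array of integers.
    sort_udf = udf(sort_descending, ArrayType(IntegerType()))

    # Apply the UDF row by row and store the result in a new column.
    result = df.withColumn("Sorted_Marks", sort_udf(df["marks"]))
    result.show(truncate=False)

    spark.stop()
```

The only change from an ascending sort is `reverse=True` in the call to `sorted`, plus a return type that matches integer elements.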
