In this article, we sort the rows of a PySpark dataframe by one or more columns, using the sort() and orderBy() functions in both ascending and descending order.
Let's create a sample dataframe.
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          1|       sravan|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
The sort() function sorts the rows of a dataframe by one or more columns.
Syntax: dataframe.sort(['column name'], ascending=True).show()
Example 1: Using sort() with one column
Sort the data by Employee NAME in ascending order:
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
Sort the data based on Employee name in decreasing order:
Syntax: dataframe.sort(['column name'], ascending = False).show()
Code:
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
+-----------+-------------+---------+
Example 2: Using sort() with multiple columns
Sort the dataframe by Employee_ID and then by Employee NAME, both in ascending order.
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
Next, sort the dataframe by Employee_ID, Company, and Employee NAME, all in descending order.
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+
Example 3: Sort using the asc() method
The asc() method of the Column class returns a sort expression based on the ascending order of the given column.
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
Example 4: Sort using the desc() method
The desc() method of the Column class returns a sort expression based on the descending order of the given column.
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+
The orderBy() function sorts a dataframe by one or more columns. It is an alias of sort() and, by default, sorts in ascending order.
Syntax: orderBy(*cols, ascending=True)
Parameters:
- cols → Column names or Column expressions to sort by.
- ascending → Boolean, or a list of booleans with one entry per column. True (the default) sorts in ascending order; False sorts in descending order.
Example 1: Using orderBy() with one column
Sort the dataframe by Employee_ID in ascending order:
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
Sort the dataframe by Employee_ID in descending order:
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+
Example 2: Using orderBy() with multiple columns
Sort the dataframe by the Employee_ID and Employee NAME columns in descending order using orderBy():
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+
Sort the dataframe by the Employee_ID and Employee NAME columns in ascending order:
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+