URL: https://www.geeksforgeeks.org/python/get-specific-row-from-pyspark-dataframe/

Get specific row from PySpark dataframe

Last Updated : 18 Jul, 2021

In this article, we will discuss how to get a specific row from a PySpark dataframe.

Creating Dataframe for demonstration:

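Output:

+-----------+-------------+------------+
|Employee ID|Employee NAME|Company Name|
+-----------+-------------+------------+
|          1|       sravan|   company 1|
|          2|       ojaswi|   company 2|
|          3|        bobby|   company 3|
|          4|       rohith|   company 2|
|          5|      gnanesh|   company 1|
+-----------+-------------+------------+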

Method 1: Using collect()

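collect() returns all rows of the dataframe as a list of Row objects, so a specific row can be picked with an ordinary list index. Because it brings every row to the driver, it is best suited to small dataframes.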

Syntax: dataframe.collect()[index_position]

Where,

  • dataframe is the pyspark dataframe
  • index_position is the zero-based index of the row to return; a negative index counts from the end

Example: Python code to access rows

Output:

Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1')

Row(Employee ID='2', Employee NAME='ojaswi', Company Name='company 2')

Row(Employee ID='5', Employee NAME='gnanesh', Company Name='company 1')

Row(Employee ID='3', Employee NAME='bobby', Company Name='company 3')

Method 2: Using show()

This function prints the first n rows of the PySpark dataframe to the console. It returns None rather than the rows themselves, so it is useful for inspection only.

Syntax: dataframe.show(no_of_rows)

where no_of_rows is the number of rows to display (20 if omitted)

Example: Python code to get the data using show() function

Method 3: Using first()

This function returns only the first row of the dataframe, as a single Row object.

Syntax: dataframe.first()

Example: Python code to select the first row in the dataframe.

Output:

Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1')

Method 4: Using head()

This method returns the first n rows of the dataframe as a list of Row objects.

Syntax: dataframe.head(n)

where n is the number of rows to return

Example: Python code to get the first n rows.

Output:

[Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1')]

[Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1'), 

Row(Employee ID='2', Employee NAME='ojaswi', Company Name='company 2'), 

Row(Employee ID='3', Employee NAME='bobby', Company Name='company 3')]

[Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1'), 

Row(Employee ID='2', Employee NAME='ojaswi', Company Name='company 2')]

Method 5: Using tail()

This method returns the last n rows of the dataframe as a list of Row objects. It is available in Spark 3.0 and later.

Syntax: dataframe.tail(n)

where n is the number of rows to return from the end of the dataframe.

Example: Python code to get last n rows

Output:

[Row(Employee ID='5', Employee NAME='gnanesh', Company Name='company 1')]

[Row(Employee ID='3', Employee NAME='bobby', Company Name='company 3'),

Row(Employee ID='4', Employee NAME='rohith', Company Name='company 2'),

Row(Employee ID='5', Employee NAME='gnanesh', Company Name='company 1')]

[Row(Employee ID='4', Employee NAME='rohith', Company Name='company 2'),

Row(Employee ID='5', Employee NAME='gnanesh', Company Name='company 1')]

Method 6: Using select() with collect() method

select() chooses which columns to keep. Chaining collect() and indexing the result then returns a particular row containing only those columns.

Syntax: dataframe.select([columns]).collect()[index]

where, 

  • dataframe is the pyspark dataframe
  • columns is the list of columns to include in each row
  • index is the zero-based index of the row to return

Example: Python code to select the particular row.

Output:

Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1')

Row(Employee ID='3', Employee NAME='bobby', Company Name='company 3')

Row(Employee ID='4', Employee NAME='rohith', Company Name='company 2')

Method 7: Using take() method

This method also returns the first n rows of the dataframe, as a list of Row objects.

Syntax: dataframe.take(n)

where n is the number of rows to be selected

Output:

[Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1'), 

Row(Employee ID='2', Employee NAME='ojaswi', Company Name='company 2')]

[Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1'),

Row(Employee ID='2', Employee NAME='ojaswi', Company Name='company 2'),

Row(Employee ID='3', Employee NAME='bobby', Company Name='company 3'),

Row(Employee ID='4', Employee NAME='rohith', Company Name='company 2')]

[Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1')]
