URL: https://www.geeksforgeeks.org/python/how-to-add-multiple-columns-in-pyspark-dataframes/

How to Add Multiple Columns in PySpark Dataframes ?

Last Updated : 30 Jun, 2021

In this article, we will see different ways of adding multiple columns to PySpark DataFrames.

Let's create a sample dataframe for demonstration:

Dataset Used: Cricket_data_set_odi


Method 1: Using withColumn()

withColumn() is used to add a new column to a DataFrame or to update an existing one. To add several columns, chain one withColumn() call per column.

Syntax: df.withColumn(colName, col)

Returns: A new DataFrame with the column added, or with the existing column of the same name replaced.

Code:


Method 2: Using select()

You can also add multiple columns in a single select() call by selecting the existing columns together with the new column expressions.

Syntax: df.select(*cols)

Code:


Method 3: Adding Multiple Constant Columns to a DataFrame Using withColumn() and select()

Let's create new columns with constant values using the lit() SQL function. The lit() function in PySpark wraps a constant or literal value as a column, so it can be used to add a new column with that value to a PySpark DataFrame.
