
URL: https://www.geeksforgeeks.org/python/convert-comma-separated-string-to-array-in-pyspark-dataframe/


Convert comma separated string to array in PySpark dataframe

Last Updated : 23 May, 2021

In this article, we will learn how to convert a comma-separated string to an array in a PySpark dataframe.

In PySpark SQL, the split() function converts a delimiter-separated string to an array. It splits the string on a delimiter such as a space or a comma and collects the resulting pieces into an array. The function returns a pyspark.sql.Column of array type.

Syntax: pyspark.sql.functions.split(str, pattern, limit=-1)

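Parameters:

  • str:- The string column to be split, given as a Column or a column name.
  • pattern:- The delimiter that is used to split the string. It is interpreted as a Java regular expression, so characters such as | or . must be escaped.
  • limit:- An integer that controls the number of times pattern is applied. With the default of -1 the pattern is applied as many times as possible; with a positive value the resulting array has at most limit entries, and the last entry holds the rest of the string.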

Examples

Let's look at a few examples to understand how the code works.

Example 1: Working with String Values

Let's look at a sample example to see the split function in action. For this example, we create a custom dataframe and use the split function to create an array column containing the parts of each student's name. Here we apply split to string-typed columns.


Example 2: Working with Integer Values

If we want numeric values, we can combine the cast() function with the split() function. In this example we use cast() to build an array of integers: cast(ArrayType(IntegerType())) states explicitly that the result should be cast to an array of integer type.


Example 3: Working with both Integer and String Values

There may be cases where we need to go through each column and split it if it holds a comma-separated value. There may also be cases where the separator is not present in a column at all. The split() function handles this situation by returning a single-element array containing the column value instead of raising an exception, which can come in handy.
