Many Spark applications have become legacy applications that are very hard to enhance, test, and run locally.
Spark has very good testing support, yet many Spark applications are still not testable.
I will share one common error that appears when you try to run some old Spark applications.
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:376)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
    at org.apache.spark.sql.SparkSession$Builder$anonfun$6.apply(SparkSession.scala:909)
    at org.apache.spark.sql.SparkSession$Builder$anonfun$6.apply(SparkSession.scala:901)
    at scala.Option.getOrElse(Option.scala:121)
When you see such an error you have 2 options:
- Forget that it can't run locally and continue to work with this frustration
- Fix it to run locally and show the example of The Boy Scout Rule to your team
I will show a very simple pattern that will save you from such frustration.
    def main(args: Array[String]): Unit = {
      // Decide whether this run should use a local master
      val localRun = SparkContextBuilder.isLocalSpark
      val sparkSession = SparkContextBuilder.newSparkSession(localRun, "Happy Local Spark")
      // Numbers 1 to 999 (Range.apply excludes the upper bound)
      val numbers = sparkSession.sparkContext.parallelize(Range.apply(1, 1000))
      val total = numbers.sum()
      println(s"Total Value ${total}")
    }

This code uses the isLocalSpark function to decide whether to run in local mode. You can make that decision with any technique you like, such as an environment variable or a command-line parameter.
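As a rough illustration of that decision, here is a minimal sketch that checks both a command-line flag and an environment variable. The object name LocalModeDetector, the --local flag, and the RUN_LOCAL variable are hypothetical names chosen for this example; they are not taken from the original code.

```scala
object LocalModeDetector {
  // Hypothetical convention: either pass --local on the command line
  // or set RUN_LOCAL=true in the environment.
  // The env map is a parameter so the logic can be tested without
  // touching the real process environment.
  def isLocal(args: Array[String], env: Map[String, String] = sys.env): Boolean = {
    val fromArgs = args.contains("--local")
    val fromEnv = env.get("RUN_LOCAL").exists(_.equalsIgnoreCase("true"))
    fromArgs || fromEnv
  }
}
```

Passing the environment in as a parameter keeps the check a pure function, so it can be unit tested without starting Spark.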
Once you know whether it is running locally, create the Spark context accordingly.
Now this code can run locally as well as via spark-submit.
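One way SparkContextBuilder could look is sketched below. This is an assumption about its shape, not the code from the repo: the RUN_LOCAL environment variable is a hypothetical convention, and the only idea taken from the article is that the master is set explicitly for a local run and left to spark-submit otherwise.

```scala
import org.apache.spark.sql.SparkSession

object SparkContextBuilder {
  // Assumption: local mode is signalled by a RUN_LOCAL=true environment
  // variable. The original implementation may use a different mechanism.
  def isLocalSpark: Boolean =
    sys.env.get("RUN_LOCAL").exists(_.equalsIgnoreCase("true"))

  def newSparkSession(localRun: Boolean, appName: String): SparkSession = {
    val builder = SparkSession.builder().appName(appName)
    // Set a local master only for local runs. When launched through
    // spark-submit, the master comes from the submit configuration.
    val configured = if (localRun) builder.master("local[*]") else builder
    configured.getOrCreate()
  }
}
```

Leaving the master unset in the non-local branch is what avoids hard-coding a cluster URL in the application.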
Happy Spark Testing.
Code used in this blog is available in the runlocal repo.
Published on Java Code Geeks with permission by Ashkrit Sharma, partner at our JCG program. See the original article here: Spark Run local design pattern. Opinions expressed by Java Code Geeks contributors are their own.
Ashkrit Sharma, January 2nd, 2019. Last Updated: January 11th, 2019
