VOOZH about

URL: https://thenewstack.io/enhancing-the-flexibility-of-sparks-physical-plan-to-enable-execution-on-various-native-engines/

⇱ Enhancing Apache Spark for Enhanced Flexibility and Performance - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2024-06-08 10:00:44
Enhancing Apache Spark for Enhanced Flexibility and Performance
contributed,
DevOps / Platform Engineering

Enhancing Apache Spark for Enhanced Flexibility and Performance

This proposal aims to introduce the physical plan conversion, validation, and fallback mechanisms from the Gluten project into Apache Spark.
Jun 8th, 2024 10:00am by Mark Andreev
👁 Featued image for: Enhancing Apache Spark for Enhanced Flexibility and Performance
Image by Ulrike Mai from Pixabay.

This proposal aims to introduce the physical plan conversion, validation, and fallback mechanisms from the Gluten project into Apache Spark. This will allow Spark to have greater flexibility and robustness when executing physical plans, while also taking advantage of the performance optimizations provided by Gluten.

Motivation

Apache Spark currently lacks an official mechanism to support cross-platform execution of physical plans. The Gluten project offers a mechanism that utilizes the Substrait standard to convert and optimize Spark’s physical plans. By introducing Gluten’s plan conversion, validation, and fallback mechanisms into Spark, we can significantly enhance the portability and interoperability of Spark’s physical plans, enabling them to operate across a broader spectrum of execution environments without requiring users to migrate, while also improving Spark’s execution efficiency through the utilization of Gluten’s advanced optimization techniques.

Design Proposal

  1. To integrate Gluten’s plan conversion mechanism within Spark, we first defined an interface called TransformSupport, which inherits from SparkPlan. The TransformSupport interface includes two methods: def validate(): Boolean for validating if this operator/expression is supported in native code, and def doTransform(): SubstraitPlan for performing the plan conversion. Next, we defined three interfaces — LeafTransformSupport, UnaryTransformSupport, and BinaryTransformSupport — to adapt to different types of operators. For example, ProjectExecTransformer will inherit from UnaryTransformSupport, HashJoinTransformer will inherit from BinaryTransformSupport, and DatasourceScanTransformer will inherit from LeafTransformSupport. This way, we can extend SparkPlan to support plan conversion.

👁 Image

2. During the physical plan validation process, we will pass the converted Substrait plan in Protobuf format to different native backends for validation. This process includes checking supported function signatures and data types. If the validation is successful, it returns true; otherwise, it returns false.

👁 Image

3. If the validation fails, we will continue to use Spark’s own operators to execute the plan, but at this point, support for column-to-row (C2R) and row-to-column (R2C) data transformations is required. Taking the Velox backend as an example, we need to convert Velox’s columnar data format to Spark’s UnsafeRow format in the C2R transformation; conversely, in the R2C transformation, we convert Spark’s UnsafeRow format to Velox’s columnar data format.

👁 Image

To implement the execution of Spark physical plans on a native engine, we need to complete three key steps: plan transformation, validation, and fallback. Given that these steps involve substantial modifications to Spark, we have decided to first focus on the plan transformation phase. By utilizing Substrait, we will convert Spark plans into a common format, thereby achieving compatibility with various backends. After the first step is completed, we will further consider the remaining validation and fallback steps.

Conclusions

We have adeptly incorporated this conversion methodology into Apache Gluten, enabling seamless support for both ClickHouse and Velox backends. Our integration efforts with either backend have culminated in a substantial performance breakthrough. For an in-depth look at the enhancements, please refer to the information provided at this link. Moreover, it is with great pride that we acknowledge the successful deployment of Gluten by numerous customers within their production settings.

TRENDING STORIES
Mark Andreev is a Senior Software Engineer living in London, UK, specializing in data-intensive applications. His focus is Java related projects that help to process a large amount of data. During his career Mark was a data scientist which influenced...
Read more from Mark Andreev
SHARE THIS STORY
TRENDING STORIES
TNS owner Insight Partners is an investor in: ClickHouse.
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.