URL: https://thenewstack.io/apple-comet-brings-fast-vector-processing-to-apache-spark/

Apple’s Comet Brings Fast Vector Processing to Apache Spark

Apple Software Engineer Chao Sun has submitted this Rust-based plug-in to become an Apache Software Foundation project, under the Apache Arrow umbrella.
Feb 8th, 2024 1:26pm by Joab Jackson
Feature image via OpenClipart-Vectors. 

Consumer electronics giant Apple has released into open source a plug-in that would help Apache Spark execute queries more efficiently through vectorized processing, making the open source data processing platform more appealing for large-scale machine learning data crunching.

The Apple engineers behind the Rust-based plug-in, called Apache Arrow DataFusion Comet, have submitted it to become an Apache Software Foundation project, under the Apache Arrow umbrella. It is built on the extensible Apache DataFusion query engine (also written in Rust) and the Arrow columnar data format.

“Our goal is to accelerate Spark query execution via delegating Spark’s physical plan execution to DataFusion’s highly modular execution framework, while still maintaining the same semantics to Spark users,” explained Apple Software Engineer Chao Sun, on an Apache mailing list.

Sun noted that the project is not yet feature-complete, but parts of it are already used in production.

“This is a great example of the composable data system concept that everyone seems to be talking about lately,” noted Apache Arrow Project Management Committee Chair Andy Grove on X. “In this case, using Spark’s very mature planning and scheduling and delegating to DataFusion for native execution.”

What Is Apache Arrow DataFusion Comet?

Using the Apache Arrow DataFusion runtime, Comet can query data in the Apache Arrow columnar format, an approach designed to improve query efficiency and query runtime through native vectorized execution.
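To make "columnar format" more concrete, here is a minimal Python sketch of how row-oriented records might be pivoted into per-column arrays. It is not Arrow's implementation. Loosely following Arrow's design, each column keeps a contiguous values buffer plus a validity bitmap in which bit *i* is set when slot *i* holds a non-null value. The `Column` class and `to_columns` helper are hypothetical names for illustration; the real Arrow format also specifies buffer alignment, offset buffers for variable-length types and much more.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Column:
    """Simplified Arrow-style column (hypothetical): values buffer + validity bitmap."""
    values: array          # contiguous buffer of fixed-width values
    validity: bytearray    # bit i set => slot i is non-null (least significant bit first)

    def is_valid(self, i: int) -> bool:
        return bool(self.validity[i // 8] & (1 << (i % 8)))

    def to_list(self) -> list:
        return [self.values[i] if self.is_valid(i) else None
                for i in range(len(self.values))]


def to_columns(rows: list, schema: dict) -> dict:
    """Pivot row-oriented records into one Column per field.

    schema maps field name -> array typecode ("q" = int64, "d" = float64).
    Null slots get a placeholder 0 in the values buffer and a cleared validity bit.
    """
    n = len(rows)
    columns = {}
    for name, typecode in schema.items():
        values = array(typecode)
        validity = bytearray((n + 7) // 8)  # one bit per row, rounded up to whole bytes
        for i, row in enumerate(rows):
            v = row.get(name)
            if v is None:
                values.append(0)
            else:
                values.append(v)
                validity[i // 8] |= 1 << (i % 8)
        columns[name] = Column(values, validity)
    return columns


rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": None},
    {"id": 3, "price": 4.25},
]
cols = to_columns(rows, {"id": "q", "price": "d"})
print(cols["id"].to_list())     # [1, 2, 3]
print(cols["price"].to_list())  # [9.5, None, 4.25]
```

Storing each field contiguously is what lets an engine scan one column without touching the others, and the separate bitmap keeps null handling out of the values buffer itself.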

Apache Spark was created in 2010 for processing large amounts of distributed data in a variety of structured and unstructured formats (“Big Data”).

Vector processing has become a favorite technique in the machine learning community because it can cut the time needed to analyze large amounts of data.

“Vectorized querying improves the performance, efficiency, scalability and memory footprint of analytical queries by operating on batches of data and processing multiple elements of data in parallel. It is inextricably linked with columnar database architecture, as it allows entire columns to be loaded into a CPU register and processed,” wrote Fivetran Senior Product Evangelist Charles Wang, in an analysis piece last month.
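The batch idea in Wang's description can be illustrated with a small Python sketch. It contrasts a row-at-a-time loop, which handles one record per iteration, with a batch-at-a-time version that runs a filter, a projection and an aggregate over whole slices of column values. Python lists are not CPU registers, so this shows only the shape of vectorized execution, not its hardware speedup; the function names, column names and default batch size of 4 are illustrative assumptions.

```python
from typing import Iterator


def row_at_a_time(rows: list[dict], threshold: float) -> float:
    """Row-oriented style: one record per loop iteration."""
    total = 0.0
    for row in rows:
        if row["price"] > threshold:
            total += row["price"] * row["qty"]
    return total


def column_batches(columns: dict[str, list], batch_size: int) -> Iterator[dict[str, list]]:
    """Yield fixed-size slices of every column, similar in spirit to record batches."""
    n = len(next(iter(columns.values())))
    for start in range(0, n, batch_size):
        yield {name: col[start:start + batch_size] for name, col in columns.items()}


def batch_at_a_time(columns: dict[str, list], threshold: float, batch_size: int = 4) -> float:
    """Vectorized style: each operator processes a whole batch of column values."""
    total = 0.0
    for batch in column_batches(columns, batch_size):
        # Filter: build a selection mask for the entire batch in one pass.
        mask = [p > threshold for p in batch["price"]]
        # Project: compute a revenue value for every slot in the batch.
        revenue = [p * q for p, q in zip(batch["price"], batch["qty"])]
        # Aggregate: sum only the slots the mask selected.
        total += sum(r for r, keep in zip(revenue, mask) if keep)
    return total


pairs = [(5.0, 2), (12.0, 1), (20.0, 3), (8.0, 4), (15.0, 2), (30.0, 1)]
rows = [{"price": p, "qty": q} for p, q in pairs]
columns = {"price": [p for p, _ in pairs], "qty": [q for _, q in pairs]}

print(row_at_a_time(rows, 10.0))      # 132.0
print(batch_at_a_time(columns, 10.0))  # 132.0
```

Both functions return the same answer; the difference is that the batched version applies each operator to many values at once, which is the pattern a native engine can map onto SIMD instructions and tight loops over contiguous column buffers.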

Comet was designed to keep feature parity with Spark itself (currently, it supports Spark versions 3.2 through 3.4). This means users can run the same queries regardless of whether the Comet extension is being used.

Spark’s built-in expressions and operators (Filter/Project/Aggregation/Join/Exchange) can work with Comet, as can the Apache Parquet columnar storage format, in both read and write modes.
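The division of labor Sun and Grove describe, with Spark planning and DataFusion executing, can be pictured as a rewrite over a physical plan tree. The Python sketch below is hypothetical and greatly simplified: it walks a plan bottom-up and marks a node native when its operator appears in the list of Filter/Project/Aggregation/Join/Exchange (plus a Parquet scan), leaving it to Spark otherwise. The `PlanNode` class, the `NATIVE_OPERATORS` set and the `CustomOp` operator are illustrative assumptions, not Comet's actual classes or coverage rules. It also assumes, for simplicity, that a node runs natively only when all of its children do, so unsupported work falls back to Spark rather than failing.

```python
from dataclasses import dataclass, field

# Hypothetical set of operators this sketch treats as natively executable,
# taken from the operator list in the article plus a Parquet scan.
NATIVE_OPERATORS = {"ParquetScan", "Filter", "Project", "Aggregation", "Join", "Exchange"}


@dataclass
class PlanNode:
    op: str
    children: list["PlanNode"] = field(default_factory=list)
    native: bool = False


def delegate(node: PlanNode) -> PlanNode:
    """Bottom-up rewrite: a node is native only if it and all its children are.

    Unsupported nodes, and every ancestor above them, stay on Spark's JVM engine,
    so the query still runs even when some operators are not supported natively.
    """
    for child in node.children:
        delegate(child)
    node.native = node.op in NATIVE_OPERATORS and all(c.native for c in node.children)
    return node


def explain(node: PlanNode, depth: int = 0) -> list[str]:
    """Render the plan, indented by depth, with the engine chosen for each operator."""
    engine = "native" if node.native else "spark"
    lines = ["  " * depth + f"{node.op} [{engine}]"]
    for child in node.children:
        lines.extend(explain(child, depth + 1))
    return lines


# Fully supported plan: every operator is delegated.
plan = PlanNode("Aggregation", [
    PlanNode("Exchange", [PlanNode("Filter", [PlanNode("ParquetScan")])]),
])
print("\n".join(explain(delegate(plan))))

# Plan containing a hypothetical unsupported operator: the scan stays native,
# while CustomOp and the Aggregation above it fall back to Spark.
plan2 = PlanNode("Aggregation", [PlanNode("CustomOp", [PlanNode("ParquetScan")])])
print("\n".join(explain(delegate(plan2))))
```

Because each node keeps Spark's semantics whichever engine runs it, a fallback of this kind changes where work happens, not what the query returns, which is the property the feature-parity goal relies on.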

Comet also requires JDK 8 or later and GLIBC 2.17 or later, and can run on either Linux or macOS.


Other Spark Plug-ins That Speed Vector Processing

Apple is not the only member of the FAANG club interested in vector processing, noted software engineer Chris Riccomini: Meta has also released into open source Velox, its own vectorized execution engine, which can be used to accelerate Spark.

Similar projects include Intel’s Gluten (recently accepted into ASF incubation), Nvidia’s RAPIDS Spark accelerator for GPUs, Blaze (which also works with Apache Arrow DataFusion), and the Ballista distributed SQL query engine.

Joab Jackson is a senior editor for The New Stack, covering cloud native computing and system operations. He has reported on IT infrastructure and development for over 30 years, including stints at IDG and Government Computer News. Before that, he...