
URL: https://thenewstack.io/celerdata-upends-real-time-data-analytics-with-dynamic-table-joins/

CelerData Upends Real-Time Data Analytics with Dynamic Table Joins - The New Stack


2023-09-26 07:25:05
C++ / Data

CelerData Upends Real-Time Data Analytics with Dynamic Table Joins

StarRocks, an open source, real-time OLAP database, performs joins on-the-fly for performance and cost advantages on data with low latency.
Sep 26th, 2023 7:25am by Jelani Harper

The shift to real-time analytics, infrastructure, and architecture is affecting organizations across industries and use cases. Whether the use case involves Internet of Things deployments like digital twins or wearables, horizontal concerns like supply chain management, or fraud detection and recommendation engines in AdTech, the need to analyze and act on data with low latency is only increasing.

The most accomplished OLAP databases for such tasks are written in C++ to accommodate these performance needs. Many integrate with streaming data platforms like Apache Flink or Spark Streaming to handle the preprocessing their architectures require for such timely analytics.

Regardless of the particular approach or database involved in such matters, there’s no getting around one simple fact that’s consistently proved determinative for real-time OLAP databases.

In almost all cases, the data being analyzed resides in more than one table.

CelerData helps enterprises accelerate business growth with a unified analytics platform that delivers 3X the performance of any other solution on the market while reducing operating costs by up to 80%. Powered by StarRocks, CelerData is used worldwide by leading brands including Airbnb and Lenovo.

“Aside from analyzing logs, or analyzing user behavior, and sometimes not even that, for every other scenario you actually need joins,” revealed CelerData product marketing manager Sida Shen. “There’s really not that many scenarios where you don’t need joins.”

CelerData’s real-time, open source OLAP database StarRocks is one of the few options in this space that dynamically performs join operations on tables with low latency data. Because of its architecture, this real-time database is considerably more flexible, faster, and more cost-effective than many of its competitors, which produces significant advantages for users when it’s deployed at enterprise scale.

On-the-Fly Joins

According to Shen, StarRocks’ ability to rapidly perform dynamic joins on real-time data is “unique” among OLAP databases in this field. From an architectural perspective, this advantage is largely due to the fact that StarRocks “has a natively built cost-based optimizer,” Shen remarked, which supports scalable join operations. Typically, other OLAP databases can only process single table queries on real-time data and require preprocessing to join tables so organizations can query across them.

Considering the speed and sizes of the data in real-time analytics use cases, preprocessing for joins is “one of the most expensive things you can do with OLAP databases, joining two large tables,” Shen commented. Since StarRocks can join tables on the fly for these low latency use cases, its users avoid those costs and the time spent denormalizing their tables to facilitate joins. “Data lake engines can do joins because they do ETL jobs, but real-time OLAP databases give up on that because it needs a lot of optimization on the query planning side,” Shen explained. “Our architecture supports joins internally.”

Denormalization Realities

Without the capability to dynamically join tables, other OLAP databases for real-time analytics compensate with denormalization, a preprocessing step that frequently relies on platforms like Spark Streaming or Flink. “Denormalization is when you pre-join your tables together based on your query pattern,” Shen specified. After the tables are joined into a flat table in the preprocessing platform, that flat table is ingested and analyzed in the real-time OLAP database. It’s not uncommon for organizations to generate copious amounts of code for these operations, and that code can be fragile.

“This is where it gets very complicated,” Shen admitted. “It’s very difficult to configure, it breaks a lot, and it requires a lot of resources. Just a lot of maintenance, and this is on the cost side, like hardware and man-hour costs.” Moreover, when schema changes arise, there’s a definite possibility of having to redo this preparation work. In that case, “you have to reconfigure the entire pipeline and sometimes you need to backfill all of the data for your flat table,” Shen observed. “Because one thing changes, the whole flat table can change.”
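As a rough illustration of the pre-join Shen describes, the following Python sketch flattens two small tables and then shows why a change in a source table forces a rebuild. The `orders` and `customers` tables, their columns, and the helper names are hypothetical and chosen only for this example; a production pipeline would run in a streaming platform rather than in plain Python.

```python
# Hypothetical normalized tables; names and values are illustrative only.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 11, "amount": 40.0},
    {"order_id": 3, "customer_id": 10, "amount": 15.0},
]
customers = [
    {"customer_id": 10, "region": "EU"},
    {"customer_id": 11, "region": "US"},
]


def denormalize(orders, customers):
    """Pre-join orders with customers into one flat table (inner join)."""
    by_id = {c["customer_id"]: c for c in customers}
    flat = []
    for order in orders:
        customer = by_id.get(order["customer_id"])
        if customer is not None:
            # Values are copied into the flat row at build time.
            flat.append({**order, **customer})
    return flat


def revenue_by_region(flat):
    """A single-table query over the flat table."""
    totals = {}
    for row in flat:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


flat_table = denormalize(orders, customers)
before = revenue_by_region(flat_table)

# A source table changes: customer 11 moves to the EU region.
customers[1]["region"] = "EU"

stale = revenue_by_region(flat_table)                      # still the old answer
fresh = revenue_by_region(denormalize(orders, customers))  # needs a rebuild

print(before, stale, fresh)
```

In this sketch the flat table keeps returning the old totals until it is rebuilt from scratch, which is a small-scale analogue of the backfill described above.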

Architectural Advantages

Organizations can avoid such inflexibility, costs, and time preprocessing their tables by employing a real-time OLAP database that joins tables at enterprise scale for instant data analysis. StarRocks’ architecture enables it to support in-memory data shuffling, which helps with joins and complicated aggregation operations. Data shuffling becomes influential in distributed environments in which “one of the challenges is to send the data to the appropriate nodes, so the nodes can get the data and they all do their part,” Shen noted. “Data shuffling is, basically, you shuffle the two. Let’s say you join two tables and shuffle all the data on the join key to all of the nodes.”
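The shuffle Shen describes can be sketched in a few lines of Python. This is a minimal, single-process illustration under stated assumptions, not StarRocks' implementation: the "nodes" are plain lists, the `clicks` and `users` tables are hypothetical, and the routing rule (hash of the join key modulo the node count) is a common choice assumed here for the example.

```python
from collections import defaultdict

# Hypothetical tables; in a real cluster both could be large.
clicks = [
    {"user_id": 1, "page": "/home"},
    {"user_id": 2, "page": "/pricing"},
    {"user_id": 1, "page": "/docs"},
    {"user_id": 3, "page": "/home"},
    {"user_id": 4, "page": "/blog"},  # no matching user, dropped by inner join
]
users = [
    {"user_id": 1, "country": "US"},
    {"user_id": 2, "country": "DE"},
    {"user_id": 3, "country": "JP"},
]


def shuffle(rows, key, num_nodes):
    """Route every row to the node selected by hashing its join key."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[hash(row[key]) % num_nodes].append(row)
    return partitions


def local_hash_join(left, right, key):
    """Join the two partitions that landed on the same node."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def shuffle_join(left, right, key, num_nodes=3):
    """Shuffle both tables on the join key, then join node by node."""
    left_parts = shuffle(left, key, num_nodes)
    right_parts = shuffle(right, key, num_nodes)
    joined = []
    for lp, rp in zip(left_parts, right_parts):
        joined.extend(local_hash_join(lp, rp, key))
    return joined


joined = shuffle_join(clicks, users, "user_id")
print(len(joined))
```

Because both tables are partitioned with the same rule, rows that share a join key always meet on the same node, so each node only has to join its own slice and neither table has to fit on a single machine.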

This operation allows organizations to perform scalable joins. Without it, users would have to attempt what Shen termed a “broadcast join” that involves replicating a smaller table and sending it to all the nodes. According to Shen, for CelerData’s real-time OLAP competitors, “The most they can do without shuffling is to have a big table join a very tiny table on a cluster that’s not very big. But we can do a big table joining a big table or any other kind.”
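For contrast, a broadcast join can be sketched the same way. Again this is a simplified, single-process illustration with hypothetical table names: the large table stays partitioned wherever its rows already are (round-robin here, as an assumption), and every node receives its own full copy of the small table.

```python
# Hypothetical tables: a large fact table and a small lookup table.
events = [{"event_id": i, "country_code": code}
          for i, code in enumerate(["US", "DE", "US", "JP", "DE", "US"])]
countries = [
    {"country_code": "US", "name": "United States"},
    {"country_code": "DE", "name": "Germany"},
    {"country_code": "JP", "name": "Japan"},
]


def partition_round_robin(rows, num_nodes):
    """Spread the big table across nodes without looking at the join key."""
    partitions = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        partitions[i % num_nodes].append(row)
    return partitions


def broadcast_join(big_partitions, small_table, key):
    """Send a full copy of the small table to every node and join locally."""
    joined = []
    rows_copied = 0
    for partition in big_partitions:  # each partition stands in for one node
        lookup = {}
        for row in small_table:       # this node's private copy
            lookup.setdefault(row[key], []).append(row)
            rows_copied += 1
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**row, **match})
    return joined, rows_copied


parts = partition_round_robin(events, num_nodes=3)
joined, rows_copied = broadcast_join(parts, countries, "country_code")
print(len(joined), rows_copied)
```

The number of rows copied grows with the size of the small table multiplied by the number of nodes, which is why this approach suits a tiny table on a modest cluster and becomes impractical when both tables are large.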

Additionally, because StarRocks is written in C++, some of its performance gains, which become palpable when competing with Java-based query engines like Presto or Trino for directly querying data lakes, come from its use of Single Instruction, Multiple Data (SIMD) instructions. With SIMD, “you process multiple data points with one instruction, so you touch your memory a lot less by executing one query,” Shen said. This increased efficiency is characteristic of OLAP databases written in C++; Shen said it’s not possible with Java-based options.

The End of Table Denormalization?

A real-time OLAP database that dynamically joins tables whenever organizations specify it has considerable consequences for real-time analytics. On the one hand, it could herald an end to denormalization and the time, effort, and costs denormalization exacts from organizations to pre-join tables according to specific query patterns. On the other, it could signal an era in which there’s much more flexibility for real-time databases to adjust to changes in schema, source data, and business requirements. Either way, this capability could further advance the usefulness of real-time data analytics.

Jelani Harper has worked as a research analyst, research lead, information technology editorial consultant, and journalist for over 10 years. During that time he has helped myriad vendors and publications in the data management space strategize, develop, compose, and place...
TNS owner Insight Partners is an investor in: Pragma.