VOOZH about

URL: https://thenewstack.io/sql-and-complex-queries-are-needed-for-real-time-analytics/

⇱ SQL and Complex Queries Are Needed for Real-Time Analytics - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2022-04-18 12:20:22
SQL and Complex Queries Are Needed for Real-Time Analytics
contributed,sponsor-rockset,sponsored,sponsored-post-contributed,
Cloud Native Ecosystem / Data

SQL and Complex Queries Are Needed for Real-Time Analytics

For complex analytical queries, SQL is unquestionably the best tool. It is easy to create queries, and even easier to tune and optimize them in order to accelerate results, shrink intermediate tables and reduce query costs.
Apr 18th, 2022 12:20pm by Dhruba Borthakur
👁 Featued image for: SQL and Complex Queries Are Needed for Real-Time Analytics
Featured image via Pixabay.
Rockset sponsored this post.
Dhruba Borthakur
Dhruba is CTO and co-founder of Rockset and is responsible for the company's technical direction. He was an engineer on the database team at Facebook, where he was the founding engineer of the RocksDB data store. Earlier at Yahoo, he was one of the founding engineers of the Hadoop Distributed File System. He was also a contributor to the open source Apache HBase project.

Note: This post is the fourth in the series Designing the Next Generation of Data Systems For Real-Time Analytics.”

Today’s data-driven businesses need not only fast answers derived from the freshest data, but they must also perform complex queries to solve complicated business problems.

For instance, customer personalization systems need to combine historic data sets with real-time data streams to instantly provide the most relevant product recommendations to customers. So must operational analytics systems providing mission-critical real-time business observability, such as the case of an online payments provider that needs to monitor its transactions worldwide for anomalies that could signal financial fraud.

Or imagine an e-learning platform that needs to provide up-to-the-minute insights into student and teacher usage for school district customers and internal customer-facing teams. Or a market news provider that needs to monitor and ensure that its financial customers are getting accurate, relevant updates within the narrow windows for profitable trades.

Limitations of NoSQL

SQL supports complex queries because it is a very expressive, mature language. Complex SQL queries have long been commonplace in business intelligence (BI). And when systems such as Hadoop and Hive arrived, it married complex queries with big data for the first time. Hive implemented an SQL layer on Hadoop’s native MapReduce programming paradigm. The tradeoff of these first-generation SQL-based big data systems was that they boosted data processing throughput at the expense of higher query latency. As a result, the use cases remained firmly in batch mode.

That changed when NoSQL databases such as key-value and document stores came on the scene. The design goal was low latency and scale. Now companies could take a massive data set, organize it into simple pairs of key values or documents and instantly perform lookups and other simple queries. The designers of these massive, scalable key-value stores or document databases decided that scale and speed were possible only if the queries were simple in nature. Looking up a value in a key-value store could be made lightning fast. By contrast, a SQL query, due to the inherent complexity of filters, sorts and aggregations, would be too technically challenging to execute fast on large amounts of data, they decided.

Pay No Attention to That Man Behind the Curtain

Unfortunately, due to the above, NoSQL databases tend to run into problems when queries are complex, nested and must return precise answers. This is intentionally not their forte. Their query languages, whether SQL-like variants such as CQL (Cassandra) and Druid SQL or wholly custom languages such as MQL (MongoDB), poorly support joins and other complex query commands that are standard to SQL, if they support them at all.

Vendors of NoSQL databases are like the Wizard of Oz, distracting you with smoke and mirrors and talking up speed so you don’t notice the actual weaknesses of NoSQL databases when it comes to real-time analytics. Developers working with NoSQL databases end up being forced to embed joins and other data logic in their own application code — everything from fetching data from separate tables to doing the join optimizations and other analytical jobs.

While taking the NoSQL road is possible, it’s cumbersome and slow. Take an individual applying for a mortgage. To analyze their creditworthiness, you would create a data application that crunches data, such as the person’s credit history, outstanding loans and repayment history. To do so, you would need to combine several tables of data, some of which might be normalized, some of which are not. You might also analyze current and historical mortgage rates to determine what rate to offer.

With SQL, you could simply join tables of credit histories and loan payments together and aggregate large-scale historic data sets, such as daily mortgage rates. However, using something like Python or Java to manually recreate the joins and aggregations would multiply the lines of code in your application by tens or even a hundred compared to SQL.

More application code not only takes more time to create, but it almost always results in slower queries. Without access to a SQL-based query optimizer, accelerating queries is difficult and time-consuming because there is no demarcation between the business logic in the application and the query-based data access paths used by the application. Something as common as an intermediate join table, which SQL can handle efficiently and elegantly, can become a bloated memory hog in other languages.

Finally, a query written in application code is also more fragile, requiring constant maintenance and testing, and possible rewrites if data volumes change. And most developers lack the time and expertise to perform this constant maintenance.

There is only one NoSQL system I would consider reasonably competent at complex queries: GraphQL. GraphQL systems can associate data types with specific data fields, and provide functions to retrieve selected fields of a document. Its query API supports complex operations, such as filtering documents based on a set of matching fields and selectively returning a subset of fields from matching documents. GraphQL’s main analytics shortcoming is its lack of expressive power to join two disparate datasets based on the value of specific fields in those two datasets. Most analytical queries need this ability to join multiple data sources at query time.

Rockset was founded in 2016 by experts in web-scale data management and distributed systems. The team includes engineers who created the online data and search infrastructure at Facebook, founded the Hadoop project at Yahoo, implemented the Gmail backend at Google, and built databases at Oracle.
Learn More
The latest from Rockset

Choosing the Best Tool for the Job – SQL

In technology as in life, every job has a tool that is best designed for it. For complex analytical queries, SQL is unquestionably the best tool. SQL has a rich set of powerful commands developed over half a century. It is easy to create queries, and even easier to tune and optimize them in order to accelerate results, shrink intermediate tables and reduce query costs.

There are some myths about SQL databases, but they are based on legacy relational systems from the 1990s. The truth is that modern cloud native SQL databases support all of the key features necessary for real-time analytics, including:

  • Mutable data for incredibly fast data ingestion and smooth handling of late-arriving events.
  • Flexible schemas that can adjust automatically based on the structure of the incoming streaming data.
  • Instant scaleup of data writes or queries to handle bursts of data.

SQL remains incredibly popular, ranking among the most in-demand of all programming languages. As we’ve seen, it supports complex queries, which are a requirement for modern, real-time data analytics. By contrast, NoSQL databases are weak in executing joins and other complex query commands. Plus, finding an expert in a lesser-known custom query language can be time-consuming and expensive.

The bottom line is that you’ll have no problem finding skilled data engineers and data ops folks who know SQL and its capabilities with complex queries. And they’ll be able to put that knowledge and power to use, propelling your organization’s leap from batch to real-time analytics.

Rockset was founded in 2016 by experts in web-scale data management and distributed systems. The team includes engineers who created the online data and search infrastructure at Facebook, founded the Hadoop project at Yahoo, implemented the Gmail backend at Google, and built databases at Oracle.
Learn More
The latest from Rockset
TRENDING STORIES
Dhruba is CTO and co-founder of Rockset and is responsible for the company's technical direction. He was an engineer on the database team at Facebook, where he was the founding engineer of the RocksDB data store. Earlier at Yahoo, he...
Read more from Dhruba Borthakur
Rockset sponsored this post.
SHARE THIS STORY
TRENDING STORIES
MongoDB is a sponsor of The New Stack.
TNS owner Insight Partners is an investor in: Real, Pragma.
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.