
URL: https://thenewstack.io/accelerating-sql-queries-on-a-modern-real-time-database/

⇱ Accelerating SQL Queries on a Modern Real-Time Database - The New Stack



Accelerating SQL Queries on a Modern Real-Time Database

A Trino connector enables SQL access to real-time data and lets data architects and developers expand the range of data that Trino can reach for fast analytics.
Nov 3rd, 2022 10:48am by Neel Phadnis and Yevgeny Rizhkov
Aerospike sponsored this post.

Businesses that deploy large-scale real-time applications, across a wide range of verticals, need “as it happens” (or at least near-real-time) visibility into these systems through notifications, ad-hoc queries, dashboards, and reports.

SQL is broadly used as a data access language for analytics, and Trino provides a powerful engine for SQL access to multiple data sources. A Trino connector enables SQL access to real-time data through Trino and, more broadly, lets data architects and developers expand the range of data that Trino can reach for fast analytics.

For analytics use cases, you can synchronize transactional and operational data from globally distributed clusters to a system of record (SOR) or analytics store in near-real time using a cross-datacenter replication (XDR) protocol.

Enabling Fast Queries at Scale

Fast queries at scale depend on a database that provides fast access to large volumes of data with massive parallelism.

Large Volume of Fast Storage

Modern databases have a cluster architecture spanning multiple nodes for scale, performance, and reliability. High density of fast storage is achieved through solid-state disks (SSDs). Hybrid memory architecture (HMA) stores indexes and data in dynamic random-access memory (DRAM), SSD, and other devices to provide cost-effective fast storage capacity.

Low-Latency Access

Disk (SSD) reads and writes are optimized for latency and throughput.

Indexes play a key role in realizing fast access to data. This requires supporting secondary indexes on integer, string, geospatial, map, and list columns.

Multiple Levels of Parallelism

The thread architecture on a cluster node is optimized to exploit parallelism of multicore processors of modern hardware, and also to minimize conflict and maximize throughput. The data is distributed uniformly across all nodes to maximize parallelism and throughput. The client library connects directly to individual cluster nodes and processes a request in a single hop, by distributing the request to nodes where it is processed in parallel, and assembling the results.
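The single-hop, scatter-and-assemble pattern described above can be illustrated with a small sketch. It is not the Aerospike client library: the `NODES` dictionary, `query_node`, and `scatter_gather` are hypothetical names, and a thread pool over in-memory lists stands in for direct network connections to cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "cluster": each node owns a disjoint slice of the records.
NODES = {
    "node-a": [{"id": 1, "amount": 40}, {"id": 4, "amount": 15}],
    "node-b": [{"id": 2, "amount": 75}],
    "node-c": [{"id": 3, "amount": 60}, {"id": 5, "amount": 5}],
}

def query_node(node_name, min_amount):
    """Filter one node's local records (stands in for one direct hop to that node)."""
    return [r for r in NODES[node_name] if r["amount"] >= min_amount]

def scatter_gather(min_amount):
    """Send the same request to every node in parallel, then assemble the results."""
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        futures = [pool.submit(query_node, name, min_amount) for name in NODES]
        merged = [record for f in futures for record in f.result()]
    # Sort only to make the assembled output deterministic for the example.
    return sorted(merged, key=lambda r: r["id"])

result = scatter_gather(min_amount=40)
print([r["id"] for r in result])  # [1, 2, 3]
```

Because each node filters only its own records, the work per node shrinks as nodes are added, which is the effect the uniform data distribution is meant to produce.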

Fine-Grained Sub-Queries

Ideally, you want to distribute records in uniform partitions, and allow separate sub-queries over them for maximum parallelism. In other words, a query is split into independent parallel sub-queries over one or more partitions, for the needed parallelism to match the required throughput. Further, each data partition can be subdivided into N sub-partitions by adding the modulo filter expression `digest % N == i`, for 0 <= i < N, where digest is the hashed key of the record.

Since digest is held in memory with other record metadata, the filter expression evaluation for a record’s membership in a sub-partition requires no access to data on the SSD. Therefore, a sub-partition query reads only the data in its sub-partition, minimizing the SSD reads across the multiple sub-partitions. This sub-partitioning scheme allows for an arbitrary number of parallel streams.

Using this scheme, Trino worker nodes can run a large number of parallel tasks that split the data uniformly: partition queries combined with the modulo filter expression yield an equal number of mutually exclusive and collectively exhaustive splits, or streams. The required data scale, throughput, and response time can then be achieved by adjusting the cluster size and the number of SSD devices attached to each node.
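A minimal sketch of the splitting scheme follows, to show why the splits are mutually exclusive and collectively exhaustive. Several details are assumptions made for illustration and are not taken from the article: SHA-1 stands in for the real digest function, 4096 is used as the partition count, the partition ID is taken from the first two digest bytes, and the modulo is computed over the trailing eight bytes so that it stays independent of the partition bits. All function names are hypothetical.

```python
import hashlib

NUM_PARTITIONS = 4096  # assumed partition count for this sketch

def digest(key: str) -> bytes:
    """Stand-in 20-byte hash of the record key (assumption: SHA-1, for portability)."""
    return hashlib.sha1(key.encode("utf-8")).digest()

def partition_id(d: bytes) -> int:
    """Partition taken from the first two digest bytes (an assumption of this sketch)."""
    return int.from_bytes(d[:2], "little") % NUM_PARTITIONS

def sub_partition(d: bytes, n: int) -> int:
    """The `digest % N` term, computed from bytes that do not feed partition_id."""
    return int.from_bytes(d[-8:], "little") % n

def make_splits(num_ranges: int, n: int):
    """Enumerate splits as (first_partition, end_partition_exclusive, i) triples.

    num_ranges is assumed to divide NUM_PARTITIONS evenly.
    """
    step = NUM_PARTITIONS // num_ranges
    return [(start, start + step, i)
            for start in range(0, NUM_PARTITIONS, step)
            for i in range(n)]

def in_split(d: bytes, split, n: int) -> bool:
    """Partition-range check combined with the modulo filter expression."""
    start, end, i = split
    return start <= partition_id(d) < end and sub_partition(d, n) == i

N = 3
splits = make_splits(num_ranges=4, n=N)
counts = {s: 0 for s in splits}
for key in (f"user-{k}" for k in range(2000)):
    d = digest(key)
    owners = [s for s in splits if in_split(d, s, N)]
    assert len(owners) == 1  # each record belongs to exactly one split
    counts[owners[0]] += 1

print(len(splits), sum(counts.values()))  # 12 2000
```

One design point worth noting: if the modulo were computed from the same low-order bits that select the partition, and N were a power of two, every record in a partition would land in the same sub-partition. Drawing the two values from different digest bytes avoids that correlation in this sketch.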

Stream Access

Query results can be retrieved and processed in a stream of smaller chunks by repeatedly asking for a specific number of remaining records.
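The chunked retrieval described above can be sketched with a generator. This is a generic illustration, not the Aerospike API: `stream_in_chunks` is a hypothetical helper, and any iterable stands in for a query's remaining records.

```python
from itertools import islice

def stream_in_chunks(records, chunk_size):
    """Yield successive lists of at most chunk_size records until none remain."""
    iterator = iter(records)
    while True:
        # Each pass asks for the next chunk_size of the remaining records.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk

chunks = list(stream_in_chunks(range(10), chunk_size=4))
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The consumer holds only one chunk in memory at a time, so the chunk size bounds memory use regardless of the total result size.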

The Trino Connector

The Aerospike Trino Connector enables access to real-time data through Trino for analytics use cases such as ad-hoc SQL queries, reports, and dashboards. Data in multiple clusters can be queried together using Trino’s data federation, which also makes it possible to merge Aerospike data with data from other sources.
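As a rough illustration of federation, the sketch below builds a single Trino SQL statement that reads the same table from several catalogs, using Trino's `catalog.schema.table` naming. The catalog names (`aerospike_us`, `aerospike_eu`), the schema `test`, and the table `users` are hypothetical, and the helper only produces the SQL text; submitting it to a Trino coordinator is left out.

```python
def federated_count_query(catalogs, schema, table):
    """Build a UNION ALL that counts rows of one table across several Trino catalogs."""
    selects = [
        f"SELECT '{catalog}' AS source, count(*) AS row_count "
        f"FROM {catalog}.{schema}.{table}"
        for catalog in catalogs
    ]
    return "\nUNION ALL\n".join(selects)

# Hypothetical catalogs, each assumed to be configured for a different cluster.
sql = federated_count_query(["aerospike_us", "aerospike_eu"], "test", "users")
print(sql)
```

The same pattern extends to joins: a table in an Aerospike-backed catalog can be joined to a table in any other configured catalog by qualifying both with their catalog names.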

Additional details about the Trino Connector can be found in the blog posts “Deploy Aerospike and Trino based analytics platform using Docker” and “Aerospike Trino Connector – Chapter Two.”

Starburst is a SQL-based MPP (massively parallel processing) query engine based on Trino that enables you to run Trino on a single machine or a cluster of machines, on-premises or in the cloud. The blog post “Analyze Data with Aerospike and Starburst Anywhere” describes how to use Starburst Enterprise. The recently released Aerospike SQL Powered by Starburst 1.1.0 supports Starburst Enterprise Platform (SEP), which is “a fully supported, enterprise-grade distribution of Trino that adds integrations, improves performance, provides security, and makes it easy to deploy, configure, and manage your clusters.”

Other SQL Options

Applications can access the data through SQL in a few other ways.

Spark

You can use Spark SQL to manipulate Aerospike data on the Spark platform. Aerospike Spark Connector provides parallel access to the Aerospike cluster from Spark.

Spark SQL merges two abstractions, resilient distributed datasets (RDDs) and relational tables, and is used to manipulate and process data in RDDs. Find examples of importing and storing Aerospike data to and from RDDs in these tutorials.

More details on the Spark Connector are available in the blog posts “Using Aerospike Connect for Spark” and “Accelerate Spark queries with Predicate Pushdown using Aerospike.”

JDBC

Application developers can use simple SQL to access Aerospike data with the community-contributed JDBC Connector. Please read more details in the blog post “Introducing Aerospike JDBC Driver.”

Aerospike API

Applications requiring the full functionality of the Aerospike API can use the SQL patterns available in the APIs to implement specific SQL CRUD operations easily.

Aerospike Database’s fast, high-capacity storage and parallel processing align with Trino’s distributed SQL query engine to accelerate query processing over large data sets. Aerospike’s hybrid memory architecture (HMA) leverages SSDs along with DRAM to greatly expand fast storage capacity in its cluster. Further, Aerospike distributes data and processing over a large number of partitions, providing a high degree of parallelism. For Trino queries over data in Aerospike, the result is accelerated performance at scale.

Aerospike is the real-time database built for infinite scale, speed, and savings. Our customers are ready for what’s next with the lowest latency and the highest throughput data platform. Cloud and AI-forward, we empower leading organizations like Adobe, Airtel, Criteo, Experian, and PayPal.
Neel Phadnis is the director of developer engagement at Aerospike. He is a technologist with leadership experience in building innovative products and bringing them to market. He has held senior engineering management roles at Tealeaf, Efficient Frontier, AOL and Netscape....
Yevgeny Rizhkov is director of engineering at Aerospike. He is an open source contributor and polyglot technologist with more than 10 years of experience in various engineering positions.
TNS owner Insight Partners is an investor in: Pragma, Docker.