URL: https://thenewstack.io/apache-druid-a-real-time-database-for-modern-analytics/

Apache Druid: A Real-Time Database for Modern Analytics - The New Stack


Apache Druid: A Real-Time Database for Modern Analytics

You need to look for a database that has an optimized architecture and data format, is built for interactivity and will scale based on your needs. One open source database that checks off all three of these needs is Apache Druid.
May 18th, 2022 10:00am by David Wang
Feature image via Pixabay.
David Wang
David Wang is the VP of product marketing at Imply, where he is responsible for the company’s positioning, product messaging and technical content. He is an engineer turned marketer with a career building new categories and increasing awareness across next-gen data management, virtualization and enterprise storage technologies. Prior to joining Imply, David served in leadership roles at Hewlett Packard Enterprise (HPE), Nimble Storage and GE Digital.

Analytics has become the secret sauce for every company. But while it has long been useful for making decisions, it's not just for internal stakeholders anymore. Companies like Twitter, Atlassian and Citrix are leading their industries because they are delivering insights to their customers.

For the technical leaders now charged with building an external analytics application, choosing the right database backend requires new considerations.

The easy answer is to default to a database like PostgreSQL or MySQL, or even to stretch a data warehouse beyond its standard BI dashboard and reporting functionality. While these options may seem to get the job done quickly at first, it's important to remember that a customer-facing application can have bigger implications than an internal one, with a potential impact on revenue to name just one. Therefore, it's critical that you start the job with a database that delivers the best user experience possible.

Loading… Please Wait

There’s nothing more frustrating than sitting and waiting for an application to return with results. It’s fine if your own employee has to wait a few seconds or even minutes for a query to process, but that wait time is unacceptable when it comes to external users like customers.

There are a few reasons why users run into these long wait times: the amount of data you're trying to analyze, the database's processing power, the number of users and API calls, and more. Ultimately, it comes down to whether your database can keep up with what your application demands of it.

While it is feasible to use a generic OLAP database to create an interactive data experience when working with large amounts of data, you risk putting yourself at a costly disadvantage. If you try computing all the queries ahead of time, you wind up with an expensive and rigid architecture. Similarly, collecting all the data first can limit the insights you are able to deliver. And if you only analyze data from recent events, you keep your users from seeing the whole picture.

Therefore, to create an external-facing data analytics application, you need to look for a database that has an optimized architecture and data format, is built for interactivity and will scale based on your needs. One open source database that checks off all three of these needs is Apache Druid.

With its distributed and elastic architecture, Apache Druid prefetches data from a shared data layer into an elastically scalable cluster of data servers. Because there's no need to move data at query time and you have more flexibility to scale, this kind of architecture performs faster than a decoupled query engine such as a cloud data warehouse.

Additionally, Apache Druid can process more queries per core by leveraging automatic, multilevel indexing that is built into its data format. This includes a global index, a data dictionary and a bitmap index, which together go beyond a standard OLAP columnar format and speed up data crunching by getting more useful work out of each CPU cycle.
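
To make the indexing idea concrete, here is a minimal Python sketch of two of the structures named above: a data dictionary and a bitmap index. It is a toy illustration, not Druid's actual segment format. The function names, the sample columns and the use of a plain Python integer as a bitset are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def build_column_index(values: List[str]) -> Tuple[Dict[str, int], List[int], Dict[int, int]]:
    """Dictionary-encode a string column and build one bitmap per distinct value.

    Hypothetical helper for illustration only; it does not mirror Druid's code.
    """
    dictionary: Dict[str, int] = {}  # value -> small integer id
    encoded: List[int] = []          # the column stored as ids instead of strings
    bitmaps: Dict[int, int] = {}     # id -> bitmap (bit N set means row N has the value)
    for row, value in enumerate(values):
        value_id = dictionary.setdefault(value, len(dictionary))
        encoded.append(value_id)
        bitmaps[value_id] = bitmaps.get(value_id, 0) | (1 << row)
    return dictionary, encoded, bitmaps


def matching_rows(bitmap: int) -> List[int]:
    """Return the row numbers whose bits are set in the bitmap."""
    rows: List[int] = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Made-up sample data: six events with two dimensions.
country = ["US", "DE", "US", "FR", "DE", "US"]
device = ["ios", "ios", "web", "web", "ios", "ios"]

country_dict, country_ids, country_bitmaps = build_column_index(country)
device_dict, device_ids, device_bitmaps = build_column_index(device)

# Equivalent of: WHERE country = 'US' AND device = 'ios'
hits = country_bitmaps[country_dict["US"]] & device_bitmaps[device_dict["ios"]]
print(matching_rows(hits))  # prints [0, 5]
```

The filter is answered with two dictionary lookups and a single bitwise AND rather than a scan of every row, which is the general reason an index of this kind lets each core serve more queries.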

High Availability Is a Must

When it comes to internal operations, experiencing an outage isn’t a huge deal, especially if it only lasts a few minutes. It may be a little inconvenient, but it’s not unheard of for OLAP databases and data warehouses to see some unplanned downtime and maintenance windows where services are unavailable.

However, it’s a completely different story when it comes to customer-facing, external analytics applications. If a customer experiences an unplanned outage, they could abandon the application temporarily or indefinitely, causing an impact on revenue. That’s why it’s incredibly important to prioritize resiliency for high availability and data durability when building these kinds of applications.

To achieve resiliency for customer-facing, external analytics applications, there are a few questions you should ask yourself: Can I safeguard against a node or cluster-wide failure? What would the impact be if I lost data? What is needed to protect my data and application?

It’s a fact of life that servers will eventually fail. Backing up your data and replicating your nodes are the standard ways to ensure resiliency, but unless you maintain a frequent backup cadence, you’ll need to do more to mitigate data loss.

Instead, you need to make sure high availability and data durability are built into your database, specifically one that includes automatic, multilevel replication with shared data in S3/object storage. Apache Druid provides continuous backup capabilities that automatically protect your data and can restore the latest state of your database even if your entire cluster is impacted.
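
The combination of replicated data servers and a shared object store can be pictured with a small toy model. The Python sketch below is only an assumption-laden illustration of the pattern, not Druid's implementation: the `ToyCluster` class, its method names and the placement rule (fill the least-loaded nodes first) are all hypothetical.

```python
from typing import Dict


class ToyCluster:
    """Toy model: a durable object store plus data nodes that cache replicas."""

    def __init__(self, replicas: int) -> None:
        self.replicas = replicas
        self.deep_storage: Dict[str, bytes] = {}      # stands in for S3/object storage
        self.nodes: Dict[str, Dict[str, bytes]] = {}  # node name -> locally held segments

    def add_node(self, name: str) -> None:
        self.nodes[name] = {}
        self.rebalance()

    def fail_node(self, name: str) -> None:
        del self.nodes[name]  # everything cached on that node is lost
        self.rebalance()

    def publish(self, segment_id: str, data: bytes) -> None:
        self.deep_storage[segment_id] = data  # durable copy is written first
        self.rebalance()

    def rebalance(self) -> None:
        """Reload under-replicated segments from the object store onto live nodes."""
        for segment_id, data in self.deep_storage.items():
            holders = [n for n, segs in self.nodes.items() if segment_id in segs]
            wanted = min(self.replicas, len(self.nodes))
            missing = max(wanted - len(holders), 0)
            candidates = sorted(
                (n for n in self.nodes if n not in holders),
                key=lambda n: len(self.nodes[n]),  # least-loaded nodes first
            )
            for name in candidates[:missing]:
                self.nodes[name][segment_id] = data

    def replica_count(self, segment_id: str) -> int:
        return sum(1 for segs in self.nodes.values() if segment_id in segs)


cluster = ToyCluster(replicas=2)
for node in ("a", "b", "c"):
    cluster.add_node(node)
cluster.publish("seg1", b"events-1")
cluster.publish("seg2", b"events-2")

cluster.fail_node("a")  # single node failure: surviving nodes pick up the copies
after_node_loss = (cluster.replica_count("seg1"), cluster.replica_count("seg2"))

cluster.fail_node("b")
cluster.fail_node("c")  # whole cluster gone, only the object store remains
cluster.add_node("d")   # a fresh node reloads everything from the object store
after_cluster_loss = sorted(cluster.nodes["d"])
```

In this model a node failure never reduces the durable copy, so recovery is a matter of reloading from the object store rather than restoring from a periodic backup.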

Add Users without Adding to Your Costs

Architecting your backend for high concurrency is important because you need an application that can support scores of users and still provide an engaging experience. Doing so reduces the risk of angering your customers because their applications aren’t working properly.

It’s important to note that this isn’t the same as architecting for internal reporting since that typically has fewer regular users. Ultimately, that means you need one database for internal reporting and a different one for your highly concurrent applications.

There are three factors to consider when architecting a database for high concurrency: CPU usage, scalability and cost. Some may say that adding on more hardware can fix the issue, but that’s not always the best answer. While increasing the number of CPUs can allow you to run more queries, it will also come with a higher price tag.

Apache Druid provides a smarter and more economical choice because of its optimized storage and query engine that decreases CPU usage. “Optimized” is the keyword here; you want your infrastructure to serve more queries in the same amount of time rather than having your database read data it doesn’t need to.
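
A back-of-the-envelope calculation shows why CPU time per query matters more than raw core count. The Python sketch below uses a deliberately simple model in which every query consumes a fixed amount of CPU time and cores are the only bottleneck. The function names and all of the numbers are hypothetical and are not measurements of Druid or any other database.

```python
import math


def max_queries_per_second(cores: int, cpu_seconds_per_query: float) -> float:
    """Upper bound on throughput when CPU is the only limit (simplified model)."""
    return cores / cpu_seconds_per_query


def cores_needed(target_qps: float, cpu_seconds_per_query: float) -> int:
    """Cores required to sustain a target query rate under the same model."""
    return math.ceil(target_qps * cpu_seconds_per_query)


# Hypothetical figures chosen only to illustrate the arithmetic.
baseline_cost = 0.5     # CPU-seconds per query when reading unneeded data
optimized_cost = 0.125  # CPU-seconds per query when reading less data

print(max_queries_per_second(64, baseline_cost))   # prints 128.0
print(max_queries_per_second(64, optimized_cost))  # prints 512.0
print(cores_needed(1000, baseline_cost))           # prints 500
print(cores_needed(1000, optimized_cost))          # prints 125
```

Under these assumptions, cutting the CPU cost of each query to a quarter serves four times as many queries on the same hardware, whereas reaching the same rate by adding cores alone would quadruple the hardware bill.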

Building for Today and the Future

Providing an external analytics application can be part of a fantastic customer retention strategy and revenue source. That’s why it’s essential to take the time to find the database that best supports your needs and build the right data architecture that will keep your customers happy.
