While the term “real time” is sometimes used as marketing spin, there are genuine technical and functional differences between real-time analytic databases and conventional analytic databases. Real-time analytic databases (aka streaming databases) are a distinct category of analytic databases that are optimized for processing and analyzing high-volume, high-velocity data in near real time.
Conventional analytic databases are optimized for processing large volumes of historical data in batch mode. While these databases can provide valuable insights into past trends and patterns, they weren’t designed for real-time decision-making or infusing analytics into downstream applications at web scale.
Examples of conventional analytic databases include Snowflake, Greenplum, BigQuery, Redshift and Teradata, among others.
Real-time analytic databases, on the other hand, can process and analyze data as it arrives, allowing organizations to make informed decisions and take immediate actions based on the most up-to-date information.
They are designed to provide low-latency querying, fast ingestion and scalable processing of streaming data generated by modern devices, machines and sensors. Examples of databases in this class include Kinetica, Pinot, Druid, Rockset, Materialize, ClickHouse, SingleStore and Aerospike, among others.
The “steelman” argument against the notion that real-time analytic databases are fundamentally different from conventional analytic databases is that real-time analytic databases are simply an extension of the traditional analytic database paradigm with added real-time capabilities. The difference is more a matter of degree than a fundamental shift in the underlying technology.
Proponents of this argument point out that both real-time analytic databases and conventional analytic databases are designed to store and analyze large volumes of data and the underlying principles of data storage, indexing and querying are largely the same in both cases. Moreover, many conventional analytic databases now offer some level of real-time processing capabilities, such as micro-batch loads or the latest query-accelerating technique, blurring the distinction between the two categories.
To make the case for real-time analytic databases being a distinct category, consider the framework below, which is based on data latency and query latency. Data latency refers to the time delay between when data is generated and when it is available for processing and analysis. This delay can be caused by a variety of factors, primarily network speed and ingest overhead.
Query latency refers to the time delay between when a query is submitted to a data processing system and when the results of that query are returned. Query latency is primarily a function of query complexity, the amount of data being queried, the type of storage and the level of sophistication of the query engine.
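The two definitions above can be made concrete with a small Python sketch. It is illustrative only: the `Record` fields, the helper names and the idea of adding the two latencies into a single "freshness" figure are assumptions made for this example, not part of any particular database's API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Record:
    """Hypothetical record carrying the two timestamps needed for data latency."""
    generated_at: float  # seconds: when the source produced the record
    queryable_at: float  # seconds: when the database made it available to queries


def data_latency(record: Record) -> float:
    """Delay between generation and availability for analysis."""
    return record.queryable_at - record.generated_at


def timed_query(run_query: Callable[[], T]) -> Tuple[T, float]:
    """Run a query callable and return (result, query latency in seconds)."""
    start = time.perf_counter()
    result = run_query()
    return result, time.perf_counter() - start


def insight_age(record: Record, query_seconds: float) -> float:
    """Assumed combined figure: how old the data is when the answer comes back."""
    return data_latency(record) + query_seconds
```

For a record generated at t = 10.0 seconds and queryable at t = 12.5 seconds, `data_latency` returns 2.5. Passing any zero-argument callable that executes a query to `timed_query` gives the query latency for that call.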
Ingesting streams of data means processing records continuously as they arrive. A best-in-class real-time analytic database will have three essential features to radically reduce data latency.
Conventional analytic databases lack native streaming connections to the source and sink, centralize ingestion through a coordination point and make extensive use of table locking. Together, these design choices drive significant data latency.
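To see why the ingestion path matters so much for data latency, here is a rough Python model that compares a periodic micro-batch load with per-record streaming ingest. Everything in it is an assumption made for illustration: the 60-second batch interval, the overhead figures and the function names do not describe any specific product.

```python
import math
from typing import Callable, Sequence


def micro_batch_available_at(generated_at: int, batch_interval: int, load_overhead: int) -> int:
    """Record waits for the next batch boundary, then for the load to finish."""
    boundary = math.ceil(generated_at / batch_interval) * batch_interval
    return boundary + load_overhead


def streaming_available_at(generated_at: int, ingest_overhead: int) -> int:
    """Record is ingested as it arrives and pays only a per-record overhead."""
    return generated_at + ingest_overhead


def mean_latency(generated: Sequence[int], available_at: Callable[[int], int]) -> float:
    """Average data latency (availability time minus generation time)."""
    return sum(available_at(t) - t for t in generated) / len(generated)


# Hypothetical generation times in seconds, all within the first minute.
events = [1, 15, 30, 45, 59]

batch_mean = mean_latency(events, lambda t: micro_batch_available_at(t, 60, 5))
stream_mean = mean_latency(events, lambda t: streaming_available_at(t, 1))
print(batch_mean, stream_mean)  # prints 35.0 1.0
```

In this toy model, the micro-batch path's data latency is dominated by the wait for the next load, so shrinking the batch interval helps but never removes the wait entirely. The streaming path pays only its per-record overhead.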
Once the data is available to query, the speed of the query matters. A best-in-class real-time analytic database will have three essential features to radically reduce query latency.
Conventional analytic databases are not fully vectorized because they simply have too much tech debt to take advantage of this innovation and vectorize all their operations. They still rely on the old model of materializing views and treat high-speed reads at scale as a task for a different database. With both low data latency and low query latency, real-time analytic databases produce far and away the freshest possible insights for the next generation of data-infused apps.
This is not to suggest that all real-time analytic databases are the same. Some provide better support for joins and ad-hoc queries. Some are open source. Some focus on weblog data, while others have robust support for the time-series and spatial data needed for sensor and machine workloads. Whatever your real-time database needs, there’s probably a good fit for your use case.