Frozen storage and methods of discarding data, such as downsampling, often seem to be the only options when the volume of log data keeps growing and costs are skyrocketing.
In recent years, a new approach that maximizes the performance of object storage provides a much better alternative. It’s now possible for enterprises to use solutions that are built on object storage to keep all their data hot for real-time analytics while remaining cost-effective.
Among the most recent developments in this space is the announcement of Amazon S3 Tables, which uses Apache Iceberg to partition and optimize object storage. Table formats like Iceberg add a layer on top of object storage that can dramatically improve the performance of data lakes. Meanwhile, solutions like Hydrolix provide both real-time and long-term historical analytics of log data by maximizing the performance of object storage, all without needing to build a solution from the ground up using tools like Iceberg.
Let’s explore some of the issues with tiered storage, the benefits of keeping all data hot, and how modern data storage solutions are maximizing the performance of object storage to provide cost-effective, low-latency query performance for petabytes of data that can span years.
Frozen storage can cut costs compared to tightly coupled, expensive hot storage. That’s where the benefits of frozen storage end and the downsides begin. Frozen storage is inconvenient to rehydrate, so it’s rarely queried and quickly goes dark. It’s much slower than hot storage, inaccessible for machine learning runs and surprisingly expensive as a whole — mainly because there tends to be so much of it and it provides so little long-term value. In some cases, pipelines and data replicas are necessary to move data between tiers, leading to additional complexity and operational overhead.
As a result, the tiered data paradigm freezes teams into an outdated, legacy approach where log data is only valuable for short-term operational insights such as observability. From this perspective, only the last few weeks of data matter for high-performance analytics, and the only remaining value of log data is for compliance and security purposes.
However, this runs counter to the approach that many forward-thinking enterprises are taking to federate and democratize access to data, making that data available to teams and analysts in the tools they use. That includes not just operations teams but also business intelligence (BI) analysts, data scientists, cybersecurity teams and teams developing machine learning models.
By eliminating the high costs that traditionally come with hot storage, enterprises can unlock a wide range of benefits, and these extend well beyond those listed in the use cases above. In contrast to frozen storage, once the cost consideration is gone, there is only upside to keeping all data hot. Fully hot storage also provides the following benefits:
Beyond the benefits, there are many use cases for long-term, historical hot data that are much harder, or even impossible, with frozen storage. The following three use cases — across the areas of cybersecurity, machine learning and business intelligence — are just a few examples of the importance of long-term hot data retention.
With long-term, cost-effective hot data, the question becomes, “What can we do to maximize the value of this data?” instead of, “How long can I keep data accessible without runaway costs?”
All of these benefits are only possible if object storage is performant enough for real-time analytics. Traditionally, though, object stores haven’t been a good fit for the low-latency queries that real-time analytics requires. The distributed nature of object storage makes it both highly scalable and extremely cost-effective, but it also means that data is physically dispersed instead of closely coupled with query components, leading to higher latency. As a result, object storage is more commonly used for data that’s cold or frozen, not hot.
To maximize the performance of object storage, solutions are built around the following core concepts:
With the right solution, it’s possible to reduce the “time to glass,” including data ingestion, transformation and querying, to a matter of seconds. For example, with Hydrolix, the typical time to glass is less than 10 seconds, even when an enterprise is ingesting millions of log lines per second.
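One way to reason about time to glass is to measure it per event, as the gap between when a log line is produced and when it first shows up in query results. The following is a minimal sketch under that assumption. The `LogEvent` fields and the helper names are hypothetical and are not part of any Hydrolix API, and the sample delays are invented for illustration rather than measured.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEvent:
    emitted_at: datetime    # when the source produced the log line
    queryable_at: datetime  # when the line first appeared in query results


def time_to_glass(event: LogEvent) -> timedelta:
    """End-to-end delay covering ingestion, transformation and querying."""
    return event.queryable_at - event.emitted_at


def fraction_within(events: list[LogEvent], budget: timedelta) -> float:
    """Share of events that became queryable within the latency budget."""
    if not events:
        return 0.0
    hits = sum(1 for e in events if time_to_glass(e) <= budget)
    return hits / len(events)


# Illustrative sample: three events with made-up delays of 2, 8 and 15 seconds.
base = datetime(2024, 1, 1, 12, 0, 0)
sample = [
    LogEvent(base, base + timedelta(seconds=delay)) for delay in (2, 8, 15)
]
within_budget = fraction_within(sample, timedelta(seconds=10))
```

Tracking the share of events that meet a budget, rather than only an average, makes it easier to see whether a target such as "under 10 seconds" holds for most of the traffic or only for the typical case.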
While this is not true real-time latency on the order of milliseconds, many real-time use cases, such as analytics, do not require millisecond latency. According to Gartner’s definition of real-time analytics, “For some use cases, real time simply means the analytics is completed within a few seconds or minutes after the arrival of new data.” In observability, business intelligence and many cybersecurity use cases, to name a few, latency in the range of seconds allows operations and other teams to quickly find and fix issues and uncover deeper insights in their data.
Object storage isn’t appropriate for use cases that require true millisecond latency, but at the same time, solutions that rely on in-memory stores or expensive, tightly coupled hardware are no longer appropriate for analytics of large volumes of data either. As always, it’s important to use the right tool for the job. And when it comes to ingesting, storing and analyzing large volumes of log data, it’s time to use solutions built on object storage instead of tiered storage that leaves your data in the cold.
Learn how Hydrolix can help you keep more data longer and more cost-effectively by maximizing the performance of disaggregated object storage.