Many enterprises generate terabytes of log data every day, resulting in high costs to ingest, store and analyze that data. Even worse, many observability platforms are walled gardens, making it hard to use log data for use cases beyond observability, such as business intelligence, data science and machine learning.
To solve both of these problems, it’s time for headless observability, a fresh approach that decouples the frontend (visualization, querying and analytics) from the backend (data ingestion and storage) — all while keeping operations simple.
Headless observability combines two core concepts: headless architecture and the decoupled observability stack. With a headless approach, you can have multiple “heads,” or visualization and analytical tools, for your log and telemetry data. In addition to observability tools, your teams can also use cybersecurity, business intelligence and other analytical tools to maximize the value of that data. And all observability components (such as analytics and telemetry data collection) are decoupled instead of consolidated into a single observability platform.
While the term “headless observability” is new, the concepts around it have been established. Confluent has written about the concept of headless data architecture where multiple “heads” can be used to analyze data. As that article discusses, there are some critical differences between headless data architecture and traditional data lakes. Data isn’t limited to one centralized location (as is typically the case with a data lake), and any service should be able to access that data.
Meanwhile, StarTree’s Neha Pawar argued in The New Stack that observability needs to move toward a disaggregated stack where components such as storage, ingest and visualization are decoupled, resulting in greater flexibility and lower costs. While Pawar focuses on disaggregation through the prism of observability, the same concepts apply to data disaggregation in general, with observability components such as analytics added on top (“headless observability”).
Ultimately, headless observability and disaggregation can solve similar problems, including reducing costs, increasing scalability and making telemetry data available for other use cases. However, the headless approach has advantages for organizations that don’t want to build a solution from the ground up. With headless, teams can simplify operations, use fewer engineering resources and take on less risk.
Disaggregated observability is about breaking down the observability monolith into smaller, composable services, which can be a daunting task for teams. Headless observability, on the other hand, is about making that decoupling simple — in the same way a headless content management system (CMS) simplifies the process of creating digital experiences.
Pawar’s article shows that building a composable system is possible, but it involves using collection tools like OpenTelemetry, streaming pipelines like Apache Kafka or Apache Flink, and cloud object storage combined with data lake wrappers like Apache Iceberg. Teams can DIY real-time analytics and headless observability if they have the wherewithal and resources needed to build a solution from scratch, but there are risks with this approach.
To make a solution that works for observability and other use cases, you need to build a data lake that’s fast enough for real-time analytics and cost-effective enough for long-term storage. That includes a real-time streaming pipeline with ETL (extract, transform, load) to standardize and contextualize log data; a system for structuring, compacting and merging data in object storage; and analytics platforms that can perform queries and run dashboards with subsecond query latency.
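To make the "standardize and contextualize" step concrete, here is a minimal Python sketch of what a single transform stage might do to one record. The field aliases, the `source` tag and the function names are all hypothetical, chosen only for illustration; a real pipeline would run logic like this inside a streaming framework rather than as a standalone function.

```python
from datetime import datetime, timezone

# Hypothetical aliases: different log sources name the same field differently.
FIELD_ALIASES = {
    "timestamp": ("timestamp", "ts", "time"),
    "status": ("status", "status_code", "sc"),
    "path": ("path", "url", "uri"),
}


def to_utc_iso(value):
    """Convert epoch seconds or an ISO 8601 string to a UTC ISO 8601 string."""
    if isinstance(value, (int, float)):
        dt = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        dt = datetime.fromisoformat(value)
        if dt.tzinfo is None:
            # Assumption for this sketch: naive timestamps are already UTC.
            dt = dt.replace(tzinfo=timezone.utc)
        dt = dt.astimezone(timezone.utc)
    return dt.isoformat()


def normalize(raw, source):
    """Map one raw log record onto a common schema and tag its origin."""
    record = {"source": source}  # context: where the record came from
    for field, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in raw:
                record[field] = raw[alias]
                break
    # Raises KeyError if a required field is absent; a real pipeline
    # would route such records to a dead-letter destination instead.
    record["timestamp"] = to_utc_iso(record["timestamp"])
    record["status"] = int(record["status"])
    return record
```

Once every source is mapped onto the same field names and time zone, any downstream tool can query the data without knowing which system produced it.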
Netflix and other huge enterprises that use log data in many different ways might have the resources to build these kinds of systems, along with the risk tolerance to troubleshoot them when they use more compute than expected (driving up costs) or deliver disappointing performance. Combining tools like Flink or Kafka with a high-performance data lake and real-time analytics is a complex project, and a lot can go wrong. Even for major enterprises, building a system from scratch often isn’t feasible, nor is it the best use of engineering resources.
Headless observability can be much simpler and require much less maintenance for teams that use mature, fully established solutions for both the frontend and the backend. It’s akin to using a content delivery network (CDN) to deliver assets for a headless content management system (CMS) instead of a full-stack web application where engineers have to manage every aspect of delivery, performance and reliability.
The key for headless observability is the backend: a storage solution optimized for logs that handles all the heavy lifting of ingesting, storing and preparing data for analysis.
The cornerstone of headless observability is the storage system. At a high level, it needs to do the following:
Let’s break down these two attributes to a more granular level.
Going headless makes analytics use cases — from observability to business intelligence — much easier. Here are the more general benefits of a headless approach:
Let’s also take a look at a specific use case for headless observability.
Some operations teams and leaders might wonder whether it’s worth unlocking the value of log data for use cases beyond observability in the first place. After all, some services can generate huge volumes of data that don’t seem to have much use beyond immediate observability. As an example, why would a data science team want to analyze the logs of thousands of ephemeral Kubernetes containers?
Not all logs have long-term value, but that’s one of the advantages of headless observability and decoupled storage. Teams have the freedom and flexibility to determine which logs should be retained for longer periods. Web application firewall (WAF) and other security logs can be retained over the long term and made available to cybersecurity teams and threat hunters. Other application logs can provide long-term insights into how resources are being used for capacity planning and anomaly detection.
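As a rough illustration of per-category retention, the Python sketch below expresses a policy as a simple lookup table. The category names and retention periods are invented for this example and are not recommendations; real storage systems typically expose retention as configuration rather than application code.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention policy: how long each category of log is kept.
RETENTION = {
    "waf": timedelta(days=730),          # security logs kept long term
    "application": timedelta(days=365),  # capacity planning, anomaly detection
    "container": timedelta(days=7),      # ephemeral container logs
}
DEFAULT_RETENTION = timedelta(days=30)


def is_expired(category, event_time, now):
    """Return True if a log event is older than its category's retention period."""
    return now - event_time > RETENTION.get(category, DEFAULT_RETENTION)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
month_old = now - timedelta(days=30)
print(is_expired("container", month_old, now))  # container logs age out quickly
print(is_expired("waf", month_old, now))        # WAF logs are still retained
```

The point of the design is that the decision lives with the team that owns the data, not with a fixed retention window imposed by the observability platform.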
Let’s take a closer look at a real, tangible use case where observability data can be valuable for other teams: real user monitoring (RUM). In the realm of observability, RUM allows teams to proactively monitor how end users are experiencing web applications. Issues like slow page loads can be mitigated before they frustrate users.
Beyond observability, RUM data can also provide insights into how your end users are interacting with your brand and your products. This data is invaluable for marketing, advertising and leadership teams that need to plan strategy. For enterprises generating terabytes of telemetry data that show exactly how users are interacting with their web properties, ensuring fast page load times is critical — but it shouldn’t be the only use case for that data.
As a real-world example, many enterprises use CDN log data for real user monitoring. In the short term, monitoring CDNs is important for ensuring good user experiences and fast loading times of digital assets. However, being able to retain huge volumes of log data (including CDN data) long term and cost-effectively provides certain advantages to enterprises. For example, major streaming and media broadcasters that previously couldn’t retain log data long term due to costs are now analyzing that data to plan capacity, detect and mitigate stream piracy, and better understand how their end users interact with both live and on-demand streaming content.
For these enterprises, which often generate terabytes of CDN log data per day, even basic monitoring and observability use cases aren’t possible with traditional SaaS observability vendors because of the cost and scale. By using a headless observability approach for CDN logs (such as Hydrolix with Grafana dashboards), they not only gain increased event observability and lower costs but can also use that data for a wide range of other use cases. They also don’t need to build entire disaggregated observability solutions from scratch.
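The idea of several "heads" sharing one store of CDN logs can be sketched in a few lines of Python. This example uses an in-memory SQLite database purely as a stand-in for the storage backend; the table name, columns, sample rows and function names are all assumptions made for illustration and do not reflect any vendor's schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of made-up CDN log rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cdn_logs (path TEXT, status INTEGER, ttfb_ms REAL)")
    rows = [
        ("/live/match.m3u8", 200, 120.0),
        ("/live/match.m3u8", 200, 480.0),
        ("/live/match.m3u8", 503, 900.0),
        ("/vod/film.mp4", 200, 150.0),
    ]
    conn.executemany("INSERT INTO cdn_logs VALUES (?, ?, ?)", rows)
    return conn


def slow_or_failing_paths(conn, max_avg_ttfb_ms):
    """Operations 'head': paths with a high average time to first byte or any 5xx."""
    query = """
        SELECT path, AVG(ttfb_ms), SUM(status >= 500)
        FROM cdn_logs
        GROUP BY path
        HAVING AVG(ttfb_ms) > ? OR SUM(status >= 500) > 0
        ORDER BY path
    """
    return conn.execute(query, (max_avg_ttfb_ms,)).fetchall()


def requests_per_path(conn):
    """Business 'head': successful requests per path, most requested first."""
    query = """
        SELECT path, COUNT(*)
        FROM cdn_logs
        WHERE status < 400
        GROUP BY path
        ORDER BY COUNT(*) DESC, path
    """
    return conn.execute(query).fetchall()


conn = build_demo_db()
print(slow_or_failing_paths(conn, 300))
print(requests_per_path(conn))
```

Both functions read the same table through plain SQL, so neither team needs its own copy of the data or a tool-specific query language.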
The benefits of a headless approach are clear for users but potentially difficult for observability and other analytics platforms to implement and monetize. The traditional Software as a Service (SaaS) model of observability calls for storing log data within observability platforms for a short period of time and maximizing the usability and value of that data for operations teams. To that end, traditional platforms have built rich ecosystems of agents and connectors — but only for incoming data. Once the data is at rest, it’s essentially in a walled garden: challenging to migrate and not possible to federate.
Even if data federation were possible, the typical observability platform simply doesn’t store data long enough to maximize its usability with other analytics platforms, and it often uses a proprietary query language instead of SQL. To solve this problem, observability platforms need to make the transition to disaggregated storage, but many have incurred too much technical debt with their current infrastructure, making this transition difficult or even impossible.
As a result, traditional SaaS observability platforms aren’t well-suited for headless observability, leading to a gap in the market that solutions like Iceberg and Hydrolix can fill.
Finally, while a lower total cost of ownership is an immediate advantage of going headless, the longer-term benefits of democratizing and maximizing the value of telemetry data are still intangible for many enterprises. This isn’t a surprise — many teams are stuck answering short-term questions like, “How long should I keep this data?” and “How can I reduce costs?” Long-term retention of high volumes of telemetry data for use cases beyond observability hasn’t been possible for these enterprises.
Leaders and their teams should take the time to consider the future of their businesses, not just the present moment, by asking: “What business-critical questions could we answer if we could keep all this data long term, without worrying about high costs, and make it accessible to all our teams?”
Learn how Hydrolix can help you keep more data longer and more cost-effectively by maximizing the performance of disaggregated object storage.