URL: https://thenewstack.io/observability-do-you-need-a-data-lake/

Observability: Do You Need a Data Lake?

Data lakes can gather and integrate data to help gain analytical insights and improve business operations. But observability doesn't always require them.
Feb 26th, 2025 11:15am by B. Cameron Gain
Featured image by Davey Gravy via Unsplash+.

Data lakes have emerged as an essential component for observability. Ideally, they gather and integrate data in various forms and structures that, if handled properly, can be used to gain analytical insights, improve business operations and enhance the capabilities that applied observability offers.

What’s a data lake? Gartner offers a reasonably comprehensive definition:

A data lake is a semantically flexible data storage repository combined with one or more processing capabilities. Most data assets are copied from diverse enterprise sources and are stored in their raw and diverse formats so they can be refined and repurposed repeatedly for multiple use cases. Ideally, a data lake will store and process data of any structure, latency or container, such as files, documents, result sets, tables, formats, binary large objects (BLOBs) and messages.

It’s hard to argue against the advantages that a data lake can offer for observability: a single repository that houses useful data serves as the foundation for better data analysis, whether for business operations, DevOps or platform engineering.

However, creating and managing a proper data lake for observability requires significant know-how and infrastructure support, and it is not necessary for every organization. Not unlike Kubernetes, which is viable only beyond a certain threshold of scale, a data lake for observability may not benefit a small organization that still relies mostly on the cloud for its data or operations and may lack the budget to implement a data lake.

But the future may bring more interesting developments and changes in these dynamics as the technology evolves. Observability providers are also exploring the possibility of offering data lake-management platforms, potentially reshaping how businesses approach data analysis and operational insights.

The Benefits of Data Lakes in Observability

Data lakes enable data collection and advanced analytics, and complement traditional data warehouses. For example, the massive repository of source data in a data lake supports broad, flexible and unbiased data exploration, which is a prerequisite for data mining, statistics, machine learning (ML) and other analytics techniques.

A data lake can also provide scalable and high-performance data acquisition, preparation and processing, either to be refined and loaded into a data warehouse or for processing within the data lake.

Because data lakes store raw, diverse data and offer advanced analytics, observability can, with the right platform, leverage them for debugging, insight generation and prediction, Jason Soroko, senior fellow at Sectigo, a provider of comprehensive certificate life-cycle management, told The New Stack.

Integrating AI and automated tools enhances monitoring across the entire stack. Unified data ingestion via OpenTelemetry further streamlines operations, reducing silos, Soroko said.

“Data lakes are essential for coherent observability,” he said. “So, with that idea of coherence in mind, choosing the right platform is key.”

Without a data lake, observability platforms tied to proprietary storage risk fragmentation. Mandating separate servers or cloud resources complicates data consolidation and restricts unified analysis, Soroko said.

“A centralized data lake approach unifies disparate sources, enabling scalable processing and clear insights. The idea of a data lake seems simple to envision, but we know there is a lot of underlying complexity in implementation,” he said. “The guiding principle should be to ensure coherence, which is the whole point of why a data lake is effective.”
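As a rough illustration of what unifying disparate sources can involve, the sketch below maps two differently shaped inputs onto one shared record type. The schema, the field names and both source formats are assumptions made up for this example; they are not taken from OpenTelemetry or from any vendor's platform.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TelemetryRecord:
    """Hypothetical common schema; field names are illustrative only."""
    timestamp: datetime
    service: str
    kind: str  # "log" or "metric" in this sketch
    body: dict


def from_app_log(raw: dict) -> TelemetryRecord:
    # Assumed source A: application logs with an ISO 8601 "time" field.
    return TelemetryRecord(
        timestamp=datetime.fromisoformat(raw["time"]),
        service=raw["app"],
        kind="log",
        body={"message": raw["msg"], "level": raw["level"]},
    )


def from_host_metric(raw: dict) -> TelemetryRecord:
    # Assumed source B: host metrics with a Unix epoch "ts" field in seconds.
    return TelemetryRecord(
        timestamp=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        service=raw["host"],
        kind="metric",
        body={"name": raw["metric"], "value": raw["value"]},
    )


def unify(app_logs, host_metrics):
    """Convert both sources to the shared schema and order them by time."""
    records = [from_app_log(r) for r in app_logs]
    records += [from_host_metric(r) for r in host_metrics]
    return sorted(records, key=lambda r: r.timestamp)
```

Once every source lands in the same shape, a single query or dashboard can span all of them, which is one way to read the "coherence" Soroko describes. The implementation complexity he mentions lives mostly in the per-source converters, which are trivial here only because the inputs are invented.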

Why Data Lakes Aren’t for Everyone

With the right platform, observability can be applied to a data lake not only for debugging, but also to gain business insights, make predictions and properly monitor the entire stack. Additionally, data stacks and AI already play a significant role in observability and will continue to do so, particularly for automated observability functions.

Still, data lakes are not necessary for every organization. Indeed, for observability, there are even downsides, Richard “RichiH” Hartmann, director of community and office of the CTO at Grafana Labs, told The New Stack.

“While data lakes offer powerful capabilities for data science and analytics, they’re not an optimal foundation for observability systems,” Hartmann said. “The latency and cost overhead of data lakes make them poorly suited for the real-time, high-performance requirements of modern observability.”

Then there is the task of integrating — and storing — the data. While storage is relatively cheap compared to the cost of using cloud tools and platforms to support observability, integration costs can run high.

“How do we get past the huge cost of integrating all this data? One thing is going to be more of this post-processing, using AI tools to stitch things together in new ways, instead of building and making sure all your relationships are clean,” Nic Benders, chief technical strategist at New Relic, told The New Stack.

“Organizations are going to have thousands of lakes. This is going to be a situation where companies, a few years from now, are going to largely be keeping their data in readable formats in place.”

Data lakes are also not a binary proposition in which having one is necessary and lacking one is not viable. The integration, ease of use, cost, security and other aspects of supporting a data lake are only as good as the observability platform that makes use of the data in the repository.

Data lakes and proper observability tools are both required. Indeed, the key challenge is not to force organizations to choose between observability platforms and data lakes. Rather, organizations can choose to leverage open standards and flexible integrations to “get the best of both worlds,” Hartmann said.

“Through tools like OpenTelemetry and extensible platforms that integrate with hundreds of data sources — with the ability to support customers’ own data lakes — organizations can build monitoring solutions that match their specific needs,” he said. “Where this gets particularly interesting is with the meta-monitoring aspect — observability platforms can actually help organizations optimize their data lake performance, track data usage patterns and identify opportunities for cost optimization.”

Some observability providers may require organizations to store data on their servers or cloud resources, potentially creating a data lake through their services. However, this approach may be less than ideal for organizations seeking to consolidate diverse data sources through OpenTelemetry and other methods to build a unified data pool from different observability providers and cloud sources.

CSS Electronics has relied on Grafana to visualize data lake data, as Martin Falch, co-owner and head of sales and marketing at CSS Electronics, described in a post on the Grafana Labs blog. The integration of a data lake was used to build Controller Area Network (CAN) bus data loggers. (CAN bus is a protocol used for communicating sensor data within vehicles and machinery, including trucks, cars, ships and robots.)

The users, Falch wrote, leverage the data source as part of a broader workflow for data visualization: specifically, automatic data processing via AWS Lambda functions to create Parquet data lakes, Glue (a serverless data integration service) and other tasks running in AWS to map those data lakes.
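For readers unfamiliar with how file-based data lakes are typically organized, the sketch below groups records under Hive-style `key=value` partition paths, a common convention for Parquet data lakes. It is a generic illustration, not the layout CSS Electronics uses; the `device_id` and `ts` fields and the partition scheme are assumptions, and the step that would actually write Parquet files is left out.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(device_id: str, ts: datetime) -> str:
    """Build a Hive-style partition path from a device ID and a timestamp.

    The device/year/month/day scheme is a hypothetical choice for this sketch.
    """
    day = ts.astimezone(timezone.utc).date()
    return (
        f"device={device_id}/year={day.year}"
        f"/month={day.month:02d}/day={day.day:02d}"
    )


def group_by_partition(records):
    """Group records by the partition each one would be written to."""
    groups = defaultdict(list)
    for record in records:
        key = partition_key(record["device_id"], record["ts"])
        groups[key].append(record)
    return dict(groups)
```

In a pipeline like the one described, a processing function would write each group to a Parquet file under its partition path, so that query engines and dashboards can skip partitions that fall outside the requested devices or dates.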

However, there are many data lake solutions being offered, and buyers need to beware. “When providers force users to store data in their proprietary data lakes,” Hartmann said, “they’re essentially creating expensive, limited-functionality replicas of what best-of-class data platforms already do better.”

Not All Telemetry Data Is Priceless

Data lakes are becoming essential for observability functionality for many organizations. But data lakes are certainly not the be-all and end-all, either. The idea is to have ready access to observability intelligence that parses through the right data.

However, not all data, whether in a data lake or not, enables the accessible and automated observability on which business decisions, operations analytics, developer tests and security actions can be based. In other words, not all telemetry data is priceless.

“Essentially, when responding to incidents, you don’t have time to process raw data from data lakes. You need ready-to-use dashboards, alerts and monitoring systems,” Hartmann said. “There are certain use cases for data lakes, but you first need to identify which business insights are most valuable, then selectively optimize those specific data paths.”

This approach, he said, “lets you maintain the real-time responsiveness needed for operational monitoring while strategically leveraging data lakes where they make business sense.”
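One way to picture that split is a router that keeps a small set of high-value signals on a fast in-memory path for alerting while archiving everything for later analysis. This is a minimal sketch under stated assumptions, not Grafana's design: the signal names, the thresholds and the choice to archive every event are all hypothetical, and a plain list stands in for data lake writes.

```python
from collections import deque

# Hypothetical signals judged valuable enough for the real-time path,
# each with an illustrative alert threshold.
HOT_THRESHOLDS = {"error_rate": 0.05, "latency_p99_ms": 500.0}


class TelemetryRouter:
    def __init__(self, hot_window: int = 100):
        # Bounded buffer of recent high-value events, ready for alerting.
        self.hot = deque(maxlen=hot_window)
        # Stand-in for batched writes to a data lake.
        self.cold = []

    def route(self, event: dict) -> None:
        # Every event is archived; only selected signals take the hot path.
        self.cold.append(event)
        if event["signal"] in HOT_THRESHOLDS:
            self.hot.append(event)

    def alerts(self) -> list:
        """Return hot-path events whose value exceeds its threshold."""
        return [
            e for e in self.hot
            if e["value"] > HOT_THRESHOLDS[e["signal"]]
        ]
```

The design choice worth noting is that alerting never reads from the archive. Incident response touches only the bounded hot buffer, which mirrors the point that raw data lake contents are too slow to process during an incident.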

BC Gain is founder and principal analyst for ReveCom Media. His obsession with computers began when he hacked a Space Invaders console to play all day for 25 cents at the local video arcade in the early 1980s.
Amazon Web Services is a sponsor of The New Stack.