
URL: https://thenewstack.io/boost-devops-maturity-with-a-data-lakehouse/

Boost DevOps Maturity with a Data Lakehouse - The New Stack



Boost DevOps Maturity with a Data Lakehouse

With a data lakehouse, organizations can cope with high-cardinality data in a time- and cost-effective manner with long retention and granularity.
May 17th, 2023 10:20am by Guido Deinhammer
Dynatrace sponsored this post.

In a world riven by macroeconomic uncertainty, businesses increasingly turn to data-driven decision-making to stay agile.

That’s especially true of the DevOps teams tasked with driving digital-fueled sustainable growth. They’re unleashing the power of cloud-based analytics on large data sets to unlock the insights they and the business need to make smarter decisions. From a technical perspective, however, that’s challenging. Observability and security data volumes are growing all the time, making it harder to orchestrate, process, analyze and turn information into insight. Cost and capacity constraints are becoming a significant burden to overcome.

Data Scale and Silos Present Challenges

DevOps teams are often thwarted in their efforts to drive better data-driven decisions with observability and security data. That’s because of the heterogeneity of the data their environments generate and the limitations of the systems they rely on to analyze this information.

Most organizations are battling cloud complexity. Research has found that 99% of organizations have embraced a multicloud architecture. On top of these cloud platforms, they’re using an array of observability and security tools to deliver insight and control — seven on average. This results in siloed data that is stored in different formats, adding further complexity.

This challenge is exacerbated by the high cardinality of data generated by cloud native, Kubernetes-based apps. The sheer number of permutations can break traditional databases.
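
To make the "sheer number of permutations" concrete, here is a rough back-of-the-envelope sketch. The label names and value counts are hypothetical, chosen only for illustration, and the result is an upper bound, since not every combination of labels occurs in practice.

```python
from math import prod

# Hypothetical label dimensions attached to a single metric in a
# Kubernetes-based app. The counts are illustrative, not measured.
label_cardinalities = {
    "namespace": 20,
    "deployment": 50,
    "pod": 200,
    "container": 3,
    "http_status": 5,
}


def max_series(cardinalities: dict[str, int]) -> int:
    """Upper bound on distinct time series: the product of per-label value counts."""
    return prod(cardinalities.values())


print(max_series(label_cardinalities))  # 20 * 50 * 200 * 3 * 5 = 3000000
```

Adding one more label with ten possible values would multiply that bound by ten, which is why a database that keeps an index entry per series can struggle as dimensions are added.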


Many teams look to huge cloud-based data lakes, repositories that store data in its natural or raw format, to centralize disparate data. A data lake lets them keep as much raw, “dumb” data as they wish, at relatively low cost, until someone in the business finds a use for it.

When it comes to extracting insight, however, data needs to be transferred to a warehouse technology so it can be aggregated and prepared before it is analyzed. Various teams usually then end up transferring the data again to another warehouse platform, so they can run queries related to their specific business requirements.

When Data Storage Strategies Become Problematic

Data warehouse-based approaches add cost and time to analytics projects.

As many as tens of thousands of tables may need to be manually defined to prepare data for querying. There’s also the multitude of indexes and schemas needed to retrieve and structure the data and define the queries that will be asked of it. That’s a lot of effort.

Any user who wants to ask a new question must start from scratch, redefining those tables and building new indexes and schemas. This can add hours or days to the process of querying data, so insights risk being stale or of limited value by the time they’re surfaced.

The more cloud platforms, data warehouses and data lakes an organization maintains to support cloud operations and analytics, the more money it will need to spend. In fact, the storage space required for the indexes used to support data retrieval and analysis may end up costing more than the data storage itself.

Further costs will arise if teams need technologies to track where their data is and to monitor data handling for compliance purposes. Frequently moving data from place to place may also create inconsistencies and formatting issues, which could affect the value and accuracy of any resulting analysis.

Combining Data Lakes and Data Warehouses

A data lakehouse approach combines the capabilities of a warehouse and a lake to solve the challenges associated with each architecture, thanks to its enormous scalability and massively parallel processing capabilities. With a data lakehouse approach to data retention, organizations can cope with high-cardinality data in a time- and cost-effective manner, maintaining full granularity and extra-long data retention to support instant, precise and contextual predictive analytics.

But to realize this vision, a data lakehouse must be schemaless, indexless and lossless. Being schema-free means users don’t need to predetermine the questions they want to ask of data, so new queries can be raised instantly as the business need arises.

Indexless means teams have rapid access to data without the storage cost and resources needed to maintain massive indexes. And lossless means technical and business teams can query the data with its full context in place, such as interdependencies between cloud-based entities, to surface more precise answers to questions.
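
As a rough illustration of the schemaless and indexless idea, the sketch below applies structure only at query time: raw records of different shapes are parsed as they are scanned, and a new question is just a new predicate. The record fields and the `scan` helper are hypothetical, and this is not how any particular lakehouse is implemented; a real system would rely on massively parallel processing rather than a single-threaded loop.

```python
import json

# Hypothetical raw records with differing shapes, stored exactly as received.
RAW_EVENTS = """\
{"type": "log", "service": "checkout", "level": "ERROR", "message": "payment timeout"}
{"type": "metric", "service": "checkout", "name": "latency_ms", "value": 840}
{"type": "log", "service": "cart", "level": "INFO", "message": "item added", "user": "u42"}
"""


def scan(raw, predicate):
    """Parse each raw record at query time and yield those matching the predicate."""
    for line in raw.splitlines():
        record = json.loads(line)
        if predicate(record):
            yield record


# Two unrelated questions, neither of which needed a table, schema or index
# to be defined before the data was stored.
errors = list(scan(RAW_EVENTS, lambda r: r.get("level") == "ERROR"))
checkout = list(scan(RAW_EVENTS, lambda r: r.get("service") == "checkout"))

print(len(errors), len(checkout))
```

Because nothing is discarded or reshaped on ingest, the third record's extra `user` field remains available to any later query that happens to need it.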

Unifying Observability Data

Let’s consider the key types of observability data that any lakehouse must be capable of ingesting to support the analytics needs of a modern digital business.

  • Logs are the highest volume and often most detailed data that organizations capture for analytics projects or querying. Logs provide vital insights to verify new code deployments for quality and security, identify the root causes of performance issues in infrastructure and applications, investigate malicious activity such as a cyberattack and support various ways of optimizing digital services.
  • Metrics are the quantitative measurements of application performance or user experience that are calculated or aggregated over time to feed into observability-driven analytics. The challenge is that aggregating metrics in traditional data warehouse environments can create a loss of fidelity and make it more difficult for analysts to understand the relevance of data. There’s also a potential scalability challenge with metrics in the context of microservices architectures. As digital services environments become increasingly distributed and are broken into smaller pieces, the sheer scale and volume of the relationships among data from different sources is too much for traditional metrics databases to capture. Only a data lakehouse can handle such high-cardinality data without losing fidelity.
  • Traces are the data source that reveals the end-to-end path a transaction takes across applications, services and infrastructure. With access to the traces across all services in their hybrid and multicloud technology stack, developers can better understand the dependencies they contain and more effectively debug applications in production. Cloud native architectures built on Kubernetes, however, greatly increase the length of traces and the number of spans they contain, as there are more hops and additional tiers such as service meshes to consider. A data lakehouse can be architected such that teams can better track these lengthy, distributed traces without losing data fidelity or context.
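
The loss of fidelity that comes with aggregating metrics can be seen in a small sketch. The latency values below are invented for illustration only.

```python
from statistics import mean

# Hypothetical per-request latencies (milliseconds) recorded within one minute.
latencies_ms = [100, 110, 90, 105, 95, 3000]

# What a pre-aggregated store might keep: a single average for the minute.
avg = mean(latencies_ms)

# What the raw data still allows: finding the individual slow request.
slow = [v for v in latencies_ms if v > 1000]

print(round(avg), slow)
```

The stored average of roughly 583 ms describes none of the actual requests: five were fast and one was very slow. Once only the average is kept, the question "which request was slow, and why?" can no longer be answered.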

There are many other sources of data beyond metrics, logs, and traces that can provide additional insight and context to make analytics more precise. For example, organizations can derive dependencies and application topology from logs and traces.

If DevOps teams can build a real-time topology map of their digital services environment and feed this data into a lakehouse alongside metrics, logs and traces, it can provide critical context about the dynamic relationships between application components across all tiers. This provides centralized situational awareness that enables DevOps teams to raise queries about the way their multicloud environments work so they can understand how to optimize them more effectively.
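
One way dependencies might be derived from traces is sketched below, assuming each span records its own ID, its parent's ID and the service that emitted it. The field names and service names are hypothetical, and the approach is a simplification rather than a description of any specific product.

```python
# Hypothetical spans from a single trace; field names are illustrative.
spans = [
    {"span_id": "a", "parent_id": None, "service": "frontend"},
    {"span_id": "b", "parent_id": "a", "service": "checkout"},
    {"span_id": "c", "parent_id": "b", "service": "payments"},
    {"span_id": "d", "parent_id": "b", "service": "checkout"},
    {"span_id": "e", "parent_id": "a", "service": "cart"},
]


def service_edges(spans):
    """Return (caller, callee) service pairs implied by parent/child spans."""
    by_id = {s["span_id"]: s for s in spans}
    edges = set()
    for span in spans:
        parent = by_id.get(span["parent_id"])
        # Skip root spans and calls that stay inside the same service.
        if parent and parent["service"] != span["service"]:
            edges.add((parent["service"], span["service"]))
    return edges


edges = service_edges(spans)
print(sorted(edges))
```

Run across many traces, the union of these edges approximates a topology map of which services call which, and it can be stored alongside the metrics, logs and traces it was derived from.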

User session data can also be used to gain a better understanding of how customers interact with application interfaces so teams can identify where optimization could help.

As digital services environments become more complex and data volumes explode, observability is certainly becoming more challenging. However, it’s also never been more critical. With a data lakehouse-based approach, DevOps teams can finally turn petabytes of high-fidelity data into actionable intelligence without breaking the bank or becoming burnt out in the effort.

Guido Deinhammer is a chief product officer at Dynatrace with a technical background in developing enterprise software and monitoring solutions. Guido looks after a wide range of Dynatrace technologies, from OneAgent to the UI. He's always looking for ways of...