URL: https://thenewstack.io/beyond-the-3-pillars-of-observability/

Beyond the 3 Pillars of Observability - The New Stack


Beyond the 3 Pillars of Observability

The traditional three pillars of observability (logs, metrics, and distributed traces) are outdated and overly focused on technical instrumentation and underlying data formats rather than outcomes.
Jun 9th, 2021 1:00pm by Martin Mao
Martin Mao
Martin Mao is the co-founder and CEO of Chronosphere, the company redefining monitoring for the cloud native world. He was previously at Uber, where he led the development and SRE teams that created and operated M3, one of the largest production monitoring systems in the world, storing tens of billions of time series and analyzing billions of data points per second in real time. Prior to that, he was a technical lead on the EC2 team at AWS and has also worked for Microsoft and Google.

Gartner defines observability as the evolution of monitoring into a process that offers insight into digital business applications, speeds innovation, and enhances customer experience. Today, the DevOps movement and cloud-native architecture are enabling digital businesses to become more competitive, which is driving a need for great observability.

Before DevOps, engineers rarely thought about operating the systems they built. Engineers now need to think about building systems that are easier to observe. To better understand how observability affects outcomes, engineers should think about the answers to three critical questions:

  1. How quickly do I get notified when something is wrong? Is it BEFORE a user/customer has a bad experience?
  2. How easily and quickly can I triage the problem and understand its impact?
  3. How do I find the underlying cause so I can fix the problem?

Regardless of what instrumentation exists, and what tools or solutions are employed, the ability to answer the above three questions is what observability should be focused on.

What Observability Is Not

Today, there are many who define observability as a collection of data types — the three pillars: logs, metrics, and distributed traces. Rather than focusing on the outcome, this siloed approach to observability is overly focused on technical instrumentation and underlying data formats.

Simply having systems emit all three data types doesn’t guarantee better outcomes. What’s more, many companies find little correlation between the amount of observability data produced and the value derived from this data.

Break Observability Down into 3 Phases

We’re not the first to criticize the three pillars. We agree with much of the critique that others — like Charity Majors and Ben Sigelman — have put out there. Instead of the three pillars of observability, we’ve developed an approach to observability that is focused on the outcomes instead of the inputs, and we call it the three phases. The phases are focused on positive observability outcomes and the steps teams can take to achieve these goals.

During each phase, the focus is on alleviating the customer impact — or remediating the problem — as fast as possible. Remediation is the act of alleviating the customer pain and restoring the service to acceptable levels of availability and performance. At each phase, the engineer is looking for enough information to remediate the issue, even if they don’t yet understand the root cause.

Phase 1: Know about the Problem

Knowing an issue is occurring is enough to trigger a remediation. For example, if you deploy a new version of a service and an alert triggers for that service, rolling back the deployment is the quickest path to remediating the issue without needing to understand the full impact or diagnose the root cause during the incident. Introducing changes to a system is the largest source of production issues, so knowing about problems as these changes are introduced is key.
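
The rollback example above can be sketched as a small decision rule. This is only an illustration under stated assumptions: the `Deploy` and `Alert` types, the `should_roll_back` function, and the 15-minute window are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Deploy:
    service: str
    version: str
    finished_at: datetime


@dataclass(frozen=True)
class Alert:
    service: str
    fired_at: datetime


def should_roll_back(deploy: Deploy, alert: Alert,
                     window: timedelta = timedelta(minutes=15)) -> bool:
    """Return True when the alert fired for the deployed service
    within `window` after the deploy finished (window is an assumed value)."""
    if alert.service != deploy.service:
        return False
    elapsed = alert.fired_at - deploy.finished_at
    return timedelta(0) <= elapsed <= window


deploy = Deploy("checkout", "v42", datetime(2021, 6, 9, 13, 0))
alert = Alert("checkout", datetime(2021, 6, 9, 13, 4))
print(should_roll_back(deploy, alert))  # True: the alert fired 4 minutes after the deploy
```

The rule never looks at the root cause. It only needs to know that something is wrong and that a change was just introduced, which is the point of this phase.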

Keys to success:

  • Fast alerting: Shrink the time between a problem occurring and a notification firing.
  • Scope notifications to just the teams that need to act: Scope the problem and route it to the right teams from the start.
  • Improve signal-to-noise ratio: Ensure that alerts are actionable.
  • Automate alert setup: Automated or templatized alerting can help engineers know about problems without a complicated setup process.
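
As a rough sketch of what templatized alerting could look like, the snippet below stamps out one error-rate rule per service from a single template and routes each rule to the owning team. The `AlertRule` type, the `error_rate_rule` helper, the query syntax, the service names, and the 1% threshold are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    name: str
    expression: str   # query written in a made-up, illustrative syntax
    threshold: float
    notify_team: str  # only the owning team is notified


def error_rate_rule(service: str, owner_team: str,
                    threshold: float = 0.01) -> AlertRule:
    """Build a standard error-rate alert for one service from a shared template."""
    return AlertRule(
        name=f"{service}-high-error-rate",
        expression=f"errors{{service='{service}'}} / requests{{service='{service}'}}",
        threshold=threshold,
        notify_team=owner_team,
    )


# Hypothetical service-to-owner mapping.
services = {"checkout": "payments-team", "search": "discovery-team"}
rules = [error_rate_rule(svc, team) for svc, team in services.items()]
for rule in rules:
    print(rule.name, "->", rule.notify_team)
```

Because every service gets the same rule shape, a new service is covered by adding one entry to the mapping rather than hand-writing a new alert.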

Tools and data:

  • Alerts
  • Metrics (native metrics as well as metrics generated from logs and traces)
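
To make "metrics generated from logs" concrete, here is a minimal sketch that turns raw log lines into a per-minute error count that an alert could watch. The log format, the sample lines, and the `errors_per_minute` function are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical log lines in a made-up "timestamp level service message" format.
log_lines = [
    "2021-06-09T13:00:01 ERROR checkout payment declined",
    "2021-06-09T13:00:02 INFO checkout order placed",
    "2021-06-09T13:00:40 ERROR checkout timeout calling payments",
    "2021-06-09T13:01:05 ERROR search index unavailable",
]


def errors_per_minute(lines):
    """Count ERROR lines per (minute, service) pair."""
    counts = Counter()
    for line in lines:
        timestamp, level, service, _message = line.split(" ", 3)
        if level == "ERROR":
            # The first 16 characters of the timestamp identify the minute.
            counts[(timestamp[:16], service)] += 1
    return counts


metric = errors_per_minute(log_lines)
print(metric[("2021-06-09T13:00", "checkout")])  # 2
```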

Phase 2: Triage the Problem

Understanding the scope of an issue can lead to remediation. For example, if you determine that only customers in one experiment group are impacted, turning off that experiment would likely remediate the issue.

To triage an issue, engineers need to quickly put the alert into context: how many customers or systems are impacted, and to what degree. Great observability allows engineers to pivot the data and shine a spotlight on the contextualized data to diagnose issues.

Keys to success:

  • Contextualized dashboards: Having alerts directly link to dashboards that show not only the source of the alert, but related and relevant contextual data.
  • High cardinality pivots: Letting engineers slice and dice the data by many dimensions helps them further isolate the problem.
  • Leverage existing instrumentation: It’s not practical to assume that every use case is instrumented perfectly, so it’s important to make use of the instrumentation that already exists and to link the data sources together as well as possible for the best context.
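
A pivot of the kind described above can be sketched in a few lines: take the failing requests and count them by one dimension at a time until a slice stands out. The event fields, the sample data, and the `pivot` helper are hypothetical.

```python
from collections import Counter

# Hypothetical failed-request events, each tagged with a few dimensions.
failed_requests = [
    {"region": "us-east", "experiment": "new-cart", "customer": "c1"},
    {"region": "eu-west", "experiment": "new-cart", "customer": "c2"},
    {"region": "us-east", "experiment": "new-cart", "customer": "c3"},
    {"region": "us-east", "experiment": "control", "customer": "c4"},
]


def pivot(events, dimension):
    """Count events per value of one dimension, most frequent first."""
    return Counter(event[dimension] for event in events).most_common()


print(pivot(failed_requests, "experiment"))  # [('new-cart', 3), ('control', 1)]
print(pivot(failed_requests, "region"))      # [('us-east', 3), ('eu-west', 1)]
```

In this made-up data, pivoting by experiment shows most failures in one experiment group, which points to the remediation mentioned earlier: turn that experiment off. Raw failure counts alone do not prove the slice is at fault, so in practice they would be compared against each slice's total traffic.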

Tools and data:

  • Dashboards
  • Metrics
  • Logs

Phase 3: Understand the Problem

Doing a post mortem on an incident is often an exercise in navigating a twisted web of dependencies and trying to determine which service owner you need to work with.

Great observability gives engineers a direct line of sight linking their metrics and alerts to the potential culprits. Additionally, it provides insights that can help fix underlying problems to prevent the recurrence of incidents.

Keys to success:

  • Easy understanding of service dependencies: Identifying the direct upstream and downstream dependencies of the service experiencing the active issue.
  • Ability to jump between tools and data types: For complex issues, you need to jump repeatedly between the details given by logs and traces and the trends and outliers given by metrics on dashboards, ideally in a single tool.
  • Time to root cause: Sometimes root cause analysis during an incident is unavoidable. In those situations, surfacing probable causes in alert notifications or on dashboards during triage reduces the time to root cause.
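
The dependency lookup in the first point above can be illustrated with a simple call graph. This is a sketch under assumptions: the service names and the `calls` mapping are invented, and "downstream" is taken here to mean the services a given service calls, with "upstream" meaning its callers. Teams use these terms differently.

```python
# Hypothetical call graph: each service maps to the services it calls.
calls = {
    "frontend": ["checkout", "search"],
    "checkout": ["payments", "inventory"],
    "search": ["inventory"],
    "payments": [],
    "inventory": [],
}


def downstream(service):
    """Services that `service` calls directly."""
    return sorted(calls.get(service, []))


def upstream(service):
    """Services that call `service` directly."""
    return sorted(caller for caller, callees in calls.items() if service in callees)


print(upstream("inventory"))   # ['checkout', 'search']
print(downstream("checkout"))  # ['inventory', 'payments']
```

With a lookup like this, an engineer investigating `inventory` can see immediately which service owners may be affected and which dependencies are candidates for the cause.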

Tools and data:

  • Traces
  • Logs
  • Metrics
  • Dashboards

Conclusion

Great observability can lead to competitive advantage, world-class customer experiences, faster innovation, and happier developers. But organizations can’t achieve great observability by just focusing on the input and data (three pillars). By focusing on the three phases and the outcomes outlined here, teams can achieve the promise of great observability.
