Gartner defines observability as the evolution of monitoring into a process that offers insight into digital business applications, speeds innovation, and enhances customer experience. Today, the DevOps movement and cloud-native architecture are enabling digital businesses to become more competitive, which is driving a need for great observability.
Before DevOps, engineers rarely thought about operating the systems they built. Engineers now need to think about building systems that are easier to observe. To better understand how observability impacts outcomes, engineers should think about the answers to three critical questions:
Regardless of what instrumentation exists, and what tools or solutions are employed, the ability to answer the above three questions is what observability should be focused on.
Today, there are many who define observability as a collection of data types — the three pillars: logs, metrics, and distributed traces. Rather than focusing on the outcome, this siloed approach to observability is overly focused on technical instrumentation and underlying data formats.
Simply having systems emit all three data types doesn’t guarantee better outcomes. What’s more, many companies find little correlation between the amount of observability data produced and the value derived from this data.
We’re not the first to criticize the three pillars. We agree with much of the critique that others — like Charity Majors and Ben Sigelman — have put out there. Instead of the three pillars of observability, we’ve developed an approach to observability that is focused on the outcomes instead of the inputs, and we call it the three phases. The phases are focused on positive observability outcomes and the steps teams can take to achieve these goals.
The traditional three pillars of observability (logs, metrics, and distributed traces) are outdated and overly focused on technical instrumentation and underlying data formats rather than outcomes.
During each phase, the focus is on alleviating the customer impact — or remediating the problem — as fast as possible. Remediation is the act of alleviating the customer pain and restoring the service to acceptable levels of availability and performance. At each phase, the engineer is looking for enough information to remediate the issue, even if they don’t yet understand the root cause.
Knowing an issue is occurring is enough to trigger a remediation. For example, if you deploy a new version of a service and an alert triggers for that service, rolling back the deployment is the quickest path to remediating the issue without needing to understand the full impact or diagnose the root cause during the incident. Introducing changes to a system is the largest source of production issues, so knowing about problems as these changes are introduced is key.
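As a rough illustration of the rollback example above, the sketch below checks whether an alert fired shortly after a deployment of the same service. The `Deploy` and `Alert` types, the `should_roll_back` helper, and the 30-minute window are all hypothetical choices made for this sketch, not part of any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deploy:
    """Hypothetical record of a finished deployment."""
    service: str
    version: str
    finished_at: datetime


@dataclass
class Alert:
    """Hypothetical record of an alert firing for a service."""
    service: str
    fired_at: datetime


def should_roll_back(
    alert: Alert,
    last_deploy: Optional[Deploy],
    window: timedelta = timedelta(minutes=30),  # assumed threshold
) -> bool:
    """Return True when the alert fired soon after a deploy of the same service.

    This only decides whether a rollback is a plausible first remediation;
    it does not diagnose the root cause.
    """
    if last_deploy is None or last_deploy.service != alert.service:
        return False
    elapsed = alert.fired_at - last_deploy.finished_at
    return timedelta(0) <= elapsed <= window


deploy = Deploy("checkout", "v2", datetime(2024, 1, 1, 12, 0))
alert = Alert("checkout", datetime(2024, 1, 1, 12, 10))
print(should_roll_back(alert, deploy))
```

The point of the check is that it needs only two facts (a recent change and an alert on the same service) to justify remediation. Understanding the full impact can wait until after the rollback.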
Keys to success:
Tools and data:
Understanding the scope of an issue can lead to remediation. For example, if you determine that only customers in one experiment group are impacted, turning off that experiment would likely remediate the issue.
To triage issues, engineers need to quickly put the alert into context: how many customers or systems are impacted, and to what degree. Great observability allows engineers to pivot the data and shine a spotlight on the relevant context to diagnose issues.
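One way to picture this kind of pivoting is to group the same request data by different dimensions and compare error rates. The sketch below is a minimal illustration with made-up data. The `error_rate_by` helper and the `experiment`, `region`, and `error` field names are assumptions for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def error_rate_by(
    requests: Iterable[Mapping[str, object]], dimension: str
) -> Dict[str, float]:
    """Group requests by one dimension and return the error rate per group."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for req in requests:
        key = str(req.get(dimension, "unknown"))
        totals[key] += 1
        if req["error"]:
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


# Hypothetical sample of request records, not real data.
requests = [
    {"experiment": "control", "region": "us", "error": False},
    {"experiment": "control", "region": "eu", "error": False},
    {"experiment": "new_checkout", "region": "us", "error": True},
    {"experiment": "new_checkout", "region": "eu", "error": True},
    {"experiment": "new_checkout", "region": "us", "error": False},
]

# Pivot the same data by two dimensions to see which one isolates the errors.
print(error_rate_by(requests, "experiment"))
print(error_rate_by(requests, "region"))
```

In this sample, every error falls in one experiment group while both regions see errors, which would point toward turning off that experiment rather than investigating a regional problem.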
Keys to success:
Tools and data:
Doing a post mortem on an incident is often an exercise in navigating a twisted web of dependencies and trying to determine which service owner you need to work with.
Great observability gives engineers a direct line of sight linking their metrics and alerts to the potential culprits. Additionally, it provides insights that can help fix underlying problems to prevent the recurrence of incidents.
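To make the idea of a direct line of sight more concrete, the sketch below walks a service dependency map and lists the downstream dependencies that are currently alerting. The `alerting_dependencies` helper, the service names, and the shape of the dependency map are all hypothetical. A real system would derive this information from its own tracing or service catalog data.

```python
from collections import deque
from typing import Dict, List, Set


def alerting_dependencies(
    service: str,
    depends_on: Dict[str, List[str]],
    alerting: Set[str],
) -> List[str]:
    """Breadth-first walk of a service's dependencies.

    Returns the direct and transitive dependencies that are currently
    alerting, nearest first. The `seen` set guards against cycles.
    """
    seen = {service}
    queue = deque([service])
    culprits: List[str] = []
    while queue:
        current = queue.popleft()
        for dep in depends_on.get(current, []):
            if dep in seen:
                continue
            seen.add(dep)
            if dep in alerting:
                culprits.append(dep)
            queue.append(dep)
    return culprits


# Hypothetical dependency map: each service lists what it calls.
depends_on = {
    "checkout": ["payments", "cart"],
    "payments": ["ledger-db"],
    "cart": ["cache"],
}

print(alerting_dependencies("checkout", depends_on, alerting={"ledger-db"}))
```

A list like this gives the on-call engineer a short set of service owners to contact, instead of a manual trace through every dependency.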
Keys to success:
Tools and data:
Great observability can lead to competitive advantage, world-class customer experiences, faster innovation, and happier developers. But organizations can’t achieve great observability by focusing only on inputs and data (the three pillars). By focusing on the three phases and the outcomes outlined here, teams can achieve the promise of great observability.