URL: https://thenewstack.io/rethinking-observability/

Rethinking Observability

Two best practices to better align observability practices with the goal of delivering exceptional user experiences.
Jan 3rd, 2024 10:00am by Chenxi Wang, Ph.D.

Organizations need a way to understand what is happening in highly distributed systems. Today, observability is the approach of choice, and suddenly observability projects are everywhere.

But observability has not delivered on its promise. Many organizations have tried it in environments large and small. In many cases, observability projects produced a considerable amount of data and cognitive overload without visibly improving the reliability of the system.

In addition, implementing observability requires a massive integration effort: Developers have to instrument their code to emit the right traces, metrics and logs to make the system observable. Instrumentation is still very much an art today. Little is known about the most efficient way to instrument code, which leads to repeated trial and error and friction everywhere.

But perhaps more importantly, observability teaches you to focus on operations-centric, myopic metrics rather than thinking about the service like a user: what the user wants to achieve with the service, how she wants to achieve it, and so on. This level of insight is not readily available through low-level metrics.

The result? Reliability engineering teams are overwhelmed with an explosion of data, but still lack the insight or tools to drive meaningful outcomes in system reliability or user experience.

Critical User Journeys

We argue that instead of observability, you need to focus on critical user journeys (CUJs) and the mechanisms to deliver and preserve them.

A CUJ is a sequence of user interactions vital for the successful operation of a service. It directly affects the user’s satisfaction and engagement with the service. A CUJ can be anything from checking out a shopping cart to retrieving an account balance or submitting a form response.

By focusing on critical user journeys, we can discard useless details about the internal behavior of services. Consequently, we can direct our attention and resources toward what truly matters to the user. For example, we move away from “service_db_be is alerting” to “half of the login CUJ is broken.” A critical benefit of the CUJ approach is that you start to view the service through the lens of the user, a mindset mostly missing from current observability approaches.

Furthermore, magic happens when you combine critical user journeys with service-level objectives (SLOs).

An SLO defines specific, measurable goals that the service aims to achieve. When you apply SLOs to a specific user journey, you have a measure of true user experience as well as a mechanism for predicting and managing that experience.
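
To make the idea of a journey-level SLO concrete, here is a minimal sketch in Python. It is an illustration, not the author's implementation: the `JourneySLO` class, its method names, and the 99% target are all assumptions chosen for the example. It treats an SLO as a target success ratio for one journey and reports how much of the resulting error budget is left.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JourneySLO:
    """Hypothetical SLO attached to a single critical user journey."""

    journey: str
    target: float  # required fraction of successful journeys, e.g. 0.99

    def __post_init__(self) -> None:
        # A target of exactly 1.0 would leave no error budget to measure.
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")

    def success_ratio(self, successes: int, attempts: int) -> float:
        # With no attempts there is nothing to violate, so report 100%.
        return 1.0 if attempts == 0 else successes / attempts

    def is_met(self, successes: int, attempts: int) -> bool:
        return self.success_ratio(successes, attempts) >= self.target

    def error_budget_remaining(self, successes: int, attempts: int) -> float:
        """Fraction of the error budget still unspent (negative if overspent)."""
        if attempts == 0:
            return 1.0
        allowed_failures = (1.0 - self.target) * attempts
        return 1.0 - (attempts - successes) / allowed_failures


if __name__ == "__main__":
    checkout = JourneySLO(journey="checkout", target=0.99)
    # 50 failed checkouts out of 10,000: the SLO holds, and about half of
    # the budget of roughly 100 allowed failures has been spent.
    print(checkout.is_met(9_950, 10_000))
    print(round(checkout.error_budget_remaining(9_950, 10_000), 3))
```

The useful property is that the numbers describe the checkout journey as the user experiences it, not the health of any single backing service.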

Monitoring a critical user journey against a defined service-level objective can give early warning that reliability thresholds are in danger of being violated. For example, a Taylor Swift ticket overload incident can happen when you have no way to maintain separate service-level objectives for different user journeys. Under extreme bot activity and high demand from real human users, if you cannot divert system resources away from non-essential traffic to preserve the service-level objectives for ticket-purchasing journeys, your ticket-serving services can melt under pressure.
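
One common way to turn a journey SLO into a proactive signal is burn-rate alerting, a technique from general SRE practice rather than something this article prescribes. The sketch below assumes hypothetical function names and an arbitrary alert threshold of 2.0. A burn rate of 1.0 means the journey is failing exactly as fast as the SLO allows, and anything well above that suggests the budget will run out early.

```python
def burn_rate(failures: int, attempts: int, slo_target: float) -> float:
    """Observed failure rate divided by the failure rate the SLO allows."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if attempts == 0:
        return 0.0  # no traffic in the window, so no budget was consumed
    observed_failure_rate = failures / attempts
    allowed_failure_rate = 1.0 - slo_target
    return observed_failure_rate / allowed_failure_rate


def slo_in_danger(
    failures: int,
    attempts: int,
    slo_target: float,
    threshold: float = 2.0,  # assumed value; tune per journey and window
) -> bool:
    """True when the journey is consuming its error budget too quickly."""
    return burn_rate(failures, attempts, slo_target) >= threshold


if __name__ == "__main__":
    # Ticket purchases in the last window: 30 failures out of 1,000 against
    # a 99% target, which burns the budget about three times too fast.
    print(slo_in_danger(failures=30, attempts=1_000, slo_target=0.99))
```

Evaluating this per journey, rather than once for the whole platform, is what lets a ticket-purchase journey raise an alarm while less critical journeys are still healthy.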

Journey-Specific SLOs

Like tracing, CUJs observe data across services, but additionally they aggregate signals across transactions to identify patterns and trends that traditional tools might miss. By looking at critical user journeys as a whole rather than system performance in isolation, operations teams as well as business decision-makers can be informed where and when they should apply effort to build better, more robust and reliable user experiences.
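
As a rough illustration of aggregating signals across services and transactions, the sketch below rolls per-service step events up into one success ratio per journey. The `StepEvent` shape, the service names, and the rule that a transaction succeeds only if every step succeeded are assumptions made for the example, not a description of any particular tool.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class StepEvent(NamedTuple):
    """One step of one transaction, as reported by one service (hypothetical)."""

    journey: str
    transaction_id: str
    service: str
    ok: bool


def journey_success_ratios(events: Iterable[StepEvent]) -> dict[str, float]:
    # First pass: a transaction counts as successful only if all of its
    # steps, across every service it touched, succeeded.
    outcome: dict[tuple[str, str], bool] = {}
    for event in events:
        key = (event.journey, event.transaction_id)
        outcome[key] = outcome.get(key, True) and event.ok

    # Second pass: aggregate transaction outcomes per journey.
    totals: dict[str, int] = defaultdict(int)
    good: dict[str, int] = defaultdict(int)
    for (journey, _), ok in outcome.items():
        totals[journey] += 1
        good[journey] += int(ok)
    return {journey: good[journey] / totals[journey] for journey in totals}


if __name__ == "__main__":
    events = [
        StepEvent("login", "t1", "auth_api", True),
        StepEvent("login", "t1", "service_db_be", True),
        StepEvent("login", "t2", "auth_api", True),
        StepEvent("login", "t2", "service_db_be", False),
        StepEvent("checkout", "t3", "cart_api", True),
    ]
    # Half of the login transactions failed; checkout is unaffected.
    print(journey_success_ratios(events))
```

The output is phrased in terms of journeys, which is the shift from "service_db_be is alerting" to "half of the login CUJ is broken."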

In practice, one way to achieve CUJs with SLOs is smart traffic management. In sophisticated environments, operations teams increasingly use traffic shaping as a strategic tool to deliver desired business outcomes and to improve overall user experience and service reliability.

More specifically, traffic shaping allows you to:

  • Prioritize critical user journeys and maintain SLOs: Traffic shaping allows you to redirect network and system resources to focus on critical user journeys. By guarding the paths that are most important for user satisfaction and business outcomes, traffic can be managed to ensure that critical user journeys receive the bandwidth and speed they require. During peak load times, traffic shaping can deprioritize less critical traffic and apply graceful degradation to critical user experiences, ensuring that the performance of high-priority journeys remains within SLO thresholds.
  • Enhance user experience: Predictive and adaptive traffic shaping can significantly improve the end-user experience. Advanced traffic shaping tools can predictively adjust call patterns based on user behavior, time of day or other factors. This proactive approach, rather than reactive traffic shaping, helps in maintaining user journey SLOs consistently and delivering a seamless and engaging user experience.
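
The prioritization described above can be sketched as priority-based load shedding, which is a deliberate simplification of real traffic shaping. Everything named here is hypothetical: the `Request` type, the journey names, the priority table, and the fixed capacity per scheduling window. The point is only that when demand exceeds capacity, the least critical journeys are shed first.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class Request:
    request_id: str
    journey: str


# Assumed priorities for illustration; a lower number is more critical.
JOURNEY_PRIORITY: Mapping[str, int] = {
    "ticket_purchase": 0,
    "browse_events": 1,
    "recommendations": 2,
}


def admit(
    requests: Sequence[Request],
    capacity: int,
    priority: Mapping[str, int] = JOURNEY_PRIORITY,
) -> tuple[list[Request], list[Request]]:
    """Split requests into (admitted, shed) for one scheduling window."""
    capacity = max(capacity, 0)
    # Journeys missing from the table rank last. sorted() is stable, so
    # arrival order is preserved among requests of equal priority.
    ranked = sorted(
        requests, key=lambda r: priority.get(r.journey, float("inf"))
    )
    return ranked[:capacity], ranked[capacity:]


if __name__ == "__main__":
    incoming = [
        Request("r1", "recommendations"),
        Request("r2", "ticket_purchase"),
        Request("r3", "browse_events"),
        Request("r4", "ticket_purchase"),
    ]
    admitted, shed = admit(incoming, capacity=2)
    print([r.request_id for r in admitted])  # the two ticket purchases
    print([r.request_id for r in shed])      # browsing, then recommendations
```

A production system would more likely apply rate limits, queues, or weighted scheduling continuously instead of sorting a batch, but the ordering decision is the same.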

An astute reader would note that all this is just a different way of delivering observability. But we feel strongly that the future of observability lies in offering a more comprehensive and accurate measure of user experiences. CUJs and journey-specific SLOs represent a significant stride in moving beyond the confines of system-centric metrics and toward a more user-centric approach.

By embracing concepts like critical user journeys and journey-specific SLOs, we can better align observability practices with the ultimate goal of delivering exceptional user experiences. This is not just about keeping pace with technological advances; this is about rethinking how we measure reliability from a user-centric perspective.

Dr. Chenxi Wang is the Founder and General Partner of Rain Capital, a Silicon Valley-based venture fund investing in emergent technology solutions in Cyber, Infrastructure, and AI. Dr. Wang also serves on the Board of Directors for MDU Resources, a...