URL: https://thenewstack.io/4-key-observability-best-practices/

4 Key Observability Best Practices - The New Stack


4 Key Observability Best Practices

Without observability, precious engineering time is wasted trying to sift through data to spot where a problem lies, rather than shipping new features.
Sep 1st, 2023 9:11am by Sophie Kohler
Featured image from alphaspirit.it on Shutterstock.
Chronosphere sponsored this post.

With bigger systems, higher loads and more interconnectivity between microservices in cloud native environments, everything has become more complex. Cloud native environments emit somewhere between 10 and 100 times more observability data than traditional, VM-based environments.

As a result, engineers aren’t able to make the most out of their workdays, spending more time on investigations and cobbling together a story of what happened from siloed telemetry, leaving less time to innovate.

Without the right observability setup, precious engineering time is wasted sifting through data to find where a problem lies rather than shipping new features. Poor visibility also raises the risk of shipping buggy features that hurt the customer experience.

So, how can modern organizations find relevant insights in a sea of telemetry and make their telemetry data work for them, not the other way around? Let’s explore why observability is key to understanding your cloud native systems, and four observability best practices for your team.

Chronosphere, a Palo Alto Networks company, is the observability platform built for control in the modern, containerized world. Recognized as a leader by major analyst firms, Chronosphere empowers customers to focus on the data and insights that matter to reduce data complexity, optimize costs, and remediate issues faster. Visit chronosphere.io.

What Are the Benefits of Observability?

Before we dive into ways your organization can improve observability, lower costs and ensure a smoother customer experience, let’s talk about what the benefits of investing in observability actually are.

Better Customer Experience

With better understanding and visibility into relevant data, your organization’s support teams can gain customer-specific insights to understand the impact of issues on particular customer segments. Maybe a recent upgrade works for all of your customers except for those under the largest load, or during a certain time window. Using this information, on-call engineers can resolve incidents quickly and provide more detailed incident reports.

Better Engineering Experience and Retention

By investing in observability, site reliability engineers (SREs) gain a clear view of the health of each team’s services and of individual system components, which helps them prioritize their reliability efforts and initiatives.

As for developers, benefits of observability include more effective collaboration across team boundaries, faster onboarding to new or inherited services and better back-of-the-napkin estimates for upcoming changes.

Four Observability Best Practices

Now that we have a better understanding of why teams need observability to run their cloud native systems effectively, let’s dive into four observability best practices teams can use to set themselves up for success.

1. Integrating with Developer Experience

Observability is everyone’s job, and the best people to instrument it are the ones who are writing the code. Maintaining instrumentation and monitors should not be a job just for the SREs or leads on your team.

A thorough understanding of the telemetry life cycle (the life of a span, metric or log) is key, from setting up configuration to emitting signals to any modification or processing done before the data is stored. With a high-level architecture diagram, engineers can better understand if or where their instrumentation gets modified (by aggregation or dropping, for example). Often, this processing falls in the SRE domain and is invisible to developers, who won’t understand why their new telemetry is partially or entirely missing.
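
To make that life cycle concrete, here is a minimal Python sketch of a processing stage that sits between emission and storage. Everything in it is a hypothetical stand-in rather than any collector’s or vendor’s real configuration: the `Sample` shape, the `drop_names` and `strip_labels` rules and the metric names are all assumptions for illustration. It shows how a drop rule or an aggregation rule can change what a developer’s telemetry looks like by the time it is stored.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One emitted metric data point (hypothetical shape)."""
    name: str
    labels: tuple  # sorted (key, value) pairs, so labels can be used as a dict key
    value: float


def make_sample(name, value, **labels):
    return Sample(name, tuple(sorted(labels.items())), value)


def process(samples, drop_names=(), strip_labels=()):
    """Apply drop and aggregation rules before storage.

    drop_names:   metric names that are discarded entirely.
    strip_labels: label keys that are removed; samples that become
                  identical after stripping are summed into one series.
    """
    totals = defaultdict(float)
    for s in samples:
        if s.name in drop_names:
            continue  # dropped: this sample never reaches storage
        kept = tuple((k, v) for k, v in s.labels if k not in strip_labels)
        totals[(s.name, kept)] += s.value
    return [Sample(name, labels, value) for (name, labels), value in totals.items()]


emitted = [
    make_sample("http_requests", 1, route="/pay", pod="a"),
    make_sample("http_requests", 1, route="/pay", pod="b"),
    make_sample("debug_cache_probe", 1, pod="a"),
]
stored = process(emitted, drop_names={"debug_cache_probe"}, strip_labels={"pod"})
for s in stored:
    print(s.name, dict(s.labels), s.value)
```

In this sketch, three samples are emitted but only one series is stored: the debug metric is dropped, and the two request samples are merged once the `pod` label is stripped. A developer who only sees the emitting code would have no way to predict that outcome without knowing the processing rules.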

You can check out simple instrumentation examples in this OpenTelemetry Python Cookbook.

If there are enough resources and a clear need for a central internal tool, platform engineering teams should consider writing thin wrappers around instrumentation libraries to ensure standard metadata is available out of the box.
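
As a rough illustration of the thin-wrapper idea, the sketch below uses Python’s standard `logging` module as a stand-in for an instrumentation library. The `get_logger` helper, the `STANDARD_FIELDS` list and the service and team names are hypothetical. A wrapper around a metrics or tracing library would follow the same pattern, but that is not shown here.

```python
import logging

# Hypothetical organization-wide metadata that every signal should carry.
STANDARD_FIELDS = ("service", "team", "env")


def get_logger(service, team, env="dev"):
    """Return a logger that attaches the standard metadata to every record."""
    base = logging.getLogger(service)
    return logging.LoggerAdapter(base, {"service": service, "team": team, "env": env})


class ListHandler(logging.Handler):
    """Collects records in memory so the example can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


handler = ListHandler()
logging.getLogger("checkout").addHandler(handler)
logging.getLogger("checkout").setLevel(logging.INFO)

# Application code only calls the wrapper; it never sets the metadata itself.
log = get_logger("checkout", team="payments")
log.info("order placed")

record = handler.records[0]
print({field: getattr(record, field) for field in STANDARD_FIELDS})
```

The design choice is that developers call one helper and get consistent metadata without having to remember it, which keeps later filtering by service or team reliable.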

Viewing Changes to Instrumentation

Another way to enable developers is by providing a quick feedback loop when instrumenting locally, so that they can view changes to the instrumentation before merging a pull request. This recommendation is helpful for training purposes and for teammates who are new to instrumenting or unsure how to do it.

Updating the On-Call Process

Updating the on-call onboarding process to pair a new engineer with a tenured one for production investigations can help distribute tribal knowledge and orient the newbie to your observability stack. It’s not just the new engineers who benefit. Seeing the system through new eyes can challenge seasoned engineers’ mental models and assumptions. Exploring production observability data together is a richly rewarding practice you might want to keep after the onboarding period.

You can check out more in this talk from SRECon, “Cognitive Apprenticeship in Practice with Alert Triage Hour of Power.”

2. Monitor Observability Platform Usage in More than One Way

For cost reasons, get comfortable with tracking your current telemetry footprint and reviewing options for tuning it, such as dropping, aggregating or filtering data. Doing so helps your organization monitor costs and platform adoption proactively. The ability to track telemetry volume by type (metrics, logs, traces or events) and by team can help define and delegate cost-efficiency initiatives.
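
A minimal sketch of that kind of tracking might look like the following. The usage records, team names and byte counts are invented for illustration; in practice they would come from whatever usage reporting your platform exposes.

```python
from collections import Counter

# Hypothetical usage records: (team, signal type, bytes ingested).
usage = [
    ("payments", "metrics", 1_200),
    ("payments", "logs", 54_000),
    ("search", "logs", 91_000),
    ("search", "traces", 15_500),
    ("payments", "logs", 6_000),
]


def volume_by(records, key_index):
    """Sum ingested bytes grouped by the field at key_index (0 = team, 1 = type)."""
    totals = Counter()
    for record in records:
        totals[record[key_index]] += record[2]
    return totals


by_team = volume_by(usage, 0)
by_type = volume_by(usage, 1)
print(by_team.most_common())  # largest emitters first
print(by_type.most_common())
```

Sorting by volume makes it easy to see which team or signal type to look at first when deciding where dropping, aggregating or filtering would have the most effect.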

Once you’ve gotten a handle on how much telemetry you’re emitting and what it’s costing you, consider tracking the daily and monthly active users. This can help you pinpoint which engineers need training on the platform.
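
As a small sketch of the idea, daily and monthly active users can be derived from an access log of who used the platform on which day. The event list and the roster of engineers below are made up for the example, and the definition of “active” (any query or dashboard view) is an assumption.

```python
from datetime import date

# Hypothetical access log: (user, day they ran a query or opened a dashboard).
events = [
    ("ana", date(2023, 9, 1)),
    ("ben", date(2023, 9, 1)),
    ("ana", date(2023, 9, 2)),
    ("cy", date(2023, 9, 15)),
]
# Hypothetical roster of engineers who have access to the platform.
engineers = {"ana", "ben", "cy", "dee"}


def daily_active(events, day):
    """Users with at least one event on the given day."""
    return {user for user, d in events if d == day}


def monthly_active(events, year, month):
    """Users with at least one event in the given calendar month."""
    return {user for user, d in events if (d.year, d.month) == (year, month)}


dau = daily_active(events, date(2023, 9, 1))
mau = monthly_active(events, 2023, 9)
inactive = engineers - mau  # candidates for training or onboarding help

print(len(dau), len(mau), sorted(inactive))
```

Comparing the roster against the monthly active set is what surfaces the engineers who have access but aren’t using the platform.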

These observability best practices for training and cost lead to a better understanding of the value each vendor provides, as well as what’s underutilized.

3. Center Business Context in Observability Data

Deciphering the business context in a pile of observability data can shorten high-stakes investigations in two ways:

  • By making it easier to translate incidents affecting workflows and functionality from a user perspective.
  • By creating a more efficient onboarding process for engineers.

One way to center business context in observability data is by renaming default dashboards, charts and monitors.

4. Un-Silo Your Telemetry

Teams need better investigations. A smoother remediation process comes from an organized path through the data, like following breadcrumbs, rather than from 10 different bookmarked links and a mental map of what data lives where.

One way to do this is by understanding what telemetry your system emits across metrics, logs and traces, and pinpointing potential duplication or better sources of data. For example, teams can create a trace-derived metric that represents an end-to-end customer workflow, such as:

  • “Transfer money from this account to that account.”
  • “Apply for this loan.”
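
A trace-derived metric for workflows like these could be sketched as follows. The `Span` fields, span names and durations are hypothetical and simplified; real tracing data carries more detail. The sketch counts completed workflows by their root span, flags a workflow as failed if any span in its trace errored, and averages the end-to-end duration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """A simplified, hypothetical span record."""
    trace_id: str
    name: str
    parent_id: Optional[str]  # None marks the root span of a trace
    duration_ms: float
    error: bool = False


def workflow_metric(spans, workflow_name):
    """Summarize traces whose root span is named workflow_name."""
    roots = [s for s in spans if s.parent_id is None and s.name == workflow_name]
    failed_traces = {s.trace_id for s in spans if s.error}
    count = len(roots)
    errors = sum(1 for r in roots if r.trace_id in failed_traces)
    avg_ms = sum(r.duration_ms for r in roots) / count if count else 0.0
    return {"count": count, "errors": errors, "avg_duration_ms": avg_ms}


spans = [
    Span("t1", "transfer_money", None, 120.0),
    Span("t1", "debit_account", "s1", 40.0),
    Span("t2", "transfer_money", None, 300.0),
    Span("t2", "credit_account", "s2", 250.0, error=True),
    Span("t3", "apply_for_loan", None, 900.0),
]
print(workflow_metric(spans, "transfer_money"))
```

Because the metric is keyed on the workflow rather than on a single service, a failure deep in the call chain still shows up against the customer-facing action it broke.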

Whether you’re sending data to multiple vendors or to a mix of a DIY in-house stack and vendors, make sure you can link data between systems, for example by adding the traceID to log lines or by adding a dashboard note with links to preformatted, relevant queries. These links give your team the extra support to perform better investigations and remediate issues faster.
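
Adding the trace ID to log lines can be sketched with Python’s standard `logging` and `contextvars` modules. The `current_trace_id` variable, the logger name and the example ID are assumptions for illustration; in a real service, a tracing library would normally supply the active trace ID.

```python
import contextvars
import io
import logging

# Hypothetical holder for the active trace ID.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copies the active trace ID onto every log record."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True  # never suppress the record, only annotate it


stream = io.StringIO()  # stands in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s"))
handler.addFilter(TraceIdFilter())

logger = logging.getLogger("transfers")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example's output in one place

current_trace_id.set("4bf92f3577b34da6a3ce929d0e0e4736")
logger.info("transfer accepted")
print(stream.getvalue(), end="")
```

With the ID in every line, an engineer looking at a slow trace can search the logs for the same value instead of matching timestamps by hand.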

Explore Chronosphere’s Future-Proof Solution

Engineering time comes at a premium. The more you invest in getting high-fidelity insights and helping engineers understand what telemetry is available, the more fearless instrumenting becomes, the faster troubleshooting gets and the better your team can make future-proof, data-informed decisions when weighing options.

As companies transition to cloud native, uncontrollable costs and rampant data growth can stop your team from performing successfully and innovating. That’s why cloud native requires more reliability and compatibility with future-proof observability. Take back control of your observability today, and learn how Chronosphere’s solutions manage scale and meet modern business needs.

Sophie Kohler is part of the content team at Chronosphere where she writes blogs as well as creates videos and other educational content for a business-to-business audience. 