To bring observability with OpenTelemetry (OTEL) into your organization, you need a rollout strategy that keeps everyone on the same page and stops teams from each going their own way. There are three steps to getting started with OpenTelemetry: communicate, do your homework and follow instrumentation practices.
Folks in your organization won’t know you want to use OpenTelemetry unless you tell them. This is where a little advocacy goes a long way.
Start by communicating OpenTelemetry’s benefits so that people in your organization understand why you want to use it.
Rolling out OpenTelemetry across an organization is a big initiative. Declaring that OpenTelemetry is happening tells people in your organization that it’s serious, especially when the directive comes from leadership. Make the announcement at a town hall and in Slack, Teams or whatever collaboration tool your organization uses.
But people just don’t want to be told what to do. Transformation fatigue and initiative fatigue are real, so you must…
If you want people to follow you down this path, they need to know what they’re getting into. To explain what OpenTelemetry is and its benefits, use the resources you have in your organization, such as engineers who “geek out” on observability.
Put a call out to folks who are interested in OpenTelemetry, recruit them to be your champions and consider forming an observability practices team with them. It can serve as an advocacy team, focusing on the benefits of OTEL, creating practices around its implementation and rollout, and becoming subject matter experts in OpenTelemetry. Include engineers who can dig into OpenTelemetry to produce a set of practices within the organization and become the go-to folks for any OTEL-related questions. Recruit a mixture of individual contributors and managers. They don’t have to be OpenTelemetry experts; they can grow into that. What’s important is that they believe in it and want to help roll it out.
Also, connect with folks outside your company to learn how other organizations are rolling out OTEL. Join the OTEL End User Working Group (EUWG) on CNCF Slack to connect with fellow OTEL practitioners who can share tips. Some may be willing to speak with your engineers to answer burning questions or concerns. I’m one of the co-chairs of the EUWG, so I can help make some introductions!
Create an OTEL rollout plan with milestones and dates for reaching them to demonstrate your commitment to the project. Make sure your timelines are realistic by getting input from your engineers and managers. Have them work with your observability practices team to put a plan in place, then communicate the plan.
During planning, ask your engineers:
You need to understand your system’s landscape to put your plan together accurately.
Your application code probably comprises multiple services. For each service, take inventory of the language it’s written in so you can determine what OTEL instrumentation library (or libraries) your dev teams need to use.
Also inventory any third-party frameworks and libraries (e.g., Python Django, Java Hibernate) you’re using, since OTEL auto-instrumentation is available for many popular libraries and frameworks.
Finally, identify your homegrown frameworks and libraries. More on that shortly.
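The inventory described above can live in something as simple as a spreadsheet, but a small script makes it easier to query. The following is a minimal sketch, assuming a hypothetical `ServiceRecord` structure; the field names, service names and library names are invented for illustration and are not part of OpenTelemetry.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceRecord:
    """One inventory row. All field names are illustrative."""
    name: str
    language: str
    third_party: List[str] = field(default_factory=list)  # e.g. Django, Hibernate
    homegrown: List[str] = field(default_factory=list)    # in-house libraries


def languages_in_use(inventory: List[ServiceRecord]) -> Dict[str, List[str]]:
    """Group service names by language to see which OTEL libraries are needed."""
    by_language: Dict[str, List[str]] = {}
    for record in inventory:
        by_language.setdefault(record.language, []).append(record.name)
    return by_language


def homegrown_by_reach(inventory: List[ServiceRecord]) -> List[str]:
    """List homegrown libraries, most widely used first (ties sorted by name)."""
    counts = Counter(lib for record in inventory for lib in set(record.homegrown))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [lib for lib, _ in ranked]


# Hypothetical services, purely for illustration.
inventory = [
    ServiceRecord("checkout", "python", third_party=["django"],
                  homegrown=["acme-http"]),
    ServiceRecord("catalog", "java", third_party=["hibernate"],
                  homegrown=["acme-http", "acme-cache"]),
    ServiceRecord("search", "python", homegrown=["acme-cache", "acme-http"]),
]

print(languages_in_use(inventory))
print(homegrown_by_reach(inventory))
```

Ranking homegrown libraries by how many services depend on them gives a rough order for manual instrumentation later on, since the most widely shared library touches the most code paths.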
Next, dig in a bit deeper to identify your most critical transactions. You’ll want to instrument them first because, according to OpenTelemetry co-founder and Lightstep director of developer education Ted Young, “It ensures that complete traces are being created, and you can start to investigate important issues early, without having to wait for the entire organization to complete their migration.”
If any code has already been instrumented, find out if it’s using OpenCensus, OpenTracing or something else. OpenTelemetry is backward-compatible with OpenTracing and OpenCensus, so you won’t need to make any major code changes initially. However, plan to eventually migrate to OpenTelemetry to take advantage of all it offers. For example, OpenTracing covers only traces and OpenCensus covers only traces and metrics; neither supports logs or the integration between traces, metrics and logs. If the existing instrumentation relies on homegrown libraries or frameworks, be prepared to reinstrument your application using OpenTelemetry.
Along with application tracing data, you’ll want to send metrics data to your observability backend for a nice holistic system view. This means you need to identify your metrics sources. Is it Kubernetes? Kafka? Docker? Nomad? Virtual machines? Also, ask what application metrics you want to capture.
Now you’re ready to start instrumenting with OpenTelemetry. Here are some recommended instrumentation practices to help teams get started.
If your system is experiencing frequent reliability issues, this is usually a sign you need better observability. Therefore, you might need to delay some planned features to instrument your code or reevaluate what’s already been instrumented.
If your language supports auto-instrumentation, then take advantage of it. This is a low barrier to entry into OpenTelemetry, with low effort and high reward. Auto-instrumentation is currently available for Java, .NET, Python, JavaScript and PHP. There is also auto-instrumentation in Go, but the approach is slightly different.
You’ll eventually want to supplement auto-instrumentation with manual instrumentation. Homegrown libraries and frameworks are a good place to start: because much of your application code probably passes through them, instrumenting them gives you most of the tracing coverage you need.
It is possible to over-instrument, which means you end up with so much irrelevant data that troubleshooting becomes harder. This often happens with auto-instrumentation. So, once you start auto-instrumenting your code, take a step back and check whether each auto-instrumented library is one you actually need telemetry from. Fortunately, there are ways to limit what gets auto-instrumented, including in Java and Python.
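For Python, one way to limit auto-instrumentation is the `OTEL_PYTHON_DISABLED_INSTRUMENTATIONS` environment variable, which takes a comma-separated list of instrumentations to skip when the app is started through the `opentelemetry-instrument` wrapper. The sketch below only builds the command and environment; the helper `instrumented_launch`, the entry point `app.py`, the service name and the choice of libraries to disable are all assumptions made for illustration.

```python
import os
from typing import Dict, List, Optional, Tuple


def instrumented_launch(
    app_command: List[str],
    service_name: str,
    disabled: List[str],
    base_env: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Build the command and environment for an auto-instrumented process."""
    env = dict(os.environ if base_env is None else base_env)
    env["OTEL_SERVICE_NAME"] = service_name
    if disabled:
        # Comma-separated names of instrumentations that should be skipped.
        env["OTEL_PYTHON_DISABLED_INSTRUMENTATIONS"] = ",".join(disabled)
    # `opentelemetry-instrument` wraps the normal start command.
    return ["opentelemetry-instrument", *app_command], env


command, env = instrumented_launch(
    ["python", "app.py"],           # hypothetical entry point
    service_name="checkout",        # hypothetical service name
    disabled=["redis", "sqlite3"],  # hypothetical noisy libraries
    base_env={},                    # empty here only to keep the output short
)
print(command)
print(env)
```

Passing `command` and `env` to `subprocess.run` would start the process. In practice you would omit `base_env` so the real environment, including `PATH`, is preserved, and the disabled names need to match the instrumentation packages actually installed.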
Just as test-driven development (TDD) is about writing tests alongside your application code, observability-driven development (ODD) is the act of adding instrumentation as you write your application code. By instrumenting as you code, you know exactly what to instrument, as the code is fresh in your mind. It also prevents new technical debt related to observability, as you won’t have to go back to your code to instrument it later.
Application teams should instrument their own code. They should never rely on an external team to instrument their code, because they know their code best. They work on it daily and know what to look for when they troubleshoot. It’s tempting to get a third party to instrument your application code when you’re in a time crunch, but it will not end well.
Although you can send telemetry data directly from your instrumented code to your observability backend, you should use at least one OpenTelemetry Collector. The OTEL Collector acts as a central point for collecting and processing data from multiple data sources at once and then exports the data to your preferred observability backend for analysis. If you are evaluating or changing observability backends, you can send the same data to several backends at once and compare them, simply by updating the Collector’s YAML configuration file. Once you’ve selected a backend, switching to it is again only a change to the Collector’s YAML; your application code stays untouched.
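To make the backend-switching idea concrete, here is a rough sketch that mirrors the layout of a Collector configuration file as a Python dictionary (`receivers`, `processors`, `exporters` and `service.pipelines` are the Collector’s real top-level sections). The endpoints, the exporter names `otlp/backend_a` and `otlp/backend_b`, and the helpers `add_backend` and `keep_only` are assumptions for illustration; a real deployment would edit the YAML file directly.

```python
import copy
from typing import Any, Dict

# Python mirror of a minimal Collector YAML file. Endpoints are placeholders.
collector_config: Dict[str, Any] = {
    "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
    "processors": {"batch": {}},
    "exporters": {
        "otlp/backend_a": {"endpoint": "backend-a.example.com:4317"},
    },
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["otlp/backend_a"],
            }
        }
    },
}


def add_backend(config: Dict[str, Any], exporter_id: str, endpoint: str) -> Dict[str, Any]:
    """Return a copy that also exports every pipeline to another backend."""
    updated = copy.deepcopy(config)
    updated["exporters"][exporter_id] = {"endpoint": endpoint}
    for pipeline in updated["service"]["pipelines"].values():
        if exporter_id not in pipeline["exporters"]:
            pipeline["exporters"].append(exporter_id)
    return updated


def keep_only(config: Dict[str, Any], exporter_id: str) -> Dict[str, Any]:
    """Return a copy that exports to a single, already configured backend."""
    updated = copy.deepcopy(config)
    updated["exporters"] = {exporter_id: updated["exporters"][exporter_id]}
    for pipeline in updated["service"]["pipelines"].values():
        pipeline["exporters"] = [exporter_id]
    return updated


# Evaluate two backends side by side, then settle on the second one.
both = add_backend(collector_config, "otlp/backend_b", "backend-b.example.com:4317")
final = keep_only(both, "otlp/backend_b")
print(both["service"]["pipelines"]["traces"]["exporters"])
print(final["service"]["pipelines"]["traces"]["exporters"])
```

In both steps only the exporter section and the pipeline’s exporter list change. The receivers, and therefore the applications sending data to the Collector, are unaffected.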
Rolling out OpenTelemetry in your organization is no trivial task, but having guidance for getting started goes a long way. Remember to communicate, do your homework and follow instrumentation practices. And if you get stuck, we’re here to help!