Today’s software is orders of magnitude more complex than the software of 20+ years ago, which has brought new challenges when it comes to troubleshooting our code. Fortunately, we’ve come pretty far in understanding how our applications are performing and where issues are occurring by implementing observability into our systems.
However, it’s not just software that has evolved; the process of creating and developing it has also changed. DevOps introduced the concept of CI/CD. With delivery cycles shortening from quarterly, to monthly, to now weekly or even multiple times a day, we’re embracing automation across the software delivery pipeline.
Unfortunately, observability for CI/CD pipelines has not progressed much compared to application software. Considering these pipelines are the backbone of the software delivery process, it’s surprising: If you don’t have visibility, then how do you troubleshoot issues when something goes wrong and you can’t get software into production?
That’s what we’ll focus on in this article: observability of CI/CD pipelines. First, we’ll define a few things; then, we’ll dive into why being able to observe pipelines matters and how to make them observable; finally, we’ll wrap up by talking about some of the remaining challenges.
Here are some definitions to know:
There are multiple definitions of observability, so we’ll narrow it down to our favorite:
Observability, or o11y (pronounced “ollie”), lets you understand a system from the outside by letting you ask questions without knowing the inner workings of that system. Fun fact: The 11 in “o11y” represents the number of characters between the “o” and the “y” in the word “observability.”
This means that even though you don’t understand all the nitty-gritty underlying business logic of a system, the system emits enough information for you to follow the breadcrumbs to answer: “Why is this happening?” However, you can’t have observability if your system doesn’t emit information. How do you get that information? One way is with OpenTelemetry.
OpenTelemetry (OTel) is an open source observability framework for generating, collecting, transforming and exporting telemetry data. It provides a set of APIs, software development kits (SDKs), instrumentation libraries and tools to help you accomplish this. Since its official inception in 2019, it has become the de facto standard for application instrumentation and telemetry generation and collection, used by companies including eBay and Skyscanner.
One of its biggest benefits is freedom from vendor lock-in. You can instrument your applications once and send your telemetry to whichever backend works best for you. It also provides some pretty cool tools, such as the Collector.
The Collector is a vendor-neutral service used to ingest, transform and export data to one or more observability backends.
[Figure: Diagram of the OTel Collector components]
The Collector consists of four main components that access telemetry:
You can think of the OTel Collector as a data pipeline.
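To make the data-pipeline analogy concrete, here is a minimal toy model in Python. It is not the Collector's implementation or its configuration format; it only mirrors the receive, process and export flow described above. `ToyPipeline`, `tag_source`, `drop_debug` and the two in-memory "backends" are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional

# A telemetry record is modeled as a plain dict. The real Collector has its
# own internal data model; this is an assumption made for illustration.
Record = Dict[str, str]


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for a Collector pipeline: receive -> process -> export."""

    processors: List[Callable[[Record], Optional[Record]]] = field(default_factory=list)
    exporters: List[Callable[[Record], None]] = field(default_factory=list)

    def receive(self, records: Iterable[Record]) -> None:
        for record in records:
            current: Optional[Record] = record
            for process in self.processors:
                current = process(current)
                if current is None:  # a processor may drop a record
                    break
            else:
                # Only records that survived every processor are exported,
                # and each one is fanned out to every configured backend.
                for export in self.exporters:
                    export(current)


def drop_debug(record: Record) -> Optional[Record]:
    """Filter out noisy debug-level records."""
    return None if record.get("level") == "debug" else record


def tag_source(record: Record) -> Record:
    """Enrich each record with where it came from."""
    return {**record, "source": "ci"}


backend_a: List[Record] = []  # stand-ins for two observability backends
backend_b: List[Record] = []

pipeline = ToyPipeline(
    processors=[drop_debug, tag_source],
    exporters=[backend_a.append, backend_b.append],
)
pipeline.receive([
    {"name": "build", "level": "info"},
    {"name": "cache-probe", "level": "debug"},
    {"name": "deploy", "level": "error"},
])
```

The point of the sketch is the shape: data comes in once, is filtered and enriched in the middle, and can be sent to more than one backend without the producer knowing about any of them.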
CI/CD is an automated approach to software delivery that draws on two key practices:
[Image: CI/CD pipeline GIF with a cat moving around it]
Automated pipelines enable fast product iterations by allowing you to get new features, bug fixes and general updates out to your customers faster. They reduce the risk of manual errors and standardize the feedback loop to your developers.
When your pipeline is healthy, your team can write, build, test and deploy code and configuration changes into production continuously. You also gain development agility: you can change how you operate and quickly find out whether those changes had a positive or negative impact on your application’s health.
Conversely, when your pipeline is unhealthy, you may run into one or more of the following problems:
[Image: Cat in a burning room saying, "this is fine."]
Although pipelines may not be a production environment external users interact with, they’re most certainly a production environment that internal users — e.g., software engineers and site reliability engineers (SREs) — interact with. Being able to observe your prod environment means:
CI/CD pipelines are run by code that defines how they work, and despite your best and most careful efforts, code can still fail. Making application code observable helps you make sense of things when you run into production issues. Similarly, having visibility into your pipelines can help you understand what’s going on when they fail.
Having observable pipelines helps answer questions such as:
To answer these questions, you need to collect information about your pipelines. But what should that information be? Capture things like:
Recall that a system is observable when it emits enough information to answer the question, “Why is this happening?” First, you need a means to emit that information; then, you need a place to send it to; and finally, you need to analyze it and figure out what you need to fix.
This is where OpenTelemetry comes in. You can implement it in your systems to emit the information you need to make them observable. And just as you use it for applications, you can also use it for CI/CD pipelines! You still need to send the generated telemetry to a backend for analysis, but we’ll focus on the first piece: instrumentation.
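As a rough illustration of what instrumenting a pipeline means, the sketch below models one pipeline run as a trace, with one span per stage. It deliberately uses only the Python standard library and does not call the OpenTelemetry SDK; the `span` helper, the `finished_spans` list, the `error_message` attribute and the stage names are all assumptions made for this example. In a real setup, an OpenTelemetry tracer and exporter would play these roles.

```python
import time
import uuid
from contextlib import contextmanager

# Finished spans are collected here. In a real setup they would be handed
# to an exporter and sent to a backend; this list is a stand-in.
finished_spans = []


@contextmanager
def span(name, trace_id, parent_id=None, **attributes):
    """Record one unit of pipeline work with its timing and pass/fail status."""
    record = {
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent_id,
        "name": name,
        "attributes": attributes,
        "status": "ok",
    }
    start = time.monotonic()
    try:
        yield record
    except Exception as exc:
        record["status"] = "error"
        record["attributes"]["error_message"] = str(exc)  # hypothetical key
        raise
    finally:
        record["duration_s"] = time.monotonic() - start
        finished_spans.append(record)


def run_pipeline(stages):
    """Run stages in order and stop at the first failure."""
    trace_id = uuid.uuid4().hex
    try:
        with span("pipeline-run", trace_id) as root:
            for stage_name, stage_fn in stages:
                with span(stage_name, trace_id, parent_id=root["span_id"]):
                    stage_fn()
    except Exception:
        pass  # the failure is already recorded on the spans
    return trace_id


def failing_tests():
    raise RuntimeError("2 tests failed")


trace_id = run_pipeline([
    ("build", lambda: None),
    ("test", failing_tests),
    ("deploy", lambda: None),  # never runs because "test" fails
])
```

After this run, `finished_spans` holds a successful `build` span, a failed `test` span carrying the error message, and a failed parent `pipeline-run` span. There is no `deploy` span at all, which is exactly the kind of breadcrumb that helps answer "Why is this happening?"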
OpenTelemetry makes a lot of sense for instrumenting CI/CD pipelines because many people already instrument applications with it; adoption and implementation have steadily increased in the last couple of years.
Currently, this is a bit of a mixed bag. There are:
You can also integrate these tools into your CI/CD pipelines; they emit OpenTelemetry signals, thereby helping make your pipelines observable:
This diagram shows how to gain pipeline observability with some of the tools mentioned above. Suppose you’re building and deploying a Java application. You’re using Jenkins to orchestrate build and deployment.
[Figure: OTel-enabled Jenkins CI/CD pipeline]
While it makes sense to use OpenTelemetry to enable CI/CD pipeline observability, there is a lack of standardization, and the tooling landscape is kind of all over the place.
OpenTelemetry isn’t built into most CI/CD tooling. And while there’s a desire to add observability capabilities to CI/CD tools like GitLab and GitHub Actions, these initiatives have been slow-moving. For example, while there has been activity on the GitLab request for pipeline observability with OTel, that item has been open for two years. The OTel proposal for observability of CI/CD pipelines was opened in January 2023, but as of November 2023, there has been no activity on it since July.
Therefore, if you want pipeline observability today, you’re at the mercy of the individuals and organizations who build their own tooling. What happens if they decide not to maintain these tools anymore?
Making your CI/CD pipelines observable helps you troubleshoot them more effectively, achieve development agility and gain insights into their inner workings so that you can tweak them to help them run more efficiently.
A healthy pipeline means you can write, build, test and deploy new code continuously. Conversely, an unhealthy pipeline can mean slower deployments, testing issues and technical debt.
You can use OpenTelemetry to add observability into your pipeline; while options are limited at this time, things are moving in the right direction, and we’re excited for what the future of CI/CD holds!