URL: https://thenewstack.io/service-level-objectives-in-kubernetes/

2020-11-11 12:00:18
Service Level Objectives in Kubernetes

For a Kubernetes operator, Service Level Objectives (SLOs) can provide a way of characterizing the health of the services running on their clusters.
Nov 11th, 2020 12:00pm by William Morgan
CNCF sponsored this post.

The Cloud Native Computing Foundation sponsored this post in anticipation of KubeCon + CloudNativeCon North America 2020 – Virtual, Nov. 17-20.

Service Level Objectives (SLOs) are an increasingly common tool for software reliability. Popularized by Google, SLOs are usually characterized as a tool for service owners to balance the risks and rewards of making changes to a given application. Should we ship this new product feature, given that we just had an outage? How do we quantify that risk and have a conversation about it with all stakeholders?

Less well-known is that SLOs can also be a powerful tool for platform owners. For a Kubernetes operator, SLOs can characterize the health of the services running on their clusters in a form that can be interpreted without any knowledge of the underlying application or its operational history. This means platform owners can use SLOs to sort through a huge set of applications and rapidly determine if anything needs immediate attention, which becomes especially critical as the number of applications grows.

SLOs in a Nutshell 

William Morgan
William is the co-founder and CEO of Buoyant, the creator of the open source service mesh project Linkerd. Prior to Buoyant, he was an infrastructure engineer at Twitter, where he helped move Twitter from a failing monolithic Ruby on Rails app to a highly distributed, fault-tolerant microservice architecture. He was a software engineer at Powerset, Microsoft, and Adap.tv, a research scientist at MITRE, and holds an MS in computer science from Stanford University.

At its most basic level, an SLO is simply a metric, a goal for that metric, and a time period. For instance: “the success rate for service A must be at least 99.7% over the past 30 days.” The metric is known as the “service level indicator” (SLI) and the goal is the “objective.”

The output of an SLO is the error budget, which is a measure of how the metric is doing relative to the goal over that time period. For example, if your SLO is defined as 99% successful over a 30-day period, and the success rate over that period is 99.75%, your error budget is 75%: the objective allows a 1% failure rate, the service has used 0.25%, and so a quarter of the allowance is spent and three-quarters remains.

The error budget is a measure of how much leeway is remaining before the objective is violated. For a service owner, the error budget represents a way to quantify the amount of risk they can incur — an indicator of whether you should hold off on new deployments until things cool off, for example.
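The arithmetic behind the 99% / 99.75% example can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name and the choice to express rates as fractions between 0 and 1 are assumptions made for this sketch.

```python
def error_budget_remaining(objective: float, observed: float) -> float:
    """Return the unspent share of the error budget.

    Both arguments are success rates expressed as fractions (0.99 means 99%)
    and are assumed to cover the same time period.
    """
    if not 0.0 <= objective < 1.0:
        # An objective of 100% leaves no budget to divide by.
        raise ValueError("objective must be at least 0 and below 1")
    allowed_failure = 1.0 - objective  # failure rate the SLO tolerates
    actual_failure = 1.0 - observed    # failure rate actually seen
    return 1.0 - actual_failure / allowed_failure


# The example from the text: 99% objective, 99.75% observed success rate.
remaining = error_budget_remaining(objective=0.99, observed=0.9975)
print(f"{remaining:.0%} of the error budget remains")
```

A result below zero means the service has failed more often than the objective allows, which is the "below 0" case discussed next.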

But for a platform owner, the error budget acts as something else: a kind of context-free judgment of the health of the service. If the error budget for an SLO is 100% and steady, then we know things are going well for that service. If it’s close to 0 (or below 0!) and dropping, then we know things are going poorly. It doesn’t matter what the underlying metric is, what the application does, or how it performed last month — the error budget is a universal number.

This universality and context-free nature of error budget values is the key to the value an SLO provides in the context of the Kubernetes platform.

Figure: SLO compliance, SLIs, and error budget for a Kubernetes workload (Dive dashboard).

SLOs for Kubernetes Platform Owners

The Kubernetes platform owner may be responsible for hundreds or thousands of applications running across tens or hundreds of Kubernetes clusters. And they may understand none of them. (Arguably, this lack of understanding is the mark of a healthy platform!)

In this context, the utility of metrics starts to break down. If a given service currently has a 97% success rate, is that good or bad? If it drops to 95%, is that cause for concern? If its success rate is 100%, but the 99th percentile of latency is slowly rising to 1200ms, should anyone be paged? Without context about how this service is supposed to be behaving, there’s no way for the platform owner to know.

SLOs provide a way out of this situation. In contrast to metrics, the universality of error budgets actually does give platform owners a way to make value-based judgments about the health of those services. In other words, by wrapping these metrics in SLOs, the platform owner gains a universal way of assessing service health, observing trends, and identifying which services need immediate attention.
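As a rough illustration of that triage, the sketch below reduces SLOs built on different SLIs to a single number and lists the ones that need attention, worst first. The service names, objectives, observed values, and the 25% threshold are all hypothetical, chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass
class SloStatus:
    service: str      # hypothetical workload name
    sli: str          # description of the metric the SLO wraps
    objective: float  # target fraction of "good" events, e.g. 0.99
    observed: float   # measured fraction of "good" events over the period

    @property
    def budget_remaining(self) -> float:
        # Same quantity regardless of which SLI sits underneath.
        return 1.0 - (1.0 - self.observed) / (1.0 - self.objective)


def needing_attention(statuses, threshold=0.25):
    """Return SLOs whose remaining budget is below threshold, worst first."""
    at_risk = [s for s in statuses if s.budget_remaining < threshold]
    return sorted(at_risk, key=lambda s: s.budget_remaining)


statuses = [
    SloStatus("checkout", "success rate", objective=0.999, observed=0.9995),
    SloStatus("search", "requests under 1200ms", objective=0.99, observed=0.985),
    SloStatus("emails", "success rate", objective=0.95, observed=0.96),
]

for status in needing_attention(statuses):
    print(f"{status.service}: {status.budget_remaining:.0%} budget remaining")
```

Note that the hypothetical "emails" service has the lowest objective yet still shows up as at risk, while "checkout" does not: the comparison is on budget, not on the raw metric.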

The Challenges of Using SLOs

Despite their many benefits, implementing SLOs for a Kubernetes platform can be difficult. As a first challenge, consistent SLOs require consistent metrics: what are the success rates, latencies, etc., of your Kubernetes workloads at any point in time? Next, you must formulate the SLOs with appropriate SLIs, objectives, and time periods: what is the “right” parameterization of SLOs that you want to track? Finally, you must actually compute the error budgets. While the math is simple, selecting the correct metrics data points from the correct workloads during the correct time periods can be non-trivial, especially when services and workloads change over time.
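To make the data-selection problem concrete, here is a small sketch that computes a success-rate SLI from raw samples while filtering by workload and time period. The `Sample` record and its fields are hypothetical; a real metrics pipeline will store and label its data differently.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    workload: str        # hypothetical workload identifier
    timestamp: datetime  # when the counts were recorded
    successes: int       # successful requests in this sample
    total: int           # all requests in this sample


def windowed_success_rate(samples, workload, now,
                          window=timedelta(days=30)) -> Optional[float]:
    """Success rate for one workload, using only samples inside the window."""
    start = now - window
    in_scope = [s for s in samples
                if s.workload == workload and start <= s.timestamp <= now]
    total = sum(s.total for s in in_scope)
    if total == 0:
        return None  # no traffic in the period, so the SLI is undefined
    return sum(s.successes for s in in_scope) / total


now = datetime(2020, 11, 11)
samples = [
    Sample("web", now - timedelta(days=1), 990, 1000),
    Sample("web", now - timedelta(days=10), 995, 1000),
    Sample("web", now - timedelta(days=45), 0, 1000),   # outside the 30 days
    Sample("api", now - timedelta(days=1), 500, 1000),  # different workload
]
print(windowed_success_rate(samples, "web", now))
```

Summing counts before dividing, rather than averaging per-sample rates, keeps low-traffic samples from being over-weighted.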

For the metrics challenge, at least, there are some simple options. A service mesh like the open source CNCF project Linkerd can provide a consistent and uniform layer of metrics for all HTTP and gRPC services on your Kubernetes clusters, without requiring any configuration.

Formulating the SLOs on top of these metrics is the next step. Here, there is a spectrum of options, ranging from “get all stakeholders in a meeting and hammer it out from first principles” to “just use the current metric value as the objective and see what happens.” Tooling here can help immensely, especially with the latter approach, by providing suggestions based on historical data.
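One very simple way to read "use the current metric value as the objective" is sketched below: take the success rate observed over some historical period and round it down. This heuristic, the function name, and the cap below 100% are assumptions made for illustration; they do not describe how any particular tool generates its suggestions.

```python
import math


def suggest_objective(good: int, total: int, decimals: int = 3) -> float:
    """Suggest an objective (as a fraction) from historical request counts.

    The observed rate is rounded down to `decimals` places so the suggested
    objective is one the service has already been meeting. With decimals=3,
    an observed 0.9925 becomes 0.992, i.e. 99.2%.
    """
    if total <= 0 or not 0 <= good <= total:
        raise ValueError("need total > 0 and 0 <= good <= total")
    scale = 10 ** decimals
    suggestion = math.floor(good / total * scale) / scale
    # Never suggest 100%: that would leave no error budget at all.
    return min(suggestion, (scale - 1) / scale)


print(suggest_objective(good=1985, total=2000))
```

A suggestion like this is only a starting point for the conversation with stakeholders, not a replacement for it.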

Finally, computing the error budget. The Kubernetes ecosystem provides good options here in the form of open source tools like Prometheus and Grafana — with Linkerd metrics in place, for example, SLOs can be expressed as Prometheus queries and error budgets plotted as Grafana dashboards. Alternatively, hosted tools like Dive can make use of these same Linkerd metrics and allow you to set up and track SLOs with the click of a button, across arbitrary numbers of clusters and workloads.
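As a hedged sketch of what "SLOs expressed as Prometheus queries" might look like, the Python below builds PromQL strings for a success-rate SLI and its error budget. It assumes the Linkerd proxy counter `response_total` with `classification`, `direction`, `namespace`, and `deployment` labels; check these names against the metrics your installation actually exports. The function names and the `prod` / `web` values are hypothetical.

```python
def success_rate_query(namespace: str, deployment: str,
                       window: str = "30d") -> str:
    """Build a PromQL expression for the success rate over `window`."""
    # Label names are assumptions about the Linkerd metrics in Prometheus.
    selector = (f'namespace="{namespace}", deployment="{deployment}", '
                'direction="inbound"')
    good = ('sum(increase(response_total{classification="success", '
            f'{selector}}}[{window}]))')
    total = f'sum(increase(response_total{{{selector}}}[{window}]))'
    return f"{good} / {total}"


def error_budget_query(namespace: str, deployment: str, objective: float,
                       window: str = "30d") -> str:
    """Build a PromQL expression for the remaining error budget (1 = 100%)."""
    sli = success_rate_query(namespace, deployment, window)
    return f"1 - (1 - ({sli})) / (1 - {objective})"


print(error_budget_query("prod", "web", objective=0.99))
```

The resulting string can be pasted into a Grafana panel or the Prometheus expression browser. A 30-day range only returns meaningful results if Prometheus retains at least that much data.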

No matter which approach you take, adopting SLOs can play a vital role in helping platform owners understand the state of their applications in a way that’s both uniform and context-free, which means they can prioritize their efforts and ensure that both the applications — and the platform on which they run — remain reliable.

To learn more about Kubernetes and other cloud native technologies, consider coming to KubeCon + CloudNativeCon North America 2020, Nov. 17-20, virtually.

The Cloud Native Computing Foundation is a sponsor of The New Stack.

Feature image via Pixabay.
