
URL: https://thenewstack.io/why-chaos-engineering-isnt-just-for-operations/


Why Chaos Engineering Isn’t Just for Operations

Increasingly, developers are directly involved with testing applications across the CI/CD pipeline. Here's what that changing role means for DevOps teams.
Feb 9th, 2022 3:00am by B. Cameron Gain
Featured image by Hans-Peter Gauster on Unsplash. 
ChaosNative sponsored this post. Insight Partners is an investor in ChaosNative and TNS.

The days are largely gone when a developer creates code or an application, uploads it and then lets operations engineers take over for the rest.

With the massive adoption of highly distributed Kubernetes and microservices environments, the dynamics have shifted. Now, DevOps teams increasingly share tasks and participate in workflows previously relegated to operations or site reliability engineers (SREs).

As a result, developers now take on much work that is not strictly tied to application development. Among other things, they are often directly involved with testing and, increasingly, with chaos engineering alongside operations and other teams across a continuous integration/continuous delivery (CI/CD) pipeline.

“My assessment during the past two to three years is that the dynamism of cloud native is forcing other personas, such as developers, to integrate chaos engineering in their workflows” along with operations and QA teams and SREs, Uma Mukkara, ChaosNative’s CEO and maintainer of LitmusChaos, told The New Stack ahead of ChaosNative’s annual users’ conference, Chaos Carnival, in January.

Here, we explore why and how chaos engineering involves the entire production pipeline with developer support, and how it should properly be implemented and integrated into CI/CD.

What Is Chaos Engineering?

Chaos engineering can be described as finding and fixing weaknesses in distributed applications and their interactions with different components, such as microservices and APIs, when faults are purposely introduced as experiments.

By introducing “chaos at will” through experimentation, it is possible to help avoid failures and to be better prepared for their eventual outcomes, Mukkara said. An improved mean time to recovery (MTTR) following an outage is one example of the benefits chaos engineering offers.

In the first step of a chaos experiment, a fault is injected into an application, service, network or even hardware in order to induce a malfunction of some kind. “It’s an art of preventing losses at large,” Mukkara said.

The second and most important part of the three-part process is steady-state hypothesis validation to see if a service works the way it should once faults are induced.

For example, a service should continue to maintain a certain transaction completion rate even if its network connections are handling only 80% of the load. The experiment either confirms or refutes this so-called “steady-state hypothesis.”
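The first two steps, fault injection followed by steady-state validation, can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not how LitmusChaos or any other tool implements it: the degraded network is modeled as each attempt succeeding with probability 0.8, and the names `run_transactions` and `chaos_experiment`, the retry count and the 95% completion threshold are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    completion_rate: float
    hypothesis_holds: bool


def run_transactions(count: int, network_capacity: float,
                     retries: int, rng: random.Random) -> float:
    """Simulate transactions over a degraded network.

    Assumption: each attempt succeeds with probability `network_capacity`.
    """
    completed = 0
    for _ in range(count):
        # A transaction completes if any one of its attempts gets through.
        if any(rng.random() < network_capacity for _ in range(retries + 1)):
            completed += 1
    return completed / count


def chaos_experiment(network_capacity: float = 0.8, retries: int = 2,
                     min_completion_rate: float = 0.95,
                     seed: int = 42) -> ExperimentResult:
    """Inject the fault (reduced capacity), then check the steady state."""
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    rate = run_transactions(10_000, network_capacity, retries, rng)
    return ExperimentResult(rate, rate >= min_completion_rate)
```

In this sketch, a service that retries failed attempts keeps its completion rate above the threshold, while the same service with `retries=0` does not, which is the kind of weakness an experiment is meant to surface.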

CEO @Uma_Mukkara’s #keynote: “Chaos engineering is also about integrating the scale of experiments and observation of chaos execution, with your existing monitoring and observability systems.” https://t.co/v8KzrcYmIL #ChaosCarnival2022 #keynote @chaosnative pic.twitter.com/EIevkBosPP

— BC Gain (@bcamerongain) January 28, 2022

The third part of chaos engineering consists of observability. “This involves a lot of monitoring systems for business-critical services, and when you introduce chaos, you are able to see if there is sufficient recovery so that the service is maintained in a viable way,” Mukkara said.

During a Chaos Carnival conference talk, Henrik Rexed, senior staff engineer for Dynatrace, showed how observability plays a major role in chaos engineering. Using LitmusChaos and other tools, he showed how Prometheus and Dynatrace are useful for gathering the required metrics for Kubernetes clusters.

#Observability is key. @Dynatrace’s Henrik Rexed: Chaos engineering requires “the right level of observability” for metrics “we need to collect and understand.” https://t.co/v8KzrcYmIL #ChaosCarnival2022 #keynote @chaosnative @thenewstack pic.twitter.com/8haRKkJPgg

— BC Gain (@bcamerongain) January 27, 2022

“You need the right level of observability for chaos engineering,” Rexed said.
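To make the observability step concrete, here is a hedged sketch of how recovery time might be derived from health samples collected during an experiment. In practice the samples would come from a monitoring system such as Prometheus or Dynatrace; the hard-coded list format, the `Sample` alias and the `time_to_recovery` function are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical sample format: (timestamp in seconds, was the service healthy?)
Sample = Tuple[float, bool]


def time_to_recovery(samples: List[Sample], fault_time: float) -> Optional[float]:
    """Return seconds from fault injection to the first healthy sample
    that follows a degraded one.

    Returns 0.0 if the service never degraded and None if it never recovered.
    """
    degraded = False
    for timestamp, healthy in samples:
        if timestamp < fault_time:
            continue  # ignore samples taken before the fault was injected
        if not healthy:
            degraded = True
        elif degraded:
            return timestamp - fault_time
    return None if degraded else 0.0
```

Averaging this value over repeated experiments gives one rough estimate of MTTR, and a `None` result flags a service that did not recover within the observation window.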

The Developer’s Role in Chaos Engineering

A developer’s role in creating applications for Kubernetes and microservices environments can cover a number of tasks, involving both direct and indirect access to clusters. With a CI/CD workflow built on Jenkins CI, for example, the developer might regularly commit code changes for peer review.

Once reviewed and approved, the code is merged with Git and the developer updates the Kubernetes YAML file to reference the latest artifacts or images built with Jenkins.

The developer might undertake more testing, such as determining how the application will function in a highly interdependent environment with microservices. In case of failure, the application is handed back to the development team.

The application or new feature update might also fail in production, and Kubernetes lets you roll the application back to a previous version. In both cases, the developer will likely need to be able to troubleshoot and monitor logging and performance data for Kubernetes to fix the code.

The developer’s tests not only involve testing how the application performs in the stack, but how the entire stack with the new code interacts with other services in different environments, APIs and interfaces. This is where chaos engineering begins to become relevant.

Testing is typically limited to gauging the performance of a single component or service. Chaos engineering, on the other hand, extends beyond traditional testing. It involves the validation of a dependent component required to deliver a service, such as an app or a combination of microservices that run in a network, Mukkara said.

Chaos engineering also involves observing what happens when a fault is introduced in the network to see if the app or microservices continue to run as they should.

An example might involve seeing what happens when a failure is induced in a Stripe payment system API in a mock environment. The chaos experiment will gauge whether the service properly switches to an alternative API if the Stripe system fails, and whether it continues to function 99.99% of the time while maintaining the necessary transaction-processing speed.
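The payment failover scenario above could be prototyped in a mock environment along these lines. Everything here is an illustrative assumption: `FakeProvider` is a stand-in and not the real Stripe client, and the function names and the 500-cent charge are hypothetical. The sketch checks only availability against the 99.99% target; a fuller experiment would also time each call to verify processing speed.

```python
class PaymentError(Exception):
    """Raised when a payment provider cannot process a charge."""


class FakeProvider:
    """Hypothetical stand-in for a payment API (not the real Stripe client)."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    def charge(self, amount_cents: int) -> str:
        if not self.healthy:
            raise PaymentError(f"{self.name} unavailable")
        return f"{self.name}:ok:{amount_cents}"


def charge_with_failover(primary: FakeProvider, fallback: FakeProvider,
                         amount_cents: int) -> str:
    """Try the primary provider and switch to the fallback if it fails."""
    try:
        return primary.charge(amount_cents)
    except PaymentError:
        return fallback.charge(amount_cents)


def run_failover_experiment(total: int = 1000, target: float = 0.9999):
    """Inject a primary-provider outage midway and measure the success rate."""
    primary = FakeProvider("primary")
    fallback = FakeProvider("fallback")
    succeeded = 0
    for i in range(total):
        if i == total // 2:
            primary.healthy = False  # fault injection: primary goes down
        try:
            charge_with_failover(primary, fallback, 500)
            succeeded += 1
        except PaymentError:
            pass  # both providers failed; counts against the success rate
    rate = succeeded / total
    return rate, rate >= target
```

Because the fallback stays healthy in this sketch, every charge succeeds after the injected outage; marking the fallback unhealthy as well would show the success rate dropping below the target.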

“Chaos engineering helps to ensure that no matter what, there is no financial liability due to a network, microservices, API or another failure in the environment that could potentially interrupt a service,” Mukkara said.

ChaosNative Inc. provides products and services for the reliability of cloud native DevOps built on top of the popular open source Chaos engineering project LitmusChaos. ChaosNative offers the hosted Litmus service at cloud.chaosnative.com. ChaosNative and TNS are under common control.

The developer can integrate chaos engineering from the beginning of the development process, once executable code is added to a container image and integrated in different environments. This might be done, Mukkara suggested, by testing the code’s performance against failures in Google Cloud Platform, Microsoft Azure or another network.

“Cloud native developers need to start thinking about chaos engineering at the beginning of the production cycle,” he said. “You write your code and test it before doing chaos testing to find if there are any weaknesses” once deployed in the different environments.
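One way a team might wire this into a CI/CD pipeline is a small gate that fails the build when any chaos experiment refutes its hypothesis. This is a sketch under assumptions: the `chaos_gate` name and the result format (experiment name mapped to pass or fail) are hypothetical and not taken from Jenkins or LitmusChaos.

```python
from typing import Dict


def chaos_gate(results: Dict[str, bool]) -> int:
    """Return a process exit code: 0 if every experiment passed, else 1."""
    failed = sorted(name for name, passed in results.items() if not passed)
    for name in failed:
        # Surface each failing experiment in the pipeline log.
        print(f"chaos experiment failed: {name}")
    return 1 if failed else 0
```

A pipeline step could pass the return value to `sys.exit()`, so that a failed experiment blocks the merge or deployment in the same way a failed unit test would.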

Chaos for All Stakeholders

While developers often have a lot to gain from chaos engineering, it should remain a team effort. Many DevOps teams might opt for developers and QA teams to conduct chaos experiments jointly, or an SRE might take the lead in the process.

Once services are deployed, operations teams will often see performance issues and solve them with chaos engineering. “It is actually difficult to define which teams must absolutely use chaos engineering,” Mukkara said. “But everybody can benefit.”

While implementing chaos engineering can seem intimidating from the outset, developers, as well as QA and operations teams, need to embrace the practice. Reduced to its essentials, chaos engineering ensures reliability.

Metrics and data always reveal something. Container Solutions’ Charlotte Mach says: “Even if an experiment fails, or doesn’t meet your expectations, you still learn something, right?” https://t.co/v8KzrcYmIL #ChaosCarnival2022 #keynote @thenewstack #ChaosEngineering pic.twitter.com/2J8Xv7loLz

— BC Gain (@bcamerongain) January 28, 2022

“Chaos engineering can help DevOps teams have a feeling of safety when trying something new,” said Charlotte Mach, an engineering manager at the cloud native consulting company Container Solutions, during The New Stack’s pancake breakfast panel at Chaos Carnival.

“What happens when something breaks or we break something and do we actually get the outcome that we want to have or did something else happen?”

Chaos engineering, Mach said, “is kind of like a safety net you can give people in the beginning” of the production cycle or anywhere else during CI/CD.

BC Gain is founder and principal analyst for ReveCom Media. His obsession with computers began when he hacked a Space Invaders console to play all day for 25 cents at the local video arcade in the early 1980s. He then...
ChaosNative and Dynatrace are sponsors of The New Stack.
TNS owner Insight Partners is an investor in: Pragma, ChaosNative.