URL: https://thenewstack.io/running-more-low-severity-incidents-is-improving-our-culture/

Running More Low-Severity Incidents Is Improving Our Culture - The New Stack

One simple step we took to improve incident management was to expand the definition of an incident and provide a safe and predictable place to investigate.
Aug 26th, 2022 7:59am by Dan Condomitti
Feature image via Pixabay
FireHydrant sponsored this post.

I’ve seen some great discussions recently about moving away from a culture where incidents are a four-letter word. Some of the most prevalent — and best — advice on the subject encourages teams to declare more incidents and democratize who can declare incidents.

Dan Condomitti
Dan is a co-founder and the head of engineering at FireHydrant, an incident management platform that lets you integrate tools, streamline processes and quickly resolve incidents all without leaving Slack. In this role, Dan draws on his experience at companies like Red Hat and CoreOS to guide the team in building the incident management technology used by companies like CircleCI, Spotify, Snyk and thousands of others.

While that advice seems fairly straightforward, I’ve found that there are often cultural barriers that make it hard for teams to put it into practice.

Telling people to declare incidents doesn’t erase the fear that often comes with it. In my experience, a lot of small things, done well and consistently over time, are what ultimately amount to a positive incident management culture.

Our own internal engineering team at FireHydrant has been building steps into our incident management program intended to increase psychological safety around incident declaration and management.

One small and rather simple step we took recently was to expand the scope of what is considered an incident and provide a safe and predictable place to investigate. It’s had an outsized, positive impact on the team, so I wanted to share the details in hopes they might help others.

FireHydrant gives you incident management for every developer. Integrate your tools, streamline your processes, and quickly resolve incidents — all without leaving Slack.

If It Seems Weird, Call It an Incident

We tend to think of incidents as those really, really bad — and embarrassingly public — moments where customers are frustrated, the organization is losing money and everything is, well, on fire. But in reality, there’s a spectrum of activities that we can classify as incidents. And the more we normalize lower-impact incidents, the more confidence and experience we build for Sev1 situations.

Our first step toward solving some of that fear around incidents was to ask the team to start thinking differently about how they define incidents. At its core, this was a cultural shift more than a process change. And as is the case with many cultural changes, it has been a gradual shift that folks on the team move toward as they continue to see the behavior modeled.

There wasn’t a grand plan to kick this off; it just kind of happened. Someone on the team declared an incident for a sharp increase in transient failures in our test suite. This doesn’t fit the classic definition of an incident for most teams, but it was a great way for us to capture context that had spread across multiple communication channels, understand the issue, implement a fix and prioritize work in the future to improve the dependability of our test suite. Over the course of the incident, we realized how much of this work is happening already — teams are just avoiding the label.

In the end, it came down to a memory leak in a specific version of Node.js that was causing test timeouts. Five people were involved over a few days, but despite the “incident” label, it didn’t derail their daily work. If anything, it provided the kind of structure and space that lowers cognitive load rather than raising it. We used FireHydrant to provide that structure and to pull context about our system (recent changes, dashboards, etc.) into a shared space.

It felt good to give a name to the work that so often happens as a hard-to-prioritize distraction, so we started talking about it. A lot. We’re a small-enough company that other teams wanted to try out the if-it-seems-weird approach to incident declaration. When our marketing team was getting error notices while trying to deploy to the site, someone declared an incident. Ten minutes of poking around in Netlify, Gatsby, GitHub and Contentful turned up a permissions issue that was easy to fix and unblocked a full day’s worth of work.

Right-Size the Response

We wanted to reinforce the behavioral change with technical foundations. How could we give people a safe way to investigate whether something actually was an incident, without worrying about the implications that usually come with declaring one, like alerting and distracting coworkers?

We created a new severity type of “triage” with the simplest possible runbook condition: Create a Slack channel. This ensures that if something simply feels off, the engineer who spots the problem has a place to write down stream-of-consciousness or play-by-play notes and see what happens next. We can add charts we looked at, alerts we saw, a running history of everything we thought contributed to the problem (even red herrings), and then reference it later. If it becomes clear that there’s a larger issue at play, it’s easy to escalate and get the right people up to speed because the information is already documented in the channel.
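To make the “triage” idea concrete, here is a minimal sketch in Python of a severity ladder in which each level maps to runbook steps and triage does nothing but open a channel. Everything in it is hypothetical: the `Severity` levels, the step names in `RUNBOOKS` and the `Incident` class are illustrative assumptions, not FireHydrant’s API or our actual configuration. What it shows is the property described above: notes captured during triage stay attached when the incident is escalated, and escalation only triggers the steps that haven’t already run.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity scale; a lower value means more severe."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    TRIAGE = 4  # "something feels off": investigate before deciding


# Hypothetical runbooks keyed by severity. TRIAGE does the bare minimum.
RUNBOOKS: dict[Severity, list[str]] = {
    Severity.TRIAGE: ["create_slack_channel"],
    Severity.SEV3: ["create_slack_channel", "notify_owning_team"],
    Severity.SEV2: ["create_slack_channel", "notify_owning_team", "page_on_call"],
    Severity.SEV1: ["create_slack_channel", "notify_owning_team", "page_on_call",
                    "post_status_page"],
}


@dataclass
class Incident:
    title: str
    severity: Severity = Severity.TRIAGE
    notes: list[str] = field(default_factory=list)
    steps_run: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Declaring the incident runs the runbook for its starting severity.
        self._run(RUNBOOKS[self.severity])

    def _run(self, steps: list[str]) -> None:
        # Skip steps that already ran (e.g. the channel already exists).
        for step in steps:
            if step not in self.steps_run:
                self.steps_run.append(step)

    def note(self, text: str) -> None:
        """Record play-by-play context in the channel, red herrings included."""
        self.notes.append(text)

    def escalate(self, new_severity: Severity) -> list[str]:
        """Raise severity; return only the steps this escalation newly ran."""
        if new_severity >= self.severity:
            raise ValueError("escalation must move to a more severe level")
        before = set(self.steps_run)
        self.severity = new_severity
        self._run(RUNBOOKS[new_severity])
        return [s for s in self.steps_run if s not in before]


if __name__ == "__main__":
    inc = Incident("Spike in transient test failures")
    inc.note("Timeouts concentrated in one CI job")
    print(inc.steps_run)  # ['create_slack_channel']
    print(inc.escalate(Severity.SEV2))  # ['notify_owning_team', 'page_on_call']
    print(inc.notes)  # ['Timeouts concentrated in one CI job']
```

The design choice worth noting is that escalation is additive: the people who get paged land in a channel that already holds the history, rather than starting from a blank slate.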

If the severity doesn’t evolve, the channel still provides valuable insight into the health of our systems. At a previous company, my CTO approached every problem with a notebook. He’d write down all his notes on an incident, then look back later with more context. Those notes might tell him that a major problem started six weeks earlier with a minor incident that then took a back seat to work deemed higher priority at the time.

Where Do We Go from Here?

As more “triage” severity incidents are declared and resolved, it’s becoming clear that our team’s shared definition of an incident is changing. And with that redefinition, we’re seeing evidence that incidents are becoming less scary for everyone involved.

This redefinition has me thinking a lot about where we go next. It’s become more obvious to me lately that incidents are in the eye of the beholder. An engineer’s definition of an incident might be very different from that of someone on the customer support, sales or marketing team who is blocked from doing part of their job, or can only do it with a lot of friction.

As we continue to evolve our incident definition, it makes sense for us to develop a deeper understanding of how our customers, internal and external, use our services. The truth is, every incident matters to someone. More on that to come as we continue to build and refine our internal incident management program at FireHydrant. And if you liked (or hated) what you read here, I’d love to hear from you. I’m dan@firehydrant.com.
