URL: https://thenewstack.io/how-to-build-past-basic-automated-incident-response/

How to Build Past Basic Automated Incident Response

Rather than flagging incidents to whoever’s on call, AIR solutions should link to the right people at the right time.
Apr 4th, 2022 11:09am by Michael Cucchi
Featured image via Pixabay.
PagerDuty sponsored this post.

Over the past 18 months, two important trends have emerged in digital operations. First, as organizations doubled down on digital during the pandemic, the volume of incidents teams have to resolve has exploded sevenfold.

Michael Cucchi
Michael is the vice president of product at PagerDuty. He has over 20 years of engineering, product management and marketing experience in the high-tech and software industries. Michael creates and drives PagerDuty's overall product and ecosystem positioning, product strategy, community advocacy and competitive intelligence.

Second, customer expectations have kept rising, driven by demand for near-instant response and a new dependency on digital ways of doing business.

This double whammy has made manual incident response processes unmanageable for digital operations teams. A new lens on the technologies used to meet this challenge has been dubbed “automated incident response” (AIR) by industry analysts.

Unfortunately, when it comes to AIR, current thinking is too narrow.

Automating the human processes that define incident response is one thing. But to drive much greater value, solutions need to connect these processes with machine learning and automation in real time to deliver operational maturity, continuous improvement and superb customer experiences.

The Push for Automation

Cloud migration and the adoption of containers and microservices deliver the agility, scale and speed that development teams crave to drive business strategies. But they also mean more change and more complex service dependencies, which in turn cause an exponential increase in the volume of alerts and incidents and in the difficulty of resolving them.

Legacy processes are a block on this kind of innovation. In fact, most organizations (91%) agree that traditional IT operations functions were not built for the digital era.

Gartner sets out the resulting incident response challenges well in its July 2021 “Hype Cycle for Monitoring, Observability & Cloud Operations” report. First, responders often spend too long trying to identify and contact subject matter experts (SMEs) because operations teams use different methods of managing the on-call roster. Beyond this, contact information is often inaccurate. The distributed nature of teams, complex on-call schedules and different notification preferences make rapid triaging even more challenging. Often, a single source of incident data is also lacking.

That’s a recipe for long incident response times, poor outcomes and angry customers. In fact, in the 451 Research report “Practitioners Weigh In: Tips for Modernizing Incident Response,” 75% of organizations agree they spend too much time on IT operations and maintenance. Process and task automation is increasingly viewed as the answer to many of these problems. That same study finds that a third of organizations believe their IT is “mostly automated” and another fifth want to achieve full automation in the near future.

What the Analysts Say

So what should automation in incident response look like? Gartner, in its aforementioned Hype Cycle report, rightly points to manual processes and poor collaboration between teams as the main roadblocks to improvement. AIR addresses this by automating most of these incident response steps. In Gartner's words:

“AIR solutions automate incident response processes by enabling centralized alert or incident routing. Using a policy or rule-based engine, on-call scheduler, or streamlined collaboration, this can improve operational efficiencies with action-oriented insights.”
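
To make the quoted definition concrete, here is a minimal sketch of centralized alert routing that combines a rule-based engine with an on-call schedule. Everything in it is hypothetical and for illustration only: the `Alert`, `Shift` and `Rule` types, the `ROSTER` and `RULES` data, and the `route` function are assumptions, not part of any vendor's product or API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Optional


@dataclass
class Alert:
    service: str
    severity: str  # e.g. "critical" or "warning"
    summary: str


@dataclass
class Shift:
    responder: str
    start_hour: int  # inclusive, 0-23
    end_hour: int    # exclusive, 1-24


@dataclass
class Rule:
    matches: Callable[[Alert], bool]
    team: str


# Hypothetical on-call roster: one list of shifts per team.
ROSTER: Dict[str, List[Shift]] = {
    "payments": [Shift("alice", 0, 12), Shift("bob", 12, 24)],
    "platform": [Shift("carol", 0, 24)],
}

# Hypothetical routing policy. Rules are evaluated in order; first match wins.
RULES: List[Rule] = [
    Rule(lambda a: a.service.startswith("checkout"), "payments"),
    Rule(lambda a: a.severity == "critical", "platform"),
]


def on_call(team: str, now: datetime) -> Optional[str]:
    """Return whoever is on call for the team at the given time, if anyone."""
    for shift in ROSTER.get(team, []):
        if shift.start_hour <= now.hour < shift.end_hour:
            return shift.responder
    return None


def route(alert: Alert, now: datetime) -> Optional[str]:
    """Pick the responder for an alert, or None if no rule matches."""
    for rule in RULES:
        if rule.matches(alert):
            return on_call(rule.team, now)
    return None
```

Under these assumptions, a `checkout-api` alert at 09:00 goes to `alice`, the same alert at 15:00 goes to `bob`, and an unmatched low-severity alert is routed to nobody. Keeping rules and roster in one place is what makes the routing "centralized" in the sense of the quote.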

This is certainly an important element of AIR, but it must go further than just managing the human-to-human process. Machine learning-powered capabilities exist today that offer much more by first reducing noise and false alarms, and then automatically notifying not just responders on call, but also the specific SME who is best placed to remediate. Combined with task and process automation, escalations can be completely avoided and hours cut out of resolution. Today, event orchestration can even make decisions automatically in real time to accelerate or automate the whole remediation process without needing a human.
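
As a rough illustration of the noise-reduction step, the sketch below collapses repeated events into a single incident. It deliberately swaps the machine learning described above for a much simpler technique, time-window grouping on a (service, check) key, so it should be read as a stand-in for the idea rather than a description of how any real product works. The `Event` and `Incident` types, the `group_events` function and the 300-second default window are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Event:
    service: str
    check: str
    timestamp: float  # seconds since some reference point


@dataclass
class Incident:
    service: str
    check: str
    first_seen: float
    last_seen: float
    count: int = 1


def group_events(events: List[Event], window: float = 300.0) -> List[Incident]:
    """Fold events with the same (service, check) key into one incident
    as long as each arrives within `window` seconds of the previous one."""
    latest: Dict[Tuple[str, str], Incident] = {}
    incidents: List[Incident] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.service, ev.check)
        current = latest.get(key)
        if current is not None and ev.timestamp - current.last_seen <= window:
            # Same problem, still ongoing: count it, but raise no new incident.
            current.last_seen = ev.timestamp
            current.count += 1
        else:
            # First sighting, or the previous burst has gone quiet.
            current = Incident(ev.service, ev.check, ev.timestamp, ev.timestamp)
            latest[key] = current
            incidents.append(current)
    return incidents
```

With this grouping, a burst of identical CPU alerts from one service produces one incident with a count, so responders are notified once instead of once per event.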

Digital Operations under Pressure

Why is real time important? It comes back to those two overarching trends: pressure on digital services and heightened customer expectations. Research shows that time spent, and wasted, on inefficient incident response can have a serious impact. In 2021, 40% of organizations said they had lost revenue because of incidents, 25% had lost customers to rivals and, on average, they spent $3.4 million in staff time firefighting.

It’s not just the immediate impact of slow incident response that can put customers off. If teams are tied up resolving incidents, they have less time for innovation that could differentiate the brand in an increasingly competitive environment. Some 89% of millennials expect brands to use technology to shape their customer experiences, no matter what kind of business it is. And 60% of American consumers believe online experiences will become more important than in-person ones. One could argue they already have.

The only way to innovate at pace is to solve incidents rapidly and efficiently or avoid them altogether. That means optimizing automation with real-time operations that solve problems automatically, and if needed, mobilize response teams in seconds, drive collaboration and give deep context on digital incidents. Two-thirds of IT and development decision-makers agree that only with real-time digital operations can they reduce the cost of ITOps and accelerate innovation.

Right Expert, Right Time

However, it’s not just about speed. It’s also about joining up human processes with machine automation and adding the intelligence to proactively drive optimal outcomes. As 451 Research explains, when you do need a human, “automatically identifying the correct responders, attaching the appropriate automation … and sending status updates can streamline the incident response process and drive major time savings.”

Rather than automatically flagging incidents to whoever’s on call, as Gartner suggests, AIR solutions should link their incident monitoring service with the right people at the right time. Where necessary, they will automatically and proactively connect relevant stakeholders via an operations hub or cloud. By the time a human is interrupted, automated workflows will already have initiated diagnostic and remediation steps at the first-responder level, so SMEs may not need to get involved at all.
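
The "automation first, humans second" flow described here can be sketched as follows. This is a minimal illustration under stated assumptions, not a real orchestration engine: the `Outcome` type, the `handle_incident` function and the two example runbook steps are hypothetical, and each step is assumed to return `True` when the incident is resolved after it runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical runbook step: returns True if the incident is resolved afterwards.
Step = Callable[[], bool]


@dataclass
class Outcome:
    resolved: bool
    paged: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


def handle_incident(service: str,
                    runbooks: Dict[str, List[Step]],
                    first_responder: str) -> Outcome:
    """Run automated steps first; interrupt a human only if they all fail."""
    outcome = Outcome(resolved=False)
    for step in runbooks.get(service, []):
        outcome.log.append(f"ran {step.__name__}")
        if step():
            outcome.resolved = True
            return outcome  # fixed by automation, nobody is paged
    # Automation did not resolve it: page the first responder, who receives
    # the log of what was already tried as context.
    outcome.paged.append(first_responder)
    return outcome


# Example steps (assumptions for illustration only).
def restart_worker() -> bool:
    return False  # pretend the restart did not help


def clear_cache() -> bool:
    return True  # pretend clearing the cache resolved the incident
```

Calling `handle_incident("api", {"api": [restart_worker, clear_cache]}, "oncall")` resolves the incident on the second step without paging anyone. A service with no runbook falls straight through to the first responder, which is the behavior a team would want to shrink over time by adding automation.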

This isn’t just about delivering an exceptional customer experience and minimizing operational overheads, as important as these outcomes are. It’s about freeing up the time of in-house experts to work on innovation projects crucial to future growth. In so doing, organizations will create a working environment in which the brightest and best want to stay and do exceptional work for them. In a new era of intense competition for coding expertise, that in itself will be a major win.
