
URL: https://thenewstack.io/ai-powered-automation-is-critical-to-it-resilience-and-adaptability/

AI-Powered Automation Is Critical to IT Resilience and Adaptability - The New Stack



Organizations able to harness AI, ML and automation will unleash the talent on their incident response teams while improving IT resilience and adaptiveness.
Mar 28th, 2022 7:45am by Heath Newburn
Featured image: photo by Michael Dziedzic on Unsplash.
PagerDuty sponsored this post.
Heath Newburn
Heath is the senior solutions specialist for AIOps at PagerDuty. He has a long background in monitoring, event management and operations in many organizations and is focused on enabling the personal success of individuals and teams across IT. Heath lives in Georgetown, Tex., and is passionate about cooking and finding great Texas barbecue.

The modern world runs on code, and with every company now a software company, it’s become more important than ever to move quickly when things go wrong. That’s why incident response has become such a critical endeavor for organizations.

Unfortunately, traditional manual approaches are riddled with inefficiency. This leads to excessive mean time to repair (MTTR), which damages not only customer loyalty and the bottom line, but also employee morale.

Fortunately, leaning on automation and machine learning (ML) capabilities can help organizations plot a better path. Teams are looking to reduce repetitive work and human error, optimize responder productivity and drive all-around better outcomes as they adopt automated incident response.

In order to take advantage of this trend and build a culture of resilience, teams must look for opportunities to improve and upgrade manual operational processes with technology that can remove toil, save human cycles and give them an edge.

How Manual Processes Affect Resilience

Many organizations have accelerated their digital transformation plans, in some cases by several years. But we’ve learned that running fast can break things, and it’s not uncommon for greater velocity to also introduce more exposure to operational risk.

The infrastructure supporting new digital services could contain hundreds of millions of lines of code and billions of dependencies, so digital incidents are inevitable. Research shows that there was a 19% rise in critical incidents from 2019 to 2020.

To keep up with the pace of innovation required to maintain high availability and deliver on customer experience, organizations need to invest in best practices and develop robust processes that streamline incident response, so they can proactively address and resolve issues when they arise.

Infrastructure and operations won’t magically attain the adaptive resilience Gartner talks about with current manual and reactive incident response.

Looking for Opportunities to Harness Automation in Incident Response

In many organizations, the tools, scripts and manual commands that responders use to get to the bottom of incidents exist in the heads of just a few subject matter experts (SMEs). They may also require manual intervention. This does not make for rapid or effective incident response. All too often, organizations waste precious resources by swarming the problem with dozens of responders, which won’t fix the underlying issue.

Manual processes can also lead to copy-and-paste errors, unnecessary repetition of steps, limited collaboration between technical and customer support teams, and use of incorrect documentation. The result is slower MTTR, angry customers and frustrated employees.

Instead, organizations should automate as much of their incident response as possible. Doing so drives resilience and enhances their ability to learn from events and to improve proactively on a continuous basis.

Machine learning-powered runbook automation is a great example. At a very basic level, incident response is all about completing repetitive tasks, such as restarting servers, copying artifacts, running scripts and manipulating files. Once these processes are intelligently captured and documented in runbooks, they can be executed automatically by responders other than SMEs.
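To make the idea of a captured runbook concrete, here is a minimal sketch in Python. It is an illustration only, not PagerDuty's API or any vendor's implementation: the `Step` and `Runbook` classes and the three placeholder actions are hypothetical names invented for this example. It shows how an SME's procedure might be written down once as an ordered list of steps that any responder can then run and get a log back from.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One documented action in a runbook (hypothetical structure)."""
    name: str
    action: Callable[[], str]  # returns a short result message


@dataclass
class Runbook:
    """An ordered list of steps that any responder can execute."""
    title: str
    steps: List[Step] = field(default_factory=list)

    def run(self) -> List[Tuple[str, str]]:
        """Run steps in order; stop at the first failure and return the log."""
        log = []
        for step in self.steps:
            try:
                log.append((step.name, step.action()))
            except Exception as exc:
                # Record the failure so it can be handed to an SME, then stop.
                log.append((step.name, f"FAILED: {exc}"))
                break
        return log


# Placeholder actions standing in for real scripts (assumptions, not real tooling).
def check_disk() -> str:
    return "disk usage checked"


def restart_service() -> str:
    raise RuntimeError("service manager unreachable")


def collect_logs() -> str:
    return "logs collected"


runbook = Runbook(
    "web-tier triage",
    [
        Step("check disk", check_disk),
        Step("restart service", restart_service),
        Step("collect logs", collect_logs),
    ],
)
log = runbook.run()
```

In this sketch the second step fails, so the run stops there and the log records what was attempted. That log is the information a first responder would pass along when escalating.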

Democratizing incident response in this way could have a major impact on MTTR. First responders spend an average of 15 minutes triaging an alert when it first comes in before escalating to an SME who spends another 15 minutes running diagnostics. But by running automated workflows from the outset, first responders could collect that information straight away and potentially fix recurring problems using automated remediation. If not, they can escalate to the SME with the information they need to start working on fixing the issue immediately.

In the most mature organizations, automation and artificial intelligence (AI) can even be used to remediate commonly occurring incidents before responders are even paged. In this scenario, escalations to SMEs and developers only occur for unusual and complex problems.

PagerDuty is the global leader in AI-first operations management serving more than 35,000 organizations worldwide. The PagerDuty Operations Cloud is a comprehensive, multi-product operations cloud platform that sits at the center of the enterprise technology stack.

Step by Step

This is not an overnight journey. Yes, the right tools will go a long way to achieving these goals, but organizations might also have to overcome cultural barriers, which can take longer. The key is to start small with achievable goals, learning as you go. Organizations need to walk before they can run.

That could mean starting with simple, low-risk automated diagnostics that have no impact on service performance or availability, and which require little processing. With automation that runs commands, gathers log information and tackles other common troubleshooting steps, teams can reduce MTTR and potentially avoid mobilizing some responders if nothing out of the ordinary is discovered.
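A low-risk, read-only diagnostic of this kind might look like the following Python sketch. It uses only the standard library and changes nothing on the host. The function names, the choice of disk usage as the single signal and the 0.9 threshold are all assumptions made for illustration; a real setup would gather whatever its own troubleshooting steps call for.

```python
import shutil
import socket
import time
from typing import Any, Dict


def collect_diagnostics(path: str = ".") -> Dict[str, Any]:
    """Gather read-only facts about the host; nothing is modified."""
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total if usage.total else 0.0
    return {
        "host": socket.gethostname(),
        "collected_at": time.time(),
        "disk_used_fraction": used_fraction,
    }


def needs_responder(diag: Dict[str, Any], disk_threshold: float = 0.9) -> bool:
    """Return True only if something looks out of the ordinary.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return diag["disk_used_fraction"] >= disk_threshold


diag = collect_diagnostics()
page_someone = needs_responder(diag)
```

If `needs_responder` returns False, the alert can be annotated with the collected data and no one needs to be mobilized. If it returns True, the responder starts with the diagnostics already in hand.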

From there, organizations could move to reflex actions for the most common problems (for example, removing temp files to clear up disk space). Once those simpler problem signatures are codified, they can move to automating multistep sequences for remediating common problems. Complex actions with a potentially major impact on performance or availability should be automated only after successfully working through those earlier stages.
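The temp-file example can be sketched as a small reflex action in Python. This is a hedged illustration rather than a production tool: the function name, the age cutoff and the dry-run default are assumptions for this example. Defaulting to a dry run reflects the staged approach described above, where an action is observed before it is allowed to change anything.

```python
import time
from pathlib import Path
from typing import List


def remove_stale_temp_files(
    temp_dir: str, max_age_seconds: float, dry_run: bool = True
) -> List[str]:
    """Delete regular files in temp_dir older than max_age_seconds.

    With dry_run=True (the default) nothing is deleted; the function only
    reports which files would be removed. Subdirectories and symlinks are
    left alone.
    """
    cutoff = time.time() - max_age_seconds
    removed = []
    for entry in Path(temp_dir).iterdir():
        if entry.is_symlink() or not entry.is_file():
            continue
        if entry.stat().st_mtime < cutoff:
            if not dry_run:
                entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

A team might run this with `dry_run=True` for a while, review what it reports, and only then wire it to a disk-space alert with `dry_run=False`.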

The bottom line is that machines are faster than humans at some tasks, and they don’t mind work that is boring and repetitive. Organizations able to use this to their advantage through AI, ML and automation will unleash the talent on their incident response teams while improving IT resilience and adaptiveness. That’s the way not only to happier customers and a burnished brand reputation, but also to more motivated staff with more time to spend on innovation. And in the post-pandemic digital world, innovation will be key to survival.

Heath Newburn is a Distinguished Field Engineer, and works as a field CTO on strategic evolution for PagerDuty customers. He has a long background in AIOps, monitoring, software development and operations in many organizations and is focused on enabling the...
TNS owner Insight Partners is an investor in: Pragma.