
URL: https://thenewstack.io/tale-of-2-responders-how-automation-can-save-time-and-toil/

Tale of 2 Responders: How Automation Can Save Time and Toil - The New Stack



Tale of 2 Responders: How Automation Can Save Time and Toil

Platforms with intelligent, automated and centralized event orchestration and noise suppression can greatly reduce time and effort in incident response.
Feb 11th, 2022 6:55am by Sean Scott
PagerDuty sponsored this post.
Sean Scott
As chief product officer of PagerDuty, Sean is responsible for its multiproduct digital operations management platform. He has more than 20 years of experience in the technology industry, with the majority of that time at Amazon. Sean holds a bachelor’s degree in computer science and an M.B.A. from the Red McCombs School of Business, both from the University of Texas at Austin.

Running an effective digital operations team has never been more critical to long-term business success. Some 96% of customers say they’ll leave a brand after a bad experience.

Yet it’s also becoming harder than ever to stay on top of spiraling incidents and provide the service that customers expect.

It’s not just brand reputation and the bottom line that are at risk. When incident response is characterized by too many manual processes, interruptions and escalations, team morale suffers and burnout rates can surge.

This is where AI-powered automation can provide tremendous value to digital operations teams and first responders, intelligently reducing noise and driving more efficient event routing.

The Pressure Is On

Our research shows that nearly three-quarters (72%) of large enterprises are doing more with digital today. But a similar number (78%) are facing extra pressure due to mounting incidents. Nine out of 10 senior IT and development leaders who responded to our survey admit that current ITOps approaches just aren’t cutting it anymore. Teams are spending nearly half of their time each week dealing with incidents rather than innovating for future growth, amounting to a financial hit of over $3 million per company per year. A quarter have lost customers to rival services as a result, and many admit losing money because of incidents.

Mean time to resolve (MTTR) and mean time between incidents (MTBI) have never been more important. Yet siloed incident response systems and an overreliance on manual processes are making life much harder than it should be for digital ops teams. One 2020 study revealed that for 50% of development teams, workloads have increased as a result of disparate event data coming from multiple monitoring tools. Alert noise is another constant in many organizations, distracting and disrupting responders who could be spending their time more productively.

The result in many cases is an increased likelihood of employee burnout. Research indicates that the average team spends 17 hours per week dealing with incidents alone. Over a 52-week year, that comes to roughly 880 hours, or the equivalent of about 22 40-hour work weeks.

In the Eye of the Storm

Incident responders are at ground zero when events come in. Let’s imagine two responders logging on at 7 a.m. to start their day. They are about to find out that a core dependency failure has just started affecting the entire business, triggering incidents and alerts across multiple tools and siloed systems. Confusion is rife among their colleagues, and the clock is ticking. With a global customer base, the company knows that every second lost could have a significant financial and reputational impact.

In short, it’s time to get moving to find out what’s going on and fix it. Here’s how that journey might pan out with and without the right tool sets.

Filtering Out the Noise

Production systems generate a huge number of events. Not all of these are flagged as alerts that indicate that something has gone wrong. But when a major incident such as a core dependency failure hits, there could be hundreds or thousands of alerts triggered by various monitoring and/or event processing systems. Without noise-suppression capabilities, Responder A is bombarded with signals, many of which may be irrelevant or duplicate alerts for the same event.

Now consider Responder B, who works at an identical company and has also logged on at 7 a.m. to find a major dependency failure has occurred. The difference is that they have a range of tools in place to filter out anything irrelevant, nonessential or duplicated. This could include a function to automatically add incoming alerts to relevant open services and group them according to a specific time window. Responder B’s organization may have gone further still with machine learning-powered algorithms capable of looking at patterns in alerts and grouping them accordingly.
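To make the time-window idea concrete, here is a minimal Python sketch of how alerts might be grouped per service and deduplicated. It is an illustration only, not PagerDuty’s implementation: the `Alert` and `Incident` classes, their field names and the five-minute default window are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    service: str      # hypothetical field: the service the alert belongs to
    summary: str      # hypothetical field: short description of the problem
    timestamp: float  # seconds since an arbitrary start time


@dataclass
class Incident:
    service: str
    opened_at: float
    alerts: list = field(default_factory=list)


def group_alerts(alerts, window_seconds=300.0):
    """Group alerts per service into incidents using a fixed time window.

    An alert joins the most recent incident for its service if it arrives
    within window_seconds of that incident opening; otherwise it opens a new
    incident. Alerts whose summary is already present in the incident are
    treated as duplicates and dropped.
    """
    incidents = []
    latest = {}  # service name -> most recent Incident for that service
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        incident = latest.get(alert.service)
        if incident is None or alert.timestamp - incident.opened_at > window_seconds:
            incident = Incident(alert.service, alert.timestamp)
            incidents.append(incident)
            latest[alert.service] = incident
        if all(a.summary != alert.summary for a in incident.alerts):
            incident.alerts.append(alert)
    return incidents
```

With a grouping like this, a burst of repeated alerts from one failing service reaches the responder as a single incident rather than as a stream of separate pages.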

Or they may have the ability to manually “pause” flapping incident notifications for a predefined amount of time while they work on the problem. Such capabilities can also be automated by intelligent algorithms, helping to overcome the challenge responders face of interruptions for non-urgent incidents.
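As a rough sketch of the “pause” idea, the Python class below suppresses notifications for a given alert key until a chosen amount of time has passed. The class name, method names and the use of plain numeric timestamps are hypothetical choices for this example and do not reflect any particular product’s API.

```python
class NotificationPauser:
    """Tracks which alert keys are paused and until when (illustrative only)."""

    def __init__(self):
        self._paused_until = {}  # alert key -> time at which the pause expires

    def pause(self, key, now, duration_seconds):
        """Pause notifications for key for duration_seconds, starting at now."""
        self._paused_until[key] = now + duration_seconds

    def should_notify(self, key, now):
        """Return False while key is paused, True otherwise."""
        until = self._paused_until.get(key)
        if until is not None and now < until:
            return False
        # The pause has expired (or never existed), so forget it and notify.
        self._paused_until.pop(key, None)
        return True
```

The current time is passed in explicitly rather than read from the system clock, which keeps the sketch easy to test. An automated version could call `pause` itself whenever the same key fires repeatedly within a short interval.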


The bottom line is that noise-suppression technology enables Responder B to work quickly and efficiently, removing the extraneous to focus on what’s important. While Responder B has found and remediated the failure and returned to a high-value development project, Responder A is still toiling away under the weight of alerts several hours later.

Event Orchestration from a Single Location

Responder A is also forced to use a variety of manual monitoring tools. The management overhead for these is high. Rule configurations must be maintained, increasing the effort needed to process event data, and that data also needs to be aggregated and orchestrated across multiple siloed systems. During the morning of our critical system failure, Responder A is forced to waste valuable time manually running health checks and inspecting CPUs, memory caches and other possible root causes. Then they’ll need to take further remedial action to fix the problem or escalate to someone who can.

However, Responder B has a unified platform to handle all event data and optimize how events are processed. By having already added business logic and contextual rules to process all incoming events, they can trigger automated routing of events to the right teams based on event conditions at scale. And they can automatically trigger diagnostic and remediation actions, such as restarting a server or clearing memory caches, via runbook automation. These teams can handle most of the commodity, repetitive incidents that occur, only getting developer or engineer subject matter experts (SMEs) involved when escalations are absolutely necessary.
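The rule-driven routing described above could look something like the following Python sketch. Everything in it is an assumption made for illustration: the event fields (`metric`, `status`, `service`, `host`), the team names and the two placeholder remediation functions are invented, and a real platform would invoke actual runbook automation rather than return a string.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]                  # condition on the event
    route_to: str                                    # team that should own it
    action: Optional[Callable[[dict], str]] = None   # optional remediation


def restart_server(event):
    # Placeholder: a real runbook step would call out to automation tooling.
    return f"restart requested for {event['host']}"


def clear_cache(event):
    # Placeholder for a cache-clearing runbook step.
    return f"cache clear requested for {event['host']}"


# Hypothetical rules; the first matching rule wins.
RULES = [
    Rule("memory pressure",
         lambda e: e.get("metric") == "memory" and e.get("value", 0) > 90,
         "platform-team", clear_cache),
    Rule("host down",
         lambda e: e.get("status") == "down",
         "platform-team", restart_server),
    Rule("payment errors",
         lambda e: e.get("service") == "payments",
         "payments-sme"),
]

DEFAULT_TEAM = "first-responders"


def orchestrate(event, rules=RULES):
    """Return (team, action_result) for the first rule that matches event."""
    for rule in rules:
        if rule.matches(event):
            result = rule.action(event) if rule.action else None
            return rule.route_to, result
    # No rule matched: fall back to the default responders with no action.
    return DEFAULT_TEAM, None
```

Keeping the rules as data in one list reflects the single-location idea: the routing and remediation logic lives in one place instead of being spread across per-tool configurations.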

Fast forward several months and a demotivated, burnt-out Responder A has left the company, or worse, the industry, while their employer continues to bleed customers following lengthy service outages. However, with AIOps solutions in the form of an intelligent, automated and centralized event orchestration and noise-suppression platform, Responder B is able to focus more of their time on the projects they care about. Platforms such as PagerDuty can help organizations achieve 44% fewer incidents through noise reduction and event orchestration, freeing up much-needed time and enabling teams to focus more on innovative new products to drive competitive advantage for the business.

PagerDuty is the global leader in AI-first operations management serving more than 35,000 organizations worldwide. The PagerDuty Operations Cloud is a comprehensive, multi-product operations cloud platform that sits at the center of the enterprise technology stack.