URL: https://thenewstack.io/incident-response-three-ts-to-rule-them-all/

Incident Response: Three Ts to Rule Them All - The New Stack


2023-08-24 08:56:49

Incident Response: Three Ts to Rule Them All

The best operations platforms can quickly resolve high-impact incidents and move teams toward continuous learning, so that they get ahead of issues before they start.
Aug 24th, 2023 8:56am by Debora Cambe
Featured image from Chaosamran_Studio on Shutterstock.
PagerDuty sponsored this post.

The growing momentum in adopting generative AI is one of the most exciting trends of recent history. But as developers begin producing more code with AI-assisted programming, are your operational processes keeping up?

Incidents will still happen, and the ability to orchestrate real-time incident response is more critical than ever, as digital infrastructures get increasingly complex and customer expectations rise.

Operational excellence is key to managing these macroenvironmental changes, and achieving it starts with a pulse check on your organization’s own operational maturity. The Three Ts of teams, techniques and technology can guide you toward balancing growth with operational efficiency.

PagerDuty is the global leader in AI-first operations management serving more than 35,000 organizations worldwide. The PagerDuty Operations Cloud is a comprehensive, multi-product operations cloud platform that sits at the center of the enterprise technology stack.

Teams

Effective incident response teams are typically structured in three hierarchical levels: command, liaison and operations.

  • Command: This level coordinates the response effort. It also prompts for, reviews and delegates external communications during the incident, and it runs postmortem exercises afterward. The incident commander leads this team and can be assisted by a deputy and a scribe, depending on the incident’s scale and complexity. The deputy takes on critical supporting tasks to help the commander stay focused on the incident. The scribe documents the timeline of an incident and ensures that important decisions and data are captured for review.
  • Liaison: During an incident, it’s vital to keep two audiences updated: customers, reached either directly or via public channels, and internal stakeholders, who may also need to be mobilized. These are the responsibilities of the customer liaison and the internal liaison, respectively.
  • Operations: Subject-matter experts (SMEs) are domain experts or designated owners of a component/service within an organization’s technical ecosystem. These SMEs are the boots-on-the-ground folks working to bring the incident to a close.

Note: This is only a proposed team structure. Different incidents have different needs. For example, during smaller incidents a single person can take on multiple roles. Determine ahead of time which severity of incident requires which people so that incident response teams are right-sized for the scope of an issue.
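One way to make that right-sizing concrete is to write the staffing policy down as data. The sketch below is only an illustration: the severity labels (`SEV1` to `SEV3`), the `REQUIRED_ROLES` mapping and the `unfilled_roles` helper are all assumptions made for this example, not part of any particular product or of the structure above. It does show how one person can cover several roles during a smaller incident.

```python
from enum import Enum


class Role(Enum):
    INCIDENT_COMMANDER = "incident commander"
    DEPUTY = "deputy"
    SCRIBE = "scribe"
    CUSTOMER_LIAISON = "customer liaison"
    INTERNAL_LIAISON = "internal liaison"
    SME = "subject-matter expert"


# Hypothetical policy: which roles must be staffed at each severity level.
# The labels and the role sets are assumptions; adapt them to your own process.
REQUIRED_ROLES = {
    "SEV1": set(Role),  # every role staffed
    "SEV2": {
        Role.INCIDENT_COMMANDER,
        Role.SCRIBE,
        Role.CUSTOMER_LIAISON,
        Role.INTERNAL_LIAISON,
        Role.SME,
    },
    "SEV3": {Role.INCIDENT_COMMANDER, Role.SME},
}


def unfilled_roles(severity, assignments):
    """Return the required roles that nobody has picked up yet.

    `assignments` maps a responder's name to the set of roles they hold,
    so a single person can take on multiple roles.
    """
    covered = set().union(*assignments.values())
    return REQUIRED_ROLES[severity] - covered
```

For a small incident, `unfilled_roles("SEV3", {"ana": {Role.INCIDENT_COMMANDER, Role.SME}})` returns an empty set, meaning one responder covers everything the hypothetical policy requires.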

Techniques

Preparation, clearly defined roles and actions, communication, documentation and learning are key to setting up incident response teams for success. Here are techniques to standardize your incident response process while ensuring continual learning:

  • Define what constitutes an incident and a major incident. Use simple, unambiguous language. Define the severity levels of an incident to outline what kind of response should be taken. Tip: If an incident appears to fall between two levels, treat it as if it is higher in severity.
  • Define how and when to mobilize responders. As a best practice, incidents should be created automatically, and ideally, that same automation should be able to resolve them. If processes are still manual, set up a dedicated phone bridge and chat room in advance, with the relevant numbers and connection information documented.
  • Create a postmortem process. Detail the cause of the incident, how it played out and what steps could prevent something similar from happening again. This is a vital part of continuous learning, which can help organizations to iteratively improve incident response.
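The severity tip in the first technique can be expressed in a few lines. This is a minimal sketch under assumed names: the three-level `Severity` scale and the `resolve_severity` helper are hypothetical, and your own scale may have more levels or different labels.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical three-level scale; a lower number means more severe."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


def resolve_severity(*candidates):
    """Pick one level when an incident seems to fall between several.

    Follows the tip above: when in doubt, treat the incident as the
    higher (more severe) of the candidate levels.
    """
    if not candidates:
        raise ValueError("at least one candidate severity is required")
    # The most severe level has the smallest numeric value.
    return min(candidates)
```

A responder who is torn between `Severity.SEV2` and `Severity.SEV3` would call `resolve_severity(Severity.SEV2, Severity.SEV3)` and get `Severity.SEV2` back.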

An important footnote is to practice. The mental shift required between “peacetime” and “wartime” can be challenging for responders. That’s why running fake incidents during “game days” is a good idea. Our long-running “Failure Friday” initiative not only helps to uncover issues that could affect resilience, but also builds a stronger team culture by bringing everyone together to share knowledge.

Technology

People and processes are a vital part of any incident response strategy. But so is technology. Organizations should look for software designed to manage the entire life cycle of an incident, from alerting to diagnostics and remediation. Such software makes it possible to overcome limits on responder resources, speed up resolution by routing operational issues and incidents to the right person or team in real time, arm responders with the right context about an incident, and resolve incidents without human intervention.

The right tools will:

  1. Keep stakeholders informed while managing higher incident volumes and continuously improving response processes.
  2. Equip the right people in your organization with self-service access to IT operations tasks, resolving requests and incidents while reducing escalations and interruptions.
  3. Leverage machine learning and event-driven automation to group alerts, orchestrate events and speed up triage.
  4. Break down barriers between customer service and development teams to keep teams and customers in the loop at all times.

The strongest operations platform includes all of the above and acts as a single source of truth for urgent, unplanned work. It ingests data from monitoring and observability, DevOps and DataOps tools to detect and diagnose urgent disruption, mobilize a response and automate workflows to improve mean time to resolution (MTTR). Combining automation with machine learning also enables intelligent alert grouping and event orchestration, to reduce noise and further enhance responder productivity.
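To make two of these ideas tangible, here is a rough sketch of alert grouping and of an MTTR calculation. It rests on loud assumptions: the `Alert` type, the `group_alerts` and `mean_time_to_resolution` helpers and the five-minute window are all invented for this example, and the fixed time window is a simple stand-in heuristic, not the machine-learning grouping a real operations platform would apply.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    service: str         # hypothetical field: the service that raised the alert
    fired_at: datetime


def group_alerts(alerts, window=timedelta(minutes=5)):
    """Group alerts from the same service that fire close together in time.

    An alert joins its service's latest group if it fired within `window`
    of that group's most recent alert; otherwise it starts a new group.
    """
    groups = []
    latest = {}  # service name -> that service's most recent group
    for alert in sorted(alerts, key=lambda a: a.fired_at):
        group = latest.get(alert.service)
        if group and alert.fired_at - group[-1].fired_at <= window:
            group.append(alert)
        else:
            group = [alert]
            groups.append(group)
            latest[alert.service] = group
    return groups


def mean_time_to_resolution(incidents):
    """Average the time from open to resolve over (opened_at, resolved_at) pairs."""
    if not incidents:
        raise ValueError("need at least one resolved incident")
    durations = [resolved - opened for opened, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)
```

Grouping turns a burst of related alerts into one unit of work for a responder, which is the noise reduction described above, while the MTTR helper gives a baseline to compare against as response processes improve.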

Minimizing Disruption, Maximizing Brand Value

As digital infrastructures come under increasing strain, a fresh look at incident response helps you enhance your operational maturity. Ultimately, the best operations platforms can quickly resolve high-impact incidents and elevate digital operations to a preventative state of continuous learning, in which teams are ahead of issues before they start. It’s the only way to minimize disruption to customers, employees and brand reputation.

Read the PagerDuty incident response Ops guide for more helpful information to improve your operational processes.

Débora Cambé is a product marketing manager at PagerDuty supporting the company's Incident Response go-to-market initiatives. Her 10+ years of experience as a marketing professional include working as owned media manager at PlayStation and as social media consultant for Yorn,...
TNS owner Insight Partners is an investor in: Pragma.