Being on call can be challenging. If you’ve been woken up at 2 a.m. by an alert, or had to interrupt dinner with a friend to jump on an incident, you know the emotional toll being on call brings. It’s impossible to eliminate every possibility of failure from a system. Even with code freezes, load testing, game days and more, failure can still happen. You might be holding the proverbial pager when it does.
It’s important to keep improving on-call processes and to iterate on them regularly to make on-call life better. In this blog post, we’ll share how you can help yourself and your team weather the storm.
Anyone who’s been on call knows that there’s a learning curve before you get used to the way things work. If you’re newer to it or haven’t been on call before, you’re in good company: It’s intimidating! As with anything new, it takes practice and familiarity to ease into it. This is super-important for people new to going on call as well as seasoned on-call engineers who have changed teams or companies. Here are some things you can do to build on-call confidence.
Shadowing is a common technique for training new team members. During a shadowing session, the new teammate follows an experienced teammate through their on-call shift, usually during business hours. It’s a similar practice to pair programming. After shadowing, the new teammate may or may not feel ready to be in the driver’s seat, so it’s important to allow them to test drive. This is where reverse shadowing comes in. During this period, the roles are reversed, and the shadow now responds to alerts. If the new teammate finds themselves in need of some help, their mentor can jump in.
At PagerDuty, this period of shadowing and reverse shadowing is very common during months two and three of a new engineer’s tenure. This process isn’t just valuable for new hires. It’s also great for those who might be nervous about being on call. You can set up shadowing opportunities within your teams at any time.
Unsure of how to resolve an incident? We’ve all been there. In a recent webinar, engineering manager Dileshni Jayasinghe spoke about how the past 18 months changed the way her team approached triage. Prior to the pandemic, if an alert was triggered, her teammates could turn in their chairs and ask one another questions before needing to kick off an incident.
Now, Jayasinghe says it’s important to err on the side of caution. Trigger incidents first, escalate without shame as soon as you need to and know that your team is there to support you. Time spent debating with yourself whether an incident is bad enough to loop in teammates is time you can’t afford to lose. When downtime can cost thousands of dollars per minute, especially during the holidays, acting fast can protect revenue and customer satisfaction.
Psychological safety, according to Harvard professor Amy Edmondson, is a belief that the workplace is safe for speaking up. She notes that to be successful, teams need both a commitment to excellence and psychological safety to enter the learning zone, which is the best and most productive way to work.
To help build psychological safety, try to focus on empathy and blamelessness. Empathy might look like recognizing how a teammate, or yourself, could have made a mistake and affirming that this is normal. This also requires blamelessness. Rather than naming and shaming someone or calling yourself out for failure, focus on the systemic problems that contributed to this failure. Was there a lack of tooling or documentation? Was the responder tired after a night of being woken up by alerts?
By having empathy and focusing on blamelessness, you can foster psychological safety within your team. This will increase everyone’s confidence in their ability to be on call and remind them that even if they do fail, they can learn and bounce back.
Even the most confident on-call engineer can be shaken when they find that the processes and documentation they’re supposed to use don’t reflect the current state. At regular intervals throughout the year, you should review your processes and documentation to ensure everything is up to date. Here are some of the most important things to check:
With the right processes and documentation, you’ll be better prepared to handle any problems during your on-call shift. This preparation can help you save time when every second means major dollars for your organization. Beyond thinking about the bottom line, there’s an additional and even more important consideration, however.
Keeping in touch with how you’re faring is the most important thing you can do. You’ll need to know how to communicate, both qualitatively and quantitatively, how your on-call rotation went and what support you need.
According to a report created from PagerDuty platform data comparing 2019 to 2020, burnout has become an even bigger issue over the past 18 months. We compared the number of off-hour interruptions users experienced and broke them down into 3 categories:
Infographic: the three cohorts of burnout levels (the good, the bad and the ugly).
The good category had only two interruptions per month. The bad category had seven. The ugly category had 19. These after-hours interruptions accounted for an extra two hours of work per day, or an extra 12 weeks of work per year. According to our data, this last cohort is also the most likely to leave the PagerDuty platform (our proxy for attrition). It’s everyone’s responsibility to make sure that burnout does not reach this level.
One way to do this is to keep track of how many times you’re interrupted and at what time these interruptions occur. After all, an interruption at 2 p.m. is much less disruptive than one at 2 a.m. Analytics tools can help you understand what your on-call shift looked like so you can bring data to your team and manager. If you’re a manager, you want to look at these metrics and check in to gauge how your team is doing.
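If you don’t have an analytics tool handy, a small script can do the tallying. The sketch below is only an illustration, not a PagerDuty feature: the 9 a.m. to 5 p.m. weekday window, the `is_off_hours` and `summarize` helpers and the sample timestamps are all assumptions made for the example, so adjust them to match your own schedule and data.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime, time

# Assumed business hours (local time); change these to fit your team.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def is_off_hours(ts: datetime) -> bool:
    """Return True for weekends or times outside the assumed business hours."""
    if ts.weekday() >= 5:  # Saturday is 5, Sunday is 6
        return True
    return not (BUSINESS_START <= ts.time() < BUSINESS_END)


def summarize(interruptions: list[datetime]) -> dict:
    """Count total and off-hours interruptions, plus a per-hour tally."""
    off_hours = [ts for ts in interruptions if is_off_hours(ts)]
    by_hour = Counter(ts.hour for ts in interruptions)
    return {
        "total": len(interruptions),
        "off_hours": len(off_hours),
        "by_hour": dict(sorted(by_hour.items())),
    }


# Hypothetical sample: local-time timestamps of alerts from one shift.
sample = [
    datetime(2021, 11, 1, 14, 0),   # Monday, 2 p.m.
    datetime(2021, 11, 2, 2, 0),    # Tuesday, 2 a.m.
    datetime(2021, 11, 6, 11, 30),  # Saturday, 11:30 a.m.
]

report = summarize(sample)
print(report)
```

With the sample above, two of the three interruptions count as off-hours: the 2 a.m. page and the Saturday one. Comparing your monthly off-hours count against the cohort figures mentioned earlier (two, seven and 19 per month) gives you a concrete number to bring to your team and manager.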
Additionally, you can ask your manager about override and day-off policies, which should be detailed in your on-call documentation. Overrides and recovery days aren’t “work perks” for engineers. They’re table stakes for health.
To learn more about excelling at on-call and staying well while you do it, you can check out these resources:
Or see how PagerDuty can help you adopt best practices with a 14-day free trial.