
URL: https://thenewstack.io/the-evolution-of-the-site-reliability-engineer-sre/

The Evolution of the Site Reliability Engineer - The New Stack



The Evolution of the Site Reliability Engineer

An interview with two site reliability engineers about maintaining uptime with cloud native systems.
Jul 23rd, 2019 5:00pm by Jennifer Riggins
Feature image via Unsplash.



In this episode of The New Stack Makers podcast, we talk with two DigitalOcean alumni and co-chairs of the SREcon 2020 Americas conference who took two very different paths into one of the most sought-after roles in tech: site reliability engineer (SRE). As the name suggests, an SRE is someone focused on the reliability of an organization’s most important systems.

Google coined the term “site reliability engineer” in 2003, but the work itself has existed for decades in other forms, such as disaster recovery and production testing, because engineers have always tried to keep essential services like healthcare and finance online. Demand for SREs grew as organizations went cloud native and needed engineers who work in production and on operations, with a heavy focus on automation and observability. As systems become increasingly distributed, the role has evolved from shoring up uptime for a monolith to acting as a relationship broker with a view into organization-wide systems, a knack for problem-solving, and a love of metrics.

Emil Stolarsky is a front-end turned infrastructure engineer who has built scriptable load balancers for Shopify and an internal Kubernetes platform for DigitalOcean. He is now writing a book on how the enterprise SRE role can be adapted to smaller orgs. Tammy Bütow began with disaster recovery testing in banking over a decade ago, then moved to DigitalOcean to work in incident response, and later joined Dropbox in an official SRE role. In 2017, she joined chaos-as-a-service company Gremlin as its principal SRE.

For Bütow, an SRE is focused on the reliability and durability of your systems and their data. The role concentrates on the most important parts of those systems: the ones that, when they break, make everyone feel the pain, from incident management and business management to developers burning out, customer support, and the customers themselves. Stolarsky added that an SRE treats reliability as a first-class feature that needs special attention, tooling, practices and targets.


Stolarsky pointed to Google’s Service Reliability Hierarchy as a good overview of this role, and a good visualization of what the guests said is most important: “SREs are people who can work across the company.” This makes for a different culture fit than most engineering roles. An SRE is someone who is good at communication and at prioritization, and at communicating those priorities. But you still need the tech to back up those relationships.

Perhaps you could call site reliability engineering an offshoot of the DevOps movement. It’s definitely an alternative to the usual sysadmin approach to service management that sees development and operations as two distinct teams. SREs straddle both sides of what is a hopefully disappearing barrier, as engineers who spend half their time in operations.

Our guests said that the difference is that an SRE is focused on the external value the company can reliably offer customers, while DevOps is more about increasing velocity internally. However, both roles share principles like continuous learning and embracing failure, reducing silos for more transparency and shared responsibility, and automating to accelerate innovation. Both DevOps and SRE are closely tied to business-level objectives.

SRE has in some ways been around since the start of this century, but it is growing in demand. It also seems to be becoming democratized, as more and more people recognize that they are already doing it. Stolarsky says that is because an organization of any size can benefit from following SRE best practices and service-level objectives.

Certainly more orgs need more SREs! Listen to this podcast to learn more about SRE best practices and what’s needed to become one yourself.

In this edition:

  • 1:58: The difference between small companies and large companies’ SRE.
  • 6:58: How to define SRE, and why companies need it.
  • 10:14: The differences between SRE and DevOps.
  • 14:14: Basic SRE roles that any company needs.
  • 17:27: Recommended tooling.
  • 23:54: Diversity in SRE, and discussing SREcon Americas 2020.

Tooling Mentioned in this Episode:

Jennifer Riggins is a tech storyteller and journalist, event and panel host. She bridges the gap between business, culture and technology, with her work grounded in the developer experience. She has been a working writer since 2003, and is based...
TNS owner Insight Partners is an investor in: LaunchDarkly, Honeycomb.