Gremlin sponsored this post.
2020 was an interesting year… to say the least. The pandemic changed our lives in ways that will outlast the virus itself. Add to the mix a rise in civil unrest, and I think it’s fair to say that we need a reboot in 2021.
In the world of DevOps, there’s quite a bit to be optimistic about. Funding of technology startups has actually increased over the past year. Digital transformations have accelerated, as companies across all industries prioritize the online experience in a distributed world. Modern tooling like Slack and Zoom has made it possible for many of us to continue working, to stay in touch with loved ones, and even to be entertained while we are stuck at home.
Reliable technology has played a critical role in helping maintain a sense of normalcy and connectedness.
And so I wanted to get a panel together that consisted of some of the premier thought leaders in the space. These founders and executives are on the frontlines building solutions that help companies modernize, solve problems, and become more resilient.
Watch the full video below:
Yes, it’s true that Amazon can lose millions of dollars if it is down for even a few minutes and that Robinhood might lose countless users each time it crashes during a major market movement. But for startups, even if they aren’t losing millions of dollars or hundreds of customers, the relative impact on their business can actually be much greater. For a startup, losing even a single big customer can mean losing a significant chunk of revenue. So while big companies make for big headlines, startups can feel the pain of major outages just as much, if not more.
Creating a culture that accepts failure and learns from it is a major and important shift for many companies. Too often when something goes wrong within traditional organizations, people who weren’t even there (e.g. management) dole out punishment and blame as the primary response. In modern incident management, blameless postmortems are a way to formally document what went wrong and why, in an effort to better understand the incident and prevent it from happening again. These documents should be shared not only with your team but also publicly, so that anyone interested can learn from what happened. (Cross-company resilience FTW)
The best way to get software developers to care about the reliability of their applications… is to put them on call! Skin in the game can make a world of difference. If the engineer knows it’s their pager that will fire in the middle of the night or over the holiday break, they are much more likely to write code that stands up.
This is a core promise of DevOps: that the daylight between who writes the code and who is responsible for that code’s behavior in production becomes narrower and narrower. When we think of shifting more of the operational burden upfront (i.e. Proactive Ops), we may also think of the cutting-edge discipline of Chaos Engineering. As with a vaccine, the idea is to inject a little failure upfront, on your own terms, in order to build longer-term resilience. And for software developers, resilience often means more than just checking whether systems are up or down; it means being able to debug customer-facing issues on the fly and provide a seamless online experience even when the unexpected happens.
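To make the "inject a little failure upfront" idea concrete, here is a minimal Python sketch of a fault-injection experiment. It is not how Gremlin or any other chaos tool works internally; every name in it (`with_injected_failures`, `fetch_price`, `price_with_fallback`, `run_experiment`) is hypothetical and exists only for illustration. It wraps a dependency so that a chosen fraction of calls fail on purpose, then checks that the caller still serves an answer from a fallback.

```python
import random


class InjectedFailure(ConnectionError):
    """Raised on purpose by the fault injector (hypothetical helper)."""


def with_injected_failures(func, failure_rate, rng):
    """Wrap func so that roughly failure_rate of calls fail on purpose."""
    def wrapper(*args, **kwargs):
        # rng.random() is in [0.0, 1.0), so 0.0 never fails and 1.0 always fails.
        if rng.random() < failure_rate:
            raise InjectedFailure(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def fetch_price(symbol):
    """Stand-in for a real network call; assumed for this example."""
    return 100.0


def price_with_fallback(fetch, symbol, cached):
    """Return (price, source), serving the cached value if the call fails."""
    try:
        return fetch(symbol), "fresh"
    except ConnectionError:
        return cached, "fallback"


def run_experiment(failure_rate, calls, seed=0):
    """Run the experiment and count how each request was served."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    flaky_fetch = with_injected_failures(fetch_price, failure_rate, rng)
    counts = {"fresh": 0, "fallback": 0}
    for _ in range(calls):
        _, source = price_with_fallback(flaky_fetch, "ACME", cached=99.0)
        counts[source] += 1
    return counts


if __name__ == "__main__":
    print(run_experiment(failure_rate=0.2, calls=100))
```

The point of the sketch is the shape of the experiment: the failure is introduced deliberately and at a controlled rate, and what you measure is whether every request still gets an answer. A real experiment would target actual infrastructure (network, hosts, dependencies) rather than an in-process wrapper.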
Among the panelists, there was a near-unanimous reaction to the term “AIOps” (eye roll). While the idea of machines solving all of our problems makes for good headlines, the truth is that humans are still very much needed to attribute value to machine-detected anomalies. You’re also adding another project for your engineers to worry about: before, they just wanted to improve resilience, but now they also have to build and maintain the AI that is supposed to help with that resilience! Simply adopting the best DevOps/SRE practices will likely get you further, for now.
Lightstep is a sponsor of The New Stack.