![]() |
VOOZH | about |
As businesses and other organizations increasingly depend on internet-based services, developers and sysadmins are focusing their attention on creating reliable infrastructure that minimizes costly downtime.
A website, API, or other service being unavailable can have significant monetary costs resulting from lost sales. Additionally, downtime can lead to:
The modern field that has coalesced to address these issues is called Site Reliability Engineering or SRE. SRE was created at Google starting in 2003, and the strategies they developed were gathered into a book titled Site Reliability Engineering. Reading up on this field is a good way to explore techniques and best practices for minimizing downtime.
In this article, we will discuss three areas where improvements could lead to less downtime for your organization. These areas are monitoring and alerting, software deployments, and high availability.
This is not an exhaustive list of strategies, but is intended to point you toward some common solutions you should consider when improving the production readiness of your services.
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.
Senior Technical Writer at DigitalOcean
This textbox defaults to using Markdown to format your answer.
You can type !ref in this text area to quickly search our full set of tutorials, documentation & marketplace offerings and insert the link!
It’s always quite difficult to manage time. Everyone has a sense of time. I think the only thing that can help is digital technology, because that’s what people do.
Full documentation for every DigitalOcean product.
The Wave has everything you need to know about building a business, from raising funding to marketing your product.