URL: https://thenewstack.io/the-advent-of-automated-observability/

The Advent of Automated Observability - The New Stack


AI / Observability / Operations

The Advent of Automated Observability

AI may never be a cure-all for observability, but it can certainly be a valuable companion.
Mar 20th, 2024 10:00am by Ozan Unlu
Featured image for: The Advent of Automated Observability
Image via Pixabay.

The cost of downtime is well documented, impacting everything from revenue to productivity to compliance to brand reputation. Over the past year, there have been several examples of major airlines experiencing technical glitches in their customer-facing check-in and electronic ticketing systems, resulting in thousands of canceled and delayed flights. This past April, online discount brokerage Robinhood was slammed with a $10M fine for outages in 2020.

The headlines tend to cover outages at bigger companies. Their response usually breaks down into two components: increased monitoring and troubleshooting.

  • Monitoring means identifying metrics that are indicative of whether you’re meeting your service level objectives (SLOs), and then relying on human-defined alerting thresholds to fire when metrics are trending outside of expected behavior.
  • Troubleshooting means that when an alert fires, you have to sift through logs looking for a “needle in the haystack” to determine the root cause of the issue. Often, this means relying on “institutional knowledge” — who knows our systems the best, has seen this issue before and knows how to solve it?
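
To make the monitoring half concrete, here is a minimal sketch of human-defined threshold alerting. The metric names, limits and sample values are hypothetical and chosen only for illustration; real monitoring tools express the same idea in their own rule syntax.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRule:
    metric: str       # hypothetical metric name
    max_value: float  # limit chosen by a human in advance


def fired_alerts(samples, rules):
    """Return the (metric, value) pairs that exceed their hand-set limit."""
    limits = {rule.metric: rule.max_value for rule in rules}
    return [(m, v) for m, v in samples if m in limits and v > limits[m]]


# Rules exist only for the behaviors someone thought of ahead of time.
rules = [
    ThresholdRule("p99_latency_ms", 500.0),
    ThresholdRule("error_rate", 0.01),
]

# Illustrative readings; "queue_depth" has no rule attached to it.
samples = [
    ("p99_latency_ms", 620.0),
    ("error_rate", 0.002),
    ("queue_depth", 90000.0),
]

alerts = fired_alerts(samples, rules)
print(alerts)
```

Only the latency reading fires. The unusually large `queue_depth` value passes silently because nobody defined a rule for it.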

Monitoring and troubleshooting, as outlined above, are reactive. You’re dedicating significant staff hours to manual tasks. Plus, you have incomplete coverage of anomalies because you’re only alerting on known behaviors. As a byproduct of both, you might experience slow resolutions that depend entirely on (a) whether you caught the issue and (b) whether you can locate the relevant log data.

There’s a significant problem with this approach: many events in a production environment are so rare that “predicting” them is impractical in the traditional sense. The same is true of day-to-day life, where certain unavoidable events with a lasting business impact can be impossible to predict. For example, prior to 2020, could anyone have foreseen a once-in-a-lifetime pandemic that would result in a major hit to the U.S. economy?

The long tail of potential errors in application development is analogous to this, and it’s the reason why, in 2024, it’s still so hard to foresee and prevent production outages. In a production environment, many specific issues may happen only once and never recur, while other types of degradation may occur much more regularly, even daily. It’s impossible to completely understand and predict all the ways things could go wrong in an application development context.

Larger organizations that have built sophisticated observability practices might be able to thrive under these conditions. But what about small and even mid-market organizations that have limited operations resources, and where observability is just one of many responsibilities? Superior performance (speed and reliability) is critical for anyone who builds revenue-generating software, no matter how big or small.

AI as an Observability “Copilot”

As we noted above, in a production environment, many causes of production outages may only happen once. Smaller teams likely don’t have the resources or foresight to predict every scenario that can cause a system to fail. This is exactly the kind of scenario where AI can help maximize monitoring coverage.

More specifically, AI can be used to baseline data sets and detect anomalies. In this use case, AI algorithms can recognize normal activity across different timeframes — from months to weeks, even down to individual days — and flag when an abnormality crops up. In this way, AI can be valuable in providing proactive signals when an issue may be brewing — without requiring the user to define alert conditions. It can even detect “unknown unknowns,” so engineers don’t have to attempt to predict the future in the form of specific indicators or thresholds.
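
As a rough illustration of baselining, the sketch below flags points that deviate sharply from a trailing window of recent values. It uses a plain z-score as a simple statistical stand-in and is not the method of any particular product; the window size, the threshold and the hourly error counts are all assumptions made for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, window=24, z_threshold=3.0):
    """Return indices whose value deviates strongly from the trailing baseline."""
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]  # the "normal" behavior learned so far
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline: any change at all is a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical hourly error counts: a steady repeating pattern, then a spike.
hourly_errors = [10, 12, 11, 9, 10, 13] * 8 + [60]

anomalies = find_anomalies(hourly_errors)
print(anomalies)
```

Nobody had to decide in advance that 60 errors per hour is too many. The limit is derived from the recent history of the series itself, which is what lets this style of detection surface behaviors that no one wrote a rule for.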

Another area where AI can help is as a troubleshooting copilot. AI can be used to interpret the log data correlated to an alert. Then generative AI can summarize the behavior and recommend a path to resolution in conversational text. When an anomaly is detected, AI can:

  • Analyze the contents of the logs contributing to the anomaly
  • Communicate the severity of the issue and what it’s impacting
  • Summarize the negative behavior in conversational text
  • Provide a recommendation on how to resolve the issue
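
The steps above can be sketched in code. The example below covers only the preparation stage: it collapses the logs from an anomaly window into patterns and assembles a request for a generative model. The log lines, the metric name and the prompt wording are hypothetical, and the call to the model is left out because the article does not name one.

```python
import re
from collections import Counter


def summarize_logs(lines, top_n=3):
    """Collapse log lines into patterns and return the most frequent ones."""
    patterns = Counter()
    for line in lines:
        # Mask numbers so lines that differ only in IDs or durations group together.
        patterns[re.sub(r"\d+", "<n>", line)] += 1
    return patterns.most_common(top_n)


def build_prompt(metric, patterns):
    """Assemble the text that would be handed to a generative model."""
    bullets = "\n".join(f"- {count}x {pattern}" for pattern, count in patterns)
    return (
        f"An anomaly was detected on '{metric}'.\n"
        f"Most frequent log patterns in the anomaly window:\n{bullets}\n"
        "Describe the severity and impact, summarize the behavior in plain "
        "language, and recommend a resolution."
    )


# Hypothetical log lines correlated with the alert.
logs = [
    "ERROR db timeout after 5000 ms for order 1841",
    "ERROR db timeout after 5000 ms for order 1842",
    "WARN retry 2 for order 1841",
    "ERROR db timeout after 5003 ms for order 1850",
]

patterns = summarize_logs(logs)
prompt = build_prompt("checkout_error_rate", patterns)
print(prompt)
```

Grouping lines into patterns before prompting keeps the request short and puts the dominant failure, here the repeated database timeout, at the top of what the model sees.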

In this way, AI can help organizations move through the troubleshooting process more quickly. It’s almost as if a colleague has investigated the issue for you. The combination is powerful: AI predicts and recommends, and professionals decide on the remediation.

Today, AI is disrupting many industries, from marketing to retail to legal and more. The common theme across these use cases is that AI is automating a lot of the “heavy lifting,” freeing human beings to focus on their core tasks. Observability is no different, as IT and operations teams will always have more pressing concerns than “build this thing in case something happens.” AI may never be a cure-all for observability, but it can certainly be a valuable companion. It’s “on call” 24/7, so you don’t need to be; it can build and refine alerts on your behalf, and it can locate the data you need to deliver a better user experience for your customers.

Ozan Unlu is the CEO and Founder of Edge Delta, an edge observability platform. Previously he served as a Senior Solutions Architect at Sumo Logic; a Software Development Lead and Program Manager at Microsoft; and a Data Engineer at Boeing....