
URL: https://thenewstack.io/who-monitors-ai-agents/

Who's monitoring the agents? - The New Stack


2026-05-24 12:00:00
AI Agents / AI Operations / Observability

Who’s monitoring the agents?

Multi-agent AI systems are live in production, but who is monitoring them? Discover the operational gaps in tracking autonomous agents.
May 24th, 2026 12:00pm by Moshe Bar
Featured image by Dimitar Donovski for Unsplash.

Over the past few months, something quietly shifted. Frameworks like CrewAI, AutoGen, and LangGraph are no longer just showing up in demos – they’re running in production.

Teams are wiring together planners, tool-using agents, retrievers, and external APIs, then handing them real work. Incident response, internal copilots, automation pipelines – it’s all starting to look less like experimentation and more like infrastructure.

And once these systems are live, the problems become obvious very quickly. Not the usual “LLMs hallucinate” problem. Something more operational.

Right now, we’re very good at building agents and not very good at operating them. The frameworks make composition easy, but they stop short of giving you real control once things are running at scale.

And that gap shows up immediately in production.

The uncomfortable reality is that a lot of teams deploying multi-agent systems today are operating them with less visibility than they had for microservices 10 years ago. They’re trusting outputs without fully understanding the path that produced them.

That works for a demo. It doesn’t hold up when these systems start touching real data, real users, and real money.

What actually breaks is the system itself. A request that should take one or two steps turns into dozens of model calls. Agents bounce off each other, retrying, rephrasing, looping just enough to stay functional but not enough to be efficient. Latency creeps up. Costs follow. Nothing crashes, so nothing alerts. You just notice that things feel… off.
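One way to make this kind of quiet runaway visible is to count model calls per request and flag repeated agent-to-agent hops. The sketch below is a minimal illustration under stated assumptions: `RequestBudget`, its thresholds, and the agent names are hypothetical and are not part of CrewAI, AutoGen, or LangGraph.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RequestBudget:
    """Counts model calls for one request and flags runaway behavior.

    Hypothetical helper for illustration; thresholds are assumptions
    and would need tuning per workload.
    """
    max_calls: int = 10    # total model calls allowed per request
    max_repeats: int = 3   # same caller -> callee hop seen more often = likely loop
    calls: int = 0
    hops: Counter = field(default_factory=Counter)

    def record(self, caller: str, callee: str) -> list:
        """Record one model call and return any warnings it triggers."""
        self.calls += 1
        self.hops[(caller, callee)] += 1
        warnings = []
        if self.calls > self.max_calls:
            warnings.append(
                f"call budget exceeded: {self.calls} > {self.max_calls}"
            )
        repeats = self.hops[(caller, callee)]
        if repeats > self.max_repeats:
            warnings.append(f"possible loop: {caller} -> {callee} x{repeats}")
        return warnings


# Usage: three identical hops trip the loop warning before anything crashes.
budget = RequestBudget(max_calls=5, max_repeats=2)
alerts = []
for _ in range(3):
    alerts = budget.record("planner", "retriever")
print(alerts)
```

The point of the design is that the warning fires on behavior (call count, repeated hops) rather than on an error, which is exactly the case where nothing would otherwise alert.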


Or worse, everything appears to work, but the answer is subtly wrong. One agent times out, another compensates, a third fills in gaps with partial context. By the time you see the output, the failure is buried somewhere deep in a chain of decisions you can’t easily reconstruct.

Then, there is data. Not a single obvious leak, but a gradual propagation. One agent reads something sensitive, another summarizes it, a third includes it in a prompt to an external model. At no point does anything look explicitly dangerous, yet the system as a whole crosses boundaries it shouldn’t.
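Gradual propagation of this kind can be made explicit by attaching sensitivity labels to data and letting every derived value inherit them. The following is a rough sketch, assuming a simple label-inheritance model; `Tagged`, `derive`, `check_egress`, the `"pii"` label, and the destination name are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A piece of text plus the sensitivity labels it has picked up."""
    text: str
    labels: frozenset = frozenset()


def derive(new_text: str, *sources: Tagged) -> Tagged:
    """Build a derived value (summary, prompt) that inherits every source label."""
    inherited = frozenset().union(*(s.labels for s in sources))
    return Tagged(new_text, inherited)


def check_egress(value: Tagged, destination: str, blocked: frozenset) -> None:
    """Raise if a value carrying a blocked label is about to cross the boundary."""
    leaked = value.labels & blocked
    if leaked:
        raise PermissionError(
            f"{sorted(leaked)} data must not reach {destination}"
        )


# Usage: read -> summarize -> prompt. No single step looks dangerous,
# but the label survives each transformation and is caught at the boundary.
record = Tagged("sensitive source record", frozenset({"pii"}))
summary = derive("summary of the record", record)
prompt = derive("Answer using: " + summary.text, summary)
try:
    check_egress(prompt, "external-model", blocked=frozenset({"pii"}))
except PermissionError as err:
    print(err)
```

The check runs on the system-level path rather than on any one agent, which is where the boundary crossing actually happens.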

The common thread here is that nobody really sees what is going on.

Most teams try to bolt on the tools they already have. Logs, traces, maybe some prompt capture. That helps at the edges, but it doesn’t answer the core question: how did the system actually arrive at this outcome?

Agent systems aren’t just distributed systems with more API calls. They behave more like evolving execution graphs, where decisions are made dynamically and paths change depending on intermediate results. Watching individual calls is like looking at a single stack frame and trying to infer the entire program.


What is missing is visibility at the level where these systems actually operate.

You need to see how a request unfolds across agents, how deep the reasoning chain goes, where it branches, and where it loops back on itself. You need to understand not just that tokens were consumed, but why they kept growing across steps. And you need to track how data moves – not just where it started, but how it was transformed and where it ultimately ended up.
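As a minimal sketch of what graph-level visibility could look like, assume each agent step is recorded with a parent pointer and a token count. From that alone you can recover chain depth, branch points, agents that are revisited on the same chain, and how token usage grows with depth. The `Step` record and `summarize` function are hypothetical names, not an API of any framework mentioned here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """One agent step in a request (hypothetical record shape)."""
    step_id: int
    agent: str
    parent_id: Optional[int]  # None for the root step of a request
    tokens: int


def summarize(steps: list) -> dict:
    """Summarize the shape of one request's execution graph."""
    by_id = {s.step_id: s for s in steps}
    children = defaultdict(list)
    for s in steps:
        if s.parent_id is not None:
            children[s.parent_id].append(s.step_id)

    def ancestry(step: Step) -> list:
        # Agent names from this step back to the root (this step first).
        path = [step.agent]
        while step.parent_id is not None:
            step = by_id[step.parent_id]
            path.append(step.agent)
        return path

    paths = [ancestry(s) for s in steps]
    tokens_by_depth = defaultdict(int)
    for s, p in zip(steps, paths):
        tokens_by_depth[len(p)] += s.tokens

    return {
        "max_depth": max((len(p) for p in paths), default=0),
        "branch_points": sorted(
            sid for sid, kids in children.items() if len(kids) > 1
        ),
        # An agent that already appears among its own ancestors has looped back.
        "revisited_agents": sorted({p[0] for p in paths if p[0] in p[1:]}),
        "tokens_by_depth": dict(tokens_by_depth),
    }


# Usage: the planner fans out to two agents, then is re-entered by the writer.
trace = [
    Step(1, "planner", None, 200),
    Step(2, "retriever", 1, 400),
    Step(3, "writer", 1, 500),
    Step(4, "planner", 3, 900),
]
print(summarize(trace))
```

None of these numbers are visible when you look at calls one at a time; they only appear once the steps are connected into a graph.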

Without that, you’re left debugging symptoms. A slow response here, a higher bill there, an occasional wrong answer. The underlying behavior remains opaque.

What is especially interesting is that these systems do develop patterns over time. Even though they’re not deterministic, they’re not random either. Certain flows become common, certain depths of reasoning become typical. That baseline is incredibly useful because the real signal is when the system deviates from it. When an agent suddenly takes a path it never took before, or starts accessing data it normally wouldn’t, or expands a reasoning chain far beyond its usual shape.

That’s where monitoring should live – not in static rules, but in understanding the system’s normal behavior well enough to recognize when it drifts.
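A very small version of this idea can be sketched with the standard library: learn which agent paths and chain depths are typical, then report when a new request departs from them. This is an assumption-heavy illustration, not a recommended detector. The `Baseline` class, the z-score threshold, and the agent names are hypothetical, and a real system would likely need a richer model of normal behavior.

```python
from statistics import mean, stdev


class Baseline:
    """Learns typical agent paths and depths, then flags departures.

    Hypothetical sketch: a z-score on chain depth plus a set of
    previously seen paths stands in for a learned model of "normal".
    """

    def __init__(self, z_threshold: float = 3.0):
        self.z_threshold = z_threshold
        self.depths = []
        self.seen_paths = set()

    def learn(self, path: list) -> None:
        """Fold one observed request path into the baseline."""
        self.depths.append(len(path))
        self.seen_paths.add(tuple(path))

    def deviations(self, path: list) -> list:
        """Return the ways this path departs from what was learned."""
        findings = []
        if tuple(path) not in self.seen_paths:
            findings.append("new path")
        if len(self.depths) >= 2:
            mu, sigma = mean(self.depths), stdev(self.depths)
            if sigma > 0 and abs(len(path) - mu) / sigma > self.z_threshold:
                findings.append("unusual depth")
        return findings


# Usage: train on a few typical requests, then check a much longer chain.
baseline = Baseline()
for observed in (
    ["planner", "retriever", "writer"],
    ["planner", "retriever", "writer"],
    ["planner", "writer"],
    ["planner", "retriever", "writer"],
):
    baseline.learn(observed)

runaway = ["planner", "retriever"] * 4
print(baseline.deviations(runaway))
print(baseline.deviations(["planner", "writer"]))
```

The thresholds here are derived from observed behavior rather than hand-written per agent, which is the distinction the argument above is drawing.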

The question isn’t whether agents need monitoring. It’s whether we’re willing to treat them like the systems they’ve already become.

Right now, most teams aren’t, and that needs fixing.

Moshe Bar is a serial entrepreneur. He was previously co-founder of Qumranet (sold to Red Hat) which created the industry standard KVM hypervisor, which today powers nearly all cloud offerings. He also co-founded software company XenSource, the makers of the...
TNS owner Insight Partners is an investor in: CrewAI.