URL: https://thenewstack.io/hpe-agentic-ai-ops-burnout/

HPE's AI agents cut root cause analysis time in half - The New Stack



HPE’s AI agents cut root cause analysis time in half

HPE's new agentic AI operations system cuts root cause analysis time in half by using skills-based AI agents that work alongside human operators rather than replacing them.
Mar 25th, 2026 7:14am by Jennifer Riggins
HPE sponsored this post.

Operational fatigue, in the face of increasing complexity and risks, is a real problem. Can partnerships with skills-based AI agents offer a solution?

AI has quickly become a trusted collaborator or “copilot” throughout the software development lifecycle. Particularly in the operations space, sysadmins, DevOps, and site reliability engineering (SRE) teams have embraced conversational, prompt-based AI to aid in the still overwhelmingly manual execution of incident response. Generative AI is enabling operations and security teams to shift further from TicketOps.

Until now, the security, compliance, and always-on requirements of most ops teams have left them reluctant to move to the next stage of agentic AI. That could be about to change.

In the face of enterprise IT complexity and sprawl, Phanidhar Koganti, senior distinguished technologist in Hewlett Packard Enterprise (HPE) hybrid cloud, tells The New Stack that ops is entering its “Agentic Era,” where AI agents have specialized knowledge, capabilities, and workflows, referred to as agent skills. These agent fleets work to bridge persistent enterprise data and operational silos and, when expressly permitted and auditable, can take autonomous actions based on goal-oriented reasoning.

“The AI is able to point them in the right direction,” Koganti explains, but then “the human operator has to build trust by verifying.” 

In his whitepaper “Copilots to Operators: The Agentic Evolution of Enterprise IT,” Koganti contends that this change must occur with the human operator in the loop, serving as the orchestrator.

HPE is releasing an enterprise-grade, multi-domain agentic operations system, including its agentic operations copilot, now in beta, as part of the OpsRamp IT operations management platform. Expected to go generally available later in 2026, this agentic ops application has, for some early adopters, cut time to root cause by at least half.

AI, as the amplifier of everything, has only made establishing AI for DevOps more urgent, as always-short-staffed operations teams scramble — and sometimes fail — to keep up with the speed of AI-produced code, and its inherent security risks. AI is likely the solution to this problem, as the data show.

Pressure is on for ops teams

Fewer than half of enterprises believe they are operationally prepared for AI adoption across infrastructure, data, risk, and talent. That means much of the success or failure of AI at scale rests on the shoulders of already overworked operations teams.

Respondents to a recent study of cybersecurity and operations leaders named these as their most pressing issues (they could select more than one):

  • Alert fatigue – 76%
  • Burnout and staffing shortages – 73%
  • Manual and time-consuming alert investigations – 64%
  • Tool sprawl and complexity – 59%
  • Evolving threats outpacing detection – 55%

Osterman Research finds that 40% of alerts in large enterprises are never investigated due to sheer volume, while 73% of organizations experienced outages in 2025 that were directly linked to these ignored or suppressed alerts.

This alert overload increases exponentially alongside system complexity.

Among the majority of enterprises taking the hybrid or multi-cloud route, a staggering two-thirds lack confidence in their real-time threat detection and response capabilities. This technological complexity is a direct driver of emotional exhaustion. While engineers are likely to push through in the short term, that exhaustion creates a cognitive drag that leads to long-term attrition. These highly specialized ops roles have always been tough to fill, and organizations are losing important shared institutional knowledge.

Beyond employee retention, ops burnout also negatively impacts productivity and incident response time, increasing the likelihood of avoidable mistakes.

All while cybersecurity risks and code-generation speed are way up. It’s more code, more alerts, and simply not enough people.

Agentic root cause analysis

Agentic AI for DevOps — the application of agentic AI solutions to operational tasks — offers an opportunity to help human operators lighten their workload, reduce alert noise, and dramatically improve response time. 

But AI isn’t a silver bullet. Instead of reducing manual triage, many AI tools increase alert noise, which further erodes trust in the technology. A worrying 66% of AI tools are known to generate false positives, which only increases stress and errors. Stale data within models and a lack of transparency in how AI makes decisions are among the reasons for these false positives.

To create transparency across complex, distributed systems, any enterprise-grade operational agentic AI solution must break down cross-organizational data silos. Platform engineering has emerged as the preferred pathway not only to unite disparate data sets but also to establish guardrails and gates for quality, security, and compliance — for both human and agentic developers.

The HPE whitepaper contends that, when it’s done right, agentic operations can:

  • Overcome ops silos with persona-based explainability 
  • Bridge data silos while reducing data duplication
  • Enable proactive operations with multi-variate predictive analytics, like for adaptive thresholds 
  • Reduce operator burnout 
  • Avoid blind spots 
  • Track changes with auditability 

Results from the HPE beta program for its agentic operations copilot show that AI agents make particularly good partners for root cause analysis, helping overcome blind spots. An ops team simply cannot know every release that happened in an enterprise environment across any given week, while machines don’t sleep and AI is particularly good at pattern recognition, as well as cross-organizational memory. 

“During our beta program, a lot of our customers have told us that many issues that happen will typically be related to a change they made four or five days previously,” Koganti says. “They explicitly want us to track the changes they are making and take that as an additional context when agentically root cause analyzing a particular issue.”
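The change-tracking idea in that quote can be sketched in a few lines. This is an illustration only, not HPE's implementation: the `ChangeEvent` schema, the `recent_changes` helper, the resource names, and the five-day default window are all assumptions chosen to mirror the "four or five days" Koganti mentions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    """A recorded change such as a deploy or config edit (hypothetical schema)."""
    resource: str
    description: str
    applied_at: datetime


def recent_changes(changes, resource, incident_at, lookback_days=5):
    """Return changes to `resource` applied within the lookback window
    before the incident, most recent first."""
    window_start = incident_at - timedelta(days=lookback_days)
    hits = [
        c for c in changes
        if c.resource == resource and window_start <= c.applied_at <= incident_at
    ]
    return sorted(hits, key=lambda c: c.applied_at, reverse=True)


# Made-up change history for illustration.
changes = [
    ChangeEvent("vm-cluster-a", "hypervisor patch", datetime(2026, 3, 10, 9, 0)),
    ChangeEvent("vm-cluster-a", "storage policy edit", datetime(2026, 3, 16, 14, 0)),
    ChangeEvent("vm-cluster-b", "network ACL update", datetime(2026, 3, 17, 8, 0)),
]
incident_at = datetime(2026, 3, 20, 11, 30)

# Changes an investigation could take as additional context.
context = recent_changes(changes, "vm-cluster-a", incident_at)
```

Here only the storage policy edit falls inside the window for the affected resource, so it is the one change handed to the investigation as extra context.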

The whitepaper outlines the stages an agentic operator works through during root cause analysis:

  • OODA feedback loops – observe, orient, decide, act
  • Hypothesis generation – including extraction of metrics and logs 
  • Agentic skill dispatch – for example, a “trace analysis skill” can be applied to isolate a faulty microservice, while a “metrics analysis skill” can be called upon to identify covariants and deviating patterns
  • Synthesis – the agent presents a narrative, both of what it has found to be the likely culprit, and what it has ruled out
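As a rough illustration of the last three stages, the sketch below tests hand-written hypotheses by dispatching each to a named skill, then synthesizes a narrative of what is likely and what was ruled out. Everything here is hypothetical: the `Hypothesis` type, the skill functions, the thresholds, and the evidence values are invented for the example, and a real agent would generate its hypotheses rather than receive them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Hypothesis:
    cause: str
    skill: str                        # name of the skill that can test it
    supported: Optional[bool] = None  # None until a skill has run


def trace_analysis_skill(evidence: dict) -> bool:
    # Supported if any service's error rate exceeds 5% (arbitrary cutoff).
    return any(rate > 0.05 for rate in evidence["error_rate_by_service"].values())


def metrics_analysis_skill(evidence: dict) -> bool:
    # Supported if CPU deviates from its baseline by more than 30 points.
    return abs(evidence["cpu_pct"] - evidence["cpu_baseline_pct"]) > 30


SKILLS: Dict[str, Callable[[dict], bool]] = {
    "trace_analysis": trace_analysis_skill,
    "metrics_analysis": metrics_analysis_skill,
}


def investigate(evidence: dict, hypotheses: List[Hypothesis]) -> str:
    # Skill dispatch: run the skill attached to each hypothesis.
    for h in hypotheses:
        h.supported = SKILLS[h.skill](evidence)
    # Synthesis: report both the likely culprit and what was ruled out.
    likely = [h.cause for h in hypotheses if h.supported]
    ruled_out = [h.cause for h in hypotheses if not h.supported]
    return (
        f"Likely: {', '.join(likely) or 'none'}. "
        f"Ruled out: {', '.join(ruled_out) or 'none'}."
    )


evidence = {
    "error_rate_by_service": {"checkout": 0.12, "catalog": 0.01},
    "cpu_pct": 55.0,
    "cpu_baseline_pct": 50.0,
}
hypotheses = [
    Hypothesis("faulty microservice", "trace_analysis"),
    Hypothesis("CPU saturation", "metrics_analysis"),
]
report = investigate(evidence, hypotheses)
```

Reporting the ruled-out causes alongside the likely one is what lets an operator verify the reasoning instead of taking the conclusion on faith.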

SRE, DevOps, and sysadmin teams bring important institutional knowledge that is also fed back into the agentic memory, enabling both agents and humans to improve their cross-organizational understanding.

Skills-based AI agents

The trick, Koganti argues, is not to apply a general large language model (LLM) to the specifics of enterprise operations. That’s where operational agent skills come in.  

“You are not giving it 100% of the details, but you’re giving it high-level guidance on the skeleton. In the operations world, let’s say you get a particular type of alert with a particular symptom, like virtualization issues, then you know you have a knowledge or a skill saying that: For these kinds of alerts related to virtualization, you want to go and look at the CPU utilization in the VM and look at the storage IO with respect to a particular other detail and so on,” Koganti explains. “Providing high-level directional guidance, captured in skills,” is necessary, “because all this agentic stuff, if you leave it 100% to LLMs, they hallucinate anything.”
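One way to picture a skill as "high-level guidance on the skeleton" is a small record that maps an alert category to an ordered list of checks. The sketch below is an assumption about shape only: the `Skill` type, the registry, and the step wording are invented here, with the two virtualization checks taken from Koganti's example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Skill:
    """Directional guidance for one alert category (hypothetical format)."""
    name: str
    alert_category: str
    checks: Tuple[str, ...]


VIRTUALIZATION_SKILL = Skill(
    name="virtualization-triage",
    alert_category="virtualization",
    checks=(
        "Check CPU utilization in the affected VM",
        "Check storage I/O for the affected VM",
    ),
)

SKILLS_BY_CATEGORY: Dict[str, Skill] = {
    VIRTUALIZATION_SKILL.alert_category: VIRTUALIZATION_SKILL,
}


def guidance_for(alert_category: str) -> List[str]:
    """Return numbered steps for an alert category, or an empty list
    when no curated skill exists for it."""
    skill = SKILLS_BY_CATEGORY.get(alert_category)
    if skill is None:
        return []
    return [f"{i}. {step}" for i, step in enumerate(skill.checks, start=1)]


steps = guidance_for("virtualization")
```

The returned steps would be supplied to the agent as guidance rather than as a complete script, which matches the point that the skill gives direction without 100% of the details.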

Agent skills are already popular among developers. HPE is trying to bring them to operations.

“That’s a unique thing, and we believe it’s only a matter of time until the rest of the vendors in the market will also align with that, similar to how Infrastructure as Code was adopted primarily from the developer side of the ecosystem at first,” he continues. HPE is looking to encode curated ops skills beyond root cause analysis and incident investigation, including specific ones for virtualization and networking.

Agentic auditability is key 

AI in ops has to work to close the trust gap. To satisfy compliance, cybersecurity, and operator demands, AI agents must be able to explain and substantiate their thought processes.

With this in mind, HPE’s brand of autonomous operators is being built with an audit trail, reasoning, and observability.

Full audit trail

  • Every conversation persists with tenant isolation
  • User attribution per message: who said or did what
  • All API calls are audit-logged through MCP tool invocations within the IT operations platform
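A minimal sketch of what tenant-isolated, user-attributed audit logging around a tool call might look like follows. It does not use any real MCP or OpsRamp API: `audited_call`, the in-memory `AUDIT_LOG`, the record fields, and the canned `get_cpu_utilization` tool are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

AUDIT_LOG: List[str] = []  # stand-in for an append-only audit store


def audited_call(tenant_id: str, user: str, tool_name: str,
                 tool: Callable[..., Any], **arguments: Any) -> Any:
    """Record who invoked which tool with what arguments, then run it.
    The record is written first so failed calls are logged too."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tenant_id": tenant_id,
        "user": user,
        "tool": tool_name,
        "arguments": arguments,
    }
    AUDIT_LOG.append(json.dumps(record, sort_keys=True))
    return tool(**arguments)


def entries_for_tenant(tenant_id: str) -> List[Dict[str, Any]]:
    """Tenant isolation on read: return only that tenant's records."""
    records = [json.loads(line) for line in AUDIT_LOG]
    return [r for r in records if r["tenant_id"] == tenant_id]


def get_cpu_utilization(vm: str) -> float:
    return 91.5  # canned value standing in for a real metrics query


value = audited_call("tenant-a", "alice", "get_cpu_utilization",
                     get_cpu_utilization, vm="vm-42")
audited_call("tenant-b", "bob", "get_cpu_utilization",
             get_cpu_utilization, vm="vm-7")
tenant_a_entries = entries_for_tenant("tenant-a")
```

Each record answers the attribution question directly: which user, in which tenant, asked for which data.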

Transparent reasoning

  • Hypotheses shown before conclusions
  • A step-by-step plan is visible to the user
  • Sources cited for every insight
  • Tool calls disclosed with what data was queried

Observability and traceability

  • OpenTelemetry-based agent execution traces
  • Decision path logging — why this agent, why this tool
  • Reproducible evaluations that ensure the same inputs result in the same reasoning path 

“Operators do get burnt out, especially in high pressure moments when these issues typically happen, and they do make a lot of mistakes, whereas the machine doesn’t miss a piece of data, doesn’t make any mistakes in gathering the right pieces of data, as well as doing a very fast and objective analysis,” says Koganti, on the value of agentic root cause analysis.

However, the HPE team is not going all in on agentic-driven remediation just yet. The AI operations agent will make a suggestion, but it won’t act without permission. Even so, this approach can cut the often-frustrating time to discover the root cause by up to half.

“The actual remediation, which involves, perhaps, touching the particular deployment — let’s say you want to reboot something — is up to the operator. OpsRamp does have the ability to automatically trigger selective fixes,” he continues, “that must be configured by the human. None of our agents will take autonomous actions. It is policy-driven, and that policy will be that it is human-configured.”
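The policy-driven gate Koganti describes can be sketched as a small decision function. The names (`RemediationPolicy`, `Suggestion`, `decide`) and the action strings are hypothetical, not OpsRamp's configuration model; the point is only that an action runs when a human has pre-configured it or approved it, and otherwise stays a suggestion.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class RemediationPolicy:
    """Human-configured set of actions allowed to run automatically.
    Empty by default, so nothing runs without a person opting in."""
    auto_approved_actions: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Suggestion:
    action: str
    target: str


def decide(suggestion: Suggestion, policy: RemediationPolicy,
           operator_approved: bool = False) -> str:
    """Return what happens to an agent's suggested remediation."""
    if suggestion.action in policy.auto_approved_actions:
        return "execute (allowed by human-configured policy)"
    if operator_approved:
        return "execute (approved by operator)"
    return "suggest only (awaiting operator approval)"


reboot = Suggestion(action="reboot", target="vm-42")
default_outcome = decide(reboot, RemediationPolicy())
configured_outcome = decide(
    reboot, RemediationPolicy(auto_approved_actions=frozenset({"reboot"}))
)
```

With the default policy the reboot stays a suggestion; it executes only once an operator approves it or a human has added the action to the policy.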

As the report contends, by adopting agentic skills, enterprises are beginning to move away from reactive fixes toward the proactive building of systems that fix themselves.

Learn more about HPE’s agentic operations copilot feature in its new whitepaper, “Copilots to Operators: The Agentic Evolution of Enterprise IT.”
