Operational fatigue, in the face of increasing complexity and risks, is a real problem. Can partnerships with skills-based AI agents offer a solution?
AI has quickly become a trusted collaborator or “copilot” throughout the software development lifecycle. Particularly in the operations space, sysadmins, DevOps, and site reliability engineering (SRE) teams have embraced conversational, prompt-based AI to aid in the still overwhelmingly manual execution of incident response. Generative AI is enabling operations and security teams to shift further from TicketOps.
Until now, the security, compliance, and always-on requirements of most ops teams have left them reluctant to move to the next stage of agentic AI. That could be about to change.
In the face of enterprise IT complexity and sprawl, Phanidhar Koganti, senior distinguished technologist in Hewlett Packard Enterprise (HPE) hybrid cloud, tells The New Stack that ops is entering its “Agentic Era,” where AI agents have specialized knowledge, capabilities, and workflows, referred to as agent skills. These agent fleets work to bridge persistent enterprise data and operational silos and, when expressly permitted and auditable, can take autonomous actions based on goal-oriented reasoning.
“The AI is able to point them in the right direction,” Koganti explains, but then “the human operator has to build trust by verifying.”
In his whitepaper “Copilots to Operators: The Agentic Evolution of Enterprise IT,” Koganti contends that this change must occur with the human operator in the loop, serving as the orchestrator.
HPE is releasing an enterprise-grade, multi-domain agentic operations system, including its agentic operations copilot, now in beta, as part of the OpsRamp IT operations management platform. Expected to go generally available later in 2026, this agentic ops application has, for some early adopters, cut time to root cause by at least half.
AI, as the amplifier of everything, has only made establishing AI for DevOps more urgent, as always-short-staffed operations teams scramble — and sometimes fail — to keep up with the speed of AI-produced code, and its inherent security risks. AI is likely the solution to this problem, as the data show.
Fewer than half of enterprises believe they are operationally prepared for AI adoption across infrastructure, data, risk, and talent. Which means so much of the success or failure of AI at scale rests on the shoulders of already overworked operations teams.
Respondents to a recent study of cybersecurity and operations leaders said the most pressing issues (they could select more than one) are:
Osterman Research finds that 40% of alerts in large enterprises are never investigated due to sheer volume, while 73% of organizations experienced outages in 2025 that were directly linked to these ignored or suppressed alerts.
This increases exponentially alongside system complexity.
For the majority of enterprises taking the hybrid or multi-cloud route, a staggering two-thirds lack confidence in real-time threat detection and response capabilities. This technological complexity is a direct driver of emotional exhaustion. While engineers are likely to push through in the short term, it creates a cognitive drag that leads to long-term attrition. These highly specialized ops roles have always been tough to fill, and organizations are losing important shared institutional information.
Beyond employee retention, ops burnout also negatively impacts productivity and incident response time, increasing the likelihood of avoidable mistakes.
All while cybersecurity risks and code-generation speed are way up. It’s more code, more alerts, and simply not enough people.
Agentic AI for DevOps — the application of agentic AI solutions to operational tasks — offers an opportunity to help human operators lighten their workload, reduce alert noise, and dramatically improve response time.
But AI isn’t a silver bullet. Instead of reducing manual triage, many AI tools increase alert noise, which further erodes trust in the technology. A worrying 66% of AI tools are known to generate false positives, which only increases stress and errors. Stale data within models and a lack of transparency in how AI makes decisions are among the reasons for these false positives.
To create transparency across complex, distributed systems, any enterprise-grade operational agentic AI solution must break down cross-organizational data silos. Platform engineering has emerged as the preferred pathway not only to unite disparate data sets but also to establish guardrails and gates for quality, security, and compliance — for both human and agentic developers.
The HPE whitepaper contends that, when it’s done right, agentic operations can:
Results from the HPE beta program for its agentic operations copilot show that AI agents make particularly good partners for root cause analysis, helping overcome blind spots. An ops team simply cannot know every release that happened in an enterprise environment across any given week, while machines don’t sleep and AI is particularly good at pattern recognition, as well as cross-organizational memory.
“During our beta program, a lot of our customers have told us that many issues that happen will typically be related to a change they made four or five days previously,” Koganti says. “They explicitly want us to track the changes they are making and take that as an additional context when agentically root cause analyzing a particular issue.”
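To make the idea of change tracking as investigation context concrete, here is a minimal sketch of pulling recent changes for an affected resource ahead of root cause analysis. It is not HPE's or OpsRamp's implementation. The `ChangeEvent` schema, the `recent_changes` helper, the resource names, and the seven-day lookback window are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    """A recorded change such as a deploy or config edit (hypothetical schema)."""
    resource: str
    description: str
    applied_at: datetime


def recent_changes(changes, resource, incident_at, lookback_days=7):
    """Return changes to `resource` made within the lookback window
    before the incident, most recent first."""
    window_start = incident_at - timedelta(days=lookback_days)
    hits = [
        c for c in changes
        if c.resource == resource and window_start <= c.applied_at <= incident_at
    ]
    return sorted(hits, key=lambda c: c.applied_at, reverse=True)


# Illustrative data only; none of these resources or changes come from the article.
changes = [
    ChangeEvent("vm-cluster-a", "hypervisor patch", datetime(2026, 3, 2, 9, 0)),
    ChangeEvent("vm-cluster-a", "storage policy edit", datetime(2026, 2, 10, 14, 0)),
    ChangeEvent("edge-router-1", "firmware upgrade", datetime(2026, 3, 5, 22, 0)),
]
incident_at = datetime(2026, 3, 6, 8, 30)
context = recent_changes(changes, "vm-cluster-a", incident_at)
```

In this sketch, only the hypervisor patch applied four days before the incident lands in `context`. The older storage policy edit and the change to an unrelated resource are filtered out.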
The whitepaper outlines the planning stages of how an agentic operator investigates across its root cause analysis:
SRE, DevOps, and sysadmin teams bring important institutional knowledge that is also fed back into the agentic memory, enabling both agents and humans to improve their cross-organizational understanding.
The trick, Koganti argues, is not to apply a general large language model (LLM) to the specifics of enterprise operations. That’s where operational agent skills come in.
“You are not giving it 100% of the details, but you’re giving it high-level guidance on the skeleton. In the operations world, let’s say you get a particular type of alert with a particular symptom, like virtualization issues, then you know you have a knowledge or a skill saying that: For these kinds of alerts related to virtualization, you want to go and look at the CPU utilization in the VM and look at the storage IO with respect to a particular other detail and so on,” Koganti explains. “Providing high-level directional guidance, captured in skills,” is necessary, “because all this agentic stuff, if you leave it 100% to LLMs, they hallucinate anything.”
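One way to picture a skill as "high-level guidance on the skeleton" is a lookup from alert category to an ordered list of checks. The sketch below is illustrative only. The `Skill` class, the `SKILLS` registry, and the `investigation_plan` function are hypothetical names, not part of any HPE product, and the two checks are taken from the virtualization example Koganti describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """High-level investigation guidance for one alert category (hypothetical)."""
    category: str
    checks: tuple


# A hypothetical registry of curated skills, keyed by alert category.
SKILLS = {
    "virtualization": Skill(
        category="virtualization",
        checks=("VM CPU utilization", "storage I/O"),
    ),
}


def investigation_plan(alert_category):
    """Return the curated checks for a category, or an empty plan so the
    caller can escalate to a human instead of letting a model improvise."""
    skill = SKILLS.get(alert_category)
    return list(skill.checks) if skill else []


plan = investigation_plan("virtualization")
unknown = investigation_plan("dns")
```

The design choice worth noting is the empty plan for unknown categories: where no curated guidance exists, this sketch hands the decision back to the operator rather than leaving it entirely to an LLM.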
Agent skills are already popular among developers. HPE is trying to bring them to operations.
“That’s a unique thing, and we believe it’s only a matter of time until the rest of the vendors in the market will also align with that, similar to how Infrastructure as Code was adopted primarily from the developer side of the ecosystem at first,” he continues. HPE is looking to extend its curated ops skills beyond root cause analysis and incident investigation to include specific ones for virtualization and networking.
AI in ops has to work to close the trust gap. To satisfy compliance, cybersecurity, and operator demands, AI agents must be able to explain and substantiate their thought processes.
With this in mind, HPE’s brand of autonomous operators is being built with an audit trail, reasoning, and observability.
“Operators do get burnt out, especially in high pressure moments when these issues typically happen, and they do make a lot of mistakes, whereas the machine doesn’t miss a piece of data, doesn’t make any mistakes in gathering the right pieces of data, as well as doing a very fast and objective analysis,” says Koganti, on the value of agentic root cause analysis.
However, the HPE team is not going all in on agentic-driven remediation just yet. The AI operations agent will make a suggestion, but it won’t act without permission. Even so, this approach can cut the often-frustrating time to discover the root cause by up to half.
“The actual remediation, which involves, perhaps, touching the particular deployment — let’s say you want to reboot something — is up to the operator. OpsRamp does have the ability to automatically trigger selective fixes,” he continues, “that must be configured by the human. None of our agents will take autonomous actions. It is policy-driven, and that policy will be that it is human-configured.”
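The policy-driven model described here, where an agent suggests and a human-configured policy decides, can be sketched as a simple approval gate with an audit trail. This is an assumption-laden illustration rather than how OpsRamp works. `RemediationGate`, its fields, and the action names are hypothetical, and the sketch only records decisions without performing any remediation.

```python
from dataclasses import dataclass, field


@dataclass
class RemediationGate:
    """Approves a proposed fix only when a human-configured policy or an
    operator allows it, and records every decision (hypothetical design)."""
    auto_approved: set = field(default_factory=set)  # configured by operators, not agents
    audit_log: list = field(default_factory=list)

    def propose(self, action, target, operator_approved=False):
        allowed = action in self.auto_approved or operator_approved
        decision = "approved" if allowed else "pending_approval"
        # Every proposal is logged, whether or not it is approved.
        self.audit_log.append((action, target, decision))
        return decision


# Operators have pre-approved one low-risk action; everything else waits.
gate = RemediationGate(auto_approved={"restart_service"})
first = gate.propose("restart_service", "billing-api")
second = gate.propose("reboot_host", "vm-cluster-a")
third = gate.propose("reboot_host", "vm-cluster-a", operator_approved=True)
```

Here the pre-approved restart goes through, the reboot waits until an operator signs off, and all three proposals appear in `audit_log` for later review.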
As the report contends, by adopting agentic skills, enterprises are beginning to move away from reactive fixes toward the proactive building of systems that fix themselves.
Learn more about HPE’s agentic operations copilot feature in its new whitepaper, “Copilots to Operators: The Agentic Evolution of Enterprise IT.”