Definitions matter, because “agent” is the most abused word in enterprise software.

3.1 A working definition

DEFINITION
Agentic infrastructure operations (AgenticOps) is an operating model in which autonomous AI agents carry out the core loop of operational work (detecting conditions, analyzing causes, resolving issues, and validating outcomes) across cloud and on-premises infrastructure, under explicit human-defined policy, with humans supervising on the loop rather than executing in the loop.
Unpacking the definition: an agent in this sense is not a chatbot with a runbook, and not a script with an LLM bolted on. A true operations agent has five properties:
  1. Goal-directed. It is given outcomes (“keep checkout latency under 300ms”; “keep monthly cloud spend within budget”), not step-by-step instructions.
  2. Perceptive. It continuously consumes telemetry — metrics, logs, traces, events, configuration state, cost data — rather than waiting to be prompted.
  3. Reasoning. It forms and tests causal hypotheses, weighs alternative remediations, and explains its thinking in language an engineer can audit.
  4. Tool-using. It acts through the same interfaces engineers use — cloud APIs, kubectl, Terraform, SQL, CI/CD — with scoped, auditable credentials.
  5. Self-verifying. After acting, it checks whether the intended outcome was achieved, and escalates or rolls back when it wasn’t.
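The five properties map onto a small control loop. The following is a minimal sketch, not a reference implementation: `Goal`, `Remediation`, `run_once`, and the simulated `telemetry` dict are hypothetical names introduced here for illustration, and the `diagnose` callback stands in for whatever reasoning component a real platform would use.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Goal:
    """An outcome to maintain (property 1), not a list of steps."""
    metric: str
    max_value: float


@dataclass
class Remediation:
    """A candidate fix plus the way to undo it (hypothetical shape)."""
    description: str
    apply: Callable[[], None]      # tool call made by the agent (property 4)
    rollback: Callable[[], None]   # how to revert if validation fails


def run_once(
    goal: Goal,
    read_metric: Callable[[str], float],
    diagnose: Callable[[Goal, float], Optional[Remediation]],
    escalate: Callable[[str], None],
) -> str:
    """Run one detect -> analyze -> resolve -> validate pass."""
    observed = read_metric(goal.metric)              # detect (property 2)
    if observed <= goal.max_value:
        return "healthy"

    remediation = diagnose(goal, observed)           # analyze (property 3)
    if remediation is None:
        escalate(f"{goal.metric}={observed}: no remediation found")
        return "escalated"

    remediation.apply()                              # resolve (property 4)

    if read_metric(goal.metric) <= goal.max_value:   # validate (property 5)
        return "resolved"
    remediation.rollback()
    escalate(f"{remediation.description} did not restore {goal.metric}; rolled back")
    return "rolled_back"


# Simulated environment: a dict stands in for live telemetry.
telemetry = {"checkout_p95_ms": 450.0}
goal = Goal(metric="checkout_p95_ms", max_value=300.0)
scale_out = Remediation(
    description="scale checkout replicas",
    apply=lambda: telemetry.update(checkout_p95_ms=220.0),
    rollback=lambda: telemetry.update(checkout_p95_ms=450.0),
)
print(run_once(goal, telemetry.__getitem__, lambda g, v: scale_out, print))
```

In this simulation the remediation brings the metric under the threshold, so the pass ends as `resolved`. A remediation that failed to do so would be rolled back and escalated instead.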

3.2 The autonomy spectrum

Autonomy is not binary. Mature agentic platforms expose autonomy as a policy dial, typically per action class and per environment:
| Level | Name | Agent behavior | Human role |
| --- | --- | --- | --- |
| L0 | Observe | Monitors and reports; takes no action | Executes everything |
| L1 | Advise | Investigates and recommends with evidence | Decides and executes |
| L2 | Act with approval | Prepares full remediation; waits for sign-off | One-click approve/reject |
| L3 | Act with notification | Executes pre-approved action classes; informs humans | Reviews after the fact |
| L4 | Autonomous in domain | Owns a bounded domain end-to-end within policy | Sets policy; audits outcomes |
In practice, organizations run different levels simultaneously: L3–L4 for reversible, low-blast-radius actions (restart a pod, clear a cache, scale a replica set, rotate a credential), L2 for consequential changes (schema migrations, security group changes, failovers), and L1 for anything novel. The art of agentic operations is moving action classes up the ladder as evidence accumulates — never faster.
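A policy dial of this kind can be pictured as a lookup keyed by action class and environment. The sketch below is illustrative only: the `POLICY` table, the action-class and environment names, and the `next_step` return values are assumptions made for this example, not any vendor's configuration format.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    """The L0-L4 levels from the table above."""
    OBSERVE = 0
    ADVISE = 1
    ACT_WITH_APPROVAL = 2
    ACT_WITH_NOTIFICATION = 3
    AUTONOMOUS_IN_DOMAIN = 4


# Hypothetical policy table keyed by (action_class, environment).
POLICY = {
    ("restart_pod", "staging"): Autonomy.AUTONOMOUS_IN_DOMAIN,
    ("restart_pod", "production"): Autonomy.ACT_WITH_NOTIFICATION,
    ("schema_migration", "production"): Autonomy.ACT_WITH_APPROVAL,
}

# Anything not listed is treated as novel and falls back to L1.
DEFAULT_LEVEL = Autonomy.ADVISE


def autonomy_for(action_class: str, environment: str) -> Autonomy:
    """Look up the autonomy level for one action class in one environment."""
    return POLICY.get((action_class, environment), DEFAULT_LEVEL)


def next_step(action_class: str, environment: str) -> str:
    """Translate the autonomy level into what the agent does next."""
    level = autonomy_for(action_class, environment)
    if level >= Autonomy.ACT_WITH_NOTIFICATION:
        return "execute"          # L3/L4: act, then inform or log
    if level == Autonomy.ACT_WITH_APPROVAL:
        return "await_approval"   # L2: prepare the change, wait for sign-off
    if level == Autonomy.ADVISE:
        return "recommend"        # L1: investigate and advise only
    return "report"               # L0: observe and report


print(next_step("restart_pod", "production"))
print(next_step("schema_migration", "production"))
print(next_step("dns_cutover", "production"))
```

Defaulting unknown pairs to L1 mirrors the rule that anything novel stays advisory until a human places it on the dial.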
BIG TECH PRACTICE: THE SPECTRUM IS NOW PRODUCT REALITY
The L0–L4 spectrum is not a theoretical construct; it is how the hyperscalers ship. Google’s Gemini Cloud Assist proactive investigations run at L1 by explicit design (investigate everything, change nothing). AWS’s own adoption guidance for DevOps Agent is to start in recommendation-only mode and measure for weeks before granting action. Azure SRE Agent exposes the dial directly: a Review mode where every action awaits an “Approve” click, and a privileged mode for pre-authorized action classes, governed per tool. When all three clouds independently converge on the same graduated-autonomy posture, that is the industry’s collective answer to how much trust an agent starts with: none. It earns it.
Figure 3 — The autonomy dial: action classes graduate from L0 to L4 on evidence, per environment.
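Graduating an action class "on evidence" implies some explicit promotion rule. The sketch below shows one possible shape, assuming that evidence is summarized as counts of samples, rollbacks, and human overrides. The `Evidence` fields and the thresholds `MIN_SAMPLES` and `MAX_FAILURE_RATE` are hypothetical values chosen for illustration, not recommended settings.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Track record of one action class at its current level (hypothetical)."""
    samples: int          # reviewed recommendations or executed actions
    rollbacks: int        # actions that had to be reverted
    human_overrides: int  # times a human rejected or corrected the agent


MIN_SAMPLES = 50          # assumed minimum track record before promotion
MAX_FAILURE_RATE = 0.02   # assumed tolerance for rollbacks plus overrides
MAX_LEVEL = 4             # L4 is the top of the ladder


def next_level(current: int, evidence: Evidence) -> int:
    """Return the level an action class should run at next.

    Promotion is at most one step per review, and only when the
    track record is both long enough and clean enough.
    """
    if current >= MAX_LEVEL or evidence.samples < MIN_SAMPLES:
        return current
    failures = evidence.rollbacks + evidence.human_overrides
    if failures / evidence.samples > MAX_FAILURE_RATE:
        return current
    return current + 1


print(next_level(2, Evidence(samples=100, rollbacks=1, human_overrides=0)))
print(next_level(2, Evidence(samples=10, rollbacks=0, human_overrides=0)))
```

Capping promotion at one level per review is a deliberate design choice here: it keeps an action class from jumping from L1 to L4 on a single good batch of results.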

3.3 What agentic operations is not

“Agent washing” is now common enough that Gartner has named it: vendors rebranding assistants, chatbots, and RPA as “agents” without meaningful agentic capability. In mid-2025, Gartner estimated that of the thousands of vendors claiming agentic AI, only around 130 were real. A precise negative definition is therefore a buyer’s best defense:
  1. Not a chatbot over your dashboards. Conversational access to telemetry is a feature, not the model. If a human must read the answer and then go do the work, you are still in Gen 3 — whatever the marketing says.
  2. Not lights-out operations. No credible practitioner advocates removing humans. The target is human leverage: one engineer supervising work that used to take a team.
  3. Not a replacement for engineering discipline. Agents amplify the environment they are given. Weak observability, absent IaC, and undocumented systems produce weak agents. Garbage context in, garbage autonomy out.
  4. Not one giant model that does everything. As the next chapter shows, production systems are converging on orchestrated teams of specialists, not monolithic super-models.
THE FIVE-QUESTION VENDOR TEST
Ask any “agentic” vendor:
  1. Can the system execute a remediation end-to-end, or only recommend?
  2. Does it verify its own outcomes and roll back on failure?
  3. Can autonomy be set per action class and per environment?
  4. Does every action carry a full, immutable reasoning trail?
  5. What were its rollback and intervention rates in its last three production deployments?
A real platform answers all five with evidence. Agent washing fails by question two.
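Question five is only answerable if the platform keeps a per-action record. As a rough sketch of the arithmetic, assuming each executed action is logged with whether it was rolled back and whether a human had to step in (the `ActionRecord` fields and the `rates` function are hypothetical names, not a standard schema):

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ActionRecord:
    """One executed agent action, as a vendor's audit log might record it."""
    action_class: str
    rolled_back: bool
    human_intervened: bool


def rates(records: List[ActionRecord]) -> Dict[str, float]:
    """Compute rollback and intervention rates over a set of actions."""
    if not records:
        raise ValueError("no actions recorded; rates are undefined")
    total = len(records)
    return {
        "rollback_rate": sum(r.rolled_back for r in records) / total,
        "intervention_rate": sum(r.human_intervened for r in records) / total,
    }


log = [
    ActionRecord("restart_pod", rolled_back=False, human_intervened=False),
    ActionRecord("restart_pod", rolled_back=False, human_intervened=True),
    ActionRecord("scale_replicas", rolled_back=True, human_intervened=True),
    ActionRecord("clear_cache", rolled_back=False, human_intervened=False),
]
print(rates(log))  # one rollback and two interventions out of four actions
```

A vendor that cannot produce the equivalent of `log` for a real deployment has, in effect, already answered questions four and five.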

3.4 The scope of operational work agents can own today

| Domain | Representative agent tasks | Typical autonomy (2026) |
| --- | --- | --- |
| Incident response | Triage, correlation, root-cause analysis, remediation, post-incident reports | L1–L3 |
| Cloud cost (FinOps) | Rightsizing, idle-resource cleanup, commitment planning, anomaly detection | L2–L4 |
| Kubernetes operations | Pod/node health, resource tuning, upgrade assistance, capacity planning | L2–L3 |
| Database operations | Slow-query analysis, index advice, replication health, storage forecasting | L1–L3 |
| Security operations | Misconfiguration detection, CVE triage, IAM hygiene, compliance evidence | L1–L2 |
| Change & release | Pre-deploy risk analysis, canary monitoring, automated rollback | L2–L3 |
| Infrastructure as Code | Drift detection, module generation, plan review, state hygiene | L1–L3 |