Chapter 2 · From Automation to Autonomy

Agentic operations is the fourth generation of a forty-year evolution. Understanding the lineage explains both its power and its prerequisites.

Figure 2 — Four generations of operations. Each absorbs the last; Gen 4 closes the loop.

2.1 Four generations of operations

Generation	Era	Core idea	Limitation
Gen 1 · Manual + scripts	1990s–2010	Humans operate; shell scripts handle repetitive steps	Everything waits on a person; tribal knowledge
Gen 2 · Infrastructure as Code	2010–2018	Declarative desired state; CI/CD pipelines; config management	Automates provisioning, not operations; drift and day-2 still manual
Gen 3 · AIOps	2017–2024	ML for anomaly detection, event correlation, noise reduction	Detects and correlates but does not decide or act; “so what?” gap
Gen 4 · Agentic Operations	2024–	Goal-directed agents that perceive, reason, act, and verify	Requires trust architecture, governance, and new operating models

Each generation absorbed the previous one rather than replacing it. Agentic operations runs on top of IaC (agents express changes as code), consumes AIOps-style signals (correlated events are agent input), and still produces scripts (agents write and execute them). What changes is who closes the loop.

2.2 Why AIOps fell short

AIOps deserves credit: event correlation and deduplication genuinely work, and intelligent correlation can eliminate 80–90% of raw alert volume. But the category over-promised. Gartner went as far as reframing the “AIOps Platforms” market as “Event Intelligence Solutions” in 2025, citing vendor overuse of the term and widespread disillusionment among I&O leaders. The technology persists — but the market itself acknowledged the gap between detecting an incident and resolving one. Three specific shortfalls defined the AIOps ceiling:

Correlation without causation. Grouping fifty alerts into one incident is useful; it still doesn’t tell you the root cause or what to do.
Black-box outputs. A majority of IT professionals report struggling to interpret ML outputs from deployed AIOps platforms. Conclusions without reasoning don’t earn trust.
No hands. Classical AIOps could open a ticket or trigger a webhook, but could not investigate, form hypotheses, choose among remediations, execute, and verify the fix. The human remained the actuator.
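The first shortfall above can be made concrete with a small sketch. It assumes a hypothetical alert shape (timestamp, service, message) and a simple fixed-window grouping rule; real AIOps platforms use far richer correlation logic, so treat this only as an illustration of what correlation does and does not produce.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical alert shape: (timestamp_seconds, service, message)
Alert = Tuple[int, str, str]


def correlate(alerts: List[Alert], window_s: int = 300) -> Dict[Tuple[str, int], List[Alert]]:
    """Group alerts that share a service and fall in the same fixed time bucket."""
    incidents = defaultdict(list)
    for alert in alerts:
        ts, service, _message = alert
        incidents[(service, ts // window_s)].append(alert)
    return dict(incidents)


# Fifty made-up alerts from one service within the same five-minute bucket.
alerts = [(i, "checkout", f"5xx spike #{i}") for i in range(50)]
incidents = correlate(alerts)
```

Here fifty alerts collapse into a single incident, which is useful noise reduction. Nothing in the resulting structure names a root cause or a next action, which is the gap the text describes.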

2.3 What changed: reasoning models, tools, and protocols

Three technical unlocks between 2023 and 2026 made the agentic generation possible.

Frontier reasoning models. Large language models crossed the threshold where they can read logs, configs, and code; form causal hypotheses; and plan multi-step remediations with engineer-level judgment in well-scoped domains.
Tool use and computer use. Models gained reliable function calling — the ability to run CLI commands, query APIs, execute kubectl and Terraform, and read dashboards — turning reasoning into action.
Interoperability standards. The Model Context Protocol (MCP) emerged as the de facto standard for connecting agents to tools and data sources, reaching tens of millions of downloads and a thousand-plus server ecosystem within months — the TCP/IP moment of the agent layer.
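As a rough illustration of the tool-use unlock described above, the sketch below shows how a structured tool request emitted by a model might be routed to an allow-listed function. Everything in it is hypothetical: the tool names, the JSON request shape, and the stubbed results are assumptions for illustration, and no real model API or MCP SDK is used.

```python
import json
from typing import Callable, Dict


# Hypothetical read-only tools. Real ones might wrap kubectl or a cloud API;
# these return canned data so the sketch runs anywhere.
def get_pod_status(namespace: str) -> dict:
    return {"namespace": namespace, "crashlooping": ["checkout-7f9c"]}


def get_recent_deploys(service: str) -> dict:
    return {"service": service, "last_deploy_minutes_ago": 12}


# Allow-list: the agent can only call what is registered here.
TOOLS: Dict[str, Callable[..., dict]] = {
    "get_pod_status": get_pod_status,
    "get_recent_deploys": get_recent_deploys,
}


def dispatch(tool_call_json: str) -> dict:
    """Run one model-requested tool call if the tool is on the allow-list."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](**call.get("arguments", {}))


result = dispatch('{"name": "get_pod_status", "arguments": {"namespace": "prod"}}')
```

The allow-list is the design choice worth noting: the model proposes a call, but only registered functions can run, and a request for anything else comes back as an error the model can read.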

Verifiability explains where agents succeed first. Infrastructure operations is a highly verifiable domain: a remediation either restores the SLO or it doesn’t; a Terraform plan either applies cleanly or it doesn’t; a health check passes or fails. Domains with crisp feedback loops are exactly where autonomous systems can be deployed with confidence — which is why operations, alongside coding, is leading the agentic wave.
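The verification step can be sketched as a small act-then-check routine. This is a minimal sketch under stated assumptions: the function names, the single error-rate number standing in for telemetry, and the 1% threshold are all hypothetical, and a production agent would check real SLO data over a time window.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Outcome:
    action: str
    verified: bool      # True if the SLO check passed after the action
    rolled_back: bool   # True if the action was undone after a failed check


def remediate_and_verify(
    action: str,
    apply: Callable[[], None],
    rollback: Callable[[], None],
    slo_healthy: Callable[[], bool],
) -> Outcome:
    """Apply one remediation, then let the SLO check decide whether it stays."""
    apply()
    if slo_healthy():
        return Outcome(action, verified=True, rolled_back=False)
    rollback()
    return Outcome(action, verified=False, rolled_back=True)


# Simulated environment: one error-rate number stands in for real telemetry.
state = {"error_rate": 0.20}
SLO_MAX_ERROR_RATE = 0.01  # assumed threshold, for illustration only

good = remediate_and_verify(
    "restart checkout pods",
    apply=lambda: state.update(error_rate=0.001),
    rollback=lambda: state.update(error_rate=0.20),
    slo_healthy=lambda: state["error_rate"] <= SLO_MAX_ERROR_RATE,
)
```

Because the check returns a plain pass or fail, the routine never has to judge its own success: the fix either holds against the SLO or it is rolled back.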

2.4 The vendor signal

The hyperscalers have voted with shipped products, named customers, and published numbers. AWS DevOps Agent, positioned by AWS as one of its first “frontier agents” (alongside the Security Agent), reached general availability on March 31, 2026, with United Airlines, T-Mobile, and Western Governors University (WGU) as launch customers. AWS reports preview customers seeing up to 75% lower MTTR, 80% faster investigations, and 94% root-cause accuracy, and WGU describes one production investigation compressed from an estimated two hours to 28 minutes. (These are all vendor-reported figures from selected pilots, so discount accordingly, but they sit at the optimistic end of the 40–70% range independent practitioners report.)

Microsoft’s Azure SRE Agent went GA in March 2026 after Microsoft ran it on its own estate at remarkable scale: 1,300+ agents, 35,000+ incidents mitigated, 20,000+ engineering hours saved. Google shipped the same capability more conservatively: Gemini Cloud Assist’s proactive agents autonomously investigate alerts and cost anomalies in the background but, by design, make no changes to the environment.

Three clouds, one pattern: each launched with an investigation-first posture, with action gated behind customer governance. That is a public acknowledgment from the largest operators on earth that autonomy must be introduced in stages. The market is moving with them: the AIOps/AI-SRE category is projected to grow from roughly $15 billion today to $36 billion by 2030.

KEY TAKEAWAY: AIOps made systems visible and signals intelligible. Agentic operations makes systems operable. The difference is the closed loop: perception to reasoning to action to verification, with humans supervising rather than executing.
