Chapter 9 · The Implementation Roadmap

A staged, evidence-driven adoption path: ninety days to first value, twelve months to a new operating model.

9.1 Readiness: what agents need from you

Agents amplify the environment they inherit. Before the first deployment, honestly assess five foundations:

Observability. Centralized logs, metrics, and traces with reasonable coverage. Agents cannot reason over signals that don’t exist.
Access architecture. The ability to mint scoped, short-lived credentials per agent. If everything runs on one admin key today, fix that first.
Source of truth. Infrastructure as code for the surfaces agents will touch, even partially. IaC gives agents a safe change mechanism and you a diffable audit trail.
Documented intent. SLOs, runbooks, architecture notes — imperfect is fine; absent is not. This becomes the agent’s context layer.
An accountable owner. A named senior engineer with the mandate to set autonomy policy and the credibility to bring the on-call rotation along.
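The five foundations above can be treated as a simple pass/fail checklist. The following is a minimal sketch, not a prescribed tool: the names `FOUNDATIONS`, `ReadinessReport`, and `assess_readiness` are hypothetical, and the yes/no answers are assumed to come from an honest internal review.

```python
from dataclasses import dataclass

# Hypothetical keys mirroring the five foundations listed above.
FOUNDATIONS = (
    "observability",
    "access_architecture",
    "source_of_truth",
    "documented_intent",
    "accountable_owner",
)


@dataclass(frozen=True)
class ReadinessReport:
    ready: bool
    gaps: tuple


def assess_readiness(answers: dict) -> ReadinessReport:
    """Report which foundations are missing; an unanswered item counts as a gap."""
    gaps = tuple(name for name in FOUNDATIONS if not answers.get(name, False))
    return ReadinessReport(ready=not gaps, gaps=gaps)


if __name__ == "__main__":
    report = assess_readiness({"observability": True, "source_of_truth": True})
    print(report.ready, report.gaps)
```

Treating an unanswered foundation as a gap is a deliberate choice in this sketch: it keeps the assessment from passing by omission.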

9.2 The 90-day pilot

Phase	Weeks	Focus	Exit criteria
Baseline & scope	1–2	Capture MTTR, alert volume, page counts, toil hours. Pick one bounded domain (one product’s incident response, or cloud cost for one account).	Signed baseline; scoped domain; success metrics agreed
Observe (L0–L1)	3–6	Connect telemetry and tools read-only. Agents investigate every incident in parallel with humans; engineers grade the analyses.	≥70% of agent root-cause analyses rated correct or useful by on-call
Approve (L2)	7–10	Agents propose complete remediations with evidence; humans one-click approve. Track acceptance and rollback rates.	≥80% acceptance; zero harmful actions; MTTR visibly improving
Graduate (L3)	11–13	Pre-approve the 5–10 safest, most-repeated action classes. Agents act and notify. Review every action weekly.	First autonomous resolutions in production; documented MTTR delta vs. baseline
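The exit criteria in the table are numeric enough to check mechanically. Below is a minimal sketch of such a gate, assuming the pilot team records the counts shown. The `PilotMetrics` fields and function names are hypothetical, and "MTTR visibly improving" is interpreted here, as an assumption, to mean that current MTTR is below the signed baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PilotMetrics:
    analyses_graded: int
    analyses_correct_or_useful: int
    remediations_proposed: int
    remediations_accepted: int
    harmful_actions: int
    baseline_mttr_minutes: float
    current_mttr_minutes: float


def observe_exit_met(m: PilotMetrics, threshold: float = 0.70) -> bool:
    """Observe phase: share of analyses rated correct or useful by on-call."""
    if m.analyses_graded == 0:
        return False  # no evidence yet, so the gate stays closed
    return m.analyses_correct_or_useful / m.analyses_graded >= threshold


def approve_exit_met(m: PilotMetrics, threshold: float = 0.80) -> bool:
    """Approve phase: acceptance rate, zero harmful actions, MTTR below baseline."""
    if m.remediations_proposed == 0:
        return False
    acceptance = m.remediations_accepted / m.remediations_proposed
    return (
        acceptance >= threshold
        and m.harmful_actions == 0
        and m.current_mttr_minutes < m.baseline_mttr_minutes  # assumed reading of "visibly improving"
    )


if __name__ == "__main__":
    # Illustrative numbers only, not data from any real pilot.
    metrics = PilotMetrics(40, 30, 20, 17, 0, 95.0, 70.0)
    print(observe_exit_met(metrics), approve_exit_met(metrics))
```

An empty sample fails the gate rather than passing it, which matches the pilot's stated job of producing evidence before autonomy graduates.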

Resist the urge to start with the hardest problem. The pilot’s job is to produce evidence and trust, not heroics. A boring domain with frequent, repetitive incidents — Kubernetes restarts, disk pressure, certificate expiry, cost anomalies — generates statistical confidence fastest.

Figure 9 — The 90-day pilot: four phases, each with signed exit criteria before autonomy graduates.

This pilot shape is now vendor-validated practice, not just prudence: AWS’s published adoption guidance for DevOps Agent — one region, one service, recommendation-only for weeks, then measure MTTR before expanding — is this roadmap’s Observe and Approve phases in different words, and Azure’s staged governance controls assume the same progression. If the hyperscalers gate their own agents this way on their own clouds, a bank should not be talked into skipping it.

9.3 Scaling: months 4–12

Expand domains, not just autonomy. Add specialists — database, security, cost — one at a time, each through the same observe → approve → graduate ladder.
Industrialize governance. Move autonomy policy from a document to enforced configuration; stand up the guardian/oversight layer; integrate agent actions into change management with automated evidence.
Build the memory moat. Curate the context layer deliberately: topology, conventions, past incidents, tribal knowledge. This is where your deployment becomes unreasonably effective and un-copyable.
Restructure on-call. As autonomous resolution rates climb, consolidate rotations, redirect reclaimed senior time to prevention engineering, and formalize the agent-operations and autonomy-policy roles.
Report relentlessly. Publish the dashboard monthly — MTTR trend, autonomous resolution rate, pages avoided, dollars saved — to engineering and to the business. Funded programs are measured programs.
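Moving autonomy policy "from a document to enforced configuration" could look something like the sketch below. It assumes a mapping from action class to the highest autonomy level granted, with the levels taken from the pilot ladder. The action-class names, the `POLICY` table, and the `decide` function are hypothetical illustrations, not any vendor's API.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    OBSERVE = 1  # L0-L1: read-only investigation and recommendations
    APPROVE = 2  # L2: agent proposes, a human approves
    ACT = 3      # L3: agent acts and notifies


# Hypothetical policy table; in practice it would live in version control.
POLICY = {
    "restart_pod": Autonomy.ACT,
    "rotate_certificate": Autonomy.APPROVE,
}


def decide(action_class: str, policy: dict = POLICY) -> str:
    """Map an action class to a handling mode; unlisted classes get the least autonomy."""
    level = policy.get(action_class, Autonomy.OBSERVE)
    if level >= Autonomy.ACT:
        return "execute_and_notify"
    if level == Autonomy.APPROVE:
        return "request_human_approval"
    return "recommend_only"


if __name__ == "__main__":
    for action in ("restart_pod", "rotate_certificate", "drop_database"):
        print(action, "->", decide(action))
```

Defaulting unlisted action classes to recommend-only means new capabilities must be granted explicitly, one ladder step at a time.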

9.4 How the canceled 40% die

Gartner predicts more than 40% of agentic AI projects will be canceled by the end of 2027, naming three killers: escalating costs, unclear business value, and inadequate risk controls. In operations specifically, those abstractions take five concrete forms. Each has a known antidote:

The pilot that never graduates. (Unclear value.) Advise-only forever feels safe and proves nothing — then the renewal arrives with no MTTR delta to show. Antidote: graduation criteria signed on day one, honored on schedule.
Autonomy before evidence. (Inadequate risk controls.) One confidently wrong autonomous action at 3 a.m. costs more trust than a hundred good ones earn. Antidote: never skip ladder steps, and let acceptance and rollback rates, not enthusiasm, set the pace.
Tool sprawl without orchestration. (Escalating costs and unclear value.) Five disconnected point agents recreate the swivel-chair problem with extra licenses (the same coordination tax, one layer up, that §10.3 traces across single-cloud agents). Antidote: one orchestrator, one audit trail, one dashboard.
Unbounded model spend. (Escalating costs.) Frontier reasoning on every noisy signal erases the ROI before the first renewal. Antidote: two-tier sensing and per-incident cost tracking from day one.
Treating it as a tool purchase. (All three.) The experiment-to-production gap from Chapter 1 is an operating-model gap, not a technology gap. Antidote: budget for the role changes, the policy work, and the trust ladder — not just the license.
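The antidote to unbounded model spend can be sketched in a few lines. The sketch rests on an assumption the text does not spell out: that "two-tier sensing" means a cheap first-pass scorer decides which signals reach the expensive frontier model. The cost figures, the threshold, and the names `CostLedger` and `triage` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-call costs in dollars and an assumed escalation threshold.
CHEAP_TIER_COST = 0.001
FRONTIER_TIER_COST = 0.25
ESCALATION_THRESHOLD = 0.6


class CostLedger:
    """Accumulates model spend per incident so ROI can be reported per incident."""

    def __init__(self) -> None:
        self._spend = defaultdict(float)

    def charge(self, incident_id: str, amount: float) -> None:
        self._spend[incident_id] += amount

    def total(self, incident_id: str) -> float:
        return self._spend.get(incident_id, 0.0)


def triage(signal: dict, ledger: CostLedger) -> str:
    """Every signal pays for the cheap tier; only high-scoring ones pay for the frontier tier."""
    ledger.charge(signal["incident_id"], CHEAP_TIER_COST)
    if signal["anomaly_score"] < ESCALATION_THRESHOLD:
        return "dropped"
    ledger.charge(signal["incident_id"], FRONTIER_TIER_COST)
    return "escalated"


if __name__ == "__main__":
    ledger = CostLedger()
    print(triage({"incident_id": "INC-1", "anomaly_score": 0.2}, ledger))
    print(triage({"incident_id": "INC-1", "anomaly_score": 0.9}, ledger))
    print(round(ledger.total("INC-1"), 3))
```

Charging the ledger at the point of each model call, rather than reconciling spend later, is what makes a per-incident figure available from day one.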

9.5 When not to deploy: the honest disqualifiers

Section 9.1 set out the readiness baseline to put in place before deploying; this section is its harder companion: the cases where the honest answer is to wait. A book that tells buyers to distrust anyone who cannot say no to them should be able to say it about its own category. Each disqualifier below is a reason to fix something first, not a permanent verdict, but deploying through any of them buys an expensive disappointment.

You have no signal to reason over. If observability is sparse or fragmented, with no centralized logs, metrics, or traces across the target domain, the agent has nothing to reason from and will confidently reason from noise. Fix observability first; an agent amplifies the environment it inherits, and amplifying a blind spot produces a confident blind spot.
Everything runs on one shared admin credential. If you cannot issue scoped, short-lived credentials per agent, you cannot bound an agent’s blast radius or contain a compromised one. Until least-privilege access is real, autonomous action is an unacceptable risk regardless of how good the agent is.
No one owns the autonomy policy. If there is no named senior engineer with the authority to set autonomy policy and the standing to carry the on-call team, the program will stall at advisory or lurch into ungoverned action. The owner is a prerequisite, not a role to fill later.
Change management cannot accommodate machine-initiated change. If your change process has no path for a machine-initiated, human-approved change with an audit trail, agent actions will either bypass governance — unacceptable in a regulated environment — or be blocked entirely. Resolve the process question before, not during, deployment.
The first target is your most critical, least reversible system. Starting on the core path with irreversible actions inverts the trust ladder. If the only available pilot domain is the one where a wrong action is catastrophic and unrecoverable, wait until a bounded, reversible domain is available — or carve one out deliberately. The pilot’s job is evidence, not heroism.

There is also a timing disqualifier that has nothing to do with readiness: if the organization cannot fund the operating-model change (the role redesign, the policy work, the trust ladder) and is buying only a license, it will land in the canceled 40% of §9.4 no matter how ready its infrastructure is. The technology is not the gating factor. The willingness to run the program as a transformation rather than a tool purchase is.
