AgenticOps: Quick Wins by Role

AgenticOps in one paragraph

You describe what you need in natural language; specialized AI agents do the work. Mention an agent with @name, optionally add a #tool to shape the output, then write the rest like you’d brief a colleague. The same syntax — @agent #tool [instruction] — works across every domain. No CLI, no scripting, no tool switching.

@alex #dashboard show me EC2 cost breakdown by instance type, last 30 days

Before you start

You need a workspace with at least one cloud connection — agents can’t return real results without one. If you haven’t set that up yet, follow the Setup guide first, then come back.

Pick your role

Each tab is one role, one goal, and one prompt you can paste today. Once you see the shape of the output, the follow-ups take you deeper.

Goal: Stop paying for cloud you don’t use. See where waste lives in under a minute.
You need: AWS, Azure, or GCP connection.
Agent: @alex.

Quick win: find idle and oversized resources.

@alex find EC2 instances with <20% CPU utilization over the last 30 days, plus any unattached EBS volumes and unused Elastic IPs

Alex queries the cloud APIs, joins with utilization metrics, and returns a ranked list with projected monthly savings.

Follow up:

@alex #recommend right-sizing for the top 5 by waste
@alex #dashboard cost trend by service for this quarter
@alex draft a reserved-instance plan for the stable workloads above
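The filter-and-rank step behind the quick win can be pictured with a minimal Python sketch. This is an illustration only, not how Alex is implemented: the `Instance` record, its field names, and the sample figures are all hypothetical, and monthly cost stands in as a rough upper bound on savings.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str        # hypothetical identifier
    avg_cpu_percent: float  # average CPU over the lookback window
    monthly_cost: float     # current monthly cost in USD (made-up numbers)


def rank_idle(instances, cpu_threshold=20.0):
    """Return instances below the CPU threshold, most expensive first."""
    idle = [i for i in instances if i.avg_cpu_percent < cpu_threshold]
    return sorted(idle, key=lambda i: i.monthly_cost, reverse=True)


# Sample fleet, invented for this example.
fleet = [
    Instance("i-aaa", 4.0, 140.0),
    Instance("i-bbb", 63.0, 280.0),
    Instance("i-ccc", 11.5, 70.0),
]

for inst in rank_idle(fleet):
    print(inst.instance_id, inst.monthly_cost)
```

The busy instance is dropped even though it is the most expensive, which is why a utilization threshold in the prompt matters more than a raw cost sort.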

Goal: Find the highest-risk misconfigurations before an auditor or attacker does.
You need: AWS, Azure, or GCP connection.
Agent: @oliver.

Quick win: surface public exposure on sensitive ports.

@oliver list security groups with 0.0.0.0/0 access on database, SSH, or RDP ports across all regions

Oliver returns the offending rules with resource owner, region, and severity ranking.

Follow up:

@oliver #report SOC 2 compliance status with prioritized remediation
@oliver audit IAM policies for privilege-escalation paths
@oliver check for IMDSv1 instances or unencrypted EBS volumes

Goal: Find the queries that are actually hurting you, not the ones you assume are slow.
You need: PostgreSQL or MySQL connection.
Agent: @tony.

Quick win: slowest queries in the last 24 hours.

@tony show the top 10 queries by total time over the last 24 hours on production PostgreSQL, with execution count and P95 latency

Tony pulls from pg_stat_statements (or the equivalent), ranks by impact, and shows where the cost is concentrated.

Follow up:

@tony #recommend indexes for the top 3 queries above
@tony explain why query #2 isn't using the existing index
@tony #dashboard query latency P95 trends by endpoint
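Ranking by total time rather than per-call latency is the key idea in the quick win, and a small Python sketch shows why. The rows below are invented and only shaped like pg_stat_statements output (query text, `calls`, and `total_exec_time` in milliseconds, a column named `total_time` before PostgreSQL 13). The helper name is hypothetical, and this is not Tony's implementation.

```python
# Invented rows: (query, calls, total_exec_time_ms).
rows = [
    ("SELECT * FROM orders WHERE customer_id = $1", 12000, 84000.0),
    ("UPDATE carts SET updated_at = now() WHERE id = $1", 300, 96000.0),
    ("SELECT 1", 500000, 2500.0),
]


def top_by_total_time(rows, limit=10):
    """Rank statements by cumulative time and attach the mean time per call."""
    ranked = sorted(rows, key=lambda r: r[2], reverse=True)[:limit]
    return [
        (query, calls, total_ms, total_ms / calls)
        for query, calls, total_ms in ranked
    ]


for query, calls, total_ms, mean_ms in top_by_total_time(rows, limit=3):
    print(f"{total_ms:>10.1f} ms total  {calls:>7} calls  {mean_ms:8.3f} ms/call  {query}")
```

A statement that is cheap per call can still dominate total time when it runs often enough, so the two orderings frequently disagree.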

Goal: See where pods are oversized, undersized, or imbalanced before the next OOMKill or budget review.
You need: Kubernetes connection.
Agent: @kai.

Quick win: pod resource waste across the cluster.

@kai analyze pod resource usage vs requests across all namespaces, surface the largest over- and under-provisioned workloads

Kai joins requests/limits with actual usage and ranks the deltas by node-cost impact.

Follow up:

@kai #recommend HPA policies for the variable workloads above
@kai find nodes with <30% utilization for consolidation
@kai check for pods without resource limits or liveness probes
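The requests-versus-usage comparison in the quick win can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Workload` record, its field names, and the sample numbers are hypothetical. The sketch also ranks by raw CPU delta, whereas the description above says Kai weights deltas by node-cost impact.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str           # hypothetical workload name
    cpu_request_m: int  # requested CPU in millicores
    cpu_usage_m: int    # observed CPU usage in millicores


def provisioning_deltas(workloads):
    """Sort by request minus usage.

    Large positive deltas are over-provisioned; negative deltas are
    under-provisioned (usage exceeds the request).
    """
    deltas = [(w.name, w.cpu_request_m - w.cpu_usage_m) for w in workloads]
    return sorted(deltas, key=lambda pair: pair[1], reverse=True)


# Sample workloads, invented for this example.
cluster = [
    Workload("checkout-api", 2000, 350),
    Workload("batch-export", 500, 900),
    Workload("search", 1000, 950),
]

for name, delta in provisioning_deltas(cluster):
    print(f"{name}: {delta:+d}m")
```

The top of the list shows where requests can shrink, and the bottom shows workloads running above their requests.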

Goal: Cut the time from alert to root cause. Get a structured investigation, not raw logs.
You need: Pulse configured, plus the connections for the systems you operate.
Agent: @anna to coordinate.

Quick win: investigate an active alert.

@anna investigate the current incident: pull related metrics, recent deploys, and topology, then propose the top 3 likely causes ranked by evidence

Anna delegates to the relevant specialists, gathers evidence in parallel, and returns a hypothesis ladder you can act on.

Follow up:

@anna #report draft a postmortem from this conversation
@anna pull the matching runbook and walk through the approval gates
@anna check whether this pattern matches any past incident in memory

Goal: Get a coordinated view across cost, security, performance, and reliability without scheduling four meetings.
You need: Connections in place for the domains you want covered.
Agent: @anna.

Quick win: multi-agent quarterly review.

@anna coordinate a quarterly infrastructure review:
- @alex top cost optimization opportunities and savings
- @oliver security posture and compliance gaps
- @tony database performance hotspots
- @kai Kubernetes utilization and risk
Consolidate into an executive summary with prioritized actions.

Anna delegates, collects, deduplicates, and returns a single executive brief instead of four tabs.

Follow up:

@anna #report quarterly business review in slide-deck format
@anna track the open actions from last quarter — what shipped, what slipped
@anna draft a roadmap that aligns cost reduction with reliability work

The syntax in 30 seconds

Every prompt has the same shape:

@agent #tool [instruction]

@agent — which specialist to ask. Anna, Alex, Oliver, Tony, or Kai. Anna is always available; the others activate when their matching connection is added.
#tool (optional) — what shape of output you want.
instruction — write it like a Slack message to a colleague.
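To make the three parts concrete, here is a minimal Python sketch that splits a prompt into agent, optional tool, and instruction. The regular expression and the `parse_prompt` helper are assumptions written for this example, not CloudThinker's actual parser.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for "@agent #tool [instruction]"; the #tool part is optional.
PROMPT_RE = re.compile(
    r"^@(?P<agent>\w+)(?:\s+#(?P<tool>\w+))?\s+(?P<instruction>.+)$",
    re.DOTALL,
)


@dataclass
class Prompt:
    agent: str
    tool: Optional[str]
    instruction: str


def parse_prompt(text: str) -> Prompt:
    """Split a prompt into its agent, optional tool, and instruction."""
    match = PROMPT_RE.match(text.strip())
    if match is None:
        raise ValueError("expected '@agent [#tool] instruction'")
    return Prompt(match["agent"], match["tool"], match["instruction"].strip())


print(parse_prompt("@alex #dashboard show me EC2 cost breakdown by instance type, last 30 days"))
print(parse_prompt("@tony explain why query #2 isn't using the existing index"))
```

In this sketch only a `#word` placed directly after the agent name counts as a tool, so the `#2` inside the second instruction stays part of the instruction text.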

Tool commands

Command	Output
`#dashboard`	Interactive visualization with charts
`#report`	Detailed analysis document
`#recommend`	Actionable recommendations
`#alert`	Set up monitoring notifications
`#chart`	Data visualization

Tips that compound

Be specific upfront. “EC2 costs in us-east-1 for the last 30 days” beats “show me costs” — the agent spends less budget guessing what you meant.
Refine, don’t restart. Agents keep conversation context. “Drill into RDS for the items above” works.
Combine tools. Use #dashboard first to see the shape, then #recommend for the action on the same topic.
Let Anna coordinate. Anything that touches more than one domain — start with @anna and name the specialists. One consolidated answer beats five tabs.

The Agentic Loop — pick where to go next

Every CloudThinker module runs the same four-phase loop, continuously:

Detect → Analyze → Resolve → Validate

Agents detect signals from your environment, analyze them into a hypothesis or plan, resolve the work under your policy, then validate the outcome and feed it back into memory before the next iteration. The loop runs 24/7: operational coverage is no longer bounded by who is on shift, who is awake, or who saw the alert first.

Human-on-the-loop, not in every step. AgenticOps does not remove the human; it changes where the human’s attention is spent. Agents own detection, analysis, and the routine moves of resolution. Humans own judgment: approving changes that carry real risk, intervening on edge cases the policy doesn’t yet cover, and setting the goals that frame what the loop is working on. Every action is auditable, every change is reviewable, and the policy you write is the policy that runs.

The four autonomy levels (notify → suggest → approve → autonomous) are set per loop and gated by RBAC. Teams typically begin at suggest or approve for production-facing work, observe how the agent behaves in their environment, then raise the ceiling once a pattern has proven reliable. Over time the same loop demands less of your approval queue: more steps run autonomously, fewer require sign-off, and the human’s role narrows to defining what counts as risky rather than reviewing each individual change.

Three module loops are covered end-to-end in the tutorial track. Pick the one that matches what you want to improve next:

Code Review Loop

Analyzes every PR with context from running infrastructure, past incidents, and team conventions. Validates the change with inline comments and patch suggestions. You merge; the agent catches what’s easy to miss. Best when production-bug escapes are the worry.

Cost Loop

Detects spend, utilization, and commitments across AWS, Azure, and GCP. Analyzes waste into remediation plans with projected savings. You approve changes; the agent finds them. Best when the cloud bill is climbing.

Incident Loop

Detects alerts, deploys, and topology shifts. Resolves by ranking hypotheses and running approved runbooks, then validates the fix held. You approve risky steps; the agent does the rest. Best when MTTR is your problem.
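The four autonomy levels named earlier (notify → suggest → approve → autonomous) can be pictured as a gate in front of the resolve phase. The Python sketch below is one possible reading, not the product's implementation: the `Autonomy` enum, the `next_step` function, the returned labels, and the meaning assigned to each level are all assumptions made for illustration.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    """Autonomy ceiling for a loop, lowest to highest (names from the text above)."""
    NOTIFY = 1
    SUGGEST = 2
    APPROVE = 3
    AUTONOMOUS = 4


def next_step(level: Autonomy, human_approved: bool = False) -> str:
    """Decide what happens to a proposed change under a given autonomy level.

    Assumed semantics: notify only reports the signal, suggest proposes a
    change, approve runs it after sign-off, autonomous runs it directly.
    """
    if level is Autonomy.NOTIFY:
        return "notify"
    if level is Autonomy.SUGGEST:
        return "suggest"
    if level is Autonomy.APPROVE:
        return "execute" if human_approved else "await_approval"
    return "execute"


print(next_step(Autonomy.APPROVE))                       # await_approval
print(next_step(Autonomy.APPROVE, human_approved=True))  # execute
print(next_step(Autonomy.AUTONOMOUS))                    # execute
```

Raising the ceiling for a loop then amounts to changing one value, and the approval step drops out only at the highest level.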
