# Continuous Cloud Guardrails with CloudKeepers

> Establish continuous cost, security, and operational guardrails across your multi-cloud infrastructure

## **The Cost of Manual Cloud Operations**

Cloud operations teams spend countless hours on routine maintenance tasks: security audits, cost reviews, compliance checks, and resource cleanup. These tasks are critical but predictable—and they consume 40-60% of engineering capacity without delivering strategic value.

**Traditional approach challenges:**

* **Repetitive manual work**: Weekly security audits take 4-6 hours per account, scaling to 60+ hours for enterprises with 15+ accounts
* **Inconsistency across environments**: Different engineers interpret guidelines differently; compliance gaps appear when expertise concentrates in individuals
* **Reactive firefighting**: Issues surface during crisis moments (compliance audit, cost spike, security breach) rather than being caught proactively
* **False positives everywhere**: Manual scripts flag legitimate backup resources as orphaned, or miss context-aware risks entirely

[CloudKeepers](/guide/infrastructure/cloudkeepers) solves this by establishing **continuous, autonomous guardrails** that run 24/7 across your entire cloud estate, catching issues before they become incidents.

***

## **What Are CloudKeepers?**

CloudKeepers are autonomous pilots that enforce guardrails for **cost, security, and operational health**. They continuously scan your cloud infrastructure on a schedule you define, identify drift and misconfigurations, and surface intelligent recommendations with step-by-step remediation playbooks.

**Two specialized pilots:**

* **[CostOps](/guide/infrastructure/cloudkeepers)**: Identify unused resources, right-sizing opportunities, and cost anomalies with context-aware analysis
* **[SecurityOps](/guide/infrastructure/cloudkeepers)**: Detect IAM misconfigurations, exposed resources, encryption gaps, and compliance risks

Unlike periodic assessments, CloudKeepers are designed for **daily operations**—they catch problems early before they escalate.

***

## **The CloudKeepers Workflow**

<Steps>
  <Step title="Enable keepers">
    Open **CloudKeepers** and click **Enable Your First Keepers**. Select keepers by provider and pillar, then review detection rules and autonomy levels.

    <Frame>
      <img src="https://mintcdn.com/cloudthinker/OJRahgLXUPDmURsx/images/infrastructure/cloudkeepers/01-onboarding-landing.jpg?fit=max&auto=format&n=OJRahgLXUPDmURsx&q=85&s=96e8a4bd7a1b639eb3437a4d7335b3b2" alt="CloudKeepers onboarding page with Enable Your First Keepers CTA and value cards" width="4590" height="2764" data-path="images/infrastructure/cloudkeepers/01-onboarding-landing.jpg" />
    </Frame>

    <p style={{textAlign: 'center', fontSize: '0.9em', color: '#666', marginTop: '8px'}}>CloudKeepers onboarding page</p>
  </Step>

  <Step title="Configure keepers and schedules">
    The two-step wizard lets you select keepers, then fine-tune detection rules per keeper. Set autonomy levels (Suggest, Approve, or Autonomous) and configure cron schedules in the **Settings** tab.

    <Frame>
      <img src="https://mintcdn.com/cloudthinker/OJRahgLXUPDmURsx/images/infrastructure/cloudkeepers/02-setup-wizard.jpg?fit=max&auto=format&n=OJRahgLXUPDmURsx&q=85&s=0058539cc8bb9ee7c514dd3cb481491b" alt="Two-step setup wizard with keeper selection and per-rule configuration" width="4776" height="2086" data-path="images/infrastructure/cloudkeepers/02-setup-wizard.jpg" />
    </Frame>

    <p style={{textAlign: 'center', fontSize: '0.9em', color: '#666', marginTop: '8px'}}>Select keepers and review detection rules</p>
  </Step>

  <Step title="Review the dashboard">
    Each keeper's **Dashboard** tab shows stat cards for open findings, critical/high count, potential savings, and this week's detections, along with a findings-over-time trend chart.

    <Frame>
      <img src="https://mintcdn.com/cloudthinker/OJRahgLXUPDmURsx/images/infrastructure/cloudkeepers/03-keeper-dashboard.jpg?fit=max&auto=format&n=OJRahgLXUPDmURsx&q=85&s=299025a7004c4a91280dd61775178c42" alt="AWS Cost Optimization dashboard with stat cards and findings over time chart" width="4572" height="2766" data-path="images/infrastructure/cloudkeepers/03-keeper-dashboard.jpg" />
    </Frame>

    <p style={{textAlign: 'center', fontSize: '0.9em', color: '#666', marginTop: '8px'}}>Keeper dashboard with savings and severity breakdown</p>
  </Step>

  <Step title="Triage findings">
    Switch to the **Findings** tab to see a Kanban board with columns for Pending, In Progress, Implemented, and Ignored. Each card shows estimated savings, effort, and risk severity.

    <Frame>
      <img src="https://mintcdn.com/cloudthinker/OJRahgLXUPDmURsx/images/infrastructure/cloudkeepers/04-keeper-findings.jpg?fit=max&auto=format&n=OJRahgLXUPDmURsx&q=85&s=e62b71ccdae9a203917ca363f34bf1fb" alt="Findings Kanban board with pending finding card showing savings and risk" width="4572" height="2766" data-path="images/infrastructure/cloudkeepers/04-keeper-findings.jpg" />
    </Frame>

    <p style={{textAlign: 'center', fontSize: '0.9em', color: '#666', marginTop: '8px'}}>Findings Kanban board for triage</p>
  </Step>

  <Step title="Review detection runs">
    The **Runs** tab shows every detection run with status, summary, duration, and findings created — an audit trail to verify keepers are running on schedule.

    <Frame>
      <img src="https://mintcdn.com/cloudthinker/OJRahgLXUPDmURsx/images/infrastructure/cloudkeepers/05-keeper-runs.jpg?fit=max&auto=format&n=OJRahgLXUPDmURsx&q=85&s=028371683a439f72d4315896d8e5d402" alt="Runs tab showing completed detection run with summary and findings count" width="4572" height="2766" data-path="images/infrastructure/cloudkeepers/05-keeper-runs.jpg" />
    </Frame>

    <p style={{textAlign: 'center', fontSize: '0.9em', color: '#666', marginTop: '8px'}}>Detection run history</p>
  </Step>

  <Step title="Configure detection rules">
    In the **Settings** tab, set the cron schedule and toggle individual detection rules on or off. Each rule describes what it detects and supports per-rule autonomy and threshold configuration.

    <Frame>
      <img src="https://mintcdn.com/cloudthinker/OJRahgLXUPDmURsx/images/infrastructure/cloudkeepers/06-keeper-settings.jpg?fit=max&auto=format&n=OJRahgLXUPDmURsx&q=85&s=edaadcf521459188e46c732ffacf4f35" alt="Settings tab with cron schedule and detection rule toggles" width="4572" height="2766" data-path="images/infrastructure/cloudkeepers/06-keeper-settings.jpg" />
    </Frame>

    <p style={{textAlign: 'center', fontSize: '0.9em', color: '#666', marginTop: '8px'}}>Schedule and detection rule configuration</p>
  </Step>
</Steps>
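The settings walked through above (autonomy level, cron schedule, per-rule toggles and thresholds) can be pictured as a small data model. The sketch below is illustrative only: `KeeperConfig`, `DetectionRule`, the rule names, and every field are hypothetical and do not describe a documented CloudKeepers API. Only the three autonomy levels come from the wizard itself, and the cron value is an arbitrary example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Autonomy(Enum):
    # The three levels named in the setup wizard.
    SUGGEST = "suggest"
    APPROVE = "approve"
    AUTONOMOUS = "autonomous"


@dataclass
class DetectionRule:
    # Hypothetical shape: one toggleable rule with its own autonomy and threshold.
    name: str
    enabled: bool = True
    autonomy: Autonomy = Autonomy.SUGGEST
    threshold: Optional[float] = None


@dataclass
class KeeperConfig:
    provider: str
    pillar: str
    cron: str  # assumed to be a five-field cron expression
    rules: List[DetectionRule] = field(default_factory=list)

    def active_rules(self) -> List[DetectionRule]:
        """Return only the rules that are switched on."""
        return [rule for rule in self.rules if rule.enabled]


config = KeeperConfig(
    provider="aws",
    pillar="cost",
    cron="0 6 * * *",  # example value: every day at 06:00
    rules=[
        DetectionRule("idle-ec2", threshold=15.0),
        DetectionRule("unattached-ebs", autonomy=Autonomy.APPROVE),
        DetectionRule("nat-consolidation", enabled=False),
    ],
)
print([rule.name for rule in config.active_rules()])
```

Keeping autonomy on the rule rather than on the keeper mirrors the per-rule configuration described in the last step, so a low-risk rule can run with more autonomy than a risky one.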

***

## **Use Case 1: Proactive Cost Optimization with CostOps**

**Scenario:** Your infrastructure has grown organically over 18 months. You're aware costs are climbing, but pinpointing what's actually unused (vs. reserved for disaster recovery or testing) requires deep investigation. Your FinOps team lacks the bandwidth for monthly audits.

**CostOps pilot discovers:**

* **Underutilized compute instances**: 8 EC2 instances running at 5-15% average CPU (ideal candidates for right-sizing or shutdown)
* **Orphaned storage**: 12 unattached EBS volumes and snapshots accumulating \$2,400/month
* **Reserved capacity misalignment**: Reserved instances for a deprecated service tier, losing \$8,500/month in discounts
* **NAT gateway inefficiency**: Multi-AZ NAT setup processing minimal traffic, could consolidate to single gateway

**CloudKeepers advantage:** CostOps agents understand that a volume tagged "daily-backup" from yesterday serves a real purpose, while "test-old" from 18 months ago is genuinely orphaned. They also distinguish instances with intentionally low CPU (burst-capable) from those that are over-provisioned.
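To make the idea of context-aware analysis concrete, here is a minimal sketch of a tag-and-age heuristic for unattached volumes. It is not how CostOps works internally: the `Volume` type, the marker list, and the 30-day cutoff are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Volume:
    tag: str
    attached: bool
    created_at: datetime


# Assumed convention: tags containing these markers denote intentional copies.
BACKUP_MARKERS = ("backup", "dr-")
# Assumed cutoff: a recent backup is kept, anything older is reviewed as orphaned.
RECENT = timedelta(days=30)


def classify(volume: Volume, now: datetime) -> str:
    """Return 'in-use', 'keep', or 'orphaned' for a volume."""
    if volume.attached:
        return "in-use"
    age = now - volume.created_at
    looks_like_backup = any(marker in volume.tag.lower() for marker in BACKUP_MARKERS)
    if looks_like_backup and age <= RECENT:
        return "keep"
    return "orphaned"


now = datetime(2025, 1, 15, tzinfo=timezone.utc)
daily = Volume("daily-backup", attached=False, created_at=now - timedelta(days=1))
stale = Volume("test-old", attached=False, created_at=now - timedelta(days=18 * 30))
print(classify(daily, now))  # keep
print(classify(stale, now))  # orphaned
```

A plain "unattached means orphaned" script would flag both volumes; the extra tag and age checks are what separate yesterday's backup from the stale test volume.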

<Frame>
  <img src="https://mintcdn.com/cloudthinker/0IKJjKZJEIROke98/images/use-cases/continuous-cloud-guardrails/03-cost-optimization-recommendations.jpg?fit=max&auto=format&n=0IKJjKZJEIROke98&q=85&s=f128fe6aef813a4a19a8c59ab7198369" alt="Cost optimization analysis with resource utilization and savings recommendations" width="1676" height="946" data-path="images/use-cases/continuous-cloud-guardrails/03-cost-optimization-recommendations.jpg" />
</Frame>

<p style={{textAlign: 'center', fontSize: '0.9em', color: '#666', marginTop: '8px'}}>Cost optimization recommendations with savings analysis</p>

**Workflow:**

1. **Schedule runs**: CostOps scan runs every Wednesday at 10:00 UTC
2. **Review findings**: Your FinOps team reviews the dashboard each Thursday morning, seeing \$14,200/month in identified savings
3. **Assess impact**: For the EC2 right-sizing recommendation, generate impact guidelines and share with engineering to validate performance assumptions
4. **Save to Plan**: Move high-confidence items (orphaned volumes, NAT consolidation) to [Plan](/guide/infrastructure/plan) for approval and scheduling
5. **Execute and track**: Plan workflows handle approvals, scheduling, and execution with full audit trails
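Assuming the **Settings** tab accepts a standard five-field cron expression evaluated in UTC (an assumption, not something this page confirms), the Wednesday 10:00 UTC schedule from step 1 would be written `0 10 * * 3`. The helper names below are hypothetical; the sketch builds that expression and works out the next run time using only the standard library.

```python
from datetime import datetime, timedelta, timezone

# Cron day-of-week numbers (0 = Sunday) in the standard five-field syntax.
CRON_DOW = {"SUN": 0, "MON": 1, "TUE": 2, "WED": 3, "THU": 4, "FRI": 5, "SAT": 6}


def weekly_cron(day: str, hour: int, minute: int = 0) -> str:
    """Build 'minute hour day-of-month month day-of-week' for a weekly run."""
    return f"{minute} {hour} * * {CRON_DOW[day]}"


def next_weekly_run(after: datetime, day: str, hour: int, minute: int = 0) -> datetime:
    """Return the first matching datetime strictly after `after`."""
    # Python's weekday() uses 0 = Monday, so shift the cron number by one.
    target = (CRON_DOW[day] - 1) % 7
    candidate = after.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(target - after.weekday()) % 7)
    if candidate <= after:
        candidate += timedelta(days=7)
    return candidate


print(weekly_cron("WED", 10))  # 0 10 * * 3
start = datetime(2025, 1, 6, 9, 0, tzinfo=timezone.utc)  # a Monday
print(next_weekly_run(start, "WED", 10))  # 2025-01-08 10:00:00+00:00
```

Checking the next run time before saving a schedule is a cheap way to confirm the scan lands ahead of the Thursday review rather than after it.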

**Time savings:** From 6-8 hours monthly on spreadsheets and console navigation → 30 minutes weekly to review findings and make governance decisions

***

## **Use Case 2: Continuous Security Posture Monitoring with SecurityOps**

**Scenario:** Your organization maintains 12 AWS accounts across dev, staging, and production. Security compliance requires monthly audits, but inconsistent findings (different engineers miss different issues) and the lack of standardized remediation create gaps. A recent audit found IAM policies that hadn't been reviewed in 8 months.

**SecurityOps pilot discovers:**

* **IAM configuration drift**: 23 IAM users/roles with overly broad permissions (Developer policy attached when ReadOnlyAccess would suffice)
* **Exposed resources**: 2 S3 buckets with public read access (not intentional); 1 RDS database with public accessibility enabled
* **Encryption gaps**: 15 EBS volumes without encryption; 3 S3 buckets lacking default encryption
* **Access anomalies**: Root account used for day-to-day operations; unused service accounts never cleaned up
* **Network exposure**: 4 security groups allowing SSH (port 22) from 0.0.0.0/0, a high-risk setting on compute instances

**CloudKeepers advantage:** SecurityOps agents understand operational context. They know HTTP/HTTPS from 0.0.0.0/0 is standard for load balancers but dangerous for databases. They prioritize actual exploitability: a root account access key is critical; a read-only service account is low-risk.
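The same open CIDR can be harmless or dangerous depending on what sits behind it. The following sketch shows one way such a rule could be ranked; the `IngressRule` type, the resource labels, and the severity names are hypothetical and are not taken from SecurityOps.

```python
from dataclasses import dataclass

ANYWHERE = "0.0.0.0/0"
WEB_PORTS = {80, 443}


@dataclass
class IngressRule:
    resource_kind: str  # hypothetical labels: "load_balancer", "compute", "database"
    port: int
    cidr: str


def severity(rule: IngressRule) -> str:
    """Rank an ingress rule by how exposed it leaves the attached resource."""
    if rule.cidr != ANYWHERE:
        return "ok"
    if rule.resource_kind == "load_balancer" and rule.port in WEB_PORTS:
        return "ok"  # public web traffic is the purpose of a load balancer
    if rule.resource_kind == "database":
        return "critical"  # a database should never be reachable from anywhere
    if rule.port == 22:
        return "high"  # SSH open to the whole internet
    return "medium"


print(severity(IngressRule("load_balancer", 443, ANYWHERE)))  # ok
print(severity(IngressRule("compute", 22, ANYWHERE)))         # high
print(severity(IngressRule("database", 5432, ANYWHERE)))      # critical
```

A checker that only matches on `0.0.0.0/0` would raise all three rules at the same level, which is the kind of noise the context-aware ranking avoids.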

<Frame>
  <img src="https://mintcdn.com/cloudthinker/0IKJjKZJEIROke98/images/use-cases/continuous-cloud-guardrails/02-security-recommendations.jpg?fit=max&auto=format&n=0IKJjKZJEIROke98&q=85&s=59fa0785d0bfbcf833e1209fd32c8e1d" alt="Security audit recommendations with remediation steps" width="1676" height="946" data-path="images/use-cases/continuous-cloud-guardrails/02-security-recommendations.jpg" />
</Frame>

<p style={{textAlign: 'center', fontSize: '0.9em', color: '#666', marginTop: '8px'}}>Security audit recommendations with remediation steps</p>

**Workflow:**

1. **Schedule runs**: SecurityOps scan runs every Friday at 14:00 UTC (before your Monday compliance standup)
2. **Alert on critical findings**: Your security team receives Slack notifications immediately for high-severity items (exposed database, root account in use)
3. **Review full report**: Monday morning, your security team reviews the findings dashboard—23 medium-risk IAM findings, 2 critical exposure risks
4. **Generate playbooks**: For the S3 bucket fix, generate implementation guidelines with AWS CLI commands; distribute to the owning team
5. **Save to Plan**: Move findings requiring multi-team coordination (e.g., "remove root account access key") to Plan for assignment, approval, and tracking
6. **Close findings**: After remediation, mark findings as implemented or ignored to tune future alerts
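For the S3 fix in step 4, a playbook would typically include a command that turns on the bucket's public access block. The sketch below only assembles that AWS CLI command as a string so it can be reviewed before anyone runs it; the function name and the bucket name are placeholders, and the playbook CloudKeepers actually generates may look different.

```python
import shlex
from typing import List


def block_public_access_command(bucket: str) -> List[str]:
    """Build the argv for `aws s3api put-public-access-block` on one bucket."""
    settings = (
        "BlockPublicAcls",
        "IgnorePublicAcls",
        "BlockPublicPolicy",
        "RestrictPublicBuckets",
    )
    configuration = ",".join(f"{name}=true" for name in settings)
    return [
        "aws", "s3api", "put-public-access-block",
        "--bucket", bucket,
        "--public-access-block-configuration", configuration,
    ]


# "example-bucket" is a placeholder, not a bucket from the scenario above.
command = block_public_access_command("example-bucket")
print(shlex.join(command))
```

Building the command as an argument list and quoting it with `shlex.join` keeps unusual bucket names from being misread by the shell when the owning team pastes the command.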

**Time savings:** From 4-6 hours weekly on manual IAM reviews, S3 audits, and cross-account checks → 20 minutes to triage alerts and assign remediation tasks

***

## **Integration with Plan for Governance**

CloudKeepers findings begin as **drafts** in your infrastructure view. When you're ready to act, you save them to [Plan](/guide/infrastructure/plan), where they become work items with:

* **Full audit trails**: Every finding, its status, and remediation steps are documented automatically
* **Approvals and assignment**: Route findings to the right teams (security, FinOps, platform engineering) for review and sign-off
* **Execution tracking**: Plan tracks status (pending, approved, in progress, completed) with timestamps and ownership
* **Compliance evidence**: For audits, Plan provides timestamped records of when issues were identified and how they were resolved

This transforms CloudKeepers from an alerting system into a **complete governance platform** where findings are tracked through remediation with full accountability.
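The status tracking described above amounts to a small state machine with a timestamped history. The sketch below assumes a strictly linear flow through the four listed statuses; `WorkItem`, `move_to`, and the example names are hypothetical, and Plan's real model may allow other transitions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple

# Assumed linear flow through the four statuses listed above.
ALLOWED = {
    "pending": {"approved"},
    "approved": {"in progress"},
    "in progress": {"completed"},
    "completed": set(),
}


@dataclass
class WorkItem:
    title: str
    owner: str
    status: str = "pending"
    # Each entry records when the change happened, who made it, and the new status.
    history: List[Tuple[datetime, str, str]] = field(default_factory=list)

    def move_to(self, new_status: str, actor: str) -> None:
        """Advance the item, rejecting transitions the flow does not allow."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot go from {self.status!r} to {new_status!r}")
        self.status = new_status
        self.history.append((datetime.now(timezone.utc), actor, new_status))


item = WorkItem("Remove root account access key", owner="security")
item.move_to("approved", actor="alice")
item.move_to("in progress", actor="bob")
print(item.status, len(item.history))
```

Because every transition appends to `history`, the audit trail is a by-product of normal work rather than something written up afterwards.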

***

## **Why CloudKeepers Beat Manual Processes**

| Dimension                 | Manual Audits                       | CloudKeepers                            |
| ------------------------- | ----------------------------------- | --------------------------------------- |
| **Execution frequency**   | Monthly (if lucky)                  | Daily/weekly—continuous guardrails      |
| **Time investment**       | 4-8 hours per session               | 2-5 min setup; 15-30 min weekly review  |
| **Consistency**           | Varies by engineer                  | Identical analysis every run            |
| **Context understanding** | Relies on engineer judgment         | Domain expertise baked into agents      |
| **Scaling with accounts** | Linear growth (4-6 hrs per account) | Constant time regardless of scale       |
| **False positives**       | High (scripts miss context)         | 95% reduction via intelligent filtering |
| **Issue detection time**  | Weeks to discovery                  | Hours to detection                      |
| **Knowledge transfer**    | Lost when experts leave             | Persistent in agent behavior            |
| **Audit evidence**        | Manual documentation                | Automatic comprehensive logs            |
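The scaling row reduces to simple arithmetic. Using only the figures in the table (4-6 hours per account for a manual audit, 15-30 minutes for the weekly review), the sketch below compares the two as the account count grows; the function name is illustrative.

```python
from typing import Tuple


def manual_hours(accounts: int, per_account: Tuple[float, float] = (4.0, 6.0)) -> Tuple[float, float]:
    """Manual audit effort grows linearly with the number of accounts."""
    low, high = per_account
    return accounts * low, accounts * high


# Weekly review time from the table: 15-30 minutes, independent of account count.
REVIEW_HOURS = (0.25, 0.5)

for accounts in (1, 5, 15):
    low, high = manual_hours(accounts)
    print(
        f"{accounts:>2} accounts: manual {low:.0f}-{high:.0f} h, "
        f"review {REVIEW_HOURS[0]}-{REVIEW_HOURS[1]} h"
    )
```

At 15 accounts the manual estimate reaches 60-90 hours per audit cycle, while the review figure stays where it was for a single account.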

***

## **Getting Started**

1. **Open CloudKeepers** in your workspace
2. **Configure pilots**: Enable CostOps and SecurityOps with your preferred schedules
3. **Set notifications**: Choose channels and severity thresholds
4. **Run your first scan**: Manually trigger a scan to see findings immediately
5. **Review and save**: Save high-impact findings to Plan for team review and remediation tracking
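Step 3 pairs each notification channel with a minimum severity. A minimal sketch of that routing rule follows; the severity scale, the channel names, and both helper functions are assumptions for illustration rather than CloudKeepers settings.

```python
from typing import Dict, List

SEVERITIES = ["low", "medium", "high", "critical"]  # ascending order


def meets_threshold(severity: str, threshold: str) -> bool:
    """True when a finding is at least as severe as the channel's minimum."""
    return SEVERITIES.index(severity) >= SEVERITIES.index(threshold)


def channels_for(severity: str, thresholds: Dict[str, str]) -> List[str]:
    """List the channels (alphabetically) that should receive this finding."""
    return sorted(
        channel
        for channel, minimum in thresholds.items()
        if meets_threshold(severity, minimum)
    )


thresholds = {"slack": "high", "email": "medium"}
print(channels_for("critical", thresholds))  # ['email', 'slack']
print(channels_for("medium", thresholds))    # ['email']
print(channels_for("low", thresholds))       # []
```

Setting a higher threshold on the interrupting channel and a lower one on the digest channel keeps urgent findings visible without flooding the team.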

<Frame>
  <img src="https://mintcdn.com/cloudthinker/0IKJjKZJEIROke98/images/use-cases/continuous-cloud-guardrails/01-cloudkeeper-scheduler-setup.jpg?fit=max&auto=format&n=0IKJjKZJEIROke98&q=85&s=56375cdba201e5216a58536dd8c5f055" alt="CloudKeeper scheduler setup interface showing pilot configuration and scheduling options" width="1668" height="1368" data-path="images/use-cases/continuous-cloud-guardrails/01-cloudkeeper-scheduler-setup.jpg" />
</Frame>

<p style={{textAlign: 'center', fontSize: '0.9em', color: '#666', marginTop: '8px'}}>CloudKeepers scheduler setup interface</p>

Your cloud infrastructure can now maintain continuous guardrails autonomously, freeing your team to focus on strategic initiatives instead of operational toil.

***

## What's Next

<CardGroup cols={2}>
  <Card title="CloudKeepers Reference" icon="radar" href="/guide/infrastructure/cloudkeepers">
    Full CloudKeepers documentation — configuration, scheduling, and pilot types
  </Card>

  <Card title="Cost Optimization" icon="piggy-bank" href="/guide/cost-optimization/overview">
    AI-generated cost recommendations with effort, risk, and savings estimates
  </Card>

  <Card title="Security Assessment" icon="clipboard-check" href="/guide/infrastructure/assessment">
    Run a Well-Architected assessment across all 6 pillars with actionable findings
  </Card>

  <Card title="Notifications" icon="bell" href="/guide/notifications">
    Configure how and where CloudKeepers alerts are delivered
  </Card>
</CardGroup>
