# CloudKeepers

> Autonomous keepers for cost, security, and performance optimization across AWS, GCP, and Kubernetes.

CloudKeepers enforce guardrails for **cost**, **security**, and **performance** across every connected cloud account and Kubernetes cluster. Each keeper pairs one provider (AWS, GCP, or Kubernetes) with one operational pillar, giving you nine specialized monitors that detect drift, surface remediation playbooks, and alert teams so issues are fixed before they become [Incidents](/guide/incident/overview).

***

## The Problem

Cloud drift is constant. Security groups get opened during incidents and never closed. Dev instances left running over weekends compound into hundreds of dollars of monthly waste. New resources get provisioned without required tags, skip encryption, or use overly permissive IAM roles. Kubernetes pods crash-loop unnoticed while CPU and memory requests stay oversized. Most teams only discover these issues during quarterly audits or, worse, after a security event.

Continuous monitoring currently requires:

* AWS Config rules — powerful but complex to write, maintain, and interpret
* Terraform Sentinel or OPA policies — code-based, developer-only, no plain-language rules
* Cloud Custodian — YAML-based, requires DevOps expertise to maintain
* Manual scheduled audits — infrequent, incomplete, and immediately stale

None of these give you a daily operational picture with plain-language findings, prioritized by impact, with implementation steps attached.

***

## How Existing Tools Compare

| Tool                         | What It Does                              | What's Missing                                                             |
| ---------------------------- | ----------------------------------------- | -------------------------------------------------------------------------- |
| **AWS Config**               | Rules-based configuration drift detection | Complex rule authoring, no AI analysis, no remediation guidance, AWS-only  |
| **Terraform Sentinel / OPA** | Policy-as-code enforcement                | Developer-only, requires code changes to add rules, no AI recommendations  |
| **Cloud Custodian**          | YAML-based cloud governance automation    | Complex setup, no natural-language interface, no prioritization            |
| **Wiz / Orca (CSPM)**        | Cloud security posture management         | Security-only (no cost/performance), expensive, requires dedicated analyst |
| **AWS Trusted Advisor**      | Basic well-architected checks             | \~50 fixed checks, no customization, no daily operational cadence          |

CloudKeepers combines cost, security, and performance guardrails into a single autonomous system — with findings in plain language, prioritized by impact, with remediation playbooks attached.

***

## Keeper Architecture

CloudKeepers organizes monitoring into a **3 × 3 matrix** of providers and pillars:

| Provider       | Cost     | Security | Performance |
| -------------- | -------- | -------- | ----------- |
| **AWS**        | AWS-COST | AWS-SEC  | AWS-PERF    |
| **GCP**        | GCP-COST | GCP-SEC  | GCP-PERF    |
| **Kubernetes** | K8S-COST | K8S-SEC  | K8S-PERF    |

Each keeper is a specialized monitor for one provider + one pillar combination. Enable only the keepers you need — for example, AWS-COST and K8S-SEC — or enable all nine for full coverage.
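
As a rough illustration of the matrix, the nine keeper IDs can be derived as the product of providers and pillars. This is a minimal sketch: the ID strings match the table above, but the `keeper_ids` helper and the `enabled` set are hypothetical and not part of any CloudKeepers API.

```python
from itertools import product

# Provider and pillar abbreviations as they appear in the keeper matrix.
PROVIDERS = ["AWS", "GCP", "K8S"]
PILLARS = ["COST", "SEC", "PERF"]


def keeper_ids(providers=PROVIDERS, pillars=PILLARS):
    """Return every provider-pillar keeper ID, e.g. 'AWS-COST'."""
    return [f"{provider}-{pillar}" for provider, pillar in product(providers, pillars)]


ALL_KEEPERS = keeper_ids()

# A partial rollout (hypothetical selection): only two of the nine keepers.
enabled = {"AWS-COST", "K8S-SEC"}
not_enabled = [keeper for keeper in ALL_KEEPERS if keeper not in enabled]
```

Here `ALL_KEEPERS` holds all nine IDs in table order, and `not_enabled` lists the seven keepers left off in the partial rollout.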

Each keeper contains multiple **detection rules** (40+ rules total) that you can individually toggle and configure:

* **Cost rules**: idle compute instances, unattached storage, old snapshots, unused static IPs, oversized databases, idle load balancers, over-requested pod resources, and more
* **Security rules**: public S3 buckets, unused IAM roles, MFA disabled on root, open security groups, secrets in parameter store, and more
* **Performance rules**: RDS connection limits, missing health probes, CrashLooping pods, throttled resources, and more
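
To make the idea of a configurable detection rule concrete, here is a minimal sketch of an idle-compute check. The class name, field names, and default values are assumptions chosen for illustration; the actual rule logic and defaults inside CloudKeepers are not documented here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class IdleComputeThresholds:
    # Hypothetical tunables; names and defaults are illustrative only.
    max_avg_cpu_pct: float = 5.0
    lookback_days: int = 7


def is_idle(daily_avg_cpu_pct: list[float], thresholds: IdleComputeThresholds) -> bool:
    """Flag an instance whose mean CPU over the lookback window is below the threshold."""
    window = daily_avg_cpu_pct[-thresholds.lookback_days:]
    if len(window) < thresholds.lookback_days:
        return False  # not enough history to judge
    return mean(window) < thresholds.max_avg_cpu_pct
```

Raising `max_avg_cpu_pct` or shortening `lookback_days` makes the rule more sensitive; lowering the threshold or lengthening the window makes it more conservative.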

***

## Autonomy Levels

Every detection rule can operate at one of three autonomy levels:

| Level | Name           | Behavior                                                                                      |
| ----- | -------------- | --------------------------------------------------------------------------------------------- |
| **1** | **Suggest**    | Read-only. The keeper analyzes infrastructure and reports findings. No changes are made.      |
| **2** | **Approve**    | The keeper drafts actions for each finding. You review and approve before anything runs.      |
| **3** | **Autonomous** | The keeper executes approved command types automatically. You are notified after each action. |

Autonomy is configured **per rule**, so you can run most rules in Suggest mode while allowing well-understood cost rules (like cleaning up unattached volumes) to operate autonomously.
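
The three levels can be read as a small decision function. The sketch below is an assumption about how the behavior in the table might be modeled; the `Autonomy` enum, the `next_step` function, its return strings, and the rule keys are all hypothetical names.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    SUGGEST = 1
    APPROVE = 2
    AUTONOMOUS = 3


def next_step(level: Autonomy, approved: bool = False) -> str:
    """Map an autonomy level to what happens with a new finding."""
    if level is Autonomy.SUGGEST:
        return "report"  # read-only: report the finding, change nothing
    if level is Autonomy.APPROVE:
        # A drafted action runs only after a human approves it.
        return "execute" if approved else "await_approval"
    return "execute_and_notify"  # autonomous: act, then notify


# Per-rule configuration (hypothetical rule keys).
rule_autonomy = {
    "unattached-volumes": Autonomy.AUTONOMOUS,
    "open-security-groups": Autonomy.SUGGEST,
}
```

Keeping the level on each rule, rather than on the keeper, is what lets a low-risk cleanup rule run unattended while a security rule in the same account stays read-only.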

***

## What Makes This Different

* **Keepers, not rules**: instead of writing policy code, you enable provider-pillar keepers and configure their built-in detection rules
* **Three pillars**: cost optimization, security monitoring, and performance analysis in a single system
* **Configurable autonomy**: choose Suggest, Approve, or Autonomous per rule — from read-only observation to full auto-remediation
* **Tunable thresholds**: adjust detection sensitivity per rule (e.g., idle CPU threshold, snapshot max age, lookback period)
* **Daily operational cadence**: designed to run daily (configurable cron), not quarterly — catching drift before it compounds
* **Plain-language findings**: each finding explains the risk and impact in business terms, not just a rule name
* **Remediation playbooks attached**: every finding includes impact analysis, before/after estimates, and step-by-step implementation guidance
* **Multi-cloud + Kubernetes**: scans AWS, GCP, and Kubernetes in a single system — not separate tools per provider

***

## Responsibilities

* **Policy enforcement**: apply cost, security, and performance guardrails through specialized keepers for day-to-day operations.
* **Drift detection**: continuously scan for misconfigurations, risky defaults, resource bloat, and performance bottlenecks.
* **Remediation playbooks**: attach implementation steps and automation options to every finding.
* **Alerting**: notify the right channels by severity so teams can triage quickly.

***

## Prerequisites

* At least one cloud account or Kubernetes cluster connected with read/monitoring permissions and, optionally, remediation permissions.
* [Slack Integration](/guide/slack-integration), Microsoft Teams, or email destinations configured if you want outbound alerts in addition to in-app [Notifications](/guide/notifications).
* Optional: tags or filters ready if you plan to scope findings to specific environments.

***

## Quick Start

<Steps>
  <Step title="Open CloudKeepers">
    Go to **CloudKeepers** to see the onboarding view. It walks you through three steps: connect a cloud account, enable keepers, and run your first detection scan. Click **Enable Your First Keepers** to begin.

    <Frame>
      <img src="https://mintcdn.com/cloudthinker/OJRahgLXUPDmURsx/images/infrastructure/cloudkeepers/01-onboarding-landing.jpg?fit=max&auto=format&n=OJRahgLXUPDmURsx&q=85&s=96e8a4bd7a1b639eb3437a4d7335b3b2" alt="CloudKeepers onboarding page with Enable Your First Keepers CTA, three-step how-it-works timeline, and cost, security, and performance value cards" width="4590" height="2764" data-path="images/infrastructure/cloudkeepers/01-onboarding-landing.jpg" />
    </Frame>
  </Step>

  <Step title="Select and configure keepers">
    The setup wizard has two steps. In **Select Keepers**, choose which keepers to activate and filter by provider (AWS, GCP, Kubernetes) or pillar (Cost, Security, Performance). In **Review & Configure**, fine-tune detection rules per keeper, set the autonomy level (Suggest, Approve, or Autonomous), and adjust which rules are enabled.

    <Frame>
      <img src="https://mintcdn.com/cloudthinker/OJRahgLXUPDmURsx/images/infrastructure/cloudkeepers/02-setup-wizard.jpg?fit=max&auto=format&n=OJRahgLXUPDmURsx&q=85&s=0058539cc8bb9ee7c514dd3cb481491b" alt="Two-step setup wizard showing keeper selection grid on the left and per-keeper rule review with autonomy level toggles on the right" width="4776" height="2086" data-path="images/infrastructure/cloudkeepers/02-setup-wizard.jpg" />
    </Frame>
  </Step>

  <Step title="Review the dashboard">
    Once keepers are enabled, select one from the sidebar to see its **Dashboard** tab. Four stat cards — **Open Findings**, **Critical & High**, **Potential Savings**, and **This Week** — give you a quick pulse. The **Findings Over Time** chart breaks down trends by severity.

    <Frame>
      <img src="https://mintcdn.com/cloudthinker/OJRahgLXUPDmURsx/images/infrastructure/cloudkeepers/03-keeper-dashboard.jpg?fit=max&auto=format&n=OJRahgLXUPDmURsx&q=85&s=299025a7004c4a91280dd61775178c42" alt="AWS Cost Optimization dashboard with stat cards for open findings, critical and high, potential savings, and this week count, plus a findings over time chart" width="4572" height="2766" data-path="images/infrastructure/cloudkeepers/03-keeper-dashboard.jpg" />
    </Frame>
  </Step>

  <Step title="Triage findings">
    Switch to the **Findings** tab to see a Kanban board with columns for **Pending**, **In Progress**, **Implemented**, and **Ignored**. Each finding card shows the title, estimated savings, effort level, and risk severity. Click a card to drill into details, or drag it between columns to update its status.

    <Frame>
      <img src="https://mintcdn.com/cloudthinker/OJRahgLXUPDmURsx/images/infrastructure/cloudkeepers/04-keeper-findings.jpg?fit=max&auto=format&n=OJRahgLXUPDmURsx&q=85&s=e62b71ccdae9a203917ca363f34bf1fb" alt="Findings Kanban board with a pending finding card showing 30 unattached EBS volumes, $55.20 savings, effort low, risk medium" width="4572" height="2766" data-path="images/infrastructure/cloudkeepers/04-keeper-findings.jpg" />
    </Frame>
  </Step>

  <Step title="Review detection runs">
    The **Runs** tab shows every detection run with its status, summary, duration, and how many findings were created or updated. Use this as an audit trail to verify keepers are running on schedule.

    <Frame>
      <img src="https://mintcdn.com/cloudthinker/OJRahgLXUPDmURsx/images/infrastructure/cloudkeepers/05-keeper-runs.jpg?fit=max&auto=format&n=OJRahgLXUPDmURsx&q=85&s=028371683a439f72d4315896d8e5d402" alt="Runs tab showing a completed detection run with 30 detections from 6 rules, 57-second duration, and 1 new finding" width="4572" height="2766" data-path="images/infrastructure/cloudkeepers/05-keeper-runs.jpg" />
    </Frame>
  </Step>

  <Step title="Configure keeper settings">
    In the **Settings** tab, set the cron schedule (default: daily at 07:00 UTC), and toggle individual detection rules on or off. Each rule shows a description of what it detects and supports per-rule autonomy and threshold configuration.

    <Frame>
      <img src="https://mintcdn.com/cloudthinker/OJRahgLXUPDmURsx/images/infrastructure/cloudkeepers/06-keeper-settings.jpg?fit=max&auto=format&n=OJRahgLXUPDmURsx&q=85&s=edaadcf521459188e46c732ffacf4f35" alt="Settings tab showing cron schedule editor and a list of 10 detection rules with toggle switches for idle compute, unattached storage, old snapshots, and more" width="4572" height="2766" data-path="images/infrastructure/cloudkeepers/06-keeper-settings.jpg" />
    </Frame>
  </Step>
</Steps>

***

## How enforcement and drift detection work

* Keepers run on the cron schedule you set (default: daily at 07:00 UTC) or on demand. Each run scans all permitted resources for cost, security, and performance risk, not only resources flagged in earlier runs.
* Each detection run produces an audit trail in the **Runs** tab showing status, timing, findings created/updated/closed, and any errors.
* Findings are tagged with pillar, severity (Low / Medium / High / Critical), effort, and estimated savings so you can prioritize the highest-value fixes.
* Findings start as drafts; **promote** them to active recommendations, then save to [Plan](/guide/infrastructure/plan) when you are ready for approvals, scheduling, and execution tracking.
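
One way to picture the prioritization described above is a sort over severity, estimated savings, and effort. This is a sketch under stated assumptions: the field names, the rank tables, and the tie-breaking order are illustrative, not the product's actual scoring. The `critical` level is included because the dashboard exposes a **Critical & High** card. The first sample finding reuses the figures from the Findings screenshot; the other two are made up.

```python
from dataclasses import dataclass

# Assumed orderings; higher severity rank sorts first, lower effort rank sorts first.
SEVERITY_RANK = {"critical": 3, "high": 2, "medium": 1, "low": 0}
EFFORT_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Finding:
    title: str
    severity: str
    effort: str
    estimated_savings_usd: float = 0.0


def prioritize(findings):
    """Order by severity (desc), then savings (desc), then effort (asc)."""
    return sorted(
        findings,
        key=lambda f: (
            -SEVERITY_RANK[f.severity],
            -f.estimated_savings_usd,
            EFFORT_RANK[f.effort],
        ),
    )


sample = [
    Finding("30 unattached EBS volumes", "medium", "low", 55.20),
    Finding("Public S3 bucket", "high", "low"),            # hypothetical
    Finding("Oversized database", "medium", "medium", 120.0),  # hypothetical
]
ordered_titles = [f.title for f in prioritize(sample)]
```

With this ordering, the high-severity security finding comes first even though it carries no savings, followed by the two medium findings ranked by savings.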

### Finding statuses

Findings move through a Kanban workflow:

| Status           | Meaning                                         |
| ---------------- | ----------------------------------------------- |
| **New**          | Just detected — awaiting triage                 |
| **Acknowledged** | Team is aware, not yet acting                   |
| **Active**       | Remediation in progress                         |
| **Resolved**     | Fix implemented and verified                    |
| **Dismissed**    | Intentionally skipped — keeper will not re-flag |

<Note>
  CloudKeepers is your daily operational guardrail.
  [Assessment](/guide/infrastructure/assessment) is a deeper, periodic
  evaluation and is not meant for day-to-day runs.
</Note>

***

## Keeper settings

Each keeper has a dedicated **Settings** tab where you can:

* **Schedule**: set a cron expression for automated runs (minimum 1-hour interval).
* **Detection rules**: toggle individual rules, adjust their autonomy level (Suggest / Approve / Autonomous), and configure per-rule thresholds (e.g., idle CPU %, lookback days, snapshot max age).
* **Commands & permissions**: manage which cloud commands each rule is allowed to execute, with per-command effects (Allow / Require Approval / Deny).
* **Notifications**: configure Email, Slack, and Teams channels with per-channel minimum severity thresholds.
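
For the schedule setting, the default of daily at 07:00 UTC corresponds to the standard five-field cron expression `0 7 * * *`. How CloudKeepers validates the one-hour minimum is not documented here, so the check below is only a conservative approximation: it accepts an expression when its minute field is a single fixed value, which guarantees at most one run per hour. The function name is hypothetical.

```python
def meets_hourly_minimum(expr: str) -> bool:
    """Return True if a 5-field cron expression fires at most once per hour.

    Conservative check: the minute field must be one fixed number (0-59).
    Steps ('*/15'), lists ('0,30'), ranges ('0-10'), and '*' are rejected.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(
            "expected 5 cron fields: minute hour day-of-month month day-of-week"
        )
    minute = fields[0]
    return minute.isascii() and minute.isdigit() and 0 <= int(minute) <= 59


DEFAULT_SCHEDULE = "0 7 * * *"   # daily at 07:00
HOURLY_SCHEDULE = "0 * * * *"    # top of every hour
TOO_FREQUENT = "*/15 * * * *"    # every 15 minutes
```

Under this check, `DEFAULT_SCHEDULE` and `HOURLY_SCHEDULE` pass and `TOO_FREQUENT` is rejected.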

***

## Remediation playbooks

* Every finding includes an impact analysis with before/after estimates and a step-by-step playbook.
* Use **Impact Analytics** for deeper analysis, **Generate Guidelines** for shareable runbooks, **Custom Prompt** to explore edge cases, or **Implement** to execute changes.
* Track status and outcomes in [Plan](/guide/infrastructure/plan) so governance, FinOps, and security teams share the same source of truth.

***

## Alerting and routing

* Set per-channel minimum severities to keep noise low while still surfacing critical issues quickly.
* **Slack**: real-time triage with action links back to CloudThinker.
* **Email**: audit trails with workspace-aware links.
* **Teams**: team-channel delivery with severity filtering.
* In-app [Notifications](/guide/notifications) are always delivered regardless of channel settings.
* Combine alerting with [Plan](/guide/infrastructure/plan) workflows to ensure findings get reviewed, approved, and closed.
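
The per-channel severity filter can be sketched as a simple threshold comparison. The severity ordering, the channel keys, and the `channels_for` function are assumptions for illustration; the only behavior taken from the text above is that each outbound channel has a minimum severity and that in-app notifications are always delivered.

```python
# Assumed ordering from least to most severe.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def channels_for(severity: str, channel_min: dict[str, str]) -> list[str]:
    """Return the channels that should receive a finding of the given severity."""
    rank = SEVERITY_ORDER.index(severity)
    targets = ["in-app"]  # always delivered, regardless of channel settings
    targets += [
        name
        for name, minimum in channel_min.items()
        if rank >= SEVERITY_ORDER.index(minimum)
    ]
    return targets


# Hypothetical configuration: Slack for urgent items only, email for everything.
channel_min = {"slack": "high", "email": "low", "teams": "medium"}
medium_targets = channels_for("medium", channel_min)
```

In this configuration a medium-severity finding reaches in-app, email, and Teams, while Slack stays quiet until something is high or critical.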

***

## What's Next

<CardGroup cols={2}>
  <Card title="Plan" icon="list-check" href="/guide/infrastructure/plan">
    Save findings to Plan for approvals, scheduling, and execution tracking
  </Card>

  <Card title="Assessment" icon="clipboard-check" href="/guide/infrastructure/assessment">
    Run deeper periodic Well-Architected assessments alongside daily CloudKeepers runs
  </Card>

  <Card title="Slack Integration" icon="slack" href="/guide/slack-integration">
    Route CloudKeepers alerts to Slack channels for real-time triage
  </Card>

  <Card title="Recurring Tasks" icon="calendar-check" href="/guide/recurring-tasks">
    Schedule additional recurring analysis to complement CloudKeepers
  </Card>
</CardGroup>
