# Infrastructure Analytics

> Monitor infrastructure utilization, performance, and reliability trends across connected cloud environments

Query utilization, performance, and reliability data across all connected cloud environments using agent-led dashboards, charts, and alerts. Cost anomaly analysis and spending trends live in [Cost Analytics](/guide/cost-optimization/analytics).

## Prompt syntax

The general form for an analytics query:

```text
@agent #tool your query [time range]
```

| Component    | Description                  | Values                                                                                                      |
| ------------ | ---------------------------- | ----------------------------------------------------------------------------------------------------------- |
| `@agent`     | Who executes the query       | `@alex` (cloud and compute), `@tony` (databases), `@kai` (Kubernetes), `@anna` (forecasts and coordination) |
| `#tool`      | Output format                | `#dashboard` (visual), `#chart` (inline), `#report` (exportable), `#alert` (threshold rule)                 |
| `time range` | Optional; defaults to 7 days | `last 7 days`, `last 30 days`, `last quarter`, `since last quarter`                                         |

See [CloudThinker Language](/guide/language) for the complete syntax reference.
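
Because every prompt follows the same shape, prompts can be assembled programmatically, for example when generating a batch of dashboard requests. The sketch below is a hypothetical helper and not part of CloudThinker: `build_prompt`, `AGENTS`, and `TOOLS` are names invented for illustration, and the allowed values are copied from the table above.

```python
from typing import Optional

# Hypothetical helper: assembles a prompt in the documented
# "@agent #tool your query [time range]" form. Not a CloudThinker API.
AGENTS = {"alex", "tony", "kai", "anna"}
TOOLS = {"dashboard", "chart", "report", "alert"}


def build_prompt(agent: str, tool: str, query: str,
                 time_range: Optional[str] = None) -> str:
    """Return a prompt string, rejecting agents or tools not listed in the table."""
    if agent not in AGENTS:
        raise ValueError(f"unknown agent: {agent!r}")
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    parts = [f"@{agent}", f"#{tool}", query.strip()]
    if time_range:  # optional; the documented default is 7 days when omitted
        parts.append(time_range.strip())
    return " ".join(parts)


print(build_prompt("alex", "dashboard",
                   "resource utilization across all accounts", "last 7 days"))
# @alex #dashboard resource utilization across all accounts last 7 days
```

Validating the agent and tool names before sending catches typos such as `#dashbord` early, rather than relying on the agent to guess the intent.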

## Dashboards

| Dashboard                      | What it shows                                                            | Agent      |
| ------------------------------ | ------------------------------------------------------------------------ | ---------- |
| Resource utilization           | CPU, memory, storage, and network usage per service, region, and account | Alex, Kai  |
| Kubernetes workload efficiency | Pod resource requests vs. actual usage by namespace                      | Kai        |
| Application performance        | API response times and error rates correlated with infrastructure load   | Alex       |
| Database performance           | Query latency at P50/P95/P99, slow queries, connection counts            | Tony       |
| Cluster health                 | CPU pressure, OOMKill events, and node status                            | Kai        |
| Capacity headroom              | Resource runway before performance is impacted                           | Alex, Anna |
| Utilization trends             | Improvement or regression over a chosen period                           | Alex       |
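
The Kubernetes workload efficiency and capacity headroom dashboards both come down to simple ratios. As a rough illustration of what they measure (not CloudThinker's actual calculation), the sketch below computes usage as a fraction of requests per namespace, and days of runway under an assumed linear growth rate. The function names and all figures are made up for the example.

```python
def request_efficiency(requested: float, used: float) -> float:
    """Fraction of requested resources actually used (1.0 means fully used)."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    return used / requested


def runway_days(capacity: float, current: float, growth_per_day: float) -> float:
    """Days until usage reaches capacity, assuming linear growth (an assumption)."""
    if growth_per_day <= 0:
        return float("inf")  # flat or shrinking usage never exhausts capacity
    return max(capacity - current, 0.0) / growth_per_day


# Illustrative numbers only: CPU cores (requested, used) per namespace.
namespaces = {"checkout": (8.0, 2.0), "search": (4.0, 3.0)}
for name, (requested, used) in namespaces.items():
    print(f"{name}: {request_efficiency(requested, used):.0%} of requested CPU in use")
# checkout: 25% of requested CPU in use
# search: 75% of requested CPU in use

print(runway_days(capacity=1000.0, current=700.0, growth_per_day=10.0))  # 30.0
```

A namespace that uses only a small fraction of its requests is a right-sizing candidate, while a short runway suggests adding capacity before performance is affected.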

## Alerts

| Signal              | What is detected                               | Default threshold                |
| ------------------- | ---------------------------------------------- | -------------------------------- |
| CPU pressure        | Sustained high CPU across a cluster            | >85% for a configurable duration |
| Memory growth       | Steady increase without release (leak pattern) | >10% per hour                    |
| Latency degradation | P95 latency rise above baseline                | >2× baseline                     |
| OOMKills            | Pod terminated due to memory limit             | Any occurrence                   |
| Replication lag     | Database replica falling behind primary        | >30 seconds                      |

All thresholds are configurable per environment using `#alert` prompts.
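
To make the default thresholds concrete, the following sketch applies three of them to made-up samples. It only illustrates what each condition means; how CloudThinker samples and evaluates metrics internally is not described here, and the function names are hypothetical.

```python
from typing import Sequence


def memory_growth_exceeded(samples_mb: Sequence[float], hours: float,
                           limit_pct_per_hour: float = 10.0) -> bool:
    """True if memory never drops between samples and average growth exceeds the limit."""
    if len(samples_mb) < 2 or hours <= 0 or samples_mb[0] <= 0:
        return False
    never_released = all(b >= a for a, b in zip(samples_mb, samples_mb[1:]))
    growth_pct_per_hour = (samples_mb[-1] - samples_mb[0]) / samples_mb[0] * 100 / hours
    return never_released and growth_pct_per_hour > limit_pct_per_hour


def latency_degraded(p95_ms: float, baseline_ms: float, factor: float = 2.0) -> bool:
    """True if P95 latency is more than `factor` times the baseline."""
    return p95_ms > factor * baseline_ms


def replication_lagging(lag_seconds: float, limit_seconds: float = 30.0) -> bool:
    """True if the replica is more than `limit_seconds` behind the primary."""
    return lag_seconds > limit_seconds


# Made-up samples over two hours: 1000 MB -> 1300 MB is about 15% per hour.
print(memory_growth_exceeded([1000, 1100, 1200, 1300], hours=2))  # True
print(latency_degraded(p95_ms=450, baseline_ms=200))              # True (450 > 400)
print(replication_lagging(lag_seconds=12))                        # False
```

The keyword defaults mirror the table, so overriding one (for example `factor=3.0`) corresponds to loosening that threshold for a noisier environment.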

## Signal interpretation

| Pattern                           | Likely cause                             | Next step                                                                        |
| --------------------------------- | ---------------------------------------- | -------------------------------------------------------------------------------- |
| High utilization + normal latency | Appropriately sized workload             | Consider reserved capacity; ask `@alex`                                          |
| Low utilization + high cost       | Overprovisioned resources                | Right-size with `@alex`                                                          |
| High latency + normal utilization | Application or database bottleneck       | Ask `@tony`                                                                      |
| Utilization spikes + OOMKills     | Resource limits misconfigured            | Ask `@kai`                                                                       |
| Cost spike without traffic change | Configuration drift or orphaned resource | Ask `@alex` or check [CloudKeepers](/guide/infrastructure/cloudkeepers) findings |
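
The table above amounts to a small decision table. A minimal sketch of it as a lookup follows, assuming each signal has already been classified upstream. The `Signals` fields, the `next_step` function, and the order in which rules are checked are all assumptions made for illustration and are not part of CloudThinker.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    # Hypothetical, pre-classified observations for one workload.
    utilization: str = "normal"   # "low", "normal", "high", or "spiking"
    latency: str = "normal"       # "normal" or "high"
    cost: str = "normal"          # "normal", "high", or "spiking"
    oomkills: bool = False
    traffic_changed: bool = False


def next_step(s: Signals) -> str:
    """Map observed signals to the likely cause and next step from the table."""
    # Rule order is an assumption: the most specific patterns are checked first.
    if s.utilization == "spiking" and s.oomkills:
        return "Resource limits misconfigured: ask @kai"
    if s.cost == "spiking" and not s.traffic_changed:
        return ("Configuration drift or orphaned resource: "
                "ask @alex or check CloudKeepers findings")
    if s.utilization == "low" and s.cost == "high":
        return "Overprovisioned resources: right-size with @alex"
    if s.latency == "high" and s.utilization == "normal":
        return "Application or database bottleneck: ask @tony"
    if s.utilization == "high" and s.latency == "normal":
        return "Appropriately sized workload: consider reserved capacity, ask @alex"
    return "No pattern from the table matched"


print(next_step(Signals(latency="high")))
# Application or database bottleneck: ask @tony
```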

## Examples

Utilization dashboards:

```text
@alex #dashboard resource utilization across all accounts last 7 days
@kai #dashboard pod resource requests vs actual usage by namespace
```

Performance dashboards:

```text
@tony #dashboard query latency P50/P95/P99 over last 30 days
@kai #dashboard cluster CPU pressure and OOMKill events by namespace
```

Alert configuration:

```text
@tony #alert when P95 query latency exceeds 500ms for 5 consecutive minutes
@kai #alert on OOMKilled events or nodes above 90% memory utilization
```

Trends and forecasting:

```text
@anna forecast infrastructure needs for 2x traffic growth
@alex show improvement in resource utilization since last quarter
```

## Related

<CardGroup cols={2}>
  <Card title="CloudKeepers" icon="radar" href="/guide/infrastructure/cloudkeepers">
    Set up keepers that continuously monitor infrastructure and surface findings
  </Card>

  <Card title="Cost Analytics" icon="chart-line" href="/guide/cost-optimization/analytics">
    Analyze spending patterns and anomalies across connected accounts
  </Card>

  <Card title="Assessment" icon="clipboard-check" href="/guide/infrastructure/assessment">
    Run a Well-Architected assessment to baseline infrastructure health
  </Card>

  <Card title="Topology" icon="diagram-project" href="/guide/infrastructure/topology">
    Correlate analytics signals with your infrastructure dependency graph
  </Card>
</CardGroup>