# Infrastructure Analytics

> Analyze infrastructure performance, cost, and reliability trends across all connected cloud environments

[Infrastructure Analytics](/guide/infrastructure/analytics) gives you a unified view of health, performance, and cost signals across all connected clouds — correlating data from CloudWatch, Kubernetes metrics, database performance, and cost telemetry into a single operational picture.

***

## The Problem With Siloed Infrastructure Data

Modern cloud infrastructure generates data from dozens of sources: CloudWatch metrics for AWS, Azure Monitor for Azure, Prometheus for Kubernetes, RDS Performance Insights for databases, Datadog or Grafana for APM. Each tool has its own query language, its own dashboard format, and its own access model.

The result: answering a question like "Is the performance degradation I'm seeing cost-related or resource-contention-related?" requires pulling data from 3–4 separate tools, correlating timestamps manually, and hoping you're looking at the same time window.

Infrastructure Analytics connects these signals into a coherent picture. Ask agents questions in plain language — they query the right sources, correlate the data, and surface actionable insights.

***

## Key Dashboards

### Resource Utilization

Track compute, memory, storage, and network utilization across your entire infrastructure:

```bash theme={null}
# Multi-cloud utilization overview
@alex #dashboard resource utilization across all accounts last 7 days

# Per-service breakdown
@alex #dashboard EC2 CPU and memory utilization by service tag

# Kubernetes workload efficiency
@kai #dashboard pod resource requests vs actual usage by namespace
```

**What you can see:**

* CPU/memory utilization trends per service, region, and account
* Overprovisioned vs underprovisioned resources at a glance
* Average, percentile (P95, P99), and peak utilization for capacity planning

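Averages tend to hide short bursts, which is why percentiles appear in the list above. The following is a minimal Python sketch, not CloudThinker's implementation: the `percentile` helper uses the nearest-rank method, and the `cpu` samples are made-up hourly values for a single hypothetical service.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical hourly CPU utilization (%) for one service
cpu = [22, 25, 24, 30, 28, 26, 91, 27, 23, 88,
       29, 31, 25, 24, 26, 27, 95, 28, 30, 26]

print("avg :", mean(cpu))
print("p95 :", percentile(cpu, 95))
print("p99 :", percentile(cpu, 99))
print("peak:", max(cpu))
```

In this sample the average is 36.25% while P95 is 91%, so sizing to the average would leave no room for the bursts.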
***

### Performance Health

Monitor application performance signals correlated with infrastructure state:

```bash theme={null}
# API latency correlation with infrastructure
@alex #dashboard API response times vs infrastructure load

# Database performance trends
@tony #dashboard query latency P50/P95/P99 over last 30 days

# Kubernetes cluster performance
@kai #dashboard cluster CPU pressure and OOMKill events by namespace
```

***

### Cost Correlation

Connect infrastructure changes to cost impact:

```bash theme={null}
# Cost vs utilization efficiency
@alex #dashboard cost per unit of utilization across services

# Anomaly detection
@alex identify infrastructure changes correlated with cost spikes

# Waste attribution
@alex #dashboard unused and underutilized resources by team tag
```
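One way to read "cost per unit of utilization" is spend divided by the capacity actually used. The sketch below assumes that definition. The `ServiceUsage` fields, service names, and figures are hypothetical and do not reflect CloudThinker's data model.

```python
from dataclasses import dataclass


@dataclass
class ServiceUsage:
    name: str
    monthly_cost_usd: float
    provisioned_vcpu_hours: float
    avg_cpu_utilization: float  # fraction between 0.0 and 1.0


def cost_per_utilized_vcpu_hour(svc):
    """Cost divided by the vCPU-hours that did real work (assumed definition)."""
    used = svc.provisioned_vcpu_hours * svc.avg_cpu_utilization
    if used <= 0:
        return float("inf")  # paying for capacity that is never used
    return svc.monthly_cost_usd / used


# Hypothetical services
services = [
    ServiceUsage("checkout-api", 1200.0, 10_000, 0.60),
    ServiceUsage("batch-reports", 900.0, 10_000, 0.15),
]

# Most expensive per utilized unit first
ranked = sorted(services, key=cost_per_utilized_vcpu_hour, reverse=True)
for svc in ranked:
    print(f"{svc.name}: ${cost_per_utilized_vcpu_hour(svc):.2f} per utilized vCPU-hour")
```

The service with the lower bill can still be the less efficient one: here `batch-reports` costs less per month but three times as much per unit of work done.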

***

### Trend Analysis

Understand how your infrastructure evolves over time:

```bash theme={null}
# Growth trends
@alex analyze infrastructure growth patterns over last 6 months

# Capacity forecasting
@anna forecast infrastructure needs for 2x traffic growth

# Efficiency trends
@alex show improvement in resource utilization since last quarter
```
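A rough first-order estimate for a question like the 2x forecast above can be worked out by hand. This sketch assumes load scales linearly with traffic and that you size to a target peak utilization. `instances_needed` and its 70% default are illustrative assumptions, and an agent's forecast may account for more than this.

```python
import math


def instances_needed(current_instances, peak_utilization, traffic_multiplier,
                     target_utilization=0.70):
    """Estimate the instance count after traffic growth, assuming linear scaling."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Load expressed in "fully busy instances" after the traffic change
    projected_load = current_instances * peak_utilization * traffic_multiplier
    return math.ceil(projected_load / target_utilization)


# Hypothetical fleet: 12 instances peaking at 55% CPU, traffic doubles
print(instances_needed(12, 0.55, 2.0))
```

With 12 instances peaking at 55%, doubling traffic projects 13.2 busy instances, which at a 70% target rounds up to 19.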

***

## Anomaly Detection

CloudThinker agents continuously monitor for anomalies and surface them automatically:

| Signal                  | What's Detected                                     | Alert Threshold |
| ----------------------- | --------------------------------------------------- | --------------- |
| **Cost spike**          | Spend increase >20% day-over-day                    | Configurable    |
| **CPU pressure**        | Sustained >85% CPU across cluster                   | Configurable    |
| **Memory growth**       | Steady memory growth without release (leak pattern) | >10% per hour   |
| **Latency degradation** | P95 latency increase >2x baseline                   | Configurable    |
| **OOMKills**            | Pod terminated due to memory limit                  | Any occurrence  |
| **Replication lag**     | Database replica falling behind                     | >30 seconds     |

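The rules in the table can be read as simple predicates over recent measurements. The Python sketch below is one interpretation under stated assumptions: the function names are hypothetical, the defaults mirror the table, and the leak check treats "steady growth without release" as every hourly step growing by more than 10%.

```python
def cost_spike(yesterday_usd, today_usd, max_increase=0.20):
    """Spend increased by more than max_increase day-over-day."""
    if yesterday_usd <= 0:
        return today_usd > 0
    return (today_usd - yesterday_usd) / yesterday_usd > max_increase


def latency_degraded(p95_ms, baseline_p95_ms, max_ratio=2.0):
    """P95 latency is more than max_ratio times the baseline."""
    return p95_ms > baseline_p95_ms * max_ratio


def memory_leak_suspected(hourly_mb, max_growth=0.10):
    """Every hour-over-hour step grows by more than max_growth (assumed reading)."""
    if len(hourly_mb) < 2:
        return False
    steps = zip(hourly_mb, hourly_mb[1:])
    return all(prev > 0 and (cur - prev) / prev > max_growth for prev, cur in steps)


def replication_lagging(lag_seconds, max_lag_seconds=30):
    """Replica is more than max_lag_seconds behind the primary."""
    return lag_seconds > max_lag_seconds


# Hypothetical measurements
print(cost_spike(4000, 5000))                            # 25% increase
print(latency_degraded(450, 200))                        # 2.25x baseline
print(memory_leak_suspected([1000, 1150, 1330, 1540]))   # grows every hour
print(replication_lagging(45))
```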
Configure thresholds to match your environment:

```bash theme={null}
# Set a cost anomaly alert
@alex #alert when daily spend exceeds $5,000 or increases >25% day-over-day

# Set a performance alert
@tony #alert when P95 query latency exceeds 500ms for 5 consecutive minutes

# Set a K8s health alert
@kai #alert on OOMKilled events or nodes with >90% memory pressure
```
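The "for 5 consecutive minutes" condition in the performance alert is a sustained-breach check: a single slow minute should not fire the alert. A minimal sketch, assuming one P95 sample per minute. `sustained_breach` is a hypothetical helper, not a CloudThinker API.

```python
def sustained_breach(per_minute_values, threshold, consecutive=5):
    """Return True if at least `consecutive` samples in a row exceed `threshold`."""
    run = 0
    for value in per_minute_values:
        # Extend the current run on a breach, reset it otherwise
        run = run + 1 if value > threshold else 0
        if run >= consecutive:
            return True
    return False


# Hypothetical P95 query latency in ms, one sample per minute
steady_breach = [420, 510, 530, 560, 520, 540, 480]
interrupted = [510, 520, 480, 530, 540, 550, 560]

print(sustained_breach(steady_breach, threshold=500))  # five breaches in a row
print(sustained_breach(interrupted, threshold=500))    # longest run is four
```

Requiring consecutive breaches trades a few minutes of detection delay for fewer false alerts on momentary spikes.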

***

## Infrastructure Insights Dashboard

The built-in Infrastructure Insights dashboard (accessible at **Infrastructure → Analytics**) provides:

<CardGroup cols={2}>
  <Card title="Health Score" icon="heart-pulse">
    Composite health score across compute, network, database, and Kubernetes — updated continuously
  </Card>

  <Card title="Cost Efficiency" icon="piggy-bank">
    Ratio of actual resource utilization to what you're paying for — identifies waste at a glance
  </Card>

  <Card title="Reliability Indicators" icon="shield-check">
    Error rates, availability, recent incidents, and MTTR trends over time
  </Card>

  <Card title="Capacity Headroom" icon="gauge-high">
    How much runway you have before resource constraints impact performance
  </Card>
</CardGroup>

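Capacity headroom can be approximated as the time until a growth trend reaches a limit. The sketch below fits a least-squares line to daily usage and extrapolates from the latest value. The function names and sample numbers are hypothetical, and the dashboard's actual calculation is not documented here.

```python
def trend_slope(values):
    """Least-squares slope of values against their index (units per day)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def days_of_runway(daily_usage, capacity_limit):
    """Days until the linear trend reaches capacity_limit, or None if usage is not growing."""
    current = daily_usage[-1]
    if current >= capacity_limit:
        return 0.0
    slope = trend_slope(daily_usage)
    if slope <= 0:
        return None
    return (capacity_limit - current) / slope


# Hypothetical disk usage in GB over five days, with an 800 GB volume
usage_gb = [500, 510, 520, 530, 540]
print(days_of_runway(usage_gb, capacity_limit=800))
```

A linear fit is the simplest possible model. Seasonal or step-change workloads would need something more careful.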
***

## How to Interpret Analytics

**High utilization + high cost** → Appropriately sized; consider reserved capacity purchases to lower the unit cost

**Low utilization + high cost** → Right-sizing opportunity — talk to `@alex`

**High latency + normal utilization** → Application-layer issue or database bottleneck — talk to `@tony`

**Utilization spikes + OOMKills** → Resource limits misconfigured — talk to `@kai`

**Cost spike without traffic change** → Configuration drift or orphaned resource — talk to `@alex` or check [CloudKeepers](/guide/infrastructure/cloudkeepers) findings

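The patterns above amount to a small decision table. As an illustration only, the sketch below encodes them as rules over boolean signals. The `Signals` fields and the `interpret` function are hypothetical names, and deciding what counts as "high" is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    high_utilization: bool = False
    high_cost: bool = False
    high_latency: bool = False
    utilization_spikes: bool = False
    oom_kills: bool = False
    cost_spike: bool = False
    traffic_changed: bool = False


def interpret(s):
    """Return every interpretation from the list above that matches the signals."""
    findings = []
    if s.high_utilization and s.high_cost:
        findings.append("Appropriately sized; consider reserved capacity purchases")
    if s.high_cost and not s.high_utilization:
        findings.append("Right-sizing opportunity: ask @alex")
    if s.high_latency and not s.high_utilization:
        findings.append("Application-layer issue or database bottleneck: ask @tony")
    if s.utilization_spikes and s.oom_kills:
        findings.append("Resource limits misconfigured: ask @kai")
    if s.cost_spike and not s.traffic_changed:
        findings.append("Configuration drift or orphaned resource: ask @alex or check CloudKeepers")
    return findings


# Hypothetical situation: expensive but mostly idle, with a spend jump and flat traffic
for finding in interpret(Signals(high_cost=True, cost_spike=True)):
    print(finding)
```

More than one pattern can match at once, so the function returns a list rather than a single verdict.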
***

## What's Next

<CardGroup cols={2}>
  <Card title="CloudKeepers" icon="radar" href="/guide/infrastructure/cloudkeepers">
    Set up autonomous pilots to detect and alert on anomalies automatically
  </Card>

  <Card title="Cost Analytics" icon="chart-line" href="/guide/cost-optimization/analytics">
    Deep-dive into cloud spend trends and cost attribution
  </Card>

  <Card title="Assessment" icon="clipboard-check" href="/guide/infrastructure/assessment">
    Run a Well-Architected assessment to baseline infrastructure health
  </Card>

  <Card title="Topology" icon="diagram-project" href="/guide/infrastructure/topology">
    Correlate analytics signals with your infrastructure dependency graph
  </Card>
</CardGroup>
