
# Topology

> Visualize cloud infrastructure topology maps — trace dependencies, identify single points of failure, and accelerate incident investigation

The [Topology Explorer](/guide/infrastructure/topology) provides interactive visualization of your cloud infrastructure and service relationships. Build topology maps manually, let agents discover them, or import from Infrastructure as Code.

<Frame>
  <img src="https://mintcdn.com/cloudthinker/M-utUm-TaqDSbEEK/images/infrastructure/topology-explorer.jpg?fit=max&auto=format&n=M-utUm-TaqDSbEEK&q=85&s=b931c4d83fd7741964ff90dca6583bcc" alt="Topology Explorer" width="3578" height="2010" data-path="images/infrastructure/topology-explorer.jpg" />
</Frame>

***

## The Problem

Modern cloud applications are webs of dependencies — load balancers, auto-scaling groups, databases, caches, queues, Lambda functions, Kubernetes services — all connected. When something breaks, the critical question is: *what depends on what?*

Without a topology map, incident investigation looks like this: an alert fires, engineers start checking every service manually, nobody knows the blast radius, and MTTR stretches to hours while teams debate which service is the actual origin.

Architecture diagrams in Confluence are outdated within days of being created. Nobody has time to maintain them. And when an audit asks for data flow documentation or an engineer asks "what breaks if this RDS instance goes down?", the honest answer is: *nobody knows for certain*.

***

## How Existing Tools Compare

| Tool                      | What It Does                          | What's Missing                                                                                                  |
| ------------------------- | ------------------------------------- | --------------------------------------------------------------------------------------------------------------- |
| **AWS X-Ray Service Map** | Live request tracing between services | Requires X-Ray instrumentation in every service; AWS-only; shows request flows, not infrastructure dependencies |
| **Datadog Service Map**   | Application dependency visualization  | Requires full Datadog agent deployment; application-layer only, not infrastructure layer                        |
| **Dynatrace SmartScape**  | Automatic dependency discovery        | Proprietary, expensive, requires Dynatrace agents everywhere                                                    |
| **Cloudcraft**            | Cloud architecture diagramming        | Manual drawing tool — not connected to live infrastructure; goes stale immediately                              |
| **Lucidchart / Draw\.io** | General diagramming                   | Static diagrams with no live infrastructure connection                                                          |

CloudThinker Topology combines three capabilities that none of the tools above offers together: (1) it auto-discovers dependencies from live infrastructure without requiring code instrumentation, (2) it integrates directly with AI agents for instant RCA, and (3) it works across AWS, GCP, and Azure in a unified view.

***

## What Makes This Different

* **Live discovery**: agents discover topology from AWS/GCP/Azure APIs, Terraform state, or CloudFormation — no instrumentation required
* **RCA-integrated**: when an incident occurs, agents use topology to trace the impact path and identify the origin service automatically
* **Multi-cloud**: a single topology view spanning AWS, Azure, GCP, and Kubernetes — not separate maps per provider
* **Dynamic**: continuously synced as infrastructure changes, not a static diagram that goes stale

***

## Overview

Topology maps help you:

* **Visualize relationships** between cloud resources
* **Understand dependencies** across services
* **Support incident response** with visual context
* **Enable [root cause analysis (RCA)](/guide/incident/root-cause-analysis)** by tracing connections
* **Document architecture** for team knowledge sharing

***

## Building Topology

<Tabs>
  <Tab title="Agent-Led Discovery">
    Let CloudThinker agents automatically discover and map your infrastructure.

    ```bash theme={null}
    @alex discover and map infrastructure topology for production
    @kai map Kubernetes service dependencies
    @alex build topology from AWS account including all VPCs
    ```

    **Benefits:**

    * Automatic resource discovery
    * Real-time relationship mapping
    * Continuous sync with infrastructure changes
  </Tab>

  <Tab title="Import from IaC">
    Import topology from your Infrastructure as Code sources.

    **Supported sources:**

    * **Terraform State** - Import from `.tfstate` files
    * **CloudFormation** - Import from stack templates
    * **Pulumi** - Import from state files

    <Steps>
      <Step title="Navigate to Topology">
        Go to **Infrastructure → Topology**
      </Step>

      <Step title="Click Import">
        Select **New View → Import from IaC**
      </Step>

      <Step title="Select Source">
        Choose Terraform State, CloudFormation, Pulumi, or another supported format
      </Step>

      <Step title="Upload or Connect">
        Upload the state file or connect to a remote state backend
      </Step>
    </Steps>
  </Tab>

  <Tab title="Manual Builder">
    Build topology maps manually for custom visualizations.

    <Steps>
      <Step title="Create New View">
        Click **New View** in the Topology Explorer
      </Step>

      <Step title="Add Resources">
        Drag resources from the left panel onto the canvas
      </Step>

      <Step title="Draw Connections">
        Click and drag between resources to create relationships
      </Step>

      <Step title="Save View">
        Name and save your topology view
      </Step>
    </Steps>
  </Tab>
</Tabs>
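
For the **Import from IaC** path, it can help to see what a state file contributes to a topology map. The sketch below reads a trimmed Terraform state document (format version 4) and derives nodes and dependency edges from each instance's `dependencies` list. The sample state, the resource names, and the `edges_from_tfstate` helper are hypothetical, and this is only an illustration of the idea, not CloudThinker's importer.

```python
import json

# Hypothetical, heavily trimmed Terraform state (format version 4).
SAMPLE_STATE = """
{
  "version": 4,
  "resources": [
    {"mode": "managed", "type": "aws_db_instance", "name": "orders",
     "instances": [{"attributes": {"id": "orders-db"}}]},
    {"mode": "managed", "type": "aws_instance", "name": "api",
     "instances": [{"attributes": {"id": "i-0abc123"},
                    "dependencies": ["aws_db_instance.orders"]}]}
  ]
}
"""


def edges_from_tfstate(state: dict) -> tuple[set[str], set[tuple[str, str]]]:
    """Return (nodes, edges); an edge (a, b) means resource a depends on b."""
    nodes: set[str] = set()
    edges: set[tuple[str, str]] = set()
    for res in state.get("resources", []):
        if res.get("mode") != "managed":  # skip data sources
            continue
        # Simplification: module prefixes are ignored in this sketch.
        address = f'{res["type"]}.{res["name"]}'
        nodes.add(address)
        for inst in res.get("instances", []):
            for dep in inst.get("dependencies", []):
                edges.add((address, dep))
    return nodes, edges


nodes, edges = edges_from_tfstate(json.loads(SAMPLE_STATE))
print(sorted(nodes))
print(sorted(edges))
```

Each edge points from a resource to something it depends on, which is the direction the impact-analysis prompts later on this page walk in reverse.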

***

## Resource Types

The Topology Explorer maps resource types across AWS, GCP, Azure, and Kubernetes, including:

| Category       | Resources                                    |
| -------------- | -------------------------------------------- |
| **Compute**    | EC2, Lambda, ECS, EKS, VMs, Cloud Run        |
| **Networking** | VPC, Load Balancers, CloudFront, API Gateway |
| **Database**   | RDS, Aurora, DynamoDB, Cloud SQL             |
| **Storage**    | S3, EFS, EBS, Cloud Storage                  |
| **Security**   | IAM Roles, Security Groups, ACM Certificates |
| **Kubernetes** | Clusters, Deployments, Services, Pods        |

***

## Using Topology for Incident Investigation

During an incident, topology maps show which resources are affected and how a failure spreads between them.

### Root Cause Analysis (RCA)

```bash theme={null}
@alex use topology to trace the impact of RDS outage
@kai show all services affected by the failing pod
@anna coordinate incident response using infrastructure map
```

### Impact Analysis

Visualize blast radius and affected services:

```bash theme={null}
@alex show downstream dependencies of payment-service
@kai map all services connected to the database cluster
@oliver identify security exposure paths in topology
```
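
Conceptually, a blast radius is a graph traversal: start at the failing resource and follow dependency edges backwards to everything that relies on it. The following is a minimal sketch under that assumption, using a breadth-first search. The service names, the edge list, and the `blast_radius` function are hypothetical and do not describe how the agents are implemented.

```python
from collections import deque

# Hypothetical edges: (service, resource it depends on).
DEPENDS_ON = [
    ("checkout-web", "payment-service"),
    ("payment-service", "orders-db"),
    ("refund-worker", "payment-service"),
    ("reporting", "orders-db"),
]


def blast_radius(target: str, depends_on: list[tuple[str, str]]) -> set[str]:
    """Everything that directly or transitively depends on `target`."""
    # Invert the edges so we can walk from a resource to its dependents.
    dependents: dict[str, set[str]] = {}
    for src, dst in depends_on:
        dependents.setdefault(dst, set()).add(src)

    seen: set[str] = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    seen.discard(target)  # only relevant if the graph contains a cycle
    return seen


print(sorted(blast_radius("orders-db", DEPENDS_ON)))
print(sorted(blast_radius("payment-service", DEPENDS_ON)))
```

The first call reports all four services, because everything in this sample graph ultimately relies on the database; the second reports only the two callers of `payment-service`.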

### Real-Time Status

During incidents, topology shows:

* **Health status** of each resource
* **Connection states** between services
* **Error propagation** paths
* **Recovery progress** visualization

***

## Views and Filters

### Load View

Access saved topology views from the **Load View** dropdown.

### Filter Resources

Use the search and filter panel to:

* Search by resource name or ID
* Filter by resource type (EC2, RDS, EKS, etc.)
* Filter by tags or metadata
* Show/hide resource categories

### Sync Status

The **Synced** indicator shows when topology was last updated from your infrastructure.

***

## Agent Integration

Agents use topology for enhanced analysis:

| Agent                              | Topology Usage                                            |
| ---------------------------------- | --------------------------------------------------------- |
| **[Alex](/guide/agents/alex)**     | Cost impact visualization, resource optimization paths    |
| **[Oliver](/guide/agents/oliver)** | Security exposure mapping, compliance visualization       |
| **[Tony](/guide/agents/tony)**     | Database dependency chains, performance bottlenecks       |
| **[Kai](/guide/agents/kai)**       | Service mesh visualization, pod relationships             |
| **[Anna](/guide/agents/anna)**     | Cross-service incident coordination, architecture reviews |

### Example Prompts

```bash theme={null}
@alex analyze cost optimization opportunities using topology view
@oliver map security vulnerabilities across the infrastructure topology
@kai show Kubernetes service dependencies and potential single points of failure
@anna use topology to coordinate the database migration impact
```
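
One way to reason about the single-point-of-failure prompt above: a component is a single point of failure between an entry point and a target if removing it leaves no remaining path between the two. The sketch below checks this by brute force, removing one node at a time. The graph, node names, and function names are hypothetical, and the definition is deliberately narrow (it ignores replicas inside a single node, for instance).

```python
def reachable(graph, start, goal, removed):
    """Depth-first search from start to goal that never visits `removed`."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def single_points_of_failure(graph, entry, target):
    """Intermediate nodes whose removal cuts every path from entry to target."""
    nodes = set(graph) | {n for outs in graph.values() for n in outs}
    return {
        n
        for n in nodes - {entry, target}
        if not reachable(graph, entry, target, removed=n)
    }


# Hypothetical call graph: caller -> callees.
GRAPH = {
    "ingress": ["api-a", "api-b"],  # two redundant API replicas
    "api-a": ["queue"],
    "api-b": ["queue"],
    "queue": ["orders-db"],         # one queue in front of the database
}

print(single_points_of_failure(GRAPH, "ingress", "orders-db"))  # {'queue'}
```

The two API replicas cover for each other, so only the queue is flagged. For large graphs a linear-time articulation-point algorithm would be a better fit than this one-node-at-a-time check.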

***

## Export Options

Export topology for documentation and sharing:

* **PNG/SVG** - Static image export
* **PDF** - Printable documentation
* **JSON** - Machine-readable format
* **Share Link** - Collaborative viewing

***

## Real-World Use Cases

### Production Outage Response

**Scenario:** Your payment service is down and customers can't complete orders.

```bash theme={null}
@alex show topology centered on payment-service with all dependencies
```

The topology reveals:

* Payment service connects to **RDS Aurora** (primary database)
* Aurora connects to **ElastiCache** (session cache)
* ElastiCache shows **unhealthy status** ← Root cause identified

**Resolution time:** Minutes instead of hours by visually tracing the dependency chain.

***

### Cloud Migration Planning

**Scenario:** Migrating from on-premises to AWS. Need to understand what moves together.

```bash theme={null}
@alex build topology from our Terraform state and identify migration groups
@anna use topology to create migration waves based on dependencies
```

**Outcome:**

* Wave 1: Stateless web services (low risk)
* Wave 2: Application servers with database dependencies
* Wave 3: Core databases with replication setup
* Wave 4: Final cutover with traffic routing
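
The wave ordering above resembles a layered topological sort in which services that nothing else depends on move first and shared databases move last. The sketch below illustrates that idea with Python's standard `graphlib` module (Python 3.9+). The service names and the `migration_waves` helper are hypothetical, and real wave planning would also weigh risk, data volume, and cutover constraints.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: service -> services it depends on.
DEPENDS_ON = {
    "web-frontend": {"app-server"},
    "app-server": {"orders-db"},
    "orders-db": set(),
}


def migration_waves(depends_on):
    """Group services into waves, dependents first and dependencies last."""
    # TopologicalSorter expects node -> predecessors. Using each node's
    # dependents as its predecessors schedules leaf services first.
    dependents = {node: set() for node in depends_on}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(node)

    sorter = TopologicalSorter(dependents)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(migration_waves(DEPENDS_ON), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```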

***

### Security Incident Investigation

**Scenario:** Security alert - unusual traffic from an EC2 instance.

```bash theme={null}
@oliver map all connections from instance i-0abc123 in topology
@oliver trace data flow paths that could expose sensitive data
```

**Topology reveals:**

* Compromised instance has access to **3 S3 buckets**
* Connected to **production RDS** via security group
* Blast radius: 12 downstream services

**Action:** Isolate instance, rotate credentials, audit all connected resources.

***

### Cost Optimization Discovery

**Scenario:** Monthly AWS bill spiked 40%. Need to find the cause.

```bash theme={null}
@alex overlay cost data on infrastructure topology
@alex highlight resources with >$500/month spend
```

**Topology shows:**

* Orphaned load balancers with no targets: **\$180/month**
* Oversized RDS instance (db.r5.4xlarge) for dev: **\$2,400/month**
* Idle EKS node group running 24/7: **\$1,200/month**

**Savings identified:** \$3,780/month by visual inspection.
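
The arithmetic behind that total can be checked in a few lines. The figures are copied from the findings above, and the 500 USD threshold comes from the second prompt. Like the total above, this treats each line item as fully recoverable, which is a simplification.

```python
# Monthly figures copied from the findings above (USD).
FINDINGS = {
    "orphaned load balancers": 180,
    "oversized dev RDS instance (db.r5.4xlarge)": 2400,
    "idle EKS node group": 1200,
}
THRESHOLD = 500  # USD per month, from the second prompt

monthly_total = sum(FINDINGS.values())
annual_total = monthly_total * 12
over_threshold = sorted(
    name for name, cost in FINDINGS.items() if cost > THRESHOLD
)

print(monthly_total)   # 3780
print(annual_total)    # 45360
print(over_threshold)
```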

***

### Compliance Audit Preparation

**Scenario:** SOC 2 audit next month. Need to document data flows.

```bash theme={null}
@oliver generate topology showing all PII data paths
@oliver map encryption status for data at rest and in transit
```

**Deliverables:**

* Visual data flow diagrams for auditors
* Encryption coverage map (gaps highlighted in red)
* Network segmentation proof
* Access control visualization

***

### Disaster Recovery Testing

**Scenario:** Validate DR plan before annual test.

```bash theme={null}
@alex compare production topology with DR region topology
@alex identify resources missing from DR setup
```

**Gaps found:**

* DR missing **ElastiCache** cluster
* **Lambda functions** not replicated
* **S3 cross-region replication** not enabled for 2 buckets

**Fix before DR test:** Close these gaps first so the failover does not break on missing resources.
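
At its simplest, comparing two topologies is a set difference over resource inventories. The sketch below assumes each region's inventory is available as pairs of resource type and logical name. The inventories and the `missing_in_dr` helper are hypothetical, and a real comparison would also need to check configuration such as replication settings.

```python
# Hypothetical inventories as (resource type, logical name) pairs.
PRODUCTION = {
    ("rds", "orders-db"),
    ("elasticache", "session-cache"),
    ("lambda", "order-handler"),
    ("alb", "public-api"),
}
DR_REGION = {
    ("rds", "orders-db"),
    ("alb", "public-api"),
}


def missing_in_dr(prod, dr):
    """Resources present in production but absent from the DR inventory."""
    return sorted(prod - dr)


for resource_type, name in missing_in_dr(PRODUCTION, DR_REGION):
    print(f"missing in DR: {resource_type}/{name}")
```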

***

### New Engineer Onboarding

**Scenario:** New team member needs to understand the architecture.

```bash theme={null}
@anna create topology overview of our e-commerce platform
@alex annotate topology with service responsibilities
```

**Result:** An interactive architecture diagram that new engineers can explore, clicking on resources to see details and learning how services connect.

***

### Kubernetes Service Mesh Debugging

**Scenario:** Intermittent 503 errors in production.

```bash theme={null}
@kai map service mesh topology with current health status
@kai show request flow from ingress to failing service
```

**Topology reveals:**

* Ingress → API Gateway → **Order Service** → Inventory Service
* Inventory Service pod: **CrashLoopBackOff**
* Root cause: OOMKilled due to memory leak

***

### Root Cause Analysis (RCA) for Errors

**Scenario:** Application throwing "Connection refused" errors intermittently.

```bash theme={null}
@alex trace error path from web-app through topology
@tony correlate database connection errors with topology dependencies
```

**Topology-driven RCA:**

1. Web App → Load Balancer → **API Server** → Database
2. API Server shows healthy
3. Database connection pool: **Exhausted** ← Root cause
4. Upstream cause: Slow query holding connections

**Resolution:** Optimize slow query, increase connection pool, add connection timeout.
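
The reasoning in this walkthrough can be expressed as a simple rule: a likely root cause is an unhealthy component that has no unhealthy component further down its dependency chain. The sketch below applies that rule to the chain above. The health flags, node names, and function names are hypothetical, and the graph is assumed to be acyclic.

```python
# Dependency chain from the walkthrough above: component -> dependencies.
DEPENDS_ON = {
    "web-app": ["load-balancer"],
    "load-balancer": ["api-server"],
    "api-server": ["database"],
    "database": [],
}
# Hypothetical health flags: the web app surfaces errors, the database
# connection pool is exhausted, everything in between reports healthy.
HEALTHY = {
    "web-app": False,
    "load-balancer": True,
    "api-server": True,
    "database": False,
}


def has_unhealthy_dependency(node, depends_on, healthy):
    """True if any transitive dependency of `node` is unhealthy."""
    stack, seen = list(depends_on.get(node, [])), set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if not healthy.get(current, True):
            return True
        stack.extend(depends_on.get(current, []))
    return False


def root_cause_candidates(depends_on, healthy):
    """Unhealthy components with no unhealthy component below them."""
    return sorted(
        node
        for node, ok in healthy.items()
        if not ok and not has_unhealthy_dependency(node, depends_on, healthy)
    )


print(root_cause_candidates(DEPENDS_ON, HEALTHY))  # ['database']
```

The web app is excluded even though its direct dependency is healthy, because the check is transitive and finds the unhealthy database three hops away.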

***

### Performance Degradation Analysis

**Scenario:** API response times increased from 200ms to 2 seconds.

```bash theme={null}
@alex analyze performance bottlenecks using topology view
@tony overlay latency metrics on service topology
```

**Topology with metrics overlay:**

```
User → CloudFront (5ms) → ALB (3ms) → API (50ms) → RDS (1800ms) ← Bottleneck
                                    ↘ ElastiCache (2ms)
```

**Findings:**

* Database latency spiked from 20ms to 1800ms
* Missing index on new query pattern
* Table scan on 50M rows

**Fix:** Add a composite index; response time returns to 200ms.
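
To make the overlay concrete, the snippet below sums the per-hop latencies from the diagram and picks out the slowest hop. It assumes the hops on the main path are sequential and leaves out the ElastiCache side branch. The variable names are illustrative only.

```python
# Per-hop latencies in milliseconds, copied from the diagram above.
HOPS = [("CloudFront", 5), ("ALB", 3), ("API", 50), ("RDS", 1800)]

total_ms = sum(ms for _, ms in HOPS)
bottleneck, worst_ms = max(HOPS, key=lambda hop: hop[1])
share = worst_ms / total_ms

print(f"{bottleneck}: {worst_ms} ms of {total_ms} ms ({share:.0%})")
# RDS: 1800 ms of 1858 ms (97%)
```

The total of 1858 ms is consistent with the roughly 2 second response time in the scenario, and nearly all of it is spent in the database hop.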

***

### Cascading Failure Investigation

**Scenario:** Multiple services failing simultaneously.

```bash theme={null}
@anna map failure propagation across topology
@alex identify the origin point of cascading failures
```

**Topology timeline:**

1. **T+0:** Redis cluster failover triggered
2. **T+5s:** Session service lost cache → returning errors
3. **T+10s:** Auth service failing → can't validate tokens
4. **T+15s:** All downstream services rejecting requests

**Root cause:** Redis cluster hit memory limit, triggered unexpected failover.

**Prevention:** Add memory alerts, implement circuit breakers, and add cache fallbacks.
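
A timeline like this can be reconstructed by ordering failure events by time and checking that each later failure depends on something that failed earlier. The sketch below does both. The event tuples mirror the timeline above, while the dependency edges and the `follows_dependencies` helper are hypothetical.

```python
# (seconds after T+0, component, symptom), deliberately unordered.
EVENTS = [
    (15, "downstream-services", "rejecting requests"),
    (0, "redis-cluster", "failover triggered"),
    (10, "auth-service", "cannot validate tokens"),
    (5, "session-service", "lost cache, returning errors"),
]
# Hypothetical dependency edges: component -> what it depends on.
DEPENDS_ON = {
    "session-service": "redis-cluster",
    "auth-service": "session-service",
    "downstream-services": "auth-service",
}

timeline = sorted(EVENTS)  # tuples sort by time offset first
origin = timeline[0][1]    # earliest failing component


def follows_dependencies(timeline, depends_on):
    """True if every later failure depends on a component that failed earlier."""
    failed = set()
    for _, component, _ in timeline:
        dependency = depends_on.get(component)
        if dependency is not None and dependency not in failed:
            return False
        failed.add(component)
    return True


print(origin)                                      # redis-cluster
print(follows_dependencies(timeline, DEPENDS_ON))  # True
```

When the check returns `False`, the earliest event is probably not the only origin, and a second independent failure is worth looking for.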

***

### Memory Leak Detection

**Scenario:** Service restarts every few hours in production.

```bash theme={null}
@kai correlate pod restarts with resource topology
@alex show memory trends for services in the request path
```

**Topology + metrics:**

* Order Service: Memory growing 50MB/hour
* Connected to: Message Queue, Database, Cache
* Leak source: Unclosed database connections after queue processing

**Resolution:** Fix connection cleanup in queue consumer, add connection pool monitoring.

***

### Network Latency Troubleshooting

**Scenario:** Cross-service calls timing out randomly.

```bash theme={null}
@alex map network topology with latency annotations
@kai identify network bottlenecks between services
```

**Topology reveals:**

* Services in **different availability zones**
* NAT Gateway: **Throughput limit reached**
* Cross-AZ traffic: 2ms → 200ms during peak

**Solution:** Co-locate dependent services, add NAT Gateway capacity.

***

### Database Connection Issues

**Scenario:** "Too many connections" errors during peak traffic.

```bash theme={null}
@tony map all services connecting to production database
@alex show connection counts per service in topology
```

**Topology with connection metrics:**

```
┌─────────────────────────────────────────┐
│            RDS PostgreSQL               │
│         Max connections: 500            │
│         Current: 487 (97%)              │
└─────────────────────────────────────────┘
     ↑           ↑           ↑
  API (200)  Worker (250)  Cron (37)
```

**Issue:** The Worker service's connection pool is too large: it holds 250 of the 487 open connections.

**Fix:** Right-size connection pools per service based on actual need.
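
The numbers in the diagram can be verified, and a right-sizing change explored, with a short calculation. The connection counts and the limit are taken from the diagram above. The reduced Worker pool size of 150 is a hypothetical value chosen only to show the effect, not a recommendation.

```python
MAX_CONNECTIONS = 500
CONNECTIONS = {"API": 200, "Worker": 250, "Cron": 37}  # from the diagram above


def utilization(connections, max_connections):
    """Fraction of the database connection limit currently in use."""
    return sum(connections.values()) / max_connections


current = utilization(CONNECTIONS, MAX_CONNECTIONS)  # 487 / 500
largest = max(CONNECTIONS, key=CONNECTIONS.get)

# Hypothetical right-sizing: cap the Worker pool at 150 connections.
resized = {**CONNECTIONS, "Worker": 150}
after = utilization(resized, MAX_CONNECTIONS)        # 387 / 500

print(f"{largest} holds the most connections")
print(f"utilization: {current:.1%} -> {after:.1%}")
```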

***

## Related

<CardGroup cols={2}>
  <Card title="Resources" icon="cubes" href="/guide/infrastructure/resources">
    View all discovered infrastructure resources
  </Card>

  <Card title="Assessment" icon="clipboard-check" href="/guide/infrastructure/assessment">
    Run infrastructure assessments
  </Card>
</CardGroup>
