Topology

The Topology Explorer provides interactive visualization of your cloud infrastructure and service relationships. Build topology maps manually, let agents discover them, or import them from Infrastructure as Code (IaC).

Overview

Topology maps help you:

Visualize relationships between cloud resources
Understand dependencies across services
Support incident response with visual context
Enable root cause analysis (RCA) by tracing connections
Document architecture for team knowledge sharing

Building Topology

Topology can be built in three ways: agent-led discovery, import from IaC, or the manual builder.

Agent-Led Discovery

Let CloudThinker agents automatically discover and map your infrastructure.

@alex discover and map infrastructure topology for production
@kai map Kubernetes service dependencies
@alex build topology from AWS account including all VPCs

Benefits:

Automatic resource discovery
Real-time relationship mapping
Continuous sync with infrastructure changes

Import from IaC

Import topology from your Infrastructure as Code sources. Supported sources:

Terraform State - Import from .tfstate files
CloudFormation - Import from stack templates
Pulumi - Import from state files

1. Navigate to Topology - Go to Infrastructure → Topology
2. Click Import - Select New View → Import from IaC
3. Select Source - Choose Terraform State, CloudFormation, Pulumi, or another supported format
4. Upload or Connect - Upload a state file or connect to a remote state backend
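To make the import step more concrete, here is a minimal sketch of how resource nodes and dependency edges could be read from a Terraform state document, assuming the version 4 state layout (a top-level `resources` list whose `instances` carry a `dependencies` list). It does not describe how CloudThinker parses state files. The function name `edges_from_tfstate` and the sample state are hypothetical, and resources inside modules are ignored for brevity.

```python
import json

# Hypothetical, trimmed-down Terraform state (format version 4) for illustration.
SAMPLE_STATE = json.loads("""
{
  "version": 4,
  "resources": [
    {"mode": "managed", "type": "aws_vpc", "name": "main",
     "instances": [{"attributes": {"id": "vpc-1"}, "dependencies": []}]},
    {"mode": "managed", "type": "aws_subnet", "name": "app",
     "instances": [{"attributes": {"id": "subnet-1"},
                    "dependencies": ["aws_vpc.main"]}]},
    {"mode": "managed", "type": "aws_instance", "name": "web",
     "instances": [{"attributes": {"id": "i-1"},
                    "dependencies": ["aws_subnet.app", "aws_vpc.main"]}]}
  ]
}
""")


def edges_from_tfstate(state):
    """Return (nodes, edges); each edge is (resource address, dependency address)."""
    nodes, edges = [], []
    for res in state.get("resources", []):
        if res.get("mode") != "managed":
            continue  # skip data sources; they are lookups, not infrastructure
        address = f"{res['type']}.{res['name']}"  # module prefixes are ignored here
        nodes.append(address)
        for inst in res.get("instances", []):
            for dep in inst.get("dependencies", []):
                edge = (address, dep)
                if edge not in edges:  # several instances may repeat the same edge
                    edges.append(edge)
    return nodes, edges


nodes, edges = edges_from_tfstate(SAMPLE_STATE)
```

The resulting node and edge lists are the raw material for a topology view: each address becomes a node, and each edge a "depends on" connection.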

Resource Types

The Topology Explorer supports major cloud resource types, including:

Category	Resources
Compute	EC2, Lambda, ECS, EKS, VMs, Cloud Run
Networking	VPC, Load Balancers, CloudFront, API Gateway
Database	RDS, Aurora, DynamoDB, Cloud SQL
Storage	S3, EFS, EBS, Cloud Storage
Security	IAM Roles, Security Groups, ACM Certificates
Kubernetes	Clusters, Deployments, Services, Pods

Using Topology for Incident Response

Topology maps are invaluable during incidents:

Root Cause Analysis (RCA)

@alex use topology to trace the impact of RDS outage
@kai show all services affected by the failing pod
@anna coordinate incident response using infrastructure map

Impact Analysis

Visualize blast radius and affected services:

@alex show downstream dependencies of payment-service
@kai map all services connected to the database cluster
@oliver identify security exposure paths in topology
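Conceptually, a blast radius is the set of services that depend on a failed resource, directly or transitively. The sketch below shows one way to compute it with a breadth-first walk over a dependency map. The service names and the `blast_radius` function are hypothetical and only illustrate the idea; they are not the agents' internal logic.

```python
from collections import deque

# Hypothetical dependency map: service -> services it depends on.
DEPENDS_ON = {
    "checkout-ui": ["payment-service"],
    "payment-service": ["orders-db", "session-cache"],
    "reporting": ["orders-db"],
    "orders-db": [],
    "session-cache": [],
}


def blast_radius(depends_on, failed):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the map so we can walk from a resource to its dependents.
    dependents = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    affected, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return affected


affected_by_db = blast_radius(DEPENDS_ON, "orders-db")
```

In this sample, a failure of `orders-db` reaches `payment-service` and `reporting` directly, and `checkout-ui` through `payment-service`.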

Real-Time Status

During incidents, topology shows:

Health status of each resource
Connection states between services
Error propagation paths
Recovery progress visualization

Views and Filters

Load View

Access saved topology views from the Load View dropdown.

Filter Resources

Use the search and filter panel to:

Search by resource name or ID
Filter by resource type (EC2, RDS, EKS, etc.)
Filter by tags or metadata
Show/hide resource categories
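The filters listed above combine naturally as "match every filter that is set". As an illustration only, the sketch below applies name/ID search, type, and tag filters to an in-memory list. The `Resource` class, the `filter_resources` function, and the sample data are assumptions, not part of the product.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    id: str
    name: str
    type: str
    tags: dict = field(default_factory=dict)


def filter_resources(resources, text=None, types=None, tags=None):
    """Keep resources that match every filter that was supplied."""
    result = []
    for r in resources:
        # Case-insensitive substring search over name and ID.
        if text and text.lower() not in r.name.lower() and text.lower() not in r.id.lower():
            continue
        if types and r.type not in types:
            continue
        # Every requested tag must be present with the requested value.
        if tags and any(r.tags.get(k) != v for k, v in tags.items()):
            continue
        result.append(r)
    return result


RESOURCES = [
    Resource("i-0abc123", "web-1", "EC2", {"env": "production"}),
    Resource("db-1", "orders-db", "RDS", {"env": "production"}),
    Resource("i-0def456", "web-dev", "EC2", {"env": "dev"}),
]

prod_ec2 = filter_resources(RESOURCES, types={"EC2"}, tags={"env": "production"})
```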

Sync Status

The Synced indicator shows when topology was last updated from your infrastructure.

Agent Integration

Agents use topology for enhanced analysis:

Agent	Topology Usage
Alex	Cost impact visualization, resource optimization paths
Oliver	Security exposure mapping, compliance visualization
Tony	Database dependency chains, performance bottlenecks
Kai	Service mesh visualization, pod relationships
Anna	Cross-service incident coordination, architecture reviews

Example Prompts

@alex analyze cost optimization opportunities using topology view
@oliver map security vulnerabilities across the infrastructure topology
@kai show Kubernetes service dependencies and potential single points of failure
@anna use topology to coordinate the database migration impact

Export Options

Export topology for documentation and sharing:

PNG/SVG - Static image export
PDF - Printable documentation
JSON - Machine-readable format
Share Link - Collaborative viewing

Real-World Use Cases

Production Outage Response

Scenario: Your payment service is down and customers can’t complete orders.

@alex show topology centered on payment-service with all dependencies

The topology reveals:

Payment service connects to RDS Aurora (primary database)
Aurora connects to ElastiCache (session cache)
ElastiCache shows unhealthy status ← Root cause identified

Resolution time: Minutes instead of hours by visually tracing the dependency chain.
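The tracing done visually in this scenario can be approximated in code: starting from the failing service, collect everything it depends on and keep the unhealthy resources that have no unhealthy dependency beneath them. The edges and health values below mirror the scenario; the function names are hypothetical, and this is a sketch of the idea, not the product's RCA logic.

```python
# Hypothetical edges and health states mirroring the scenario above.
DEPENDS_ON = {
    "payment-service": ["rds-aurora"],
    "rds-aurora": ["elasticache"],
    "elasticache": [],
}
HEALTH = {
    "payment-service": "unhealthy",
    "rds-aurora": "healthy",
    "elasticache": "unhealthy",
}


def transitive_deps(depends_on, start):
    """Return every resource reachable from `start` by following dependencies."""
    seen, stack = set(), list(depends_on.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(depends_on.get(node, []))
    return seen


def root_cause_candidates(depends_on, health, start):
    """Unhealthy resources under `start` with no unhealthy dependency of their own."""
    reachable = transitive_deps(depends_on, start) | {start}
    # Resources without a recorded status are treated as unhealthy.
    unhealthy = {n for n in reachable if health.get(n) != "healthy"}
    return sorted(n for n in unhealthy if not (transitive_deps(depends_on, n) & unhealthy))


candidates = root_cause_candidates(DEPENDS_ON, HEALTH, "payment-service")
```

Here `payment-service` is unhealthy but is ruled out because an unhealthy resource sits further down its dependency chain.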

Cloud Migration Planning

Scenario: Migrating from on-premises to AWS. Need to understand what moves together.

@alex build topology from our Terraform state and identify migration groups
@anna use topology to create migration waves based on dependencies

Outcome:

Wave 1: Stateless web services (low risk)
Wave 2: Application servers with database dependencies
Wave 3: Core databases with replication setup
Wave 4: Final cutover with traffic routing
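One possible way to derive waves like these from a dependency map is to layer the graph so that dependents move before the resources they rely on. The sketch below uses Python's standard `graphlib` module for that layering. The resource names and the `migration_waves` function are hypothetical, and a real plan would also weigh risk, data volume, and cutover constraints.

```python
from graphlib import TopologicalSorter

# Hypothetical map: resource -> resources it depends on.
DEPENDS_ON = {
    "web-frontend": ["app-server"],
    "static-site": [],
    "app-server": ["core-db"],
    "core-db": [],
}


def migration_waves(depends_on):
    """Group resources into waves, moving dependents before what they depend on."""
    # TopologicalSorter expects node -> predecessors. Treating a resource's
    # dependents as its predecessors schedules the dependents first.
    dependents = {name: set() for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(name)

    sorter = TopologicalSorter(dependents)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = migration_waves(DEPENDS_ON)
```

With this sample data, the stateless front ends land in the first wave, the application server in the second, and the database in the third.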

Security Incident Investigation

Scenario: Security alert - unusual traffic from an EC2 instance.

@oliver map all connections from instance i-0abc123 in topology
@oliver trace data flow paths that could expose sensitive data

Topology reveals:

Compromised instance has access to 3 S3 buckets
Connected to production RDS via security group
Blast radius: 12 downstream services

Action: Isolate instance, rotate credentials, audit all connected resources.

Cost Optimization Discovery

Scenario: Monthly AWS bill spiked 40%. Need to find the cause.

@alex overlay cost data on infrastructure topology
@alex highlight resources with >$500/month spend

Topology shows:

Orphaned load balancers with no targets: $180/month
Oversized RDS instance (db.r5.4xlarge) for dev: $2,400/month
Idle EKS node group running 24/7: $1,200/month

Savings identified: $3,780/month by visual inspection.
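The arithmetic behind this scenario is simple enough to check directly. The sketch below sums the three findings and applies the ">$500/month" threshold from the prompt; the variable names are illustrative only.

```python
# Monthly costs taken from the scenario above (USD).
FINDINGS = {
    "orphaned load balancers": 180,
    "oversized dev RDS instance (db.r5.4xlarge)": 2400,
    "idle EKS node group": 1200,
}
THRESHOLD = 500  # matches the ">$500/month" prompt

total_savings = sum(FINDINGS.values())

# Findings above the threshold, most expensive first.
over_threshold = sorted(
    (name for name, cost in FINDINGS.items() if cost > THRESHOLD),
    key=FINDINGS.get,
    reverse=True,
)
```

The total matches the $3,780/month figure, and only the RDS instance and the EKS node group exceed the threshold.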

Compliance Audit Preparation

Scenario: SOC 2 audit next month. Need to document data flows.

@oliver generate topology showing all PII data paths
@oliver map encryption status for data at rest and in transit

Deliverables:

Visual data flow diagrams for auditors
Encryption coverage map (gaps highlighted in red)
Network segmentation proof
Access control visualization

Disaster Recovery Testing

Scenario: Validate DR plan before annual test.

@alex compare production topology with DR region topology
@alex identify resources missing from DR setup

Gaps found:

DR missing ElastiCache cluster
Lambda functions not replicated
S3 cross-region replication not enabled for 2 buckets

Action: Fix these gaps before the DR test so they do not surface as failures during the exercise.
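At its core, comparing production with the DR region is a set difference over resource inventories. The sketch below assumes each inventory is a set of (resource type, logical name) pairs. The inventories and the `missing_in_dr` function are hypothetical, and matching resources across regions is usually harder than an exact name match.

```python
# Hypothetical inventories keyed by (resource type, logical name).
PRODUCTION = {
    ("RDS", "orders-db"),
    ("ElastiCache", "session-cache"),
    ("Lambda", "resize-images"),
    ("S3", "uploads"),
}
DR_REGION = {
    ("RDS", "orders-db"),
    ("S3", "uploads"),
}


def missing_in_dr(production, dr):
    """Return production resources that have no counterpart in the DR inventory."""
    return sorted(production - dr)


gaps = missing_in_dr(PRODUCTION, DR_REGION)
```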

New Engineer Onboarding

Scenario: New team member needs to understand the architecture.

@anna create topology overview of our e-commerce platform
@alex annotate topology with service responsibilities

Result: An interactive architecture diagram where new engineers can explore the layout, click resources to see details, and understand how services connect.

Kubernetes Service Mesh Debugging

Scenario: Intermittent 503 errors in production.

@kai map service mesh topology with current health status
@kai show request flow from ingress to failing service

Topology reveals:

Ingress → API Gateway → Order Service → Inventory Service
Inventory Service pod: CrashLoopBackOff
Root cause: OOMKilled due to memory leak

Root Cause Analysis (RCA) for Errors

Scenario: Application throwing “Connection refused” errors intermittently.

@alex trace error path from web-app through topology
@tony correlate database connection errors with topology dependencies

Topology-driven RCA:

Web App → Load Balancer → API Server → Database
API Server shows healthy
Database connection pool: Exhausted ← Root cause
Upstream cause: Slow query holding connections

Resolution: Optimize slow query, increase connection pool, add connection timeout.

Performance Degradation Analysis

Scenario: API response times increased from 200ms to 2 seconds.

@alex analyze performance bottlenecks using topology view
@tony overlay latency metrics on service topology

Topology with metrics overlay:

User → CloudFront (5ms) → ALB (3ms) → API (50ms) → RDS (1800ms) ← Bottleneck
                                    ↘ ElastiCache (2ms)

Findings:

Database latency spiked from 20ms to 1800ms
Missing index on new query pattern
Table scan on 50M rows

Fix: Add a composite index; response time returns to 200ms.
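Once latency is attached to each hop, finding the bottleneck amounts to picking the slowest hop on the request path. The sketch below uses the per-hop figures from the overlay above; the variable names are illustrative, and summing hop latencies is only a rough approximation of end-to-end response time.

```python
# Per-hop latency in milliseconds, taken from the overlay above.
HOP_LATENCY_MS = {"CloudFront": 5, "ALB": 3, "API": 50, "RDS": 1800, "ElastiCache": 2}

# The database path of the request; ElastiCache sits on a side branch.
REQUEST_PATH = ["CloudFront", "ALB", "API", "RDS"]

bottleneck = max(REQUEST_PATH, key=HOP_LATENCY_MS.get)
path_total_ms = sum(HOP_LATENCY_MS[hop] for hop in REQUEST_PATH)

# Fraction of the path latency spent in the slowest hop.
bottleneck_share = HOP_LATENCY_MS[bottleneck] / path_total_ms
```

In this data the RDS hop accounts for roughly 97% of the summed path latency, which is why the investigation goes straight to the database.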

Cascading Failure Investigation

Scenario: Multiple services failing simultaneously.

@anna map failure propagation across topology
@alex identify the origin point of cascading failures

Topology timeline:

T+0: Redis cluster failover triggered
T+5s: Session service lost cache → returning errors
T+10s: Auth service failing → can’t validate tokens
T+15s: All downstream services rejecting requests

Root cause: Redis cluster hit its memory limit, which triggered an unexpected failover.

Prevention: Add memory alerts, implement circuit breakers, and add cache fallbacks.
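The timeline suggests a simple heuristic for locating the origin of a cascade: a failed component is an origin if none of its dependencies failed before it did. The sketch below applies that rule to the offsets from the timeline. The dependency edges, component names, and the `failure_origins` function are assumptions chosen to match the scenario.

```python
# First-failure offsets in seconds, taken from the timeline above.
FIRST_FAILURE_S = {
    "redis": 0,
    "session-service": 5,
    "auth-service": 10,
    "downstream-services": 15,
}

# Hypothetical dependency edges consistent with that timeline.
DEPENDS_ON = {
    "redis": [],
    "session-service": ["redis"],
    "auth-service": ["session-service"],
    "downstream-services": ["auth-service"],
}


def failure_origins(first_failure, depends_on):
    """Return failed components with no dependency that failed earlier."""
    origins = []
    for name, failed_at in first_failure.items():
        earlier = [
            dep for dep in depends_on.get(name, [])
            if first_failure.get(dep, float("inf")) < failed_at
        ]
        if not earlier:
            origins.append(name)
    return origins


origins = failure_origins(FIRST_FAILURE_S, DEPENDS_ON)
propagation_order = sorted(FIRST_FAILURE_S, key=FIRST_FAILURE_S.get)
```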

Memory Leak Detection

Scenario: Service restarts every few hours in production.

@kai correlate pod restarts with resource topology
@alex show memory trends for services in the request path

Topology + metrics:

Order Service: Memory growing 50MB/hour
Connected to: Message Queue, Database, Cache
Leak source: Unclosed database connections after queue processing

Resolution: Fix connection cleanup in queue consumer, add connection pool monitoring.

Network Latency Troubleshooting

Scenario: Cross-service calls timing out randomly.

@alex map network topology with latency annotations
@kai identify network bottlenecks between services

Topology reveals:

Services in different availability zones
NAT Gateway: Throughput limit reached
Cross-AZ traffic: 2ms → 200ms during peak

Solution: Co-locate dependent services, add NAT Gateway capacity.

Database Connection Issues

Scenario: “Too many connections” errors during peak traffic.

@tony map all services connecting to production database
@alex show connection counts per service in topology

Topology with connection metrics:

┌─────────────────────────────────────────┐
│            RDS PostgreSQL               │
│         Max connections: 500            │
│         Current: 487 (97%)              │
└─────────────────────────────────────────┘
     ↑           ↑           ↑
  API (200)  Worker (250)  Cron (37)

Issue: The Worker service connection pool is too large.

Fix: Right-size connection pools per service based on actual need.
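As a worked example of right-sizing, the sketch below checks the totals from the diagram and proposes pool sizes from observed peak usage plus headroom. The connection counts and the 500-connection limit come from the diagram. The peak-usage figures and the 1.25 headroom factor are made-up assumptions for illustration.

```python
import math

MAX_CONNECTIONS = 500
# Current connections per service, from the diagram above.
CURRENT_CONNECTIONS = {"API": 200, "Worker": 250, "Cron": 37}

# Hypothetical observed peak connections actually in use; not from the source.
PEAK_IN_USE = {"API": 150, "Worker": 60, "Cron": 10}
HEADROOM = 1.25  # assumed safety factor on top of the observed peak

current_total = sum(CURRENT_CONNECTIONS.values())
utilization = current_total / MAX_CONNECTIONS

# Proposed pool size per service: observed peak plus headroom, rounded up.
proposed = {svc: math.ceil(peak * HEADROOM) for svc, peak in PEAK_IN_USE.items()}
proposed_total = sum(proposed.values())
```

Under these assumed peaks, the proposed pools total 276 connections, leaving well over a third of the database limit free.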

Related

Resources - View all discovered infrastructure resources
Assessment - Run infrastructure assessments
