What You’ll Set Up

By the end of this tutorial, alerts from your monitoring tools will automatically trigger AI-powered investigation — agents correlate evidence across metrics, logs, traces, and topology to identify root cause and suggest remediation.
Step 1: Navigate to Incident Settings

Go to Incident in your workspace. You’ll see the incident dashboard and configuration options.
Step 2: Connect Monitoring Tools via Webhooks

CloudThinker ingests alerts from 15+ monitoring platforms through webhooks:
  • PagerDuty: Add the CloudThinker webhook URL as a service integration
  • Datadog: Create a webhook notification in Monitors
  • Prometheus / Alertmanager: Add a webhook receiver configuration
  • AWS CloudWatch: Route alarms through SNS to the webhook
  • Grafana: Add a webhook contact point
  • Opsgenie: Configure a webhook integration
  • New Relic: Add a webhook notification channel
  • Sentry: Configure a webhook integration for issues
To connect:
  1. Go to Incident > Integrations
  2. Select your monitoring platform
  3. Copy the generated webhook URL
  4. Paste it in your monitoring tool’s webhook/notification settings
  5. Send a test alert to verify the connection
See Webhook Integrations for detailed setup instructions for each platform.
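Step 5 of the connection flow (send a test alert) can also be done from a script. Below is a minimal sketch: the payload fields (`title`, `severity`, `source`, `status`) and the webhook URL are assumptions for illustration, not CloudThinker's documented schema — your monitoring platform defines the actual format it sends.

```python
import json
import urllib.request

def build_test_alert(title, severity="high", source="manual-test"):
    """Build a minimal alert payload. Field names here are illustrative;
    the real schema depends on your monitoring platform's webhook format."""
    return {
        "title": title,
        "severity": severity,
        "source": source,
        "status": "firing",
    }

def send_test_alert(webhook_url, payload):
    """POST the payload as JSON to the webhook URL copied from
    Incident > Integrations."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

if __name__ == "__main__":
    payload = build_test_alert("Test alert from setup walkthrough")
    print(json.dumps(payload, indent=2))
    # send_test_alert("https://YOUR-WEBHOOK-URL", payload)  # uncomment with your real URL
```

If the connection is working, the test alert should appear in the Incident dashboard within a few seconds.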
Step 3: Configure Alert Routing

Once webhooks are connected, configure how alerts are handled:
  • Auto-investigate: Automatically start AI investigation when an alert arrives (recommended)
  • Severity mapping: Map your monitoring tool’s severity levels to CloudThinker’s (Critical, High, Medium, Low)
  • Deduplication: Prevent duplicate incidents from related alerts
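To make the three routing options concrete, here is a hypothetical sketch of severity mapping and deduplication in Python. The mapping table, field names, and dedup key are all illustrative stand-ins, not CloudThinker's actual configuration schema.

```python
# Illustrative severity map: your monitoring tool's levels on the left,
# CloudThinker's four levels on the right.
SEVERITY_MAP = {
    "P1": "Critical", "P2": "High", "P3": "Medium", "P4": "Low",
    "critical": "Critical", "warning": "Medium", "info": "Low",
}

def route_alert(alert, open_incidents):
    """Map severity and suppress duplicates of an already-open incident.

    open_incidents: dict mapping (source, resource) -> incident id.
    """
    severity = SEVERITY_MAP.get(alert["severity"], "Medium")
    # Dedup key: same source + same resource means the same incident,
    # so related alerts attach instead of opening a new one.
    dedup_key = (alert["source"], alert["resource"])
    if dedup_key in open_incidents:
        return {"action": "attach", "incident": open_incidents[dedup_key]}
    # Auto-investigate on by default, per the recommendation above.
    return {"action": "create", "severity": severity, "auto_investigate": True}
```

The dedup key is the design decision that matters most here: too coarse and unrelated alerts get merged, too fine and one outage opens a dozen incidents.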
Step 4: Trigger Your First Incident

You can either:
  • Wait for a real alert: Let your monitoring tools trigger a real incident
  • Send a test webhook: Use your monitoring tool’s test feature to send a sample alert
  • Log manually: Go to Incident > Manual Logging to create a test incident
For your first run, manual logging lets you see the full investigation flow immediately:
  1. Click New Incident
  2. Describe the issue: “High CPU utilization on production web server”
  3. Set severity and affected resources
  4. Submit
Step 5: Watch the AI Investigation

Once an incident is created, the AI agent starts a hypothesis-driven investigation:
  1. Initial hypothesis: Forms possible root causes based on the alert data
  2. Evidence gathering: Pulls metrics, logs, traces, configs, and recent deployments
  3. Timeline correlation: Maps events across systems to a unified timeline
  4. Topology analysis: Traces service dependencies to understand blast radius
  5. Root cause identification: Narrows down to the most likely cause with a confidence score
You can watch the investigation in real time as the agent works through each step.
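The five steps above can be sketched as a scoring loop: form hypotheses, accumulate support from each evidence source, and pick the best-supported cause. This is a toy model — the agent's real hypothesis generation and scoring are internal to CloudThinker — but it shows the shape of the flow.

```python
from dataclasses import dataclass

@dataclass
class EvidenceSource:
    """A stand-in for one evidence channel (metrics, logs, traces, ...)."""
    name: str
    signals: dict  # hypothesis -> support weight

    def support_for(self, hypothesis):
        return self.signals.get(hypothesis, 0)

def form_hypotheses(alert):
    # In the real agent, hypotheses come from the alert data; here we
    # hard-code a few plausible causes for a high-CPU alert.
    return ["runaway process", "traffic spike", "bad deployment"]

def investigate(alert, sources):
    """Score each hypothesis against all evidence and return the
    best-supported cause with a naive confidence score."""
    hypotheses = form_hypotheses(alert)
    scores = {h: sum(s.support_for(h) for s in sources) for h in hypotheses}
    best = max(scores, key=scores.get)
    total = sum(scores.values()) or 1
    return best, scores[best] / total
```

A hypothesis with zero support across all sources is effectively "ruled out", which is the dismissed-hypothesis trail you can review after the investigation completes.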
Step 6: Review the Root Cause Analysis

The completed investigation shows:
  • Root cause: The identified issue with confidence score
  • Evidence chain: All data points that support the conclusion
  • Blast radius: Which services and users are affected
  • Timeline: Sequence of events leading to the incident
  • Remediation: Recommended actions to resolve and prevent recurrence
The agent’s reasoning is fully transparent — you can see every hypothesis it considered and why it was confirmed or ruled out.
Step 7: Resolve and Learn

After resolving the incident:
  1. Mark the incident as Resolved
  2. The agent stores the investigation in its memory system
  3. Future similar incidents benefit from learned patterns
Over time, the system gets faster and more accurate at diagnosing issues it has seen before.
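The learn-from-resolution step can be pictured as a memory of resolved incidents that future investigations query for similar cases. The sketch below uses naive keyword overlap as the similarity measure — CloudThinker's actual memory system is internal and certainly more sophisticated, so treat every name here as hypothetical.

```python
def similarity(a, b):
    """Jaccard overlap of lowercased words -- a crude stand-in for
    whatever similarity the real memory system uses."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

class IncidentMemory:
    def __init__(self):
        self.resolved = []  # list of (description, root_cause) pairs

    def store(self, description, root_cause):
        """Called when an incident is marked Resolved."""
        self.resolved.append((description, root_cause))

    def recall(self, description, threshold=0.3):
        """Return the root cause of the closest past incident, if any
        match clears the similarity threshold."""
        matches = [(similarity(description, d), rc) for d, rc in self.resolved]
        best = max(matches, default=(0, None))
        return best[1] if best[0] >= threshold else None
```

This is why repeat incidents resolve faster: a strong recall lets the agent start from a previously confirmed root cause instead of a blank hypothesis list.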

How Investigation Works

Alert arrives → AI forms hypotheses → Gathers evidence (metrics, logs, traces)
→ Correlates timeline → Analyzes topology → Identifies root cause
→ Recommends remediation → Learns from resolution
The entire flow — from alert to root cause — typically completes in under 5 minutes.

Tips

  • Connect multiple monitoring tools: The more data sources agents can access, the more accurate the root cause analysis
  • Start with auto-investigate on: Let the AI investigate every alert automatically — you can always tune later
  • Review dismissed hypotheses: Understanding why the agent ruled out alternatives builds trust in its reasoning
  • Enable Slack notifications: Route incident updates to your #incidents channel so the team stays informed
  • Combine with CloudKeepers: Many incidents are preventable — CloudKeepers catch drift before it causes outages

Tutorial Complete

You’ve now set up the CloudThinker incident workflow end-to-end: alert ingestion via webhooks, alert routing, AI-powered investigation, root cause review, and learning from resolution.

What’s Next