What You’ll Set Up
By the end of this tutorial, alerts from your monitoring tools will automatically trigger AI-powered investigation: agents correlate evidence across metrics, logs, traces, and topology to identify the root cause and suggest remediation.
Navigate to Incident Settings
Go to Incident in your workspace. You’ll see the incident dashboard and configuration options.
Connect Monitoring Tools via Webhooks
CloudThinker ingests alerts from 15+ monitoring platforms through webhooks:
| Platform | Setup |
|---|---|
| PagerDuty | Add CloudThinker webhook URL as a service integration |
| Datadog | Create a webhook notification in Monitors |
| Prometheus / Alertmanager | Add webhook receiver configuration |
| AWS CloudWatch | Route alarms through SNS to webhook |
| Grafana | Add webhook contact point |
| Opsgenie | Configure webhook integration |
| New Relic | Add webhook notification channel |
| Sentry | Configure webhook integration for issues |
To connect:
- Go to Incident > Integrations
- Select your monitoring platform
- Copy the generated webhook URL
- Paste it in your monitoring tool’s webhook/notification settings
- Send a test alert to verify the connection
See Webhook Integrations for detailed setup instructions for each platform.
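If you want to verify the connection without relying on your monitoring tool's test feature, you can post a sample alert to the webhook URL yourself. Here is a minimal Python sketch, assuming a placeholder URL and an illustrative payload shape; the actual fields CloudThinker expects depend on which platform the webhook was generated for:

```python
import requests

# Placeholder: replace with the webhook URL copied from
# Incident > Integrations.
WEBHOOK_URL = "https://app.cloudthinker.example/webhooks/abc123"

# Illustrative alert payload; real field names depend on the
# source platform's webhook schema.
test_alert = {
    "title": "Test alert: high CPU on web-01",
    "severity": "high",
    "source": "manual-test",
    "resource": "web-01",
}

resp = requests.post(WEBHOOK_URL, json=test_alert, timeout=10)
resp.raise_for_status()
print(f"Webhook accepted: HTTP {resp.status_code}")
```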
Configure Alert Routing
Once webhooks are connected, configure how alerts are handled:
- Auto-investigate: Automatically start AI investigation when an alert arrives (recommended)
- Severity mapping: Map your monitoring tool’s severity levels to CloudThinker’s (Critical, High, Medium, Low)
- Deduplication: Prevent duplicate incidents from related alerts
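As a mental model for how these three settings interact, the routing behaves roughly like the sketch below. This is illustrative only; the P1–P4 levels, field names, and dedup key are assumptions, and the real configuration lives in the Incident settings UI:

```python
# A rough mental model of alert routing, not CloudThinker's implementation.

# Severity mapping: translate the source tool's levels (here, Datadog-style
# P1-P4) into CloudThinker's Critical/High/Medium/Low scale.
SEVERITY_MAP = {"P1": "Critical", "P2": "High", "P3": "Medium", "P4": "Low"}

def dedup_key(alert: dict) -> tuple:
    """Alerts that share a key collapse into a single incident."""
    return (alert["resource"], alert["title"])

def route(alert: dict, open_incidents: dict) -> dict:
    """Deduplicate, map severity, and flag the incident for auto-investigation."""
    key = dedup_key(alert)
    if key in open_incidents:          # deduplication: reuse the open incident
        return open_incidents[key]
    incident = {
        "title": alert["title"],
        "severity": SEVERITY_MAP.get(alert.get("priority"), "Medium"),
        "auto_investigate": True,      # recommended default
    }
    open_incidents[key] = incident
    return incident

# Two related alerts produce one incident:
incidents: dict = {}
a1 = {"title": "High CPU", "resource": "web-01", "priority": "P2"}
a2 = {"title": "High CPU", "resource": "web-01", "priority": "P2"}
assert route(a1, incidents) is route(a2, incidents)
```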
Trigger Your First Incident
You can trigger an incident in any of three ways:
- Wait for a real alert: Let your monitoring tools trigger a real incident
- Send a test webhook: Use your monitoring tool’s test feature to send a sample alert
- Log manually: Go to Incident > Manual Logging to create a test incident
To log an incident manually:
- Click New Incident
- Describe the issue: “High CPU utilization on production web server”
- Set severity and affected resources
- Submit
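If you'd rather script this step, the same manual-logging flow could in principle be driven by an API call. The endpoint, auth scheme, and field names below are hypothetical placeholders, not a documented CloudThinker API; the UI flow above is the documented path:

```python
import requests

# Hypothetical endpoint and token; CloudThinker's real API may differ.
API_URL = "https://app.cloudthinker.example/api/incidents"
API_TOKEN = "replace-with-your-token"

# Mirrors the manual-logging form: description, severity, resources.
incident = {
    "description": "High CPU utilization on production web server",
    "severity": "High",
    "resources": ["prod-web-01"],
}

resp = requests.post(
    API_URL,
    json=incident,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
print("Created incident:", resp.json().get("id"))
```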
Watch the AI Investigation
Once an incident is created, the AI agent starts a hypothesis-driven investigation:
- Initial hypothesis: Forms possible root causes based on the alert data
- Evidence gathering: Pulls metrics, logs, traces, configs, and recent deployments
- Timeline correlation: Maps events across systems to a unified timeline
- Topology analysis: Traces service dependencies to understand blast radius
- Root cause identification: Narrows down to the most likely cause with a confidence score
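To make the flow concrete, here is a toy sketch of a hypothesis-driven loop. It is a mental model only, not CloudThinker's agent code; the hypotheses, scoring rule, and evidence format are invented for illustration:

```python
from dataclasses import dataclass, field

# Toy model of the hypothesis-driven loop described above.

@dataclass
class Hypothesis:
    cause: str
    evidence: list = field(default_factory=list)
    confidence: float = 0.0

def investigate(alert: dict, evidence_db: dict) -> Hypothesis:
    # 1. Initial hypotheses: candidate root causes for this alert type.
    hypotheses = [Hypothesis("recent deployment"),
                  Hypothesis("traffic spike"),
                  Hypothesis("resource leak")]
    total = sum(len(v) for v in evidence_db.values())
    for h in hypotheses:
        # 2. Evidence gathering: pull data points that bear on this cause.
        h.evidence = evidence_db.get(h.cause, [])
        # 3. Scoring: more corroborating data points -> higher confidence.
        h.confidence = len(h.evidence) / max(1, total)
    # 4. Root cause identification: the best-supported hypothesis wins.
    return max(hypotheses, key=lambda h: h.confidence)

# Fake evidence correlated around the alert window:
evidence = {
    "recent deployment": ["deploy web@v42 at 14:02", "error rate up at 14:03"],
    "traffic spike": [],
    "resource leak": ["memory flat"],
}
best = investigate({"title": "High CPU on web-01"}, evidence)
print(f"Root cause: {best.cause} (confidence {best.confidence:.0%})")
```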
Review the Root Cause Analysis
The completed investigation shows:
- Root cause: The identified issue with confidence score
- Evidence chain: All data points that support the conclusion
- Blast radius: Which services and users are affected
- Timeline: Sequence of events leading to the incident
- Remediation: Recommended actions to resolve and prevent recurrence
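For reference, a completed analysis maps naturally onto a structure like the following. The field names and values are invented for illustration, not CloudThinker's actual output schema:

```python
# Illustrative shape of a completed root cause analysis.
analysis = {
    "root_cause": {"summary": "Deploy web@v42 introduced a busy-wait loop",
                   "confidence": 0.87},
    "evidence_chain": ["CPU rose within 60s of deploy",
                       "No traffic increase in the same window"],
    "blast_radius": {"services": ["web", "checkout"], "users_affected": "~12%"},
    "timeline": ["14:02 deploy", "14:03 CPU at 95%", "14:05 alert fired"],
    "remediation": ["Roll back to web@v41", "Add CPU regression check to CI"],
}
print(analysis["root_cause"]["summary"])
```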
See How Investigation Works for a detailed look at the investigation pipeline.
Tips
- Connect multiple monitoring tools: The more data sources agents can access, the more accurate the root cause analysis
- Start with auto-investigate on: Let the AI investigate every alert automatically — you can always tune later
- Review dismissed hypotheses: Understanding why the agent ruled out alternatives builds trust in its reasoning
- Enable Slack notifications: Route incident updates to your #incidents channel so the team stays informed
- Combine with CloudKeepers: Many incidents are preventable; CloudKeepers catch drift before it causes outages
Tutorial Complete
You’ve now set up the core CloudThinker features end-to-end:
- VibeOps: Conversational cloud operations
- Code Review: AI-powered PR reviews
- CloudKeepers: Autonomous monitoring
- Assessment: Infrastructure analysis