Kai — Kubernetes Engineer
Kai is CloudThinker’s container orchestration expert, specializing in Kubernetes cluster management, workload optimization, autoscaling, and operational troubleshooting across EKS, GKE, AKS, and self-managed clusters.
The Problem Kai Solves
Kubernetes is powerful but deeply complex. Most teams set resource requests and limits once (or copy them from a template) and never revisit them. Pods get OOMKilled because limits are too low; nodes sit underutilized because requests are too high. The Cluster Autoscaler adds nodes when right-sizing the workloads would have been enough. RBAC configurations drift from least privilege as service accounts accumulate permissions. Operating Kubernetes well requires daily attention from someone with deep expertise:
- Monitoring resource utilization across hundreds of pods in multiple namespaces
- Diagnosing crash loops by reading logs, events, and checking resource constraints
- Tuning HPA thresholds, VPA recommendations, and Cluster Autoscaler behavior
- Auditing RBAC configurations and network policies for security gaps
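To make the HPA tuning task in the list above concrete, here is a minimal Python sketch of the replica calculation that the Kubernetes documentation describes for the Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default upstream). The function name and the sample numbers are illustrative assumptions, not part of Kai, and the sketch deliberately ignores min/max replica bounds, stabilization windows, and not-ready pods.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Approximate the documented HPA scaling rule (hypothetical helper).

    Returns current_replicas unchanged when the usage ratio is within
    `tolerance` of 1.0, otherwise ceil(current_replicas * ratio).
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the autoscaler skips scaling.
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 60% target.
    print(hpa_desired_replicas(4, 90, 60))
```

A threshold set too close to normal load keeps the ratio outside the tolerance band and causes frequent scaling, which is the kind of behavior the tuning work is meant to catch.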
How Existing Tools Compare
| Tool | What It Does | What’s Missing |
|---|---|---|
| kubectl | Direct cluster API access | Raw tool, requires deep expertise, no analysis or recommendations |
| Lens / k9s | Kubernetes dashboards and CLI | Visualization only, no AI analysis, no recommendations |
| Kubecost | Kubernetes cost allocation and reporting | Cost visibility only, no troubleshooting or optimization guidance |
| Datadog / Prometheus + Grafana | Kubernetes metrics and alerting | Monitoring only, still requires expert interpretation to act |
| KEDA / VPA | Autoscaling automation | Single-purpose tools, no holistic cluster analysis |
How Kai Works
- Connects to Kubernetes API — reads pods, nodes, deployments, services, events, and RBAC configurations across all namespaces
- Pulls metrics — correlates Kubernetes API state with metrics-server data (CPU/memory actual vs. requested)
- Identifies inefficiency patterns — OOMKill history, pending pods, underutilized nodes, misconfigured autoscaling policies
- Generates specific recommendations — exact resource request/limit values based on actual P95 utilization, HPA threshold adjustments, RBAC policy changes
- Troubleshoots with context — when a pod fails, Kai reads logs, events, and resource state simultaneously to identify root cause instead of having you correlate them manually
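The right-sizing step above can be sketched roughly as follows. This is not Kai's actual algorithm: the nearest-rank percentile, the 15% request headroom, and the 50% limit headroom over the observed peak are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def recommend_memory(samples_mib, request_headroom=1.15, limit_headroom=1.5):
    """Suggest a memory request and limit (MiB) from observed usage samples.

    Assumed policy: request = P95 usage plus headroom,
    limit = observed peak plus headroom.
    """
    p95 = percentile(samples_mib, 95)
    peak = max(samples_mib)
    return {
        "p95_mib": p95,
        "request_mib": math.ceil(p95 * request_headroom),
        "limit_mib": math.ceil(peak * limit_headroom),
    }


if __name__ == "__main__":
    usage = list(range(1, 101))  # synthetic samples: 1..100 MiB
    print(recommend_memory(usage))
```

Basing the request on P95 rather than the peak keeps scheduling density high, while deriving the limit from the peak leaves room for bursts that would otherwise end in an OOMKill.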
Capabilities
| Domain | Capabilities |
|---|---|
| Cluster Management | Health monitoring, node management, resource allocation, upgrades |
| Workload Optimization | Pod right-sizing, resource requests/limits, scheduling efficiency |
| Autoscaling | HPA/VPA/Cluster Autoscaler optimization, scaling policies |
| Security | RBAC auditing, network policies, pod security, secrets management |
| Troubleshooting | Crash loops, OOMKills, scheduling failures, networking issues |
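As an illustration of the troubleshooting row, the sketch below scans a pod list in the JSON shape returned by `kubectl get pods -A -o json` and flags containers that are in `CrashLoopBackOff` or whose last termination reason was `OOMKilled`. The status fields used (`containerStatuses`, `state.waiting.reason`, `lastState.terminated.reason`, `restartCount`) are standard pod status fields; the function name and the sample data are made up for the example, and this is not how Kai itself is implemented.

```python
def find_unhealthy_containers(pod_list):
    """Return one finding per container that is crash-looping or was OOMKilled."""
    findings = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        for cs in pod.get("status", {}).get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting") or {}
            last_term = cs.get("lastState", {}).get("terminated") or {}
            reasons = []
            if waiting.get("reason") == "CrashLoopBackOff":
                reasons.append("CrashLoopBackOff")
            if last_term.get("reason") == "OOMKilled":
                reasons.append("OOMKilled")
            if reasons:
                findings.append({
                    "namespace": meta.get("namespace"),
                    "pod": meta.get("name"),
                    "container": cs.get("name"),
                    "restarts": cs.get("restartCount", 0),
                    "reasons": reasons,
                })
    return findings


# Hypothetical sample: one healthy pod and one that keeps getting OOMKilled.
SAMPLE = {
    "items": [
        {
            "metadata": {"namespace": "shop", "name": "web-1"},
            "status": {"containerStatuses": [
                {"name": "web", "restartCount": 0, "state": {"running": {}}, "lastState": {}},
            ]},
        },
        {
            "metadata": {"namespace": "shop", "name": "worker-1"},
            "status": {"containerStatuses": [
                {
                    "name": "worker",
                    "restartCount": 7,
                    "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                    "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
                },
            ]},
        },
    ]
}

if __name__ == "__main__":
    for finding in find_unhealthy_containers(SAMPLE):
        print(finding)
```

Seeing both reasons on the same container points at a memory limit that is too low rather than an application bug, which is the correlation that otherwise has to be made by hand across logs, events, and resource state.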
Supported Platforms
| Platform | Support Level |
|---|---|
| Amazon EKS | Full support with AWS integration |
| Google GKE | Full support with GCP integration |
| Azure AKS | Full support with Azure integration |
| Self-Managed | Kubernetes 1.24+ with metrics-server |
Prompt Patterns
Cluster Health
Workload Optimization
Autoscaling
Troubleshooting
Security
Tool Usage
| Tool | Kai Use Case |
|---|---|
| #dashboard | Cluster health, node status, resource utilization, pod metrics |
| #report | Optimization analysis, security audits, capacity planning |
| #recommend | Right-sizing, scaling policies, consolidation actions |
| #alert | OOMKills, node pressure, pod failures, resource thresholds |
| #chart | Resource trends, scaling patterns, utilization over time |
Examples with Tools
Effective Prompts
Include Cluster Context
Define Success Metrics
Connection Requirements
Kai requires Kubernetes cluster access with monitoring capabilities:
| Component | Required Access |
|---|---|
| Kubernetes API | Read access to pods, nodes, deployments, services |
| Metrics Server | Resource metrics for pods and nodes |
| Events | Cluster events for troubleshooting |
| Logs | Container logs for debugging |
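One way to express the access listed in the table is a read-only ClusterRole. The Python sketch below builds such a manifest as JSON, which `kubectl apply` accepts alongside YAML. The role name `kai-readonly` and the exact rule set are assumptions derived from the table, not CloudThinker's official manifest; check the product's setup guide for the authoritative permissions, and note that a ClusterRoleBinding to a service account would still be needed.

```python
import json


def read_only_cluster_role(name="kai-readonly"):
    """Build a read-only ClusterRole manifest (hypothetical name and rule set)."""
    read = ["get", "list", "watch"]
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            # Core API group: workload and node state plus cluster events.
            {"apiGroups": [""], "resources": ["pods", "nodes", "services", "events"], "verbs": read},
            # Container logs are exposed as the pods/log subresource.
            {"apiGroups": [""], "resources": ["pods/log"], "verbs": ["get"]},
            {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": read},
            # Resource metrics served by metrics-server.
            {"apiGroups": ["metrics.k8s.io"], "resources": ["pods", "nodes"], "verbs": ["get", "list"]},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_cluster_role(), indent=2))
```

Saved under a name of your choosing (for example `role.py`), the output can be piped into the cluster with `python role.py | kubectl apply -f -`. No rule grants write verbs, so the role cannot modify workloads.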