Kubernetes

Connect your Kubernetes clusters to enable Kai (Kubernetes Engineer) to analyze workloads, optimize resources, and manage cluster operations.

Supported Platforms

| Platform | Support |
| --- | --- |
| Amazon EKS | All versions |
| Google GKE | Standard, Autopilot |
| Azure AKS | All versions |
| Self-managed | Kubernetes 1.24+ |
| Rancher | RKE, RKE2 |
| OpenShift | 4.x |
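
For self-managed clusters, the 1.24+ requirement could be checked with a small shell helper like the one below. This is only a sketch: the function name `meets_min_version` is hypothetical, and it assumes a version string in the `vMAJOR.MINOR.PATCH` form that `kubectl version` reports for the server.

```shell
# Hypothetical helper: succeeds when a version string such as "v1.27.3"
# is at least 1.24, and fails otherwise.
meets_min_version() {
  local version="${1#v}"          # drop the leading "v"
  local major="${version%%.*}"    # text before the first dot
  local rest="${version#*.}"      # text after the first dot
  local minor="${rest%%[!0-9]*}"  # leading digits of the minor part
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 24 ]; }
}
```

For example, `meets_min_version v1.27.3 && echo supported` prints `supported`, while `meets_min_version v1.23.9` returns a non-zero status.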

Setup Methods


Kubeconfig Format

apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: <base64-encoded-ca-cert>
    server: https://your-cluster-endpoint:6443
  name: your-cluster
contexts:
- context:
    cluster: your-cluster
    user: cloudthinker-readonly
  name: cloudthinker-context
current-context: cloudthinker-context
users:
- name: cloudthinker-readonly
  user:
    token: <your-service-account-token>
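
A kubeconfig in this format can be assembled from an existing admin context. The sketch below is one possible approach, not a CloudThinker-provided tool: the function names `render_kubeconfig` and `build_cloudthinker_kubeconfig` are hypothetical, and it assumes the service account already exists and that the current context embeds its CA as `certificate-authority-data`.

```shell
# Hypothetical helper: prints a kubeconfig in the format shown above.
# Arguments: API server URL, base64-encoded CA certificate, service account token.
render_kubeconfig() {
  local server="$1" ca_data="$2" token="$3"
  cat <<EOF
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: ${ca_data}
    server: ${server}
  name: your-cluster
contexts:
- context:
    cluster: your-cluster
    user: cloudthinker-readonly
  name: cloudthinker-context
current-context: cloudthinker-context
users:
- name: cloudthinker-readonly
  user:
    token: ${token}
EOF
}

# Assumed workflow: read the endpoint and CA from the current kubectl context,
# then request a token for the service account (namespace and name are placeholders).
build_cloudthinker_kubeconfig() {
  local namespace="$1" service_account="$2"
  local server ca_data token
  server=$(kubectl config view --minify --raw -o jsonpath='{.clusters[0].cluster.server}')
  ca_data=$(kubectl config view --minify --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')
  token=$(kubectl create token "$service_account" -n "$namespace")
  render_kubeconfig "$server" "$ca_data" "$token"
}
```

A call such as `build_cloudthinker_kubeconfig <namespace> cloudthinker-readonly > cloudthinker.kubeconfig` would write the result to a file. Tokens issued by `kubectl create token` are time-limited; the `--duration` flag requests a longer lifetime, which the API server may cap.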

Required Permissions

Minimum (Read-Only)

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cloudthinker-readonly
rules:
- apiGroups: [""]
  resources: ["pods", "nodes", "services", "namespaces", "events", "configmaps"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets", "statefulsets", "daemonsets"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["metrics.k8s.io"]
  resources: ["pods", "nodes"]
  verbs: ["get", "list"]
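
# Alternative: bind the built-in view ClusterRole (service account name and namespace are placeholders)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cloudthinker-view
subjects: [{kind: ServiceAccount, name: cloudthinker-readonly, namespace: <namespace>}]
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view  # Built-in read-only role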
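
A ClusterRole only takes effect once it is bound to a subject. The following sketch shows one way to create a service account and bind a read-only role to it cluster-wide using standard `kubectl create` subcommands. The function name `setup_cloudthinker_access`, the binding name, and the namespace and service account arguments are all assumptions for illustration; a custom ClusterRole must already be applied before it is bound.

```shell
# Hypothetical helper: creates a dedicated namespace and service account, then
# binds a read-only ClusterRole to it cluster-wide. All names are placeholders.
# Arguments: namespace, service account name, ClusterRole (default: cloudthinker-readonly).
setup_cloudthinker_access() {
  local namespace="$1" service_account="$2" cluster_role="${3:-cloudthinker-readonly}"
  kubectl create namespace "$namespace" &&
    kubectl create serviceaccount "$service_account" -n "$namespace" &&
    kubectl create clusterrolebinding "${service_account}-binding" \
      --clusterrole="$cluster_role" \
      --serviceaccount="${namespace}:${service_account}"
}
```

Passing `view` as the third argument binds the built-in role instead of the custom one. The built-in `view` role is aimed at namespaced resources, so it may not cover cluster-scoped objects such as nodes; it is worth confirming the resulting permissions on your own cluster.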

Agent Capabilities

Once connected, Kai can:
| Capability | Description |
| --- | --- |
| Resource Analysis | Pod CPU/memory usage, requests vs limits |
| Node Health | Node status, capacity, allocatable resources |
| Workload Optimization | Right-sizing recommendations, HPA tuning |
| Troubleshooting | CrashLoopBackOff, OOMKilled, pending pods |
| Security Audit | RBAC review, pod security, network policies |

Example Prompts

@kai analyze pod resource utilization in production namespace
@kai identify nodes with <30% CPU utilization
@kai investigate crash loops in payment service
@kai #recommend HPA policies for web deployments

Prerequisites

For full functionality, ensure:
| Component | Purpose |
| --- | --- |
| Metrics Server | Required for resource metrics |
| kube-state-metrics | Enhanced cluster metrics (optional) |
| Network access | CloudThinker must reach API server |

Install Metrics Server

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
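
After applying the manifest, it can help to confirm that Metrics Server is actually serving data. The sketch below assumes the upstream manifest's defaults (a `metrics-server` Deployment in the `kube-system` namespace); the function name `check_metrics_server` is hypothetical.

```shell
# Hypothetical helper: waits for Metrics Server to roll out, then confirms the
# metrics API is registered and returns node metrics.
check_metrics_server() {
  kubectl -n kube-system rollout status deployment/metrics-server --timeout=120s &&
    kubectl get apiservice v1beta1.metrics.k8s.io &&
    kubectl top nodes
}
```

If the final step fails shortly after installation, retrying after a minute or so may be enough, since the first metrics take some time to be collected.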

Troubleshooting

Connection failures:
  • Verify that the API server endpoint is reachable from the internet
  • Check that firewall rules and security groups allow CloudThinker IPs
  • For private clusters, set up VPN or bastion access
  • Confirm that the API server certificate is valid

Authentication errors:
  • Verify that the service account token is correct
  • Check that the ClusterRoleBinding is applied
  • Ensure the token hasn’t expired
  • Confirm that the service account exists in the correct namespace

Missing metrics:
  • Verify that Metrics Server is installed: kubectl top nodes
  • Check that the Metrics Server pods are running
  • Ensure the metrics.k8s.io API is available

Missing namespaces:
  • Verify that the ClusterRole has namespace list permission
  • Check whether RBAC restricts access to certain namespaces
  • Confirm that the service account binding is cluster-wide
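
Some of the checks above can be scripted. The sketch below is illustrative only: `token_expiry` and `check_access` are hypothetical names, the first assumes a JWT-style token with a compact JSON payload and a `base64` that accepts `-d`, and the second assumes your own credentials are allowed to impersonate the service account.

```shell
# Hypothetical helper: prints the "exp" claim (Unix seconds) of a JWT service
# account token, or nothing if the token carries no expiry.
token_expiry() {
  local payload
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it so base64 can decode the payload.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
  printf '%s' "$payload" | base64 -d 2>/dev/null | grep -o '"exp":[0-9]*' | cut -d: -f2
}

# Hypothetical helper: asks the API server whether the service account may list
# namespaces and pods across the cluster. Arguments: namespace, service account.
check_access() {
  local subject="system:serviceaccount:$1:$2"
  kubectl auth can-i list namespaces --as="$subject"
  kubectl auth can-i list pods --all-namespaces --as="$subject"
}
```

Comparing the output of `token_expiry "$TOKEN"` with `date +%s` shows whether the token has expired. A `no` from `check_access` points at a missing or namespace-scoped binding.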

Security Best Practices

  • Read-only access - Never grant write permissions to CloudThinker
  • Namespace isolation - Keep service account in dedicated namespace
  • Token rotation - Rotate service account tokens periodically
  • Network policies - Restrict API server access to CloudThinker IPs
  • Audit logging - Enable Kubernetes audit logs
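
The read-only recommendation can be spot-checked with `kubectl auth can-i`, which exits with a non-zero status when the answer is "no". The sketch below uses a hypothetical function name, `assert_read_only`, tests only a few write verbs on pods rather than every resource, and assumes your own credentials may impersonate the service account.

```shell
# Hypothetical helper: fails if the service account is allowed any of a few
# write operations on pods. Arguments: namespace, service account name.
assert_read_only() {
  local subject="system:serviceaccount:$1:$2" verb
  for verb in create update patch delete; do
    if kubectl auth can-i "$verb" pods --all-namespaces --as="$subject" >/dev/null 2>&1; then
      echo "write access detected: $verb pods" >&2
      return 1
    fi
  done
  echo "no write access to pods detected"
}
```

Running `assert_read_only <namespace> cloudthinker-readonly` after each RBAC change gives a quick signal that no write permission has slipped in, although a full RBAC review is still needed for complete coverage.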