KlusterAlert

Issue Detection

KlusterAlert automatically monitors your pods and nodes every 60 seconds. There are no rules to write. Detection is built in. The agent queries the Kubernetes API, identifies issues, and pushes them to KlusterAlert for alerting and AI analysis.

How detection works

The agent running inside your cluster uses the Kubernetes API to list all pods and check their status every 60 seconds. When it detects a problem, such as a crash loop or stuck pod, it pushes an event to KlusterAlert over HTTPS. KlusterAlert deduplicates the event and fires a notification to your configured channels.

The agent is stateless and read-only. It never writes to your cluster. It only reads pod and node state from the Kubernetes API.

Detected issue types

IssuePlansDescription
CrashLoopBackOffAllContainer is repeatedly crashing and being restarted by Kubernetes
Pod FailedAllPod is in a terminal Failed phase and not restarting
High restart countAllContainer has restarted more than the detection threshold within a window
ImagePullBackOffAllKubernetes cannot pull the container image. Check the image tag and or registry credentials
Pod stuck PendingAllPod has been in Pending phase for longer than expected (scheduling failure)
OOMKilledAllContainer was killed by the kernel because it exceeded its memory limit
High memory usagePro+Pod memory usage is above the configured threshold or rising sharply
CPU spikePro+Pod CPU usage has spiked significantly above its recent baseline

Alert deduplication

KlusterAlert deduplicates alerts so you don't get spammed. If the same pod is still in CrashLoopBackOff on the next check, a second notification is not sent until the deduplication window expires. This prevents notification storms during incidents.

Testing detection end-to-end

Deploy a pod that immediately exits to trigger a CrashLoopBackOff alert:

Trigger a test alert
kubectl run crasher --image=busybox -- /bin/sh -c "exit 1"

# Within ~60 seconds KlusterAlert detects the crash loop.
# Check your notification channel for the alert.

# Clean up
kubectl delete pod crasher
The alert includes the cluster name, namespace, pod name, issue type, restart count, and, on Pro and above, an AI-generated plain-English explanation with a suggested kubectl command to investigate.

What an alert looks like

Example alert payload
Cluster:    production
Namespace:  default
Pod:        payments-service-7d9f4b-xk2
Issue:      CrashLoopBackOff
Restarts:   7
Detected:   2026-05-18T14:32:00Z

AI explanation (Pro+):
  The container is crashing immediately after startup with exit code 1.
  Most likely causes: missing environment variable or secret, application
  startup bug, or incompatible image tag recently deployed.

  Investigate with:
  kubectl logs payments-service-7d9f4b-xk2 -n default --previous