GuideDevOps

Kubernetes Troubleshooting

Part of the Kubernetes tutorial series.

Kubernetes is complex, and things will eventually break. Here is a standard workflow for diagnosing and fixing common cluster issues.

1. The Pod is failing (CrashLoopBackOff / Error)

This is one of the most common issues.

Step 1: Check Pod Status

Action:

kubectl get pods

Result:

NAME        READY   STATUS             RESTARTS   AGE
web-app-1   0/1     CrashLoopBackOff   4          2m
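When many pods are listed, it helps to filter out the healthy ones. The shell function below is a minimal sketch, not a kubectl feature: unhealthy_pods is a hypothetical helper name, and it assumes the default kubectl get pods columns (no -A, which adds a NAMESPACE column in front).

```shell
# Hypothetical helper (not part of kubectl): reads the default
# `kubectl get pods` table on stdin and prints the name and STATUS of
# every pod that is not fully ready. Completed pods (finished Jobs) are skipped.
unhealthy_pods() {
  awk 'NR > 1 {
    if ($3 == "Completed") next        # finished Job pods are expected to be 0/1
    split($2, ready, "/")              # READY column, e.g. "0/1"
    if (ready[1] != ready[2] || $3 != "Running")
      print $1, $3
  }'
}
```

Run it as kubectl get pods | unhealthy_pods; against the output above it would print web-app-1 CrashLoopBackOff.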

Step 2: Check Events (The "Why")

Action:

kubectl describe pod web-app-1

Result: Look for the Events section at the bottom:

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Warning  Failed     30s (x3 over 1m)   kubelet            Error: ImagePullBackOff

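(This example tells you Kubernetes can't find your image or lacks permission to pull it; such a pod shows ErrImagePull or ImagePullBackOff in its STATUS. For a CrashLoopBackOff pod you will usually see a "Back-off restarting failed container" warning instead: the image was pulled, but the container keeps exiting, so check the logs next.)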
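The describe output can be long. Here is a minimal sketch, assuming the plain-text layout shown above, that keeps only the Warning rows of the Events section (warning_events is a hypothetical helper name, not a kubectl command):

```shell
# Hypothetical helper (not part of kubectl): reads `kubectl describe pod`
# output on stdin and prints only the Warning rows of the Events section.
warning_events() {
  awk '
    /^Events:/ { in_events = 1; next }   # the Events table starts after this header
    in_events && $1 == "Warning"         # no action block: print the whole row
  '
}
```

Use it as kubectl describe pod web-app-1 | warning_events. Alternatively, kubectl can filter events itself: kubectl get events --field-selector involvedObject.name=web-app-1,type=Warning.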

Step 3: Check Logs (Application Errors)

Action:

kubectl logs web-app-1

Result:

[FATAL] Database connection failed: 'db-service' not found.

(This tells you the issue is inside your application code or its configuration.)
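With CrashLoopBackOff, the container you are reading may have just restarted and not failed yet. kubectl logs --previous (short form -p) prints the output of the last terminated instance, which usually contains the actual crash. A minimal sketch that wraps both calls; crash_logs is a hypothetical helper, and for multi-container pods you would also pass -c with the container name:

```shell
# Hypothetical helper (not part of kubectl): print logs from the current
# container and from the previous, crashed instance of the same container.
crash_logs() {
  local pod="$1"
  echo "=== current container ==="
  kubectl logs "$pod"
  echo "=== previous (crashed) container ==="
  kubectl logs "$pod" --previous     # standard kubectl flag, short form -p
}
```

Usage: crash_logs web-app-1.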


2. Service is unreachable

Use this when your app is running but you can't reach it through its Service.

Step 1: Verify Endpoints

Action:

kubectl get endpoints my-web-service

Result:

NAME             ENDPOINTS            AGE
my-web-service   <none>               5m

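Interpretation: <none> means no ready Pod backs the Service. Most often the Service's selector doesn't match your Pod labels, so fix the selector or the labels in your YAML. It can also mean the matching Pods exist but are not Ready (for example, because a readiness probe is failing).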
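To see where the selector and labels disagree, compare the SELECTOR column of kubectl get service my-web-service -o wide with the LABELS column of kubectl get pods --show-labels. The function below is a hypothetical sketch of that comparison for two comma-separated strings copied from those columns; it does not call kubectl itself.

```shell
# Hypothetical helper (not part of kubectl): succeeds when every key=value
# pair of a Service selector also appears in a Pod's labels. Both arguments
# use the comma-separated form kubectl prints, e.g. "app=web,tier=frontend".
selector_matches() {
  local selector="$1" labels=",$2," pair
  local old_ifs="$IFS"
  IFS=','                                # split the selector on commas
  for pair in $selector; do
    case "$labels" in
      *",$pair,"*) ;;                    # pair found among the labels
      *) IFS="$old_ifs"; echo "missing label: $pair"; return 1 ;;
    esac
  done
  IFS="$old_ifs"
  return 0
}
```

For example, selector_matches app=web-app app=web,tier=frontend prints missing label: app=web-app and returns a non-zero status.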


3. Node or Cluster Issues

Check Node Health

Action:

kubectl get nodes

Result:

NAME       STATUS     ROLES           AGE   VERSION
node-1     Ready      control-plane   10d   v1.26.1
node-2     NotReady   <none>          10d   v1.26.1

Interpretation: NotReady often means the node is out of disk space or memory, the kubelet has crashed or can no longer reach the control plane, or the node's network plugin is not ready.
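To narrow it down, run kubectl describe node node-2 and read its Conditions table: Ready should be True, while MemoryPressure, DiskPressure, and PIDPressure should be False. A Status of Unknown means the control plane has stopped hearing from that node's kubelet. A minimal sketch that prints only the abnormal conditions, assuming the space-indented layout kubectl describe prints (node_problems is a hypothetical helper name):

```shell
# Hypothetical helper (not part of kubectl): reads `kubectl describe node`
# output on stdin and prints the Type and Status of abnormal conditions.
node_problems() {
  awk '
    /^Conditions:/ { in_cond = 1; next }
    in_cond && /^[^ ]/ { in_cond = 0 }   # next top-level section ends the table
    in_cond && NF >= 2 && $1 != "Type" && $1 !~ /^-+$/ {
      # Ready should be True; every other condition should be False
      if (($1 == "Ready" && $2 != "True") || ($1 != "Ready" && $2 != "False"))
        print $1, $2
    }
  '
}
```

Use it as kubectl describe node node-2 | node_problems.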


Summary: Debugging Checklist

  1. kubectl get pods: Is it running?
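  2. kubectl describe pod: What do the Events say?
  3. kubectl logs: What does the app code say?
  4. kubectl get events -A: Check recent events across all namespaces for cluster-wide errors.
  5. kubectl exec: Can you reach other services from inside the pod? Test with nslookup or curl rather than ping, because ClusterIP Services usually don't answer ping.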
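Checklist items 1 to 4 can be run in one pass for a given pod. A minimal sketch, assuming the pod is in the current namespace; triage_pod is a hypothetical helper, item 5 is left out because kubectl exec is interactive, and the cluster-wide events are narrowed to warnings with --field-selector type=Warning:

```shell
# Hypothetical helper (not part of kubectl): run checklist items 1-4 in order.
triage_pod() {
  local pod="$1"
  echo "### 1. Status"
  kubectl get pod "$pod"
  echo "### 2. Events"
  kubectl describe pod "$pod"
  echo "### 3. Logs"
  kubectl logs "$pod"
  echo "### 4. Cluster warnings"
  kubectl get events -A --field-selector type=Warning
}
```

Usage: triage_pod web-app-1.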