Kubernetes Pod Stuck in Pending: Troubleshooting Guide

Why pods get stuck in Pending state and how to fix it. Covers resource constraints, node selectors, taints, and PVC issues.

Paul Brissaud
5 min read
#troubleshooting #pods #scheduling

You run kubectl get pods and see it: your pod is stuck in Pending. No crash, no error logs, no container even started. It just sits there. Unlike CrashLoopBackOff where the container at least tries to run, a Pending pod means Kubernetes can't even find a place to schedule it.

Quick Answer

Your pod can't be scheduled onto any node. To find out why:

kubectl describe pod <pod-name>

Scroll to the Events section. You'll typically see one of these messages:

Warning  FailedScheduling  default-scheduler  0/3 nodes are available:
  1 node(s) had taint {node-role.kubernetes.io/control-plane: }, that the pod didn't tolerate,
  2 node(s) didn't match Pod's node affinity/selector.

The scheduler is telling you exactly what's wrong — you just need to know how to read it.


What Does Pending Mean?

A pod enters the Pending state when it has been accepted by the Kubernetes API server but hasn't been scheduled onto a node yet. The scheduler's job is to find a node that satisfies all of the pod's constraints. If no node matches, the pod stays Pending indefinitely.
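
If several workloads are affected, a quick way to list every Pending pod in the cluster is a field selector (the same technique this article uses later for Succeeded and Failed pods):

# List all Pending pods across all namespaces
kubectl get pods -A --field-selector=status.phase=Pending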

This is fundamentally different from other pod issues: with CrashLoopBackOff or ImagePullBackOff, the pod has already been scheduled onto a node and the failure happens at container startup or image pull. A Pending pod, in most cases, hasn't been placed on any node at all.


Common Causes

In practice, a Pending pod almost always comes down to one of the following (each is covered in detail below):

  • Insufficient CPU or memory on every node
  • A nodeSelector or node affinity rule that no node satisfies
  • Node taints the pod doesn't tolerate
  • A PersistentVolumeClaim that is itself stuck in Pending
  • Pod affinity or anti-affinity constraints that can't be satisfied
  • A referenced ConfigMap, Secret, or volume that doesn't exist

Step-by-Step Troubleshooting

Step 1: Check the Events

This is always your first move:

kubectl describe pod <pod-name>

The Events section at the bottom is the most important part. The scheduler explains its decision for every node in the cluster. Read it carefully — it's the key to everything.
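
If the describe output is long, you can also pull just the events that mention the pod (substitute your pod's name in the field selector):

# Show only events related to this pod, newest last
kubectl get events --field-selector involvedObject.name=<pod-name> --sort-by=.lastTimestamp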

Step 2: Verify Cluster Resources

Check what's available on your nodes:

# Overall node capacity and allocations
kubectl describe nodes | grep -A 5 "Allocated resources"

# Quick view with metrics-server
kubectl top nodes

Compare against what your pod is requesting:

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources}' | jq
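
Note that the command above only looks at the first container. If the pod has several containers (sidecars, for example), a jsonpath range prints the requests for each one:

# Requests for every container in the pod
kubectl get pod <pod-name> -o jsonpath='{range .spec.containers[*]}{.name}: {.resources.requests}{"\n"}{end}'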

Step 3: Check Node Labels and Taints

If the scheduler mentions selectors or taints:

# View all node labels
kubectl get nodes --show-labels

# View taints on all nodes
kubectl describe nodes | grep -A 3 "Taints"
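
If you prefer a one-line-per-node overview of taints instead of grepping describe output, a jsonpath range works here too:

# Node name and its taints, one node per line
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'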

Step 4: Check PVC Status

If your pod uses persistent volumes:

kubectl get pvc -n <namespace>

A PVC in Pending state means the volume isn't provisioned yet — and your pod will stay Pending until it is.
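
If the claim is Pending, its own events usually say why (no matching PersistentVolume, unknown StorageClass, provisioner errors):

kubectl describe pvc <pvc-name> -n <namespace>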


Solutions by Cause

Cause A: Insufficient Resources

Symptoms: The scheduler message shows Insufficient cpu or Insufficient memory for all nodes.

0/3 nodes are available: 1 Insufficient cpu, 2 Insufficient memory.

This means no node has enough allocatable resources to satisfy the pod's requests. Note that Kubernetes uses requests (not limits) for scheduling.

Fix option 1 — Lower the pod's resource requests:

resources:
  requests:
    memory: "128Mi"  # Reduced from 512Mi
    cpu: "100m"       # Reduced from 500m
  limits:
    memory: "256Mi"
    cpu: "500m"

Fix option 2 — Free up resources by removing unused workloads:

# Find pods using the most resources
kubectl top pods -A --sort-by=memory

# Look for completed jobs or failed pods
kubectl get pods -A --field-selector=status.phase=Succeeded
kubectl get pods -A --field-selector=status.phase=Failed

Fix option 3 — Add more nodes to the cluster. If you're using a cloud provider with Cluster Autoscaler, this should happen automatically when pods are pending due to resource constraints.

Cause B: Node Selector Mismatch

Symptoms: The scheduler message shows didn't match Pod's node affinity/selector.

Your pod specifies a nodeSelector or nodeAffinity that no node satisfies.

Identify the problem:

# Check what the pod requires
kubectl get pod <pod-name> -o jsonpath='{.spec.nodeSelector}' | jq

# Check what labels your nodes have
kubectl get nodes --show-labels

Fix option 1 — Add the missing label to a node:

kubectl label node <node-name> disktype=ssd

Fix option 2 — Update the pod's nodeSelector to match existing labels:

spec:
  nodeSelector:
    kubernetes.io/os: linux  # Use a label that actually exists
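
The same idea applies if the pod uses nodeAffinity instead of a plain nodeSelector. As a rough sketch (the disktype label is just an example), a hard requirement looks like this and must match at least one node:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd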

Cause C: Taints and Tolerations

Symptoms: The scheduler message mentions taints the pod doesn't tolerate.

0/3 nodes are available: 3 node(s) had taint {dedicated: gpu}, that the pod didn't tolerate.

Taints are a way to repel pods from nodes. Only pods with a matching toleration can schedule onto a tainted node.

View the taints:

kubectl describe nodes | grep -B 2 -A 3 "Taints"

Fix option 1 — Add a toleration to the pod:

spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"

Fix option 2 — Remove the taint from a node (if it was applied by mistake):

kubectl taint node <node-name> dedicated=gpu:NoSchedule-

The trailing - removes the taint.

Common gotcha: Control plane nodes are tainted with node-role.kubernetes.io/control-plane:NoSchedule by default. If you're running a single-node cluster (like minikube), this taint is usually removed automatically. In multi-node clusters, don't schedule application workloads on control plane nodes unless you have a good reason.
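
If you do need to schedule onto control plane nodes (for example in a small lab cluster), a toleration for that default taint would look roughly like this:

spec:
  tolerations:
  - key: "node-role.kubernetes.io/control-plane"
    operator: "Exists"
    effect: "NoSchedule"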

Cause D: PVC Pending

Symptoms: kubectl describe pod shows the pod is waiting for a volume to be bound.

Warning  FailedScheduling  default-scheduler  0/3 nodes are available:
  3 node(s) didn't find available persistent volumes to bind.

Check PVC status:

kubectl get pvc -n <namespace>

If the PVC shows Pending, the issue is with the volume, not the pod.

Common PVC issues:

  • No default StorageClass in the cluster, and the PVC doesn't name one explicitly
  • The PVC references a StorageClass that doesn't exist (typo or missing provisioner)
  • No existing PersistentVolume matches the requested size, access mode, or StorageClass (static provisioning)
  • The StorageClass uses volumeBindingMode: WaitForFirstConsumer, so binding waits until the pod itself can be scheduled

Fix example — Check and fix the StorageClass:

# List available StorageClasses
kubectl get storageclass

# Check if there's a default
kubectl get storageclass -o jsonpath='{.items[?(@.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")].metadata.name}'
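
If no StorageClass is marked as default, you can mark one yourself (replace <storageclass-name> with one from the list above); this is the standard annotation that provisioners check:

# Mark a StorageClass as the cluster default
kubectl patch storageclass <storageclass-name> \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'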

Cause E: Pod Affinity/Anti-Affinity Constraints

Symptoms: didn't match pod affinity rules or didn't match pod anti-affinity rules.

Pod affinity rules require the pod to be co-located with certain other pods; anti-affinity rules require it to be placed away from them.

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: my-app
      topologyKey: kubernetes.io/hostname

This anti-affinity rule says "don't put two pods with label app: my-app on the same node." If you have 3 replicas but only 2 nodes, the third pod will stay Pending.

Fix: Switch from requiredDuringSchedulingIgnoredDuringExecution to preferredDuringSchedulingIgnoredDuringExecution. This makes it a soft constraint — the scheduler will try to honor it but won't block scheduling if it can't.
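
As a sketch, the softened version of the rule above looks like this (the weight can be any value from 1 to 100; higher means the scheduler tries harder to honor it):

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: my-app
        topologyKey: kubernetes.io/hostname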

Cause F: Missing ConfigMap, Secret, or Volume

Symptoms: The pod stays stuck in Pending or transitions to ContainerCreating but never reaches Running. Events show references to missing resources.

Warning  FailedMount  kubelet  MountVolume.SetUp failed for volume "config-vol":
  configmap "app-config" not found

Or for Secrets:

Warning  FailedMount  kubelet  MountVolume.SetUp failed for volume "secret-vol":
  secret "db-credentials" not found

This happens when a pod references a ConfigMap, Secret, or Volume that doesn't exist in the namespace. Kubernetes can't start the container because it can't mount the required data.

Identify the problem:

# Check which volumes the pod expects
kubectl get pod <pod-name> -o jsonpath='{.spec.volumes}' | jq

# Check if the ConfigMap exists
kubectl get configmap -n <namespace>

# Check if the Secret exists
kubectl get secret -n <namespace>

Fix option 1 — Create the missing resource:

# Create a ConfigMap from a file
kubectl create configmap app-config --from-file=config.yaml -n <namespace>

# Create a Secret
kubectl create secret generic db-credentials \
  --from-literal=username=admin \
  --from-literal=password=secret -n <namespace>

Fix option 2 — If the resource is optional, mark it as such in the pod spec. This tells Kubernetes to start the container even if the ConfigMap or Secret doesn't exist:

volumes:
- name: config-vol
  configMap:
    name: app-config
    optional: true  # Pod starts even if ConfigMap is missing
- name: secret-vol
  secret:
    secretName: db-credentials
    optional: true  # Pod starts even if Secret is missing

The same works for env and envFrom references:

env:
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: db-credentials
      key: password
      optional: true
envFrom:
- configMapRef:
    name: app-config
    optional: true

Common gotcha: ConfigMaps and Secrets are namespace-scoped. If your pod is in namespace production but the Secret was created in default, the pod won't find it. Always double-check the namespace.
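
A quick way to check whether the resource exists somewhere else (db-credentials is the example name used above):

# Search every namespace for the Secret
kubectl get secrets -A | grep db-credentials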


Debugging Decision Tree

Here's a quick mental model for diagnosing Pending pods:

Pod is Pending
│
├─ kubectl describe pod → Check Events
│
├─ "Insufficient cpu/memory"
│  → kubectl top nodes → reduce requests or add capacity
│
├─ "didn't match node selector/affinity"
│  → kubectl get nodes --show-labels → fix labels or selector
│
├─ "had taint ... didn't tolerate"
│  → kubectl describe nodes | grep Taint → add toleration or remove taint
│
├─ "persistentvolumeclaim not found/unbound"
│  → kubectl get pvc → fix PVC/PV/StorageClass
│
├─ "configmap/secret not found"
│  → kubectl get configmap,secret -n <ns> → create missing resource or mark optional
│
└─ "didn't match pod affinity rules"
   → Review affinity config → switch to preferred or add nodes

Practice These Scenarios

Theory is useful, but nothing beats hands-on troubleshooting. These Kubeasy challenges drop you into a broken cluster and ask you to fix it:

Start the Stuck Pending Challenge →

In this challenge, a pod refuses to schedule despite available nodes. You'll need to investigate the scheduler's decisions and fix the pod configuration to get it running. (~15 min, medium difficulty)

Start the Tainted Out Challenge →

This one focuses on taints and tolerations. A critical application pod stays Pending because of node taints. You'll learn to read taint-related errors and apply the right toleration. (~20 min, medium difficulty)


Prevention Tips

  • Always check kubectl describe pod first — The scheduler events tell you exactly what's wrong. Get in the habit of reading them
  • Set reasonable resource requests — Don't request 4 CPU cores if your app uses 200m. Over-requesting is the number one cause of scheduling failures
  • Use preferredDuringScheduling over required when possible — Hard constraints cause Pending pods. Soft constraints give the scheduler flexibility
  • Monitor cluster capacity — Set up alerts when node allocatable resources drop below 20%. Tools like Prometheus with kube_node_status_allocatable metrics work well
  • Label nodes consistently — Establish a labeling convention and enforce it. Inconsistent labels lead to mysterious scheduling failures
  • Test taints and tolerations in staging — Taints are easy to misconfigure. Verify your tolerations match before applying to production

Written by

Paul Brissaud

Paul Brissaud is a DevOps / Platform Engineer and the creator of Kubeasy. He believes Kubernetes education is often too theoretical and that real understanding comes from hands-on, failure-driven learning.
