
How to Fix Kubernetes OOMKilled Pods

Complete troubleshooting guide for OOMKilled errors. Covers diagnosis with kubectl describe, understanding memory limits vs requests, and step-by-step fixes.

Paul Brissaud
2 min read
#troubleshooting #pods #scaling #beginner

Your Kubernetes pod just got killed with an OOMKilled status. The container keeps restarting, entering a CrashLoopBackOff state. Sound familiar? This is one of the most common issues in Kubernetes, and fortunately, it's straightforward to diagnose and fix.

Quick Answer

Your pod is using more memory than its configured limit. To fix it:

kubectl describe pod <pod-name> | grep -A 5 "Last State"

If you see Reason: OOMKilled, increase the memory limit in your deployment:

resources:
  limits:
    memory: "512Mi"  # Increase this value
  requests:
    memory: "256Mi"

What Causes OOMKilled?

Kubernetes kills pods with OOMKilled when a container exceeds its memory limit. This is the kernel's Out-Of-Memory (OOM) killer in action, protecting the node from running out of memory.
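
The limit from your pod spec is enforced by the kernel as a cgroup limit, so you can verify what the container actually sees. A quick check (assuming the image has a shell and cat; the path depends on whether the node runs cgroup v2 or v1):

kubectl exec <pod-name> -- cat /sys/fs/cgroup/memory.max                     # cgroup v2
kubectl exec <pod-name> -- cat /sys/fs/cgroup/memory/memory.limit_in_bytes   # cgroup v1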

Common Causes

  • Memory limit set lower than the application's real working set
  • A memory leak that grows usage until the limit is hit
  • Traffic spikes or batch jobs that temporarily need more memory than usual
  • A runtime heap (for example, the JVM) configured larger than the container limit
  • No limits at all, so the container competes for node memory and gets killed when the node runs low

Step-by-Step Troubleshooting

Step 1: Confirm the OOMKilled Status

First, check if your pod was actually OOMKilled:

kubectl describe pod <pod-name>

Look for this in the output:

Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137

Exit code 137 = 128 + 9 (SIGKILL). This confirms the kernel killed your container.
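
If you prefer a one-liner, the same information is available from the pod's status (this assumes the crashed container is the first one listed):

kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'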

Step 2: Check Current Resource Configuration

See what limits are currently set:

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources}' | jq

Or for a deployment:

kubectl get deployment <deployment-name> -o jsonpath='{.spec.template.spec.containers[0].resources}' | jq

Step 3: Analyze Actual Memory Usage

If you have metrics-server installed, check real usage:

kubectl top pod <pod-name>

Compare the MEMORY column to your configured limit. If it's close to or at the limit, you've found the problem.
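
For multi-container pods, break the usage down per container so you know which one is actually approaching its limit:

kubectl top pod <pod-name> --containers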

Step 4: Check Events for Patterns

Look at cluster events to see if this is a recurring issue:

kubectl get events --field-selector reason=OOMKilling --sort-by='.lastTimestamp'
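
The node also records OOM activity. As a rough check, see whether the node itself has been reporting out-of-memory kills in its recent events:

kubectl describe node <node-name> | grep -i oom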

Common Solutions

Solution A: Increase Memory Limit

The most common fix is simply giving your container more memory:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
      - name: my-app
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"  # Increased from 256Mi
            cpu: "500m"

Apply the change:

kubectl apply -f deployment.yaml

Pro tip: Start with 2x your average memory usage as the limit, then adjust based on monitoring.
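
If you'd rather patch the running Deployment than edit the manifest (keep in mind this drifts from whatever is in version control), something like this works:

kubectl set resources deployment <deployment-name> -c <container-name> --requests=memory=256Mi --limits=memory=512Mi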

Solution B: Fix Memory Leaks

If memory usage grows continuously until OOMKilled, you likely have a memory leak. Steps to debug:

  • Profile the application locally with tools like pprof (Go), VisualVM (Java), or memory_profiler (Python)
  • Check for common patterns:
    • Unbounded caches
    • Event listeners not being removed
    • Large objects held in memory
    • Circular references preventing garbage collection
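
Before reaching for a profiler, you can confirm the leak pattern from outside the container with a rough sampling loop (a sketch, assuming metrics-server is installed; memory-usage.log is just an arbitrary local file):

# Record memory usage once a minute; a steadily climbing value suggests a leak
while true; do
  kubectl top pod <pod-name> --no-headers | tee -a memory-usage.log
  sleep 60
done
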
Solution C: Configure JVM Heap (Java Apps)

For Java applications, ensure the JVM heap fits within the container limit:

env:
- name: JAVA_OPTS
  value: "-Xmx384m -Xms256m"
resources:
  limits:
    memory: "512Mi"  # Must be > Xmx + ~100Mi for non-heap

Rule of thumb: Container limit = Xmx + 128Mi (for metaspace, threads, native memory)
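
On Java 10+ (and 8u191+), the JVM is container-aware, so instead of a fixed -Xmx you can size the heap as a fraction of the container limit. A sketch, using JAVA_TOOL_OPTIONS because the JVM picks it up automatically (whereas JAVA_OPTS depends on your image's entrypoint):

env:
- name: JAVA_TOOL_OPTIONS
  value: "-XX:MaxRAMPercentage=75.0"  # Heap capped at ~75% of the container limit
resources:
  limits:
    memory: "512Mi"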

Solution D: Add Resource Requests

If you only have limits, add requests too:

resources:
  requests:
    memory: "256Mi"  # Guaranteed minimum
  limits:
    memory: "512Mi"  # Maximum allowed

Why this matters:

  • Requests = guaranteed resources for scheduling
  • Limits = maximum the container can use before being killed

Understanding Requests vs Limits

QoS Classes

How you set requests and limits determines your pod's Quality of Service class:

  • Guaranteed - every container has requests equal to limits for both CPU and memory
  • Burstable - at least one container has a request or limit set, but they don't all match
  • BestEffort - no requests or limits set; these pods are the first to be killed under node memory pressure

For production workloads, aim for Guaranteed or Burstable with reasonable limits.
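
For example, a single-container pod lands in the Guaranteed class only when its requests and limits match exactly:

resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"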


Practice This Scenario

Start the Pod Evicted Challenge →

In this hands-on challenge, you'll:

  • Investigate why a pod keeps crashing
  • Understand the difference between requests and limits
  • Fix the configuration to achieve stable operation

Prevention Tips

  • Always set resource limits - Never run production workloads without limits
  • Monitor memory trends - Use Prometheus + Grafana to catch issues before OOM
  • Set up alerts - Alert on containers approaching 80% of their memory limit (see the example rule below)
  • Load test - Profile memory usage under realistic load before deploying
  • Use Vertical Pod Autoscaler - Let VPA recommend appropriate resource values
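
As a sketch of that 80% alert, assuming the Prometheus Operator is installed (PrometheusRule CRD) and the standard cAdvisor / kube-state-metrics metrics are being scraped; the rule name and threshold here are illustrative:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: memory-limit-alerts
spec:
  groups:
  - name: memory
    rules:
    - alert: ContainerNearMemoryLimit
      expr: |
        max by (namespace, pod, container) (container_memory_working_set_bytes{container!=""})
          /
        max by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"})
          > 0.8
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Container is using more than 80% of its memory limit"
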
Example VPA Configuration

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
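
Once the VPA has had time to observe the workload, you can read its recommendations (assuming the VPA CRDs are installed and register the usual vpa short name):

kubectl describe vpa my-app-vpa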

Related Articles

  • Kubernetes Resource Limits and Requests Explained
  • Debugging CrashLoopBackOff in Kubernetes
  • Understanding Kubernetes Probes

Summary

OOMKilled errors happen when your container exceeds its memory limit. The fix is usually straightforward:

  • Diagnose with kubectl describe pod
  • Measure actual usage with kubectl top pod
  • Adjust the memory limit to match real needs
  • Monitor to prevent future occurrences

Remember: it's better to set slightly higher limits than to have your application constantly restarting. But don't go overboard: wasted resources mean higher costs and less efficient cluster utilization.

Written by

Paul Brissaud

Paul Brissaud is a DevOps / Platform Engineer and the creator of Kubeasy. He believes Kubernetes education is often too theoretical and that real understanding comes from hands-on, failure-driven learning.
