[Disclaimer]
I struggled a lot with writing this article, as K8S itself is way too big to grasp quickly, and I had to make a lot of compromises on structure and detail for it all to make some sense. The article may feel drawn out in places, jumping from topic to topic, yet I had to finish it, so sorry not sorry.
GenAI has been used to generate the diagrams from the Kubernetes, containerd, and runc codebases.
Specific commit hashes used for analysis:
- Kubernetes: 19aa01e61cf
- containerd: cd0f2cc23
- runc: b04031d7
Introduction
So, we’ve been through quite a journey. We’ve been able to track down how signals help us clean up resources before imminent death. We’ve also peeked through the keyhole at how layered today’s containerization software is, and traced how signal passing works across those layers!
Now let’s approach the arch-nemesis, Kubernetes, with the same question: how does graceful shutdown work within it?
Kubernetes itself is quite a difficult beast to get an overview of, with its abundance of abstractions, interfaces, and configurations.
At its core, Kubernetes provides a way to:
- Deploy containerized applications across a cluster of machines (nodes)
- Scale applications up and down horizontally and vertically
- Ensure high availability through automatic failover
- Handle networking between applications
- Manage storage and configuration
- Provide an abstraction layer on top of cloud providers or raw hardware infrastructure
When you deploy an application to Kubernetes, you’re not directly interacting with containers or individual machines. Instead, you declare the desired state of your application (number of replicas, resource requirements, etc.) and Kubernetes handles the nitty-gritty details of scheduling and maintaining that state.
Let’s dive into the key abstractions of Kubernetes responsible for the graceful shutdown.
Understanding Pods: What We’re Actually Shutting Down
Before diving into the shutdown mechanics, let’s clarify what exactly we’re terminating. A pod in Kubernetes isn’t just a single container - it’s actually a collection of containers working together as a single unit.
Pod Components:
- Pause container: The infrastructure container that creates shared namespaces
- Application containers: Your actual workload (web server, API, etc.)
- Sidecar containers: Supporting services (logging, monitoring, service mesh)
Shared Resources: All containers in a pod share the same network IP, storage volumes, and inter-process communication channels. This means when we “shut down a pod,” we’re coordinating the termination of multiple processes that are dependent on one another.
Kubernetes Architecture for Graceful Shutdown
Kubernetes has lots of components, but let’s focus on the ones that matter for pod termination:
Control Plane Components:
- API Server: The central communication hub that coordinates all shutdown activities
- Controller Manager: Manages higher-level constructs like Deployments and Services
- Endpoint Controller: Updates network routing when pods terminate
Worker Node Components:
- kubelet: The node agent that manages pod lifecycles and executes termination
- kube-proxy: Updates network rules to redirect traffic away from terminating pods
- Container Runtime: The actual executor of container start/stop operations
The graceful shutdown process is orchestrated across all these components to ensure both proper process termination and correct network traffic management.
[Diagram: cluster architecture for graceful shutdown. Control Plane: API Server (central management hub), Scheduler (pod placement decisions), Controller Manager (maintains desired state), etcd (cluster data store). Worker Node: kubelet (node agent), kube-proxy (network proxy), Container Runtime (containerd/runc). Each pod has a pause container providing the shared network namespace, IPC namespace, and storage volumes that the application and sidecar containers join; the container runtime manages all of these containers. The API Server sends pod lifecycle commands to kubelet, kubelet reports node and pod status back, and kube-proxy handles service discovery and load balancing toward the application containers.]
Network Traffic Management During Shutdown
I work mostly on backend systems, so web applications are of the utmost interest to me. For them, graceful shutdown involves two parallel concerns: actually terminating the application process cleanly AND making sure network traffic is properly managed.
When a pod shuts down, we need to:
- Stop new traffic from reaching the shutting down pod
- Allow existing connections to complete gracefully (at least as much as we can)
- Coordinate timing between network updates and process termination
How Kubernetes Solves This:
EndpointSlices and Traffic Redirection: Kubernetes uses EndpointSlice objects to track which pods can receive traffic. When a pod begins terminating, the Endpoint Controller immediately updates the EndpointSlice with:
endpoints:
- addresses: ["10.244.1.5"]
  conditions:
    ready: false        # Stop new traffic
    serving: true       # Allow existing connections
    terminating: true   # Pod is shutting down
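If you want to observe these conditions yourself, here is a minimal client-go sketch. It assumes in-cluster credentials and a Service named web-app in the default namespace (both are assumptions for illustration); it lists the Service’s EndpointSlices and prints each endpoint’s ready/serving/terminating flags.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// boolOf unwraps the *bool condition fields used by EndpointSlice.
func boolOf(b *bool) bool { return b != nil && *b }

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	// EndpointSlices carry a well-known label pointing at the owning Service.
	slices, err := clientset.DiscoveryV1().EndpointSlices("default").List(context.TODO(),
		metav1.ListOptions{LabelSelector: "kubernetes.io/service-name=web-app"})
	if err != nil {
		panic(err)
	}

	for _, slice := range slices.Items {
		for _, ep := range slice.Endpoints {
			fmt.Printf("%v ready=%t serving=%t terminating=%t\n",
				ep.Addresses,
				boolOf(ep.Conditions.Ready),
				boolOf(ep.Conditions.Serving),
				boolOf(ep.Conditions.Terminating))
		}
	}
}
```

Running this during a rolling update shows terminating endpoints flip to ready=false while serving stays true until the process actually exits.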
Load Balancer Coordination: Different components handle the transition differently:
- kube-proxy: Updates iptables rules to redirect new connections
- Ingress Controllers: May implement custom connection draining logic
- External Load Balancers: Need time to detect changes (hence PreStop hooks)
- Service Mesh: Advanced traffic management during termination
This networking orchestration happens in parallel with process termination, ensuring zero-downtime deployments when configured properly.
Now that we understand the components and networking coordination involved, let’s examine how to configure graceful shutdown properly.
Configuring Graceful Shutdown in Kubernetes
The magic happens through specific configuration settings in your Kubernetes manifests. Here’s how to configure the key graceful shutdown aspects:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0  # CRITICAL: Prevents traffic disruption during deployments
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      # CORE GRACEFUL SHUTDOWN SETTING: Time allowed for complete shutdown
      terminationGracePeriodSeconds: 60
      containers:
        - name: app
          image: my-web-app:latest
          ports:
            - containerPort: 8080
          # PRESTOP HOOK: Coordinates with load balancer updates
          lifecycle:
            preStop:
              exec:
                # We can run some custom logic before the SIGTERM comes
                command: ["/bin/sh", "-c", "sleep 15"]
          # READINESS PROBE: Controls when pod receives traffic
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            periodSeconds: 5
            failureThreshold: 1  # Fast removal from endpoints when unhealthy
Graceful Shutdown Configuration Breakdown:
- terminationGracePeriodSeconds: 60: Total time Kubernetes waits before sending SIGKILL. This is your shutdown budget - choose it based on your longest-running requests plus buffer time.
- preStop hook with 15s delay: Critical for web apps. This delay ensures load balancers detect that the pod is terminating and stop sending new traffic BEFORE your application receives SIGTERM. It is also the place to put custom shutdown logic when the application has no SIGTERM handling of its own: wait for in-flight work to finish, signal the cluster about the upcoming rebalance, persist some in-memory state, etc. Important: if the preStop hook runs longer than the grace period, kubelet grants a small 2-second extension, and if the hook still doesn’t complete, emergency termination occurs (see the sketch after this list).
- maxUnavailable: 0: Forces rolling updates to start new pods before terminating old ones. Combined with readiness probes, this ensures zero-downtime deployments.
- readinessProbe with failureThreshold: 1: Fast endpoint removal when the app becomes unhealthy. During shutdown, your app should fail readiness checks immediately after receiving SIGTERM so that new traffic stops.
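The interplay between the preStop hook and the grace period is easiest to see as arithmetic. Below is a minimal sketch of that budgeting; it is illustrative only, not kubelet’s actual code, and the 2-second constant is the extension mentioned above.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed constant for the sketch: the small extension kubelet grants when the
// preStop hook consumes the entire grace period.
const minimumBudgetAfterHook = 2 * time.Second

// signalBudget returns how much time is left between SIGTERM and SIGKILL once
// the preStop hook has eaten into terminationGracePeriodSeconds.
func signalBudget(gracePeriod, preStopDuration time.Duration) time.Duration {
	remaining := gracePeriod - preStopDuration
	if remaining <= 0 {
		return minimumBudgetAfterHook
	}
	return remaining
}

func main() {
	// With the manifest above: 60s grace period, 15s preStop sleep.
	fmt.Println(signalBudget(60*time.Second, 15*time.Second)) // 45s for the app to drain
	fmt.Println(signalBudget(30*time.Second, 35*time.Second)) // hook overran: only ~2s left
}
```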
Timing Relationship:
t=0s: preStop hook starts (sleep 15)
t=15s: SIGTERM sent to application
t=60s: SIGKILL sent if still running
Your app has 45 seconds (t=15s to t=60s) to drain connections and shut down gracefully.
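Here is a minimal sketch of an application that spends that budget well, assuming a Go HTTP service serving the /health/ready probe from the manifest above; the handlers and timings are illustrative, and your framework will have its own equivalents.

```go
package main

import (
	"context"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

func main() {
	var shuttingDown atomic.Bool

	mux := http.NewServeMux()
	// Readiness probe: starts failing as soon as SIGTERM arrives, so the
	// endpoint is removed quickly and new traffic stops.
	mux.HandleFunc("/health/ready", func(w http.ResponseWriter, r *http.Request) {
		if shuttingDown.Load() {
			http.Error(w, "shutting down", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	})

	srv := &http.Server{Addr: ":8080", Handler: mux}
	go srv.ListenAndServe() // error handling omitted in this sketch

	// Wait for the SIGTERM that kubelet delivers after the preStop hook.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
	<-stop

	shuttingDown.Store(true) // readiness now fails

	// Drain within the ~45s left before SIGKILL, leaving a little slack.
	ctx, cancel := context.WithTimeout(context.Background(), 40*time.Second)
	defer cancel()
	_ = srv.Shutdown(ctx) // stops accepting new connections, waits for in-flight requests
}
```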
With the configuration basics covered, let’s dive into the technical implementation details. We’ll trace the complete journey from command execution to signal delivery.
The Graceful Shutdown Flow: From kubectl to kill()
The following breakdown shows the complete pod termination flow from kubectl delete to process termination:
Phase 1 - Initial Request: User executes kubectl delete pod my-app. API Server marks the pod for deletion in etcd and triggers parallel workflows: Controller Manager removes the pod from its ReplicaSet while Endpoint Controller updates networking.
Phase 2 - Networking Updates: Endpoint Controller marks the pod as ready: false, serving: true, terminating: true. Load balancers immediately stop routing new traffic while existing connections continue.
Phase 3 - kubelet Coordination: kubelet receives deletion event, calculates grace period (30s default), and executes PreStop hooks if defined. This provides critical time for load balancer updates.
Phase 4 - Signal Delivery: kubelet initiates the CRI chain: kubelet → containerd → containerd-shim → runc → SIGTERM to process 1 inside each container. For pods with sidecar containers, main containers receive SIGTERM first, then sidecars are terminated in reverse order.
Phase 5 - Connection Draining: Application stops accepting new requests but completes existing ones. If graceful exit succeeds, cleanup begins. If grace period expires, SIGKILL forces termination.
Phase 6 - Cleanup: Process termination status propagates back to kubelet and API Server. kubelet transitions the pod to a terminal phase (Failed or Succeeded), then forcibly removes the pod object from the API server by setting grace period to 0. Endpoint is removed from EndpointSlice and load balancers complete updates.
This orchestrated shutdown maintains web application availability during deployments and scaling operations.
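To make Phase 4’s first hop more concrete, here is a hedged sketch of a CRI client asking the runtime to stop a container, roughly what kubelet’s StopContainer call amounts to before containerd relays the signal through the shim and runc. The containerd socket path and the container ID are assumptions for illustration only.

```go
package main

import (
	"context"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	// Assumed containerd CRI socket; other runtimes expose a different path.
	conn, err := grpc.Dial("unix:///run/containerd/containerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	rt := runtimeapi.NewRuntimeServiceClient(conn)

	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()

	// Timeout is the grace period in seconds: the runtime sends the stop signal
	// (SIGTERM by default) and falls back to SIGKILL once it expires.
	_, err = rt.StopContainer(ctx, &runtimeapi.StopContainerRequest{
		ContainerId: "abc123", // hypothetical container ID
		Timeout:     30,
	})
	if err != nil {
		panic(err)
	}
}
```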
Pod Removal Timeline: Events and Coordination
The following timeline shows the chronological flow of events during pod termination, organized by system component (the time slices are illustrative; they convey the order of magnitude rather than exact numbers):
Pod Removal Sequence: The Complete Flow
The following sequence diagram shows how all the components interact during pod removal, from the initial kubectl delete command to the final signal delivery:
[Sequence diagram: kubectl delete reaches the API Server, which marks the pod for deletion; the Endpoint Controller flips the endpoint to ready: false, serving: true, terminating: true and load balancers stop routing new connections while existing ones continue. kubelet receives the deletion event via its watch, calculates the grace period (30s default), runs the PreStop hook if defined, subtracts the hook time from the grace period, and determines the sidecar termination order (main containers first). It then calls StopContainer(containerID, gracePeriod) on the CRI, which relays SIGTERM through the shim and runc to PID 1 inside the container. The application stops accepting new requests and completes existing ones; if it exits gracefully, the status propagates back up (runc → shim → CRI → kubelet → API Server) and the endpoint is removed. If the grace period expires, SIGKILL is sent down the same chain, connections are forcibly closed, and the forced termination is reported the same way. Finally kubectl prints pod "my-app" deleted.]
Key Coordination Points:
- Parallel Updates: When the API server marks a pod for deletion, both the Controller Manager and Endpoint Controller are notified simultaneously
- Endpoint State Transition: The endpoint is immediately marked as terminating: true, ready: false, serving: true
- Load Balancer Coordination: New traffic is redirected while existing connections continue
- Connection Draining: Active clients can complete their requests during the grace period
The sequence diagram above illustrates how networking events happen in parallel with process termination to achieve zero-downtime deployments.