Part 2: Docker Containers and graceful shutdown


The previous chapter was difficult, and I assumed things would get easier along the way. Little did I know. Instead of discussing graceful shutdown itself, this post focuses more on an overview of the modern state of containerized applications.

If you’ve been developing and deploying web applications recently, you’ve likely stumbled upon containerization. It’s a very convenient way to package and ship your applications in a reproducible, compact, and fairly simple fashion. You write a Dockerfile defining the steps to install the required dependencies, your CI builds the image, and the image gets shipped to a staging environment and finally reaches your holy grail: the production cluster (running Kubernetes, Rancher, Nomad, or another orchestration platform of choice).

If you have Docker installed, you can play the orchestrator yourself for a moment:

docker container run --detach --publish 18999:80 --name my-nginx nginx
docker container stop my-nginx
docker container remove my-nginx

You’ve just created a running Nginx container, gracefully shut it down, and removed it. That is basically what your Kubernetes cluster does all day (though not through the Docker CLI, of course).

Docker (and) containers

As you may remember, we’re exploring graceful shutdown, and what interests us here is what happens inside containers. That’s quite a challenging topic to explore, as it is constantly evolving: Docker has been around for more than 10 years and has undergone many changes, both in its internal architecture and in how other platforms use it.

As of now, it is not technically correct to equate Docker with container technology as a whole, however natural that may feel to someone who has been using Docker for all these years.

To understand what I mean, let’s take a look at the architecture of the Docker software that may run on your client machine (I took this diagram from the website, as it is the most extensive one I could find):

[Figure: Containers architecture]

From this, you can see that containerization and orchestration are heavily layered, with many moving parts:

Kernel sandboxing capabilities

Containers on Linux rely on the kernel’s cgroups and namespaces, which provide the ability to isolate processes and impose limits on them. These are system-level APIs that let you create independent process hierarchies (so that each container can have its own PID 1), segregate users and groups (each container may have its own users that do not conflict with each other), set memory and CPU allocation policies (to control how many resources a specific container may consume), and more.
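You can observe namespaces from user space: every process has symlinks under /proc/<pid>/ns, and two processes share a namespace exactly when the corresponding links point to the same inode. The Python sketch below collects those inode numbers so you can compare, say, your shell with a containerized process. The helper names parse_ns_link, namespaces_of and shared_namespaces are made up for illustration, and inspecting a process owned by another user (for example, a container’s PID 1 owned by root) generally requires root.

```python
from __future__ import annotations

import os


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a link target such as 'pid:[4026531836]' into ('pid', 4026531836)."""
    kind, _, inode = target.partition(":")
    return kind, int(inode.strip("[]"))


def namespaces_of(pid: int | str = "self") -> dict[str, int]:
    """Map each entry of /proc/<pid>/ns (pid, net, mnt, ...) to its inode number."""
    ns_dir = f"/proc/{pid}/ns"
    result: dict[str, int] = {}
    for name in sorted(os.listdir(ns_dir)):  # may raise PermissionError without root
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (PermissionError, FileNotFoundError):
            continue  # entry unreadable, or the process exited in the meantime
        result[name] = inode
    return result


def shared_namespaces(a: dict[str, int], b: dict[str, int]) -> list[str]:
    """Names of the namespaces whose inode numbers match in both mappings."""
    return sorted(name for name in a.keys() & b.keys() if a[name] == b[name])
```

Run as root, comparing namespaces_of("self") with namespaces_of(<container PID>) for a default Docker container (no user namespace remapping) would typically show the pid, net, mnt, ipc and uts entries differing, while user stays shared.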

OCI-specification compliant low-level runtime

The low-level details of how to start and stop containers are delegated to a separate layer. Without going into too much detail (the relationships between the organizations involved may remind you of OpenAI’s legal entities), there is an open governance structure called the Open Container Initiative (OCI) that develops standards defining what a container is and how it should be run, packaged, and distributed.

This is the part we’re most interested in. Its de facto standard implementation is called runc; it is written in Go and contains an OCI spec parser/validator along with the actual wrappers around the aforementioned Linux cgroups and namespaces.
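To make the spec a bit more tangible: an OCI runtime bundle is a root filesystem plus a config.json, and that file is where the namespaces and cgroup limits from the previous layer get declared. The following Python sketch builds a heavily trimmed config. The field names (process.args, root.path, linux.namespaces, linux.resources.memory.limit, linux.resources.cpu.shares) follow the OCI runtime spec as far as I know it, whereas the helper minimal_oci_config is hypothetical, and a real config (for instance, one generated by runc spec) contains many more fields.

```python
from __future__ import annotations

import json

# A subset of the Linux namespace types defined by the OCI runtime spec.
DEFAULT_NAMESPACES = ["pid", "network", "ipc", "uts", "mount"]


def minimal_oci_config(args: list[str], memory_limit_bytes: int, cpu_shares: int) -> dict:
    """Hypothetical helper: a trimmed-down config.json for an OCI bundle."""
    return {
        "ociVersion": "1.0.2",
        "process": {
            "terminal": False,
            "user": {"uid": 0, "gid": 0},
            "args": args,  # this program becomes PID 1 in the new PID namespace
            "cwd": "/",
        },
        "root": {"path": "rootfs", "readonly": True},
        "linux": {
            # Each entry asks the runtime to create a fresh namespace of that type.
            "namespaces": [{"type": ns} for ns in DEFAULT_NAMESPACES],
            # cgroup limits applied to the container's processes.
            "resources": {
                "memory": {"limit": memory_limit_bytes},
                "cpu": {"shares": cpu_shares},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(minimal_oci_config(["redis-server"], 256 * 1024 * 1024, 512), indent=2))
```

In a complete bundle, these sections would ask the runtime to start redis-server as the container’s PID 1 inside fresh PID, network, IPC, UTS and mount namespaces, with its memory capped via cgroups.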

There are some other implementations (of course there are), most notably:

In general, this layer’s task is to run the isolated process (the container) and to provide a way to stop it.
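The OCI runtime spec frames this "run it and stop it" job as a small lifecycle: a container moves through the states creating, created, running and stopped, driven by the create, start, kill and delete operations. The Python sketch below models that state machine to show where stopping fits in; the class and method names are hypothetical, and it only tracks state rather than touching any real process.

```python
from __future__ import annotations

import signal
from enum import Enum


class State(Enum):
    CREATING = "creating"
    CREATED = "created"
    RUNNING = "running"
    STOPPED = "stopped"


class LifecycleError(Exception):
    """Raised when an operation is not allowed in the current state."""


class OciContainer:
    """Hypothetical model of the OCI lifecycle: tracks state only, runs nothing."""

    def __init__(self, container_id: str) -> None:
        self.id = container_id
        self.state = State.CREATING
        self.signals: list[int] = []
        self.deleted = False

    @classmethod
    def create(cls, container_id: str) -> OciContainer:
        container = cls(container_id)
        # A real runtime would set up namespaces and cgroups here,
        # without starting the user-specified program yet.
        container.state = State.CREATED
        return container

    def start(self) -> None:
        self._require(State.CREATED)
        self.state = State.RUNNING

    def kill(self, sig: int = signal.SIGTERM) -> None:
        # kill only delivers a signal; it does not change the state by itself.
        self._require(State.CREATED, State.RUNNING)
        self.signals.append(sig)

    def process_exited(self) -> None:
        # The runtime observes that the container process has exited.
        self.state = State.STOPPED

    def delete(self) -> None:
        self._require(State.STOPPED)
        self.deleted = True

    def _require(self, *allowed: State) -> None:
        if self.state not in allowed:
            raise LifecycleError(f"operation not allowed in state {self.state.value!r}")
```

Note that kill does not move the container to stopped on its own: the state only changes once the process actually exits, which is why a process that ignores its stop signal keeps the container running.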

Shims on top of the container runners

runc spins up the processes, but something also needs to communicate with them (e.g., track their state, attach to a PTY, read logs); that is what containerd-shim exists for. Each Docker container has an associated shim process running.

For example, here’s how a running Docker container (Redis, in this case) looks in the output of ps axfo pid,ppid,command on my machine:

1194531       1 /usr/bin/containerd-shim-runc-v2 -namespace moby -id bfb8dfa3b220b11d4849af8f28e7fdd725154d4c5445f9340e42159b092f5520 -address /run/containerd/containerd.sock
1194578 1194531  \_ redis-server *:6379
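If you want to extract the same relationship programmatically, the parent/child link is all you need: the container’s main process is a child of its shim, and the shim’s -id argument carries the container ID. Here is a small Python sketch that parses output in the ps axfo pid,ppid,command format shown above. The function name shim_children is hypothetical, and it assumes the shim binary is named containerd-shim-runc-v2, as in the listing.

```python
from __future__ import annotations

SHIM_BINARY = "containerd-shim-runc-v2"  # assumption: the runc v2 shim seen in the listing


def shim_children(ps_output: str) -> dict[str, list[tuple[int, str]]]:
    """Map container IDs to (pid, command) of the processes whose parent is a shim."""
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue  # skip the header and malformed lines
        pid, ppid, command = int(parts[0]), int(parts[1]), parts[2]
        # The f (forest) option prefixes children with "\_ "; strip that tree art.
        rows.append((pid, ppid, command.lstrip("\\_ ")))

    shims: dict[int, str] = {}
    for pid, _, command in rows:
        args = command.split()
        if args and args[0].endswith(SHIM_BINARY) and "-id" in args:
            i = args.index("-id")
            if i + 1 < len(args):
                shims[pid] = args[i + 1]  # the container ID

    result: dict[str, list[tuple[int, str]]] = {}
    for pid, ppid, command in rows:
        if ppid in shims:
            result.setdefault(shims[ppid], []).append((pid, command))
    return result
```

You can feed it the captured output of the same ps invocation, e.g. subprocess.run(["ps", "axfo", "pid,ppid,command"], capture_output=True, text=True).stdout.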

Higher-level OCI-compatible container runtimes

A runtime built on top of a lower-level one (e.g., runc) that adds extra features, such as storage and network management.

Examples include:

Actual engines

Something has to wrap the whole stack together: that could be Docker Engine, Rancher, Mesos, or K8S.

Following the white rabbit

Given the complexity of container internals, let’s focus on something simple: how the docker stop command works and what is required to stop a running container gracefully.

$ docker version
Client: Docker Engine - Community
 Version:           25.0.4
 API version:       1.44
 OS/Arch:           linux/arm64

Server: Docker Engine - Community
 Engine:
  Version:          25.0.4
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.8
 containerd:
  Version:          1.6.28
 runc:
  Version:          1.1.12
 docker-init:
  Version:          0.19.0

$ uname -a
Linux home 5.10.110-rockchip-rk3588 #23.02.2 SMP Fri Feb 17 23:59:20 UTC 2023 aarch64 GNU/Linux

Let’s follow the example from the beginning of the article:

docker container run --detach --publish 18999:80 --name my-nginx nginx
docker container stop my-nginx

This requires a lot of patience: following the codebase is not easy because of the many layers of abstraction, the gRPC calls between components, and the many Go interfaces that make it hard to trace which concrete code actually runs.

Well, that’s it. It’s quite anti-climactic compared to our earlier journey through the Linux kernel. We’ve gone through a bunch of codebases, and it all simply results in a Unix signal being sent to the underlying container process!

The way the signal is delivered may differ: for example, it is possible to use systemd as the cgroup manager for runc, which means systemd becomes responsible for running the containerized process. In this case, containerd/runc uses D-Bus instead of direct process manipulation to stop the running unit. As far as I understand, this setting is going to become the default for K8S, but I will talk about K8S in the next post.


Special mention

I can’t help but recommend an amazing blog by Ivan Belichko that goes into great detail about container internals. When I saw his articles, I considered abandoning this whole post, as there is hardly any detail I could add to his content: