@raunakkathuria
Last active July 18, 2020 03:50
Docker - Underlying technologies

Docker internals

Underlying technologies

To understand Docker properly, you first need to understand the underlying technologies that make it possible. This post mainly covers:

  • Namespace
  • cgroups
  • Union File System
  • libcontainer

Namespace

Definition from Wikipedia

Namespaces are a feature of the Linux kernel that partitions kernel resources such that one set of processes sees one set of resources while another set of processes sees a different set of resources. The feature works by having the same namespace for a set of resources and processes, but those namespaces refer to distinct resources. Resources may exist in multiple spaces. Examples of such resources are process IDs, hostnames, user IDs, file names, and some names associated with network access, and interprocess communication.

Namespaces are a fundamental aspect of containers on Linux. They isolate global resources between processes, so they are essentially a way to limit what a process can see. Containers depend on this isolation to work.

Example

Let's see how it works with an example.

unshare - Run a program with some namespaces unshared from the parent.

$ unshare -h
Usage:
 unshare [options] [<program> [<argument>...]]

Run a program with some namespaces unshared from the parent.

Options:
 -m, --mount[=<file>]      unshare mounts namespace
 -u, --uts[=<file>]        unshare UTS namespace (hostname etc)
 -i, --ipc[=<file>]        unshare System V IPC namespace
 -n, --net[=<file>]        unshare network namespace
 -p, --pid[=<file>]        unshare pid namespace
 ...

Create a new UTS (Unix Time Sharing) namespace shell

root@db9326789cbc:/$ unshare -u /bin/sh # -u unshares the UTS namespace (see unshare -h)
$ hostname child # set hostname on new UTS namespace
$ hostname
child
$ exit
$ hostname # it does not change anything on parent host
parent
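The same experiment can be run non-interactively. The sketch below is not part of the original walkthrough: uts_demo is a hypothetical helper, and it assumes a util-linux unshare that supports --user and --map-root-user, plus a kernel that allows unprivileged user namespaces. Where that is not the case (inside many containers, for example), it prints a fallback message instead of failing; running the plain unshare -u as root, as above, remains the simpler route.

```shell
#!/bin/sh
# Hypothetical helper: run the hostname experiment in one go and report
# what the parent and the new UTS namespace each see.
uts_demo() {
    before=$(uname -n)   # hostname seen by the parent before the experiment
    # --user --map-root-user: become "root" inside a fresh user namespace,
    # so changing the hostname does not need real root privileges
    # (assumption: unprivileged user namespaces are enabled).
    inside=$(unshare --user --map-root-user --uts \
                 sh -c 'hostname child && uname -n' 2>/dev/null) || {
        echo "UTS demo could not run here"
        return 0
    }
    after=$(uname -n)    # hostname seen by the parent after the child exited
    echo "before=$before inside=$inside after=$after"
}

uts_demo
```

When the demo is able to run, the before and after values are identical and only the inside value reads child, which is the isolation the UTS namespace provides.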

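Let's check the process tree. The shell running in the new namespace is an ordinary process with its own PID (1385), and its parent PID is that of the main shell (1 in this case). Only the UTS namespace was unshared, so the new shell still sees the parent's processes.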

root@db9326789cbc:/$ unshare -u /bin/sh # create a new shell in new UTS namespace
$ ps -ef --forest # inside new UTS
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 05:09 pts/0    00:00:00 /bin/bash
root      1385     1  0 06:14 pts/0    00:00:00 /bin/sh # n
root      1388  1385  0 06:14 pts/0    00:00:00  \_ ps -ef --forest
$ exit
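For contrast, here is a minimal sketch of what changes when the PID namespace is unshared as well (the -p option from the help text above). pid_demo is a hypothetical helper, and the same assumptions apply as for any unprivileged unshare: util-linux with --user/--map-root-user and a kernel that permits user namespaces. Otherwise the fallback message is printed.

```shell
#!/bin/sh
# Hypothetical helper: print the PID a shell sees for itself when it is
# started as the first process of a new PID namespace.
pid_demo() {
    # --fork makes unshare fork before running the command, so the child
    # (not unshare itself) becomes the first process of the new namespace.
    unshare --user --map-root-user --pid --fork sh -c 'echo $$' 2>/dev/null \
        || echo "PID demo could not run here"
}

pid_demo
```

With only -u, the new shell kept a PID from the parent's numbering (1385 in the listing above). In a new PID namespace the shell sees itself as PID 1, which is why the first process of a container usually appears as PID 1 from the inside.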

You can inspect the namespaces a process belongs to through the entries in /proc/[pid]/ns:

root@db9326789cbc:/home/tutorial$ ls -l /proc/self/ns/uts
lrwxrwxrwx 1 root root 0 Aug 20 06:37 /proc/self/ns/uts -> 'uts:[4026533163]' # parent UTS

root@db9326789cbc:/home/tutorial$ unshare -u /bin/sh
$ ls -l /proc/self/ns/uts
lrwxrwxrwx 1 root root 0 Aug 20 06:50 /proc/self/ns/uts -> 'uts:[4026533147]' # child UTS, separate from parent
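Two processes are in the same namespace exactly when these links resolve to the same identifier. A small sketch that automates the comparison follows; same_ns is a hypothetical helper name, and reading another process's ns entries may require sufficient permissions over that process.

```shell
#!/bin/sh
# Hypothetical helper: succeed if two processes share a namespace of the
# given type. Returns 2 if either link cannot be read.
# Usage: same_ns PID1 PID2 TYPE   (TYPE is e.g. uts, pid, net, mnt, ipc)
same_ns() {
    a=$(readlink "/proc/$1/ns/$3") || return 2
    b=$(readlink "/proc/$2/ns/$3") || return 2
    [ "$a" = "$b" ]
}

if same_ns "$$" "$PPID" uts; then
    echo "this shell and its parent share a UTS namespace"
else
    echo "different UTS namespaces, or the links could not be read"
fi
```

Run from inside the unshare -u shell shown above, the comparison with the parent would be expected to fail, since the two identifiers differ.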

cgroups

Definition from Wikipedia

cgroups (abbreviated from control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.

The primary design goal of cgroups is to provide a unified interface to many different use cases, from controlling single processes to full operating system-level virtualization (as provided by OpenVZ, Linux-VServer or LXC, for example). cgroups provide:

  • Resource limiting: groups can be set to not exceed a configured memory limit, which also includes the file system cache
  • Prioritization: some groups may get a larger share of CPU utilization or disk I/O throughput
  • Accounting: measures a group's resource usage, which may be used, for example, for billing purposes
  • Control: freezing groups of processes, their check-pointing and restarting

Example

Let's see how it works with an example.

Install the necessary packages

On Ubuntu or Debian, type:

apt-get install libcgroup1 cgroup-tools

Creating cgroups and moving processes

A cgroup filesystem initially contains a single root cgroup, '/', which all processes belong to. A new cgroup is created by creating a directory in the cgroup filesystem:

$ mkdir /sys/fs/cgroup/memory/mg1

Limit the memory for anything running under the cgroup mg1 to 20MB:

root@dd3d48548fdb:/home/tutorial$ echo 20000000 | tee /sys/fs/cgroup/memory/mg1/memory.limit_in_bytes
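The paths used here follow the cgroup v1 layout, with one directory tree per controller. On systems that mount the unified cgroup v2 hierarchy there is no /sys/fs/cgroup/memory directory, and the memory limit is written to a file named memory.max instead. The following sketch picks the right file; cgroup_version, memory_limit_file and the CGROUP_ROOT variable are hypothetical names introduced for illustration. On v2 the memory controller may also need to be enabled for child groups before memory.max appears.

```shell
#!/bin/sh
# Hypothetical helpers; CGROUP_ROOT is an assumed variable for the mount point.
CGROUP_ROOT=${CGROUP_ROOT:-/sys/fs/cgroup}

# Print "v2", "v1" or "none" depending on what is found under CGROUP_ROOT.
cgroup_version() {
    if [ -f "$CGROUP_ROOT/cgroup.controllers" ]; then
        echo v2          # unified hierarchy: one tree for all controllers
    elif [ -d "$CGROUP_ROOT/memory" ]; then
        echo v1          # one directory tree per controller
    else
        echo none
    fi
}

# Print the file that holds the memory limit for the group named $1.
memory_limit_file() {
    case $(cgroup_version) in
        v2) echo "$CGROUP_ROOT/$1/memory.max" ;;
        v1) echo "$CGROUP_ROOT/memory/$1/memory.limit_in_bytes" ;;
        *)  return 1 ;;
    esac
}

echo "cgroup version: $(cgroup_version)"
```

With these helpers, the limit command could be written as echo 20000000 | tee "$(memory_limit_file mg1)" on either layout, still as root.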

A process may be moved to this cgroup by writing its PID into the cgroup's cgroup.procs file:

echo [PID] > /sys/fs/cgroup/memory/mg1/cgroup.procs

You can verify which cgroup a process belongs to with:

$ ps -o cgroup [PID]

Note: if a task exceeds its defined limits, the kernel will intervene and, in some cases, kill that task.
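Membership can also be checked without ps, because the kernel exposes it in /proc/[PID]/cgroup as lines of the form hierarchy-ID:controller-list:path. A rough sketch of reading that file follows; parse_cgroup and cgroup_of are hypothetical helper names, and the parsing assumes the cgroup path itself contains no colon.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/<pid>/cgroup-style lines on stdin and
# print the path for controller $1. A cgroup v2 entry has an empty
# controller field ("0::/path") and is printed for any controller.
parse_cgroup() {
    awk -F: -v ctl="$1" '
        $2 == "" { print $3; next }          # cgroup v2 entry
        {
            n = split($2, names, ",")        # v1: comma-separated controllers
            for (i = 1; i <= n; i++)
                if (names[i] == ctl) print $3
        }
    '
}

# Usage: cgroup_of PID [CONTROLLER]   (CONTROLLER defaults to memory)
cgroup_of() {
    parse_cgroup "${2:-memory}" < "/proc/$1/cgroup"
}

if [ -r "/proc/$$/cgroup" ]; then
    cgroup_of "$$"
fi
```

For a process that was moved into mg1 as described above, cgroup_of [PID] would be expected to print /mg1 on a cgroup v1 system.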

You can also use the utilities provided by the libcgroup package to simplify the steps above.

$ sudo cgcreate -g memory:mg1 # create memory cgroup
$ echo 50000000 | sudo tee /sys/fs/cgroup/memory/mg1/memory.limit_in_bytes # assign memory size
$ sudo cgexec -g memory:mg1 ~/test.sh # run the script under mg1 cgroup
$ ps -o cgroup [PID] # verify
$ sudo cgdelete memory:mg1 # clean up and remove the cgroup

Union File System

Union file systems, or UnionFS, are file systems that operate by creating layers, making them very lightweight and fast. Docker Engine uses UnionFS to provide the building blocks for containers.

Docker Images are actually just multiple Union File Systems stacked on top of each other!

Figure: container layers. Image source: https://docs.docker.com/storage/storagedriver/
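To make the layering idea concrete, here is a toy model of how a file is resolved across stacked layers: the topmost layer that contains the path wins. This is only an illustration in plain shell, not how Docker or the kernel implements it; union_lookup and the layer directory names are hypothetical, and real union file systems also handle deletions, directory merging and copy-on-write.

```shell
#!/bin/sh
# Toy model of a union lookup (hypothetical helper, not a real union mount).
# Usage: union_lookup RELATIVE_PATH TOP_LAYER [LOWER_LAYER...]
# Prints the path in the first layer, searched top-down, that has the file.
union_lookup() {
    rel=$1
    shift
    for layer in "$@"; do
        if [ -e "$layer/$rel" ]; then
            echo "$layer/$rel"
            return 0
        fi
    done
    return 1
}

# Build two throwaway layers: "base" has both files, "app" overrides one.
work=$(mktemp -d)
mkdir "$work/base" "$work/app"
echo "from base" > "$work/base/motd"
echo "from base" > "$work/base/os-release"
echo "from app"  > "$work/app/motd"

cat "$(union_lookup motd "$work/app" "$work/base")"         # upper layer wins
cat "$(union_lookup os-release "$work/app" "$work/base")"   # falls through to base
```

The first cat prints "from app" and the second prints "from base". A real merged view of two such directories can be obtained as root with an overlay mount (mount -t overlay overlay -o lowerdir=...,upperdir=...,workdir=... on a target directory), which is the mechanism behind Docker's overlay2 storage driver.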

References

Official docker docs

https://docs.docker.com/v17.09/engine/userguide/storagedriver/imagesandcontainers/

Others (basic overview though)

https://www.terriblecode.com/blog/how-docker-images-work-union-file-systems-for-dummies/

https://medium.com/@paccattam/drooling-over-docker-2-understanding-union-file-systems-2e9bf204177c

libcontainer and lxc

Docker Engine combines the namespaces, control groups, and UnionFS into a wrapper called a container format. The default container format is libcontainer.

Docker 0.9 introduced libcontainer; before that, Docker relied on LXC to run containers.

Figure: libcontainer. Image source: https://www.docker.com/blog/docker-0-9-introducing-execution-drivers-and-libcontainer/

Thanks to libcontainer, Docker can now, out of the box, manipulate namespaces, control groups, capabilities, AppArmor profiles, network interfaces and firewalling rules, all in a consistent and predictable way, and without depending on LXC or any other userland package. This drastically reduces the number of moving parts, and insulates Docker from the side effects introduced across versions and distributions of LXC.

You can read more about this here: https://www.docker.com/blog/docker-0-9-introducing-execution-drivers-and-libcontainer/

References

LXC - https://www.linuxjournal.com/content/everything-you-need-know-about-linux-containers-part-ii-working-linux-containers-lxc

LXC vs Docker - https://www.upguard.com/articles/docker-vs-lxc
