To understand Docker completely, you first need to understand the underlying technologies that make it possible. This blog will mainly cover:
- Namespace
- cgroups
- Union File System
- libcontainer
Definition from Wikipedia
Namespaces are a feature of the Linux kernel that partitions kernel resources such that one set of processes sees one set of resources while another set of processes sees a different set of resources. The feature works by having the same namespace for a set of resources and processes, but those namespaces refer to distinct resources. Resources may exist in multiple spaces. Examples of such resources are process IDs, hostnames, user IDs, file names, and some names associated with network access, and interprocess communication.
Namespaces are a fundamental aspect of containers on Linux. They isolate global resources between processes, so a namespace is basically a way to limit what a process can see. Containers depend on this isolation to work.
Let's see how it works with an example.
unshare - Run a program with some namespaces unshared from the parent.
$ unshare -h
Usage:
unshare [options] [<program> [<argument>...]]
Run a program with some namespaces unshared from the parent.
Options:
-m, --mount[=<file>] unshare mounts namespace
-u, --uts[=<file>] unshare UTS namespace (hostname etc)
-i, --ipc[=<file>] unshare System V IPC namespace
-n, --net[=<file>] unshare network namespace
-p, --pid[=<file>] unshare pid namespace
...
Start a shell in a new UTS (Unix Time-Sharing) namespace:
root@db9326789cbc:/$ unshare -u /bin/sh # -u selects the UTS namespace; see unshare -h
$ hostname child # set hostname on new UTS namespace
$ hostname
child
$ exit
$ hostname # it does not change anything on parent host
parent
Let's check the process tree. The shell running in the new namespace gets its own PID (1385), and its parent PID is the PID of the main shell (1 in this case):
root@db9326789cbc:/$ unshare -u /bin/sh # create a new shell in new UTS namespace
$ ps -ef --forest # inside new UTS
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 05:09 pts/0 00:00:00 /bin/bash
root 1385 1 0 06:14 pts/0 00:00:00 /bin/sh # n
root 1388 1385 0 06:14 pts/0 00:00:00 \_ ps -ef --forest
$ exit
You can inspect the namespaces a process belongs to by checking /proc/[pid]/ns:
root@db9326789cbc:/home/tutorial$ ls -l /proc/self/ns/uts
lrwxrwxrwx 1 root root 0 Aug 20 06:37 /proc/self/ns/uts -> 'uts:[4026533163]' # parent UTS
root@db9326789cbc:/home/tutorial$ unshare -u /bin/sh
$ ls -l /proc/self/ns/uts
lrwxrwxrwx 1 root root 0 Aug 20 06:50 /proc/self/ns/uts -> 'uts:[4026533147]' # child UTS, separate from parent
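The comparison above can also be scripted. Here is a minimal sketch, assuming a Linux system with /proc mounted. The names ns_id and same_ns are hypothetical helpers made up for this example, not part of any package.

```shell
#!/bin/sh
# ns_id and same_ns are illustrative helpers, not standard tools.

# Print the namespace identifier of a process, e.g. uts:[4026531838].
ns_id() {
    readlink "/proc/$1/ns/$2"
}

# Succeed if two processes share the namespace of the given type.
same_ns() {
    a=$(ns_id "$1" "$3") || return 2   # unreadable: no such PID or no permission
    b=$(ns_id "$2" "$3") || return 2
    [ "$a" = "$b" ]
}

# This shell ($$) and the child process that runs readlink (self)
# have not unshared anything, so they share a UTS namespace.
if same_ns $$ self uts; then
    echo "same UTS namespace: $(ns_id $$ uts)"
else
    echo "different UTS namespaces (or /proc not readable)"
fi
```

If you define the same helpers inside a shell started with unshare -u /bin/sh, then same_ns $$ $PPID uts should fail, because the parent shell still lives in the original UTS namespace.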
Definition from Wikipedia
cgroups (abbreviated from control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.
The primary design goal of cgroups is to provide a unified interface to many different use cases, from controlling single processes to full operating system-level virtualization (as provided by OpenVZ, Linux-VServer or LXC, for example). cgroups provide:
- Resource limiting: groups can be set to not exceed a configured memory limit, which also includes the file system cache
- Prioritization: some groups may get a larger share of CPU utilization or disk I/O throughput
- Accounting: measures a group's resource usage, which may be used, for example, for billing purposes
- Control: freezing groups of processes, their check-pointing and restarting
Let's see how it works with an example.
Install the necessary packages
On Ubuntu or Debian, type:
apt-get install libcgroup1 cgroup-tools
A cgroup filesystem initially contains a single root cgroup, '/', which all processes belong to. A new cgroup is created by creating a directory in the cgroup filesystem:
$ mkdir /sys/fs/cgroup/memory/mg1
Limit the memory for anything running under the cgroup mg1 to 20MB:
root@dd3d48548fdb:/home/tutorial$ echo 20000000 | tee /sys/fs/cgroup/memory/mg1/memory.limit_in_bytes
A process may be moved to this cgroup by writing its PID into the cgroup's cgroup.procs file:
echo [PID] > /sys/fs/cgroup/memory/mg1/cgroup.procs
You can verify which cgroup a PID belongs to with:
$ ps -o cgroup [PID]
Note: if a task exceeds its defined limits, the kernel will intervene and, in some cases, kill that task.
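The paths used above (/sys/fs/cgroup/memory/... and memory.limit_in_bytes) belong to the cgroup v1 layout. Many newer distributions mount cgroup v2 instead, where the memory limit file is called memory.max and there is no per-controller memory directory. Here is a minimal sketch for checking which layout a machine uses, assuming GNU coreutils stat. The function name cgroup_version is made up for this example.

```shell
#!/bin/sh
# cgroup_version is an illustrative helper, not a standard tool.
# It reports which cgroup hierarchy is mounted at /sys/fs/cgroup.
cgroup_version() {
    # stat -f inspects the filesystem; %T prints its type name (GNU coreutils).
    fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null) || fstype=""
    case "$fstype" in
        cgroup2fs) echo v2 ;;       # unified hierarchy
        tmpfs)     echo v1 ;;       # legacy (or hybrid) per-controller layout
        *)         echo unknown ;;  # not Linux, or cgroups not mounted here
    esac
}

cgroup_version
```

If this prints v1, the commands in this section should apply as written. On v2 the same idea holds, but the file names and directory layout differ.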
You can also use the utilities provided in the libcgroup package to simplify the above steps.
$ sudo cgcreate -g memory:mg1 # create memory cgroup
$ echo 50000000 | sudo tee /sys/fs/cgroup/memory/mg1/memory.limit_in_bytes # set the memory limit (50MB)
$ sudo cgexec -g memory:mg1 ~/test.sh # run the script under mg1 cgroup
$ ps -o cgroup [PID] # verify
$ sudo cgdelete memory:mg1 # clean up and remove the cgroup
Union file systems, or UnionFS, are file systems that operate by creating layers, making them very lightweight and fast. Docker Engine uses UnionFS to provide the building blocks for containers.
Docker images are actually just multiple read-only layers stacked on top of each other, which a union file system presents as a single file system!
Image source: https://docs.docker.com/storage/storagedriver/
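To get a feel for how stacked layers resolve a file, here is a minimal sketch that only imitates the lookup order with plain directories. It does not mount a real union file system, and it ignores details such as deletions and copy-on-write. The function union_lookup and the layer names base and app are made up for this example.

```shell
#!/bin/sh
# union_lookup is an illustrative helper: it resolves a file name across
# layers, topmost layer first. It mounts nothing.
union_lookup() {
    name=$1; shift
    for layer in "$@"; do            # layers are passed top-down
        if [ -e "$layer/$name" ]; then
            printf '%s\n' "$layer/$name"
            return 0
        fi
    done
    return 1                         # not present in any layer
}

work=$(mktemp -d)
mkdir "$work/base" "$work/app"
echo "from base" > "$work/base/motd"
echo "from base" > "$work/base/os-release"
echo "from app"  > "$work/app/motd"  # shadows the copy in base

cat "$(union_lookup motd "$work/app" "$work/base")"        # prints: from app
cat "$(union_lookup os-release "$work/app" "$work/base")"  # prints: from base

rm -rf "$work"
```

The upper layer wins when both layers contain the same file, and files that exist only in a lower layer still show through. That is the behavior that lets many images share one base layer.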
References
Official docker docs
https://docs.docker.com/v17.09/engine/userguide/storagedriver/imagesandcontainers/
Others (basic overview though)
https://www.terriblecode.com/blog/how-docker-images-work-union-file-systems-for-dummies/
https://medium.com/@paccattam/drooling-over-docker-2-understanding-union-file-systems-2e9bf204177c
Docker Engine combines namespaces, control groups, and UnionFS into a wrapper called a container format. The default container format is libcontainer.
Docker 0.9 introduced libcontainer; before that, LXC was used for containers.
Image source: https://www.docker.com/blog/docker-0-9-introducing-execution-drivers-and-libcontainer/
Thanks to libcontainer, Docker can now, out of the box, manipulate namespaces, control groups, capabilities, AppArmor profiles, network interfaces and firewalling rules, all in a consistent and predictable way, and without depending on LXC or any other userland package. This drastically reduces the number of moving parts, and insulates Docker from the side-effects introduced across versions and distributions of LXC.
You can read more about this here: https://www.docker.com/blog/docker-0-9-introducing-execution-drivers-and-libcontainer/
References
LXC vs Docker - https://www.upguard.com/articles/docker-vs-lxc