Gist: draganHR/03b8a91daf60fbd7bb189b38ec2fe80b

Running systemd in a non-privileged container

Docker

https://developers.redhat.com/blog/2016/09/13/running-systemd-in-a-non-privileged-container#the_quest

Systemd expects the /sys/fs/cgroup filesystem to be mounted. It can work with it mounted read-only.

...

We can simply bind-mount /sys/fs/cgroup into the container using -v /sys/fs/cgroup:/sys/fs/cgroup:ro.

...

/run on a tmpfs: docker upstream supports the docker run --tmpfs /run ... option, so you can mount a tmpfs volume on /run.
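The two requirements above can be checked from inside a running container. Below is a minimal Python sketch. It assumes the standard /proc/mounts line format (device, mount point, fstype, options, dump, pass). The helper names `find_mount` and `check_systemd_mounts` are hypothetical and not part of any tool.

```python
from typing import Optional, Set, Tuple


def find_mount(mountpoint: str, mounts_text: str) -> Optional[Tuple[str, Set[str]]]:
    """Return (fstype, options) of the last entry mounted at `mountpoint`, or None.

    `mounts_text` is assumed to be in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        if fields[1] == mountpoint:
            # A later entry at the same mount point shadows earlier ones.
            found = (fields[2], set(fields[3].split(",")))
    return found


def check_systemd_mounts(mounts_text: str) -> list:
    """List problems with the mounts systemd needs (hypothetical helper)."""
    problems = []
    # Read-only is fine here; the mount only has to exist.
    if find_mount("/sys/fs/cgroup", mounts_text) is None:
        problems.append("/sys/fs/cgroup is not mounted")
    run = find_mount("/run", mounts_text)
    if run is None or run[0] != "tmpfs":
        problems.append("/run is not a tmpfs")
    return problems
```

Inside the container you would call it as `check_systemd_mounts(open("/proc/mounts").read())`. An empty list means both mounts look as systemd expects.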

...

Systemd does not exit on SIGTERM (as PID 1 it re-executes itself instead). It defines its shutdown signal as SIGRTMIN+3, so docker upstream should send this signal when a user runs docker stop.

...

Alternative stop signals: docker upstream supports a stop-signal option (docker run --stop-signal SIGRTMIN+3 ...), and docker build supports a STOPSIGNAL directive. Below I have included a Dockerfile that I use to define a container that runs httpd as a service with systemd as PID 1.
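The SIGTERM versus SIGRTMIN+3 distinction can be made concrete with a toy Python model. This is not systemd itself and runs on Linux only. It is a process that, like systemd as PID 1, keeps running on SIGTERM but halts on SIGRTMIN+3. The handler names and event strings are made up for illustration. `signal.SIGRTMIN` is resolved at runtime because its number depends on the C library.

```python
import signal

# Toy model only: records what a systemd-like PID 1 would do for each signal.
STOP_SIGNAL = signal.SIGRTMIN + 3  # Linux-only; the number depends on the C library
events = []


def on_sigterm(signum, frame):
    # systemd as PID 1 re-executes itself on SIGTERM rather than exiting.
    events.append("re-exec, keep running")


def on_stop_signal(signum, frame):
    # SIGRTMIN+3 is the signal systemd treats as "halt".
    events.append("halt")


signal.signal(signal.SIGTERM, on_sigterm)
signal.signal(STOP_SIGNAL, on_stop_signal)

# First what plain `docker stop` sends (SIGTERM), then the configured stop signal.
signal.raise_signal(signal.SIGTERM)
signal.raise_signal(STOP_SIGNAL)
print(events)
```

Only the second signal leads to a shutdown. That is why the stop signal has to be changed, either with --stop-signal or with STOPSIGNAL.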

...

This means you should be able to get systemd running inside a container without --privileged by executing:

docker run -d --tmpfs /tmp --tmpfs /run -v /sys/fs/cgroup:/sys/fs/cgroup:ro httpd
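If an image does not set STOPSIGNAL, the stop signal can be passed on the command line instead. Below is a minimal Python sketch that assembles the same flags as an argv list. The helper `systemd_run_argv` is hypothetical. It only combines the options quoted above and adds --stop-signal.

```python
import shlex
from typing import List


def systemd_run_argv(image: str, stop_signal: str = "SIGRTMIN+3") -> List[str]:
    """Build a `docker run` argv for systemd as PID 1 without --privileged (hypothetical helper)."""
    return [
        "docker", "run", "-d",
        "--tmpfs", "/tmp",
        "--tmpfs", "/run",
        # systemd can work with the cgroup hierarchy mounted read-only.
        "-v", "/sys/fs/cgroup:/sys/fs/cgroup:ro",
        # Redundant when the image already sets STOPSIGNAL, harmless otherwise.
        "--stop-signal", stop_signal,
        image,
    ]


print(shlex.join(systemd_run_argv("httpd")))
# docker run -d --tmpfs /tmp --tmpfs /run -v /sys/fs/cgroup:/sys/fs/cgroup:ro --stop-signal SIGRTMIN+3 httpd
```

To actually start the container, pass the list to `subprocess.run(..., check=True)`. Using a list rather than one string avoids shell quoting issues.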

Dockerfile example

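FROM fedora:24
# Lets systemd detect that it is running inside a container
ENV container=docker
# && (not ;) so the build fails if any step fails
RUN dnf -y install httpd && dnf clean all && systemctl enable httpd
# docker stop sends this signal; systemd halts on SIGRTMIN+3
STOPSIGNAL SIGRTMIN+3
EXPOSE 80
CMD [ "/sbin/init" ]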

k8s

Mounting /sys/fs/cgroup and tmpfs paths:

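    spec:
      containers:
      - name: "???"
        volumeMounts:
        # systemd can work with /sys/fs/cgroup mounted read-only
        - mountPath: /sys/fs/cgroup
          name: sys-fs-cgroup
          readOnly: true
        # /tmp and /run share one memory-backed emptyDir via subPath
        - mountPath: /tmp
          name: tmp
          subPath: tmp
        - mountPath: /run
          name: tmp
          subPath: run
      volumes:
      # Host cgroup hierarchy, like docker's -v /sys/fs/cgroup:/sys/fs/cgroup:ro
      - name: sys-fs-cgroup
        hostPath:
          path: /sys/fs/cgroup
          type: Directory
      # medium: Memory backs the emptyDir with tmpfs, like docker's --tmpfs
      - name: tmp
        emptyDir:
          medium: Memory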

If you need to change an unsafe sysctl (https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/#enabling-unsafe-sysctls), e.g. net.core.somaxconn, set it in the pod-level securityContext:

      securityContext:
        sysctls:
        - name: net.core.somaxconn
          value: "1024"

Unsafe sysctls are enabled on a node-by-node basis with a kubelet flag, for example:

kubelet --allowed-unsafe-sysctls 'kernel.msg*,net.core.somaxconn' ...

For Minikube, pass the kubelet setting through the --extra-config flag:

minikube start --extra-config="kubelet.allowed-unsafe-sysctls=kernel.msg*,net.core.somaxconn" ...
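A sysctl requested in the pod must be covered by the node's allowlist. Otherwise the pod would fail to start on that node. Below is a minimal Python sketch for checking which names the example allowlist covers. It assumes kubelet's matching rule, inferred from the `kernel.msg*` example: a pattern ending in `*` is a prefix match, and any other pattern must match exactly. All helper names are hypothetical.

```python
from typing import Iterable, List


def parse_allowlist(flag_value: str) -> List[str]:
    """Split a comma-separated --allowed-unsafe-sysctls value into patterns."""
    return [p.strip() for p in flag_value.split(",") if p.strip()]


def is_allowed(sysctl: str, patterns: Iterable[str]) -> bool:
    """Assumed semantics: trailing '*' means prefix match, otherwise exact match."""
    for pattern in patterns:
        if pattern.endswith("*"):
            if sysctl.startswith(pattern[:-1]):
                return True
        elif sysctl == pattern:
            return True
    return False


def minikube_extra_config(patterns: Iterable[str]) -> str:
    """Render the value passed to minikube's --extra-config flag."""
    return "kubelet.allowed-unsafe-sysctls=" + ",".join(patterns)


patterns = parse_allowlist("kernel.msg*,net.core.somaxconn")
for name in ("net.core.somaxconn", "kernel.msgmax", "net.ipv4.tcp_fin_timeout"):
    print(name, is_allowed(name, patterns))
print(minikube_extra_config(patterns))
```

With this allowlist, net.core.somaxconn and kernel.msgmax are covered and net.ipv4.tcp_fin_timeout is not.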

Changing STOPSIGNAL for an existing image in a local registry

Dockerfile:

FROM localhost:5000/myimage
STOPSIGNAL SIGRTMIN+3

Build, retag, and push:

docker build .
docker tag <new-image-id> localhost:5000/myimage
docker push localhost:5000/myimage
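To confirm that a rebuilt image carries the new stop signal, you can read `Config.StopSignal` from the JSON that `docker inspect` prints. Below is a minimal Python sketch. The helper name `image_stop_signal` is hypothetical, and the sample JSON is a trimmed, made-up excerpt used only for illustration.

```python
import json
from typing import Optional


def image_stop_signal(inspect_output: str) -> Optional[str]:
    """Return Config.StopSignal from `docker inspect <image>` output, or None if unset."""
    entries = json.loads(inspect_output)  # docker inspect prints a JSON array
    config = entries[0].get("Config") or {}
    return config.get("StopSignal")


# Trimmed, made-up sample of `docker inspect` output, for illustration only.
sample = '[{"Id": "sha256:...", "Config": {"Cmd": ["/sbin/init"], "StopSignal": "SIGRTMIN+3"}}]'
print(image_stop_signal(sample))  # SIGRTMIN+3
```

From a shell, `docker inspect --format '{{.Config.StopSignal}}' localhost:5000/myimage` reads the same field directly.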