Some notes on issues faced when using CPU pinning within a Docker container

Challenges with running CPU pinned processes within Docker

Programs that want to bind themselves to free cores present some challenges
when run within Docker.

CPU pinning allows a program to request that it run only on a given core or set of cores. This improves cache locality and other factors that become relevant for extremely CPU-bound processes, especially in combination with other configuration that prevents the scheduler from assigning other processes to that core.

A common operation in such programs is to look for "free" CPU cores with
various heuristics. One such heuristic is a check that no other process is
bound to the core. This information is available via cpuset values in procfs.
But the host's /proc is not bind mounted into containers by default; each
container gets its own procfs that shows only its own processes. So if a
process pins itself to CPU 0 in container A, a process looking at /proc in
container B will not see this reflected in its procfs (it will be reflected in
A's procfs, though). This may also be the case when one of the pinned programs
is running on the host and the other is in a container, but I haven't tried
this. I hope it is, because otherwise that would constitute a sandbox escape /
information leak: containerized processes should not be able to learn that
other processes are bound to certain cores. In this respect Docker is behaving
(in my opinion) correctly.

The key takeaway is that the system call to pin a process to a core is
successful but the resultant state is not reflected in the procfs of other
containers.
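To make the heuristic concrete, here is a minimal Python sketch of a "is anything bound to this core?" check. It is an illustration, not the logic of any particular program: the note above does not say which procfs field is consulted, so this assumes the Cpus_allowed_list line in /proc/<pid>/status, and it assumes a process counts as "pinned" when its allowed set is a strict subset of all CPUs. The function names are made up for this example.

```python
import os


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-2,5" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def pinned_cpus(proc_root="/proc", total_cpus=None):
    """Return the CPUs that some *visible* process is restricted to.

    Assumption: "pinned" means the allowed set is a strict subset of
    all CPUs. Only processes visible under proc_root are considered.
    """
    total = total_cpus if total_cpus is not None else os.cpu_count()
    all_cpus = set(range(total))
    pinned = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "cpuinfo"
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                for line in f:
                    if line.startswith("Cpus_allowed_list:"):
                        allowed = parse_cpu_list(line.split(":", 1)[1])
                        if allowed < all_cpus:
                            pinned |= allowed
                        break
        except OSError:
            continue  # process exited mid-scan, or status is unreadable
    return pinned


def free_cpus(proc_root="/proc", total_cpus=None):
    """Return the CPUs that no visible process is pinned to."""
    total = total_cpus if total_cpus is not None else os.cpu_count()
    return set(range(total)) - pinned_cpus(proc_root, total)
```

Because the scan only covers the processes listed under proc_root, running it inside container B says nothing about what container A has pinned, which is exactly the problem described above. A real tool would likely also need to filter out per-CPU kernel threads when run on the host.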

My first thought to solve this was to simply perform core isolation at the
container level. Docker has a flag that can be used to restrict a container to
a certain set of CPUs (--cpuset-cpus). If we can restrict container A to
cores 0,1 and container B to cores 2,3, then our heuristics will work again:
each container has 2 dedicated cores, so provided nothing on the host is bound
to those cores and procfs shows only those 2 cores, procfs becomes accurate and
we can determine which cores do not have bound processes. Unfortunately this
does not work as hoped. --cpuset-cpus does take effect, in the sense that only
2 CPUs are available to the container, but procfs still reflects all host
cores. If the host has 8 CPUs and you use --cpuset-cpus 5,6 you will still see
all 8 within the container. However, if you now try to pin to core 0, the
system call will fail. See for yourself:

# docker run -it --cpuset-cpus 4,5 467c321fce69 bash    
    
root@b4f35b17820a:/# lscpu | grep "CPU(s)"    
CPU(s):              8    
On-line CPU(s) list: 0-7    
NUMA node0 CPU(s):   0-7    
root@b4f35b17820a:/#     
root@b4f35b17820a:/# for i in `seq 0 7`; do taskset -c $i echo "hi"; done    
taskset: failed to set pid 28's affinity: Invalid argument    
taskset: failed to set pid 29's affinity: Invalid argument    
taskset: failed to set pid 30's affinity: Invalid argument    
taskset: failed to set pid 31's affinity: Invalid argument    
hi    
hi    
taskset: failed to set pid 34's affinity: Invalid argument    
taskset: failed to set pid 35's affinity: Invalid argument    
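The same mismatch can be observed from inside a program. The short Python sketch below compares the CPU count the system reports with the set of CPUs the calling process is actually allowed to run on. It was not part of the session above; in a container started with --cpuset-cpus 4,5 on an 8-CPU host, the first value should still be 8 while the second should contain only 4 and 5. Note that this only tells you which cores the container may use, not which of them are free of pinned processes elsewhere. The function name cpu_view is made up for this example.

```python
import os


def cpu_view():
    """Return (CPU count the system reports, CPUs this process may run on)."""
    reported = os.cpu_count()
    # sched_getaffinity is Linux-only; pid 0 means the calling process.
    usable = sorted(os.sched_getaffinity(0))
    return reported, usable


if __name__ == "__main__":
    reported, usable = cpu_view()
    print(f"reported by the system: {reported}")
    print(f"usable by this process: {usable}")
```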

Unfortunately it seems the only available solutions at this point are either to
bind mount the host's procfs - which works, but is really undesirable given the
contents of /proc - or to modify your program to recover from a failed pinning
attempt, look for the next "free" CPU, and try again. In combination with
--cpuset-cpus, it should eventually find out which cores have been assigned
to the container. This can also be done outside the program by wrapping it in
the taskset loop above and stopping at the first success. Of course, this only
helps if the program has the ability to turn its own pinning functionality off,
or if you can modify it to do so - in which case it's cleaner to just implement
the retry logic in the program itself.
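The recover-and-retry approach described above could look roughly like this in Python. This is a sketch under assumptions, not code from any real program: pin_to_first_available and its set_affinity parameter are hypothetical names, and the list of candidate cores is assumed to come from whatever "free core" heuristic the program already has.

```python
import os


def pin_to_first_available(candidates, set_affinity=None):
    """Try each candidate CPU in order and pin to the first that works.

    Returns the CPU that was pinned, or None if every attempt failed.
    set_affinity is injectable for testing; it defaults to
    os.sched_setaffinity (Linux only).
    """
    if set_affinity is None:
        set_affinity = os.sched_setaffinity
    for cpu in candidates:
        try:
            set_affinity(0, {cpu})  # pid 0 means the calling process
        except OSError:
            # For example EINVAL when the CPU lies outside the cpuset
            # given to the container; move on to the next candidate.
            continue
        return cpu
    return None
```

With --cpuset-cpus 4,5 on an 8-CPU host, calling pin_to_first_available(range(8)) would be expected to fail on cores 0 through 3 and then succeed on core 4, mirroring the taskset output shown earlier.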

Relevant Docker issue: moby/moby#20770
