Programs that want to bind themselves to free cores present some challenges
when run within Docker.
CPU pinning allows a program to request that it be assigned exclusively to a core or set of cores. This improves cache locality and other factors that matter for extremely CPU-bound processes, especially in combination with other configuration that prevents the scheduler from assigning other processes to those cores.
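On Linux the underlying mechanism is the sched_setaffinity(2) system call (the same call taskset uses). As a minimal sketch, Python exposes it directly:

```python
import os

# Pin the calling process to a single core and confirm the new mask.
# os.sched_setaffinity wraps the Linux sched_setaffinity(2) system call,
# the same call taskset uses under the hood. (Linux-only.)
original = os.sched_getaffinity(0)   # 0 means "the calling process"
target = min(original)               # pick one core we are allowed to use

os.sched_setaffinity(0, {target})
print(os.sched_getaffinity(0))       # now a single-element set

os.sched_setaffinity(0, original)    # restore the original mask
```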
A common operation in such programs is to look for "free" CPU cores with
various heuristics. One such heuristic is a check that no other process is
bound to the core. This information is available via cpuset values in procfs.
But because /proc is not bind mounted by default in Docker, if a process pins
itself to CPU 0 in container A, a process looking at /proc in container B will
not see this reflected in its procfs (it will be reflected in A's procfs,
though). This may also be the case when one of the pinned programs is running
on the host and the other is in a container, but I haven't tried this. I hope
the isolation holds there too, because otherwise that would constitute a
sandbox escape / information leak: containerized processes should not be able
to learn that other processes are bound to certain cores. In this respect
Docker is behaving (in my opinion) correctly.
The key takeaway is that the system call to pin a process to a core is
successful but the resultant state is not reflected in the procfs of other
containers.
My first thought to solve this was to simply perform core isolation at the
container level. Docker has a flag that can be used to restrict a container to
a certain set of CPUs (--cpuset-cpus). If we can restrict container A to
cores 0,1 and container B to cores 2,3, then our heuristics will work again;
each container has 2 dedicated cores, so provided nothing on the host is bound
to those cores and procfs shows only those 2 cores, procfs becomes accurate and
we can determine which cores do not have bound processes. Unfortunately this
does not work. While --cpuset-cpus does work, in the sense that only 2 CPUs
are usable by the container, procfs still reflects all host cores. If
the host has 8 CPUs and you use --cpuset-cpus 4,5, you will still see all 8
within the container. However, if you now try to pin to core 0, the system
call will fail. See for yourself:
# docker run -it --cpuset-cpus 4,5 467c321fce69 bash
root@b4f35b17820a:/# lscpu | grep "CPU(s)"
CPU(s): 8
On-line CPU(s) list: 0-7
NUMA node0 CPU(s): 0-7
root@b4f35b17820a:/#
root@b4f35b17820a:/# for i in `seq 0 7`; do taskset -c $i echo "hi"; done
taskset: failed to set pid 28's affinity: Invalid argument
taskset: failed to set pid 29's affinity: Invalid argument
taskset: failed to set pid 30's affinity: Invalid argument
taskset: failed to set pid 31's affinity: Invalid argument
hi
hi
taskset: failed to set pid 34's affinity: Invalid argument
taskset: failed to set pid 35's affinity: Invalid argument
Unfortunately it seems the only available solutions at this point are either to
bind mount procfs - which works, but is really undesirable given the contents
of /proc - or to modify your program to recover from a failed pinning attempt,
look for the next "free" CPU, and try again. In combination with
--cpuset-cpus, it should eventually discover which cores have been assigned
to the container. This can also be done outside the program with the above
taskset loop, retrying until it succeeds. Of course, this only helps if the
program has the ability to turn its own pinning functionality off, or if you
can modify it to do so - in which case, it's cleaner to just implement the
above retry logic in the program itself.
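That in-program fallback - try each core in turn and skip the ones the container's cpuset forbids - can be sketched like this (pin_to_any_allowed_core is a hypothetical helper, not from any particular program):

```python
import os

def pin_to_any_allowed_core():
    """Try to pin this process to each core in turn.

    Inside a container restricted with --cpuset-cpus, sched_setaffinity
    fails (EINVAL, raised as OSError) for cores outside the cpuset, so we
    simply move on to the next candidate. Returns the core we ended up
    on, or None if every attempt failed.
    """
    for cpu in range(os.cpu_count()):
        try:
            os.sched_setaffinity(0, {cpu})  # 0 = the calling process
            return cpu
        except OSError:
            continue  # not in this container's cpuset; try the next one
    return None
```

A real program would layer its "is this core free?" heuristics on top of this loop rather than trusting procfs alone.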
Relevant Docker issue: moby/moby#20770