@norbjd
Created September 21, 2022 16:06
Kata errors when running containers in parallel

TL;DR

When running containers in parallel with the Kata runtime, many of them fail with failed to create shim task errors.

Environment

Ubuntu 20.04 VM with 16 CPUs and 64 GB of memory, running on a Scaleway GP1-M instance (https://www.scaleway.com/en/virtual-instances/general-purpose/).

# uname -a
Linux kata-test 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/*-release | grep DISTRIB
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.4 LTS"

# cat /proc/cpuinfo | grep processor | wc -l
16

# cat /proc/meminfo | grep MemTotal
MemTotal:       65857924 kB

Install prerequisites

Prerequisites: containerd, runc, the CNI plugins, nerdctl, and Kata Containers.

CONTAINERD_VERSION=1.6.8
RUNC_VERSION=1.1.4
CNI_VERSION=1.1.1
NERDCTL_VERSION=0.23.0
KATA_VERSION=2.5.1

# containerd
wget https://github.com/containerd/containerd/releases/download/v$CONTAINERD_VERSION/containerd-$CONTAINERD_VERSION-linux-amd64.tar.gz
tar Cxzvf /usr/local containerd-$CONTAINERD_VERSION-linux-amd64.tar.gz
wget -P /usr/local/lib/systemd/system/ https://raw.githubusercontent.com/containerd/containerd/v$CONTAINERD_VERSION/containerd.service
systemctl daemon-reload
systemctl enable --now containerd

# runc
wget https://github.com/opencontainers/runc/releases/download/v$RUNC_VERSION/runc.amd64
chmod u+x runc.amd64
mv runc.amd64 /usr/local/sbin/runc

# cni
wget https://github.com/containernetworking/plugins/releases/download/v$CNI_VERSION/cni-plugins-linux-amd64-v$CNI_VERSION.tgz
mkdir -p /opt/cni/bin
tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v$CNI_VERSION.tgz

# nerdctl
wget https://github.com/containerd/nerdctl/releases/download/v$NERDCTL_VERSION/nerdctl-$NERDCTL_VERSION-linux-amd64.tar.gz
tar Cxzvvf /usr/local/bin nerdctl-$NERDCTL_VERSION-linux-amd64.tar.gz

# kata
wget https://github.com/kata-containers/kata-containers/releases/download/$KATA_VERSION/kata-static-$KATA_VERSION-x86_64.tar.xz
xzcat kata-static-$KATA_VERSION-x86_64.tar.xz | sudo tar -xvf - -C /
rm -f /usr/local/bin/kata-runtime
rm -f /usr/local/bin/containerd-shim-kata-v2
ln -s /opt/kata/bin/kata-runtime /usr/local/bin
ln -s /opt/kata/bin/containerd-shim-kata-v2 /usr/local/bin

mkdir -p /etc/containerd
cat <<EOF > /etc/containerd/config.toml
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
        runtime_type = "io.containerd.kata.v2"
EOF

systemctl restart containerd
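
Before running the tests, it can help to confirm that the binaries installed above are actually reachable. The following is a minimal sketch, not part of the original setup: the helper name check_tools is hypothetical, and it only checks PATH lookups (the CNI plugins under /opt/cni/bin are not covered).

```shell
# Report which of the given binaries are missing from PATH.
# Usage: check_tools name...
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok       $tool"
    else
      echo "MISSING  $tool"
      missing=$((missing + 1))
    fi
  done
  # Succeed only if every binary was found.
  [ "$missing" -eq 0 ]
}

# Binaries installed in the steps above.
check_tools containerd runc nerdctl kata-runtime containerd-shim-kata-v2 \
  || echo "some prerequisites are missing"
```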

Tests

When we run containers:

  • without the Kata runtime (one after the other or in parallel), all containers run correctly
  • with the Kata runtime, one after the other, all containers run correctly
  • with the Kata runtime, in parallel, we hit errors (details below)
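
The three cases above can be compared with a small helper that runs the same command several times, either sequentially or in parallel, and counts the failed runs. This is a sketch under assumptions: the helper name run_batch is hypothetical, and it relies on nerdctl exiting with a non-zero status whenever it prints a FATA line.

```shell
# Run a command COUNT times, sequentially (seq) or in parallel (par),
# then print how many of the runs exited with a non-zero status.
# Usage: run_batch seq|par COUNT command [args...]
run_batch() {
  mode=$1
  count=$2
  shift 2
  failed=0
  pids=""
  i=0
  while [ "$i" -lt "$count" ]; do
    i=$((i + 1))
    if [ "$mode" = par ]; then
      "$@" &
      pids="$pids $!"
    else
      "$@" || failed=$((failed + 1))
    fi
  done
  # In parallel mode, collect the exit status of each background run.
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done
  echo "$failed/$count failed"
}

# Possible invocations for the three cases (uncomment to run):
# run_batch par 5 nerdctl run --rm ubuntu uname -r
# run_batch seq 5 nerdctl run --rm --runtime io.containerd.run.kata.v2 ubuntu uname -r
# run_batch par 5 nerdctl run --rm --runtime io.containerd.run.kata.v2 ubuntu uname -r
```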

5 containers in parallel

nerdctl pull ubuntu

for i in `seq 5`
do
  nerdctl run --rm --runtime io.containerd.run.kata.v2 ubuntu uname -r &
done

Sometimes it works, but most of the time we get errors.

Output:

5.15.63
5.15.63
FATA[0007] failed to create shim task: open /sys/fs/cgroup/systemd/vc/tasks: no such file or directory: not found
FATA[0007] failed to create shim task: open /sys/fs/cgroup/systemd/vc/tasks: no such file or directory: not found
FATA[0007] failed to create shim task: open /sys/fs/cgroup/systemd/vc/tasks: no such file or directory: not found

20 containers in parallel

nerdctl pull ubuntu

for i in `seq 20`
do
  nerdctl run --rm --runtime io.containerd.run.kata.v2 ubuntu uname -r &
done

We always get many errors (about 95% of the containers fail), including a new no such file or directory error, this time on /sys/fs/cgroup/cpuset/vc/cpuset.mems.

Extract of output:

5.15.63
FATA[0030] failed to create shim task: open /sys/fs/cgroup/systemd/vc/tasks: no such file or directory: not found
FATA[0031] failed to create shim task: open /sys/fs/cgroup/cpuset/vc/cpuset.mems: no such file or directory: not found

During this test, all CPUs reach 100% usage.
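
One way to put a number on the CPU saturation is to sample the aggregate cpu line of /proc/stat before and after a short interval. This is a minimal sketch, not how the observation above was made: the helper name cpu_busy_pct is hypothetical, and iowait is counted as idle time here.

```shell
# Print the overall CPU busy percentage between two "cpu" lines of /proc/stat.
# Fields after "cpu": user nice system idle iowait irq softirq steal ...
cpu_busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      # Sum user..steal (fields 2 to 9); guest time is already part of user.
      total = 0
      for (i = 2; i <= NF && i <= 9; i++) total += $i
      idle = $5 + $6
    }
    NR == 1 { t0 = total; i0 = idle }
    NR == 2 {
      dt = total - t0
      if (dt > 0) printf "%d\n", 100 * (dt - (idle - i0)) / dt
      else print 0
    }'
}

# Sample over one second while the containers are starting.
if [ -r /proc/stat ]; then
  before=$(grep '^cpu ' /proc/stat)
  sleep 1
  after=$(grep '^cpu ' /proc/stat)
  cpu_busy_pct "$before" "$after"
fi
```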

30 containers in parallel

nerdctl pull ubuntu

for i in `seq 30`
do
  nerdctl run --rm --runtime io.containerd.run.kata.v2 ubuntu uname -r &
done

We always get many errors (about 95% of the containers fail), including a new timeout error.

Extract of output:

5.15.63
FATA[0030] failed to create shim task: open /sys/fs/cgroup/systemd/vc/tasks: no such file or directory: not found
FATA[0036] failed to create shim task: Failed to Check if grpc server is working: rpc error: code = DeadlineExceeded desc = timed out connecting to vsock 1541455645:1024: unknown

Note that many of the timeout errors (failed to create shim task: Failed to Check if grpc server is working: rpc error: code = DeadlineExceeded desc = timed out connecting to vsock 1541455645:1024: unknown) appear at the end, around 40 seconds after launching the command, although some also appear earlier.

During this test, all CPUs reach 100% usage.
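
To see how the three failure types are distributed across a run, the captured output can be grouped by error message. The sketch below is an illustration only: the helper name summarize_errors and the log file name run.log are hypothetical, and the vsock address is masked on the assumption that it differs from one container to the next.

```shell
# Count "failed to create shim task" lines by failure type.
# Reads the captured nerdctl output on stdin.
summarize_errors() {
  grep 'failed to create shim task' |
    sed -e 's/^FATA\[[0-9]*\] failed to create shim task: //' \
        -e 's/vsock [0-9]*:[0-9]*/vsock <cid>:<port>/' |
    sort | uniq -c | sort -rn
}

# Possible usage (uncomment to run): capture stdout and stderr of every
# container, wait for all of them, then summarize.
# for i in $(seq 30); do
#   nerdctl run --rm --runtime io.containerd.run.kata.v2 ubuntu uname -r >> run.log 2>&1 &
# done
# wait
# summarize_errors < run.log
```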
