Kubernetes with swap on Fedora - Raw notes
- Upgraded to Kubernetes 1.19.4, tried to run it on the machines I recently upgraded to Fedora
33. I ran into an ongoing issue where /dev/zram0 swap keeps getting re-enabled; not sure
what causes it yet.
+ kubeadm really does not want to init a system with swap on. I know this has
been a topic of recent discussion.
[init] Using Kubernetes version: v1.19.4
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Swap]: running with swap on is not supported. Please disable swap
+ I tried to brute force it, but it seems there is some subtlety happening
here. Basically, /dev/zram0 keeps reappearing even after I swapoff it to
death. So I tried with "--ignore-preflight-errors=Swap", but I vaguely
remember that did not work too well in the past.
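Roughly what the brute-force attempt looked like (a sketch; any other kubeadm init flags omitted):
swapoff -a    # /dev/zram0 comes right back anyway
kubeadm init --ignore-preflight-errors=Swap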
+ After that, I ran into a problem where the kubelet now wants cgroup v2:
failed to get the kubelet's cgroup: cpu and memory cgroup hierarchy not unified.
cpu: /system.slice, memory: /system.slice/kubelet.service.
Kubelet system container metrics may be missing.
This does not look fatal, but may be the root cause of my crash.
+ After fixing that, the kubelet still fails with:
F1124 16:31:10.962395 5796 server.go:265] failed to run Kubelet: running
with swap on is not supported, please disable swap! or set --fail-swap-on
flag to false. /proc/swaps contained: ...
Of course, /dev/zram0 is back up.
The friend that blocked me earlier is a service called
swap-create@zram0.service. I could disable that, but I want to see if I can
make the thing work with swap enabled.
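For the record, disabling it would presumably be something along these lines, but that defeats the point of keeping swap on:
systemctl mask swap-create@zram0.service
swapoff /dev/zram0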
+ Editing /etc/systemd/system/multi-user.target.wants/kubelet.service
to add under [Service]:
Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false"
+ After that, kubelet complains because it tries to talk to docker and not
crio. Something was broken during the F33 upgrade. Edited the same service
file and added:
Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false --container-runtime=remote --container-runtime-endpoint=unix:///var/run/crio/crio.sock"
Also added under [Unit]:
Wants=crio.service
as suggested by https://github.com/cri-o/cri-o/blob/master/tutorials/kubernetes.md,
which also suggests adding docker.socket, but I don't have that one.
I don't recall doing that earlier, but maybe I had.
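A quick sanity check that something actually answers on that socket, using crictl from cri-tools (if it's installed; just a sketch):
crictl --runtime-endpoint unix:///var/run/crio/crio.sock info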
+ After that, I ran into another message that only shows up with the full command
line:
cri-o configured with systemd cgroup manager, but did not receive slice as
parent: /kubepods/burstable/pode95d6f5518631c3f14475cf585810
and then plenty of connection failures to port 6443:
k8s.io/client-go/informers/factory.go:134: Failed to watch
*v1beta1.RuntimeClass: failed to list *v1beta1.RuntimeClass: Get
"https://192.168.77.55:6443/apis/node.k8s.io/v1beta1/runtimeclasses?limit=500&resourceVersion=0":
dial tcp 192.168.77.55:6443: connect: connection refused
which led to:
node "shuttle" not found
(presumably because of the above, 192.168.77.55 is shuttle)
The port 6443 is not open on my system according to nmap. It seems to be the
k8s API server.
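The check was roughly:
nmap -p 6443 192.168.77.55
or locally something like ss -tlnp | grep 6443 to see whether anything is listening there at all.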
+ So it turns out that setting Environment in kubelet.service is ignored,
because it ends up being overwritten by the contents of
/etc/sysconfig/kubelet. Moving the extra args there makes kubelet stable. Now
on to the crio "cgroup...did not receive slice as parent" message.
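For reference, what ended up in /etc/sysconfig/kubelet is roughly this single line (same flags as above):
KUBELET_EXTRA_ARGS=--fail-swap-on=false --container-runtime=remote --container-runtime-endpoint=unix:///var/run/crio/crio.sock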
+ The new way to do things seems to be to go through the configuration file
specified by --config. The problem is that what is passed here is
/var/lib/kubelet/config.yaml, which is written by someone else.
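If that file were mine to edit, the relevant KubeletConfiguration fields would look roughly like this (the field names are real; whether edits there survive kubeadm is another question):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
failSwapOn: false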
+ The next blocking error seems to be
E1125 17:39:28.760148 28021 remote_runtime.go:113] RunPodSandbox from
runtime service failed: rpc error: code = Unknown desc = error converting
cgroup memory value from string to int "max": strconv.ParseInt: parsing
"max": invalid syntax
Looking at my crio, it has a suspicious version:
2:1.17.4-1.module_f32+8729+8e6b62f2
Removing and reinstalling fails to find the package. Where did I get that
CRI-O package from? Apparently, it's part of modular Fedora, and there are
CRI-O module streams for multiple versions. Latest is 1.19.
dnf module enable cri-o:1.19
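Enabling the stream alone presumably does not replace the already-installed 1.17 package, so something like this on top of it:
dnf distro-sync cri-o    # or remove + reinstall, now that the 1.19 stream is enabled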
+ The following message is:
error execution phase preflight: docker is required for container runtime:
exec: "docker": executable file not found in $PATH
What? I just installed CRI-O!
Ah, need `systemctl start crio` :-( Can't really get used to package installs not
starting services.
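(For next time: systemctl enable --now crio does both the enable and the start.)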
+ Now it's creating some containers at last. But now I have:
Nov 25 17:55:39 shuttle crio[30802]: time="2020-11-25
17:55:39.338601792+01:00" level=error msg="Container creation error:
time=\"2020-11-25T17:55:39+01:00\" level=error msg=\"this version of runc
doesn't work on cgroups v2\"\n"
This machine has both crun and runc. Grmnbl, historical crap.
dnf remove runc
+ Next run gives me this:
Nov 25 17:59:04 shuttle crio[33090]: time="2020-11-25
17:59:04.208274806+01:00" level=fatal msg="Validating runtime config: runtime
validation: \"runc\" not found in $PATH: exec: \"runc\": executable file not
found in $PATH"
I love it when they complain about a runtime configuration file but don't
tell you where it is. I'm lucky to know that. OK, that points to runc.
Trying an experiment: removing the rpm, reinstalling. It reinstalls runc, and
the configuration file points to it. Uninstalling / reinstalling crun to see
if it patches the configuration files correctly. Nope.
Adding the following in /etc/crio/crio.conf
[crio.runtime.runtimes.crun]
runtime_path = "/usr/bin/crun"
runtime_type = "oci"
runtime_root = "/run/crun"
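After editing crio.conf, something like this to pick it up and check that the config validates:
systemctl restart crio
journalctl -u crio -e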
+ FINALLY, it starts. On the master node. Now need to repeat the operations on
the worker nodes.
+ After that, had to add the same "--ignore-preflight-errors=Swap" to the join
command, because, swap.
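The join ends up looking roughly like this (token and hash are placeholders, obviously):
kubeadm join 192.168.77.55:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash> --ignore-preflight-errors=Swap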
+ A little bit of extra manual twiddling on all the worker nodes, editing their
/etc/crio/crio.conf to configure it correctly, and then finally I have a VR
system that runs as a Kubernetes worker node, with Jenkins running in a
container. Yay!