@r10r
Last active November 20, 2020 19:00
crio-lxc documentation

About

This is a wrapper around LXC which can be used as a drop-in container runtime replacement for use by CRI-O.

Installation

For the installation of the runtime see INSTALL.md
For the installation and initialization of a kubernetes cluster see K8S.md

Glossary

  • runtime the crio-lxc binary and the command set that implement the OCI runtime spec
  • container process the process that starts and runs the container using liblxc (crio-lxc-start)

Bugs

Requirements and restrictions

  • Only cgroupv2 unified cgroup hierarchy is supported.
  • A recent kernel > 5.8 is required for full cgroup support.
  • Cgroup resource limits are not implemented yet. This will change soon.
  • The runtime spec field additionalGroups requires the development versions of liblxc and go-lxc.

Configuration

The runtime binary implements flags that are required by the OCI runtime spec,
and flags that are runtime specific (timeouts, hooks, logging ...).

Most of the runtime specific flags have corresponding environment variables. See crio-lxc --help.
The runtime evaluates the flag value in the following order (a lower number takes precedence):

  1. cmdline flag from process arguments (overwrites process environment)
  2. process environment variable (overwrites environment file)
  3. environment file (overwrites cmdline flag default)
  4. cmdline flag default
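
The lookup order above can be pictured as a first-match search over four sources. The following sketch is only an illustration in shell: resolve_setting is a hypothetical helper and not part of crio-lxc, and it treats an empty string as "not set", which may differ from how the runtime handles empty values.

```shell
#!/bin/sh
# Hypothetical illustration of the precedence order; not part of crio-lxc.
# usage: resolve_setting FLAG_VALUE ENV_VALUE ENVFILE_VALUE DEFAULT_VALUE
# Assumption: an empty string means "not set".
resolve_setting() {
  for candidate in "$1" "$2" "$3" "$4"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# No flag given, a value in the process environment, another one in the file:
resolve_setting "" "debug" "info" "warn"   # prints "debug"
```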

Environment variables

Currently you have to compile the environment file yourself.
To list all available variables:

grep EnvVars cmd/*.go | grep -o 'CRIO_LXC_[A-Za-z_]*' | xargs -I'{}' echo "#{}="

Environment file

The default path to the environment file is /etc/default/crio-lxc.
It is loaded on every start of the crio-lxc binary, so changes take immediate effect.
Empty lines and those commented with a leading # are ignored.

A malformed environment file causes the next runtime call to fail.
In production it's recommended that you replace the environment file atomically.
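
One way to replace the file atomically is to write the new content to a temporary file in the same directory and then rename it over the target. The sketch below assumes that valid lines are either empty, comments, or of the form NAME=value; the runtime may accept more than that. replace_env_file is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: atomically replace an environment file.
# usage: replace_env_file TARGET < new-content
# Assumption: valid lines are empty, start with '#', or look like NAME=value.
replace_env_file() {
  target=$1
  # Create the temporary file next to the target, so that mv is a rename
  # on the same filesystem and not a copy.
  tmp=$(mktemp "${target}.XXXXXX") || return 1
  cat > "$tmp" || { rm -f "$tmp"; return 1; }
  # Refuse content with lines that match none of the accepted forms.
  if grep -Evq '^([[:space:]]*|#.*|[A-Za-z_][A-Za-z0-9_]*=.*)$' "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  chmod 0644 "$tmp" && mv -f "$tmp" "$target" || { rm -f "$tmp"; return 1; }
}
```

It could be called as `replace_env_file /etc/default/crio-lxc < crio-lxc.env.new`, where crio-lxc.env.new is a placeholder for the prepared file. A runtime call that starts during the update sees either the old or the new file, never a partially written one.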

For example, the environment file /etc/default/crio-lxc could look like this:

#CRIO_LXC_CONTAINER_HOOK=
#CRIO_LXC_CREATE_TIMEOUT=30s
#CRIO_LXC_INIT_CMD=
#CRIO_LXC_START_CMD=
#CRIO_LXC_START_TIMEOUT=30s

CRIO_LXC_APPARMOR=true
CRIO_LXC_CAPABILITIES=true
CRIO_LXC_CGROUP_DEVICES=true
CRIO_LXC_SECCOMP=true

CRIO_LXC_LOG_FILE=/tmp/crio-lxc.log
CRIO_LXC_LOG_LEVEL=info
CRIO_LXC_CONTAINER_LOG_LEVEL=warn

CRIO_LXC_MONITOR_CGROUP=crio-lxc-monitor.slice
CRIO_LXC_RUNTIME_HOOK=/usr/local/bin/crio-lxc-backup.sh
#CRIO_LXC_RUNTIME_HOOK_RUN_ALWAYS=false
#CRIO_LXC_RUNTIME_HOOK_TIMEOUT=

Runtime (security) features

All supported runtime security features are enabled by default.
The following runtime (security) features can optionally be disabled.
For details see crio-lxc --help.

  • apparmor
  • capabilities
  • cgroup-devices
  • seccomp

Logging

There is only a single log file for runtime and container process log output.
The log-level for the runtime and the container process can be set independently.

  • a single logfile is easy to rotate and monitor
  • a single logfile is easy to tail (watch for errors / events ...)
  • robust implementation is easy

Log Filtering

Runtime log lines are written in JSON using zerolog.
The log file can be easily filtered with jq.
For filtering with jq you must first strip the container process logs with grep -v '^lxc'.

For example, to show only errors and warnings for the runtime create command:

grep -v '^lxc ' /var/log/crio-lxc.log |\
  jq -c 'select(.cmd == "create" and (.l == "error" or .l == "warn"))'

Runtime log fields

Fields that are always present:

  • l log level
  • m log message
  • c caller (source file and line number)
  • cid container ID
  • cmd runtime command
  • t timestamp in UTC (format matches container process output)

Log message specific fields:

  • pid a process ID
  • file a path to a file
  • lxc.config the key of a container process config item
  • env the key of an environment variable
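
The always-present fields can be combined into a compact per-container view. The following is a minimal sketch that assumes jq is installed; container_log is a hypothetical helper name and the log file path is passed in by the caller.

```shell
#!/bin/sh
# Sketch: print "timestamp level message" for one container.
# usage: container_log LOGFILE CONTAINER_ID
container_log() {
  # Drop container process (liblxc) lines first, they are not JSON.
  grep -v '^lxc ' "$1" |
    jq -r --arg cid "$2" 'select(.cid == $cid) | "\(.t) \(.l) \(.m)"'
}
```

For example `container_log /var/log/crio-lxc.log XXX`, with XXX standing for a container ID. Passing the ID with --arg avoids quoting problems inside the jq filter.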

Debugging

Apart from the log file, the following resources are useful:

  • Systemd journal for cri-o and kubelet services
  • coredumpctl if runtime or container process segfaults.

Runtime Hook

If a runtime hook is defined, it is executed when the create command returns with an error.
You can use the runtime hook to backup the runtime spec and container process config for further analysis.

The runtime hook executable must

  • not use the standard file descriptors (stdin/stdout/stderr) although they are nulled.
  • not exceed CRIO_LXC_RUNTIME_HOOK_TIMEOUT, otherwise it gets killed.
  • not modify/delete any resources created by the runtime or container process

The runtime hook process environment contains the following variables:

  • CONTAINER_ID the container ID
  • LXC_CONFIG the path to the container process config
  • RUNTIME_CMD the runtime command which executed the runtime hook
  • RUNTIME_PATH the path to the container runtime directory
  • BUNDLE_PATH the absolute path to the container bundle
  • SPEC_PATH the absolute path to the JSON runtime spec
  • RUNTIME_ERROR (optional) the error message if the runtime command returned with an error

Example environment of a shell script:

SPEC_PATH=/var/run/containers/storage/overlay-containers/XXX/userdata/config.json
PWD=/
RUNTIME_PATH=/run/crio-lxc/XXX
CONTAINER_ID=XXX
SHLVL=1
RUNTIME_CMD=create
BUNDLE_PATH=/var/run/containers/storage/overlay-containers/XXX/userdata
LXC_CONFIG=/run/crio-lxc/XXX/config
_=/usr/bin/env
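
A backup hook along the lines described above could look like the following sketch. It is hypothetical: BACKUP_ROOT and its default directory are invented for this example, and only the variables listed above are read. The script writes to files only and copies without touching the originals.

```shell
#!/bin/sh
# Hypothetical runtime hook sketch: copy the runtime spec and the container
# process config to a backup directory for later analysis.
# Assumption: BACKUP_ROOT and its default value are invented for this example.
backup_runtime_state() {
  dest="${BACKUP_ROOT:-/var/tmp/crio-lxc-backup}/${CONTAINER_ID}"
  mkdir -p "$dest" 2>/dev/null || return 1
  # Copy only; the hook must not modify or delete runtime resources.
  if [ -f "${SPEC_PATH:-}" ]; then
    cp "$SPEC_PATH" "$dest/config.json" 2>/dev/null
  fi
  if [ -f "${LXC_CONFIG:-}" ]; then
    cp "$LXC_CONFIG" "$dest/lxc.config" 2>/dev/null
  fi
  # RUNTIME_ERROR is optional, store it only when it is set.
  if [ -n "${RUNTIME_ERROR:-}" ]; then
    printf '%s\n' "$RUNTIME_ERROR" > "$dest/runtime_error.txt"
  fi
  return 0
}

# Run only when invoked by the runtime, which sets CONTAINER_ID.
if [ -n "${CONTAINER_ID:-}" ]; then
  backup_runtime_state
fi
```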

cgroups

Enable cgroupv2 unified hierarchy manually:

mount -t cgroup2 none /sys/fs/cgroup

or permanently via kernel command line parameters:

systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all
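
To check whether the unified hierarchy is actually in effect, one can look for the cgroup.controllers file, which only exists in cgroup v2 directories. A minimal sketch, with is_cgroup2 as a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: check whether a mount point looks like the root of a cgroup v2 hierarchy.
# cgroup.controllers exists only in cgroup v2 directories.
# usage: is_cgroup2 [MOUNTPOINT]   (defaults to /sys/fs/cgroup)
is_cgroup2() {
  [ -f "${1:-/sys/fs/cgroup}/cgroup.controllers" ]
}

if is_cgroup2; then
  echo "cgroup v2 unified hierarchy is mounted at /sys/fs/cgroup"
else
  echo "no cgroup v2 hierarchy at /sys/fs/cgroup"
fi
```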

build dependencies

Install the build dependencies which are required to build the runtime and runtime dependencies.

debian

# liblxc / conmon build dependencies
apt-get install build-essential libtool automake pkg-config \
libseccomp-dev libapparmor-dev libbtrfs-dev \
libdevmapper-dev libcap-dev libc6-dev libglib2.0-dev
# k8s dependencies, tools
apt-get install jq ebtables iptables conntrack

arch linux

# liblxc / conmon build dependencies
pacman -Sy base-devel apparmor libseccomp libcap btrfs-progs
# k8s dependencies
pacman -Sy conntrack-tools ebtables jq

runtime dependencies

By default everything is installed to /usr/local

lxc (liblxc)

git clone https://github.com/lxc/lxc.git
cd lxc
./autogen.sh
./configure --enable-bash=no --enable-tools=no \
  --enable-commands=no --enable-seccomp=yes \
  --enable-capabilities=yes --enable-apparmor=yes
make install

echo /usr/local/lib > /etc/ld.so.conf.d/local.conf
ldconfig

crio-lxc

make install

The installation prefix environment variable is set to PREFIX=/usr/local by default.
The library source path for pkg-config is set to $PREFIX/lib/pkgconfig by default.
You can change that by setting the PKG_CONFIG_PATH environment variable.

For example, to install binaries in /opt/bin but use liblxc from /usr/lib:

PREFIX=/opt PKG_CONFIG_PATH=/usr/lib/pkgconfig make install

Keep in mind that you then also have to change PREFIX in the cri-o install script below.

conmon

git clone https://github.com/containers/conmon.git
cd conmon 
git reset --hard v2.0.2
make clean
make install

cri-o

#!/bin/sh
git clone https://github.com/cri-o/cri-o.git
cd cri-o
git reset --hard origin/release-1.19
make install

PREFIX=/usr/local
CRIO_LXC_ROOT=/run/crio-lxc

# environment for `crio config`
export CONTAINER_CONMON=${PREFIX}/bin/conmon
export CONTAINER_PINNS_PATH=${PREFIX}/bin/pinns
export CONTAINER_DEFAULT_RUNTIME=crio-lxc
export CONTAINER_RUNTIMES=crio-lxc:${PREFIX}/bin/crio-lxc:$CRIO_LXC_ROOT

crio config > /etc/crio/crio.conf
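
After generating the configuration it may be worth checking that the runtime actually ended up in the file. The sketch below assumes that `crio config` writes a [crio.runtime.runtimes.crio-lxc] table and a default_runtime key for the variables exported above; crio_conf_uses_lxc is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check that a generated crio.conf references the crio-lxc runtime.
# Assumption: the file contains a [crio.runtime.runtimes.crio-lxc] table and
# a default_runtime key, as expected from `crio config` with the variables above.
# usage: crio_conf_uses_lxc [CONFIG_FILE]
crio_conf_uses_lxc() {
  conf=${1:-/etc/crio/crio.conf}
  grep -q '^[[:space:]]*\[crio\.runtime\.runtimes\.crio-lxc]' "$conf" &&
    grep -q '^[[:space:]]*default_runtime[[:space:]]*=[[:space:]]*"crio-lxc"' "$conf"
}
```

Calling `crio_conf_uses_lxc` without arguments checks /etc/crio/crio.conf and reports the result through its exit status.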

cgroupv2 ebpf

Modify the systemd service file so that cri-o runs with full privileges.
This is required for the runtime to set the cgroupv2 device controller eBPF program.
See cri-o/cri-o#4272

sed -i 's/ExecStart=\//ExecStart=+\//' /usr/local/lib/systemd/system/crio.service
systemctl daemon-reload
systemctl start crio
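
The `+` prefix on ExecStart is the systemd syntax for running the command line with full privileges. To confirm that the substitution took effect, the unit file can be checked as in this small sketch (has_full_privileges is a hypothetical helper name):

```shell
#!/bin/sh
# Sketch: succeed if the unit file has an ExecStart line with the '+' prefix.
# usage: has_full_privileges UNIT_FILE
has_full_privileges() {
  grep -q '^ExecStart=+/' "$1"
}
```

For example `has_full_privileges /usr/local/lib/systemd/system/crio.service`. Running the sed command a second time does no harm, because its pattern no longer matches once the `+` is in place.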

storage configuration

If you're using overlay as storage driver cri-o may complain that it is not using native diff mode.
Update /etc/containers/storage.conf to fix this.

# see https://github.com/containers/storage/blob/v1.20.2/docs/containers-storage.conf.5.md
[storage]
driver = "overlay"

[storage.options.overlay] 
# see https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt, `modinfo overlay`
# [ 8270.526807] overlayfs: conflicting options: metacopy=on,redirect_dir=off
# NOTE: metacopy can only be enabled when redirect_dir is enabled
# NOTE: storage driver name must be set or mountopt are not evaluated,
# even when the driver is the default driver --> BUG ?
mountopt = "nodev,redirect_dir=off,metacopy=off"

HTTP proxy

If your system is behind a proxy, you can add the proxy environment variables to /etc/default/crio:

http_proxy="http://myproxy:3128"
https_proxy="http://myproxy:3128"
no_proxy="10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,127.0.0.0/8,127.0.0.1,localhost"

kubernetes

The following script downloads kubernetes v1.19.4 and installs it to /usr/local/bin.
You have to create the kubelet.service and 10-kubeadm.conf before running the script.

#!/bin/sh
# about: installs kubeadm,kubectl and kubelet to /usr/local/bin
# installs systemd service to /etc/systemd/system 


# Upgrade process:
# * change RELEASE and CHECKSUM
# * remove downloaded archive file
# * run this script again

ARCH="linux-amd64"
RELEASE="1.19.4"
ARCHIVE=kubernetes-server-$ARCH.tar.gz
CHECKSUM="fc9de14121af682af167ef99ce8a3803c25e92ef4739ed7eb592eadb30086b2cb9ede51d57816d1c3835f6202753d726eba804b839ae9cd516eff4e94c81c189"
DESTDIR="/usr/local/bin"

[ -e "$ARCHIVE" ] || wget "https://dl.k8s.io/v$RELEASE/$ARCHIVE"

# sha512sum -c expects two spaces between checksum and file name
echo "$CHECKSUM  $ARCHIVE" | sha512sum -c || exit 1

tar -x -z -f $ARCHIVE -C $DESTDIR --strip-components=3 kubernetes/server/bin/kubectl kubernetes/server/bin/kubeadm kubernetes/server/bin/kubelet
install -v kubelet.service /etc/systemd/system/
install -v -D 10-kubeadm.conf /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
systemctl daemon-reload
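
The checksum step of the script can also be factored into a small function, which makes it easier to reuse when upgrading to another release. A minimal sketch, assuming sha512sum from coreutils; verify_sha512 is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: verify a file against an expected SHA-512 checksum.
# usage: verify_sha512 FILE EXPECTED_CHECKSUM
verify_sha512() {
  # sha512sum -c reads "CHECKSUM  FILENAME" lines (two spaces) from stdin.
  printf '%s  %s\n' "$2" "$1" | sha512sum -c - >/dev/null 2>&1
}
```

In the script above this would read `verify_sha512 "$ARCHIVE" "$CHECKSUM" || exit 1`.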

systemd service

kubelet.service

[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=http://kubernetes.io/docs/

[Service]
ExecStart=/usr/local/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target

10-kubeadm.conf

# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generate at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/local/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

kubeadm init

This initializes the kubernetes control-plane.

  • Replace the HOSTIP and HOSTNAME placeholders in cluster-init.yaml and initialize the cluster:
kubeadm init --config cluster-init.yaml -v 5
# for a single node cluster remove the master taint
kubectl taint nodes --all node-role.kubernetes.io/master-
  • Install a networking plugin (I'm using calico)
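
The placeholder replacement can be scripted with sed. The sketch below is one possible approach and not part of the original setup: render_cluster_init is a hypothetical helper, and it accepts a shell-style ${HOSTIP} spelling in addition to {HOSTIP} and {HOSTNAME}.

```shell
#!/bin/sh
# Sketch: fill the HOSTIP and HOSTNAME placeholders of a template.
# usage: render_cluster_init TEMPLATE HOSTIP HOSTNAME > OUTPUT
# Assumption: the values contain no '/' or '&', which are special to sed.
render_cluster_init() {
  # Replace ${HOSTIP} before {HOSTIP}, otherwise a stray '$' would be left behind.
  sed -e 's/\${HOSTIP}/'"$2"'/g' \
      -e 's/{HOSTIP}/'"$2"'/g' \
      -e 's/{HOSTNAME}/'"$3"'/g' "$1"
}
```

For example `render_cluster_init cluster-init.yaml.in 192.0.2.10 node1 > cluster-init.yaml`, where cluster-init.yaml.in, the address and the node name are placeholders. Writing to a separate output file keeps the template reusable for other nodes.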

cluster-init.yaml

apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: {HOSTIP}
  bindPort: 6443
nodeRegistration:
  name: {HOSTNAME}
  criSocket: unix:///var/run/crio/crio.sock
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
#  kubeletExtraArgs:
#   v: "5"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
---
kind: ClusterConfiguration
kubernetesVersion: v1.19.4
apiVersion: kubeadm.k8s.io/v1beta2
apiServer:
  timeoutForControlPlane: 4m0s
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.66.0.0/16
scheduler: {}
controlPlaneEndpoint: "{HOSTIP}:6443"