
@clemenko
Last active May 13, 2024 19:30

Multus

update rke2 config

aka install
Add the following to /etc/rancher/rke2/config.yaml, per https://docs.rke2.io/install/network_options#using-multus:

# /etc/rancher/rke2/config.yaml
cni:
- multus
- canal

For air-gapped installs, pull rancher/hardened-multus-cni:v4.0.2-build20230811 ahead of time.
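A minimal sketch of staging that image for a private registry (registry.example.com is a placeholder):

docker pull rancher/hardened-multus-cni:v4.0.2-build20230811
docker tag rancher/hardened-multus-cni:v4.0.2-build20230811 registry.example.com/rancher/hardened-multus-cni:v4.0.2-build20230811
docker push registry.example.com/rancher/hardened-multus-cni:v4.0.2-build20230811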

validate install

validate with kubectl get pods -A | grep -i multus-ds

create macvlan config

From https://github.com/k8snetworkplumbingwg/multus-cni/blob/master/docs/quickstart.md#storing-a-configuration-as-a-custom-resource

Create a NetworkAttachmentDefinition for the local network.

cat <<EOF | kubectl create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth0",
      "mode": "bridge",
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.1.0/24",
        "rangeStart": "192.168.1.200",
        "rangeEnd": "192.168.1.216"
      }
    }'
EOF
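Optionally confirm the object landed:

kubectl get network-attachment-definitions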

run test pod

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: samplepod
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-conf
spec:
  containers:
  - name: samplepod
    command: ["/bin/ash", "-c", "trap : TERM INT; sleep infinity & wait"]
    image: alpine
EOF

get network config from test pod

kubectl exec -it samplepod -- ip a
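The extra macvlan interface shows up as net1 (Multus's default name for the first secondary attachment); to inspect just that one:

kubectl exec -it samplepod -- ip addr show net1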

Moar Fun

Good article: https://devopstales.github.io/kubernetes/multus/

for fun

DHCP anyone? Keep in mind that nohup /opt/cni/bin/dhcp daemon & needs to be running on the control node for DHCP to be passed through to the pod.
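If you'd rather not babysit a nohup, here is a minimal systemd unit sketch (the unit name is my own; /run/cni/dhcp.sock is the plugin's default socket path):

# /etc/systemd/system/cni-dhcp.service
[Unit]
Description=CNI DHCP IPAM daemon
After=network-online.target

[Service]
# clear any stale socket; the daemon won't start over an existing one
ExecStartPre=-/bin/rm -f /run/cni/dhcp.sock
ExecStart=/opt/cni/bin/dhcp daemon
Restart=always

[Install]
WantedBy=multi-user.target

Then systemctl daemon-reload && systemctl enable --now cni-dhcp.service.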

cat <<EOF | kubectl create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-dhcp
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth0",
      "mode": "bridge",
      "ipam": { "type": "dhcp" }
    }'
EOF

and

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: dhcp
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-dhcp
spec:
  containers:
  - name: dhcp
    command: ["/bin/ash", "-c", "trap : TERM INT; sleep infinity & wait"]
    image: alpine
EOF

Get the IP with kubectl exec -it dhcp -- ip a and now ping it from an external device.

Or nginx

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-dhcp
spec:
  containers:
  - name: nginx
    image: nginx
EOF

And we can check for the 192.168.1.0/24 address with kubectl describe pod nginx
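One way to pull just the Multus annotation out of that (key name per the Multus docs):

kubectl describe pod nginx | grep -A8 network-status

Then hit it from a device on the same LAN, e.g. curl -s http://<the-assigned-192.168.1.x-address>.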

ipvlan on Ubuntu with a single NIC

cat <<EOF | kubectl create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ipvlan-def
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "ipvlan",
      "master": "enp1s0",
      "mode": "l2",
      "ipam": { "type": "static" }
    }'
EOF


cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  annotations:
    k8s.v1.cni.cncf.io/networks: '[{ "name": "ipvlan-def", "ips": [ "192.168.1.202/24" ] }]'
spec:
  containers:
  - name: nginx
    image: nginx
EOF
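Since the address is static, a quick check from another machine on the LAN (not the node itself; with ipvlan in l2 mode the parent host typically can't reach its own pods over that interface):

ping -c3 192.168.1.202
curl -s http://192.168.1.202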

for @technotim

@clemenko (Author)

Here is the "script" I use to test.


echo -e "[keyfile]\nunmanaged-devices=interface-name:cali*;interface-name:flannel*" > /etc/NetworkManager/conf.d/rke2-canal.conf; 

mkdir -p /etc/rancher/{rke2,k3s}/

cat << EOF >> /etc/sysctl.conf
# SWAP settings
vm.swappiness=0
vm.panic_on_oom=0
vm.overcommit_memory=1
kernel.panic=10
kernel.panic_on_oops=1
vm.max_map_count=262144

# Have a larger connection range available
net.ipv4.ip_local_port_range=1024 65000

# Increase max connection
# (superseded by net.core.somaxconn=4096 below; sysctl keeps the last value)
#net.core.somaxconn=10000

# Reuse closed sockets faster
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_fin_timeout=15

# The maximum number of "backlogged sockets".  Default is 128.
net.core.somaxconn=4096
net.core.netdev_max_backlog=4096

# 16MB per socket - which sounds like a lot,
net.core.rmem_max=16777216
net.core.wmem_max=16777216

# Various network tunables
net.ipv4.tcp_max_syn_backlog=20480
net.ipv4.tcp_max_tw_buckets=400000
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_syn_retries=2
net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_wmem=4096 65536 16777216

# ARP cache settings for a highly loaded docker swarm
net.ipv4.neigh.default.gc_thresh1=8096
net.ipv4.neigh.default.gc_thresh2=12288
net.ipv4.neigh.default.gc_thresh3=16384

# ip_forward and tcp keepalive for iptables
net.ipv4.tcp_keepalive_time=600
net.ipv4.ip_forward=1

# monitor file system events
fs.inotify.max_user_instances=8192
fs.inotify.max_user_watches=1048576

# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
EOF
sysctl -p

useradd -r -c "etcd user" -s /sbin/nologin -M etcd -U;  

echo -e "#profile: cis-1.23\ncni:\n- multus\n- canal\nselinux: true\nsecrets-encryption: true\nwrite-kubeconfig-mode: 0600\nkube-controller-manager-arg:\n- bind-address=127.0.0.1\n- use-service-account-credentials=true\n- tls-min-version=VersionTLS12\n- tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384\nkube-scheduler-arg:\n- tls-min-version=VersionTLS12\n- tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384\nkube-apiserver-arg:\n- tls-min-version=VersionTLS12\n- tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384\n- authorization-mode=RBAC,Node\n- anonymous-auth=false\nkubelet-arg:\n- protect-kernel-defaults=true\n- read-only-port=0\n- authorization-mode=Webhook\n- streaming-connection-idle-timeout=5m" > /etc/rancher/rke2/config.yaml

curl -sfL https://get.rke2.io | INSTALL_RKE2_CHANNEL=v1.26 sh - ; systemctl enable --now rke2-server.service


echo "export KUBECONFIG=/etc/rancher/rke2/rke2.yaml CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml PATH=$PATH:/var/lib/rancher/rke2/bin" >> ~/.bashrc
ln -s /var/run/k3s/containerd/containerd.sock /var/run/containerd/containerd.sock
source ~/.bashrc
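quick sanity check once the node settles:

kubectl get nodes
kubectl get pods -A | grep -i multus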

enjoy

@clemenko (Author)

For cilium, multus worked as expected with https://docs.rke2.io/install/network_options#using-multus-with-cilium . The net-attach-def and pod annotations didn't change.

@b3rs3rk commented May 11, 2024

I'm doing the same thing, multus with cilium on RKE2 1.28. Can't get the pod that spins up to recognize the annotation and plumb the extra interface. Also using the cis profile (just "cis" now), but not all of the hardening settings you've applied, which usually only makes things worse. Were there any gotchas you found along the way? Do you feel like any of the extra sysctls and settings you added in your script had any relevant effects?

@clemenko (Author)

What does your config.yaml look like? The kernel tuning has been collected over the years. A few of them are important: net.ipv4.ip_forward=1 and

# SWAP settings
vm.swappiness=0
vm.panic_on_oom=0
vm.overcommit_memory=1
kernel.panic=10
kernel.panic_on_oops=1
vm.max_map_count=262144

Start there. Check that all the pods are happy.

@b3rs3rk commented May 11, 2024

@clemenko Thanks for the response. It was nothing that basic. When Cilium gets installed during RKE2 deployment, the default behavior is to act as an exclusive CNI in the Helm chart values. So when the cilium pod was starting up on the nodes, it mounted /etc/cni/net.d/ and renamed all other CNI configs to make them ineligible so that there is no chance pod startups use another CNI. Long story short, it was renaming my 00-multus.conf to a .bak file and continued to do so if I tried to rename it back while the cilium pod was still running.

I had to deploy a cilium HelmChartConfig manifest into the static folder on one of my nodes with...

cni:
  exclusive: false

...before it would stop. As soon as it stopped changing out the multus conf file, I was able to deploy ipvlan interfaces.
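For anyone following along, the full manifest is something like this (a sketch; chart name and namespace per the RKE2 docs, dropped into /var/lib/rancher/rke2/server/manifests/):

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    cni:
      exclusive: false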

@clemenko (Author)

Ah.. makes sense. Just curious why you are using Cilium? What features do you need over Canal?

@b3rs3rk commented May 13, 2024

Just testing the next big thing in my lab, really. Originally, I picked cilium because it might have coexisted with FirewallD on RHEL. Found out pretty quickly it still didn't work. Cilium was still creating chains via iptables, and FirewallD likes to step all over them. I thought Cilium used eBPF for everything if you disable kube proxy. But that didn't appear to be the case for me. Might have a config wrong somewhere.

@clemenko (Author)

My 2 cents: start with Rocky, remove firewalld, and stick with the default Canal. Change only when it truly makes sense.
