@platu
Last active November 5, 2024 07:05

Start UNprivileged Incus containers on top of Open vSwitch in a few steps

C-3PO has to be fixed

The installation process begins with a Debian trixie base installation on the host system or virtual machine.


Install Netplan and OvS and other tools on host

sudo apt -y install netplan.io openvswitch-switch nftables dnsmasq
apt search ^netplan.io$
netplan.io/testing,now 1.1-1 amd64  [installed]
  Declarative network configuration for various backends at runtime
apt search ^openvswitch-switch$
openvswitch-switch/testing,now 3.4.0-1 amd64  [installed]
  Open vSwitch switch implementations

Add new switches to host

Note that the IP addresses specified for the enp0s1 interface in the example file must be changed to suit your context. As the host will be a router we cannot rely on automatic IPv6 addressing via DHCP and/or SLAAC.

Add a switch and its VLAN interface to the network setup on host

  • c-3po is an access layer switch. Quote from Star Wars:

    "Don't blame me. I'm an interpreter. I'm not supposed to know a power socket from a computer terminal."

  • vlan10 is a switched virtual interface that feeds the host routing table and acts as the default gateway for all containers

Here is a copy of the /etc/netplan/enp0s1.yaml file.

network:
  version: 2
  renderer: networkd
  ethernets:
    enp0s1:
      dhcp4: false
      dhcp6: false
      accept-ra: false
      addresses:
        - 198.18.20.10/23
        - 2001:678:3fc:14::a/64
      routes:
        - to: default
          via: 198.18.20.1
        - to: "::/0"
          via: fe80:14::1
          on-link: true
      nameservers:
        addresses:
          - 172.16.0.2
          - 2001:678:3fc:3::2

  openvswitch: {}

  bridges:
    c-3po:
      openvswitch: {}

  vlans:
    vlan10:
      id: 10
      link: c-3po
      addresses:
        - 192.0.2.1/24
        - fdc0:7a62:a::1/64
        - fe80:a::1/64
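
The IPv6 addresses on vlan10 appear to embed the VLAN ID written in hexadecimal (10 becomes a). This is an inferred convention, not something the gist states, and the helper name vlan_ipv6_plan is hypothetical. A minimal sketch, assuming you want to derive the same pair of addresses for another VLAN:

```shell
#!/bin/sh
# Print the gateway addresses for a VLAN, assuming the VLAN ID in
# hexadecimal is used as the third group of the prefix (inferred convention).
# $1 = VLAN ID, $2 = first two groups of the ULA prefix (default: this gist's).
vlan_ipv6_plan() {
  local vlan base hex
  vlan=$1
  base=${2:-fdc0:7a62}
  hex=$(printf '%x' "$vlan")
  echo "${base}:${hex}::1/64"
  echo "fe80:${hex}::1/64"
}

vlan_ipv6_plan 10
```

For VLAN 10 this prints the two addresses declared in the file above: fdc0:7a62:a::1/64 and fe80:a::1/64.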

Run the following command to apply all the network parameters declared in the /etc/netplan/enp0s1.yaml file.

sudo netplan apply

Finally, check the status of all configured network interfaces.

sudo netplan status
     Online state: online
    DNS Addresses: 172.16.0.2 (compat)
                   2001:678:3fc:3::2 (compat)
       DNS Search: .

●  1: lo ethernet UNKNOWN/UP (unmanaged)
      MAC Address: 00:00:00:00:00:00
        Addresses: 127.0.0.1/8
                   ::1/128

●  2: enp0s1 ethernet UP (networkd: enp0s1)
      MAC Address: b8:ad:ca:fe:00:00 (Red Hat, Inc.)
        Addresses: 198.18.20.10/23
                   2001:678:3fc:14::a/64
                   fe80::baad:caff:fefe:0/64 (link)
    DNS Addresses: 172.16.0.2
                   2001:678:3fc:3::2
           Routes: default via 198.18.20.1 (static)
                   198.18.20.0/23 from 198.18.20.10 (link)
                   2001:678:3fc:14::/64 metric 256
                   fe80::/64 metric 256
                   default via fe80:14::1 metric 1024 (static)

●  4: c-3po other UNKNOWN/UP (networkd: c-3po)
      MAC Address: d6:0c:16:1e:25:46
        Addresses: fe80::38d1:afff:fe0b:3d37/64 (link)
           Routes: fe80::/64 metric 256

●  5: vlan10 other UNKNOWN/UP (networkd: vlan10)
      MAC Address: d6:0c:16:1e:25:46
        Addresses: 192.0.2.1/24
                   fdc0:7a62:a::1/64
                   fe80:a::1/64 (link)
                   fe80::d40c:16ff:fe1e:2546/64 (link)
           Routes: 192.0.2.0/24 from 192.0.2.1 (link)
                   fdc0:7a62:a::/64 metric 256
                   fe80::/64 metric 256
                   fe80:a::/64 metric 256

Turn IPv(4|6) routing on at the kernel level

Create a new file named /etc/sysctl.d/10-routing.conf.

cat << EOF | sudo tee /etc/sysctl.d/10-routing.conf
net.ipv4.conf.default.rp_filter=1
net.ipv4.conf.all.rp_filter=1
net.ipv4.ip_forward=1
net.ipv6.conf.all.forwarding=1
net.ipv4.conf.all.log_martians = 1
EOF

Make it happen!

sudo sysctl --system
* Applying /usr/lib/sysctl.d/10-coredump-debian.conf ...
* Applying /etc/sysctl.d/10-routing.conf ...
* Applying /usr/lib/sysctl.d/50-default.conf ...
* Applying /usr/lib/sysctl.d/50-pid-max.conf ...
kernel.core_pattern = core
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
net.ipv4.conf.all.log_martians = 1
kernel.sysrq = 0x01b6
kernel.core_uses_pid = 1
net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.c-3po.rp_filter = 2
net.ipv4.conf.enp0s1.rp_filter = 2
net.ipv4.conf.lo.rp_filter = 2
net.ipv4.conf.ovs-system.rp_filter = 2
net.ipv4.conf.vlan10.rp_filter = 2
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.c-3po.accept_source_route = 0
net.ipv4.conf.enp0s1.accept_source_route = 0
net.ipv4.conf.lo.accept_source_route = 0
net.ipv4.conf.ovs-system.accept_source_route = 0
net.ipv4.conf.vlan10.accept_source_route = 0
net.ipv4.conf.default.promote_secondaries = 1
net.ipv4.conf.c-3po.promote_secondaries = 1
net.ipv4.conf.enp0s1.promote_secondaries = 1
net.ipv4.conf.lo.promote_secondaries = 1
net.ipv4.conf.ovs-system.promote_secondaries = 1
net.ipv4.conf.vlan10.promote_secondaries = 1
net.ipv4.ping_group_range = 0 2147483647
net.core.default_qdisc = fq_codel
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
fs.protected_regular = 2
fs.protected_fifos = 1
kernel.pid_max = 4194304
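
The output above is worth a second look. Our file sets rp_filter to 1, but /usr/lib/sysctl.d/50-default.conf is applied afterwards and sets the per-interface values to 2. For source validation the kernel uses the higher of the "all" value and the per-interface value. Here is a minimal sketch for reading back what is in force. The helper names read_sysctl and effective_rp_filter are illustrative and not part of the original setup.

```shell
#!/bin/sh
# Read one sysctl value from procfs. The root directory is a parameter so
# the helper can also be pointed at a scratch directory.
read_sysctl() {
  local key root
  key=$1
  root=${2:-/proc/sys}
  cat "$root/$(printf '%s' "$key" | tr . /)"
}

# Reverse-path filter mode in force on an interface: the higher of the
# "all" value and the per-interface value.
effective_rp_filter() {
  local iface root all one
  iface=$1
  root=${2:-/proc/sys}
  all=$(read_sysctl net.ipv4.conf.all.rp_filter "$root")
  one=$(read_sysctl "net.ipv4.conf.${iface}.rp_filter" "$root")
  if [ "$all" -gt "$one" ]; then echo "$all"; else echo "$one"; fi
}

# Only report when the vlan10 interface from this gist exists.
if [ -r /proc/sys/net/ipv4/conf/vlan10/rp_filter ]; then
  echo "IPv4 forwarding: $(read_sysctl net.ipv4.ip_forward)"
  echo "IPv6 forwarding: $(read_sysctl net.ipv6.conf.all.forwarding)"
  echo "rp_filter in force on vlan10: $(effective_rp_filter vlan10)"
fi
```

With the values shown above, both forwarding switches should read 1 and the mode in force on vlan10 should be 2 (loose).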

Configure dnsmasq for automatic addressing of containers

Replace the /etc/dnsmasq.conf configuration file with the following parameters for container addressing and name resolution.

cat << EOF | sudo tee /etc/dnsmasq.conf
# Specify Container VLAN interface
interface=vlan10

# Enable DHCPv4 on Container VLAN
dhcp-range=192.0.2.100,192.0.2.200,3h

# Enable IPv6 router advertisements
enable-ra

# Enable SLAAC
dhcp-range=::,constructor:vlan10,ra-names,slaac

# Optional: Specify DNS servers
dhcp-option=option:dns-server,172.16.0.2,9.9.9.9
dhcp-option=option6:dns-server,[2001:678:3fc:3::2],[2620:fe::fe]

# Avoid DNS listen port conflict between dnsmasq and systemd-resolved
port=0
EOF

Don't forget to restart the service after editing the configuration file.

sudo systemctl restart dnsmasq.service

Masquerade traffic outgoing from host interface

Create a new /etc/nftables.conf file.

cat << EOF | sudo tee /etc/nftables.conf
#!/usr/sbin/nft -f

flush ruleset

table inet nat {
    chain postrouting {
        type nat hook postrouting priority 100;
        oifname "enp0s1" masquerade
    }
}
EOF

Don't forget to restart the nftables systemd service to enable the ruleset.

sudo systemctl restart nftables
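
Once the service has restarted, it may be worth confirming that the masquerade rule is actually loaded. The sketch below checks ruleset text for the rule. The function name has_masquerade is hypothetical, and the demo feeds it the rule text from this section instead of live output.

```shell
#!/bin/sh
# Succeed when the ruleset text on stdin masquerades traffic leaving
# through the interface given as $1.
has_masquerade() {
  grep -qF "oifname \"$1\" masquerade"
}

# Demo with the ruleset declared in this section.
sample='table inet nat {
    chain postrouting {
        type nat hook postrouting priority 100;
        oifname "enp0s1" masquerade
    }
}'

if printf '%s\n' "$sample" | has_masquerade enp0s1; then
  echo "masquerade rule found for enp0s1"
fi
```

On the host, pipe in the live ruleset instead: sudo nft list ruleset | has_masquerade enp0s1 && echo OK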

Install Incus container manager

We need to add a new package source.

  • Add the new repository key
wget -O - https://pkgs.zabbly.com/key.asc | sudo tee /etc/apt/keyrings/zabbly.asc
  • Add the new repository parameters
cat << EOF | sudo tee /etc/apt/sources.list.d/zabbly-incus-stable.sources
Enabled: yes
Types: deb
URIs: https://pkgs.zabbly.com/incus/stable
Suites: bookworm
Components: main
Architectures: $(dpkg --print-architecture)
Signed-By: /etc/apt/keyrings/zabbly.asc

EOF
  • We are now ready to update package catalog and install Incus
sudo apt update
sudo apt -y install incus --no-install-recommends
  • The normal user etu is the UNprivileged user and must belong to incus-admin and incus groups.
for grp in incus-admin incus
do
	sudo adduser etu $grp
done

Log out and log back in to make it effective.

  • After new login, group assignment is correct
groups
etu adm sudo users incus-admin incus
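
The same check can be scripted, which helps when several accounts are prepared at once. A minimal sketch, assuming the two group names used above. The function name missing_groups is illustrative.

```shell
#!/bin/sh
# Print every required group that is absent from a space-separated list.
# $1 = current groups, remaining arguments = required groups.
missing_groups() {
  local have g
  have=" $1 "
  shift
  for g in "$@"; do
    case "$have" in
      *" $g "*) ;;          # group present, nothing to report
      *) echo "$g" ;;
    esac
  done
}

# Check the current user (id -nG prints the group names on one line).
missing=$(missing_groups "$(id -nG)" incus-admin incus)
if [ -z "$missing" ]; then
  echo "all required groups present"
else
  echo "missing groups:" $missing
fi
```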

Set the default container profile

This is done with the `incus admin init` command, which has many options. The main point here is to refuse to create a local network bridge and use our Open vSwitch instead.

incus admin init
Would you like to use clustering? (yes/no) [default=no]:
Do you want to configure a new storage pool? (yes/no) [default=yes]:
Name of the new storage pool [default=default]:
Where should this storage pool store its data? [default=/var/lib/incus/storage-pools/default]:
Would you like to create a new local network bridge? (yes/no) [default=yes]: no
Would you like to use an existing bridge or host interface? (yes/no) [default=no]: yes
Name of the existing bridge or host interface: c-3po
Would you like the server to be available over the network? (yes/no) [default=no]:
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]:
Would you like a YAML "init" preseed to be printed? (yes/no) [default=no]:

We have to change the nictype: from macvlan to bridged and set the vlan: number.

incus profile device set default eth0 nictype bridged
incus profile device set default eth0 vlan 10
incus profile show default
config: {}
description: Default Incus profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: c-3po
    type: nic
    vlan: "10"
  root:
    path: /
    pool: default
    type: disk
name: default
used_by: []
project: default

We are done with the default profile.
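
The same profile could probably be reached without the interactive dialogue, by feeding a preseed document to incus admin init --preseed. The sketch below only prints such a document. The function name incus_preseed is hypothetical, and the dir storage driver is an assumption, since the interactive run above simply accepted the default that was offered.

```shell
#!/bin/sh
# Print a preseed document for `incus admin init --preseed`.
# $1 = existing bridge, $2 = VLAN ID.
# Assumption: "dir" storage driver; adjust to the driver you actually use.
incus_preseed() {
  local bridge vlan
  bridge=$1
  vlan=$2
  cat << EOF
storage_pools:
- name: default
  driver: dir
profiles:
- name: default
  devices:
    eth0:
      name: eth0
      nictype: bridged
      parent: ${bridge}
      type: nic
      vlan: "${vlan}"
    root:
      path: /
      pool: default
      type: disk
EOF
}

incus_preseed c-3po 10
```

To apply it on a fresh installation: incus_preseed c-3po 10 | incus admin init --preseed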


Create a first set of Incus containers

for i in {0..2}
do
	incus launch images:debian/trixie c$i
done
Launching c0
Launching c1
Launching c2
incus ls
+------+---------+--------------------+-----------------------------------------+-----------+-----------+
| NAME |  STATE  |        IPV4        |                  IPV6                   |   TYPE    | SNAPSHOTS |
+------+---------+--------------------+-----------------------------------------+-----------+-----------+
| c0   | RUNNING | 192.0.2.153 (eth0) | fdc0:7a62:a:0:216:3eff:fe45:8a58 (eth0) | CONTAINER | 0         |
+------+---------+--------------------+-----------------------------------------+-----------+-----------+
| c1   | RUNNING | 192.0.2.142 (eth0) | fdc0:7a62:a:0:216:3eff:fec4:ce04 (eth0) | CONTAINER | 0         |
+------+---------+--------------------+-----------------------------------------+-----------+-----------+
| c2   | RUNNING | 192.0.2.184 (eth0) | fdc0:7a62:a:0:216:3eff:fe54:a941 (eth0) | CONTAINER | 0         |
+------+---------+--------------------+-----------------------------------------+-----------+-----------+
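
The IPv4 addresses in this listing should all come from the dnsmasq pool declared earlier (192.0.2.100 to 192.0.2.200). Here is a small sketch of that check. The helper names ip_to_int and in_range are illustrative.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  # Split the address into four positional parameters
  set -- $(printf '%s' "$1" | tr . ' ')
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed when address $1 lies between $2 and $3, bounds included.
in_range() {
  local ip lo hi
  ip=$(ip_to_int "$1")
  lo=$(ip_to_int "$2")
  hi=$(ip_to_int "$3")
  [ "$ip" -ge "$lo" ] && [ "$ip" -le "$hi" ]
}

# Leases shown in the listing above, checked against the dhcp-range bounds.
for ip in 192.0.2.153 192.0.2.142 192.0.2.184; do
  if in_range "$ip" 192.0.2.100 192.0.2.200; then
    echo "$ip is inside the pool"
  else
    echo "$ip is outside the pool"
  fi
done
```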

Test the first Incus container

incus exec c0 -- bash
root@c0:~#

Addressing

root@c0:~# ip addr ls
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host proto kernel_lo
       valid_lft forever preferred_lft forever
6: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:45:8a:58 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.0.2.153/24 metric 1024 brd 192.0.2.255 scope global dynamic eth0
       valid_lft 10583sec preferred_lft 10583sec
    inet6 fdc0:7a62:a:0:216:3eff:fe45:8a58/64 scope global mngtmpaddr noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fe45:8a58/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever
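
The last 64 bits of both IPv6 addresses on eth0 match the modified EUI-64 form of the interface MAC address: the universal/local bit of the first octet is flipped and ff:fe is inserted in the middle. This can be reproduced with a short sketch. The function name mac_to_eui64 is illustrative.

```shell
#!/bin/sh
# Derive the modified EUI-64 interface identifier from a MAC address.
mac_to_eui64() {
  local first
  # Split the MAC address into six positional parameters
  set -- $(printf '%s' "$1" | tr : ' ')
  # Flip the universal/local bit (0x02) of the first octet
  first=$(printf '%02x' $(( 0x$1 ^ 0x02 )))
  # Insert ff:fe in the middle; %x drops leading zeros in each group
  printf '%x:%x:%x:%x\n' "0x${first}${2}" "0x${3}ff" "0xfe${4}" "0x${5}${6}"
}

mac_to_eui64 00:16:3e:45:8a:58
```

For the MAC address of c0 this prints 216:3eff:fe45:8a58, the suffix of both the fdc0:7a62:a:0::/64 and the fe80::/64 addresses shown above.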

Routing and name resolution at the same time

root@c0:~# apt update
Hit:1 http://deb.debian.org/debian trixie InRelease
Hit:2 http://deb.debian.org/debian trixie-updates InRelease
Hit:3 http://deb.debian.org/debian-security trixie-security InRelease
All packages are up to date.

Ping other containers from c0

root@c0:~# for i in {1..2}; do ping -c2 c$i; done
PING c1 (fdc0:7a62:a:0:216:3eff:fec4:ce04) 56 data bytes
64 bytes from c1 (fdc0:7a62:a:0:216:3eff:fec4:ce04): icmp_seq=1 ttl=64 time=0.296 ms
64 bytes from c1 (fdc0:7a62:a:0:216:3eff:fec4:ce04): icmp_seq=2 ttl=64 time=0.122 ms

--- c1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.122/0.209/0.296/0.087 ms
PING c2 (fdc0:7a62:a:0:216:3eff:fe54:a941) 56 data bytes
64 bytes from c2 (fdc0:7a62:a:0:216:3eff:fe54:a941): icmp_seq=1 ttl=64 time=0.274 ms
64 bytes from c2 (fdc0:7a62:a:0:216:3eff:fe54:a941): icmp_seq=2 ttl=64 time=0.121 ms

--- c2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.121/0.197/0.274/0.076 ms

A very first step toward automation

To update packages in containers, let's create a shell script that runs commands in each running container.

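cat << 'EOF' > run-in-c.sh
#!/bin/bash
# Run each command given as an argument in every running container.

cmds=("$@")

# Names of the running containers, flattened to a space-separated list
clist=$(incus list --format csv --columns n status=Running | tr '\n' ' ')

for c in $clist; do
  echo ">>>>>>>>>>>>>>>>> $c"
  for cmd in "${cmds[@]}"; do
    # Let a shell inside the container parse the command string
    incus exec "$c" -- sh -c "$cmd"
  done
done
EOF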

Then run the usual apt commands within containers.

bash run-in-c.sh "apt update" "apt -y full-upgrade" "apt clean" "apt -y autopurge"

Check OvS switch ports and CAM on host

Display OvS main switch configuration

sudo ovs-vsctl show
6d2b47b6-9804-4d34-abb7-4d06dc132772
    Bridge c-3po
        fail_mode: standalone
        Port vethac01163d
            tag: 10
            Interface vethac01163d
        Port veth052e37f8
            tag: 10
            Interface veth052e37f8
        Port c-3po
            Interface c-3po
                type: internal
        Port veth1e8adaac
            tag: 10
            Interface veth1e8adaac
        Port vlan10
            tag: 10
            Interface vlan10
                type: internal
    ovs_version: "3.4.0"

Say hello to VLAN 10 neighborhood

ping -c2 ff02::1%vlan10
PING ff02::1%vlan10 (ff02::1%vlan10) 56 data bytes
64 bytes from fe80:a::1%vlan10: icmp_seq=1 ttl=64 time=0.113 ms
64 bytes from fe80::216:3eff:fe45:8a58%vlan10: icmp_seq=1 ttl=64 time=0.720 ms
64 bytes from fe80::216:3eff:fec4:ce04%vlan10: icmp_seq=1 ttl=64 time=0.753 ms
64 bytes from fe80::216:3eff:fe54:a941%vlan10: icmp_seq=1 ttl=64 time=0.761 ms
64 bytes from fe80:a::1%vlan10: icmp_seq=2 ttl=64 time=0.106 ms

--- ff02::1%vlan10 ping statistics ---
2 packets transmitted, 2 received, +3 duplicates, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.106/0.490/0.761/0.311 ms

As expected, we have three neighbors.

ip nei ls dev vlan10
192.0.2.153 lladdr 00:16:3e:45:8a:58 STALE
192.0.2.184 lladdr 00:16:3e:54:a9:41 STALE
192.0.2.142 lladdr 00:16:3e:c4:ce:04 STALE
fdc0:7a62:a:0:216:3eff:fe45:8a58 lladdr 00:16:3e:45:8a:58 STALE
fdc0:7a62:a:0:216:3eff:fec4:ce04 lladdr 00:16:3e:c4:ce:04 STALE
fe80::216:3eff:fe45:8a58 lladdr 00:16:3e:45:8a:58 STALE
fe80::216:3eff:fec4:ce04 lladdr 00:16:3e:c4:ce:04 STALE
fdc0:7a62:a:0:216:3eff:fe54:a941 lladdr 00:16:3e:54:a9:41 STALE
fe80:a::1 lladdr d6:0c:16:1e:25:46 router STALE
fe80::216:3eff:fe54:a941 lladdr 00:16:3e:54:a9:41 STALE

Display Open vSwitch Content Addressable Memory (CAM) entries.

 sudo ovs-appctl fdb/show c-3po
 port  VLAN  MAC                Age
    1    10  d6:0c:16:1e:25:46  193
    2    10  00:16:3e:45:8a:58  193
    4    10  00:16:3e:54:a9:41  193
    3    10  00:16:3e:c4:ce:04  192
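
To cross-check this table against the neighbor list, the container entries can be filtered out by VLAN and by the 00:16:3e prefix that all container MAC addresses in this gist carry. A minimal sketch follows. The function name container_macs is illustrative, and the demo reuses the table shown above instead of live output.

```shell
#!/bin/sh
# Read `ovs-appctl fdb/show` output on stdin and print the MAC addresses
# learned on VLAN $1 that start with the 00:16:3e prefix.
container_macs() {
  awk -v vlan="$1" 'NR > 1 && $2 == vlan && $3 ~ /^00:16:3e:/ { print $3 }'
}

# Demo with the table from this section.
container_macs 10 << 'EOF'
 port  VLAN  MAC                Age
    1    10  d6:0c:16:1e:25:46  193
    2    10  00:16:3e:45:8a:58  193
    4    10  00:16:3e:54:a9:41  193
    3    10  00:16:3e:c4:ce:04  192
EOF
```

On the host, pipe in the live table instead: sudo ovs-appctl fdb/show c-3po | container_macs 10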

To conclude...

Here is an image of the achieved logical topology.

This gist walks through setting up unprivileged Incus containers on top of Open vSwitch (OvS), starting from a Debian trixie base installation. It covers installing the required tools, configuring the switch and kernel routing, setting up automatic addressing with dnsmasq, masquerading outgoing traffic, installing Incus, creating containers, and verifying the result.
