@platu
Last active December 29, 2023 07:06
Start UNprivileged lxd containers on top of Open vSwitch in a few steps

C-3PO has to be fixed

Starting from a Debian bullseye base install on host system with old naming interface scheme ...


Consistent vs old-fashioned network interface naming

Nowadays, network interface names follow the consistent naming scheme. In the listings below, the Ethernet interface is named enp0s1 because the chip sits on PCI bus 0 at slot 1.

In order to get back to old network interface naming like eth0, the GRUB_CMDLINE_LINUX option should be changed to GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0" in /etc/default/grub file. Do not forget to run update-grub after editing.
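The edit described above can be scripted. The sketch below is only an illustration: set_legacy_ifnames is a hypothetical helper name, and it assumes GNU sed and a GRUB_CMDLINE_LINUX line that carries no other options, since the whole value is replaced.

```shell
# Sketch: switch back to legacy interface names (eth0, ...).
# set_legacy_ifnames is a hypothetical helper. It REPLACES the whole
# GRUB_CMDLINE_LINUX value, so merge by hand if the line already has options.
# GRUB_CMDLINE_LINUX_DEFAULT is left untouched.
set_legacy_ifnames() {
    local grub_file="${1:-/etc/default/grub}"
    sed -i 's/^GRUB_CMDLINE_LINUX=.*/GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0"/' "$grub_file"
}

# Usage on the host (as root):
#   set_legacy_ifnames /etc/default/grub && update-grub
```

The new interface names only show up after the next reboot, once the kernel has been started with the new command line.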


Install OvS on host

sudo apt -y install openvswitch-switch
apt search ^openvswitch-switch$

openvswitch-switch/testing,now 2.17.2-5+b1 amd64  [installed]
  Open vSwitch switch implementations

Add new switches to host

Notice that the IP addresses given to the interface enp0s1 in the example file below have to be changed to fit your context. As the host will become a router, we cannot rely on automatic IPv6 addressing through DHCP and/or SLAAC.

Add 2 switches to network setup on host

  • C-3PO is the distribution layer switch. Quote from Star Wars:

    "Don't blame me. I'm an interpreter. I'm not supposed to know a power socket from a computer terminal."

  • sw-vlan10 is an access layer switch with all ports belonging to VLAN number 10

Copy of the /etc/network/interfaces file.

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto enp0s1
iface enp0s1 inet static
        address 172.16.96.220/24
        gateway 172.16.96.1
        dns-nameservers 172.16.0.2

iface enp0s1 inet6 static
        address 2001:db8:d6::dc/64
        gateway fe80:d6::1
        dns-nameservers 2001:db8:3::2

auto C-3PO
iface C-3PO inet manual
        ovs_type OVSBridge
        ovs_ports sw-vlan10
        up ip link set dev $IFACE up
        down ip link set dev $IFACE down

allow-C-3PO sw-vlan10
iface sw-vlan10 inet static
        ovs_type OVSBridge
        ovs_bridge C-3PO
        ovs_options C-3PO 10
        address 192.0.2.1/24

iface sw-vlan10 inet6 static
        ovs_type OVSBridge
        ovs_bridge C-3PO
        ovs_options C-3PO 10
        address fdc0:2::1/64

Turn IPv(4|6) routing on at the kernel level

1. Uncomment the following lines in /etc/sysctl.conf

egrep -v '(^#|^$)' /etc/sysctl.conf 
net.ipv4.conf.default.rp_filter=1
net.ipv4.conf.all.rp_filter=1
net.ipv4.ip_forward=1
net.ipv6.conf.all.forwarding=1
net.ipv4.conf.all.log_martians = 1

2. Make it happen!

sudo sysctl --system
* Applying /usr/lib/sysctl.d/50-pid-max.conf …
kernel.pid_max = 4194304
* Applying /etc/sysctl.d/99-sysctl.conf …
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
net.ipv4.conf.all.log_martians = 1
* Applying /usr/lib/sysctl.d/protect-links.conf …
fs.protected_fifos = 1
fs.protected_hardlinks = 1
fs.protected_regular = 2
fs.protected_symlinks = 1
* Applying /etc/sysctl.conf …
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
net.ipv4.conf.all.log_martians = 1
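The uncommenting step can also be done without an editor. The helper below is a minimal sketch, assuming GNU sed and that each key is present in the file as a commented "#key=value" line; uncomment_sysctl_key is a hypothetical name.

```shell
# Sketch: uncomment one key in a sysctl configuration file.
# uncomment_sysctl_key is a hypothetical helper; lines that are already
# active or that do not mention the key are left unchanged.
uncomment_sysctl_key() {
    local file="$1" key="$2" escaped
    # Escape the dots so that they match literally in the regular expression
    escaped=$(printf '%s' "$key" | sed 's/\./\\./g')
    sed -i -E "s|^#[[:space:]]*(${escaped}[[:space:]]*=.*)|\1|" "$file"
}

# Usage on the host (as root):
#   uncomment_sysctl_key /etc/sysctl.conf net.ipv4.ip_forward
#   uncomment_sysctl_key /etc/sysctl.conf net.ipv6.conf.all.forwarding
#   sysctl --system
```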

Configure dnsmasq

sudo apt -y install dnsmasq
apt search ^dnsmasq$

dnsmasq/testing,now 2.87-1 all [installed]
  Small caching DNS proxy and DHCP/TFTP server

Edit configuration file /etc/dnsmasq.conf to set the following parameters for container addressing and name resolution.

egrep -v '(^#|^$)' /etc/dnsmasq.conf
dnssec
local=/local.cloud/
server=9.9.9.9@enp0s1
server=2620:fe::fe%enp0s1
no-dhcp-interface=enp0s1
no-hosts
expand-hosts
domain=local.cloud
dhcp-range=192.0.2.10,192.0.2.100,1h
dhcp-range=fdc0:2::,ra-names
enable-ra
dhcp-option=option6:dns-server,[fdc0:2::1],[2620:fe::fe]

Don't forget to restart the service after editing the configuration file.

sudo systemctl restart dnsmasq.service
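Once containers are running, the leases handed out by dnsmasq can be listed from its lease file. This is a sketch under assumptions: list_leases is a hypothetical helper, the path is the Debian default lease file, and each IPv4 lease line is assumed to have the layout "expiry mac ip hostname client-id".

```shell
# Sketch: print "hostname ip" for every IPv4 lease known to dnsmasq.
# list_leases is a hypothetical helper; the default path and the line
# layout (<expiry> <mac> <ip> <hostname> <client-id>) are assumptions.
list_leases() {
    local lease_file="${1:-/var/lib/misc/dnsmasq.leases}"
    awk '$3 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print $4, $3 }' "$lease_file"
}

# Usage on the host:
#   list_leases
```

Before restarting the service, dnsmasq --test can be used to check the syntax of the configuration file.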

Masquerade traffic outgoing from host interface

sudo apt -y install iptables
sudo iptables -t nat -A POSTROUTING -o enp0s1 -j MASQUERADE
sudo ip6tables -t nat -A POSTROUTING -o enp0s1 -j MASQUERADE
sudo apt -y install iptables-persistent

When the iptables-persistent package is installed, the two previous rules have to be saved. They will be restored at host reboot.

In order to check the rules, just run these two commands: sudo iptables -t nat -vnL and/or sudo ip6tables -t nat -vnL
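Running the -A commands twice appends the same rule twice. A possible guard is sketched below: has_masquerade is a hypothetical helper that reads iptables-save (or ip6tables-save) output on stdin and succeeds when a MASQUERADE rule already exists for the given outgoing interface.

```shell
# Sketch: detect an existing MASQUERADE rule in iptables-save output.
# has_masquerade is a hypothetical helper; it expects the rule to be
# written exactly as "-A POSTROUTING -o <iface> -j MASQUERADE".
has_masquerade() {
    local iface="$1"
    grep -Eq -- "^-A POSTROUTING -o ${iface} -j MASQUERADE" 
}

# Usage on the host:
#   sudo iptables-save -t nat | has_masquerade enp0s1 || \
#       sudo iptables -t nat -A POSTROUTING -o enp0s1 -j MASQUERADE
#   sudo ip6tables-save -t nat | has_masquerade enp0s1 || \
#       sudo ip6tables -t nat -A POSTROUTING -o enp0s1 -j MASQUERADE
#   sudo netfilter-persistent save
```

The last command saves the current rules again if they were changed after the iptables-persistent installation.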


Install lxd

sudo apt -y install snapd
sudo snap install lxd

With snapd in place, the lxd snap is installed and we can get the list of installed snaps.

$ snap list
Name    Version      Rev    Tracking       Publisher   Notes
core20  20220826     1623   latest/stable  canonical✓  base
lxd     5.6-794016a  23680  latest/stable  canonical✓  -
snapd   2.57.2       17029  latest/stable  canonical✓  snapd

The normal user etu is the UNprivileged user and must belong to the lxd group. Log out and log back in to make it effective.

sudo adduser etu lxd
id | grep -o lxd
lxd

Set the default container profile

This job is done with the lxd init instruction, which has many options. The main point here is to refuse local network bridge creation and use our Open vSwitch instead.

lxd init
Would you like to use LXD clustering? (yes/no) [default=no]:
Do you want to configure a new storage pool? (yes/no) [default=yes]:
Name of the new storage pool [default=default]:
Name of the storage backend to use (btrfs, ceph, cephobject, dir, lvm) [default=btrfs]:
Create a new BTRFS pool? (yes/no) [default=yes]:
Would you like to use an existing empty block device (e.g. a disk or partition)? (yes/no) [default=no]:
Size in GiB of the new loop device (1GiB minimum) [default=23GiB]:
Would you like to connect to a MAAS server? (yes/no) [default=no]:
Would you like to create a new local network bridge? (yes/no) [default=yes]: no
Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: yes
Name of the existing bridge or host interface: sw-vlan10
Would you like the LXD server to be available over the network? (yes/no) [default=no]:
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]:
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

We have to change the nictype from macvlan to bridged and we are done with the default profile.

lxc profile device set default eth0 nictype bridged
lxc profile device get default eth0 nictype
bridged
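The interactive answers and the nictype change can probably be replaced by a preseed file. The sketch below is an assumption, not something tested on this setup: the file name lxd-preseed.yaml is arbitrary, the keys follow the layout that lxd init prints when asked for a YAML preseed, and the loop device size is left to the LXD default.

```shell
# Sketch: write a preseed equivalent to the answers given above,
# with the eth0 device already set to nictype bridged on sw-vlan10.
# The file name is arbitrary; the YAML content is an untested assumption.
cat > lxd-preseed.yaml <<'EOF'
storage_pools:
- name: default
  driver: btrfs
profiles:
- name: default
  devices:
    root:
      path: /
      pool: default
      type: disk
    eth0:
      name: eth0
      nictype: bridged
      parent: sw-vlan10
      type: nic
EOF

# Usage on a fresh install:
#   lxd init --preseed < lxd-preseed.yaml
```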

Create the first lxc containers

for i in {0..2}; do lxc launch images:debian/12 c$i; done
Creating c0
Starting c0
Creating c1
Starting c1
Creating c2
Starting c2
lxc ls
+------+---------+-------------------+------------------------------------+-----------+-----------+
| NAME |  STATE  |       IPV4        |                IPV6                |   TYPE    | SNAPSHOTS |
+------+---------+-------------------+------------------------------------+-----------+-----------+
| c0   | RUNNING | 192.0.2.98 (eth0) | fdc0:2::9875:20ff:fe52:2889 (eth0) | CONTAINER | 0         |
+------+---------+-------------------+------------------------------------+-----------+-----------+
| c1   | RUNNING | 192.0.2.53 (eth0) | fdc0:2::216:3eff:fee7:1d2b (eth0)  | CONTAINER | 0         |
+------+---------+-------------------+------------------------------------+-----------+-----------+
| c2   | RUNNING | 192.0.2.42 (eth0) | fdc0:2::216:3eff:fe55:830a (eth0)  | CONTAINER | 0         |
+------+---------+-------------------+------------------------------------+-----------+-----------+

Test the first lxc container

lxc exec c0 -- /bin/bash
root@c0:~# 

Addressing

root@c0:~# ip addr ls dev eth0
6: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 9a:75:20:52:28:89 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.0.2.98/24 metric 1024 brd 192.0.2.255 scope global dynamic eth0
       valid_lft 3491sec preferred_lft 3491sec
    inet6 fdc0:2::9875:20ff:fe52:2889/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 4294967254sec preferred_lft 4294967254sec
    inet6 fe80::9875:20ff:fe52:2889/64 scope link
       valid_lft forever preferred_lft forever

Routing and name resolution at the same time

root@c0:~# apt update
Hit:1 http://deb.debian.org/debian bookworm InRelease
Hit:2 http://deb.debian.org/debian bookworm-updates InRelease
Hit:3 http://deb.debian.org/debian-security bookworm-security InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.

Ping other containers from c0

root@c0:~# for i in {1..2}; do ping -c2 c$i; done
PING c1(c1.local.cloud (fdc0:2::216:3eff:fee7:1d2b)) 56 data bytes
64 bytes from c1.local.cloud (fdc0:2::216:3eff:fee7:1d2b): icmp_seq=1 ttl=64 time=0.058 ms
64 bytes from c1.local.cloud (fdc0:2::216:3eff:fee7:1d2b): icmp_seq=2 ttl=64 time=0.092 ms

--- c1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.058/0.075/0.092/0.017 ms
PING c2(c2.local.cloud (fdc0:2::216:3eff:fe55:830a)) 56 data bytes
64 bytes from c2.local.cloud (fdc0:2::216:3eff:fe55:830a): icmp_seq=1 ttl=64 time=0.056 ms
64 bytes from c2.local.cloud (fdc0:2::216:3eff:fe55:830a): icmp_seq=2 ttl=64 time=0.085 ms

--- c2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.056/0.070/0.085/0.014 ms

Automation: the very first step

Update package database on containers

for i in {0..2}; do lxc exec c$i -- apt update; done
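The loop above hardcodes the container names. A possible variant, sketched under the assumption that lxc list -c ns --format csv prints one "name,state" line per container, iterates over whatever is currently running; running_names is a hypothetical helper.

```shell
# Sketch: keep only the names of running containers.
# running_names is a hypothetical filter reading "name,state" CSV lines
# on stdin, as produced by: lxc list -c ns --format csv
running_names() {
    awk -F, '$2 == "RUNNING" { print $1 }'
}

# Usage on the host:
#   for c in $(lxc list -c ns --format csv | running_names); do
#       lxc exec "$c" -- apt update
#   done
```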

Check OvS switch ports and TCAM on host

Display OvS main switch configuration

sudo ovs-vsctl show
c90da721-df48-400e-9c16-6de2987971f2
    Bridge C-3PO
        Port veth1f549dc2
            tag: 10
            Interface veth1f549dc2
        Port veth4863fbcc
            tag: 10
            Interface veth4863fbcc
        Port C-3PO
            Interface C-3PO
                type: internal
        Port veth6ae53ba7
            tag: 10
            Interface veth6ae53ba7
        Port sw-vlan10
            tag: 10
            Interface sw-vlan10
                type: internal
    ovs_version: "2.17.2"
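With many containers, the listing above gets long. The filter below is a small sketch that reduces it to one "port tag" line per tagged port; ports_by_tag is a hypothetical helper and it assumes the Port / tag: layout shown above.

```shell
# Sketch: list tagged ports from `ovs-vsctl show` output read on stdin.
# ports_by_tag is a hypothetical helper; untagged ports are skipped.
ports_by_tag() {
    awk '
        $1 == "Port" { port = $2; gsub(/"/, "", port) }  # remember current port
        $1 == "tag:" { print port, $2 }                  # print port and VLAN tag
    '
}

# Usage on the host:
#   sudo ovs-vsctl show | ports_by_tag
```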

Say hello to VLAN 10 neighborhood

ping -c2 ff02::1%sw-vlan10
PING ff02::1%sw-vlan10(ff02::1%sw-vlan10) 56 data bytes
64 bytes from fe80::b894:a9ff:fe80:1243%sw-vlan10: icmp_seq=1 ttl=64 time=0.125 ms
64 bytes from fe80::9875:20ff:fe52:2889%sw-vlan10: icmp_seq=1 ttl=64 time=1.05 ms
64 bytes from fe80::216:3eff:fee7:1d2b%sw-vlan10: icmp_seq=1 ttl=64 time=1.12 ms
64 bytes from fe80::216:3eff:fe55:830a%sw-vlan10: icmp_seq=1 ttl=64 time=1.14 ms
64 bytes from fe80::b894:a9ff:fe80:1243%sw-vlan10: icmp_seq=2 ttl=64 time=0.063 ms

--- ff02::1%sw-vlan10 ping statistics ---
2 packets transmitted, 2 received, +3 duplicates, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.063/0.700/1.139/0.496 ms
ip nei ls dev sw-vlan10
192.0.2.53 lladdr 00:16:3e:e7:1d:2b STALE
192.0.2.42 lladdr 00:16:3e:55:83:0a STALE
192.0.2.98 lladdr 9a:75:20:52:28:89 STALE
fe80::216:3eff:fee7:1d2b lladdr 00:16:3e:e7:1d:2b STALE
fdc0:2::216:3eff:fee7:1d2b lladdr 00:16:3e:e7:1d:2b STALE
fe80::b894:a9ff:fe80:1243 lladdr ba:94:a9:80:12:43 router STALE
fe80::216:3eff:fe55:830a lladdr 00:16:3e:55:83:0a STALE
fdc0:2::216:3eff:fe55:830a lladdr 00:16:3e:55:83:0a STALE
fe80::9875:20ff:fe52:2889 lladdr 9a:75:20:52:28:89 STALE
fdc0:2::9875:20ff:fe52:2889 lladdr 9a:75:20:52:28:89 STALE

Display OvS TCAM (MAC learning table)

sudo ovs-appctl fdb/show C-3PO
 port  VLAN  MAC                Age
    1    10  ba:94:a9:80:12:43   29
    4    10  00:16:3e:55:83:0a   29
    3    10  00:16:3e:e7:1d:2b   29
    2    10  9a:75:20:52:28:89   29
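The table above can be filtered per VLAN, for instance to compare the learned MAC addresses with the neighbor table of sw-vlan10. This is a minimal sketch: macs_in_vlan is a hypothetical helper and it assumes the four-column layout with one header line shown above.

```shell
# Sketch: print the MAC addresses learned in one VLAN.
# macs_in_vlan is a hypothetical filter reading `ovs-appctl fdb/show`
# output on stdin; the first line (column headers) is skipped.
macs_in_vlan() {
    local vlan="$1"
    awk -v vlan="$vlan" 'NR > 1 && $2 == vlan { print $3 }'
}

# Usage on the host:
#   sudo ovs-appctl fdb/show C-3PO | macs_in_vlan 10
```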