Start UNprivileged lxd containers on top of Open vSwitch in a few steps

C-3PO has to be fixed

Starting from a Debian bullseye base install on host system with old naming interface scheme ...


Consistent vs Old fashioned network interface naming

Nowadays, network interface names follow the consistent naming rule. In the listings given below, the Ethernet interface name is enp0s2 as the chip sits on PCI bus 0 at slot 2.

In order to get back to old network interface naming like eth0, the GRUB_CMDLINE_LINUX option should be changed to GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0" in /etc/default/grub file. Do not forget to run update-grub after editing.
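
The edit can also be scripted. The following is only a sketch: the helper name set_old_ifnames is made up for this example, and it assumes GRUB_CMDLINE_LINUX sits on a single line of the file. It replaces whatever value the option had before, so try it on a copy first.

```shell
# Hypothetical helper: rewrite the GRUB_CMDLINE_LINUX line of the given file.
# Assumes the option is on a single line; any previous value is overwritten.
# GRUB_CMDLINE_LINUX_DEFAULT is left alone because the pattern ends with "=".
set_old_ifnames() {
    grub_file=$1
    sed -i 's/^GRUB_CMDLINE_LINUX=.*/GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0"/' "$grub_file"
}

# Suggested use, from a root shell:
#   cp /etc/default/grub /tmp/grub.test && set_old_ifnames /tmp/grub.test
#   diff /etc/default/grub /tmp/grub.test
#   set_old_ifnames /etc/default/grub && update-grub
```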


Install OvS on host

$ sudo aptitude -y install openvswitch-switch
$ sudo aptitude versions openvswitch-switch
i   2.15.0+ds1-5                              testing   500

Add new switches to host

Notice that the IP addresses given to the interface enp0s2 in the example file below have to be changed to fit your context. As the host will become a router, we cannot rely on automatic IPv6 addressing through DHCP and/or SLAAC.

Add 2 switches to network setup on host

. C-3PO is the distribution layer switch. Quote from Star Wars:

"Don't blame me. I'm an interpreter. I'm not supposed to know a power socket from a computer terminal."

. sw-vlan10 is an access layer switch with all ports belonging to VLAN number 10

Copy of the /etc/network/interfaces file.

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto enp0s2
iface enp0s2 inet static
        address 172.16.96.220/24
        gateway 172.16.96.1

iface enp0s2 inet6 static
        address 2001:678:3fc:d6::dc/64
        gateway fe80:d6::1

auto C-3PO
iface C-3PO inet manual
        ovs_type OVSBridge
        ovs_ports sw-vlan10
        up ip link set dev $IFACE up
        down ip link set dev $IFACE down

allow-C-3PO sw-vlan10
iface sw-vlan10 inet static
        ovs_type OVSBridge
        ovs_bridge C-3PO
        ovs_options C-3PO 10
        address 192.0.2.1/24

iface sw-vlan10 inet6 static
        ovs_type OVSBridge
        ovs_bridge C-3PO
        ovs_options C-3PO 10
        address fdc0:2::1/64
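
To see what these stanzas roughly translate to, or to try the setup without restarting networking, here is a sketch of equivalent ovs-vsctl and ip commands. The three-argument form of add-br creates a "fake bridge" on top of C-3PO for VLAN 10. The run wrapper and the DRY_RUN variable are helpers invented for this example: by default the commands are only printed, and nothing here is persistent across reboots.

```shell
# Illustration helper (not part of OvS): print commands unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_switches() {
    # Distribution layer switch
    run ovs-vsctl --may-exist add-br C-3PO
    # Access layer switch: fake bridge whose ports all carry VLAN tag 10
    run ovs-vsctl --may-exist add-br sw-vlan10 C-3PO 10
    run ip link set dev C-3PO up
    run ip link set dev sw-vlan10 up
    # Gateway addresses seen by the containers
    run ip addr add 192.0.2.1/24 dev sw-vlan10
    run ip -6 addr add fdc0:2::1/64 dev sw-vlan10
}

create_switches
```

Setting DRY_RUN=0 in a root shell runs the commands for real instead of printing them.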

Turn IPv(4|6) routing on at the kernel level

. Uncomment the following lines in /etc/sysctl.conf

$ egrep -v '(^#|^$)' /etc/sysctl.conf 
net.ipv4.conf.default.rp_filter=1
net.ipv4.conf.all.rp_filter=1
net.ipv4.ip_forward=1
net.ipv6.conf.all.forwarding=1
net.ipv4.conf.all.log_martians = 1

. Make it happen!

$ sudo sysctl --system
* Applying /usr/lib/sysctl.d/50-pid-max.conf ...
kernel.pid_max = 4194304
* Applying /etc/sysctl.d/99-sysctl.conf ...
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
net.ipv4.conf.all.log_martians = 1
* Applying /usr/lib/sysctl.d/protect-links.conf ...
fs.protected_fifos = 1
fs.protected_hardlinks = 1
fs.protected_regular = 2
fs.protected_symlinks = 1
* Applying /etc/sysctl.conf ...
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
net.ipv4.conf.all.log_martians = 1
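
A quick way to confirm that routing is really on is to read the two forwarding keys back from procfs. This is a minimal sketch: the function names and the PROC_SYS override are assumptions made for the example, and the dot-to-slash conversion only holds for keys without dots in interface names, which is the case here.

```shell
# Hypothetical helper: read a sysctl key straight from procfs.
# PROC_SYS can be overridden for testing; it defaults to /proc/sys.
sysctl_value() {
    # net.ipv4.ip_forward -> net/ipv4/ip_forward
    path=$(printf '%s' "$1" | tr '.' '/')
    cat "${PROC_SYS:-/proc/sys}/$path" 2>/dev/null
}

# Report the state of IPv4 and IPv6 forwarding, one line per key.
check_routing() {
    for key in net.ipv4.ip_forward net.ipv6.conf.all.forwarding; do
        if [ "$(sysctl_value "$key")" = "1" ]; then
            echo "$key: on"
        else
            echo "$key: OFF"
        fi
    done
}
```

Calling check_routing on the host after sudo sysctl --system should report both keys as on.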

Configure dnsmasq

$ sudo aptitude -y install dnsmasq
$ sudo aptitude versions dnsmasq
i   2.85-1       testing     500

Edit configuration file /etc/dnsmasq.conf to set the following parameters for container addressing and name resolution.

$ egrep -v '(^#|^$)' /etc/dnsmasq.conf
dnssec
local=/localCloud/
server=9.9.9.9@enp0s2
server=2620:fe::fe%enp0s2
no-dhcp-interface=enp0s2
no-hosts
expand-hosts
domain=localCloud
dhcp-range=192.0.2.10,192.0.2.100,1h
dhcp-range=fdc0:2::,ra-names
enable-ra
dhcp-option=option6:dns-server,[fdc0:2::1],[2620:fe::fe]

Don't forget to restart the service after editing the configuration file.

$ sudo systemctl restart dnsmasq.service
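
Since a typo in /etc/dnsmasq.conf leaves the containers without addressing and name resolution, it may be worth checking the syntax before restarting. The sketch below relies on the dnsmasq --test option, which only reads and checks the configuration. The function name reload_dnsmasq is made up for this example.

```shell
# Hypothetical helper: restart dnsmasq only if the configuration parses.
reload_dnsmasq() {
    if sudo dnsmasq --test; then
        sudo systemctl restart dnsmasq.service
    else
        echo "dnsmasq configuration has errors, service left untouched" >&2
        return 1
    fi
}
```

Run reload_dnsmasq after each edit of the configuration file.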

Masquerade traffic outgoing from host interface

$ sudo iptables -t nat -A POSTROUTING -o enp0s2 -j MASQUERADE
$ sudo ip6tables -t nat -A POSTROUTING -o enp0s2 -j MASQUERADE
$ sudo aptitude -y install iptables-persistent

When the iptables-persistent package is installed, the two previous rules have to be saved. They will be restored at host reboot.

In order to check rules, just run these two commands: sudo iptables -t nat -vnL and/or sudo ip6tables -t nat -vnL
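
Running the -A commands twice appends the same rule twice. The sketch below adds each rule only when the -C check does not find it, then saves the rule sets with netfilter-persistent, the tool that comes along with iptables-persistent on Debian. The function names are made up for this example, and enp0s2 has to match your own outgoing interface.

```shell
# Add the MASQUERADE rule for IPv4 and IPv6 only if it is not already present.
ensure_masquerade() {
    for cmd in iptables ip6tables; do
        sudo "$cmd" -t nat -C POSTROUTING -o enp0s2 -j MASQUERADE 2>/dev/null ||
            sudo "$cmd" -t nat -A POSTROUTING -o enp0s2 -j MASQUERADE
    done
}

# Save the current rule sets so that they are restored at boot.
save_rules() {
    sudo netfilter-persistent save
}
```

A typical sequence is ensure_masquerade followed by save_rules.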


Install lxd

$ sudo apt -y install snapd
$ sudo snap install lxd

If sudo snap install lxd fails with the error dial unix /run/snapd.socket: connect: connection refused, we need to switch the snap-confine AppArmor profile to complain mode.

$ sudo aa-status
apparmor module is loaded.
9 profiles are loaded.
9 profiles are in enforce mode.
   /usr/bin/man
   /usr/lib/snapd/snap-confine
   /usr/lib/snapd/snap-confine//mount-namespace-capture-helper
   /usr/sbin/haveged
   lsb_release
   man_filter
   man_groff
   nvidia_modprobe
   nvidia_modprobe//kmod
0 profiles are in complain mode.
0 profiles are in kill mode.
0 profiles are in unconfined mode.
1 processes have profiles defined.
1 processes are in enforce mode.
   /usr/sbin/haveged (338)
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.
0 processes are in mixed mode.
0 processes are in kill mode.

The aa-status command shows that /usr/lib/snapd/snap-confine is in enforce mode. That's why access to the socket is refused. Let's switch to complain mode for now.

$ sudo aa-complain /usr/lib/snapd/snap-confine
Setting /usr/lib/snapd/snap-confine to complain mode.
$ sudo systemctl restart snapd.socket

Then, we are able to install lxd snap and get the list of installed snaps.

$ snap list
Name    Version   Rev    Tracking       Publisher   Notes
core20  20210702  1081   latest/stable  canonical✓  base
lxd     4.18      21497  latest/stable  canonical✓  -
snapd   2.51.7    13170  latest/stable  canonical✓  snapd

The normal user etu is the UNprivileged user and must belong to the lxd group. Log out and log back in to make it effective.

$ sudo adduser etu lxd
$ id | grep -o lxd
lxd
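
Note that id without argument reports the groups of the current session, while id with a user name reads the group database. A small sketch of the second kind of check, with a helper name (in_group) invented for this example:

```shell
# Hypothetical helper: succeed if user $1 is a member of group $2
# according to the group database (no need to log out and back in).
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

if in_group etu lxd; then
    echo "etu belongs to lxd"
else
    echo "etu is not in lxd yet: run sudo adduser etu lxd"
fi
```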

Set the default container profile

This job is done with the lxd init instruction, which has many options. The main point here is to refuse local network bridge creation and use our Open vSwitch instead.

$ lxd init
WARNING: cgroup v2 is not fully supported yet, proceeding with partial confinement
Would you like to use LXD clustering? (yes/no) [default=no]:
Do you want to configure a new storage pool? (yes/no) [default=yes]:
Name of the new storage pool [default=default]:
Name of the storage backend to use (btrfs, dir, lvm, ceph) [default=btrfs]:
Create a new BTRFS pool? (yes/no) [default=yes]:
Would you like to use an existing empty block device (e.g. a disk or partition)? (yes/no) [default=no]:
Size in GB of the new loop device (1GB minimum) [default=13GB]:
Would you like to connect to a MAAS server? (yes/no) [default=no]:
Would you like to create a new local network bridge? (yes/no) [default=yes]: no
Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: yes
Name of the existing bridge or host interface: sw-vlan10
Would you like the LXD server to be available over the network? (yes/no) [default=no]:
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:
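
The same answers can be given without the interactive dialog through lxd init --preseed, which reads a YAML document on standard input. The sketch below writes such a file. The key names follow the LXD 4.x preseed layout as far as I know, so treat them as assumptions and compare with the output of the last question of the dialog. This sketch sets nictype to bridged directly. The PRESEED_FILE variable and the /tmp path are choices made for this example.

```shell
# Write a preseed file mirroring the interactive answers (LXD 4.x keys assumed).
preseed=${PRESEED_FILE:-/tmp/lxd-preseed.yaml}
cat > "$preseed" <<'EOF'
config: {}
networks: []
storage_pools:
- name: default
  driver: btrfs
  config:
    size: 13GB
profiles:
- name: default
  devices:
    eth0:
      name: eth0
      type: nic
      nictype: bridged
      parent: sw-vlan10
    root:
      path: /
      pool: default
      type: disk
EOF

# Apply it (left commented out on purpose, review the file first):
#   lxd init --preseed < "$preseed"
```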

We have to change the nictype from macvlan to bridged, and we are done with the default profile.

$ lxc profile device set default eth0 nictype bridged
$ lxc profile device get default eth0 nictype
bridged

Create the first two lxc containers

$ lxc launch images:debian/bullseye c0
Creating c0
Starting c0
$ lxc launch images:debian/bullseye c1
Creating c1
Starting c1
$ lxc ls
WARNING: cgroup v2 is not fully supported yet, proceeding with partial confinement
+------+---------+-------------------+-----------------------------------+-----------+-----------+
| NAME |  STATE  |       IPV4        |               IPV6                |   TYPE    | SNAPSHOTS |
+------+---------+-------------------+-----------------------------------+-----------+-----------+
| c0   | RUNNING | 192.0.2.70 (eth0) | fdc0:2::216:3eff:fe6b:6fa9 (eth0) | CONTAINER | 0         |
+------+---------+-------------------+-----------------------------------+-----------+-----------+
| c1   | RUNNING | 192.0.2.45 (eth0) | fdc0:2::216:3eff:fe25:4288 (eth0) | CONTAINER | 0         |
+------+---------+-------------------+-----------------------------------+-----------+-----------+
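
The table is handy for reading but awkward for scripts. lxc list can also print CSV with a chosen set of columns (n for name, s for state, 4 for IPv4). A small sketch built on that, with a helper name (missing_ipv4) invented for this example, lists the containers that did not get a DHCP lease yet:

```shell
# Hypothetical helper: read "name,state,ipv4" CSV lines on stdin and
# print the names of the containers whose IPv4 column is empty.
missing_ipv4() {
    awk -F, '$3 == "" { print $1 }'
}

# On the host:
#   lxc list --format csv -c ns4 | missing_ipv4
```

An empty result means every container has an IPv4 address.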

Test the first lxc container

$ lxc exec c0 -- /bin/bash
root@c0:~# 

Addressing

root@c0:~# ip addr ls dev eth0
8: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:6b:6f:a9 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.0.2.70/24 brd 192.0.2.255 scope global dynamic eth0
       valid_lft 3536sec preferred_lft 3536sec
    inet6 fdc0:2::216:3eff:fe6b:6fa9/64 scope global dynamic mngtmpaddr
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fe6b:6fa9/64 scope link
       valid_lft forever preferred_lft forever

Routing and name resolution at the same time

root@c0:~# apt update
Get:1 http://security.debian.org bullseye-security InRelease [44.1 kB]
Hit:2 http://deb.debian.org/debian bullseye InRelease
Fetched 44.1 kB in 1s (75.2 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.

Ping other container

root@c0:~# ping -c2 c1
PING c1(c1 (fdc0:2::216:3eff:fe25:4288)) 56 data bytes
64 bytes from c1 (fdc0:2::216:3eff:fe25:4288): icmp_seq=1 ttl=64 time=0.501 ms
64 bytes from c1 (fdc0:2::216:3eff:fe25:4288): icmp_seq=2 ttl=64 time=0.079 ms

--- c1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.079/0.290/0.501/0.211 ms

A very first step toward automation

Update package database on containers

$ for c in c0 c1; do lxc exec "$c" -- apt update; done
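
The list of names quickly gets out of date when containers are added. A possible next step, sketched below with a function name (in_all_containers) invented for this example, takes the names of the running containers from lxc list in CSV format and runs the given command in each of them.

```shell
# Run the given command in every container whose state is RUNNING.
in_all_containers() {
    for c in $(lxc list --format csv -c ns | awk -F, '$2 == "RUNNING" { print $1 }'); do
        echo "== $c"
        lxc exec "$c" -- "$@"
    done
}

# Examples:
#   in_all_containers apt update
#   in_all_containers apt -y full-upgrade
```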

Check OvS ports and TCAM on host

Display OvS main configuration

$ sudo ovs-vsctl show
45a45c22-efb1-4d2a-804d-76e6b30d9fc1
    Bridge C-3PO
        Port sw-vlan10
            tag: 10
            Interface sw-vlan10
                type: internal
        Port veth7e17b2b5
            tag: 10
            Interface veth7e17b2b5
        Port veth2ef19732
            tag: 10
            Interface veth2ef19732
        Port C-3PO
            Interface C-3PO
                type: internal
    ovs_version: "2.15.0"

Say hello to VLAN 10 neighborhood

$ ping -c2 ff02::1%sw-vlan10
PING ff02::1%sw-vlan10(ff02::1%sw-vlan10) 56 data bytes
64 bytes from fe80::6427:8ff:fe94:6d4e%sw-vlan10: icmp_seq=1 ttl=64 time=0.108 ms
64 bytes from fe80::216:3eff:fe25:4288%sw-vlan10: icmp_seq=1 ttl=64 time=0.698 ms
64 bytes from fe80::216:3eff:fe6b:6fa9%sw-vlan10: icmp_seq=1 ttl=64 time=0.790 ms
64 bytes from fe80::6427:8ff:fe94:6d4e%sw-vlan10: icmp_seq=2 ttl=64 time=0.078 ms

--- ff02::1%sw-vlan10 ping statistics ---
2 packets transmitted, 2 received, +2 duplicates, 0% packet loss, time 1011ms
rtt min/avg/max/mdev = 0.078/0.418/0.790/0.327 ms
$ ip nei ls dev sw-vlan10
192.0.2.70 lladdr 00:16:3e:6b:6f:a9 STALE
192.0.2.45 lladdr 00:16:3e:25:42:88 STALE
fdc0:2::216:3eff:fe6b:6fa9 lladdr 00:16:3e:6b:6f:a9 STALE
fe80::6427:8ff:fe94:6d4e lladdr 66:27:08:94:6d:4e router STALE
fdc0:2::216:3eff:fe25:4288 lladdr 00:16:3e:25:42:88 STALE
fe80::216:3eff:fe6b:6fa9 lladdr 00:16:3e:6b:6f:a9 STALE
fe80::216:3eff:fe25:4288 lladdr 00:16:3e:25:42:88 STALE

Display OvS MAC address table (TCAM)

$ sudo ovs-appctl fdb/show C-3PO
 port  VLAN  MAC                Age
    1    10  66:27:08:94:6d:4e   64
    4    10  00:16:3e:25:42:88   64
    3    10  00:16:3e:6b:6f:a9   64
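
With more VLANs on C-3PO the table grows, and filtering by VLAN helps. The sketch below reads the fdb/show output on standard input and keeps the port and MAC address of the entries learned on one VLAN. The helper name fdb_vlan is made up for this example.

```shell
# Hypothetical helper: print "port mac" for every entry learned on VLAN $1.
# Reads the output of ovs-appctl fdb/show on stdin and skips the header line.
fdb_vlan() {
    awk -v vlan="$1" 'NR > 1 && $2 == vlan { print $1, $3 }'
}

# On the host:
#   sudo ovs-appctl fdb/show C-3PO | fdb_vlan 10
```

The MAC addresses can then be compared with those shown by ip addr ls dev eth0 inside each container.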