@mortn
Last active March 2, 2024 15:29
Systemd template unit for controlling Cloud-hypervisor guests

ch@.service
[Unit]
Description=Cloud-Hypervisor for %i
After=network.target
After=local-fs.target
After=remote-fs.target
After=systemd-logind.service
After=systemd-machined.service
Wants=network.target
[Service]
SyslogLevel=debug
Type=simple
LogsDirectory=ch
StateDirectory=ch
StandardOutput=append:/var/log/ch/%i.stdout
WorkingDirectory=/var/lib/ch
RuntimeDirectory=ch
RuntimeDirectoryMode=0775
RuntimeDirectoryPreserve=yes
EnvironmentFile=/var/lib/ch/%i.env
ExecStartPre=/bin/bash -c 'for IF in ${CH_BRS};do ip l sh ${IF%:*} >/dev/null 2>&1 || (ip tuntap add ${IF%:*} mode tap && ip l s ${IF%:*} master ${IF#*:});done'
ExecStart=/bin/bash -c 'cloud-hypervisor --api-socket $RUNTIME_DIRECTORY/%i-sock ${CH_CONFIG}'
ExecStop=/bin/bash -c 'ch-remote --api-socket $RUNTIME_DIRECTORY/%i-sock shutdown-vmm'
ExecStop=/bin/bash -c 'for IF in ${CH_BRS};do ip l sh ${IF%:*} >/dev/null 2>&1 && ip l del ${IF%:*};done'
#ExecStop=/bin/bash -c 'rm -fv $RUNTIME_DIRECTORY/%i-sock'
ExecReload=/bin/bash -c 'ch-remote --api-socket $RUNTIME_DIRECTORY/%i-sock reboot'
[Install]
WantedBy=multi-user.target

create-cloud-init.sh
#!/bin/bash
set -e
#set -x
#[[ "$EUID" -ne 0 ]] && echo "Please run as root" && exit
# ./create-cloud-init.sh trax 44:c1 10.0.1.41/26
usage(){
  echo "Usage: $0 <name> <mac-suffix> <ip/prefix> [gw] [nssearch] [nsaddr]"
}
# Braces, not parentheses: an exit inside ( ) only leaves the subshell.
[[ $# -lt 3 ]] && { usage; exit 1; }
[[ $1 == *.* ]] && { echo "No dots in arg1"; usage; exit 1; }
_dir="/var/lib/ch"
vm="$1"
c_init="${_dir}/${vm}-init.img"
#set -x
bridge="brvirt"
[[ $2 == *:* ]] && mac="22:22:22:22:$2" || mac="22:22:22:22:aa:a1"
[[ $3 == *.*.*.* ]] && ip="$3" || ip="10.0.1.60/26"
[[ $4 == *.*.*.* ]] && gw="$4" || gw="10.0.1.1"
[[ $5 == *.* ]] && nssearch="$5" || nssearch="h3m,h3m.li"
[[ $6 == *.*.*.* ]] && nsaddr="$6" || nsaddr="10.0.1.5"
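gen_nw_cfg(){
  # Write a cloud-init network-config (version 2). YAML nesting is significant.
  # The MAC stays quoted so an all-digit address is not read as a number.
  cat > network-config <<EOF
version: 2
ethernets:
  ens2:
    match:
      macaddress: "$1"
    addresses: ["$2"]
    nameservers:
      search: [$4]
      addresses: ["$5"]
    routes:
      - to: default
        via: "$3"
EOF
}
gen_nw_cfg "$mac" "$ip" "$gw" "$nssearch" "$nsaddr"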
printf "CH_CONFIG=--kernel ./hypervisor-fw \
--cpus boot=2 \
--memory size=2G,shared=on \
--disk path=$vm.raw --disk path=${c_init##*/} \
--net tap=${vm},mac=${mac} \
--serial tty --console off \
--log-file /var/log/ch/${vm}.log -v
CH_BRS="${vm}:${bridge}"
" > "${_dir}/${vm}.env"
printf 'instance-id: %s\nlocal-hostname: %s\n' "$vm" "$vm" > meta-data
## cloud-localds -v -H $vm -N network-config.${vm} $c_init user-data.${vm}
[[ -f "${c_init}" ]] && sudo rm -vf "${c_init}"
#mkdosfs -n CIDATA -C "${c_init}" 4096 > /dev/null
mkdosfs -n CIDATA -C "${c_init}" 64 > /dev/null
mcopy -oi "${c_init}" -s user-data ::
mcopy -oi "${c_init}" -s meta-data ::
mcopy -oi "${c_init}" -s network-config ::
cat meta-data network-config
rm -f meta-data network-config

example.env
CH_CONFIG=--kernel ./hypervisor-fw --cpus boot=2 --memory size=8G,shared=on --disk path=px7.raw --net tap=px7,mac=22:22:22:14:fa:a0 --serial tty --console off --log-file /var/log/ch/px7.log -v
CH_BRS=px7:brvirt
@mortn
Author

mortn commented Sep 24, 2023

Made with simplicity and minimal dependencies in mind. [1]
A systemd template unit gives consistent control of cloud-hypervisor processes to, for instance, avoid accidental duplicate API sockets and log files. [4]

Prerequisite: cloud-hypervisor and rust-hypervisor-firmware-bin packages are installed.

Setup

  • Prepare disks for the VM guest, for instance a primary cloud image of your liking and a cloud-init image.
  • Now create the state directory: sudo mkdir /var/lib/ch
  • sudo ln /usr/share/cloud-hypervisor/hypervisor-fw /var/lib/ch/hypervisor-fw [2]
  • Copy the systemd unit ch@.service to /etc/systemd/system
  • Create an environment file for your VM guest in /var/lib/ch as shown in example.env
  • Enable autostart with systemctl enable ch@example
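The setup steps above can be sanity-checked before enabling the unit. A minimal sketch, assuming the layout described here (/var/lib/ch holding hypervisor-fw and one NAME.env per guest); check_ch_setup is a hypothetical helper, not part of the gist:

```shell
# Hypothetical helper: list the files a ch@NAME instance would be missing.
# Usage: check_ch_setup NAME [STATE_DIR]
# Returns 0 when both files exist, 1 otherwise.
check_ch_setup() {
  local vm="$1" dir="${2:-/var/lib/ch}" missing=0
  local f
  for f in "${dir}/hypervisor-fw" "${dir}/${vm}.env"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

It could be chained in front of the last step, for example check_ch_setup example && sudo systemctl enable ch@example.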

Networking

Create a bridge interface on the VM host to attach the VM guest's tap device to. Note the syntax of the CH_BRS line in the environment file: each entry is a tap:bridge pair that maps a tap device name to a VM host bridge device, and it is consumed by the ExecStartPre= line of the unit. It's ugly, but it works and has no dependencies. [3]
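To make the CH_BRS syntax concrete, here is a minimal sketch of how the two parameter expansions used in the unit split each tap:bridge pair. It only prints the ip commands instead of running them. plan_taps is a hypothetical name, and ip link set is the long form of the unit's abbreviated ip l s.

```shell
# Dry-run sketch: print the commands ExecStartPre= would run for each
# "tap:bridge" pair in a CH_BRS-style, whitespace-separated list.
plan_taps() {
  local pair tap bridge
  for pair in $1; do        # unquoted on purpose: split on whitespace
    tap="${pair%:*}"        # strip the shortest ":..." suffix -> tap device
    bridge="${pair#*:}"     # strip the shortest "...:" prefix -> host bridge
    echo "ip tuntap add ${tap} mode tap"
    echo "ip link set ${tap} master ${bridge}"
  done
}

plan_taps "px7:brvirt"
# prints:
#   ip tuntap add px7 mode tap
#   ip link set px7 master brvirt
```

A list with several pairs, such as "px7:brvirt px7b:brdmz" (the second pair is made up for illustration), is handled the same way, one pair per loop iteration.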

Notes

  • [1] The systemd template unit was intended to execute the cloud-hypervisor binary directly in ExecStart= (and not via a bash process). However, I couldn't convince the native non-Bash systemd environment to parse the arguments correctly without quoting them and thus b0rking the process. systemd-escape didn't help. I opted for just wrapping everything in a bash process.
  • [2] It seems that AppArmor on the Debian 12 hypervisor host didn't allow the process to access the hypervisor-fw file when it was referred to directly, as in the argument --kernel /usr/share/cloud-hypervisor/hypervisor-fw. Defining WorkingDirectory= and putting a hard link there fixed it for now.
  • [3] When scripting/automating this setup to map multiple interfaces on multiple VM guests across multiple host bridges, the orchestration is tricky to get just right without risking that VMs interfere with each other's device identifiers/names.
    • Using guest names (%i) for tap device names limits the number of interfaces per guest to one, or it sends us into a naming scheme like adding an incremental suffix, which clutters the view of which tap goes to which bridge.
    • Orchestrating the tap device setup with file descriptors, which requires a horrible syntax, seems too error-prone.
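
On note [1]: the underlying requirement is that the many options stored in CH_CONFIG reach cloud-hypervisor as separate arguments rather than as one long string. A minimal bash sketch of that difference (count_args is a hypothetical helper used only for illustration, and the CH_CONFIG value is shortened from example.env):

```shell
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

CH_CONFIG="--cpus boot=2 --memory size=2G,shared=on"

count_args "${CH_CONFIG}"   # quoted: the whole string is one argument -> 1
count_args ${CH_CONFIG}     # unquoted: split on whitespace            -> 4
```

systemd.service(5) describes a similar distinction for unit files: ${FOO} is substituted as a single word, while $FOO written as a word of its own is split at whitespace. Whether that alone would allow a bash-free ExecStart= here has not been tested.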

@sdake

sdake commented Sep 29, 2023

@mortn love it! I have a few things here; https://github.com/artificialwisdomai/origin/tree/main/platform/node

I will also paste a few of my scripts (definitely not done, definitely work in progress, definitely not working).

The unit file:

sdake@beast-06:/usr/lib/systemd/system$ cat sdacctl@.service
[Unit]
Description=Artificial Wisdom™ SDAC™ core %I
ConditionPathExists=/etc/sdacctl.d/%I

[Service]
# StateDirectory= and RuntimeDirectory= take names relative to /var/lib and /run
StateDirectory=artificialwisdom/%I
RuntimeDirectory=artificialwisdom/%I
ExecStart=/usr/local/bin/cloud-hypervisor --kernel $STATE_DIRECTORY/hypervisor-fw --disk path=$STATE_DIRECTORY/baseline.img,direct=on --disk path=$STATE_DIRECTORY/cloudinit_conifg.img,readonly=on,direct=on --api-socket $RUNTIME_DIRECTORY/api.sock --vsock cid=3,socket=$RUNTIME_DIRECTORY/vsock --serial tty --console pty --cpus boot=32 --net fd=3,mac=${mac} -v --cmdline rd.module_blacklist=nouveau,nvidiafb console=tty0 root=/dev/vda1 rw --memory size=128G,hugepages=on,hugepage_size=2M --device path=/sys/bus/pci/devices/0000:%I:00.0 --log-file $LOGS_DIRECTORY/%I.log

# 3<>$"${tapdevice}" \
#       --fs tag="homefs,socket=${virtiofs_sock},num_queues=1,queue_size=512" \

[Install]
WantedBy=multi-user.target

And an isolation slice:

sdake@beast-06:/usr/local/bin$ cat sdac.slice
[Unit]
Description=Software Defined Accelerated Compute (SDAC™) Slice
Documentation=man:systemd.special(7)
Before=slices.target

And finally, the overridable configuration:

sdake@beast-06:/etc/sdacctl.d$ cat 00
A=100
B=100

I would be pleased to work with you on this, to come to a common implementation, or alternatively to learn from each other. As you mention, bridge networking is a little challenging. As you can see, my bridge networking is incomplete, although I have a complete working example in bash.

@sdake

sdake commented Sep 29, 2023

One idea I had wanted to try was using a generator, although it seems too complex. As you can see (if you click through to the GitHub link), there is a lot going on when I launch virtual machines that use VFIO.
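
For reference, a generator along those lines could be fairly small. A rough sketch, assuming one file per instance in /etc/sdacctl.d and the sdacctl@.service template shown above; gen_sdac_units and the symlink layout are illustrative, not a tested implementation. systemd invokes generators with three output directories as arguments, and the first one (normal priority) is used here.

```shell
# Sketch: for every file NAME in the config directory, pull
# sdacctl@NAME.service into multi-user.target via a .wants/ symlink,
# the same kind of link that systemctl enable creates for a template instance.
# Usage: gen_sdac_units NORMAL_DIR [CONF_DIR] [TEMPLATE]
gen_sdac_units() {
  local out="$1" conf="${2:-/etc/sdacctl.d}"
  local template="${3:-/usr/lib/systemd/system/sdacctl@.service}"
  local f name
  mkdir -p "${out}/multi-user.target.wants"
  for f in "${conf}"/*; do
    [ -f "$f" ] || continue          # also skips an unexpanded glob
    name="${f##*/}"                  # file name becomes the instance name
    ln -sf "${template}" \
      "${out}/multi-user.target.wants/sdacctl@${name}.service"
  done
}

# When installed as a generator, systemd passes the output directory as $1:
# gen_sdac_units "$1"
```

This covers only the enabling of instances. The VFIO and networking setup mentioned above would still have to live in the unit or in helper scripts.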
