acsulli/okd_libvirt.md

## okd_libvirt.md

      
Deploying OKD using libvirt

For this environment, we'll be using these hostname/IP combinations:

helper = 192.168.110.39
bootstrap = 192.168.110.60
controlplane-0 = 192.168.110.61
controlplane-1 = 192.168.110.62
controlplane-2 = 192.168.110.63
worker-0 = 192.168.110.65
worker-1 = 192.168.110.66


Configure libvirt network and storage
Using your tool of choice, configure the libvirt network. You can create a new one or modify the default if desired:

Assign a name, e.g. okd
Mode = NAT
IPv4 config

Choose any subnet that doesn't overlap with your external network. I'll be using 192.168.110.0/24
Disable DHCP
DNS domain name: do not use your regular domain name. I'm using okd.lan
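
For readers who prefer the command line, the settings above could be captured in a network definition roughly like the one below. This is a sketch under assumptions: the file name okd-net.xml is arbitrary, 192.168.110.1 is assumed to be the bridge (gateway) address, and libvirt picks the bridge name itself, so it may not be virbr1 on your host. Leaving out a `<dhcp>` element is what keeps DHCP disabled.

```shell
# write a libvirt network definition matching the settings listed above
# (okd-net.xml is an arbitrary file name chosen for this example)
cat << 'EOF' > okd-net.xml
<network>
  <name>okd</name>
  <forward mode='nat'/>
  <domain name='okd.lan'/>
  <ip address='192.168.110.1' netmask='255.255.255.0'/>
</network>
EOF

# then load, start, and persist it on the libvirt host:
#   sudo virsh net-define okd-net.xml
#   sudo virsh net-start okd
#   sudo virsh net-autostart okd
```

After starting the network, `sudo virsh net-info okd` reports the bridge name to use in the resolver configuration.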


After creating the new libvirt network, tell the local DNS resolver how to find the domain. Fedora 33 uses systemd-resolved, so use resolvectl to configure it.
# change these values to match your environment
#   virbr1 = the bridge interface to the libvirt network
#   okd.lan = the domain you chose
#   110.168.192.in-addr.arpa = the reverse subnet you're using
sudo resolvectl domain virbr1 '~okd.lan' '~110.168.192.in-addr.arpa'
sudo resolvectl default-route virbr1 false
sudo resolvectl dns virbr1 192.168.110.1

# verify settings
resolvectl domain
resolvectl dns
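
The reverse zone passed to resolvectl is the first three octets of the subnet in reverse order, followed by in-addr.arpa. A small helper can derive it. The function name `reverse_zone` is hypothetical, and the sketch assumes a /24 network and that awk is installed.

```shell
# Hypothetical helper: derive the in-addr.arpa zone for a /24 network.
reverse_zone() {
  # $1 is any IPv4 address inside the /24, e.g. 192.168.110.0
  echo "$1" | awk -F. '{ printf "%s.%s.%s.in-addr.arpa\n", $3, $2, $1 }'
}

reverse_zone 192.168.110.0   # prints 110.168.192.in-addr.arpa

# possible use with the command shown above:
#   sudo resolvectl domain virbr1 '~okd.lan' "~$(reverse_zone 192.168.110.0)"
```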
If needed, create a storage pool for the VM disks. The control plane nodes, in particular, need low-latency storage, e.g. SSD or NVMe.


Create and configure helper node
The helper node provides DNS and DHCP via dnsmasq, an HTTP server, and load balancing via haproxy. Choose any OS you like; I'll be using Fedora Server. Create the VM (1 CPU, 1 GiB memory), install the OS, and apply updates. Use a static IP address; I'm using 192.168.110.39.
SSH to the helper node for the following steps.


Install Podman, haproxy, and dnsmasq
dnf -y install podman haproxy dnsmasq


Configure dnsmasq
cat << EOF > /etc/dnsmasq.d/okd.conf
expand-hosts
domain-needed
domain=okd.lan

# OKD required
address=/api.cluster.okd.lan/192.168.110.39
address=/api-int.cluster.okd.lan/192.168.110.39
address=/.apps.cluster.okd.lan/192.168.110.39

# create node entries
address=/bootstrap.cluster.okd.lan/192.168.110.60
address=/controlplane-0.cluster.okd.lan/192.168.110.61
address=/controlplane-1.cluster.okd.lan/192.168.110.62
address=/controlplane-2.cluster.okd.lan/192.168.110.63
address=/worker-0.cluster.okd.lan/192.168.110.65
address=/worker-1.cluster.okd.lan/192.168.110.66
EOF

# enable and start dnsmasq
systemctl enable --now dnsmasq
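
Hand-typing one address= line per node makes it easy to mismatch a name and an IP. As a sketch, the node entries could instead be generated from a short table. The function name `node_entries` is hypothetical, and the name/octet pairs are the ones used in this guide.

```shell
# Hypothetical helper: turn "name last-octet" pairs into dnsmasq address= lines.
node_entries() {
  # $1 = cluster domain, $2 = first three octets of the subnet
  # the "name octet" pairs are read from standard input
  while read -r name octet; do
    [ -n "$name" ] || continue
    printf 'address=/%s.%s/%s.%s\n' "$name" "$1" "$2" "$octet"
  done
}

node_entries cluster.okd.lan 192.168.110 << 'EOF'
bootstrap 60
controlplane-0 61
controlplane-1 62
controlplane-2 63
worker-0 65
worker-1 66
EOF
```

The output can be compared against, or appended to, /etc/dnsmasq.d/okd.conf before restarting dnsmasq.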


Configure haproxy
cat << EOF > /etc/haproxy/haproxy.cfg
global
  log         127.0.0.1 local2
  
  chroot      /var/lib/haproxy
  pidfile     /var/run/haproxy.pid
  maxconn     4000
  user        haproxy
  group       haproxy
  daemon
  
  stats socket /var/lib/haproxy/stats

defaults
  mode                    tcp
  log                     global
  option                  httplog
  option                  dontlognull
  option http-server-close
  option forwardfor       except 127.0.0.0/8
  option                  redispatch
  retries                 3
  timeout http-request    10s
  timeout queue           1m
  timeout connect         10s
  timeout client          10m
  timeout server          10m
  timeout http-keep-alive 10s
  timeout check           10s
  maxconn                 3000

listen stats
  bind :9000
  mode http
  stats enable
  stats uri /
  monitor-uri /healthz

frontend openshift-api-server
  bind *:6443
  default_backend openshift-api-server
  option tcplog

backend openshift-api-server
  balance source
  server bootstrap 192.168.110.60:6443 check
  server controlplane0 192.168.110.61:6443 check
  server controlplane1 192.168.110.62:6443 check
  server controlplane2 192.168.110.63:6443 check

frontend machine-config-server
    bind *:22623
    default_backend machine-config-server
    option tcplog

backend machine-config-server
    balance source
    server bootstrap 192.168.110.60:22623 check
    server controlplane0 192.168.110.61:22623 check
    server controlplane1 192.168.110.62:22623 check
    server controlplane2 192.168.110.63:22623 check

frontend ingress-http
    bind *:80
    default_backend ingress-http
    option tcplog

backend ingress-http
    mode http
    balance source
    server controlplane0-http-router 192.168.110.61:80 check
    server controlplane1-http-router 192.168.110.62:80 check
    server controlplane2-http-router 192.168.110.63:80 check
    server worker0-http-router 192.168.110.65:80 check
    server worker1-http-router 192.168.110.66:80 check

frontend ingress-https
    bind *:443
    default_backend ingress-https
    option tcplog

backend ingress-https
    balance source
    server controlplane0-https-router 192.168.110.61:443 check
    server controlplane1-https-router 192.168.110.62:443 check
    server controlplane2-https-router 192.168.110.63:443 check
    server worker0-https-router 192.168.110.65:443 check
    server worker1-https-router 192.168.110.66:443 check
EOF

# enable and start haproxy
systemctl enable --now haproxy
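
Before starting the service, `haproxy -c -f /etc/haproxy/haproxy.cfg` checks the file for syntax errors. It can also help to list the ports the configuration binds, since those are the ones clients must be able to reach. The helper below is a sketch: the function name `haproxy_ports` is hypothetical, and it assumes each bind line carries a single address:port value, as in the configuration above.

```shell
# Hypothetical helper: list the ports named on "bind" lines of an haproxy config.
haproxy_ports() {
  # $1 = path to the config file; prints one port per line, sorted and de-duplicated
  awk '$1 == "bind" { n = split($2, parts, ":"); print parts[n] }' "$1" | sort -un
}

# on the helper node this could be used as:
#   haproxy_ports /etc/haproxy/haproxy.cfg
```

If firewalld is active on the helper (an assumption; these steps do not cover it), each listed port would need to be opened, for example with `firewall-cmd --permanent --add-port=<port>/tcp` followed by `firewall-cmd --reload`.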


Configure an http server using podman
The FCOS rootfs image used in this step is here.
# make sure the content directory exists before mounting it
mkdir -p /var/www/html

# start the httpd container on port 8080
podman run -d \
 --restart=unless-stopped \
 -p 8080:80 \
 -v /var/www/html:/usr/local/apache2/htdocs \
 docker.io/library/httpd:2.4-alpine


Download and place OKD resources
From the libvirt host, download the following resources:
From the OKD release page on GitHub, the openshift-client and openshift-install packages.
Extract the archives and move the binaries to /usr/local/bin:
# download
wget https://github.com/openshift/okd/releases/download/4.6.0-0.okd-2021-02-14-205305/openshift-install-linux-4.6.0-0.okd-2021-02-14-205305.tar.gz
wget https://github.com/openshift/okd/releases/download/4.6.0-0.okd-2021-02-14-205305/openshift-client-linux-4.6.0-0.okd-2021-02-14-205305.tar.gz

# unpack
tar xzf openshift-install-linux-4.6.0-0.okd-2021-02-14-205305.tar.gz
tar xzf openshift-client-linux-4.6.0-0.okd-2021-02-14-205305.tar.gz

# place
sudo mv openshift-install oc kubectl /usr/local/bin
rm README.md
From the helper node:
Links for the most recent binaries to download are here.
# organizational directories
mkdir -p /var/www/html/{install,ignition}

# download the kernel image
wget https://builds.coreos.fedoraproject.org/prod/streams/testing-devel/builds/33.20210212.20.1/x86_64/fedora-coreos-33.20210212.20.1-live-kernel-x86_64
mv fedora-coreos-33.20210212.20.1-live-kernel-x86_64 /var/www/html/install/kernel

# download the initramfs image
wget https://builds.coreos.fedoraproject.org/prod/streams/testing-devel/builds/33.20210212.20.1/x86_64/fedora-coreos-33.20210212.20.1-live-initramfs.x86_64.img
mv fedora-coreos-33.20210212.20.1-live-initramfs.x86_64.img /var/www/html/install/initramfs.img

# download the rootfs image
wget https://builds.coreos.fedoraproject.org/prod/streams/testing-devel/builds/33.20210212.20.1/x86_64/fedora-coreos-33.20210212.20.1-live-rootfs.x86_64.img
mv fedora-coreos-33.20210212.20.1-live-rootfs.x86_64.img /var/www/html/install/rootfs.img

# set permissions for all
chmod 444 /var/www/html/install/*
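
The three download URLs above differ only in the artifact suffix, so switching to another build means editing the stream and version in several places. A sketch of a helper that builds the URL from its parts follows. The function name `fcos_url` is hypothetical, and the URL layout is inferred from the commands above, so it may not hold for every stream or build.

```shell
# Hypothetical helper: build a Fedora CoreOS artifact URL from its parts.
fcos_url() {
  # $1 = stream, $2 = build version, $3 = artifact suffix
  echo "https://builds.coreos.fedoraproject.org/prod/streams/$1/builds/$2/x86_64/fedora-coreos-$2-$3"
}

STREAM=testing-devel
VERSION=33.20210212.20.1

fcos_url "$STREAM" "$VERSION" live-kernel-x86_64
fcos_url "$STREAM" "$VERSION" live-initramfs.x86_64.img
fcos_url "$STREAM" "$VERSION" live-rootfs.x86_64.img
```

Each printed URL can be passed to wget with `-O` to write straight to the target name, e.g. `wget -O /var/www/html/install/kernel "$(fcos_url "$STREAM" "$VERSION" live-kernel-x86_64)"`.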


Create install-config.yaml
From the libvirt host.
Substitute your values for the SSH key and adjust others as needed, e.g. networking.machineNetwork.cidr.
mkdir ~/okd && cd ~/okd

# a real pull secret is not needed for OKD
PULLSECRET='{"auths":{"fake":{"auth": "bar"}}}'

# use the path for your public key; this glob assumes exactly one key matches
SSHKEY=$(cat ~/.ssh/id_*.pub)

cat << EOF > install-config.yaml
apiVersion: v1
baseDomain: okd.lan
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  creationTimestamp: null
  name: cluster
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 192.168.110.0/24
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
publish: External
pullSecret: '$PULLSECRET'
sshKey: |
  $SSHKEY
EOF
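
The `~/.ssh/id_*.pub` glob used for SSHKEY works cleanly only when exactly one public key matches. With several keys the variable spans multiple lines, and only the first line ends up indented under sshKey, which would break the YAML. A guard could be sketched as follows. The function name `single_pubkey` is hypothetical.

```shell
# Hypothetical helper: print the path of the only id_*.pub in a directory,
# or fail if there is none or more than one.
single_pubkey() {
  # $1 = directory to search (for example ~/.ssh)
  set -- "$1"/id_*.pub
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one id_*.pub file" >&2
    return 1
  fi
  echo "$1"
}

# possible use in place of the glob:
#   SSHKEY=$(cat "$(single_pubkey ~/.ssh)")
```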


Create ignition files
From the libvirt host
This will be done in two phases so that the control plane can be marked unschedulable.
# create a working directory
cd ~/okd && mkdir cluster && cp install-config.yaml cluster/

# generate manifests
openshift-install create manifests --dir=cluster

# set the control plane non-schedulable
sed -i 's/mastersSchedulable: true/mastersSchedulable: false/' cluster/manifests/cluster-scheduler-02-config.yml

# generate ignition files
openshift-install create ignition-configs --dir=cluster

# copy the ignition files to the helper node
scp cluster/*.ign user@192.168.110.39:/var/www/html/ignition
You may need to adjust permissions for the files on the helper node so that the containerized web server can access them.
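
One way to make that permission adjustment is to make the ignition files world-readable on the helper node. This is a sketch, not a required step: the function name `publish_ignition` is hypothetical, and whether mode 644 is enough depends on your setup.

```shell
# Hypothetical helper: make every .ign file in a directory world-readable.
publish_ignition() {
  # $1 = directory holding the .ign files
  chmod 644 "$1"/*.ign
}

# on the helper node this could be used as:
#   publish_ignition /var/www/html/ignition
```

On a host with SELinux enforcing, the container may also need a matching label on the content directory, for example by adding podman's `:z` option to the volume mount. That is an assumption about the environment rather than something these steps require.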


Create and configure the VMs
From the libvirt host.
# create the disk images, set the directory according to your host
for NODE in bootstrap controlplane-0 controlplane-1 controlplane-2 worker-0 worker-1; do
  sudo qemu-img create -f qcow2 /var/lib/libvirt/okd-images/$NODE.qcow2 120G
done

# set permissions
sudo chown qemu:qemu /var/lib/libvirt/okd-images/*
sudo chmod 600 /var/lib/libvirt/okd-images/*

# organization is good
mkdir -p node-configs

wget http://192.168.110.39:8080/install/kernel && sudo mv kernel /var/lib/libvirt/boot/
wget http://192.168.110.39:8080/install/initramfs.img && sudo mv initramfs.img /var/lib/libvirt/boot/
sudo chown qemu:qemu /var/lib/libvirt/boot/kernel
sudo chown qemu:qemu /var/lib/libvirt/boot/initramfs.img

# where to find the install files needed for direct kernel boot of the VMs
KERNEL='/var/lib/libvirt/boot/kernel'
INITRD='/var/lib/libvirt/boot/initramfs.img'
KERNEL_ARGS='coreos.live.rootfs_url=http://192.168.110.39:8080/install/rootfs.img rd.neednet=1 coreos.inst.install_dev=/dev/vda'

# set static IP configuration
IP_STR='ip=192.168.110.NODEIP::192.168.110.1:255.255.255.0:NODENAME.cluster.okd.lan:enp1s0:none nameserver=192.168.110.39'
IP_bootstrap=$(echo $IP_STR | sed 's/NODEIP/60/;s/NODENAME/bootstrap/')
IP_controlplane0=$(echo $IP_STR | sed 's/NODEIP/61/;s/NODENAME/controlplane-0/')
IP_controlplane1=$(echo $IP_STR | sed 's/NODEIP/62/;s/NODENAME/controlplane-1/')
IP_controlplane2=$(echo $IP_STR | sed 's/NODEIP/63/;s/NODENAME/controlplane-2/')
IP_worker0=$(echo $IP_STR | sed 's/NODEIP/65/;s/NODENAME/worker-0/')
IP_worker1=$(echo $IP_STR | sed 's/NODEIP/66/;s/NODENAME/worker-1/')

# bootstrap ignition location
BOOTSTRAP_IGNITION='coreos.inst.ignition_url=http://192.168.110.39:8080/ignition/bootstrap.ign'

# create the bootstrap machine
sudo virt-install \
  --virt-type kvm \
  --ram 12188 \
  --vcpus 4 \
  --os-variant fedora-coreos-stable \
  --disk path=/var/lib/libvirt/okd-images/bootstrap.qcow2,device=disk,bus=virtio,format=qcow2 \
  --noautoconsole \
  --vnc \
  --network network:okd \
  --boot hd,network \
  --install kernel=${KERNEL},initrd=${INITRD},kernel_args_overwrite=yes,kernel_args="${KERNEL_ARGS} ${IP_bootstrap} ${BOOTSTRAP_IGNITION}" \
  --name bootstrap \
  --print-xml 1 > node-configs/bootstrap.xml

# set the ignition location for the control plane nodes
CONTROL_IGNITION='coreos.inst.ignition_url=http://192.168.110.39:8080/ignition/master.ign'

# define the nodes, set values according to your host and desired outcome
for NODE in controlplane-0 controlplane-1 controlplane-2; do
  # jiggery pokery to get the IP string via variable reference
  ipvarname="IP_$(echo $NODE | sed 's/-//')"
  
  sudo virt-install \
    --virt-type kvm \
    --ram 12188 \
    --vcpus 4 \
    --os-variant fedora-coreos-stable \
    --disk path=/var/lib/libvirt/okd-images/$NODE.qcow2,device=disk,bus=virtio,format=qcow2 \
    --noautoconsole \
    --vnc \
    --network network:okd \
    --boot hd,network \
    --install kernel=${KERNEL},initrd=${INITRD},kernel_args_overwrite=yes,kernel_args="${KERNEL_ARGS} ${!ipvarname} ${CONTROL_IGNITION}" \
    --name $NODE \
    --print-xml 1 > node-configs/$NODE.xml
done

# set the worker ignition location
WORKER_IGNITION='coreos.inst.ignition_url=http://192.168.110.39:8080/ignition/worker.ign'

for NODE in worker-0 worker-1; do
  ipvarname="IP_$(echo $NODE | sed 's/-//')"

  sudo virt-install \
    --virt-type kvm \
    --ram 8192 \
    --vcpus 2 \
    --os-variant fedora-coreos-stable \
    --disk path=/var/lib/libvirt/okd-images/$NODE.qcow2,device=disk,bus=virtio,format=qcow2 \
    --noautoconsole \
    --vnc \
    --network network:okd \
    --boot hd,network \
    --install kernel=${KERNEL},initrd=${INITRD},kernel_args_overwrite=yes,kernel_args="${KERNEL_ARGS} ${!ipvarname} ${WORKER_IGNITION}" \
    --name $NODE \
    --print-xml 1 > node-configs/$NODE.xml
done

# define each of the VMs
for XML in node-configs/*.xml; do
  # the domain name is the file name without the .xml suffix
  VM=$(basename "$XML" .xml)
  sudo virsh define "$XML"
  
  # for some reason libvirt doesn't like the kernel and initrd location, set it forcefully
  sudo virt-xml "$VM" --edit \
    --xml ./os/kernel=$KERNEL \
    --xml ./os/initrd=$INITRD
done
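
The static IP string passed on the kernel command line follows the dracut layout `ip=<client>:<peer>:<gateway>:<netmask>:<hostname>:<interface>:<autoconf>`. The peer field is left empty here and autoconf is set to none. As an alternative to the sed substitutions and indirect variable lookup, a single function could build the string for a node. The function name `node_ip_args` is hypothetical; the addresses, the enp1s0 interface name, and the domain are the ones used in this guide.

```shell
# Hypothetical helper: build the ip= and nameserver= kernel arguments for one node.
node_ip_args() {
  # $1 = node name, $2 = last octet of its address
  printf 'ip=192.168.110.%s::192.168.110.1:255.255.255.0:%s.cluster.okd.lan:enp1s0:none nameserver=192.168.110.39\n' "$2" "$1"
}

node_ip_args bootstrap 60
# prints: ip=192.168.110.60::192.168.110.1:255.255.255.0:bootstrap.cluster.okd.lan:enp1s0:none nameserver=192.168.110.39
```

Inside the virt-install loops, `$(node_ip_args "$NODE" 61)` would then stand in for `${!ipvarname}`, with the octet supplied per node.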


Install FCOS
Now we need to power on each of the VMs and let them boot the first time to install FCOS:
for VM in bootstrap controlplane-0 controlplane-1 controlplane-2 worker-0 worker-1; do
  sudo virsh start $VM
  
  # optionally, add a sleep here to not overwhelm the storage
  #sleep 30
done
The VMs will start, install FCOS, then power off. After they have powered off, adjust their settings so they boot from disk as normal:
for VM in bootstrap controlplane-0 controlplane-1 controlplane-2 worker-0 worker-1; do
  sudo virt-xml $VM --edit \
    --xml ./on_reboot=restart \
    --xml xpath.delete=./os/kernel \
    --xml xpath.delete=./os/initrd \
    --xml xpath.delete=./os/cmdline 
done
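
The edit above should only run once every VM has powered off. Rather than watching by hand, a polling loop could wait for each domain to report "shut off", which is the state string `virsh domstate` prints for a stopped domain. This is a sketch: the function name `wait_for_shutoff` and the `VIRSH` override variable are hypothetical, and the loop never returns if a domain name is wrong or the install hangs.

```shell
# command used to query libvirt; override VIRSH if sudo is not needed
VIRSH=${VIRSH:-"sudo virsh"}

# Hypothetical helper: block until every named domain reports "shut off".
wait_for_shutoff() {
  for vm in "$@"; do
    # poll every 10 seconds until the domain state matches
    until [ "$($VIRSH domstate "$vm")" = "shut off" ]; do
      sleep 10
    done
    echo "$vm is shut off"
  done
}

# possible use before editing the VM definitions:
#   wait_for_shutoff bootstrap controlplane-0 controlplane-1 controlplane-2 worker-0 worker-1
```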


Finish deploying
Finally, start the VMs so that OKD can finish deploying.
for VM in bootstrap controlplane-0 controlplane-1 controlplane-2 worker-0 worker-1; do
  sudo virsh start $VM
done
Monitor the progress and complete the deployment with these commands, run from the libvirt host.
cd ~/okd

# bootstrap
openshift-install wait-for bootstrap-complete --log-level=debug --dir=cluster

# turn off bootstrap when it's done
sudo virsh destroy bootstrap

# connect to the cluster
export KUBECONFIG=~/okd/cluster/auth/kubeconfig

# approve CSRs
watch -n 5 oc get csr

# when there are two CSRs pending, approve them
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve

# two additional CSRs will be requested shortly after, repeat the commands above to approve them

# when the CSRs are approved, wait for the install to complete
openshift-install wait-for install-complete --log-level=debug --dir=cluster


Fin.
Cleanup

for VM in bootstrap controlplane-0 controlplane-1 controlplane-2 worker-0 worker-1; do
  sudo virsh destroy $VM
  sudo virsh undefine --domain $VM
done

sudo rm /var/lib/libvirt/okd-images/bootstrap.qcow2
sudo rm /var/lib/libvirt/okd-images/controlplane*.qcow2
sudo rm /var/lib/libvirt/okd-images/worker*.qcow2