What this PR does / why we need it: PoC for using a Multus network with the KubeVirt provider.
Problem 1
We need to expose the ignition server as a LoadBalancer so it can be accessed over the public IP.
[ 342.877333] ignition[905]: GET https://ignition-server-clusters-live-migrate.apps.hypershift.qinqon.local/ignition: attempt #71
[ 342.903038] ignition[905]: GET error: Get "https://ignition-server-clusters-live-migrate.apps.hypershift.qinqon.local/ignition": dial tcp: lookup ignition-server-clusters-live-migrate.apps.hypershift.qinqon.local on 192.168.66.1:53: no such host
For the libvirt testbed we can point a wildcard DNS entry at one of the nodes in dnsmasq.d:
# cat /etc/NetworkManager/dnsmasq.d/qinqon.local.conf
# The below defines a Wildcard DNS Entry.
address=/.qinqon.local/192.168.122.34
and disable firewalld to bypass "192.168.66.184: ICMP host 192.168.66.1 unreachable - admin prohibited filter, length 131":
sudo systemctl disable --now firewalld
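A possible sketch of exposing the ignition server, assuming its Service is named ignition-server (an assumption; the clusters-live-migrate namespace is the one used later in this PoC), plus a check that the dnsmasq wildcard resolves:

# Sketch: switch the ignition server Service (assumed name) to type LoadBalancer
oc -n clusters-live-migrate patch service ignition-server --type merge -p '{"spec":{"type":"LoadBalancer"}}'
# Verify the wildcard entry resolves against the libvirt DNS
dig +short ignition-server-clusters-live-migrate.apps.hypershift.qinqon.local @192.168.66.1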
Problem 2
Looks like external routing is not working on the libvirt testbed:
time="2023-10-02T11:23:48Z" level=warning msg="Failed, retrying in 1s ... (1/3). Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d3f8b9638308e4945ce5d93b5645df0666c92fd25f15476b30172f60cabf227a: pinging container registry quay.io: Get \"https://quay.io/v2/\": dial tcp 52.54.61.57:443: i/o timeout"
Activate masquerading at the libvirt hypervisor:
iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
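A slightly more complete sketch of the same thing, restricted to the libvirt subnets used in this setup (eno1 is the uplink from the command above):

# Sketch: enable forwarding and masquerade only the libvirt subnets out of eno1
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -s 192.168.122.0/24 -o eno1 -j MASQUERADE
iptables -t nat -A POSTROUTING -s 192.168.66.0/24 -o eno1 -j MASQUERADE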
Problem 3
The hostname is not set correctly on the nodes.
[systemd]
Failed Units: 1
node-valid-hostname.service
[core@localhost ~]$ systemctl ^C
[core@localhost ~]$ hostnamectl
Static hostname: (unset)
Transient hostname: localhost
Icon name: computer-vm
Chassis: vm 🖴
Machine ID: 4fabdf5e8ca553a7b3bd7274a577e0e9
Boot ID: 0bcc0e7942c04622a25afc42437c5d6f
Virtualization: kvm
Operating System: Red Hat Enterprise Linux CoreOS 415.92.202309261919-0 (Plow)
CPE OS Name: cpe:/o:redhat:enterprise_linux:9::coreos
Kernel: Linux 5.14.0-284.34.1.el9_2.x86_64
Architecture: x86-64
Hardware Vendor: Red Hat
Hardware Model: OpenShift Virtualization
Firmware Version: 1.16.1-1.el9
This is what the script is checking:
#!/bin/bash
# First, we need to wait until DHCP finishes and the node has a non-`localhost`
# hostname before `kubelet.service` starts.
# That's the `--wait` argument as used by `node-valid-hostname.service`.
#
# Second, on GCP specifically we truncate the hostname if it's >63 characters.
# That's `gcp-hostname.service`.
We can check whether the Afterburn attributes work now: https://coreos.github.io/afterburn/usage/attributes/
It's possible to pass the hostname as a kernel argument, but it will fail since the other kernel args are needed too:
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: worker1
  annotations:
    kubevirt.io/allow-pod-bridge-network-live-migration: ""
spec:
  architecture: amd64
  domain:
    firmware:
      kernelBoot:
        container:
          image: quay.io/fedora/fedora-coreos:stable
          initrdPath: /usr/lib/modules/6.4.15-200.fc38.x86_64/initramfs.img
          kernelPath: /usr/lib/modules/6.4.15-200.fc38.x86_64/vmlinuz
          imagePullPolicy: Always
          imagePullSecret: IfNotPresent
        kernelArgs: mitigations=auto,nosmt ignition.firstboot ostree=/ostree/boot.1/fedora-coreos/5c26f6c66c4f1bc3ca5f1b5c99cc6dc8795495b718536b4468253056117ddad1/0 ignition.platform.id=kubevirt console=ttyS0,115200n8 console=tty0 hostname=foo
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
      interfaces:
      - bridge: {}
        name: pod
      rng: {}
    machine:
      type: q35
    resources:
      requests:
        memory: 512Mi
  networks:
  - pod: {}
    name: pod
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  terminationGracePeriodSeconds: 5
  volumes:
  - containerDisk:
      image: quay.io/fedora/fedora-coreos-kubevirt:stable
    name: containerdisk
  - cloudInitConfigDrive:
      userData: '{"ignition":{"version":"3.3.0"},"passwd":{"users":[{"name":"core","passwordHash":"$y$j9T$b7RFf2LW7MUOiF4RyLHKA0$T.Ap/uzmg8zrTcUNXyXvBvT26UgkC6zZUVg3UKXeEp5"}]},"storage":{"files":[{"path":"/etc/nmstate/001-dual-stack-dhcp.yml","contents":{"compression":"gzip","source":"data:;base64,H4sIAAAAAAAC/4zKQQrCMBCF4f2c4l1AUBAXc5sxfaGBOh2SScHbiy5cd/n9/M2TvVrhULnA7UUFPW7jKkC+48tc2Z0pwEhLKmYI0OK4qwAA3Z4bF0X2yV9Z1hJ/tjgep0bAZu5l96qotg3KJwAA//+PTU/JngAAAA=="}},{"path":"/etc/nmstate/002-dual-sack-ipv6-gw.yml","contents":{"compression":"","source":"data:;base64,cm91dGVzOgogIGNvbmZpZzoKICAtIGRlc3RpbmF0aW9uOiA6Oi8wCiAgICBuZXh0LWhvcC1pbnRlcmZhY2U6IGVucDFzMAogICAgbmV4dC1ob3AtYWRkcmVzczogZDdiOjZiNGQ6N2IyNTpkMjJmOjoxCg=="}}]}}'
    name: cloudinitdisk
Or set it directly over ssh with hostnamectl (looks like updating the libvirt network does not work), for example:
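# Sketch: set the hostname by hand over ssh (worker IP and FQDN are illustrative,
# taken from the values used elsewhere in this PoC)
ssh core@192.168.66.101 'sudo hostnamectl set-hostname worker1.qinqon.local && hostnamectl'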
Problem 4
Workers are not authorized to create nodes; looks like something has to be done with the kubeconfig living on the nodes.
"Unable to register node with API server" err="nodes is forbidden: User \"system:anonymous\" cannot create resource \"nodes\" in API group \"\" at the cluster scope" node="live-migrate-125e79f8-pxcbs.qinqon.local"
Problem 5
[ 212.924703] ignition[904]: GET error: Get "https://ignition-server-clusters-live-migrate.apps.hypershift.qinqon.local/ignition": dial tcp 192.168.122.253:443: connect: connection refused
[ 217.913046] ignition[904]: GET https://ignition-server-clusters-live-migrate.apps.hypershift.qinqon.local/ignition: attempt #46
Use libvirt forward mode='route' for both the default and secondary networks.
default
<network>
  <name>default</name>
  <forward mode='route'/>
  <bridge name='virbr0' stp='on' delay='0'/>
  <ip family='ipv4' address='192.168.122.1' prefix='24'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.199'/>
    </dhcp>
  </ip>
</network>
secondary
<network>
  <name>secondary</name>
  <forward mode='route'/>
  <bridge name='secondary' stp='on' delay='0'/>
  <ip family='ipv4' address='192.168.66.1' prefix='24'>
    <dhcp>
      <range start='192.168.66.2' end='192.168.66.199'/>
    </dhcp>
  </ip>
</network>
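A sketch of applying those definitions with virsh (the XML file names are illustrative):

# Sketch: redefine both networks with forward mode='route' and restart them
virsh net-define default.xml      # file containing the <network> definition above
virsh net-destroy default && virsh net-start default
virsh net-define secondary.xml
virsh net-destroy secondary && virsh net-start secondary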
Problem 6
4.15.0-ec.0 True False True 8m2s The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
error performing canary route check {"error": "error sending canary HTTP request to \"canary-openshift-ingress-canary.apps.live-migrate.apps.hypershift.qinqon.local\": Get \"https://canary-openshift-ingress-canary.apps.live-migrate.apps.hypershift.qinqon.local\": EOF"}
The passthrough service has to be a headless one where hypershift creates the endpoints pointing to the workers' IPs.
svc
# oc get svc -n clusters-live-migrate default-ingress-passthrough-service-hthfgh5pld -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2023-10-04T09:58:48Z"
  labels:
    hypershift.openshift.io/infra-id: live-migrate-hrdnr
  name: default-ingress-passthrough-service-hthfgh5pld
  namespace: clusters-live-migrate
  resourceVersion: "173880"
  uid: 1954fb4c-77cb-4abc-bd1e-a2a9ec345d57
spec:
  clusterIP: 172.30.129.187
  clusterIPs:
  - 172.30.129.187
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: https-443
    port: 443
    protocol: TCP
    targetPort: 31148
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
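A sketch of what the headless variant would look like; clusterIP is immutable, so the existing Service has to be recreated (all names are taken from the Service above):

# Sketch: recreate the passthrough service as a headless one
oc -n clusters-live-migrate delete svc default-ingress-passthrough-service-hthfgh5pld
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Service
metadata:
  name: default-ingress-passthrough-service-hthfgh5pld
  namespace: clusters-live-migrate
  labels:
    hypershift.openshift.io/infra-id: live-migrate-hrdnr
spec:
  clusterIP: None
  ports:
  - name: https-443
    port: 443
    protocol: TCP
    targetPort: 31148
EOF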
Manually created endpoints
# cat endpoints.yaml
apiVersion: v1
kind: Endpoints
metadata:
  namespace: clusters-live-migrate
  name: default-ingress-passthrough-service-hthfgh5pld
subsets:
- addresses:
  - ip: 192.168.66.101
  - ip: 192.168.66.102
  ports:
  - port: 31148
    name: https-443
    protocol: TCP
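Then apply the Endpoints and re-check the canary route:

# Sketch: apply the manual Endpoints and retry the canary check
oc apply -f endpoints.yaml
curl -k https://canary-openshift-ingress-canary.apps.live-migrate.apps.hypershift.qinqon.local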
Problem 7
MetalLB expects control-plane nodes to be present in order to install; removing those requirements makes the install work correctly.
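As a hedged sketch, if the requirement comes from a control-plane nodeSelector on one of the MetalLB deployments (the deployment and namespace names below are assumptions), it can be dropped with a JSON patch:

# Sketch (assumed names): drop the control-plane nodeSelector so the pods can land on workers
oc -n metallb-system patch deployment controller --type=json \
  -p '[{"op":"remove","path":"/spec/template/spec/nodeSelector/node-role.kubernetes.io~1control-plane"}]'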
Problem 8
When configuring a VLAN, the DHCP server from libvirt is not responding;
a trunk has to be configured on the hypervisor bridge for all interfaces:
bridge vlan add dev vnet23 vid 10 secondary
port:
- name: vnet23
  stp-hairpin-mode: false
  stp-path-cost: 100
  stp-priority: 32
  vlan:
    enable-native: false
    mode: trunk
    trunk-tags:
    - id: 10
    - id: 20
- name: vnet25
  stp-hairpin-mode: false
  stp-path-cost: 100
  stp-priority: 32
  vlan:
    enable-native: false
    mode: trunk
    trunk-tags:
    - id: 10
    - id: 20
- name: vnet27
  stp-hairpin-mode: false
  stp-path-cost: 100
  stp-priority: 32
  vlan:
    enable-native: false
    mode: trunk
    trunk-tags:
    - id: 10
    - id: 20
There is no way to do this with a libvirt network.
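An equivalent sketch of the manual workaround with iproute2, looping over the tap interfaces from the nmstate snippet above:

# Sketch: tag VLANs 10 and 20 as trunk on every VM tap interface of the bridge
for dev in vnet23 vnet25 vnet27; do
  for vid in 10 20; do
    bridge vlan add dev "$dev" vid "$vid"
  done
done
bridge vlan show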