@ismailbay
Last active February 21, 2024 16:02
sidero-baremetal

Sidero Metal (CAPI) bare metal Kubernetes Provisioning

VM with Docker as Controlplane

  • create a VM, install Docker and the Sidero prerequisites
  • bootstrap the controlplane node:
    # host ip of the VM
    export HOST_IP="192.168.20.51"
    
    talosctl cluster create \
    --name sidero \
    -p 67:67/udp,69:69/udp,8081:8081/tcp,51821:51821/udp \
    --workers 0 \
    --config-patch '[{"op": "add", "path": "/cluster/allowSchedulingOnControlPlanes", "value": true}]' \
    --endpoint $HOST_IP

OpenWrt adaptations for PXE

To boot over the network, I had to advertise Sidero's TFTP server via my OpenWrt router, which runs dnsmasq as the DHCP server for the worker machines. Add to /etc/config/dhcp on OpenWrt:

config boot linux
    option filename 'snp.efi'
    option serveraddress '192.168.20.51'
    option servername 'sidero'

Then restart dnsmasq:

/etc/init.d/dnsmasq restart
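To confirm the DHCP option points at a working boot server, you can fetch the advertised file directly from any LAN client. A sketch, assuming a curl build with TFTP protocol support; this only succeeds once Sidero itself is installed and answering on 192.168.20.51:

```shell
# Fetch the iPXE binary that the DHCP boot option advertises; a non-empty
# file means the Sidero TFTP server (port 69/udp) is reachable.
if curl -s --connect-timeout 3 -o /tmp/snp.efi tftp://192.168.20.51/snp.efi; then
  result="tftp ok: $(stat -c %s /tmp/snp.efi) bytes"
else
  result="tftp fetch failed"
fi
echo "$result"
```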

Install Sidero

Execute in the VM shell:

export SIDERO_CONTROLLER_MANAGER_AUTO_BMC_SETUP=false
export SIDERO_CONTROLLER_MANAGER_HOST_NETWORK=true
export SIDERO_CONTROLLER_MANAGER_DEPLOYMENT_STRATEGY=Recreate
export SIDERO_CONTROLLER_MANAGER_API_ENDPOINT=192.168.20.51
export SIDERO_CONTROLLER_MANAGER_SIDEROLINK_ENDPOINT=192.168.20.51

clusterctl init -b talos -c talos -i sidero

This takes a while. Once it completes successfully, check that all pods are in the Running state:

kubectl get pods -A

Result:

NAMESPACE       NAME                                            READY   STATUS    RESTARTS       AGE
cabpt-system    cabpt-controller-manager-6cf56cbfcb-52zdz       1/1     Running   2 (130m ago)   132m
cacppt-system   cacppt-controller-manager-5b4b84c466-n5sxh      1/1     Running   1 (131m ago)   132m
capi-system     capi-controller-manager-69c6947c7d-xh9tz        1/1     Running   1 (130m ago)   132m
cert-manager    cert-manager-5698c4d465-zqsb2                   1/1     Running   0              133m
cert-manager    cert-manager-cainjector-d4748596-vw5fr          1/1     Running   0              133m
cert-manager    cert-manager-webhook-65d78d5c4b-46fmz           1/1     Running   0              133m
kube-system     coredns-78f679c54d-88rgv                        1/1     Running   0              139m
kube-system     coredns-78f679c54d-9smhw                        1/1     Running   0              139m
kube-system     kube-apiserver-sidero-controlplane-1            1/1     Running   0              138m
kube-system     kube-controller-manager-sidero-controlplane-1   1/1     Running   5 (131m ago)   138m
kube-system     kube-flannel-bkw7z                              1/1     Running   0              139m
kube-system     kube-proxy-8gcwn                                1/1     Running   0              139m
kube-system     kube-scheduler-sidero-controlplane-1            1/1     Running   5 (131m ago)   138m
sidero-system   caps-controller-manager-74db9cc77f-6sc9n        2/2     Running   1 (130m ago)   131m
sidero-system   sidero-controller-manager-66bf879798-w6psr      5/5     Running   1 (130m ago)   131m

✅ The management cluster has been initialized successfully!
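Since the controller manager runs with host networking (SIDERO_CONTROLLER_MANAGER_HOST_NETWORK=true above), the Sidero HTTP endpoint should now listen on the VM itself. A quick sketch of a follow-up check using bash's built-in /dev/tcp; the UDP ports (67, 69, 51821) can't be probed this way:

```shell
# Probe the Sidero HTTP port (8081/tcp, from the cluster create port mappings)
# on the VM. HOST_IP is the same variable exported during bootstrap.
HOST_IP="${HOST_IP:-192.168.20.51}"
if timeout 2 bash -c "echo > /dev/tcp/${HOST_IP}/8081" 2>/dev/null; then
  status="port 8081 reachable"
else
  status="port 8081 not reachable - check SIDERO_CONTROLLER_MANAGER_HOST_NETWORK"
fi
echo "$status"
```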

ServerClass

cat <<EOF | kubectl create -f -
apiVersion: metal.sidero.dev/v1alpha2
kind: ServerClass
metadata:
  name: m720q
spec:
  qualifiers:
    hardware:
      - system:
          manufacturer: LENOVO
          family: ThinkCentre M720q
  configPatches:
    - op: replace
      path: /machine/install/disk
      value: /dev/sda
    # - op: add
    #   path: /cluster/extraManifests
    #   value:
        # - https://raw.githubusercontent.com/alex1989hu/kubelet-serving-cert-approver/main/deploy/standalone-install.yaml
        # - https://gist.githubusercontent.com/ismailbay/13ead7f4a3147ef82b455d839a632b91/raw/3132c1af9a75efb03b445493302a586739768168/csr-approver.yaml
    - op: add
      path: /machine/network
      value:
        interfaces:
          - interface: eth0
            dhcp: true
            # vip:
              # ip: 192.168.20.254
        nameservers:
          - 192.168.1.2
          - 192.168.1.1
          - 1.1.1.1
        disableSearchDomain: true
    - op: add
      path: /machine/time
      value:
        disabled: false
        servers:
          - 192.168.1.1
          - time.cloudflare.com
    - op: add
      path: /machine/features
      value:
        rbac: true
        stableHostname: true
        apidCheckExtKeyUsage: true
        diskQuotaSupport: true
    - op: add
      path: /machine/certSANs
      value:
        - cluster
        - cluster.local
        - homelab
        - homelab.local
        - homelab.ibay.dev
    - op: add
      path: /machine/kubelet/extraArgs
      value:
        feature-gates: CronJobTimeZone=true,GracefulNodeShutdown=true
        rotate-server-certificates: true
    - op: add
      path: /cluster/allowSchedulingOnControlPlanes
      value: true
EOF

Boot Sequence on worker machines

Set 'UEFI PXE IPv4' as the first boot device and boot.

If they boot over the network successfully, the devices should show up as registered servers on the management cluster:

kubectl get servers
NAME                                   HOSTNAME   ACCEPTED   CORDONED   ALLOCATED   CLEAN   POWER   AGE
18205380-35e7-11e9-b80f-c11c52871100   m720q-03   false                                     on      6s
49f7e100-360c-11e9-8335-7bbe5b721100   m720q-01   false                                     on      2m35s
a1ca2400-35db-11e9-a47d-786c8b771100   m720q-02   false                                     on      106s

Accept servers

⚠️ Accepting a newly-connected computer will wipe its disk!

KUBE_EDITOR="vim" kubectl edit servers

in Vim:

:%s/accepted: false/accepted: true/g

:wq
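The same substitution can be scripted instead of done interactively, by letting kubectl run sed as the editor. Shown here against a stand-in Server manifest so the substitution is visible without a live cluster:

```shell
# Scripted equivalent of the vim session. Against the live cluster you would run:
#   KUBE_EDITOR="sed -i 's/accepted: false/accepted: true/'" kubectl edit servers
# Demonstrated on a sample manifest:
cat <<'YAML' > /tmp/server-demo.yaml
apiVersion: metal.sidero.dev/v1alpha2
kind: Server
spec:
  accepted: false
YAML
sed -i 's/accepted: false/accepted: true/' /tmp/server-demo.yaml
grep accepted /tmp/server-demo.yaml
# prints: accepted: true
```

The same warning applies: accepting a server allows Sidero to wipe its disk.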

Check state

kubectl get servers
NAME                                   HOSTNAME   ACCEPTED   CORDONED   ALLOCATED   CLEAN   POWER   AGE
18205380-35e7-11e9-b80f-c11c52871100   m720q-03   true                              true    on      2m46s
49f7e100-360c-11e9-8335-7bbe5b721100   m720q-01   true                              true    on      5m15s
a1ca2400-35db-11e9-a47d-786c8b771100   m720q-02   true                              true    on      4m26s

Create a cluster config

export CONTROL_PLANE_SERVERCLASS=m720q
export WORKER_SERVERCLASS=m720q
export TALOS_VERSION=v1.5.4
export KUBERNETES_VERSION=v1.28.3
export CONTROL_PLANE_PORT=6443
export CONTROL_PLANE_ENDPOINT=192.168.20.100

clusterctl generate cluster homelab -i sidero > homelab-cluster.yaml
kubectl apply -f homelab-cluster.yaml && \
kubectl scale taloscontrolplane homelab-cp --replicas=3 && \
watch kubectl get servers,machines,clusters

grab a ☕️

After a while the state should transition from:

NAME                                                           HOSTNAME   ACCEPTED   CORDONED   ALLOCATED   CLEAN   POWER   AGE
server.metal.sidero.dev/18205380-35e7-11e9-b80f-c11c52871100   m720q-03   true                              true    on      19m
server.metal.sidero.dev/49f7e100-360c-11e9-8335-7bbe5b721100   m720q-01   true                              true    on      22m
server.metal.sidero.dev/a1ca2400-35db-11e9-a47d-786c8b771100   m720q-02   true                              true    on      21m

NAME                                        CLUSTER   NODENAME   PROVIDERID   PHASE          AGE   VERSION
machine.cluster.x-k8s.io/homelab-cp-nxr8g   homelab                           Provisioning   9s    v1.28.3
machine.cluster.x-k8s.io/homelab-cp-qlbpc   homelab                           Provisioning   8s    v1.28.3
machine.cluster.x-k8s.io/homelab-cp-xlvlc   homelab                           Provisioning   10s   v1.28.3

NAME                               PHASE         AGE   VERSION
cluster.cluster.x-k8s.io/homelab   Provisioned   12s

to:

NAME                                                           HOSTNAME   ACCEPTED   CORDONED   ALLOCATED   CLEAN   POWER   AGE
server.metal.sidero.dev/18205380-35e7-11e9-b80f-c11c52871100   m720q-03   true                  true        false   on      38m
server.metal.sidero.dev/49f7e100-360c-11e9-8335-7bbe5b721100   m720q-01   true                  true        false   on      40m
server.metal.sidero.dev/a1ca2400-35db-11e9-a47d-786c8b771100   m720q-02   true                  true        false   on      39m

NAME                                        CLUSTER   NODENAME   PROVIDERID                                      PHASE     AGE   VERSION
machine.cluster.x-k8s.io/homelab-cp-nxr8g   homelab   m720q-01   sidero://49f7e100-360c-11e9-8335-7bbe5b721100   Running   18m   v1.28.3
machine.cluster.x-k8s.io/homelab-cp-qlbpc   homelab   m720q-02   sidero://a1ca2400-35db-11e9-a47d-786c8b771100   Running   18m   v1.28.3
machine.cluster.x-k8s.io/homelab-cp-xlvlc   homelab   m720q-03   sidero://18205380-35e7-11e9-b80f-c11c52871100   Running   18m   v1.28.3

NAME                               PHASE         AGE   VERSION
cluster.cluster.x-k8s.io/homelab   Provisioned   18m

Kubernetes & Talos config

# grab the talosconfig
kubectl --context admin@sidero get secret homelab-talosconfig -o jsonpath='{.data.talosconfig}' | base64 -d > homelab-talosconfig
# check talos dashboard
talosctl --talosconfig homelab-talosconfig --nodes 192.168.20.100,192.168.20.101,192.168.20.102 dashboard
# get the kubeconfig (merges into $HOME/.kube/config)
talosctl --talosconfig homelab-talosconfig --nodes 192.168.20.254 kubeconfig
# check the final state
ismailbay@docker-vm:~$ kubectl --context admin@sidero get nodes
NAME                    STATUS   ROLES           AGE     VERSION
sidero-controlplane-1   Ready    control-plane   6h16m   v1.28.2

ismailbay@docker-vm:~$ kubectl --context admin@homelab get nodes
NAME       STATUS   ROLES           AGE   VERSION
m720q-01   Ready    control-plane   29m   v1.28.3
m720q-02   Ready    control-plane   14m   v1.28.3
m720q-03   Ready    control-plane   29m   v1.28.3

Reboot

As validation that everything is working as expected, reboot all nodes:

talosctl --talosconfig homelab-talosconfig -n 192.168.20.100,192.168.20.101,192.168.20.102 reboot

After a few minutes the Talos CLI should print a success message. Check the state of the cluster with:

talosctl --talosconfig homelab-talosconfig -n 192.168.20.100,192.168.20.101,192.168.20.102 dashboard

Destroy

# wipe out machines
kubectl delete cluster homelab 