How to set up a Rancher K8s cluster on VMware (incl. vSphere StorageClass)

Rancher K8s Cluster on VMware vSphere

Prerequisites

vCenter Configuration

Rancher vSphere Node Templates

Configure the template settings such as networking, VM folder, CPU, and memory.

Add the following cloud-init config YAML:

#cloud-config
users:
  - name: master
    shell: /bin/bash
    groups: wheel
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa AAAABaEQ...PbQ== My Awesome Key
packages:
  - open-vm-tools
write_files:
  - path: /root/configure-netplan.sh
    content: |
        #!/bin/bash
        vmtoolsd --cmd 'info-get guestinfo.ovfEnv' > /tmp/ovfenv
        IPAddress=$(sed -n 's/.*Property oe:key="guestinfo.interface.0.ip.0.address" oe:value="\([^"]*\).*/\1/p' /tmp/ovfenv)
        SubnetMask=$(sed -n 's/.*Property oe:key="guestinfo.interface.0.ip.0.netmask" oe:value="\([^"]*\).*/\1/p' /tmp/ovfenv)
        Gateway=$(sed -n 's/.*Property oe:key="guestinfo.interface.0.route.0.gateway" oe:value="\([^"]*\).*/\1/p' /tmp/ovfenv)
        DNS=$(sed -n 's/.*Property oe:key="guestinfo.dns.servers" oe:value="\([^"]*\).*/\1/p' /tmp/ovfenv)
        
        cat > /etc/netplan/01-netcfg.yaml <<EOF
        network:
          version: 2
          renderer: networkd
          ethernets:
            ens192:
              addresses: 
                - $IPAddress/27
              gateway4: $Gateway
              dhcp6: no
              accept-ra: no
              nameservers:
                addresses: [$DNS]
        EOF
 
        sudo netplan apply
        sleep 30
        sudo systemctl start open-vm-tools
runcmd:
  - bash /root/configure-netplan.sh
bootcmd:
  - [ cloud-init-per, once, rmdefaultnetconf, rm, -f, /etc/netplan/50-cloud-init.yaml ]
  - [ cloud-init-per, once, tempstopvmtools, sudo, systemctl, stop, open-vm-tools ]
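Note that the script above extracts $SubnetMask from the OVF environment but never uses it; the prefix length in the netplan file is hardcoded to /27. If your port groups use varying subnet sizes, a small helper can derive the prefix from the dotted netmask instead (a sketch, not part of the original config; you would then write $IPAddress/$Prefix in the netplan heredoc):

# Sketch: convert a dotted-quad netmask (e.g. 255.255.255.224) to a CIDR
# prefix length.
mask2cidr() {
  local bits=0 octet
  for octet in ${1//./ }; do
    case $octet in
      255) bits=$((bits + 8)) ;;
      254) bits=$((bits + 7)) ;;
      252) bits=$((bits + 6)) ;;
      248) bits=$((bits + 5)) ;;
      240) bits=$((bits + 4)) ;;
      224) bits=$((bits + 3)) ;;
      192) bits=$((bits + 2)) ;;
      128) bits=$((bits + 1)) ;;
      0)   ;;
    esac
  done
  echo "$bits"
}

Prefix=$(mask2cidr "$SubnetMask")   # 255.255.255.224 -> 27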

Please note: The "hack" with the open-vm-tools service is required because otherwise Rancher would try to connect to the nodes via their temporary link-local IPv6 or DHCP IPv4 address. This would prevent Rancher from accessing the nodes to install Docker etc.

Check Provide a custom vApp config and set the following values (replace vlan-123 with the actual port group name!):

  • OVF environment transport: com.vmware.guestInfo
  • IP allocation: IPv4, fixedAllocated
  • guestinfo.interface.0.ip.0.address: ${ip:vlan-123}
  • guestinfo.interface.0.ip.0.netmask: ${netmask:vlan-123}
  • guestinfo.interface.0.route.0.gateway: ${gateway:vlan-123}
  • guestinfo.dns.servers: ${dns:vlan-123}
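To confirm that vSphere actually injects these properties, you can dump the OVF environment from inside a deployed VM (the same mechanism configure-netplan.sh relies on; requires open-vm-tools):

vmtoolsd --cmd 'info-get guestinfo.ovfEnv' | grep 'guestinfo.interface.0.ip.0.address'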

Rancher vSphere Cluster Creation

  1. Visit https://rancher.example.com/g/clusters/add/select and select vSphere
  2. Fill out the relevant options:
    1. Cluster Name: test
    2. Create two types of node pools:
      1. Master nodes:
        1. Name Prefix: test-master-
        2. Count: 3
        3. Template: Ubuntu Bionic Master Test
        4. Auto Replace: 0 minutes (default value)
        5. etcd: checked
        6. Control Plane: checked
        7. Worker: unchecked
        8. Taints: none (default value)
      2. Worker nodes:
        1. Name Prefix: test-worker-
        2. Count: 3
        3. Template: Ubuntu Bionic Worker Test
        4. Auto Replace: 0 minutes (default value)
        5. etcd: unchecked
        6. Control Plane: unchecked
        7. Worker: checked
        8. Taints: none (default value)
    3. Member Roles: Add admins as Owner
    4. For the Kubernetes Options section, just click on Edit as YAML and replace the whole shown YAML with the one from 1-cluster-template.yaml.tmpl below (or at least add the cloud_provider section).
    5. Click Create (a quick sanity check for the finished cluster follows below).
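Once provisioning has finished, you can verify the result from a workstation (a sketch; download the kubeconfig for the test cluster from the Rancher UI first, the file path below is just an example):

export KUBECONFIG=~/Downloads/test.yaml
kubectl get nodes -o wide   # all 3 master and 3 worker nodes should show up as Ready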

vSphere K8s Storage

Apply 2-vsphere-thin-standard.yaml (replace MY-VMWARE-DATA-STORE with your actual datastore name from the vSphere cluster):

kubectl apply -f 2-vsphere-thin-standard.yaml

1-cluster-template.yaml.tmpl:

#
# Cluster Config
#
docker_root_dir: /var/lib/docker
enable_cluster_alerting: false
enable_cluster_monitoring: false
enable_network_policy: true
local_cluster_auth_endpoint:
  enabled: true
name: 'test'
#
# Rancher Config
#
rancher_kubernetes_engine_config:
  addon_job_timeout: 30
  cloud_provider:
    name: vsphere
    vsphereCloudProvider:
      global:
        insecure-flag: true
      virtual_center:
        10.10.10.10:
          # Use the "user@domain" syntax to work around https://github.com/rancher/rancher/issues/16371
          user: svc_user_rancher@vsphere.local
          password: ------secret------
          port: 443
          datacenters: /mydc
      workspace:
        server: 10.10.10.10
        folder: /mydc/vm/Prod/K8s/storage
        default-datastore: /mydc/datastore/MY-VMWARE-DATA-STORE
        datacenter: /mydc
        resourcepool-path: RP-K8s
  authentication:
    strategy: x509
  dns:
    nodelocal:
      ip_address: ''
      node_selector: null
      update_strategy: {}
  ignore_docker_version: true
  #
  # # Currently only nginx ingress provider is supported.
  # # To disable ingress controller, set `provider: none`
  # # To enable ingress on specific nodes, use the node_selector, eg:
  # provider: nginx
  # node_selector:
  #   app: ingress
  #
  ingress:
    provider: none
  kubernetes_version: v1.17.6-rancher2-1
  monitoring:
    provider: metrics-server
    replicas: 1
  #
  # If you are using calico on AWS
  #
  # network:
  #   plugin: calico
  #   calico_network_provider:
  #     cloud_provider: aws
  #
  # # To specify flannel interface
  #
  # network:
  #   plugin: flannel
  #   flannel_network_provider:
  #     iface: eth1
  #
  # # To specify flannel interface for canal plugin
  #
  # network:
  #   plugin: canal
  #   canal_network_provider:
  #     iface: eth1
  #
  network:
    mtu: 0
    options:
      flannel_backend_type: vxlan
    plugin: canal
  #
  # services:
  #   kube-api:
  #     service_cluster_ip_range: 10.43.0.0/16
  #   kube-controller:
  #     cluster_cidr: 10.42.0.0/16
  #     service_cluster_ip_range: 10.43.0.0/16
  #   kubelet:
  #     cluster_domain: cluster.local
  #     cluster_dns_server: 10.43.0.10
  #
  services:
    etcd:
      backup_config:
        enabled: true
        interval_hours: 12
        retention: '20'
        s3_backup_config:
          access_key: ------secret------
          bucket_name: backup-etcd-test
          endpoint: minio.example.com
          secret_key: ------secret------
        safe_timestamp: false
      creation: 12h
      extra_args:
        election-timeout: 5000
        heartbeat-interval: 500
      gid: 0
      retention: 72h
      snapshot: false
      uid: 0
    kube_api:
      always_pull_images: false
      pod_security_policy: false
      service_node_port_range: 30000-32767
  ssh_agent_auth: false
  upgrade_strategy:
    drain: true
    max_unavailable_controlplane: '1'
    max_unavailable_worker: '1'
    node_drain_input:
      delete_local_data: 'true'
      force: false
      grace_period: 60
      ignore_daemon_sets: true
      timeout: 120
scheduled_cluster_scan:
  enabled: true
  scan_config:
    cis_scan_config:
      debug_master: false
      debug_worker: false
      override_benchmark_version: rke-cis-1.4
      profile: permissive
  schedule_config:
    cron_schedule: 0 0 * * *
    retention: 24
windows_prefered_cluster: false
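If the cloud_provider section was picked up, every node should register with a vsphere:// provider ID. A quick check once the cluster is up (a sketch; requires the cluster kubeconfig):

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerID}{"\n"}{end}'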
2-vsphere-thin-standard.yaml:

apiVersion: v1
items:
- allowVolumeExpansion: false
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    annotations:
      storageclass.beta.kubernetes.io/is-default-class: "true"
      storageclass.kubernetes.io/is-default-class: "true"
    labels:
      cattle.io/creator: norman
    name: vsphere-thin-standard
    selfLink: /apis/storage.k8s.io/v1/storageclasses/vsphere-thin-standard
  parameters:
    datastore: MY-VMWARE-DATA-STORE
    diskformat: thin
    fstype: ext4
  provisioner: kubernetes.io/vsphere-volume
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
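Because the StorageClass is annotated as the default, any PVC without an explicit storageClassName should be provisioned as a thin VMDK on the configured datastore. A minimal smoke test (the PVC name test-pvc is purely illustrative):

kubectl get storageclass vsphere-thin-standard

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF

kubectl get pvc test-pvc   # should become "Bound" once the volume is created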
@ahgraber commented Nov 1, 2020

Thanks for sharing this publicly.
I'm having trouble replicating -- what version of ESXi are you running? Could you also provide details about how you created the VM template?

Thank you!!

@PhilipSchmid (Author) commented Nov 1, 2020

Hi @ahgraber,
Please note that this gist is somewhat outdated (at least the open-vm-tools "hack" part) - it didn't work well, so we chose to explicitly disable all network configuration via a cloud-init configuration (/etc/cloud/cloud.cfg.d/99_no_networking.cfg) built directly into our custom image:

network:
  config: disabled

We used this script here to create such a custom modified cloud-init image:

#!/bin/bash

# Download the Ubuntu Bionic cloud image (qcow2) and the matching OVA
# (the OVA provides the .ovf descriptor and manifest we reuse below).
wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img
wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.ova

tar -xf bionic-server-cloudimg-amd64.ova

# Disable cloud-init's network configuration inside the image.
cat > 99_no_networking.cfg <<EOL
network:
  config: disabled
EOL

virt-copy-in -a bionic-server-cloudimg-amd64.img 99_no_networking.cfg /etc/cloud/cloud.cfg.d/
# Verify that the file actually landed inside the image:
virt-cat -a bionic-server-cloudimg-amd64.img /etc/cloud/cloud.cfg.d/99_no_networking.cfg

# Convert the modified qcow2 image to a stream-optimized VMDK for vSphere.
qemu-img convert -f qcow2 -O vmdk -o subformat=streamOptimized bionic-server-cloudimg-amd64.img ubuntu-bionic-18.04-cloudimg.vmdk

# Patch the VMDK header version field (the byte at offset 0x4) to 3 so
# newer ESXi releases accept the stream-optimized disk.
printf '\x03' | dd conv=notrunc of=ubuntu-bionic-18.04-cloudimg.vmdk bs=1 seek=$((0x4))

# Regenerate the manifest with checksums for the new artifacts.
openssl sha256 ubuntu-bionic-18.04-cloudimg.vmdk ubuntu-bionic-18.04-cloudimg.ovf > ubuntu-bionic-18.04-cloudimg.mf

We then imported the image manually via the vCenter web UI:

  1. Right-click the folder where you would like to store the template and select Deploy OVF Template ...
  2. Use a local file to upload the artifacts (select ALL the *.vmdk, *.ovf, and *.mf files)
  3. Use the name tpl-ubuntu-bionic-18.04-cloudimg-disabled-net
  4. Select the appropriate datastore.
  5. The other settings can be left at their defaults.
  6. Right-click the deployed VM and convert it to a template (a scripted alternative follows below).
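If you prefer scripting the import over the UI steps above, the govc CLI can do the same (a sketch, assuming govc is installed and GOVC_URL, GOVC_USERNAME, GOVC_PASSWORD, and GOVC_INSECURE are exported; the datastore name is a placeholder):

govc import.ovf -name tpl-ubuntu-bionic-18.04-cloudimg-disabled-net -ds MY-VMWARE-DATA-STORE ubuntu-bionic-18.04-cloudimg.ovf
govc vm.markastemplate tpl-ubuntu-bionic-18.04-cloudimg-disabled-net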

Sorry, but I can't tell you the exact vSphere/ESXi version used since I'm no longer working for this company, but it was vSphere/ESXi version 7+.

Hope this helps 😄.

Regards,
Philip
