
VMware Tanzu Kubernetes Grid and VMware Telco Cloud Automation

Welcome! This is a series of quick guides to setting up a dev/test/lab environment with Tanzu Kubernetes Grid and/or VMware Telco Cloud Automation.

This isn't a replacement for the official documentation, but rather a curated, streamlined set of how-tos gathered from several locations and based on my experience.

Installing Tanzu Kubernetes Grid and NSX Advanced Load Balancer Lite

This is a distillation of the official docs.

Prerequisites

  • Internet-connected (proxies are OK) Linux VM as a bastion/jumpbox host with minimum 8 GB RAM and 2 vCPU, 200 GB disk.
    • Software requirements on jumpbox:
      • Docker Community Edition
      • Chrony or NTPD, configured to point to your corporate NTP servers if desired
      • (optional) DNSmasq for local DNS and/or DHCP server; DHCP is required for Tanzu Kubernetes Grid 1.3.x
      • curl, git, openssh (client and server), openssl, tar, gzip, jq
      • if airgapped: see the additional requirements (Harbor, a local web server) in the Airgap section later in this guide
  • At least one VDS Portgroup for a combined Management & Data network.

Setting up Linux jumpbox Docker CE with HTTP or SOCKS proxy

  • For the Docker daemon, create a systemd drop-in file /etc/systemd/system/docker.service.d/http-proxy.conf to set the proxy environment variables
    • change 192.168.1.5:8118 to your proxy host/port
    • change http to whatever protocol your proxy uses, e.g. socks5 or socks5h (remote DNS resolution over SOCKS)
    • 172.17.0.0/16, 172.18.0.0/16, localhost, 127.0.0.1, cluster.local are mandatory in NO_PROXY to cover internal Docker networks and/or Kubernetes kind cluster networks
    • change harbor.bigco.lab to the DNS name you want for Harbor, if installing Harbor
    • change 192.168.1.0/24 to your lab network(s)
    [Service]
    Environment="HTTP_PROXY=http://192.168.1.5:8118"
    Environment="HTTPS_PROXY=http://192.168.1.5:8118"
    Environment="NO_PROXY=harbor.bigco.lab,127.0.0.1,localhost,192.168.1.0/24,172.17.0.0/16,172.18.0.0/16,cluster.local"
    
  • For the Docker CLI, create a config file ~/.docker/config.json in each user's home directory
    {
      "proxies": {
         "default": {
            "httpProxy": "http://192.168.1.5:8118",
            "httpsProxy": "http://192.168.1.5:8118",
            "noProxy": "harbor.bigco.lab,127.0.0.1,localhost,192.168.1.0/24,172.17.0.0/16,172.18.0.0/16,cluster.local"
          }
       }
    }
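
  • To confirm the proxy settings were picked up, a quick sanity check from the jumpbox (hello-world is just an example image):
    # confirm the daemon sees the proxy environment variables
    systemctl show --property=Environment docker
    # confirm images can be pulled through the proxy
    docker pull hello-world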
    

Downloads

  • If using TCA 1.9.0, use TKG 1.3.0; TCA 1.9.1 can work with TKG 1.3.1
  • VMware Customer Download: Tanzu Kubernetes Grid
    • Download to Linux jumpbox:
      • VMware Tanzu CLI v1.3.0 for Linux
      • kubectl cluster cli v1.20.4 for Linux
      • YQ CLI (optional, for airgap)
    • Download & Upload to vCenter (import these into vCenter as an OVF, then convert to template):
      • Tanzu Kubernetes Grid 1.3.0 / Telco Cloud Automation 1.9.0 operating system OVAs
        • Photon v3 Kubernetes v1.19.8 OVA (if you want to test upgrades)
        • Photon v3 Kubernetes v1.20.4 OVA

Install CLIs

  • Unpack Tanzu CLI and Kubectl
    • tar zxvf tanzu-cli-bundle-v1.3.0-linux-amd64.tar (or similar filename)
    • gunzip kubectl-linux-v1.20.5-vmware.1.gz (or similar filename)
    • sudo mv kubectl-linux-v1.20.5-vmware.1 /usr/local/bin/kubectl
    • sudo chmod +x /usr/local/bin/kubectl
  • Navigate to the cli folder unpacked from the Tanzu CLI bundle and install the binary
    • sudo install core/v1.3.0/tanzu-core-linux_amd64 /usr/local/bin/tanzu
  • Install Tanzu plugins
    • tanzu plugin clean
    • tanzu plugin install --local cli all
  • Check the plugin list
    • tanzu plugin list
    • Example:
      • NAME                LATEST VERSION  DESCRIPTION                                                        REPOSITORY  VERSION  STATUS                                       
        cluster             v1.3.1          Kubernetes cluster operations                                      core        v1.3.1   installed
        login               v1.3.1          Login to the platform                                              core        v1.3.1   installed
        pinniped-auth       v1.3.1          Pinniped authentication operations (usually not directly invoked)  core        v1.3.1   installed
        kubernetes-release  v1.3.1          Kubernetes release operations                                      core        v1.3.1   installed
        management-cluster  v1.3.1          Kubernetes management cluster operations                           tkg         v1.3.1   installed
        
  • Install the Carvel tools, included with the Tanzu CLI bundle
    • YTT, a YAML templating CLI
      • gunzip ytt-linux-amd64-v0.31.0+vmware.1.gz (or similar filename)
      • sudo mv ./ytt-linux-amd64-v0.31.0+vmware.1 /usr/local/bin/ytt
      • sudo chmod +x /usr/local/bin/ytt
    • KAPP, a Kubernetes application install utility
      • gunzip kapp-linux-amd64-v0.36.0+vmware.1.gz (or similar filename)
      • sudo mv ./kapp-linux-amd64-v0.36.0+vmware.1 /usr/local/bin/kapp
      • sudo chmod +x /usr/local/bin/kapp
    • KBLD, a Kubernetes image build/relocation/management utility
      • gunzip kbld-linux-amd64-v0.28.0+vmware.1.gz (or similar filename)
      • sudo mv ./kbld-linux-amd64-v0.28.0+vmware.1 /usr/local/bin/kbld
      • sudo chmod +x /usr/local/bin/kbld
    • IMGPKG, a way to store files inside a container image, used by Tanzu for config templates
      • gunzip imgpkg-linux-amd64-v0.5.0+vmware.1.gz (or similar filename)
      • sudo mv ./imgpkg-linux-amd64-v0.5.0+vmware.1 /usr/local/bin/imgpkg
      • sudo chmod +x /usr/local/bin/imgpkg
  • Install the YQ binary you downloaded
    • sudo mv yq_linux_amd64 /usr/local/bin/yq
    • sudo chmod +x /usr/local/bin/yq
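  • With everything unpacked, a quick sanity check that each binary is on the PATH:
    tanzu version
    kubectl version --client
    ytt version
    kapp version
    kbld version
    imgpkg version
    yq --version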

Configure DNSmasq for DHCP (if needed)

  • Edit /etc/dnsmasq.conf (a consolidated example follows this list)
    • listen-address=192.168.1.5 (replace with your jumpbox network address)
    • listen-address=127.0.0.1 (separate line)
    • dhcp-range=192.168.1.50,192.168.1.180,12h (change to the range of DHCP addresses you want dnsmasq to manage)
    • If you ever need to reserve a static IP, add:
    • dhcp-host=00:50:56:ab:6d:db,192.168.1.57 (MAC address, IP address)
    • Restart DNSmasq after every config change:
    • sudo systemctl restart dnsmasq
  • This doesn't set up DNSmasq for DNS; the Airgap guide will do that.
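  • Putting the settings above together, a minimal /etc/dnsmasq.conf sketch (the addresses and ranges are the examples from this guide; adjust to your lab):
    # listen on the jumpbox address and loopback
    listen-address=192.168.1.5
    listen-address=127.0.0.1
    # hand out DHCP leases from this range with a 12-hour lease time
    dhcp-range=192.168.1.50,192.168.1.180,12h
    # optional static reservation (MAC address, IP address)
    dhcp-host=00:50:56:ab:6d:db,192.168.1.57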

Tanzu Kubernetes Grid is now ready for cluster creation!

  • Now we'll set up NSX ALB (Avi) so that any clusters we create will get HA service type load balancers.

Installing NSX Advanced Load Balancer (Avi)

Tanzu includes the "Essentials" version of NSX ALB, which allows for Layer 4 (TCP/UDP) Kubernetes Service Type Load Balancers only. This is an active/passive, VRRP-like HA load balancer.

There's also the "Basic" version of Avi for NSX-T customers, which additionally allows for Ingress controllers (Layer 7 HTTP virtual-host routing on top of a TCP Service Type Load Balancer) implemented and managed by Avi.

Full NSX ALB Enterprise supports multi-cloud, multi-cluster controllers, BGP ECMP scale-out load balancers, multi-cluster GSLB, etc.

Official Docs Links

Downloads

  • VMware NSX Advanced Load Balancer
    • This will allow VMware Customer Connect users to federate into the Avi networks download site
    • Download VMware Controller OVA 20.1.5 & Upload to vCenter
      • The vApp properties are fairly self-explanatory management network settings
      • The Avi controller management IP should be static, either via a DHCP reservation or by entering the network information in the vApp properties
      • Leave the key field in the template empty.

Setup the Avi Controller

Conceptual Explanation

  • Avi will create on-demand VMs, called Service Engines, to serve up traffic. By default this is an "N+M" buffered HA, which is like a more sophisticated version of an Active/Passive VRRP setup. Avi will expose a Virtual IP address for each Service Type Load Balancer using Layer 2 ARP/GARP, and perform some novel Layer 2 load balancing tricks among its service engine members to distribute traffic.
  • Avi can also be setup for BGP ECMP scale-out load balancing, though that's not discussed here.
  • Each Avi service engine is "two armed", i.e. it has a Management NIC and a Data NIC. These can be on the same network if you want. One is for the Avi Controller to configure the service engines, the other is for data traffic to be served.
  • Avi can rely on DHCP or its own static IPAM. The common practice is to use DHCP for the Management network and static IPAM for the data network (SEs and their VIPs).
  • In this guide we'll just assume one network for everything, and we'll carve out a non-DHCP managed range for the Avi data network VIPs & SEs.
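  • For example, on the single 192.168.1.0/24 lab network used earlier in this guide, the carve-out might look like the sketch below; the static range is an assumption to adapt to your own addressing:
    192.168.1.1                      gateway
    192.168.1.5                      jumpbox (dnsmasq DHCP/DNS, Harbor)
    192.168.1.50  - 192.168.1.180    DHCP range (cluster nodes, SE management NICs)
    192.168.1.200 - 192.168.1.250    Avi static IPAM range (SE data NICs and VIPs)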

Step by Step Guide (This all can be automated via API too!)

  1. Open a browser to the controller IP
  2. Configure a password for the admin account
  3. Set DNS resolvers and NTP information, along with the backup passphrase, then -> Next
  4. Select None for SMTP configuration, then -> Next
  5. For Orchestrator Integration, select VMware
  6. Enter the vCenter credentials and FQDN (an admin user will do; more narrowly scoped roles are described in the Avi documentation)
  7. For SDN integration, select None, then -> Next
  8. Select the vSphere Datacenter
  9. For System IP Address Management, select DHCP. This assumes that dynamically created VMs for service engines will be assigned management IPs via a DHCP server on their subnet.
  10. For Virtual Service Placement Settings leave both boxes unchecked, then -> Next
  11. Select a distributed virtual switch for the Management network (this should be the same network as the Controller OVA), select DHCP, and then -> Next
  12. For Support Multiple Tenants, Select No
  13. In the main controller UI, navigate to Applications > Templates > Profiles > IPAM/DNS Profiles, then -> Create, select IPAM Profile
  14. Enter an arbitrary name for the IPAM profile. The Type should be Avi Vantage IPAM; leave Allocate IP in VRF unchecked.
  15. Click Add Usable Network, select Default-Cloud, and for Usable Network select the Management VDS portgroup
  16. Click Save
  17. In the main controller UI, navigate to Infrastructure > Networks, and configure your Portgroup to have a Static IP block for your Load Balancer VIPs and/or Service Engine IPs.
  18. In the main controller UI, navigate to Infrastructure > Clouds, select Default-Cloud, and select the IPAM profile in the drop down that we created in steps 13-16.
  19. Finally, we need to create a TLS cert for the Avi controller for a trust-relationship with the TKG management cluster.
  20. In the main controller UI, select Templates > Security > SSL/TLS Certificates, then -> Create and select Controller Certificate
  21. Enter the same name in the Name and Common Name boxes. Select Self-Signed. For Subject Alternative Name, enter the IP address of the Avi controller VM, then Save
  22. Select the certificate in the list and click the Export icon so we can import this CA self-signed cert into TKG later.
  23. In the main controller UI, select Administration > Settings > Access Settings, then click the edit icon in System Access Settings
  24. Delete the existing SSL/TLS certificates, then use the SSL/TLS Certificate drop-down menu to add the newly created custom certificate.
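
To sanity-check the certificate swap from the jumpbox (and to see exactly what TKG will be presented with), openssl can fetch the controller cert; the IP below is a placeholder for your Avi controller:

    openssl s_client -connect 192.168.1.20:443 -showcerts </dev/null 2>/dev/null \
      | openssl x509 -noout -subject -dates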

Next steps!

  • If running airgapped (no Internet access except from the jumpbox), follow the "Setting up Tanzu Kubernetes Grid for Airgap" guide
  • If using Tanzu standalone, follow the "Creating Tanzu Kubernetes Clusters Standalone" guide
  • If using TCA, keep reading in order!
    1. First, follow the "Installing Telco Cloud Automation" guide
    2. Then, if airgapped, "Setting up Telco Cloud Automation for Airgap"
    3. Then, follow the "Creating Tanzu Kubernetes Clusters with TCA" guide

Why would I use TKG Standalone?

  • You want to get a cluster launched ASAP
  • You just want a vanilla Kubernetes cluster with no special networking or performance requirements.
  • You want to learn how TKG works with Cluster API "under the covers", since this is what TCA wraps.

Note this assumes HTTP Proxy access to the internet for image downloads; airgap setup is below

Placeholder docs

Working on a distillation of these, but for now here are the official docs:

Deploying a Management cluster

  1. (For reference) Prepare a Management cluster: https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.3/vmware-tanzu-kubernetes-grid-13/GUID-mgmt-clusters-vsphere.html (you should already have done this from doc 01- installing TKG)
  2. Use the installer interface to bootstrap a config: https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.3/vmware-tanzu-kubernetes-grid-13/GUID-mgmt-clusters-deploy-ui.html
  3. TKG config file reference: https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.3/vmware-tanzu-kubernetes-grid-13/GUID-mgmt-clusters-create-config-file.html
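
As a quick orientation, the deployment itself comes down to the Tanzu CLI; a minimal sketch, where the cluster name and config path are examples (the path is the default location the installer UI saves to):

    # launch the browser-based installer UI from the jumpbox
    tanzu management-cluster create --ui
    # or create the management cluster non-interactively from a saved config file
    tanzu management-cluster create --file ~/.tanzu/tkg/clusterconfigs/mgmt-cluster.yaml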

Deploying a workload cluster

  1. Deploy Tanzu Kubernetes Clusters https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.3/vmware-tanzu-kubernetes-grid-13/GUID-tanzu-k8s-clusters-deploy.html
  2. Example Manifest: https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.3/vmware-tanzu-kubernetes-grid-13/GUID-tanzu-k8s-clusters-vsphere.html
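
Once the manifest is ready, creating and accessing the cluster is again a couple of Tanzu CLI calls; the cluster and file names below are examples:

    # create the workload cluster from its config file
    tanzu cluster create my-workload --file my-workload.yaml
    # fetch its admin kubeconfig and switch to it
    tanzu cluster kubeconfig get my-workload --admin
    # the admin context is typically named <cluster>-admin@<cluster>
    kubectl config use-context my-workload-admin@my-workload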

Why do I need TCA?

Telco Cloud Automation (TCA) provides full infrastructure automation, i.e.

  • Starting with raw ESXi hardware, it will install/configure most of an SDDC: vCenter, vRealize Orchestrator, vRealize Log Insight, vSAN, NSX, TCA control planes, and Tanzu Kubernetes management or workload clusters aligned to the Telco Cloud Platform 5G Edition Reference Architecture
  • Automated lifecycle management (upgrades) for Tanzu Kubernetes clusters, with support for the rest of the SDDC planned for a future release

TCA provides enhanced Tanzu Kubernetes Grid (TKG) for Telco features beyond the standard TKG release:

  • Node pools on different vSphere clusters and different VM sizes
  • Remote worker nodes (i.e. on remote/edge ESXi nodes)
  • VM Anti-Affinity rules
  • CSI NFS provisioner and client
  • Multus CNI (which is also coming in the forthcoming TKG 1.4 GA)
  • Multiple vNICs, including SR-IOV vNICs, on the worker nodes (which TKG 1.4 GA doesn't yet do)

TCA provides for installing CNF software and customizing both the OS and VMs for that software

  • Assumes that most CNF deployers / overall network managers aren't Kubernetes experts, so it provides a GUI for CNF instantiation and for status/health visibility

  • Also dramatically simplifies the CI/CD required to manage an entire global rollout of CNFs across domains & clusters - CI/CD like Concourse is great and necessary but should be glue to orchestrate-the-orchestrators, rather than the platform that does everything itself (which never works).

  • Most CNFs are just Helm charts wrapped with extra files/metadata in a ZIP file called a CSAR (Cloud Service Archive). The main metadata is a TOSCA YAML file, which describes the Helm chart and any VM/OS customizations (a sketch of a CSAR layout follows the lists below). TOSCA and CSAR are an OASIS standard that was adopted by ETSI for VNF packaging and is being repurposed for CNFs & Kubernetes. You can't really do this kind of node spec/customization solely with Helm/K8s beyond some DaemonSet hackery; this standard feels cleaner/more standardized, though it likely isn't the endgame in this space. See https://www.etsi.org/deliver/etsi_gs/NFV-SOL/001_099/004/02.05.01_60/gs_nfv-sol004v020501p.pdf

  • Some VM customization examples:

    • NUMA alignment
    • CPU static scheduling
    • SR-IOV NIC assignment
  • Some OS configuration examples:

    • Realtime kernel
    • Extra packages (e.g. PCI drivers)
    • SR-IOV OS configuration
    • DPDK OS configuration
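
To make the packaging concrete, a hypothetical CSAR layout might look roughly like this; the names are illustrative, not taken from a real package:

    my-cnf.csar  (a ZIP archive)
      TOSCA-Metadata/
        TOSCA.meta            <- entry-point metadata
      Definitions/
        my-cnf.yaml           <- TOSCA descriptor: references the Helm chart and the VM/OS customizations above
      Artifacts/
        charts/
          my-cnf-1.0.0.tgz    <- the packaged Helm chart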

Requirements

  1. TCA ships as a single OVA that runs in either Manager or Control Plane mode. The Manager is the GUI and API. A Control Plane appliance is paired with each vCenter you want to manage.
  2. TCA requires vRealize Orchestrator (vRO) though this can be a shared instance across many vCenters

Preparation

Manager Install

  1. Deploy the OVF Template with vCenter; the vApp properties are fairly self-explanatory management & admin login settings
  2. Select the Appliance Role (Manager) in the vApp properties (this is optional, you can do it in the GUI)
  3. Boot the VM, when it's up, open a web browser to https://ip-address:9443
  4. Login to the TCA management interface.
  5. For license, click Activate Later. We'll need an HTTP Proxy to activate the license at some point in the Configuration tab.
  6. Select the location of the Manager on the map, click continue
  7. Enter an arbitrary system name, click continue
  8. For SSO server URL, enter the vCenter Server URL or the Platform Services Controller (PSC) URL.
  9. Review the system and click Restart

Control plane install

For each existing vCenter you want to deploy Tanzu clusters on, you need a TCA CP appliance. Similar to the above:

  1. Deploy the OVF Template with vCenter; the vApp properties are fairly self-explanatory management & admin login settings
  2. Select the Appliance Role (Control Plane) in the vApp properties (this is optional, you can do it in the GUI on boot)
  3. Boot the VM, when it's up, open a web browser to https://ip-address:9443
  4. Login to the TCA management interface.
  5. For license, click Activate Later. We'll need an HTTP Proxy to activate the license at some point in the Configuration tab.
  6. Select the location of the Control Plane VM on the map, click continue
  7. Enter an arbitrary system name, click continue
  8. Enter the config details of the vSphere cloud you want to connect to:
     a. vCenter Server URL, username & password
     b. (optional) NSX Manager URL, username & password
     c. SSO Server URL (usually just the vCenter URL again)
     d. vRealize Orchestrator URL (leave blank for now until you deploy it; note that with vRO 8.x this is port 443)
  9. Review the system and click Restart

Login to the TCA Manager

  1. Use the vCenter login (e.g. administrator@vsphere.local) on https://tca-ip (regular port 443).
  2. From here we can create Tanzu management clusters & Tanzu clusters under "CaaS Administration". If this is greyed out, we need to activate the license key. This will be discussed later.

Why do I need vRO?

  • TCA uses vRO to provide Workflows that conform to the ETSI MANO SOL interoperability standards
  • It is not otherwise used by Tanzu Kubernetes Clusters provisioned by TCA. I think it's a mandatory install, haven't really tried skipping this.

Requirements

  • vRO must have a DNS record and a static IP address. The DNS record must also have a matching PTR (reverse DNS) record.
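  • A quick way to verify both records before installing (the hostname and IP are placeholders):
    # forward lookup (A record)
    dig +short vro.bigco.lab
    # reverse lookup (PTR record); use the IP returned above
    dig +short -x 192.168.1.40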

Minimal vRO Install

  • Download the vRealize Orchestrator OVA Appliance
  • Upload it to vCenter as an OVF template; the vApp properties are mostly self-explanatory network & admin settings. Be sure to use the FQDN of the DNS record you created.
  • Boot the appliance, and wait a few minutes for it to initialize
  • Log in to the appliance at https://vro_fqdn/vco with the "root" username and the password configured in the vApp properties. Validate it's up.
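  • A quick check from the jumpbox that the vRO UI is responding (the FQDN is a placeholder); expect an HTTP 200 or a redirect once initialization finishes:
    curl -k -I https://vro.bigco.lab/vco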

Link to TCA

  • Log in to one of your TCA Control Plane VMs on https://tca-cp:9443
  • Configure vRealize Orchestrator with https://vro-fqdn (this should be standard HTTPS port 443 for vRO 8.x)

Requirements

  • Harbor for Tanzu Kubernetes Grid (or some other container registry). We'll install this on the jumpbox.
  • A web server (e.g. NGINX) for PhotonOS updates or extra package installs. We'll also install this on the jumpbox.
  • An Internet-connected (HTTP proxy OK) jumpbox with a Docker CE daemon.

Installing Harbor

  • Download the Harbor offline installer onto the jumpbox
  • Untar the installer (tar zxvf harbor-offline-installer-v2.3.0.tgz) in the directory you want it in
  • Generate a TLS cert. Here's how to do a self-CA-signed one with OpenSSL.
    1. openssl genrsa -out ca.key 4096 to generate the CA key
    2. openssl req -x509 -new -nodes -sha512 -days 3650 -subj "/C=CA/L=Toronto/O=bigco/OU=lab/CN=harborCA" -key ca.key -out ca.crt to generate the CA cert, change the values to your preferences
    3. openssl genrsa -out harbor.bigco.lab.key 4096 to generate the server key
    4. openssl req -sha512 -new -subj "/C=CA/L=Toronto/O=bigco/OU=Lab/CN=harbor.bigco.lab" -key harbor.bigco.lab.key -out harbor.bigco.lab.csr to generate the server CSR; change the CN (and file names) to your desired FQDN for Harbor
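    5. Sign the CSR with the CA from steps 1-2 to produce the server certificate. A hedged sketch: it assumes a small x509 v3 extensions file (here named v3.ext) to add the SAN that modern Docker clients require; adjust the DNS name to your Harbor FQDN.
       Contents of v3.ext:
       authorityKeyIdentifier=keyid,issuer
       basicConstraints=CA:FALSE
       keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment
       extendedKeyUsage = serverAuth
       subjectAltName = DNS:harbor.bigco.lab
       Then sign:
       openssl x509 -req -sha512 -days 3650 -extfile v3.ext -CA ca.crt -CAkey ca.key -CAcreateserial -in harbor.bigco.lab.csr -out harbor.bigco.lab.crt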