@jayunit100
Last active July 8, 2022 19:14
Plumbing PCI Devices to Tanzu Framework

YTT Goo for Tanzu GPUs

Here is a relatively clean way to modify Tanzu to inject device and vendor IDs for PCI devices, leveraging kubernetes-sigs/cluster-api-provider-vsphere#1264.

I've pushed the image for this here, if anyone wants to try testing it:

#### docker.io/jayunit100/capv-gpu:geetika

```
wget https://github.com/kubernetes-sigs/cluster-api/releases/download/v1.1.5/clusterctl-linux-amd64
clusterctl upgrade apply --contract v1beta1
```

You can create this cluster YAML by running `tanzu cluster create` with a dry run:

```
tanzu cluster create -f tkg-omg-gpus-vc.yaml asdf --dry-run=true
```

With a YAML file `tkg-omg-gpus-vc.yaml` such as:

```yaml
AVI_...
CLUSTER_NAME: tkg-omg-gpus-vc
CLUSTER_PLAN: dev
INFRASTRUCTURE_PROVIDER: vsphere
OS_ARCH: amd64
OS_NAME: ''
OS_VERSION: ''
VSPHERE_INSECURE: 'false'
VSPHERE_TLS_THUMBPRINT: 71:BD:FA:25:6E:79:EC:F3:05:F2:29:5C:37:D2:A4:97:9D:70:B6:95

pcivendorId0: 44444444
pcideviceId0: aaaaaaaa
```
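Since the device and vendor IDs are flat, indexed config variables, it is easy to set one half of a pair and forget the other. A minimal sanity check, sketched in Python (this validation helper is my own addition, not part of Tanzu Framework):

```python
def check_pci_pairs(config: dict) -> list:
    """Collect (vendorId, deviceId) pairs from indexed config keys,
    raising if only one half of a pair is set."""
    pairs = []
    for i in range(10):  # same upper bound as the ytt overlay loop
        vendor = config.get(f"pcivendorId{i}")
        device = config.get(f"pcideviceId{i}")
        if vendor is None and device is None:
            continue
        if vendor is None or device is None:
            raise ValueError(f"pcivendorId{i} and pcideviceId{i} must both be set")
        pairs.append((vendor, device))
    return pairs

print(check_pci_pairs({"pcivendorId0": "44444444", "pcideviceId0": "aaaaaaaa"}))
# [('44444444', 'aaaaaaaa')]
```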

The final output is a VSphereMachineTemplate with PCI devices plumbed in, per the spec in the above PR:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  annotations:
    vmTemplateMoid: vm-44
  name: tkg-aditi-vc-worker
  namespace: default
spec:
  template:
    spec:
      cloneMode: fullClone
      datacenter: /dc0
      datastore: /dc0/datastore/sharedVmfs-0
      diskGiB: 40
      folder: /dc0/vm/folder0
      memoryMiB: 8192
      network:
        devices:
        - dhcp4: true
          networkName: /dc0/network/VM Network
      numCPUs: 2
      #####################################
      pciDevices:
      - vendorId: 67
        deviceId: 63
      - deviceId: 33333333
        vendorId: 44444444
      ############# NOTE: the customVMXKeys below still need to be added to our ytt somehow; not covered in this gist.
      # customVMXKeys:
      #   pciPassthru.allowP2P: "true"
      #   pciPassthru.RelaxACSforP2P: "true"
      #   pciPassthru.use64bitMMIO: "TRUE"
      #   pciPassthru.64bitMMIOSizeGB: "512"
      resourcePool: /dc0/host/cluster0/Resources/rp0
      server: 10.180.101.57
      storagePolicyName: ""
      template: /dc0/vm/photon-3-kube-v1.22.8+vmware.1-tkg.1
```

Below are the changes required to the Tanzu Framework installation files to support this.

tkg/providers/config_default.yaml

```diff
+# admittedly this is a little hacky, but a string interpolation thing could be done as a 2nd iteration...
+pcivendorId0:
+pcideviceId0:
+# Add more of these (pcivendorId1,2,3...) or else, we can parse these out of a YTT string and be fancier
 #! ---------------------------------------------------------------------
 #! Cluster creation basic configuration
 #! ---------------------------------------------------------------------
```

tkg/providers/infrastructure-vsphere/v1.0.3/ytt/overlay.yaml

The main logic here is that we inject YAML for PCI devices only if the user enters devices in the config; this loop supports up to 10 devices.

```diff
@@ -204,7 +204,13 @@ spec:
       resourcePool: #@ data.values.VSPHERE_RESOURCE_POOL
       server: #@ data.values.VSPHERE_SERVER
       template: #@ data.values.VSPHERE_TEMPLATE
-
+      #@overlay/match missing_ok=True
+      pciDevices:
+      #@ for x in range(0,10):
+      #@ if/end hasattr(data.values, "pcideviceId"+str(x)):
+      - deviceId: #@ getattr(data.values, "pcideviceId"+str(x))
+        vendorId: #@ getattr(data.values, "pcivendorId"+str(x))
+      #@ end
 #@overlay/match by=overlay.subset({"kind":"KubeadmControlPlane"})
```
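The ytt loop above can be modeled in plain Python to see exactly what it emits: for each index 0..9, if `pcideviceId<N>` exists in the data values, an entry with that device ID and the matching vendor ID is appended. This is a sketch of the logic, not ytt itself:

```python
def render_pci_devices(data_values: dict, max_devices: int = 10) -> list:
    """Mirror the ytt overlay loop: emit one {deviceId, vendorId} entry
    per index whose pcideviceId<N> key is present in data.values."""
    devices = []
    for x in range(max_devices):
        if f"pcideviceId{x}" in data_values:  # hasattr(data.values, ...)
            devices.append({
                "deviceId": data_values[f"pcideviceId{x}"],   # getattr(...)
                "vendorId": data_values[f"pcivendorId{x}"],
            })
    return devices

values = {"pcideviceId0": "aaaaaaaa", "pcivendorId0": "44444444"}
print(render_pci_devices(values))
# [{'deviceId': 'aaaaaaaa', 'vendorId': '44444444'}]
```

Note that each index is checked independently, so gaps in the numbering (e.g. only `pcideviceId3` set) would still render, just like the ytt version.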

tkg/providers/ytt/lib/config_variable_association.star

The "star" file here is needed to confirm that this is a valid parameter for VSphere clusters, which it is of course..

```diff
@@ -11,6 +11,10 @@ load("@ytt:yaml", "yaml")
 def config_variable_association():

 return {
+
+"pcivendorId0": ["vsphere"],
+"pcideviceId0": ["vsphere"],
+
 "CLUSTER_NAME": ["vsphere", "aws", "azure", "tkg-service-vsphere", "docker"],
 "CLUSTER_PLAN": ["vsphere", "aws", "azure", "tkg-service-vsphere", "docker"],
 "NAMESPACE": ["vsphere", "aws", "azure", "tkg-service-vsphere", "docker"],
```

UPDATE

Note that we also need to add VMX parameters to the VSphereMachineTemplate via `customVMXKeys`, which is a string-to-string map:

```yaml
      #####################################
      pciDevices:
      - vendorId: 67
        deviceId: 63
      customVMXKeys:
        omg: sagar  # placeholder test key
        pciPassthru.allowP2P: "true"
        pciPassthru.RelaxACSforP2P: "true"
        pciPassthru.use64bitMMIO: "TRUE"
        pciPassthru.64bitMMIOSizeGB: "512"
```
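Because `customVMXKeys` is a string-to-string map, loose `key = value` or `key value` notes (like the ones scribbled earlier in this gist) have to be normalized into proper YAML mappings before they can go into the template. A hypothetical helper for that conversion:

```python
def normalize_vmx_keys(lines: list) -> dict:
    """Turn loose 'key = value' or 'key value' notes into the
    string->string map that customVMXKeys expects."""
    keys = {}
    for line in lines:
        line = line.strip()
        if "=" in line:
            k, v = line.split("=", 1)
        else:
            k, _, v = line.partition(" ")
        keys[k.strip()] = v.strip()
    return keys

notes = [
    "pciPassthru.RelaxACSforP2P = true",
    "pciPassthru.use64bitMMIO TRUE",
    "pciPassthru.64bitMMIOSizeGB 512",
]
print(normalize_vmx_keys(notes))
# {'pciPassthru.RelaxACSforP2P': 'true', 'pciPassthru.use64bitMMIO': 'TRUE', 'pciPassthru.64bitMMIOSizeGB': '512'}
```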