Skip to content

Instantly share code, notes, and snippets.

@tahmmee
Last active November 13, 2018 01:04
Show Gist options
  • Save tahmmee/9e3c35adccf7705443950aad42e23a68 to your computer and use it in GitHub Desktop.
Save tahmmee/9e3c35adccf7705443950aad42e23a68 to your computer and use it in GitHub Desktop.
Life & times of an AKS disk mount
  1. PV Disk is created. Unattached to any vm
  2. Pod is created
Nov 12 21:20:08 aks-agentpool-38464279-0 kubelet[3501]: I1112 21:20:08.674963    3501 kubelet.go:1837] 
SyncLoop (ADD, "api"): "cb-example-0002_default(bb5fac63-e6c0-11e8-8fcc-062ea35501ea)"
  1. The attach controller is watching for the disk to be attached to node where Pod is running
Nov 12 21:20:08 aks-agentpool-38464279-0 kubelet[3501]: I1112 21:20:08.857333    3501 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-acc51c36-e6c0-11e8-8fcc-062ea35501ea" (UniqueName: "kubernetes.io/azure-disk//subscriptions/a384b1e1-47d0-4067-8d5e-8d9e16e650e4/resourceGroups/MC_tmcafee-group-east_tmcafee-prod-2_eastus/providers/Microsoft.Compute/disks/kubernetes-dynamic-pvc-acc51c36-e6c0-11e8-8fcc-062ea35501ea") pod "cb-example-0002" (UID: "bb5fac63-e6c0-11e8-8fcc-062ea35501ea")
  1. But Azure is at the same time attaching the disks to the vm where the Pod is running. This results in exponential backoffs up to a minute:
Nov 12 21:21:12 aks-agentpool-38464279-0 kubelet[3501]: E1112 21:21:12.595970    3501 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/azure-disk//subscriptions/..."" failed. No retries permitted until 2018-11-12 21:22:16.595939421 +0000 UTC m=+59822.661305484 (durationBeforeRetry 1m4s). Error: "Volume not attached according to node status for volume \"pvc-acc51c36-e6c0-11e8-8fcc-062ea35501ea\" ..."

Also looks like NewNestedPendingOperations always have exponentialBackOffOnError enabled: https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/operationexecutor/operation_executor.go#L159

  1. After the 1m4s retry duration, the next mount check suceeds (~2m8s since Pod creation)
Nov 12 21:22:16 aks-agentpool-38464279-0 kubelet[3501]: I1112 21:22:16.605734    3501 operation_generator.go:1174] Controller attach succeeded for volume "pvc-acc51c36-e6c0-11e8-8fcc-062ea35501ea" ... device path: "3"
  1. Filesystem mount is attempted, but fails since the volume is unformated (note device path: "3"):
Nov 12 21:22:24 aks-agentpool-38464279-0 kubelet[3501]: Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/m1193319129 --scope -- mount -o defaults /dev/disk/azure/scsi1/lun3 /var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/m1193319129
...
Nov 12 21:22:24 aks-agentpool-38464279-0 kubelet[3501]: I1112 21:22:24.995463    3501 mount_linux.go:519] Disk "/dev/disk/azure/scsi1/lun3" appears to be unformatted, attempting to format as type: "ext4" with options: [-F -m0 /dev/disk/azure/scsi1/lun3]
  1. Disk gets ext4 formatted and mounted (~2m22s Mount time.)
Nov 12 21:22:30 aks-agentpool-38464279-0 kubelet[3501]: I1112 21:22:30.565701    3501 mount_linux.go:523] Disk successfully formatted (mkfs): ext4 - /dev/disk/azure/scsi1/lun3 /var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/m1193319129

Nov 12 21:22:30 aks-agentpool-38464279-0 kubelet[3501]: I1112 21:22:30.694720    3501 operation_generator.go:517] MountVolume.MountDevice succeeded for volume "pvc-acc51c36-e6c0-11e8-8fcc-062ea35501ea" ...

Nov 12 21:22:30 aks-agentpool-38464279-0 kubelet[3501]: I1112 21:22:30.748032    3501 operation_generator.go:558] MountVolume.SetUp succeeded for volume "pvc-acc51c36-e6c0-11e8-8fcc-062ea35501ea" 
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment