Longhorn hcloud best practices

The following settings are provided as an example of how Longhorn should be configured in a production cluster, especially if it is deployed on Hetzner Cloud infrastructure.

Hetzner server nodes provide local storage and allow up to five attached volumes (with a size of up to 10 TiB each). Local storage is backed by NVMe drives and is therefore much faster than the attached volumes, but limited in size (max. 300 GiB usable).

It is assumed that the cluster has already been created, e.g. via the Terraform scripts provided by the great kube-hetzner project.

Initial configuration

You also want to control which nodes are used for storage, so it is suggested to set the option "Create default disk only on labeled nodes" to true. This is done via a setting in the Longhorn Helm chart:

defaultSettings:
  createDefaultDiskLabeledNodes: true
  kubernetesClusterAutoscalerEnabled: true # if autoscaler is active in the cluster
  defaultDataPath: /var/lib/longhorn
  # ensure pods are moved to a healthy node if the current node is down:
  nodeDownPodDeletionPolicy: delete-both-statefulset-and-deployment-pod
persistence:
  defaultClass: true
  defaultFsType: ext4
  defaultClassReplicaCount: 3
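
With these values saved to a file, Longhorn can then be installed or upgraded via Helm as usual. A minimal sketch, assuming the official Longhorn chart repository and a values file named longhorn-values.yaml:

# add the official Longhorn chart repo and deploy with the values above
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm upgrade --install longhorn longhorn/longhorn \
  --namespace longhorn-system --create-namespace \
  -f longhorn-values.yaml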

Node preparation

Label the node

The following label signals Longhorn to use a dedicated disk configuration, which must be provided via an annotation. If the label is present, Longhorn will not create a default disk but will follow the configuration provided in the annotation.

Add the 'config' label:

kubectl label node <node> node.longhorn.io/create-default-disk='config'

Remove the label:

# remove label
kubectl label node <node> node.longhorn.io/create-default-disk-

Longhorn-relevant annotation

See also the Longhorn documentation on the default disk configuration.

The default disk configuration is provided by annotating the node. The example below shows a configuration for a two-disk setup: the internal disk and one external hcloud volume (mounted at /var/longhorn).

storageReserved should be 25% for the local disk and 10% for attached dedicated hcloud volumes. So if the local disk has 160 GiB of space, 40 GiB (42949672960 bytes) should be defined as reserved. We also define tags for the different disks: "nvme" for the fast internal disk and "ssd" for the slower hcloud volume.

kubectl annotate node <storagenode> node.longhorn.io/default-disks-config='[ { "path":"/var/lib/longhorn","allowScheduling":true, "storageReserved":21474836240, "tags":[ "nvme" ]}, { "name":"hcloud-volume", "path":"/var/longhorn","allowScheduling":true, "storageReserved":10737418120,"tags":[ "ssd" ] }]'
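
To verify that Longhorn picked up the annotation, you can inspect the node's disk configuration afterwards. A quick sketch, assuming Longhorn is installed in the longhorn-system namespace:

# show the label on the Kubernetes node and the annotation we just set
kubectl get node <storagenode> --show-labels
kubectl describe node <storagenode> | grep default-disks-config
# inspect the disks Longhorn created from the annotation
kubectl -n longhorn-system get nodes.longhorn.io <storagenode> -o yaml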

StorageClasses

To ensure that a volume uses the right storage, a corresponding StorageClass needs to be defined:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-fast
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880" # 48 hours in minutes
  fromBackup: ""
  fsType: "ext4"
  diskSelector: "nvme"
  
  #  backingImage: "bi-test"
  #  backingImageDataSourceType: "download"
  #  backingImageDataSourceParameters: '{"url": "https://backing-image-example.s3-region.amazonaws.com/test-backing-image"}'
  #  backingImageChecksum: "SHA512 checksum of the backing image"
  #  diskSelector: "ssd,fast"
  #  nodeSelector: "storage,fast"
  #  recurringJobSelector: '[
  #   {
  #     "name":"snap",
  #     "isGroup":true
  #   },
  #   {
  #     "name":"backup",
  #     "isGroup":false
  #   }
  #  ]'
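
Once applied, the new class can be used like any other StorageClass. A short sketch (the manifest file name is an assumption):

# create the StorageClass and confirm it is registered
kubectl apply -f longhorn-fast.yaml
kubectl get storageclass longhorn-fast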

Benchmarks

Below you find some results for different StorageClasses, obtained with the benchmark utility dbench. Besides disk selection and the number of replicas, the server type used for the storage nodes (vCPU, RAM) also has a huge impact.

Dbench

# StorageClass longhorn (3 replicas, hcloud-volume)
==================
= Dbench Summary =
==================
Random Read/Write IOPS: 4202/240. BW: 228MiB/s / 26.2MiB/s
Average Latency (usec) Read/Write: 2716.04/18.33
Sequential Read/Write: 305MiB/s / 66.3MiB/s
Mixed Random Read/Write IOPS: 823/272


# StorageClass longhorn-fast (3 replicas, internal disk)
==================
= Dbench Summary =
==================
Random Read/Write IOPS: 7305/3737. BW: 250MiB/s / 62.1MiB/s
Average Latency (usec) Read/Write: 1789.55/
Sequential Read/Write: 299MiB/s / 67.8MiB/s
Mixed Random Read/Write IOPS: 4446/1482

Random Read/Write IOPS: 6914/3661. BW: 256MiB/s / 57.9MiB/s
Average Latency (usec) Read/Write: 1801.84/
Sequential Read/Write: 275MiB/s / 62.6MiB/s
Mixed Random Read/Write IOPS: 4908/1632

# StorageClass longhorn-fast-xfs (3 replicas, internal disk)
==================
= Dbench Summary =
==================
Random Read/Write IOPS: 5887/4083. BW: 244MiB/s / 44.8MiB/s
Average Latency (usec) Read/Write: 2720.66/
Sequential Read/Write: 278MiB/s / 46.2MiB/s
Mixed Random Read/Write IOPS: 3880/1298
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-fast
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880" # 48 hours in minutes
  fromBackup: ""
  fsType: "ext4"
  diskSelector: "nvme"
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-fast-xfs
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880" # 48 hours in minutes
  fromBackup: ""
  fsType: "xfs"
  diskSelector: "nvme"
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-fast-2
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "2880" # 48 hours in minutes
  fromBackup: ""
  fsType: "ext4"
  diskSelector: "nvme"
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-fast-2-xfs
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "2880" # 48 hours in minutes
  fromBackup: ""
  fsType: "xfs"
  diskSelector: "nvme"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: dbench-pv-claim
spec:
  # storageClassName: longhorn
  storageClassName: longhorn-fast
  # storageClassName: local-path
  # storageClassName: gp2
  # storageClassName: local-storage
  # storageClassName: ibmc-block-bronze
  # storageClassName: ibmc-block-silver
  # storageClassName: ibmc-block-gold
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 25Gi
---
apiVersion: batch/v1
kind: Job
metadata:
  name: dbench
spec:
  template:
    spec:
      containers:
        - name: dbench
          image: storageos/dbench:latest
          imagePullPolicy: Always
          env:
            - name: DBENCH_MOUNTPOINT
              value: /data
            # - name: DBENCH_QUICK
            #   value: "yes"
            # - name: FIO_SIZE
            #   value: 1G
            # - name: FIO_OFFSET_INCREMENT
            #   value: 256M
            # - name: FIO_DIRECT
            #   value: "0"
          volumeMounts:
            - name: dbench-pv
              mountPath: /data
      restartPolicy: Never
      volumes:
        - name: dbench-pv
          persistentVolumeClaim:
            claimName: dbench-pv-claim
  backoffLimit: 4
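
To reproduce the benchmark, apply the manifests and read the summary from the job log once it has finished. A sketch, assuming the StorageClasses and the dbench job are saved as storageclasses.yaml and dbench.yaml:

kubectl apply -f storageclasses.yaml
kubectl apply -f dbench.yaml
# follow the log; the dbench summary is printed at the end
kubectl logs -f job/dbench
# clean up the job and its PVC afterwards
kubectl delete -f dbench.yaml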

ifeulner commented Apr 3, 2023

> @ifeulner thanks for providing your config! How do you add the volumes to the nodes? During setup (e.g. with Terraform) or by hand?

@okaufmann There is a great project for that: https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner :)

@okaufmann

@ifeulner nice thank you

@carstenblt

I am having weird Longhorn problems on Hetzner regularly. I believe most of the time it is a node hosting the instance-manager that has some kind of network problem. The volume then becomes unwritable and can't be remounted again because it fails with a timeout.
Usually draining the right node helps.
I guess Hetzner is just unreliable when it comes to this. What's weird, however, is that Longhorn is not resilient enough to cope with this.

@ifeulner Have you had any such experiences?

@sharkymcdongles

> I am having weird longhorn problems on hetzner regularly. I believe most of the time it is a node hosting the instance-manager that has some kind of network problem. The volume then gets unwritable and can't be remounted again because it fails with a time out. Usually draining the right node helps. I guess hetzner is just unreliable when it comes to this. What's weird however is, that longhorn is not resilient enough to cope with this.
>
> @ifeulner Have you had any such experiences?

Are you using the local NVMe or an attached disk?
