Andrew's homelab rants

Storage

In a low power homelab environment, all storage is a trade-off.

Shared drive

Most people coming to homelabs initially want a big shared replicated volume that is stored redundantly across the cluster and operates for all intents and purposes as if it's independently attached to each node.

This setup has the following properties:

  • Consistency: Medium
  • Concurrency: Read and write from multiple nodes concurrently.
  • Performance: Lousy
  • Availability: Pretty good, you can store data redundantly enough that an interruption won't break your system.

In this situation, every read and write needs to consult a quorum of storage nodes, which is slow: the consensus algorithm adds latency and overhead to every single operation.

Implementations of this pattern include GlusterFS (end-of-life 2024), SeaweedFS, MooseFS, and others.

This kind of volume is not suitable for high-throughput workloads (e.g. Prometheus, any kind of database server, even git repositories can be a problem).
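
In Kubernetes terms this pattern shows up as a ReadWriteMany volume. A minimal sketch of the claim, with a placeholder storage class name standing in for whatever distributed filesystem you actually deploy:

```yaml
# Sketch of a PVC for a shared volume that several nodes mount read-write
# at the same time. The storage class name is a placeholder, not any
# particular product's default.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace
spec:
  accessModes:
    - ReadWriteMany                      # concurrent read-write from multiple nodes
  storageClassName: my-distributed-fs   # hypothetical class name
  resources:
    requests:
      storage: 50Gi
```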

Personally, I use this pattern only for my 'workspace' (coding, configuration, etc).

Alternative: NAS

If you can manage to put your storage on a single node (e.g. a NAS box), then you can do this fairly well over NFS. It still won't be as fast or as safe as a local disk, but performance will be much better than a replicated shared volume because there's no consensus algorithm involved.
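
The simplest way to wire a NAS into Kubernetes is a statically defined NFS PersistentVolume; the server address and export path below are placeholders for your own NAS:

```yaml
# Sketch of a static PV backed by an NFS export on a NAS box.
# Server address and path are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nas-media
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 192.168.1.10   # hypothetical NAS address
    path: /export/media    # hypothetical export path
```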

Replicated non-concurrent

This is a common CSI pattern in Kubernetes and approximately matches Amazon EBS, Google Cloud PD, etc. Typically there's a replicated block device which is accessed over iSCSI.

This pattern is OK when you are storing something that can't be replicated at the application layer. Note that in many cases three replicas is not enough: Longhorn, for example, will go read-only when one of three replicas is down, so you really need four replicas to maintain high write availability.

  • Consistency: Strong
  • Concurrency: Read-write from only one node
  • Performance: Medium
  • Availability: Good, but you usually need 4 replicas.

Typical storage systems for this are Longhorn and OpenEBS (Jiva is unmaintained and built on Longhorn; Mayastor is newer but requires raw NVMe access).
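
If you go with Longhorn and take the replica-count caveat above seriously, the replica count lives on the StorageClass. A sketch (parameter names as Longhorn documents them, but check against your version):

```yaml
# Sketch of a Longhorn StorageClass keeping 4 replicas so that writes
# stay available when one replica is down.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-4-replicas
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "4"
  staleReplicaTimeout: "2880"
```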

Local volumes

If your application provides replication at the application layer (e.g. CockroachDB, Patroni, Grafana Loki, MinIO, SeaweedFS), then you should store the data on local disk using the standard k3s local-path (or OpenEBS LocalPV) storage class. This gives the highest performance, and you don't need storage-level redundancy when the application is already replicating at a higher level.
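
A sketch of what that looks like, assuming the local-path class that k3s ships by default (OpenEBS LocalPV works the same way under a different class name):

```yaml
# Sketch of a PVC on node-local disk via the k3s local-path provisioner.
# The data lives on whichever node the pod lands on, so the application
# (MinIO, CockroachDB, etc.) has to do its own replication.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 100Gi
```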

Configuration

If you're creating a Kubernetes volume for infrequently edited configuration, your best bet is the built-in ConfigMap object rather than some kind of replicated storage. This will have the highest availability and consistency guarantees, at the cost of not permitting writes from your workloads (but you won't need that).
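
A minimal sketch of that pattern, with illustrative names: the configuration lives in the Kubernetes API itself and is mounted read-only into the workload.

```yaml
# Sketch: config stored as a ConfigMap and mounted read-only.
# All names and the image are illustrative placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  settings.ini: |
    [server]
    port = 8080
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: example/app:latest   # placeholder image
      volumeMounts:
        - name: config
          mountPath: /etc/app
          readOnly: true
  volumes:
    - name: config
      configMap:
        name: app-config
```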

On replication and horizontal scalability

Big Tech companies, cloud providers, and other high-scale operators have different requirements from your homelab. In particular, because it isn't possible to serve their entire workload from a single machine, horizontal scalability is a mandatory requirement.

However, horizontal scalability incurs several significant trade-offs:

  • Without a single coordinator node (or a single source of truth), the systems need to have algorithms for resolving all kinds of issues. How many nodes are required for serving? What if two nodes disagree on the state of the world? How can I guarantee that a write has been durably applied?
    • All of these questions are trade-offs. For example, the more certain you need to be that a write has been durably applied, the longer it takes to make a write.
  • These trade-offs in aggregate form a consensus algorithm (e.g. Raft) - but that consensus algorithm incurs a significant performance penalty, both in terms of compute usage and in terms of latency.

Your homelab probably doesn't need horizontal scalability - most of your workloads are unlikely to be bigger than a single node.

You may however want redundancy or high availability, which is a separate but related concept (meaning: more than one node has to fail in order for you to lose availability). Systems that are horizontally scalable are sometimes better suited for high availability, but this is not necessarily the case.

For example, some replicated systems degrade into read-only mode when only 2/3 nodes are available. This is strictly worse than a single point of failure, because now you have to stop 3 nodes from going down instead of 1.

High-availability services

Systems can also be highly available without being horizontally scalable. The most common strategy for this is leader election, where a consensus algorithm (or an external locking system such as etcd, ZooKeeper or the Kubernetes API server) is used to designate one node as the 'leader' (or 'primary'; 'master' is a synonym now considered dated), with its state replicated to 'followers' (or 'replicas'). When the leader goes down, the remaining nodes elect a new leader, which can pick up where the old one left off.

An example of this system in practice is Patroni (commonly run in Kubernetes as Postgres-operator), which is basically a managed Postgres cluster with replication and failover.
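
For illustration, here is roughly what a small cluster manifest looks like in the style of the Zalando postgres-operator CRD; field names and values are a best-effort sketch, not copied from any particular release:

```yaml
# Sketch of a small HA Postgres cluster: Patroni handles leader election
# and failover between the two instances. Values are illustrative.
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: homelab-pg        # the operator expects the name to start with teamId
spec:
  teamId: homelab
  numberOfInstances: 2    # 1 leader + 1 follower
  volume:
    size: 20Gi
  postgresql:
    version: "15"
```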

Leader-elected services with automatic failover don't give you horizontal write scalability (in Patroni, all writes have to go through the single read-write Postgres instance), but they are much simpler and higher-performance. This is a good trade-off in a homelab environment.
