In my storage quests, I finally decided I want to lazily use S3 for ReadWriteMany and to do some experiments with it.
There are a few options, but to save you some time if you just want what I landed on, I like csi-s3.
Well... this works great! The only problem was that it needed a privileged security context for mounting. It would be terrible if a container with that power got compromised, so I immediately moved on to getting this a layer away from being managed in-pod.
My initial plan was to just use the nfs-subdir-external-provisioner on top of a multi-replica S3 backed deployment of NFS Ganesha.
When running `time echo hi > /mnt-path/hello.txt` against s3fs directly and against NFS Ganesha, I was finding roughly 0.5 seconds of latency before NFS Ganesha completed its work, whereas s3fs was so responsive that `time` reported 0.000. This alone was a big turn-off for me.
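To make that comparison repeatable, the one-liner can be turned into a small probe script. This is a sketch of my own; the `MOUNT_PATH` variable is a placeholder I'm introducing (defaulting to `/tmp` so it runs anywhere), and you'd point it at the s3fs or NFS Ganesha mount you want to measure:

```shell
#!/usr/bin/env bash
# Rough write-latency probe: time a small write plus a sync against a mount.
# MOUNT_PATH is a placeholder; default to /tmp so the script runs as-is.
MOUNT_PATH="${MOUNT_PATH:-/tmp}"

start=$(date +%s%N)                 # wall clock, in nanoseconds
echo hi > "$MOUNT_PATH/hello.txt"
sync "$MOUNT_PATH/hello.txt"        # flush, so a slow backend can't hide behind the page cache
end=$(date +%s%N)

echo "write took $(( (end - start) / 1000000 )) ms"
```

Running it against each mount in turn gives numbers you can actually compare, instead of eyeballing `time` output.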
So I moved on to trying the in-kernel NFS implementation. Admittedly, I have no clue why, but this defeated me. I couldn't win, and this is something I've done professionally for the Fortune 500 for half a decade on RHEL-based systems. This experiment never made it past testing in plain Docker containers.
I had `showmount -e` showing my exports, and I even had them wide open to the world with a wildcard. Any time I would go to `mount -t nfs ...`, mount would just hang. After spending hours trying different formulas and seeing how other people implemented NFS in Alpine, Ubuntu, and CentOS, I restarted Docker one last time to rid the hung processes and hung up my hat on this.
A highly available NFS share with S3 lost all appeal to me at this point. There's still block volumes and DRBD testing I want to do later here though.
I must confess before continuing: I am affiliated with IBM at the time of writing this. However, this doesn't change my opinion on datashim.
The first time I saw datashim.io it looked appealing, but I wasn't interested in using S3 at the time. It looks like it can mount Apache Hive as well.
In my testing, it worked as well as s3fs did inside the container as far as writes go. It also took away the need for having a privileged container.
The only downsides I found were:
- it doesn't support symlinks, which is a deal breaker for my own needs
- there's an additional CRD called `Dataset` that you use to make your PersistentVolumeClaims
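For reference, a `Dataset` looks roughly like this. This is a sketch from memory of the datashim docs, so treat the field names as approximate; the name, endpoint, bucket, and secret values are all illustrative:

```yaml
apiVersion: datashim.io/v1alpha1
kind: Dataset
metadata:
  name: example-dataset
spec:
  local:
    type: "COS"                     # datashim's label for S3-compatible object storage
    endpoint: "https://s3.example.com"
    bucket: "example-bucket"
    secret-name: "s3-credentials"   # secret holding the access key pair
```

If I recall correctly, datashim then provisions a PVC with the same name as the `Dataset` for your pods to claim.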
Overall though, it does work great and I have a bit of trust behind a bigger name like IBM for stuff like this.
So I moved on to the idea of looking for an S3 specific CSI, or if one didn't exist, finding out how to write my own.
Thankfully, someone out there was already on point and made a CSI for S3: https://github.com/ctrox/csi-s3
Also this is a really simple CSI if you need some example code to work off of for making your own.
This is the bachelor chow I'm about to consume. It provides 4 different ways to mount S3 buckets, including my favorite pal, `s3fs`.
The first problem I ran into was that the storageclass example in README.md is incomplete; I found the example here to be complete though.
For my development purposes, I have formulated `local-s3.yaml`, an all-inclusive local development kit that uses MinIO for S3. For it to work, the cluster nodes must be able to resolve cluster DNS; on random providers, using resolve-host-patcher should work. With this manifest, CSI-S3 creates a bucket per PVC, and the `reclaimPolicy` is `Delete`, so deleting a PVC deletes its bucket.
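Once the kit is applied, requesting storage is just an ordinary PVC against the csi-s3 class. A sketch, where the `csi-s3` storageClassName and the claim name are assumptions about my manifest rather than anything canonical:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s3-scratch
spec:
  accessModes:
    - ReadWriteMany            # the whole point of reaching for S3 here
  resources:
    requests:
      storage: 5Gi             # advisory for S3; buckets aren't really sized
  storageClassName: csi-s3
```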
On a live cluster, I would store a secret with the command below and use `live-storage-class.yaml`. The `reclaimPolicy` is `Retain`, and it will create the volumes inside a bucket called `lantern`.
```shell
kubectl -n kube-system create secret generic csi-s3-secret \
  --from-literal="accessKeyID=..." \
  --from-literal="secretAccessKey=..." \
  --from-literal="endpoint=https://nyc3.digitaloceanspaces.com" \
  --from-literal="region="
```
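The `live-storage-class.yaml` I'm describing looks roughly like this. It's a sketch: `ch.ctrox.csi.s3-driver` is the provisioner name from the csi-s3 repo, but the exact parameter keys should be checked against the complete example linked above (the repo's full example also wires the secret into the node-publish/node-stage parameters):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-s3
provisioner: ch.ctrox.csi.s3-driver
reclaimPolicy: Retain                  # PVs, and the data in S3, survive PVC deletion
parameters:
  mounter: s3fs
  bucket: lantern                      # volumes land as prefixes in this one bucket
  csi.storage.k8s.io/provisioner-secret-name: csi-s3-secret
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
```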
With S3 buckets, backups should be pretty easy to accomplish. For providers like DigitalOcean, I plan to just run a job that uses the secret for the CSI. It's fine for it to be a privileged container, so I can just build an Alpine utility container like so for the job.
```dockerfile
FROM alpine:3
RUN apk add -U --no-cache bash curl && \
    apk add -U --no-cache s3fs-fuse kubectl helm --repository=http://dl-cdn.alpinelinux.org/alpine/edge/testing/
```
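The job itself could be a CronJob wrapping that image. A hedged sketch of what I have in mind; the image name, schedule, backup destination, and the tar step are all placeholders for whatever you actually use:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: bucket-backup
  namespace: kube-system
spec:
  schedule: "0 3 * * *"                # nightly
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: registry.example.com/s3-utils:latest   # the alpine image above
              envFrom:
                - secretRef:
                    name: csi-s3-secret                     # reuse the CSI credentials
              securityContext:
                privileged: true                            # s3fs needs /dev/fuse
              command: ["bash", "-c"]
              args:
                - |
                  mkdir -p /mnt/bucket
                  echo "$accessKeyID:$secretAccessKey" > /etc/passwd-s3fs
                  chmod 600 /etc/passwd-s3fs
                  s3fs lantern /mnt/bucket -o url="$endpoint" -o passwd_file=/etc/passwd-s3fs
                  tar -czf "/backup/lantern-$(date +%F).tar.gz" -C /mnt/bucket .
```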
The reliability of reading the files will be dependent on the underlying S3 storage consistency guarantees.
There doesn't appear to be a way to pass extra s3fs flags, such as the one for caching. If needed, that functionality will have to be patched in. Ref: ctrox/csi-s3/pkg/mounter/s3fs.go
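For reference, the flag in question on a hand-rolled s3fs mount looks like this. `use_cache` is a real s3fs-fuse option; the bucket name and endpoint are placeholders, and this is exactly what csi-s3 currently gives you no way to pass through:

```
s3fs mybucket /mnt/bucket \
  -o url=https://nyc3.digitaloceanspaces.com \
  -o passwd_file="$HOME/.passwd-s3fs" \
  -o use_cache=/tmp/s3fs-cache      # local disk cache for object data
```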
It looks like S3FS has implemented SlowDown handling.
While I have never been able to break s3fs, I'm sure there's a way. There's always a way when you have people using your systems in the wild.