This document is very far from exhaustive; I just wanted to write a few things down before an extended PTO.
- This wouldn't have been feasible if David hadn't deployed it as a POC.
- It will be even easier "next time" if we build a strong habit of keeping services and their configuration in version control.
- While there are a lot of rough edges in the OpenShift alerting UX, I was able to piece together solutions to many problems without much prior knowledge.
- I had the only useable login - and that was partly by chance! Without that, I'm not sure we'd have been able to use it.
- Let's Encrypt's normal challenge mechanism of "write this string to a file on your webserver" (HTTP-01) isn't accepted for wildcard certificates; those require the DNS-01 challenge, so the client needs enough control of your DNS to create a TXT record.
- Due to the complexity of configuring BIND (the nameserver) to allow updates to the zone that the Let's Encrypt client expects to be able to modify, I felt the need to implement that in the ceph-cm-ansible nameserver role.
- Since the cert was to be deployed as the OpenShift cluster's new cluster-wide cert, the client side of the process couldn't just be "run the certbot script on a host via SSH"; I had to discover and deploy the cert-manager operator.
- None of the above worked on the first try - or the tenth ;)
- Surprisingly, OpenShift Routes, while extremely convenient, are only usable for HTTP(S). Any other protocol has to be exposed in a considerably more complex manner.
- We still don't know how to properly open the cluster up to the public Internet, so we are using a lot more reverse proxies than we used to. This also involves rewriting request headers.
- We could have used more hands.
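
For anyone wondering what the wildcard challenge actually asks of the DNS server, here is a minimal sketch of how the TXT record is derived under ACME's DNS-01 challenge (RFC 8555). It is not taken from our deployment: the function name `dns01_record` and the token and thumbprint values are hypothetical placeholders, and in practice the ACME client (cert-manager, in our case) computes and publishes this for you.

```python
import base64
import hashlib


def dns01_record(domain: str, token: str, account_thumbprint: str) -> tuple[str, str]:
    """Return the (record name, TXT value) an ACME server looks up for DNS-01.

    Hypothetical helper for illustration only.
    """
    # A wildcard such as *.example.com is validated against the base name.
    base = domain[2:] if domain.startswith("*.") else domain

    # Key authorization = challenge token + "." + account key thumbprint.
    key_authorization = f"{token}.{account_thumbprint}"

    # TXT value = unpadded base64url of the SHA-256 digest of the key authorization.
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

    return f"_acme-challenge.{base}", value


if __name__ == "__main__":
    # Placeholder inputs; real values come from the ACME server and account key.
    name, value = dns01_record("*.example.com", "placeholder-token", "placeholder-thumbprint")
    print(name, "TXT", value)
```

The value changes with every challenge, which is why the nameserver has to accept automated updates to that one record rather than a one-off manual edit.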