To be clear we continue to run many Redis services in our production environment. It’s a great tool for prototyping and small workloads. For our use case however, we believe the cost and complexity of our setup justifies urgently finding alternate solutions.
- Each of our Redis servers are clearly numbered with a current leader in one availability zone, and a follower in another zone.
- The servers run ~16 different individual Redis processes. This helps us utilize CPUs (as Redis is single-threaded) but it also means we only need an extra 1/16th memory in order to safely perform a BGSAVE (due to copy-on-write), though in practice it’s closer to 1/8 because it’s not always evenly balanced.
- Our leaders do not every run BGSAVE unless we’re bringing up a new slave which is carefully done manually. Since issues with the slave should not affect the leader and new slave connections might trigger an unsafe BGSAVE on the leader, slave Redis processes are set to not automatically rest