I was originally hoping to calculate the appropriate values
from metrics at 10% traffic and then scale up from there; however, it
proved harder than I thought.
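For reference, the naive version of that extrapolation can be sketched as below. This is only an illustration of the linear scale-up idea, not what we ended up doing: the function names, the 12 GiB observation, and the 6 GiB per-pod budget are all hypothetical placeholders, and the sketch assumes memory use grows linearly with traffic, which is exactly the assumption that turned out to be shaky.

```python
import math


def scale_linearly(observed: float, observed_percent: int, target_percent: int = 100) -> float:
    """Extrapolate a metric measured at observed_percent of traffic to target_percent.

    Assumes the metric grows linearly with traffic (a simplifying assumption).
    """
    if not 0 < observed_percent <= 100:
        raise ValueError("observed_percent must be in (0, 100]")
    return observed * target_percent / observed_percent


def pods_needed(total_memory_gib: float, memory_per_pod_gib: float) -> int:
    """Number of pods required to hold total_memory_gib, rounding up."""
    if memory_per_pod_gib <= 0:
        raise ValueError("memory_per_pod_gib must be positive")
    return math.ceil(total_memory_gib / memory_per_pod_gib)


# Hypothetical inputs: 12 GiB of total memory observed at 10% traffic,
# and a per-pod budget of 6 GiB.
estimated_total_gib = scale_linearly(12.0, observed_percent=10)
estimated_pods = pods_needed(estimated_total_gib, memory_per_pod_gib=6.0)
```

In practice spikes in trace volume or duration mean the real requirement will not follow this straight line, so treat the result as a starting point at best.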
We have the following variables that we can tweak:
cache per pod: needs to be large enough to absorb spikes in trace volume or duration, but not so large that we waste memory
memory per pod: should be less than 8G so that we can effectively fit