@timmc
Last active Mar 19, 2019
Success-prioritized, concurrency-limited stochastic fallback cascade


  • State
    • For each node, store:
      • Rolling window of 6 historical stat buckets
      • One additional "sticky" bucket
    • Write stats to newest bucket
    • Age out the oldest bucket every 5 seconds, and add a new one
      • If the oldest bucket had data, copy it over the "sticky" bucket
    • Recorded stats: Number of finished requests, number that finished successfully
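The state above can be sketched in Python. This is an illustrative sketch, not the author's implementation: `NodeStats`, `record`, and `roll` are assumed names, and a real system would drive `roll` from a 5-second timer.

```python
import threading
from collections import deque

NUM_BUCKETS = 6    # rolling window of 6 historical stat buckets
ROLL_SECONDS = 5   # age out the oldest bucket every 5 seconds

class NodeStats:
    """Per-node rolling window of stat buckets plus one "sticky" bucket."""

    def __init__(self):
        # Each bucket is [successes, finished]; index 0 is the newest.
        self.buckets = deque([[0, 0] for _ in range(NUM_BUCKETS)],
                             maxlen=NUM_BUCKETS)
        self.sticky = None  # last aged-out bucket that had data, or None
        self._lock = threading.Lock()

    def record(self, success):
        """Write stats to the newest bucket (atomically)."""
        with self._lock:
            self.buckets[0][1] += 1
            if success:
                self.buckets[0][0] += 1

    def roll(self):
        """Call every ROLL_SECONDS: age out the oldest bucket, add a new one."""
        with self._lock:
            oldest = self.buckets[-1]
            if oldest[1] > 0:
                self.sticky = list(oldest)  # copy over the sticky bucket
            self.buckets.appendleft([0, 0])  # maxlen drops the oldest
```

A `deque` with `maxlen` makes the age-out implicit: prepending a fresh bucket discards the oldest one.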
  • Prioritization by health
    • Assign each node a success rate:
      • If there are finished requests in the historical buckets: Divide the weighted sum of success counts by the weighted sum of finished counts, using exponentially decaying weights (most recent bucket weighted highest)
      • No finished requests in historical buckets:
        • If there is data in the sticky bucket: Compute success rate from that data, with a floor of 0.0001 divided by node count
        • No data in sticky bucket either: Use value of 1
    • Derive a weight for each node: Success rate cubed
    • Perform a weighted shuffle of nodes. Alternatively, approximate it:
      • Shuffle connections
      • Order by descending success rate
      • Pick one connection by weighted random and promote to first
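A minimal Python sketch of this prioritization, assuming newest-first `(successes, finished)` buckets and an exponential weighting factor of 3 (the "Fast detection" note suggests roughly that value). `success_rate` and `prioritize` are illustrative names, not part of any real API:

```python
import random

WEIGHT_BASE = 3        # assumed exponential weighting factor
NO_DATA_FLOOR = 0.0001

def success_rate(node, node_count):
    """node.buckets: newest-first (successes, finished); node.sticky likewise or None."""
    n = len(node.buckets)
    if sum(f for _, f in node.buckets) > 0:
        # Weighted sums, most recent bucket weighted highest.
        num = sum(s * WEIGHT_BASE ** (n - 1 - i)
                  for i, (s, _) in enumerate(node.buckets))
        den = sum(f * WEIGHT_BASE ** (n - 1 - i)
                  for i, (_, f) in enumerate(node.buckets))
        return num / den
    if node.sticky is not None and node.sticky[1] > 0:
        s, f = node.sticky
        return max(s / f, NO_DATA_FLOOR / node_count)  # floored sticky rate
    return 1.0  # clean slate: fully trusted, so it stays competitive

def prioritize(nodes):
    """Approximate weighted shuffle: shuffle, sort by rate, promote one weighted pick."""
    rates = {n: success_rate(n, len(nodes)) for n in nodes}
    order = list(nodes)
    random.shuffle(order)                     # randomize ties
    order.sort(key=lambda n: rates[n], reverse=True)
    weights = [rates[n] ** 3 for n in order]  # weight = success rate cubed
    if sum(weights) > 0:
        pick = random.choices(order, weights=weights)[0]
        order.remove(pick)
        order.insert(0, pick)
    return order
```

Because Python's sort is stable, shuffling first and then sorting by descending rate yields a random order among equally healthy nodes.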
  • Fallback cascade
    • Iterate over prioritized connections
    • Try to acquire a lease from each one (using the concurrency-limits library) and use the first node that has capacity
    • If none available, return error to caller
    • Otherwise, make the call
    • Tell concurrency-limits whether it was a success, a timeout, or an unrelated error
    • Atomically increment connection's request count and (if applicable) success count
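The cascade might look like the following Python sketch. The `limiter`/listener interface here is a hypothetical stand-in for the concurrency-limits library (which is actually Java): `try_acquire`, `on_success`, `on_ignore`, and `on_dropped` loosely mirror its listener concepts but are assumptions, not its real API. `record_stats` stands in for the atomic bucket increment.

```python
class NoCapacityError(Exception):
    """Raised when no node has lease capacity."""

def call_cascade(ordered_nodes, make_call, record_stats):
    """Try each node in priority order; use the first with an available lease."""
    for node in ordered_nodes:
        listener = node.limiter.try_acquire()
        if listener is None:
            continue  # node at its concurrency limit; fall back to the next
        try:
            result = make_call(node)
        except TimeoutError:
            listener.on_dropped()   # timeout: informs the concurrency limit
            record_stats(node, success=False)
            raise
        except Exception:
            listener.on_ignore()    # unrelated error: don't adjust the limit
            record_stats(node, success=False)
            raise
        else:
            listener.on_success()
            record_stats(node, success=True)
            return result
    raise NoCapacityError("no node had lease capacity")  # error to caller
```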


  • Startup: Initially, all nodes are tabula rasa and have an equal chance of being selected. However, we don't want a positive feedback loop in which one node is considered healthy and the others are untrusted by default, which is why a clean-slate node is considered completely healthy: it needs to be "competitive" with nodes that already have data.
    • However: Incrementing request stats on the tail of the request means there's a delay before a bad node is uncovered at startup or after a dynamic config change. If we're willing to wait 6 seconds before timing out and considering it an error, that's 6 seconds of requests that might fail before we have stats. We'll rely on concurrency-limits to limit the damage from this slow-failure scenario.
  • Recovery: Accomplished by the aging-out of buckets combined with the no-data minimum. A connection with a 0% success rate is never otherwise picked as long as other connections are healthy. The sticky bucket lets us distinguish a newly configured node from one that simply hasn't received any requests recently, whether due to bad health or a low request rate. Even if a node has no recent requests and its last recorded data was 0% success, we give it a slight chance of being picked again. A floor of 0.0001 divided by the number of nodes keeps the damage low if the node is still unhealthy, but can bring it back into service quickly at a high request rate.
  • Fast detection: Each stat bucket is weighted more heavily than the next older one, perhaps by a factor of 3. Narrower buckets would enhance this effect by reducing dilution.
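A worked example of this weighting, assuming a factor of 3 across 6 newest-first buckets: a single all-failure newest bucket pulls the weighted success rate down to about a third, and the cubed weight down to about 0.04, even though five of six buckets are perfect.

```python
WEIGHT_BASE = 3  # assumed weighting factor from the notes

def weighted_rate(buckets):
    """buckets: newest-first list of (successes, finished)."""
    n = len(buckets)
    num = sum(s * WEIGHT_BASE ** (n - 1 - i) for i, (s, _) in enumerate(buckets))
    den = sum(f * WEIGHT_BASE ** (n - 1 - i) for i, (_, f) in enumerate(buckets))
    return num / den

healthy = [(10, 10)] * 6
failing = [(0, 10)] + [(10, 10)] * 5  # newest bucket: all failures

print(round(weighted_rate(healthy), 3))  # 1.0
print(round(weighted_rate(failing), 3))  # 0.332 -- one bad bucket dominates
```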
    • There is likely a more precise algorithm that takes fully into account the number of outstanding requests and more quickly detects a sudden timeout condition. However, the concurrency-limits library should help with this.
  • Decorrelation: This algorithm uses weighting to prefer healthier nodes, but weighted-random selection to keep from sending all requests to the current-healthiest one.
    • When a node is completely down, it takes no requests unless the concurrency limit is reached on the healthy nodes.
    • When all are down, fail fast.
    • When all nodes are suffering, distribute requests proportionally.
  • Fast-fail: Under non-timeout error conditions, we always try to make a request. This assumes that if the backends are truly struggling under load, they will show a latency increase, at which point the concurrency-limits will kick in.
    • If this assumption is challenged, consider implementing a circuit breaker (or perhaps use Hystrix) and include it in the cascade alongside concurrency-limits.