@nobl9-mikec
Last active July 19, 2022 14:41
SLO Budgeting Method Comparison

Summary

A quick write-up on the different outcomes produced by various error budget calculation methods.
Scenario data is summarized as an array of pairs, each with a timestamp and a value, e.g.:

Good Events
[ t0: 4, t1: 10, t2: 0, t3: 15 ...]
Total Events
[ t0: 4, t1: 11, t2: 0, t3: 15 ...]

Scenario 1 - Slow rise and fall traffic

In this scenario we're looking at traffic that slowly rises and falls, for example over the hours of a day. During peak load the service begins to suffer, so some requests fail.
It also suffers a small performance hiccup at launch due to startup costs.

Our target for the window is 95%.

Traffic        t0  t1  t2  t3  t4  t5  t6  t7  t8  t9
Good Events     2  10  12  13  13  12  13  13  12  11
Total Events    4  10  12  13  14  15  14  13  12  11

Occurrence Based EB

With occurrence based error budgeting, we simply sum the Good and Total events for the window and divide the Good sum by the Total sum.
It's straightforward, and it automatically weights impact by the total number of requests served.

Good Events summed - 111
Total Events summed - 118

Window performance -> Good/Total -> 111/118 -> 94.0678%

Error Budget burnt during window -> (1 - 0.940678) / (1 - 0.95) -> 118.644%
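The arithmetic above can be reproduced with a minimal Python sketch. The function names `budget_burn` and `occurrence_performance` are illustrative only and do not come from any particular SLO tool.

```python
# Scenario 1 data, copied from the table above.
good = [2, 10, 12, 13, 13, 12, 13, 13, 12, 11]
total = [4, 10, 12, 13, 14, 15, 14, 13, 12, 11]
TARGET = 0.95


def budget_burn(performance, target):
    """Fraction of the error budget consumed (1.0 means fully spent)."""
    return (1 - performance) / (1 - target)


def occurrence_performance(good, total):
    """Sum good and total events over the whole window, then divide."""
    return sum(good) / sum(total)


performance = occurrence_performance(good, total)
burn = budget_burn(performance, TARGET)

print(f"{performance:.4%}")  # 94.0678%
print(f"{burn:.3%}")         # 118.644%
```

Because the sums are taken before dividing, a busy interval such as t5 (15 requests) moves the result more than a quiet one such as t0 (4 requests).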

TimeSlice based EB

With timeslice based error budgeting, we evaluate each time interval individually against the target, and then divide the number of successful intervals by the total number of intervals. For example, t4's result is a boolean denoting whether 13/14 > 0.95; since 13/14 is about 0.929, the result is false (0). Interval results (success=1, failure=0):

Interval  t0  t1  t2  t3  t4  t5  t6  t7  t8  t9
Result     0   1   1   1   0   0   0   1   1   1

Good intervals = 6
Total intervals = 10

Window performance -> 6/10 -> 60.0%

Error Budget burnt during window -> (1 - 0.6) / (1 - 0.95) -> 800%
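As a rough sketch of the same calculation in Python: the helper names are illustrative, and the treatment of zero-traffic intervals is an assumption, since the scenario contains none.

```python
# Scenario 1 data, copied from the table above.
good = [2, 10, 12, 13, 13, 12, 13, 13, 12, 11]
total = [4, 10, 12, 13, 14, 15, 14, 13, 12, 11]
TARGET = 0.95


def budget_burn(performance, target):
    """Fraction of the error budget consumed (1.0 means fully spent)."""
    return (1 - performance) / (1 - target)


def timeslice_results(good, total, target):
    """Return 1 for each interval whose good/total ratio beats the target, else 0.

    Assumption (not from the write-up): an interval with no traffic
    counts as successful, because there is no ratio to evaluate.
    """
    return [1 if t == 0 or g / t > target else 0 for g, t in zip(good, total)]


results = timeslice_results(good, total, TARGET)
performance = sum(results) / len(results)
burn = budget_burn(performance, TARGET)

print(results)               # [0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
print(f"{performance:.0%}")  # 60%
print(f"{burn:.0%}")         # 800%
```

Each interval contributes equally here, so the four-request interval t0 costs as much budget as the fifteen-request interval t5.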

Weighted TimeSlice based EB

With weighted time slicing, rather than storing a boolean for each time interval, we store a float representing how successful the interval was. The results are then averaged across all time intervals. For example, t4's interval value would be 13/14 -> 0.92857

Interval  t0   t1  t2  t3  t4       t5   t6       t7  t8  t9
Result    0.5  1   1   1   0.92857  0.8  0.92857  1   1   1

Average of all intervals = 9.15714 / 10 -> 91.5714%

Error budget burnt during window -> (1 - 0.915714) / (1 - 0.95) -> 168.57%
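A minimal Python sketch of the weighted variant follows. As before, the function names are illustrative and the zero-traffic handling is an assumption that the scenario data never exercises.

```python
# Scenario 1 data, copied from the table above.
good = [2, 10, 12, 13, 13, 12, 13, 13, 12, 11]
total = [4, 10, 12, 13, 14, 15, 14, 13, 12, 11]
TARGET = 0.95


def budget_burn(performance, target):
    """Fraction of the error budget consumed (1.0 means fully spent)."""
    return (1 - performance) / (1 - target)


def weighted_timeslice_results(good, total):
    """Return the good/total ratio of each interval.

    Assumption (not from the write-up): an interval with no traffic
    is scored 1.0, because there were no failures to count.
    """
    return [1.0 if t == 0 else g / t for g, t in zip(good, total)]


ratios = weighted_timeslice_results(good, total)
performance = sum(ratios) / len(ratios)
burn = budget_burn(performance, TARGET)

print(f"{performance:.4%}")  # 91.5714%
print(f"{burn:.2%}")         # 168.57%
```

The result sits between the other two methods: every interval still carries equal weight, but a near miss such as t4 costs far less than a badly degraded interval such as t0.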

@nobl9-mikec

Some considerations from @mmazur on Edge-based performance evaluation
https://gist.github.com/mmazur/e94e6c31cb3821c15a948c24d6baaece
