nobl9-mikec/slo-eb-method-comparison.md

## slo-eb-method-comparison.md

      
## Summary

A quick write-up comparing the outcomes of several error budget calculation methods.

Scenario data is summarized as an array of pairs, each with a timestamp and a value, for example:

- Good Events: `[ t0: 4, t1: 10, t2: 0, t3: 15 ...]`
- Total Events: `[ t0: 4, t1: 11, t2: 0, t3: 15 ...]`
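A minimal Python sketch of that format, assuming each series is a list of `(timestamp, value)` tuples that can be joined on the timestamp. The names `good_events`, `total_events`, and `pair_by_timestamp` are hypothetical, and only the four entries shown above are included because the rest are elided.

```python
# Hypothetical representation: one list of (timestamp, value) pairs per series.
# Only the first four entries from the example are used; the rest are elided.
good_events = [("t0", 4), ("t1", 10), ("t2", 0), ("t3", 15)]
total_events = [("t0", 4), ("t1", 11), ("t2", 0), ("t3", 15)]


def pair_by_timestamp(good, total):
    """Join two (timestamp, value) series into {timestamp: (good, total)}."""
    total_by_ts = dict(total)
    return {ts: (g, total_by_ts[ts]) for ts, g in good if ts in total_by_ts}


paired = pair_by_timestamp(good_events, total_events)
for ts, (g, t) in paired.items():
    # An interval with no traffic (t2 here) has no meaningful good/total ratio.
    ratio = g / t if t else None
    print(ts, g, t, ratio)
```

The zero-traffic interval (t2) is worth noting: occurrence-based math simply adds zeros, while any per-interval ratio has to decide how to treat it.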

## Scenario 1: Slowly rising and falling traffic

In this scenario, we're looking at traffic that slowly rises and falls, for example over the course of a day.
During peak load the service begins to struggle, so some requests fail.

It also suffers a small performance hiccup at launch due to startup costs.
Our target for the window is 95%.


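| Interval | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Good Events | 2 | 10 | 12 | 13 | 13 | 12 | 13 | 13 | 12 | 11 |
| Valid Events | 4 | 10 | 12 | 13 | 14 | 15 | 14 | 13 | 12 | 11 |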


### Occurrence-based EB

With occurrence-based error budgeting, we simply sum the good and total (valid) events for the window and divide the good sum by the total sum.

It's straightforward, and it automatically weights each interval by the number of requests served in it.

- Good Events summed -> 111
- Total Events summed -> 118
- Window performance -> Good/Total -> 111/118 -> 94.0678%
- Error Budget burnt during window -> (1 - 0.940678) / (1 - 0.95) -> 118.644%
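The occurrence-based calculation can be checked with a short Python sketch. The helper names `occurrence_performance` and `budget_burnt` are illustrative, not from any SLO tool; `budget_burnt` just encodes the `(1 - performance) / (1 - target)` formula used above.

```python
# Scenario 1: good and valid events per interval, t0..t9.
good = [2, 10, 12, 13, 13, 12, 13, 13, 12, 11]
valid = [4, 10, 12, 13, 14, 15, 14, 13, 12, 11]
target = 0.95


def budget_burnt(performance: float, target: float) -> float:
    """Fraction of the error budget consumed: (1 - performance) / (1 - target)."""
    return (1 - performance) / (1 - target)


def occurrence_performance(good, valid):
    """Sum events across the whole window, then divide once."""
    return sum(good) / sum(valid)


perf = occurrence_performance(good, valid)
burnt = budget_burnt(perf, target)
print(f"{perf:.4%} performance, {burnt:.3%} of budget burnt")
```

Because the division happens only once, busy intervals dominate the result and quiet ones (like t0) contribute little.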
### TimeSlice-based EB

With timeslice-based error budgeting, we evaluate each time interval individually against the target, then
divide the number of successful intervals by the total number of intervals.
For example, t4's interval value is a boolean denoting whether 13/14 (≈ 0.92857) meets the 0.95 target; it does not, so the value is false (0).
Interval results (success = 1, failure = 0):


| Interval | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Result | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 |


- Good intervals = 6
- Total intervals = 10
- Window performance -> 6/10 -> 60.0%
- Error Budget burnt during window -> (1 - 0.6) / (1 - 0.95) -> 800%
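A Python sketch of the timeslice calculation, assuming an interval passes when its good/valid ratio is greater than or equal to the target (in this scenario every ratio is either 1.0 or below 0.95, so `>` and `>=` give the same result). The helper name `timeslice_results` is illustrative, and the sketch does not handle zero-traffic intervals.

```python
# Scenario 1: good and valid events per interval, t0..t9.
good = [2, 10, 12, 13, 13, 12, 13, 13, 12, 11]
valid = [4, 10, 12, 13, 14, 15, 14, 13, 12, 11]
target = 0.95


def timeslice_results(good, valid, target):
    """Score each interval 1 if its good/valid ratio meets the target, else 0.

    Assumption: pass means ratio >= target, and every interval has at least
    one valid event (a zero here would raise ZeroDivisionError).
    """
    return [1 if g / v >= target else 0 for g, v in zip(good, valid)]


results = timeslice_results(good, valid, target)
performance = sum(results) / len(results)  # good intervals / total intervals
burnt = (1 - performance) / (1 - target)   # fraction of error budget consumed
print(results)
print(f"{performance:.1%} performance, {burnt:.0%} of budget burnt")
```

Each failing interval costs the same regardless of how many requests failed in it, which is why the burn here is far higher than with occurrence-based counting.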
### Weighted TimeSlice-based EB

With weighted time slicing, rather than storing a boolean for each time interval, we store a float
representing how successful the interval was (its good/valid ratio). The results are then averaged across all intervals.
For example, t4's interval value would be 13/14 -> 0.92857.


| Interval | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Result | 0.5 | 1 | 1 | 1 | 0.92857 | 0.8 | 0.92857 | 1 | 1 | 1 |


- Average of all intervals -> 9.15714 / 10 -> 91.5714%
- Error budget burnt during window -> (1 - 0.915714) / (1 - 0.95) -> ~168.57%
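A Python sketch of the weighted timeslice calculation: each interval's score is its good/valid ratio, and the window performance is the plain (unweighted) mean of those scores. The helper name `weighted_timeslice_scores` is illustrative, and the sketch assumes every interval has at least one valid event.

```python
# Scenario 1: good and valid events per interval, t0..t9.
good = [2, 10, 12, 13, 13, 12, 13, 13, 12, 11]
valid = [4, 10, 12, 13, 14, 15, 14, 13, 12, 11]
target = 0.95


def weighted_timeslice_scores(good, valid):
    """Score each interval by its success ratio good/valid (assumes valid > 0)."""
    return [g / v for g, v in zip(good, valid)]


scores = weighted_timeslice_scores(good, valid)
performance = sum(scores) / len(scores)   # every interval counts equally
burnt = (1 - performance) / (1 - target)  # fraction of error budget consumed
print(f"{performance:.4%} performance, {burnt:.3%} of budget burnt")
```

Since every interval carries equal weight regardless of its traffic, the low-traffic startup hiccup at t0 (0.5) pulls the average down more than it does in the occurrence-based result.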