Integralist/SLI, SLO, SLA.md

## SLI, SLO, SLA.md

      
SLI, SLO, SLA?

When building a service, we have a responsibility to define baseline agreements about the service’s expected uptime and performance. This document covers the terminology we use to define those agreements.


Service Level Indicator (SLI)

The metric the service owner has chosen to measure progress towards their goal.

e.g. a measurement that distinguishes good service from bad service for your users, such as request latency.


Service Level Objective (SLO)

The target the service owner has set for the given indicator.

e.g. a percentage that states how often the indicator must be met, and therefore how many failures your service can have, in a specific period of time.
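
As a rough illustration of that percentage, the sketch below converts an SLO into the number of requests that may miss the indicator in a given period. The function name `allowed_failures` and the request counts are hypothetical, chosen only for this example. The SLO is passed as a string so that `Fraction` can parse it exactly and avoid floating-point rounding.

```python
from fractions import Fraction


def allowed_failures(slo_percent: str, total_requests: int) -> int:
    """Return how many requests may miss the indicator while still meeting the SLO.

    `slo_percent` is a decimal string such as "99.5" (hypothetical interface).
    """
    # The budget is the share of requests NOT covered by the SLO.
    budget = 1 - Fraction(slo_percent) / 100
    # Round down: a partial request cannot fail.
    return int(total_requests * budget)


# A 99.5% objective over one million requests leaves room for 5000 failures.
budget_for_period = allowed_failures("99.5", 1_000_000)
```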


Service Level Agreement (SLA)

What the service owner is promising its users (typically the SLO plus some wiggle room, i.e. a looser target than the SLO).


Example


Indicator: request latency.
Objective: 99.5% of requests complete within 5ms.
Agreement: 99% of requests complete within 5ms, or you get a refund.
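
To make the relationship between the three terms concrete, here is a minimal sketch that evaluates a batch of request latencies against the objective and agreement above. The function names, the constants and the sample data are assumptions made for illustration; a real system would read latencies from its own monitoring data.

```python
THRESHOLD_MS = 5.0   # indicator: a request is "good" if it completes within 5ms
SLO_PERCENT = 99.5   # objective from the example above
SLA_PERCENT = 99.0   # agreement from the example above


def percent_within(latencies_ms: list[float], threshold_ms: float) -> float:
    """SLI: percentage of requests that completed within the threshold."""
    if not latencies_ms:
        raise ValueError("no requests to evaluate")
    good = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return 100.0 * good / len(latencies_ms)


def evaluate(latencies_ms: list[float]) -> dict:
    """Compare the measured SLI with the SLO and the SLA."""
    sli = percent_within(latencies_ms, THRESHOLD_MS)
    return {
        "sli_percent": sli,
        "slo_met": sli >= SLO_PERCENT,
        "sla_met": sli >= SLA_PERCENT,
    }


# Hypothetical sample: 1000 requests, 8 of which took longer than 5ms.
sample = [2.0] * 992 + [9.0] * 8
result = evaluate(sample)
```

With this sample the SLI is 99.2%, so the objective is missed while the agreement still holds. That gap between the two is the wiggle room mentioned earlier.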

An alternative example (shown via a Datadog graph) uses the average request latency over a 1hr period as the indicator. If that average exceeds one second, then we have a ‘service’ issue and the graph displays with a red background. This signifies that we’ve failed our SLA, which is defined as an average latency of less than one second.
If the average request latency over a 1hr period is greater than half a second (but not more than one second), then we have a ‘team’ issue and the graph displays with an orange background. This signifies that we’ve failed our SLO, which is defined as an average latency of less than half a second.
If the average request latency over a 1hr period is less than half a second, then we have no issues and the graph displays with a green background. This signifies that we’ve met our SLO.
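
The colour rules above can be sketched as a small function. This is plain Python rather than Datadog configuration, and the names are hypothetical. The text does not say how an average of exactly half a second or exactly one second is treated, so the boundary handling below (treating those values as the lower band) is an assumption.

```python
SLO_SECONDS = 0.5  # average latency above this fails the SLO
SLA_SECONDS = 1.0  # average latency above this fails the SLA


def background_colour(avg_latency_s: float) -> str:
    """Map a 1hr average request latency (in seconds) to a graph background colour."""
    if avg_latency_s > SLA_SECONDS:
        return "red"     # 'service' issue: SLA failed
    if avg_latency_s > SLO_SECONDS:
        return "orange"  # 'team' issue: SLO failed, SLA still met
    return "green"       # no issue (assumption: exactly 0.5s counts as green)


# Hypothetical hourly averages, in seconds.
colours = [background_colour(avg) for avg in (0.2, 0.7, 1.4)]
```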