Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
[SLI, SLO, SLA] #SLI #SLO #SLA #Process #Service

SLI, SLO, SLA ?

When building a service, we have a responsibility to define some baseline agreements as to the service’s expected uptime and performance. This document focuses on the various terminology that we use to define these values.

  • Service Level Indicator (SLI)
    What the service owner has chosen to measure progress towards their goal.

  • Service Level Objective (SLO)
    What the service owner’s goal is for the given indicator.

  • Service Level Agreement (SLA)
    What the service owner is promising its users (typically the SLO + wiggle room).

Example

  • Indicator: request latency.
  • Objective: 99.5% of requests will be completed in 5ms.
  • Agreement: 99% of requests complete in 5ms or you get a refund.

Below is an alternative example where we state (via a Datadog graph) that if the average request latency over a 1hr period exceeds one second, then we have a ‘service’ issue and it’ll display with a red background. This will signify that we’ve failed our SLA which is defined as being less than one second.

If the average request latency over a 1hr period is greater than half a second, then we have a ‘team’ issue and it’ll display with an orange background. This will signify that we’ve failed our SLO which is defined as being less than half a second.

If the average request latency over a 1hr period is less than half a second, then we have no issues and it’ll display with a green background. This will signify that we’ve reached our SLO which is defined as being less than half a second.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.