@Nj-kol
Created September 7, 2021 16:10
Difference between Fault tolerance and Resiliency

Fault tolerance is not the same as resiliency. The two terms are sometimes used interchangeably, but they describe different properties.

Fault Tolerance

  • Fault tolerance is the ability of a system to keep operating when a fault occurs, e.g. surviving a server crash or a network partition
  • Overall performance may drop temporarily, but system features are not affected
  • Mechanisms such as checkpoint/restore and replicated state machines provide fault tolerance
  • The system usually has the ability to detect faults itself and fail over
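As an illustration of the checkpoint/restore mechanism mentioned above, here is a minimal sketch. The `CheckpointedCounter` class, its JSON checkpoint file and the `crash_at` parameter are hypothetical names invented for this example; they are not taken from any particular framework.

```python
import json
import os


class CheckpointedCounter:
    """Hypothetical worker that sums numbers and checkpoints its progress."""

    def __init__(self, path):
        self.path = path
        self.position = 0  # index of the next item to process
        self.total = 0
        self._restore()

    def _restore(self):
        # Resume from the last checkpoint if one exists.
        if os.path.exists(self.path):
            with open(self.path) as f:
                state = json.load(f)
            self.position = state["position"]
            self.total = state["total"]

    def _checkpoint(self):
        # Write to a temp file, then rename, so a crash mid-write
        # never leaves a half-written checkpoint behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"position": self.position, "total": self.total}, f)
        os.replace(tmp, self.path)

    def run(self, items, crash_at=None):
        while self.position < len(items):
            if crash_at is not None and self.position == crash_at:
                raise RuntimeError("simulated crash")
            self.total += items[self.position]
            self.position += 1
            self._checkpoint()
        return self.total
```

A new `CheckpointedCounter` pointed at the same file continues from the last saved position instead of starting over, so a crash costs time but not correctness.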

Example(s)

If one of N instances of a microservice sitting behind a reverse proxy such as nginx fails, the service is still available, although throughput decreases. Hence the service is fault tolerant.
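The scenario above can be sketched in a few lines. This is a toy model, not how nginx is implemented: `Instance`, `RoundRobinProxy` and the per-instance `capacity` figure are assumptions made up for the illustration.

```python
import itertools


class Instance:
    """Hypothetical service instance handling `capacity` requests per second."""

    def __init__(self, name, capacity=100):
        self.name = name
        self.capacity = capacity
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class RoundRobinProxy:
    def __init__(self, instances):
        self.instances = instances
        self._cycle = itertools.cycle(instances)

    def route(self, request):
        # Try each instance at most once, skipping the ones that fail.
        for _ in range(len(self.instances)):
            instance = next(self._cycle)
            try:
                return instance.handle(request)
            except ConnectionError:
                continue
        raise RuntimeError("no healthy instance available")

    def throughput(self):
        # Aggregate capacity of the instances that are still up.
        return sum(i.capacity for i in self.instances if i.healthy)
```

With three instances and one marked unhealthy, `route` keeps returning responses from the other two while `throughput` drops from 300 to 200. Nothing in this sketch brings the failed instance back.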

Resiliency

Resiliency is a measure of the system's ability to recover from problems on its own

Example(s)

  • If we run the above example on Kubernetes, the failed instance is automatically replaced (not necessarily by the same instance), because Kubernetes maintains the desired number of pods in a ReplicaSet. Hence, in addition to being fault tolerant, the system is also resilient
    • The Circuit Breaker pattern, often used in microservice architectures, also addresses resilience: it gives a failing system time to recover
  • In Hadoop or a similar distributed storage system, if the number of replicas falls below the replication factor, the system automatically re-creates the missing copies
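The Circuit Breaker pattern mentioned in the list above can be sketched as follows. This is a minimal, single-threaded version; the class name, the `max_failures` and `reset_timeout` parameters and their defaults are assumptions chosen for the example, not the API of any specific library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open, call rejected")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail fast instead of piling more requests onto the struggling service, which is the "time to recover" part of the pattern.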

You can have a system that is fault tolerant but not resilient. For example, the service in the first example would be fault tolerant but not resilient if you had to bring up an additional instance manually.
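The distinction can be made concrete with a small sketch. `ReplicaPool` and its methods are hypothetical; the `reconcile` step only imitates the idea of keeping a desired replica count and is not the Kubernetes or Hadoop implementation.

```python
import itertools


class ReplicaPool:
    """Hypothetical pool of service replicas identified by name."""

    def __init__(self, desired):
        self.desired = desired
        self._ids = itertools.count(1)
        self.running = set()
        self.reconcile()  # start the initial replicas

    def crash(self, name):
        self.running.discard(name)

    def is_available(self):
        # Fault tolerance: the service is up while any replica survives.
        return len(self.running) > 0

    def reconcile(self):
        # Resiliency: start replacements until the desired count is met.
        # Replacements get new names, they are not the crashed instance.
        started = []
        while len(self.running) < self.desired:
            name = f"replica-{next(self._ids)}"
            self.running.add(name)
            started.append(name)
        return started
```

After `crash("replica-2")` the pool is still available with two replicas, which is fault tolerance. It becomes resilient only if something calls `reconcile()` automatically, for example in a periodic control loop; if an operator has to call it by hand, the system is fault tolerant but not resilient.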
