tvoklov/distributed transactions are a headache.md

## distributed transactions are a headache.md

      
Hey there. This is an edit I made after talking to some people who know much more than I do and sharing my grievances about distributed transactions in a microservice architecture. Here's how my view changed:

- Yes, distributed transactions are not desirable.
- Sometimes, especially when talking to outside services, they are necessary.

I'm gonna keep this post up, since I still believe that when working on a fully internal architecture (you have access to all the source code and/or you're the one designing the system), distributed transactions might point to a failure in design. They are a crutch, and you shouldn't depend on them too much.

sometimes microservices (m/services from here on) seem like they create more problems than they solve.
the biggest problem, in my opinion, is not even the fact that to add a middle name to a user you have to spend half a year refactoring mounds of code.
it's the distributed transaction problem.
say you want to make a method call that is supposed to change state in multiple m/services. how would you do that?
well, some very smart people came up with two big solutions. and, unfortunately, they both kinda suck.

let's start with the solution everyone starts with: two-phase commit (2PC).
the idea is that, with each m/service, you

1. ask if the state of the mutated entity can be mutated in that way
2. ask it to lock the state

if any service returns false on any step, roll back the transaction.
if all services return true on both steps, commit the transaction.
rolling back in this case simply means telling the services to unlock the state of the entity.
here's an example of a createOrder call that needs to mutate the state of both the order and the account m/services:

1. tell the order m/s that an order has to be created & locked
2. tell the account m/s that a sum needs to be withdrawn from an account & lock that account
3. if either fails, fail the transaction
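here's a minimal, in-process sketch of those steps. everything in it is hypothetical: `OrderService`, `AccountService` and `create_order` are made-up names, "state" is just a dict, and the network, persistence and timeouts that a real 2PC coordinator has to deal with are not modeled at all. it only shows the shape: phase 1 collects "yes, and it's locked" votes, phase 2 commits everywhere or aborts (unlocks) everywhere.

```python
class OrderService:
    """Hypothetical order m/s taking part in a 2PC."""

    def __init__(self):
        self.orders = {}   # order_id -> status (committed state)
        self.locked = {}   # tx_id -> order_id (prepared, not yet committed)

    def prepare(self, tx_id, order_id):
        # phase 1: can the order be created? if so, lock it and vote "yes"
        if order_id in self.orders or order_id in self.locked.values():
            return False
        self.locked[tx_id] = order_id
        return True

    def commit(self, tx_id):
        # phase 2: apply the change and release the lock
        self.orders[self.locked.pop(tx_id)] = "created"

    def abort(self, tx_id):
        # rollback is just releasing the lock (a no-op if we never prepared)
        self.locked.pop(tx_id, None)


class AccountService:
    """Hypothetical account m/s taking part in a 2PC."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.locked = {}   # tx_id -> (account, amount)

    def prepare(self, tx_id, account, amount):
        already_locked = any(acc == account for acc, _ in self.locked.values())
        if already_locked or self.balances.get(account, 0) < amount:
            return False
        self.locked[tx_id] = (account, amount)
        return True

    def commit(self, tx_id):
        account, amount = self.locked.pop(tx_id)
        self.balances[account] -= amount

    def abort(self, tx_id):
        self.locked.pop(tx_id, None)


def create_order(tx_id, orders, accounts, order_id, account, amount):
    """The coordinator: collect votes, then commit everywhere or abort everywhere."""
    # `and` short-circuits, so the account m/s is never asked if the order m/s says no
    all_yes = orders.prepare(tx_id, order_id) and accounts.prepare(tx_id, account, amount)
    for participant in (orders, accounts):
        if all_yes:
            participant.commit(tx_id)
        else:
            participant.abort(tx_id)
    return all_yes


orders = OrderService()
accounts = AccountService({"alice": 50})
print(create_order("tx-1", orders, accounts, "order-1", "alice", 30))  # True
print(create_order("tx-2", orders, accounts, "order-2", "alice", 30))  # False: only $20 left
print(orders.orders, accounts.balances)  # {'order-1': 'created'} {'alice': 20}
```

note that in `tx-2` the order m/s did lock `order-2` before the account m/s said no, and the abort simply released that lock.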

so, a pretty smart solution, right? it's really close to being strongly consistent, like an ACID transaction would be, and it still uses microservices.
well, it sucks.
firstly - this is a synchronous operation. you are locking state in a bunch of microservices. that's cringe!
you started using microservices to decouple your state and logic, and here you're effectively creating a small cluster of coupled state for a few seconds/minutes/years/who knows how long.
secondly - this requires some kind of transaction coordinator. one more piece of the chain that can (and will) blow up catastrophically, not to mention that it increases the complexity of your application, and with it your employees' salaries and the time it takes to fix issues in your cluster.
thirdly - you now actually have to store two states: the state before the transaction and the state that says "you're in a transaction". throw in the fact that m/services can crash and burn at any point in time, including in the middle of a transaction, and you have one hell of a problem on your hands.
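to make the third point a bit more concrete, here's a small sketch of a single participant, assuming it records its vote in a log that survives crashes (a plain dict pretends to be that durable storage; `Participant`, `in_doubt` and `decide` are hypothetical names). it shows the classic 2PC blocking problem: once a participant has voted "yes", it has promised to commit if told to, so it can't abort on its own, and if the coordinator dies its locks just sit there.

```python
class Participant:
    """Hypothetical 2PC participant. `log` stands in for storage that survives a crash."""

    def __init__(self):
        self.log = {}     # tx_id -> "prepared" | "committed" | "aborted"
        self.locks = {}   # account -> tx_id currently holding the lock

    def prepare(self, tx_id, account):
        if account in self.locks:
            return False                 # another transaction holds this account
        self.locks[account] = tx_id
        self.log[tx_id] = "prepared"     # the second state: "you're in a transaction"
        return True

    def in_doubt(self):
        # e.g. after a restart: a "prepared" tx can't be decided locally,
        # so its locks have to stay until the coordinator answers
        return [tx for tx, state in self.log.items() if state == "prepared"]

    def decide(self, tx_id, outcome):
        # called when the coordinator finally sends "committed" or "aborted"
        self.log[tx_id] = outcome
        self.locks = {acc: tx for acc, tx in self.locks.items() if tx != tx_id}


p = Participant()
p.prepare("tx-1", "alice")           # voted yes... and then the coordinator crashes
print(p.in_doubt())                  # ['tx-1']
print(p.prepare("tx-2", "alice"))    # False: alice stays locked while tx-1 is undecided
p.decide("tx-1", "aborted")          # the coordinator comes back
print(p.prepare("tx-2", "alice"))    # True
```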

so, what's the second solution? sagas. they fix some of the issues 2pc has, but don't worry, they suck too.
in a saga, instead of locking the state of each microservice, you do this (starting with the first service in the saga chain):

- if the state change is acceptable:
  - accept the state change in this service;
  - tell the next service in the saga about the saga;
- else:
  - roll back the state changes in the services before you in the saga.
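a rough sketch of that loop, with one assumption worth calling out: to keep it short, a single central loop runs the steps (an orchestrated saga), instead of each service telling the next one like described above (a choreographed saga). `run_saga`, `change` and `undo` are hypothetical helpers, and each "service" is just a function plus its compensation.

```python
from typing import Callable, List, Tuple

# (service name, local transaction, compensating action)
Step = Tuple[str, Callable[[], bool], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run each local transaction in order. On the first failure, run the
    compensations of the steps that already succeeded, newest first."""
    completed: List[Step] = []
    log: List[str] = []
    for name, action, compensate in steps:
        if not action():
            log.append(f"{name}: Failed")
            for done_name, _, done_compensate in reversed(completed):
                done_compensate()
                log.append(f"{done_name}: rolled back")
            return False, log
        log.append(f"{name}: OK")
        completed.append((name, action, compensate))
    return True, log


state = {"service 1": 0, "service 2": 0}


def change(name):
    def action():
        state[name] += 1   # the local state change, committed right away
        return True
    return action


def undo(name):
    def compensate():
        state[name] -= 1   # the hand-written rollback for that change
    return compensate


# three services, the third one fails
ok, log = run_saga([
    ("service 1", change("service 1"), undo("service 1")),
    ("service 2", change("service 2"), undo("service 2")),
    ("service 3", lambda: False, lambda: None),
])
print(ok)     # False
print(log)    # ['service 1: OK', 'service 2: OK', 'service 3: Failed', 'service 2: rolled back', 'service 1: rolled back']
print(state)  # {'service 1': 0, 'service 2': 0}
```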

this way you decouple the state of services and introduce a little bit of this cool new eventual consistency word into your résumé.
wait, but how do i roll back the state changes?
well, you just... implement a rollback function. in saga terms it's called a compensating transaction, and in event sourcing it's a compensating event.
like, for a balance change of +$10 it would be a balance change of -$10.
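a tiny event-sourcing-flavored sketch of that +$10 / -$10 idea, assuming balances are derived by summing an append-only list of events (`BalanceChanged`, `compensating_event` and `balance` are hypothetical names). the point is that the "rollback" doesn't delete anything: it appends another event that cancels the first one out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BalanceChanged:
    """Hypothetical event: an account's balance changed by `delta` dollars."""
    account: str
    delta: int


def compensating_event(event: BalanceChanged) -> BalanceChanged:
    # the "rollback" is just another event with the opposite delta
    return BalanceChanged(event.account, -event.delta)


def balance(events: List[BalanceChanged], account: str) -> int:
    # current state = sum over the append-only event log
    return sum(e.delta for e in events if e.account == account)


events = [BalanceChanged("me", +10)]
events.append(compensating_event(events[0]))   # history stays intact
print(events[1])               # BalanceChanged(account='me', delta=-10)
print(balance(events, "me"))   # 0
```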
here's the same createOrder example, but implemented using a saga:

1. tell the order m/s that an order has been created
2. the order m/s tells the account m/s that the order sum needs to be withdrawn
   - 2.1. if the account m/s fails, the order m/s marks the order as "failed" and the transaction ends
   - 2.2. if the account m/s says "ok", the transaction is completed
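here's that createOrder saga as a sketch. assumptions: the class names are hypothetical, and the order m/s calls the account m/s directly, where in real life that would be an asynchronous message. the difference from the 2PC version is that nothing is locked: the order is committed as "pending" right away (anyone can see it in that state), and the compensation for a failed withdrawal is marking the order "failed".

```python
class AccountService:
    """Hypothetical account m/s: a plain local withdrawal, committed immediately."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def withdraw(self, account, amount):
        if self.balances.get(account, 0) < amount:
            return False
        self.balances[account] -= amount
        return True


class OrderService:
    """Hypothetical order m/s that drives the createOrder saga."""

    def __init__(self, accounts):
        self.accounts = accounts   # stands in for a message channel to the account m/s
        self.orders = {}

    def create_order(self, order_id, account, amount):
        # step 1: commit the order locally right away; it's visible as "pending"
        self.orders[order_id] = "pending"
        # step 2: ask the account m/s to withdraw the sum
        if self.accounts.withdraw(account, amount):
            self.orders[order_id] = "completed"   # 2.2: the account m/s said ok
        else:
            self.orders[order_id] = "failed"      # 2.1: the compensation
        return self.orders[order_id]


accounts = AccountService({"alice": 50})
orders = OrderService(accounts)
print(orders.create_order("order-1", "alice", 30))  # completed
print(orders.create_order("order-2", "alice", 30))  # failed
print(accounts.balances)                            # {'alice': 20}
```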

here's a more visual representation, now involving three services:

    [ service 1: OK ] -> [ service 2: OK ] -> [ service 3: Failed ]
          /\               ||        /\                         ||
          || rollback      \/        || rollback                \/
          ===================        =============================

so, why is this bad? you might already know, based on the $10 example. not only do you have extremely eventual consistency, but now you also have functions written BY YOU that can break your god damn state. here's an example:
let's say that an account change is a saga between 5 services that takes 5 seconds to complete on average.

1. to start with, i have $10 in my bank account
2. i deposit $20 into my account
   - 2.1. the account service accepts my change. I now have $30 in my account
   - 2.2. the other services in the saga start talking
3. i buy something worth $30
   - 3.1. the account service says "yes, you have $30"
   - 3.2. the account service withdraws my $30
   - 3.3. my account balance is now $0
4. an error happens during the saga from step 2
   - 4.1. the account service rolls back the +$20
   - 4.2. i get -$20 applied to my account
   - 4.3. my account balance is now -$20

now that's funny!
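the same sequence in a few lines of code, assuming a deliberately naive account m/s (`NaiveAccount` and its methods are hypothetical): the deposit is spendable the moment its local step commits, and the compensation blindly applies the inverse delta without checking what happened to the money in between.

```python
class NaiveAccount:
    """Hypothetical account m/s with a naive compensation."""

    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        # the local step of the deposit saga: committed, and spendable, immediately
        self.balance += amount

    def buy(self, price):
        # a completely separate operation that only looks at the current balance
        if self.balance < price:
            return False
        self.balance -= price
        return True

    def compensate_deposit(self, amount):
        # undo the deposit without looking at what happened in the meantime
        self.balance -= amount


acct = NaiveAccount(10)       # 1. i have $10
acct.deposit(20)              # 2.1. the account service accepts the +$20, now $30
# 2.2. ...the rest of the deposit saga is still running...
print(acct.buy(30))           # 3. True, the balance is now $0
acct.compensate_deposit(20)   # 4. a later step of the deposit saga fails
print(acct.balance)           # -20
```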

so, they both suck. how do we fix this?
i don't fucking know. what I do know is that using distributed transactions is an absolute nightmare and breaks so much, because distributed transactions go against the m/services pattern. remember - m/services are supposed to be decoupled. their distributed state shouldn't be updated in an ACID way, because there isn't supposed to be any reason for a global state. each m/service is supposed to own its data.
whatever. m/services are cool, i'm just cranky from trying to find a solution for this issue. btw, it's not even an issue - it's a drawback. a drawback of using a distributed architecture. and, in my opinion, if you need a lot of distributed transactions, your domains probably have shit boundaries. or your logic is just that coupled, and you should merge the m/services that share that much state into one or two bigger m/services.
oh, and yeah, I know - the -$20 example is a very naive, dumbly implemented saga, but what's important is that it's something that can happen even if you use this cool and fresh saga pattern. with an ACID database transaction, on the other hand, something like this is impossible. you have to think about and know a whole lot more when implementing sagas and m/services. by comparison, you barely need to think or know anything when using an ORM that creates transactions for you automatically.
you don't even need to try very hard to imagine what would happen if you put junior engineers anywhere near a saga implementation. the thought alone is enough for a heart attack.
microservices aren't a cure for everything. they are an extremely difficult tool designed for a specific range of problems, and misusing them will hurt: they'll introduce problems you didn't have and make your head throb when they don't solve the ones you expected them to solve.

here are a few places i took this knowledge from:

- developers.redhat.com
- microservices.io
- a Lightbend Academy course (i recommend you go through the entire course if you're at all interested in microservices)