@markpapadakis
Last active February 18, 2019 20:42

Suppose a topic with a single partition, P0.

  1. Producer PR1 publishes a message to P0 with required acks = 2 (two peers from the replica set, aside from the leader, need to ack it before the update is considered committed). Producer PR1 is waiting for an ack from the leader. That message's LSN is going to be 100.
  2. Producer PR2 publishes a message to P0 with required acks = 1 (one peer from the replica set, aside from the leader, needs to ack it before the update is considered committed). Producer PR2 is waiting for an ack from the leader. That message's LSN is going to be 101.
  3. One peer asks to consume from x (< 100) and then immediately asks to consume from 102. The other peers in the replica set are busy and haven't gotten around to consuming any updates yet (but are not so busy as to be expelled from the ISR).

What happens now? Does PR2 get acknowledged (because one peer consumed past 101, acks == 1, and PR2's message LSN = 101)? Or will Kafka wait for all messages before it to be acknowledged (e.g., the message from PR1 with LSN 100) before acknowledging the message from PR2 with LSN 101? That is to say, will it respond to PR2 with an acknowledgement before PR1 is provided one, considering that PR1's message LSN < PR2's message LSN?

@gwenshap

gwenshap commented Feb 13, 2019

Nice scenario :)

Since acks=1 means "ack from the leader only" and acks=2 is unsupported :)
I am translating the scenario to: acks=all with min.isr=3 in the first case (PR1) (leader + 2 replicas), and acks=all with min.isr=2 in the second case (PR2).
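
For concreteness, here is a minimal sketch of how that translation maps onto the real configs, with a hypothetical broker address and topic names; acks is a producer setting, while min.insync.replicas (the "min.isr" above) is set on the topic or broker:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // hypothetical address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // The producer can only ask for "0", "1" or "all"; there is no "acks=2".
        props.put("acks", "all");

        // "Two peer acks" is then expressed per topic, e.g. a topic created with
        // --config min.insync.replicas=3 (leader + 2 followers), and "one peer
        // ack" with a topic using min.insync.replicas=2.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("topic-with-min-isr-3", "k", "PR1's message"));
            producer.send(new ProducerRecord<>("topic-with-min-isr-2", "k", "PR2's message"));
        }
    }
}
```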

Also, replicas consume in order, always. Replicas can't skip from 100 to 102 when fetching events.

Basically, the high watermark advances based on "last offset replicated by the entire ISR". And it doesn't depend on acks.

Let's assume that one follower consumed all the way to 102 (it can't consume 102 without consuming 101 and 100 first), and the others haven't consumed anything yet.

  • PR2 will get its ack, because its message is written to the leader + one follower
  • PR1 will keep waiting until it times out
  • Consumers won't see any event after 99 (because the later events weren't replicated to the entire ISR, so the HWM didn't move)
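
The rule above can be captured as a toy function, a conceptual sketch rather than actual broker code (the member names are made up; the offsets are the scenario's): the HWM is the smallest offset every ISR member has replicated, so it tracks the slowest member regardless of any producer's acks setting.

```java
import java.util.Collections;
import java.util.Map;

// Conceptual sketch, not broker code: the high watermark is the minimum
// replicated offset across the ISR, independent of producer acks settings.
public class HwmSketch {
    // Highest offset that every ISR member has replicated.
    static long highWatermark(Map<String, Long> highestOffsetByIsrMember) {
        return Collections.min(highestOffsetByIsrMember.values());
    }

    public static void main(String[] args) {
        // The scenario above: the leader and one fast follower reached 102,
        // the two lagging followers are still at 99.
        System.out.println(highWatermark(
                Map.of("leader", 102L, "f1", 102L, "f2", 99L, "f3", 99L))); // 99

        // After the laggards are removed from the ISR (see below):
        System.out.println(highWatermark(
                Map.of("leader", 102L, "f1", 102L))); // 102
    }
}
```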

Fast forward 10 seconds.

If the remaining replicas still haven't consumed the updates, they will be removed from the ISR and the HWM will move to 102. And then:

  • PR1 will get a "Not enough replicas" error. It can retry, and if it does, the retry will probably end up as a duplicate, because the leader got the event and didn't crash.
  • Consumers will be able to get events 100, 101 and 102 because the leader has them.

If the remaining replicas managed to catch up, then PR1 will finally get its ack and consumers will be able to see 100, 101 and 102.

The point of shrinking the ISR is unblocking consumers.
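
An aside on the duplicate-on-retry point above (this is not from the thread itself): the error described matches Kafka's NotEnoughReplicasAfterAppendException, whose documentation warns that retrying may cause duplicates because the leader did append the message. Since Kafka 0.11, the idempotent producer is one way to make such a retry safe; a minimal, hedged config sketch:

```java
import java.util.Properties;

// Sketch only: producer settings that make a retry after a timed-out ack safe.
// Broker address is hypothetical; this is one mitigation, not something the
// discussion above relies on.
public class RetrySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");
        props.put("acks", "all");
        props.put("retries", Integer.MAX_VALUE);
        // With idempotence enabled, the broker deduplicates re-sent batches by
        // producer id + sequence number, so PR1's retry would not duplicate
        // the event the leader already has.
        props.put("enable.idempotence", "true");
    }
}
```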

@markpapadakis

markpapadakis commented Feb 13, 2019

Thanks Gwen. That makes sense.

I wasn't sure whether PR2 would get an ack while an ack for a message with a lower LSN (PR1's) was still pending. I also thought the HWM would advance based on ack semantics, not based on the maximum LSN consumed by all nodes in the ISR.
That is to say, if a producer produces a message with LSN X and required peer acks = 1 (i.e., 1 peer ack + the implicit ack from the leader), then as soon as just one replica consumed past LSN X, the producer would get its ack and the HWM would advance (because the producer who produced at X was acknowledged).

@gwenshap

Ah, yeah. It is one of those things that make sense once you get used to it but are very confusing at first. We get LOTS of support calls asking "my producer got an ack, why didn't the consumer see the message immediately?".

This is why it is so important to measure "produce to consume" latency as well as "producer to broker" latency.
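
A minimal sketch of measuring produce-to-consume latency on the consumer side, assuming the topic uses the default CreateTime record timestamps and that producer and consumer clocks are roughly in sync (the broker address, group id, and topic name are made up):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class E2ELatencySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");
        props.put("group.id", "latency-probe");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("some-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    // With CreateTime, r.timestamp() is when the producer sent the
                    // record, so this difference approximates end-to-end latency.
                    long latencyMs = System.currentTimeMillis() - r.timestamp();
                    System.out.printf("offset=%d produce->consume=%d ms%n", r.offset(), latencyMs);
                }
            }
        }
    }
}
```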

@gwenshap

... and I just realized that min.isr is set at topic level, not producer level... so the scenario is quite imaginary... but it was interesting to think through anyway :)
