Kafka: Notes on Consuming from Multiple Topics
@andrewloux · Last active December 10, 2022 21:18

How are partitions assigned in a consumer group?

This is determined by the partition.assignment.strategy configuration. An important note: this is a consumer-side setting, not a topic-level one. It describes how the partitions of each subscribed topic are distributed among the members of a particular consumer group.

The default strategy is the RangeAssignor. For each topic, it divides the number of partitions by the number of consumer instances and hands out contiguous ranges of partitions, one consumer at a time. If the division leaves a remainder, the first few consumer instances each receive one extra partition.

Example

Suppose there are two consumers C0 and C1, two topics t0 and t1, and each topic has 3 partitions, resulting in the following partitions: t0p0, t0p1, t0p2, t1p0, t1p1, and t1p2

The assignment for these will be:

C0: [t0p0, t0p1, t1p0, t1p1]

C1: [t0p2, t1p2]

Multi-Topic Consumers

We may have a consumer group that listens to multiple topics. If the topics share the same key-partitioning scheme and the same number of partitions, we can join data across them.

In other words, if we had two topics partitioned by a UserID, with the same number of partitions, we can ensure that the same consumer instance consumes all events for any specific user (ordering, of course, is still only guaranteed per partition, and thereby per topic in this scenario). Modifying the previous example so that t0 and t1 each have only two partitions, the assignment will be:

Ci: [t0p0, t1p0]

Cj: [t0p1, t1p1]

And the result of PartitionerFunc(UserID, len(partitions)) will be the same for both topics, so for any particular UserID, our messages would be consumed by the same consumer.
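A minimal sketch of why this holds, using FNV-1a as a stand-in for a real client's partitioner (Sarama's default hash partitioner uses FNV-1a, the Java client uses murmur2; the specific hash only matters insofar as producers for both topics use the same one):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor mimics a hash-based partitioner: a key deterministically
// maps to a partition index for a given partition count.
func partitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(numPartitions))
}

func main() {
	// t0 and t1 both have 2 partitions, so any UserID lands on the same
	// partition index in both topics; with RangeAssignor, t0pN and t1pN
	// are owned by the same consumer instance.
	for _, userID := range []string{"user-1", "user-2", "user-3"} {
		p := partitionFor(userID, 2)
		fmt.Printf("%s -> t0p%d and t1p%d (same consumer)\n", userID, p, p)
	}
}
```

Since the partition index is a pure function of the key and the partition count, equal counts across topics guarantee equal indexes, and the range assignment aligns those indexes onto one consumer.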

Why would we want this?

For example, you might be listening to a central stream of events and also have a separate retry topic (for events whose handling failed). Both topics are partitioned by UserID, and you want the same consumer to consume all of a particular user's events. You might do this so that each consumer instance can maintain a per-user cache.

This also enables "joins" of data across multiple topics partitioned by the same key; this co-partitioning is the feature that Kafka Streams joins are built on.

What could go wrong?

Using the above example: if the two topics had mismatched partition counts, the same key could hash to different partition indexes in each topic, and its events could wind up being consumed by different consumer instances in the group.
