This is controlled by the partition assignment strategy configuration (`partition.assignment.strategy`). An important note: this is a consumer-side setting, and it describes how partitions are assigned to the members of a specific consumer group.
The default strategy is the RangeAssignor. For each topic, it divides the number of partitions by the number of consumer instances and assigns a contiguous range of partitions to each consumer. If the division does not come out even, the first few consumer instances (in sorted order) are assigned one extra partition each.
Suppose there are two consumers, C0 and C1, and two topics, t0 and t1, each with 3 partitions. This gives us the following partitions: t0p0, t0p1, t0p2, t1p0, t1p1, and t1p2.
The assignment for these will be:
C0: [t0p0, t0p1, t1p0, t1p1]
C1: [t0p2, t1p2]
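The per-topic range logic above can be sketched as a small simulation. This is not the actual Kafka client code, just an illustration of the arithmetic: each topic's partitions are split into contiguous ranges, and the first `remainder` consumers get one extra partition.

```go
package main

import (
	"fmt"
	"sort"
)

// rangeAssign sketches the RangeAssignor logic for a single topic:
// partitions are split into contiguous ranges, and when the division
// is uneven, the first consumers (in sorted order) get one extra.
func rangeAssign(consumers []string, numPartitions int) map[string][]int {
	sort.Strings(consumers)
	assignment := make(map[string][]int)
	perConsumer := numPartitions / len(consumers)
	extra := numPartitions % len(consumers)
	p := 0
	for i, c := range consumers {
		n := perConsumer
		if i < extra {
			n++ // the first `extra` consumers get one more partition
		}
		for j := 0; j < n; j++ {
			assignment[c] = append(assignment[c], p)
			p++
		}
	}
	return assignment
}

func main() {
	// Two consumers and two topics with 3 partitions each, as above.
	// Each topic is assigned independently, so C0 ends up with the
	// first two partitions of both t0 and t1, and C1 with the last.
	for _, topic := range []string{"t0", "t1"} {
		fmt.Println(topic, rangeAssign([]string{"C0", "C1"}, 3))
	}
}
```

Running this prints `[0 1]` for C0 and `[2]` for C1 on both topics, matching the assignment shown above.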
A consumer group may subscribe to multiple topics. If two topics share the same key-partitioning scheme and the same number of partitions, we can join data across them.
In other words, if we had two topics partitioned by a UserID, with the same number of partitions, we can ensure that the same consumer instance consumes all events for any specific user (ordering, of course, is still only guaranteed per partition, and thereby per topic in this scenario). Adapting the previous example, suppose t0 and t1 each have only two partitions; the assignment will then be:
Ci: [t0p0, t1p0]
Cj: [t0p1, t1p1]
And the result of PartitionerFunc(UserID, len(partitions)) will be the same for both topics, so for any particular UserID, our messages would be consumed by the same consumer.
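A hash-based partitioner along these lines can be sketched as follows. This is a simplified stand-in for the PartitionerFunc mentioned above, not any specific client's implementation (though Go clients such as Sarama use an FNV-1a hash similarly by default). The key point is that the partition index depends only on the key and the partition count, never on the topic name:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor sketches a hash-based partitioner: hash the key,
// then take the result modulo the number of partitions.
func partitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(numPartitions))
}

func main() {
	// userID is a hypothetical key; both topics have 2 partitions.
	// Because the topic name is not an input, the same key maps to
	// the same partition index in every topic with that count.
	userID := "user-42"
	fmt.Println(partitionFor(userID, 2) == partitionFor(userID, 2))
	fmt.Println("partition:", partitionFor(userID, 2))
}
```

Combined with range assignment, this is what guarantees that one consumer instance sees all of a given user's events across both topics.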
For example, you might be listening to a central stream of events and have a separate retry topic (for events that failed to be handled). Both topics are partitioned by UserID, and you want the same consumer to consume all of a particular user's events, perhaps so that each consumer instance can maintain a user-based cache.
This feature also enables us to do "joins" of data across multiple topics partitioned by the same key. This is the feature that Kafka Streams is built on.
Using the above example: if the partition counts were mismatched, events with the same key could end up being consumed by different consumer instances in the group.
- A more detailed overview of the other available partition assignor strategies: https://medium.com/streamthoughts/understanding-kafka-partition-assignment-strategies-and-how-to-write-your-own-custom-assignor-ebeda1fc06f3