
@agarman
Created June 27, 2017 18:04
MessageHub is Kafka

Kafka/MessageHub is a distributed log. It scales writes by partitioning data. You can write a custom partitioner, though the DefaultPartitioner is sufficient for most applications.

The DefaultPartitioner uses the message key to determine which partition a message is written to. This gives you control over where messages land (important for ConsumerGroups…). It also means Kafka is susceptible to hot partitions, where more messages are routed to a partition than the disk IO of the Kafka nodes hosting that partition can support.
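
For illustration, here's a minimal Java sketch of keyed partitioning using Kafka's Partitioner interface. It mirrors what the DefaultPartitioner does for keyed messages (hash the key bytes, mod the partition count); the real one also spreads keyless messages across partitions, which this sketch simplifies away. The class name is hypothetical.

```java
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

// Sketch of key-based partitioning: equal keys always land on the same
// partition, which is exactly why a hot key produces a hot partition.
public class KeyHashPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null) {
            return 0; // simplification; the DefaultPartitioner spreads keyless messages instead
        }
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}
```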

Writes are also replicated (3 replicas are recommended). If the set of ISR (in-sync replicas) shrinks below the configured minimum, writes to that partition are refused (Kafka maintains consistency within data partitions).
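
A sketch of where that guarantee shows up on the producer side. It assumes the topic was created with replication.factor=3 and min.insync.replicas=2; with acks=all, a send then fails (NotEnoughReplicas) rather than being accepted when fewer than 2 replicas are in sync. Topic name and broker address are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SafeProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all"); // wait for all in-sync replicas to acknowledge

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key ("foo") determines the partition via the partitioner.
            producer.send(new ProducerRecord<>("events", "foo", "123"));
        }
    }
}
```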

There are two styles of consumers in Kafka: 1.) a stateless, low-level API where the consumer tracks its own read position on the partition(s) being consumed; 2.) a high-level ConsumerGroup API where Kafka tracks the read position but allows at most 1 consumer per partition within a group.
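
A minimal sketch of the high-level style: Kafka assigns partitions to the members of the group (at most one member per partition) and tracks committed offsets on the group's behalf. The group id, topic name, and broker address are placeholders.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "my-group"); // membership in this group drives partition assignment
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            r.partition(), r.offset(), r.key(), r.value());
                }
                consumer.commitSync(); // checkpoint the group's read position in Kafka
            }
        }
    }
}
```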

The consumer group API looks like a queue, but it has limitations a queue doesn't: 1.) messages are only ordered within a partition…not across partitions or topics; 2.) the dequeue is exclusive…a single consumer at a time per partition. That means if you need a shared queue, you have to implement your own dequeue and routing.

It’s best to think of the ConsumerGroup API as a non-shared message iterator with checkpointing support for fail-over/fault tolerance.

Also, a distinct advantage of a Kafka ConsumerGroup over a traditional queue is that you can rewind to a previous offset or time. The data written to Kafka is immutable…
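
A sketch of rewinding: assign a partition directly, then seek either to an absolute offset or to the first offset written at/after a timestamp. Topic name, partition, offset, and broker address are placeholders.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class Rewinder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("events", 0);
            consumer.assign(Collections.singletonList(tp));

            consumer.seek(tp, 42L); // rewind to a known offset

            // Or rewind by time: find the first offset at/after one hour ago.
            long oneHourAgo = System.currentTimeMillis() - Duration.ofHours(1).toMillis();
            Map<TopicPartition, OffsetAndTimestamp> byTime =
                    consumer.offsetsForTimes(Collections.singletonMap(tp, oneHourAgo));
            if (byTime.get(tp) != null) {
                consumer.seek(tp, byTime.get(tp).offset());
            }
        }
    }
}
```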

Though the data is immutable, there are retention settings and compaction. Retention removes old data; compaction removes all but the newest value for a given message key.

Compaction: [foo|123],[foo|456],…[foo|789] becomes [foo|789]
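
A sketch of configuring both behaviors at topic creation, using Kafka's AdminClient: one topic that deletes data older than its retention window, and one that compacts down to the newest value per key. Topic names, sizing, and broker address are placeholders.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder

        Map<String, String> retained = new HashMap<>();
        retained.put("cleanup.policy", "delete");
        retained.put("retention.ms", "604800000"); // drop data older than 7 days

        Map<String, String> compacted = new HashMap<>();
        compacted.put("cleanup.policy", "compact"); // [foo|123],[foo|456] -> [foo|456]

        try (AdminClient admin = AdminClient.create(props)) {
            admin.createTopics(Arrays.asList(
                    new NewTopic("events", 6, (short) 3).configs(retained),
                    new NewTopic("latest-values", 6, (short) 3).configs(compacted)
            )).all().get();
        }
    }
}
```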

That's Kafka. Compose.io supports Scylla, RabbitMQ & Redis. Bluemix.net also has Cloudant & DashDB. Depending on the use case, any of these may be a better choice than Kafka. IMO, Bluemix.net needs something equivalent to Google PubSub or Azure Queue Storage… and should avoid looking at AWS SQS for inspiration (as that system dictates that all message consumers be idempotent).
