MessageHub is Kafka
Kafka/MessageHub is a distributed log. It scales writes by partitioning data across brokers. You can write a custom partitioner, though the DefaultPartitioner is sufficient for most applications.
The DefaultPartitioner uses the message key to determine which partition a message is written to. This gives you control over where messages land (important for ConsumerGroups…). It also means Kafka is susceptible to hot partitions, where more messages are routed to a partition than the disk I/O of the Kafka nodes hosting that partition can support.
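The idea behind key-based partitioning can be sketched in a few lines. This is an illustration only: Kafka's actual DefaultPartitioner hashes key bytes with murmur2, and `choose_partition` is a hypothetical name; here MD5 stands in as a stable hash.

```python
import hashlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Sketch of key-based partitioning: hash the key, mod the partition
    count. (Kafka really uses murmur2; MD5 is just for illustration.)"""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key always maps to the same partition, which is what gives
# you per-key ordering -- and what creates hot partitions for hot keys.
assert choose_partition(b"user-42", 6) == choose_partition(b"user-42", 6)
```

Because the mapping is deterministic, one very popular key concentrates all of its traffic on a single partition, no matter how many partitions the topic has.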
Writes are also replicated (3 replicas is the recommendation). If there isn't a quorum of ISRs (in-sync replicas), writes to that partition will be refused (Kafka maintains consistency within partitions).
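The gating logic amounts to a single comparison. A minimal sketch, assuming a producer using `acks=all` and a topic with `min.insync.replicas` set (both real Kafka settings); `accept_write` itself is a hypothetical helper, not a Kafka API:

```python
def accept_write(isr_count: int, min_insync_replicas: int) -> bool:
    """Sketch: with acks=all, the partition leader refuses writes when the
    in-sync replica set shrinks below min.insync.replicas."""
    return isr_count >= min_insync_replicas

# 3 replicas configured, min.insync.replicas=2:
assert accept_write(isr_count=3, min_insync_replicas=2)      # healthy
assert not accept_write(isr_count=1, min_insync_replicas=2)  # refused
```

The trade-off: refusing writes sacrifices availability on that partition to preserve consistency.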
There are two styles of consumer in Kafka: 1.) a stateless, low-level API where the consumer tracks its own read position on the partition(s) being consumed 2.) a high-level consumer group API where Kafka tracks the read position but allows at most 1 consumer per partition.
The consumer group API looks like a queue, but it has limitations a queue doesn't: 1.) Messages are only ordered within a partition…not across a topic. 2.) The dequeue is exclusive…a single consumer at a time, which means if you need a shared queue, you have to implement your own dequeue and routing.
It’s best to think of the ConsumerGroup API as a non-shared message iterator with checkpointing support for fail-over/fault tolerance.
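The "at most 1 consumer per partition" rule can be sketched as a simple round-robin assignment. This is an illustration of the constraint, not Kafka's actual assignor (Kafka ships several pluggable assignment strategies); `assign_partitions` is a hypothetical name.

```python
def assign_partitions(partitions, consumers):
    """Sketch of consumer-group assignment: each partition goes to exactly
    one consumer; consumers beyond the partition count sit idle."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 3 partitions, 2 consumers: no partition is shared between consumers.
three_two = assign_partitions([0, 1, 2], ["c1", "c2"])
# 2 partitions, 3 consumers: one consumer gets nothing to read.
two_three = assign_partitions([0, 1], ["c1", "c2", "c3"])
```

This is why adding consumers beyond the partition count buys you nothing: parallelism is capped by the number of partitions.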
Also, a distinct advantage of Kafka ConsumerGroup versus a traditional queue is that you can rewind to a previous offset or time. The data written to Kafka is immutable…
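Since the log is immutable, a consumer is really just an offset into it, and rewinding is just moving that offset back. A toy sketch (the `LogConsumer` class is hypothetical; in the real client this is the consumer's `seek` operation):

```python
class LogConsumer:
    """Sketch: a Kafka partition modeled as an immutable, append-only list;
    the consumer is an offset into it that can be rewound at will."""
    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        msg = self.log[self.offset]
        self.offset += 1
        return msg

    def seek(self, offset):
        # Rewind (or fast-forward) to any still-retained offset.
        self.offset = offset

c = LogConsumer(["a", "b", "c"])
c.poll()    # "a"
c.poll()    # "b"
c.seek(0)   # rewind: the data is still there
c.poll()    # "a" again
```

A traditional queue destroys a message on dequeue; here, reprocessing history is as cheap as resetting an integer.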
Though it’s immutable, there are retention settings and compaction. Retention removes old data; compaction removes all but the newest value for a given message key.
Compaction: [foo|123],[foo|456],…[foo|789] becomes [foo|789]
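The same compaction rule in a few lines of code. A sketch only (`compact` is a hypothetical helper; real compaction runs in the background on log segments and preserves surviving records' offsets):

```python
def compact(log):
    """Sketch of log compaction: for each key, only the record with the
    highest offset (i.e. the newest value) survives."""
    latest = {}
    for key, value in log:  # later entries overwrite earlier ones
        latest[key] = value
    return list(latest.items())

print(compact([("foo", 123), ("foo", 456), ("foo", 789)]))
# → [('foo', 789)]
```

Compaction gives you a changelog that never grows without bound in keys, which is why it's the backing store for things like Kafka's own consumer offsets.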
That’s Kafka. Compose.io supports Scylla, RabbitMQ & Redis. Bluemix.net also has Cloudant & DashDB. Depending on the use case, any of these may be a better choice than Kafka. IMO, Bluemix.net needs something equivalent to Google PubSub or Azure Queue Storage… and should avoid looking to AWS SQS for inspiration (as that system dictates that all message consumers be idempotent).