Why I was previously not a fan of Apache Kafka

Update, September 2016

OK, you can pretty much ignore what I wrote below this update, because it doesn't really apply anymore.

I wrote this over a year ago, and at the time I had spent a couple of weeks trying to get Kafka 0.8 working with .NET and then Node.js with much frustration and very little success. I was rather angry. It keeps getting linked, though, and just popped up on Hacker News, so here's sort of an update, although I haven't used Kafka at all this year so I don't really have any new information.

In the end, we managed to get things working with a Node.js client, although we continued to have problems, both with our code and with managing a Kafka/Zookeeper cluster generally. What made it worse was that I did not then, and do not now, believe that Kafka was the correct solution for that particular problem at that particular company. What they were trying to achieve could have been done more simply with any number of other messaging systems, with a subscriber reading messages off and writing them to some form of persistent storage (like Elasticsearch). I'm sure there are issues of scale or whatever where Kafka makes sense.

It is true, as many people have pointed out in the comments, that my primary problem was the lack of a good Kafka client for .NET. If I'd been able to install a Kafka Nuget package and it had just worked, this would never have been written. But I couldn't. Today I could probably use a thin wrapper around librdkafka, and if I ever have to work with Kafka from .NET again, that's probably what I'll do. C/C++ libraries are great for stuff like that: C can talk to anything, and everything can talk to C. Yay.

I do understand the performance-related reasons that drove the decision to design a clever-client architecture, but it was, apparently, extremely difficult to create a good client unless you were working in Java, or in a lower-level language such as C or Go that could handle the complex protocol and implementation requirements.

So, anyway, like I said, you can ignore the stuff below which was written about an old version of the software, while I was in a very bad mood. But I'm going to leave it here, in the hopes that it may serve as a warning to future developers of really complicated infrastructure components. It probably won't, though.

Yesterday on Twitter I opined that Apache Kafka is not good software.

Obviously this suggestion requires more explanation than is reasonable in 140 characters, so here goes:


It's not a server application; it's a Java library with a server component.


I'm not 100% sure that everything here is accurate; if I have made any factual errors that you would like to correct please comment or fork or whatever.


Kafka is managed by the Apache Foundation, but it was originally created by LinkedIn for internal use. LinkedIn are heavy Java/JVM users; as I understand it, a lot of their infrastructure was built with Scala, and now they're going all Java 8. So they wrote their custom-built distributed message bus in Scala, because why wouldn't they?

How (I think) Kafka works.

The problem is that in what I would guess was an attempt to maximise performance on the server, they built a lot of the complexity of dealing with distributed, clustered systems into the client code. Kafka clients indirectly connect to all the nodes in a Kafka cluster by first talking to another system, ZooKeeper, which is a distributed configuration/synchronisation service. ZooKeeper tells the client where the nodes are and which node is the Leader for a particular topic, and then the client opens a TCP socket to those nodes and talks a binary protocol to them. This is what Kafka does instead of just sitting behind a load balancer like a normal server.
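The client-side lookup described above can be sketched in miniature. To be clear, this is a toy model and not Kafka's actual API: the metadata table, `find_leader`, and the broker addresses are all invented for illustration.

```python
# Toy model of a "smart" Kafka client discovering which broker leads a
# partition before talking to it. All names here are invented.

# Metadata the coordination service (ZooKeeper, in Kafka 0.8) might hold:
# topic -> partition -> leader broker address.
CLUSTER_METADATA = {
    "orders": {0: "broker-1:9092", 1: "broker-2:9092"},
    "clicks": {0: "broker-2:9092"},
}

def find_leader(topic: str, partition: int) -> str:
    """Return the broker the client must open a TCP socket to.

    A real client asks ZooKeeper for this metadata; behind a load
    balancer, this lookup would not exist in client code at all.
    """
    try:
        return CLUSTER_METADATA[topic][partition]
    except KeyError:
        raise LookupError(f"no leader known for {topic}/{partition}")

# The client, not some server-side router, picks the destination:
leader = find_leader("orders", 1)
```

The point of the sketch is that this routing logic, however simple it looks here, lives in every client library, in every language, and must track metadata changes at runtime.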

Sending messages

You get messages onto the bus via the Produce API, which for reasons of efficiency allows you to submit multiple messages (a "Message Set") at a time (and compress them if you want to).

When you Produce a Message Set onto the bus, you don't directly get back a response telling you that the messages have successfully been persisted to one or more partitions. Instead, you must also Consume the bus, and you should eventually receive multiple messages acknowledging the persistence of each message in the set.
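The batching idea itself is simple to sketch. This is not Kafka's binary wire format; `build_message_set` and the JSON/gzip encoding are stand-ins, just to show why batching plus optional compression saves network round trips.

```python
# Sketch only: grouping messages into a "message set" so one Produce
# request carries many messages, optionally compressed. Kafka's real
# format is binary; JSON + gzip here are illustrative stand-ins.
import gzip
import json

def build_message_set(messages, compress=False):
    """Pack a batch of messages into a single payload."""
    payload = json.dumps(messages).encode("utf-8")
    return gzip.compress(payload) if compress else payload

# One network round trip now carries all three messages:
batch = build_message_set(["msg-1", "msg-2", "msg-3"], compress=True)
assert json.loads(gzip.decompress(batch)) == ["msg-1", "msg-2", "msg-3"]
```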

Partition tolerance

If a Node dies then a "leadership election" happens, ZooKeeper is updated with the new metadata, and your application must react to this and handle the changes. There's a six second delay while this happens, and who knows what happens if you try and send messages to a dead node during that time.
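What a non-JVM client has to implement itself, then, looks roughly like this retry loop. Everything in it (`send_with_retry`, the simulated flaky leader) is a hypothetical sketch; only the idea of re-fetching metadata after an election comes from the behaviour described above.

```python
# Sketch: the failover handling a client library must provide itself.
import time

def send_with_retry(send, refresh_metadata, message, retries=3, backoff=0.0):
    """Try to send; on failure, re-fetch leader metadata and try again.

    backoff defaults to 0 for this demo; a real client might wait
    several seconds for the leadership election to complete.
    """
    last_error = None
    for _ in range(retries):
        try:
            return send(message)
        except ConnectionError as exc:
            last_error = exc
            refresh_metadata()   # ask ZooKeeper who the new leader is
            time.sleep(backoff)  # wait out the election
    raise ConnectionError("no leader became available") from last_error

# Simulate a leader dying on the first attempt:
attempts = []
def flaky_send(msg):
    attempts.append(msg)
    if len(attempts) == 1:
        raise ConnectionError("leader is down")
    return "ok"

result = send_with_retry(flaky_send, lambda: None, "hello")
# succeeds on the second attempt, after one metadata refresh
```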

If you are writing your application in Java and using the official Kafka client, or a JVM language with a library which correctly provides a thin wrapper around the official Kafka client, such as clj-kafka, then this complexity is largely handled for you.

If, on the other hand, you are writing your application in any other language(s), then as far as Kafka is concerned you can go piss up a rope:

How The Kafka Project Handles Clients

Starting with the 0.8 release we are maintaining all but the jvm client external to the main code base. The reason for this is that it allows a small group of implementers who know the language of that client to quickly iterate on their code base on their own release cycle. Having these maintained centrally was becoming a bottleneck as the main committers can't hope to know every possible programming language to be able to perform meaningful code review and testing. This led to a scenario where the committers were attempting to review and test code they didn't understand.

We are instead moving to the redis/memcached model which seems to work better at supporting a rich ecosystem of high quality clients.

We haven't tried all these clients and can't vouch for them. The normal rules of open source apply.

If you are aware of other clients not listed here (or are the author of such a client), please add it here. Please also feel free to help fill out information on the features the client supports, level of activity of the project, level of documentation, etc.

Why this is a problem

It's the "redis/memcached model", apparently. Now, I'm mostly a C# developer so as far as Redis goes I'm pretty spoiled with the ServiceStack and StackExchange clients to choose from, but take a look at the Redis clients page compared to the Kafka clients page. The Redis page offers recommendations, smiley faces and stars, and there are a lot of clients there, because Redis is a mature and widely-used technology. By comparison, the Kafka page tells you whatever the client's author said, without even a cursory attempt at curation or guidance, and there's really not a lot of choice, probably because (a) Kafka is still pretty new, and (b) they appear to break the protocol/API on a regular basis.

I also don't know how Kafka's complexity regarding Producers, Consumers, Nodes, Partitions, ZooKeepers and whatever else is at play compares to the complexity of Redis or Memcached, but it's pretty complicated. I briefly skimmed the Protocol Guide and browsed the code for the official Java client, and I think I died a little bit inside. It's not something for which you're going to invest the time writing a client unless you really, really want to use it, over and above the many existing message bus/queue/distributed commit log servers that already exist and are generally much easier to work with.

I understand from a colleague that the Go client created by Shopify is pretty good.

The thing that pushed me over the edge and caused me to tweet that I hate Kafka was reading this post about Kafka on LinkedIn, which says that the "Kafka Ecosystem at LinkedIn" includes

  1. A REST interface: This interface enables non-java applications to easily publish and consume messages from Kafka using a thin client model.

That's right; they know trying to write clients in other languages is a task that the gods would not have wished upon Sisyphus, so they've written a server, presumably in Java or Scala, that takes care of it for you.
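The appeal of such an HTTP layer is that any language with an HTTP client can publish, with no partition-aware protocol code at all. A hedged sketch follows: the endpoint path and payload shape below are invented for illustration and will not match any particular proxy's real API.

```python
# Hypothetical sketch of publishing through an HTTP proxy in front of
# Kafka. The URL scheme and JSON body are made up for illustration.
import json
import urllib.request

def build_publish_request(base_url, topic, records):
    """Build the URL and JSON body for one batched publish call."""
    url = f"{base_url}/topics/{topic}"
    body = json.dumps({"records": [{"value": r} for r in records]}).encode("utf-8")
    return url, body

def publish(base_url, topic, records):
    """POST the batch; any HTTP client works, which is the whole point."""
    url, body = build_publish_request(base_url, topic, records)
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)
```

The trade-off, of course, is an extra network hop and a server someone has to run, in exchange for clients that are trivial to write.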


Final thoughts

If you are using Java/Scala/Clojure/Kotlin/whatever and can use the Official Java Client then I'm sure Kafka is a perfectly reasonable choice for a message bus, although there are plenty of others that seem to me to be far less bloody-minded.

If, like Shopify, you have the resources and inclination to write your own client in your chosen language (Go, in their case) and maintain it in the face of breaking changes to the protocol, and you think Kafka sounds wonderful, then go ahead, I guess.

If you do not fit into either of those categories, then I would strongly suggest that you investigate other alternatives, such as Redis, RabbitMQ, or perhaps a cloud-based queuing system such as Azure Storage Queues or Service Bus (which you can also run on-prem) or Amazon SQS or SNS.

If what you're actually after is a distributed commit log, then maybe the HTTP-compliant EventStore would work for you.

Like I say, if I'm way off the mark in any of what I've written here, please let me know.

jkreps commented Nov 9, 2015

A number of people continue to point me to this, so I thought it might be worth adding a few relevant points. This critique is correct on several points:

  1. Kafka historically relied on a fairly thick consumer client that directly accessed the servers that had the data to consume, rather than routing everything through a proxy layer. This design trades client complexity for efficiency.
  2. The clients are separate open source projects.

One correction is probably worth making:

  1. Kafka has been extremely disciplined about backwards compatibility. The protocol comes with versioning and changes are always implemented in a way that supports both the old and new version and can be rolled out without downtime. In the five year history of the project we did one backwards incompatible release--the break from 0.5.x-0.7.x to 0.8.x. This was done intentionally to allow us to refactor the apis. I think this is a pretty good track record.

To address the critiques it's worth noting that for the 0.9 release (in progress now) we have made the consumer group management protocol part of the core Kafka protocol rather than having the consumer directly access zookeeper. This was part of an effort over a couple years to detach the clients from direct ZK access. That effort is now complete and substantially thins out the consumer clients.

It's worth also addressing why Kafka clients directly access nodes in the cluster rather than requiring a proxy layer. The reason we do this is to allow very high throughput, partition aware processing. This is really required for use cases like stream processing that need to process data efficiently, especially in cases where you are reprocessing data. You can always build a proxy layer on top of direct access but not vice versa.

Confluent (where I work) is doing two things that help the non-java client ecosystem:

  1. We maintain an open source REST proxy that provides decoupled access (albeit with a little overhead compared to the direct clients)
  2. We are picking up work on the non-java clients to bring them into parity with the java clients.

Both of these efforts are open source and apache licensed and included in the Confluent Platform distribution of Kafka.

dkhofer commented Nov 9, 2015

Minor quibble: Although SNS and SQS are perfectly fine services, AWS provides one called Kinesis that I believe is meant to be its alternative to Kafka.

h12w commented Nov 14, 2015

I have written a Go client for Kafka recently, and here are my thoughts:

  1. Kafka server is good: distributed, efficient and stable, its design philosophy is simple and profound.
  2. The low level API is good enough, though the specification is not well written.
  3. Kafka Java client sucks, especially the high level API, and the clients in other languages are worse.

What Kafka needs is an improvement to its low-level API and a good client that provides a solid middle-level API. The high-level API is not useful at all and should be abandoned.

daluu commented Dec 16, 2015

@h12w, might you elaborate on thought #3, i.e. why the high-level API (or the high-level consumer, I assume) sucks compared to the low-level API/consumer?

rovarghe commented Jan 12, 2016

Rather than being a critique of Kafka's design, this is just basically saying: there are no good clients in my language (C#?), therefore Kafka sucks. I rather like the fact that efficiency was chosen over simplicity in letting clients talk directly to the node. That's closer to the Redis philosophy, which I happen to like.

CloudMarc commented Jan 13, 2016

By the way, Redis Cluster doesn't use ZooKeeper or any other external dependencies. Git clone && make, boom.

However, the clients and tools suffer the same criticism - not yet cluster-mature clients, and the tools are a little unwieldy.

People point out that Kafka has persistence, so you can get behind in processing your high-volume queues by hours or days. For real-time work (e.g. where anything older than an hour is worthless), this seems like a disadvantage.

akauppi commented Jan 26, 2016

@CloudMarc The Kafka Design section is rather good, and explains e.g. about the Persistence choice. To me, the logic seems solid (trusting the OS to be good) and I don't think real time work would suffer from the choice. Have a look. :)

siepkes commented Mar 2, 2016

@markrendle Isn't this the REST API you are talking about?

The Redis client page can hardly be called a good example. I mean come on: 3 Java clients with a star. A star means "Recommended client". So which of these 3 clients am I supposed to pick? Also, some clients have a star but no text whatsoever (like Jedis), which makes picking even harder.

akauppi commented Apr 2, 2016

I now read all of your gist through.

Like I say, if I'm way off the mark in any of what I've written here, please let me know.

Well, you could start with a reference to which Kafka version this critique applies to. I think it's based on Kafka 0.8, and I do think 0.9 fixes all of the downsides you are listing. I would like to go through the remaining ones (that you think are still valid) with you, if you have time.

ps. Also the link to the referred tweet no longer works, so readers have no way to know the context and relevance of your rant.

richardqiao2000 commented Jun 25, 2016

I understand this is a struggling .NET-ecosystem developer trying to work with the Java ecosystem. It's actually a common concern, and not specific to Kafka.
It's also worth debating whether a centralized server + load balancer is a better practice for efficiency than ZooKeeper + direct node TCP. It makes clients easier to develop, but could lose efficiency too.

mcqj commented Jul 15, 2016

It's not just .NET; it's largely anything other than Java.
Take a look at the Node.js clients that are listed: they claim support only for Kafka 0.8. That's nothing like the situation with Redis.

bklooste commented Jul 29, 2016

@CloudMarc When you have clients with no connectivity for a while that then exchange critical messages, it's rather important. For true real-time work you are often better off relying on TCP sockets.

None of the solutions I have seen am I happy with, so it's a case of pick your poison :-(

Rabbit-type queues have all the per-message-persistence queue performance issues, the complexity of managing many nodes, poison messages, etc. I much prefer to route to a data table and pull from it; OK for simple things like jobs. In CQRS speak, good for commands, not events.

Azure Event Hubs doesn't have topic subscriptions, is Azure-only, and its proxy communication is a bit tricky. Its focus is on gathering data from many clients and pumping it into big data.

Kafka has longer persistence and replays, but is not so nice in a Windows/Azure environment. It does allow branch-network-type support.

EasyRedisMQ looks the best, but it's immature/small-scale and has no persistence guarantees.

sumanthreddy480 commented Aug 11, 2016

How will a consumer receive messages from the queue? Can anybody provide Java example code for an Apache Kafka queue (0.9.0.x)?

gavinbaumanis commented Sep 2, 2016

This is a genuine question and not meant to be inflammatory...
What does it matter that Kafka uses Zookeeper?
Why would you bother to choose to solve a problem that has already been solved? Especially when Zookeeper does what you need?

I don't pretend to have any insight - there are people here with much greater experience of Kafka / messaging systems who perhaps "see" issues that I just don't?

dalegaspi commented Sep 5, 2016

this opinion piece mostly stems from the author's failure to understand how Kafka works. if you do the research to design (not even build) something--either with home-grown code or a combination of OTF products--with the goal of replicating what Kafka does with high availability, throughput, and scalability, you will realize how amazing this piece of software is, and appreciate why it uses something like ZooKeeper instead of "sitting behind a load balancer like a normal server."

the only thing close (in terms of function) is Amazon's Kinesis, but it gets ungodly expensive if you have a massive amount of data to process.

chrisdlangton commented Jun 26, 2017

Came across this looking for an alternative to Kafka, primarily because Kafka is most suited to Java or Java-based stacks and I'm using Go for this project.

I'm quite battle-hardened with in-memory solutions and with how the Linux system calls work with file descriptors (streams), signals, and processes/sessions.

Having said this, it's quite clear that the suggestion to check out Redis as an alternative is by far the most ill-informed and misleading conclusion (no offence, just being concise), and that's not down to the skills I've listed; it's down to just one basic "key" concept: CONSISTENCY.

Redis implements LRU eviction, which is highly inappropriate for any system looking for any flavour of consistency.

Consistency is why we look at Kafka. Anyone can attempt streams and availability with varying degrees of success, but consistency, specifically defining and operating a model for your consistency, is by far the hardest thing to implement with any degree of success, realistically, for any business that does more than just offer a stream product. If you're not familiar with Kafka's consistency model, read all about it here;

Much better than I could explain in a comment.

Basically, for any alternative to be an alternative to Kafka, it needs not only to avoid consistency concerns (like LRU in Redis) but also to have a good grasp of how it achieves consistency and be able to meet that measurably.

I'll be speaking with Sajari to learn how they've tackled this in their Go application, or spend time going through their GitHub...

ijuma commented Jul 20, 2017

@chrisdlangton is a high quality Go client for Kafka for what it's worth.

suikast42 commented Jul 27, 2017

@h12w Why should the high-level API (point 3) not be useful? I think for ERP connections, or for saving time on technology evaluations (which is a software engineer's dream), APIs at that level are a good discipline. The Kafka guy is right: you can build anything if your core ecosystem is stable and fast enough.

koshatul commented Apr 12, 2018

@ijuma I agree it's a high-quality go client for Kafka, but it also has a binary dependency (pkg-config/librdkafka) which makes deployments a little tricky.

Anonymous-Coward commented Aug 29, 2018

@rovarghe Efficiency is not always the whole story. In fact, in most modern systems it's almost always a secondary factor. Spending a month on getting a new feature into production is a lot more expensive than adding 10 more nodes to a rabbitmq cluster now and shipping tomorrow. Therefore, if you need to implement a complex protocol because there are no good clients for your language, a less efficient but easier-to-consume technology might make more sense, from a business point of view.

Efficiency is an issue for long-lived systems that have to run on hardware-constrained platforms, such as autonomous sensors, but Kafka (or any of the would-be competitors) is hardly a product to run on such systems. In other contexts, such as powerful servers (cloud or not doesn't make a difference) running a system that is actively changed and extended, as long as it can scale up to what I need, the easier-to-use technology always wins, IME. It simply makes more money for the business by reducing time to market.

ekeyser commented Sep 19, 2021

Good read. IIRC my troubles with Kafka and ZooKeeper came when attempting to implement them in a scalable manner. This was around the time of this article, so I forget the exact details, but I came away with the experience that if you ever need to scale the instances of ZooKeeper and/or Kafka up or down, you are asking for trouble. This is from an infrastructure viewpoint rather than the software development viewpoint of this article, but I think there's overlap as it relates to the architecture of ZooKeeper/Kafka. Maybe it's changed. Maybe I wasn't doing something according to best practice or in a correct manner. I came away disgusted with the ZooKeeper/Kafka stack.

The whole point of some of these servers is to handle complexity. I never understood why ZooKeeper had to stop and take a breather and almost fall over any time a new node came online or dropped off. Jesus. Same with Kafka, whether gracefully or due to node failure of some sort. Yes, I get that it's non-trivial and challenging work, but then why are you writing synchronization servers and real-time stream message brokers and intertwining them if scale isn't a major selling point? Something isn't scalable if you have to constantly run 100 servers and keep them up no matter what the load is, just so you can handle peak load.

Basically, don't use in a cloud env. Use Kinesis or whatever GCP/Azure offer for streaming message brokers.
