Why I was previously not a fan of Apache Kafka

Update, September 2016

OK, you can pretty much ignore what I wrote below this update, because it doesn't really apply anymore.

I wrote this over a year ago, and at the time I had spent a couple of weeks trying to get Kafka 0.8 working with .NET and then Node.js with much frustration and very little success. I was rather angry. It keeps getting linked, though, and just popped up on Hacker News, so here's sort of an update, although I haven't used Kafka at all this year so I don't really have any new information.

In the end, we managed to get things working with a Node.js client, although we continued to have problems, both with our code and with managing a Kafka/Zookeeper cluster generally. What made it worse was that I did not then, and do not now, believe that Kafka was the correct solution for that particular problem at that particular company. What they were trying to achieve could have been done more simply with any number of other messaging systems, with a subscriber reading messages off and writing them to some form of persistent storage (like Elasticsearch). I'm sure there are issues of scale or whatever where Kafka makes sense.

It is true, as many people have pointed out in the comments, that my primary problem was the lack of a good Kafka client for .NET. If I'd been able to install a Kafka Nuget package and it had just worked, this would never have been written. But I couldn't. Today I could probably use a thin wrapper around librdkafka, and if I ever have to work with Kafka from .NET again, that's probably what I'll do. C/C++ libraries are great for stuff like that: C can talk to anything, and everything can talk to C. Yay.

I do understand the performance-related reasons that drove the decision to design a clever-client architecture, but it was, apparently, extremely difficult to create a good client unless you were working with either Java, or with a lower-level language such as C or Go which could work with the complex protocols and implementation requirements.

So, anyway, like I said, you can ignore the stuff below which was written about an old version of the software, while I was in a very bad mood. But I'm going to leave it here, in the hopes that it may serve as a warning to future developers of really complicated infrastructure components. It probably won't, though.

Yesterday on Twitter I opined that Apache Kafka is not good software.

Obviously this suggestion requires more explanation than is reasonable in 140 characters, so here goes:

TL;DR

It's not a server application; it's a Java library with a server component.

Disclaimer

I'm not 100% sure that everything here is accurate; if I have made any factual errors that you would like to correct please comment or fork or whatever.

Kafka

Kafka is managed by the Apache Foundation, but it was originally created by LinkedIn for internal use. LinkedIn are heavy Java/JVM users; as I understand it, a lot of their infrastructure was built with Scala, and now they're going all Java 8. So they wrote their custom-built distributed message bus in Scala, because why wouldn't they?

How (I think) Kafka works.

The problem is that, in what I would guess was an attempt to maximise performance on the server, they built a lot of the complexity of dealing with distributed, clustered systems into the client code. Kafka clients indirectly connect to all the nodes in a Kafka cluster by first talking to another system, ZooKeeper, which is a distributed configuration/synchronisation service. ZooKeeper tells the client where the nodes are and which node is the Leader for a particular topic, and then the client opens a TCP socket to those nodes and talks a binary protocol to them. This is what Kafka does instead of just sitting behind a load balancer like a normal server.
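To make that "clever client" flow a bit more concrete, here is a minimal Python sketch of it as I understand it. It is a toy, not the real protocol: `ToyMetadataStore` and `CleverClient` are names I made up, the "connection" is just a list rather than a TCP socket, and a real cluster tracks leaders in more detail than one address per topic.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetadataStore:
    """Hypothetical stand-in for ZooKeeper: maps each topic to its leader's address."""
    leaders: dict = field(default_factory=dict)

    def leader_for(self, topic: str) -> str:
        return self.leaders[topic]


class CleverClient:
    """The client, not a load balancer, decides which node to talk to."""

    def __init__(self, metadata: ToyMetadataStore):
        self.metadata = metadata
        self.connections = {}  # address -> list of sent payloads (a fake socket)

    def send(self, topic: str, payload: bytes) -> str:
        address = self.metadata.leader_for(topic)         # 1. ask who the leader is
        conn = self.connections.setdefault(address, [])   # 2. "open" a connection to it
        conn.append(payload)                              # 3. talk to that node directly
        return address


store = ToyMetadataStore({"orders": "node-2:9092", "clicks": "node-1:9092"})
client = CleverClient(store)
print(client.send("orders", b"hello"))  # prints node-2:9092
```

The point of the sketch is only that all the routing logic lives in `CleverClient`; every language that wants to talk to the cluster has to reimplement that part.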

Sending messages

You get messages onto the bus via the Produce API, which for reasons of efficiency allows you to submit multiple messages (a "Message Set") at a time (and compress them if you want to).
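As a rough illustration of why batching and compression go together, here is a small Python sketch. The function names are hypothetical and the encoding is JSON plus gzip purely for readability; Kafka's actual wire format is a binary one and looks nothing like this.

```python
import gzip
import json


def encode_message_set(messages, compress=False):
    """Pack several messages into one payload so they travel in a single request."""
    body = json.dumps(messages).encode("utf-8")
    return gzip.compress(body) if compress else body


def decode_message_set(payload, compressed=False):
    """Reverse of encode_message_set; the caller must know whether it was compressed."""
    body = gzip.decompress(payload) if compressed else payload
    return json.loads(body.decode("utf-8"))


batch = [f"event-{i}" for i in range(1000)]
plain = encode_message_set(batch)
packed = encode_message_set(batch, compress=True)

# Similar messages compress well together, which is the argument for batching first.
print(len(plain), len(packed))
assert decode_message_set(packed, compressed=True) == batch
```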

When you Produce a Message Set onto the bus, you don't directly get back a response telling you that the messages have successfully been persisted to one or more partitions. Instead, you must also Consume the bus, and you should eventually receive multiple messages acknowledging the persistence of each message in the set.

Partition tolerance

If a Node dies then a "leadership election" happens, ZooKeeper is updated with the new metadata, and your application must react to this and handle the changes. There's a six-second delay while this happens, and who knows what happens if you try to send messages to a dead node during that time.
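What "react to this and handle the changes" means in practice is roughly the following, sketched in Python. Everything here is an assumption for illustration: `send_with_retry`, `NodeDown`, the fake `lookup_leader` and `transport` functions, and the backoff numbers are all invented, and a real client has far more failure modes to deal with.

```python
import time


class NodeDown(Exception):
    """Raised by the fake transport when the target node is not accepting writes."""


def send_with_retry(topic, payload, lookup_leader, transport, retries=5, backoff=0.01):
    """Re-resolve the leader before each attempt, backing off between failures."""
    for attempt in range(retries):
        leader = lookup_leader(topic)
        try:
            return transport(leader, payload)
        except NodeDown:
            time.sleep(backoff * (2 ** attempt))  # give the election time to finish
    raise NodeDown(f"no leader for {topic!r} accepted the message after {retries} attempts")


# Fake cluster: node-1 has died, and the metadata only catches up on the third lookup.
lookups = iter(["node-1", "node-1", "node-2"])


def lookup_leader(topic):
    return next(lookups)


def transport(node, payload):
    if node == "node-1":
        raise NodeDown(node)
    return f"{node} stored {payload!r}"


print(send_with_retry("orders", b"hello", lookup_leader, transport))
# prints: node-2 stored b'hello'
```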

If you are writing your application in Java and using the official Kafka client, or a JVM language with a library which correctly provides a thin wrapper around the official Kafka client, such as clj-kafka, then this complexity is largely handled for you.

If, on the other hand, you are writing your application in any other language(s), then as far as Kafka is concerned you can go piss up a rope:

How The Kafka Project Handles Clients

Starting with the 0.8 release we are maintaining all but the jvm client external to the main code base. The reason for this is that it allows a small group of implementers who know the language of that client to quickly iterate on their code base on their own release cycle. Having these maintained centrally was becoming a bottleneck as the main committers can't hope to know every possible programming language to be able to perform meaningful code review and testing. This lead to a scenario where the committers were attempting to review and test code they didn't understand.

We are instead moving to the redis/memcached model which seems to work better at supporting a rich ecosystem of high quality clients.

We haven't tried all these clients and can't vouch for them. The normal rules of open source apply.

If you are aware of other clients not listed here (or are the author of such a client), please add it here. Please also feel free to help fill out information on the features the client supports, level of activity of the project, level of documentation, etc.

Why this is a problem

It's the "redis/memcached model", apparently. Now, I'm mostly a C# developer so as far as Redis goes I'm pretty spoiled with the ServiceStack and StackExchange clients to choose from, but take a look at the Redis clients page compared to the Kafka clients page. The Redis page offers recommendations, smiley faces and stars, and there are a lot of clients there, because Redis is a mature and widely-used technology. By comparison, the Kafka page tells you whatever the client's author said, without even a cursory attempt at curation or guidance, and there's really not a lot of choice, probably because (a) Kafka is still pretty new, and (b) they appear to break the protocol/API on a regular basis.

I also don't know how Kafka's complexity regarding Producers, Consumers, Nodes, Partitions, ZooKeepers and whatever else is at play compares to the complexity of Redis or Memcached, but it's pretty complicated. I briefly skimmed the Protocol Guide and browsed the code for the official Java client, and I think I died a little bit inside. It's not something for which you're going to invest the time writing a client unless you really, really want to use it, over and above the many message bus/queue/distributed commit log servers that already exist and are generally much easier to work with.

I understand from a colleague that the Go client created by Shopify is pretty good.

The thing that pushed me over the edge and caused me to tweet that I hate Kafka was reading this post about Kafka on LinkedIn, which says that the "Kafka Ecosystem at LinkedIn" includes

  1. A REST interface: This interface enables non-java applications to easily publish and consume messages from Kafka using a thin client model.

That's right; they know trying to write clients in other languages is a task that the gods would not have wished upon Sisyphus, so they've written a server, presumably in Java or Scala, that takes care of it for you.

AND HAVE THEY OPEN SOURCED THIS MAGICAL SERVER? NO, THEY BLOODY HAVEN'T.

Final thoughts

If you are using Java/Scala/Clojure/Kotlin/whatever and can use the Official Java Client then I'm sure Kafka is a perfectly reasonable choice for a message bus, although there are plenty of others that seem to me to be far less bloody-minded.

If, like Shopify, you have the resources and inclination to write your own client in your chosen language (Go, in their case) and maintain it in the face of breaking changes to the protocol, and you think Kafka sounds wonderful, then go ahead, I guess.

If you do not fit into either of those categories, then I would strongly suggest that you investigate other alternatives, such as Redis, RabbitMQ, or perhaps a cloud-based queuing system such as Azure Storage Queues or Service Bus (which you can also run on-prem) or Amazon SQS or SNS.

If what you're actually after is a distributed commit log, then maybe the HTTP-compliant EventStore would work for you.

Like I say, if I'm way off the mark in any of what I've written here, please let me know.

koshatul commented Apr 12, 2018

@ijuma I agree it's a high-quality Go client for Kafka, but it also has a binary dependency (pkg-config/librdkafka) which makes deployments a little tricky.

Anonymous-Coward commented Aug 29, 2018

@rovarghe Efficiency is not always the whole story. In fact, in most modern systems it's almost always a secondary factor. Spending a month on getting a new feature into production is a lot more expensive than adding 10 more nodes to a rabbitmq cluster now and shipping tomorrow. Therefore, if you need to implement a complex protocol because there are no good clients for your language, a less efficient but easier to consume technology might make more sense, from a business point of view. Efficiency is an issue with long-lived systems that have to run on hardware-constrained systems, such as autonomous sensors. But Kafka (or any of the would-be competitors) is hardly a product to run on such systems. In other contexts, such as powerful servers (cloud or not doesn't make a difference) that run a system that is actively changed and extended, as long as it can scale up to what I need, the easier to use technology always wins, IME. It simply makes more money for the business by reducing time to market.

ekeyser commented Sep 19, 2021

Good read. IIRC my troubles w/ Kafka and Zookeeper were when attempting to implement in a scalable manner. This was about the time of this article so I forget the exact details but I came away with the experience that if you ever need to scale up or down the instances of Zookeeper and/or Kafka you are asking for trouble. This is coming from an infrastructure viewpoint, not the software development viewpoint of this article, but I think there's overlap as it relates to the architecture of Zookeeper/Kafka. Maybe it's changed. Maybe I wasn't doing something according to best practice or in a correct manner. I came away disgusted with the Zookeeper/Kafka stack. The whole point of some of these servers is to handle complexity. I never understood why Zookeeper had to stop and take a breather and almost fall over anytime a new node came online or dropped off. Jesus. Same w/ Kafka - either gracefully or due to node failure of some sort. Yes, I get that it's non-trivial and challenging work but then why are you writing synchronization servers and real-time stream message brokers and intertwining them if scale isn't a major selling point? Something isn't scalable if you have to constantly run 100 servers and keep them up no matter what the load is just so you can handle peak load.

Basically, don't use in a cloud env. Use Kinesis or whatever GCP/Azure offer for streaming message brokers.
