The resources doc lists plenty of good material but offers no guidance on where to start. This reading list is meant to be a more guided, opinionated path for learning about stream data processing.
Some works are accompanied by alternative options and/or responses, but those are completely optional. If possible, try the main works first.
The Foundational Monograph
The Log: What every software engineer should know about real-time data's unifying abstraction by Jay Kreps (December 2013) kicked it all off for me. A seminal work.
The (Conference Talks made into Articles made into a) Book that Fills in all the Gaps
Making Sense of Stream Processing by Martin Kleppmann (March 2016) is a free ebook that compiles many of Kleppmann’s brilliant articles (based on his brilliant talks) on this topic.
This is a fantastic book that covers everything from theory to practice, history to the future. It’s all broken down into small incremental ideas and clearly explained.
If you’d prefer to start with videos of Kleppmann’s talks, I recommend starting with these:
- Turning the database inside out with Apache Samza describes how we might reimagine what a database is and reshape the entire Web application stack with event streams at every level. I heard this in person and it blew my mind.
- Staying agile in the face of data deluge illustrates how “using the right tool for the right job” can lead to incredibly complex and fragile application architectures, and how streaming data can simplify them.
- Systems that enable data agility
- Samza and the Unix philosophy of distributed systems
- Data liberation and data integration with Kafka
For more, see Kleppmann’s playlist of all his talks on YouTube.
The Treatise (on Why and How Stream Data Processing Might be the Future of Application Development)
Introducing Kafka Streams: Stream Processing Made Simple by Jay Kreps (March 2016) explains the why of the new Kafka Streams framework and, in doing so, dives deep into what all this stuff really is, why it matters, and what it means for application development. Brilliant.
A Broader, Cogent, and Less Kafka-Centric Perspective
The world beyond batch: Streaming 101 by Tyler Akidau (August 2015) is a super-helpful alternative perspective that came out of Google rather than LinkedIn. Akidau has worked for years on data processing systems at Google, including MillWheel, Cloud Dataflow, and Apache Beam. I haven’t yet read the follow-up, Streaming 102, but suspect it will be similarly illuminating.