
@antirez
Last active February 2, 2018 14:15
Why do streams have elements that are actually like hashes?

Why stream items are small hashes, instead of single strings like the elements of many other Redis types, is a good question indeed. In the end it's just a design decision, so I don't have a definitive answer. However, I can try to explain the design process that led to it.

What I wanted "Streams" to be was actually just an abstract log. I could not call the data structure "log" because the word is confusing in many contexts, but that was the idea, and "log" better describes what Redis Streams are. Perhaps it is the consumer groups part of Redis Streams that better characterizes the streaming aspect, but the data structure itself is a log.

Now, what constitutes a log? In its original form, it is just lines of text ending with "\n", one after the other, added in an append-only fashion. More generally, it is some data in append-only mode.

XADD captures this append-only mode of operation. We have more powerful deletion mechanisms, and will add more, but append-only is the general idea. The command could have been just XADD <element>, but I wanted to turn the log into an abstract log. What does it mean to be abstract?
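To make the append-only model concrete, here is a minimal Python sketch of an abstract log whose entries are ordered field-value pairs. The real command is `XADD key ID field value [field value ...]`, where `*` as the ID asks Redis to generate one of the form `<milliseconds>-<sequence>`. The `ToyStream` class and its `xadd`/`range` methods are hypothetical names for an in-memory model, not Redis code, and the ID rule below (bump the sequence when the clock does not advance) is modeled on Redis's behavior rather than taken from it.

```python
import time
from typing import List, Optional, Tuple

# One entry: an ordered list of (field, value) pairs, like a Redis stream item.
Entry = List[Tuple[str, str]]


class ToyStream:
    """Hypothetical in-memory model of an append-only, XADD-style log."""

    def __init__(self) -> None:
        self.entries: List[Tuple[Tuple[int, int], Entry]] = []
        self.last_id: Tuple[int, int] = (0, 0)

    def xadd(self, fields: Entry, now_ms: Optional[int] = None) -> str:
        """Append an entry and return its ID as "<ms>-<seq>"."""
        if not fields:
            raise ValueError("an entry needs at least one field-value pair")
        ms = int(time.time() * 1000) if now_ms is None else now_ms
        last_ms, last_seq = self.last_id
        if ms > last_ms:
            new_id = (ms, 0)
        else:
            # Same millisecond, or the clock went backwards: keep IDs growing
            # by reusing the last timestamp and bumping the sequence number.
            new_id = (last_ms, last_seq + 1)
        self.last_id = new_id
        self.entries.append((new_id, list(fields)))  # append only, never insert
        return f"{new_id[0]}-{new_id[1]}"

    def range(self) -> List[Tuple[str, Entry]]:
        """Return every entry in insertion (and therefore ID) order."""
        return [(f"{ms}-{seq}", fields) for (ms, seq), fields in self.entries]


s = ToyStream()
print(s.xadd([("sensor", "t1"), ("temp", "21.5")], now_ms=1000))  # 1000-0
print(s.xadd([("sensor", "t1"), ("temp", "21.7")], now_ms=1000))  # 1000-1
print(s.xadd([("", "a plain log line")], now_ms=1005))           # 1005-0
```

The third call shows that a plain, unstructured log line still fits the model: it is just an entry with a single empty field name.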

Abstraction in this context means capturing what people usually try to accomplish with logs, and transforming it into a conceptually equivalent thing, cleaned of all the limits that are just a side effect of the current implementation. The reality is that people almost always have fields in their log lines, or binary log files, or whatever logging they are doing: be it binary fields, CSV, or JSON items, it's a problem that comes up again and again, and one always has to find a suitable encoding. So in the process of abstracting the log, I thought it was a good idea to say that log lines are actually an aggregate data type themselves, specifically an ordered set of field-value pairs.

As a side effect, this allows Redis to compress streams much better, because there is more structure to exploit: consecutive items with the same fields can be trivially compressed, removing the fields entirely and using a single bit to say "same fields as before".

Finally, time series, one of the most important use cases for Redis Streams, are very well expressed using this API. It helps that, because of the encoding used in streams, if you don't need fields you can just have a single field set to the empty string and put your string in its value, with an overhead of just one byte compared to not having support for fields at all.
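The "same fields as before" idea can be illustrated with a minimal Python sketch. This is a simplified, hypothetical scheme, not Redis's actual stream encoding (which stores entries in listpacks inside a radix tree and differs in detail): each entry is compared with the previous one, and when the field names match, only the values are kept plus a flag. The functions `encode`, `decode` and `stored_strings` are illustrative names, not part of any Redis API.

```python
from typing import List, Optional, Tuple

Entry = List[Tuple[str, str]]


def encode(entries: List[Entry]) -> List[tuple]:
    """Store field names only when they differ from the previous entry's."""
    out: List[tuple] = []
    prev_fields: Optional[Tuple[str, ...]] = None
    for entry in entries:
        fields = tuple(f for f, _ in entry)
        values = tuple(v for _, v in entry)
        if fields == prev_fields:
            out.append((True, values))  # flag set: "same fields as before"
        else:
            out.append((False, fields, values))  # field names stored explicitly
            prev_fields = fields
    return out


def decode(encoded: List[tuple]) -> List[Entry]:
    """Rebuild the original entries by remembering the last explicit fields."""
    entries: List[Entry] = []
    prev_fields: Tuple[str, ...] = ()
    for rec in encoded:
        if rec[0]:
            values = rec[1]
        else:
            _, prev_fields, values = rec
        entries.append(list(zip(prev_fields, values)))
    return entries


def stored_strings(encoded: List[tuple]) -> int:
    """Count the field and value strings kept (flags are not counted)."""
    return sum(len(r[1]) if r[0] else len(r[1]) + len(r[2]) for r in encoded)


series = [
    [("temp", "21.5"), ("hum", "40")],
    [("temp", "21.7"), ("hum", "41")],
    [("temp", "21.9"), ("hum", "41")],
]
packed = encode(series)
assert decode(packed) == series
print(stored_strings(packed))  # 8, versus 12 strings stored verbatim
```

A run of single-empty-field entries such as `[("", line)]` compresses the same way: the empty field name is kept once, and each following entry stores only its value.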

I hope this helps! Thanks for the question, Salvatore
