@frsyuki
Created June 11, 2012 02:36
My thoughts on MessagePack

Hi. My name is Sadayuki "Sada" Furuhashi. I am the author of the MessagePack serialization format as well as its implementation in C/C++/Ruby.

Recently, MessagePack made it to the front page of Hacker News with this blog entry by Olaf, the creator of the Facebook game ZeroPilot. In the comment thread, there were several criticisms of the blog post as well as of MessagePack itself, and I thought this was a good opportunity for me to address the questions and share my thoughts.

My high-level response to the comments

To the best of my understanding, roughly speaking, the criticisms fell into the following two categories.

  1. MessagePack may not be the best choice for client-side serialization as described by the blog author.

  2. MessagePack makes boastful claims about its performance. Several comments referred to the graph comparing MessagePack to Protocol Buffers and JSON.

I completely agree with the first point. When I conceived MessagePack in my junior year of college, I never imagined that it would be used in a browser (it was originally conceived for a distributed file system I was writing at the time). While I am pleased to see MessagePack's wider adoption, its pros and cons should be carefully considered, and there are many situations where it simply does not offer enough of an advantage over JSON. That said, it is true that MessagePack is more space-efficient than JSON (it is binary, it stores small integers more compactly, etc.), and if that is something your project requires, MessagePack is quite an attractive choice.
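To make the space-efficiency point concrete, here is a minimal sketch in Python. It hand-encodes only a small subset of the format (booleans, integers from 0 to 127, short strings, short lists and maps), following the type tags in the MessagePack format specification. The `pack` helper and the sample `record` are hypothetical and exist only for this illustration; a real project would use one of the maintained MessagePack libraries instead.

```python
import json


def pack(obj):
    """Encode a small subset of MessagePack (a sketch, not a full encoder)."""
    if obj is True:
        return b"\xc3"                      # true
    if obj is False:
        return b"\xc2"                      # false
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])                 # positive fixint: the value is the byte
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw   # fixstr: length lives in the tag
    if isinstance(obj, list) and len(obj) <= 15:
        return bytes([0x90 | len(obj)]) + b"".join(pack(x) for x in obj)  # fixarray
    if isinstance(obj, dict) and len(obj) <= 15:
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body      # fixmap
    raise TypeError(f"value not covered by this sketch: {obj!r}")


# Hypothetical sample record, chosen only to compare encoded sizes.
record = {"id": 42, "tags": ["a", "b"], "ok": True}

packed = pack(record)
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")

print(len(packed), "bytes as MessagePack")
print(len(as_json), "bytes as compact JSON")
```

For this record the sketch produces 19 bytes against 36 bytes of compact JSON. The saving comes from dropping quotes, colons, commas, and brackets, and from storing small integers and lengths inside a single tag byte. Payloads dominated by long strings would see a much smaller difference.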

With regard to the second point, it was an editorial shortcoming on my part. I still stand by the claim that MessagePack can be four times as fast as Protocol Buffers (the implementation open-sourced by Google), but I should have done a better job of explaining the context. I posted that comparison to highlight the power of the "zero-copy" approach implemented in MessagePack's C/C++ library, and I should have said so. This has since been fixed.

Of course, there are cases where other serializers (including Protocol Buffers) outperform MessagePack in terms of serialization/deserialization speed. As we move forward, I intend to create and consolidate a well-annotated, detailed set of size and speed benchmarks so that users can have a better understanding of what MessagePack can and cannot offer. (If anyone would like to help us do this, it would be greatly appreciated!)

MessagePack's known use cases

As I mentioned earlier, MessagePack is no panacea. However, I believe it is a great tool for certain tasks, and I would like to share some of MessagePack's use cases to highlight what MessagePack is good for.

  1. Space-efficient storage for Memcache entries (Pinterest).

    At Pinterest, MessagePack is used to serialize a list of 64-bit IDs as a Memcache entry. According to them, MessagePack allows them to compress their 64-bit IDs to 5 bytes on average (5/8 = 62.5% of the original size).

  2. For an RPC mechanism (DotCloud's zerorpc).

    DotCloud recently open-sourced zerorpc, an RPC mechanism built on ZeroMQ and MessagePack. This use case is fairly close to my original intent. When one is designing an RPC system, one of the first tasks is to specify and implement a communication protocol. This process can get pretty hairy, as you need to worry about a lot of low-level issues like endianness. By using MessagePack, one can skip designing and implementing the wire encoding and accelerate development.
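Both use cases above rest on the same property of the format: an unsigned integer is written in the smallest of five forms, and multi-byte values are always big-endian regardless of the host CPU. The sketch below hand-rolls that rule in Python from the MessagePack format specification. It is not Pinterest's or zerorpc's code; `pack_uint`, `unpack_uint`, and the sample `ids` are hypothetical names and values for illustration. One plausible reading of the "5 bytes on average" figure, and this is an assumption, is that most of the IDs fit in 32 bits, which costs one tag byte plus four payload bytes.

```python
import struct


def pack_uint(n):
    """Encode a non-negative integer in the smallest MessagePack unsigned form."""
    if n < 0 or n >= 1 << 64:
        raise ValueError("only unsigned 64-bit values are covered by this sketch")
    if n <= 0x7F:
        return bytes([n])                      # positive fixint, 1 byte total
    if n <= 0xFF:
        return b"\xcc" + struct.pack(">B", n)  # uint 8, 2 bytes total
    if n <= 0xFFFF:
        return b"\xcd" + struct.pack(">H", n)  # uint 16, 3 bytes total
    if n <= 0xFFFFFFFF:
        return b"\xce" + struct.pack(">I", n)  # uint 32, 5 bytes total
    return b"\xcf" + struct.pack(">Q", n)      # uint 64, 9 bytes total


def unpack_uint(data):
    """Decode one value written by pack_uint; byte order is fixed by the format."""
    tag = data[0]
    if tag <= 0x7F:
        return tag
    width = {0xCC: 1, 0xCD: 2, 0xCE: 4, 0xCF: 8}[tag]
    return int.from_bytes(data[1:1 + width], "big")


# Hypothetical IDs: declared as 64-bit in the application, but small in practice.
ids = [3_000_000_000, 123_456_789, 70_000]

sizes = [len(pack_uint(i)) for i in ids]
print(sizes, "bytes each, versus 8 bytes for a fixed-width 64-bit field")
```

Each of these sample IDs encodes to 5 bytes. A value at or above 2^32 would take 9 bytes, one more than a raw 64-bit field, so the saving depends on the distribution of the values. The list itself adds a small array header that this sketch does not count. Because the decoder reads the payload as big-endian explicitly, the sender and receiver never need to agree on, or even know, each other's native byte order.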

A few words on MessagePack's culture

To wrap things up, let me describe my view of MessagePack's culture.

  1. The MessagePack Project is highly decentralized. I originally came up with the format and implemented it for C/C++/Ruby (and I continue to maintain it for these languages), but each language has its own project leader and develops at its own pace. I find this model to be more effective than having a huge, monolithic body of committers as it keeps each project nimble and autonomous.

  2. The MessagePack Project believes in craftsmanship and expertise. While the space efficiency of MessagePack comes from the format, its speed is the result of sustained efforts by expert hackers who have relentlessly optimized the implementations for different programming languages. Over the past three years, I have seen them achieve remarkable performance gains that are only possible with an intimate knowledge of language internals. I have a lot of faith in their ability to continue making MessagePack better and faster.

While I am proud of all the things we have achieved with everyone involved in the project, MessagePack still has a long way to go. There are a couple of projects in the pipeline. For example, we are working on a website redesign as well as better documentation and repository reorganization.

Thank you for reading this far, and happy hacking!

Sada

@gulbanana

I have built an RPC system around MessagePack, and it works well for me.

@willsmith9182

I have built an RPC system around MessagePack, and it works well for me.

Me three. MsgPack + RabbitMQ

@amcgregor

amcgregor commented Dec 4, 2022

Impressive that this still lives. Thread necro is real.

Me not four. I replaced RabbitMQ, ZeroMQ, PostgreSQL, Redis, and Memcache/Membase on one project (the first one I was brought on to after moving across the country) with MongoDB. Many → one. BSON serialization is not terrible, achieving a hair shy of ten million dRPC calls per second 11 years ago. My comment from 2014 was shopping around for alternative comms/serialization.

Which was not found.

If it's being called out for being bad, it must be good! — @jobs-git

This sentiment is concerning. Rather think it's past time to unsubscribe from this.

@collimarco

Is there anything similar to https://jsonlines.org but for msgpack?

Also, is there any way to compress a msgpack file that contains thousands of rows (like a log file) with gzip or zstd? Or should I simply apply the compression/decompression and then use msgpack separately?
