@frsyuki
Created June 11, 2012 02:36
My thoughts on MessagePack

Hi. My name is Sadayuki "Sada" Furuhashi. I am the author of the MessagePack serialization format as well as its implementation in C/C++/Ruby.

Recently, MessagePack made it to the front page of Hacker News with this blog entry by Olaf, the creator of the Facebook game ZeroPilot. In the comment thread, there were several criticisms for the blog post as well as MessagePack itself, and I thought this was a good opportunity for me to address the questions and share my thoughts.

My high-level response to the comments

To the best of my understanding, roughly speaking, the criticisms fell into the following two categories.

  1. MessagePack may not be the best choice for client-side serialization as described by the blog author.

  2. MessagePack makes boastful claims about its performance. Several comments referred to the graph comparing MessagePack to Protocol Buffers and JSON.

I completely agree with the first point. When I conceived MessagePack in my junior year of college, I never imagined that it would be used in a browser (it was originally conceived for a distributed file system I was writing at the time). While I am pleased to see MessagePack's wider adoption, its pros and cons should be carefully considered, and there are many situations where it simply does not offer enough of an advantage over JSON. That said, it is true that MessagePack is more space-efficient than JSON (being binary, more efficient storage of small integers, etc.), and if that's something that your project requires, MessagePack is quite an attractive choice.
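
To make the "more efficient storage of small integers" point concrete, here is a minimal sketch that compares the byte count of a few integers in JSON and in MessagePack. It hand-rolls only the unsigned integer formats from the MessagePack specification using Python's standard library; `pack_uint` and `size_table` are names made up for this illustration and are not part of any MessagePack library.

```python
import json
import struct


def pack_uint(n: int) -> bytes:
    """Encode a non-negative integer with MessagePack's unsigned integer formats."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if n <= 0x7F:
        return struct.pack("B", n)            # positive fixint: the value is the byte
    if n <= 0xFF:
        return struct.pack(">BB", 0xCC, n)    # uint 8
    if n <= 0xFFFF:
        return struct.pack(">BH", 0xCD, n)    # uint 16
    if n <= 0xFFFFFFFF:
        return struct.pack(">BI", 0xCE, n)    # uint 32
    if n <= 0xFFFFFFFFFFFFFFFF:
        return struct.pack(">BQ", 0xCF, n)    # uint 64
    raise ValueError("integer too large for uint 64")


def size_table(values):
    """Return (value, JSON size in bytes, MessagePack size in bytes) per value."""
    return [
        (v, len(json.dumps(v).encode("utf-8")), len(pack_uint(v)))
        for v in values
    ]


if __name__ == "__main__":
    for value, json_size, msgpack_size in size_table([5, 100, 1000, 1_000_000]):
        print(f"{value:>9}: JSON {json_size} bytes, MessagePack {msgpack_size} bytes")
```

The gain depends on the value: a single-digit integer takes one byte in both formats, while any value up to 127 still fits in one MessagePack byte. Inside an array, JSON additionally spends a byte on each comma separator.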

With regard to the second point, it was an editorial shortcoming on my part. I still stand by the claim that MessagePack can be four times as fast as Protocol Buffers (the implementation open-sourced by Google), but I should have done a better job of explaining the context. I posted that comparison to highlight the power of the "zero-copy" technique implemented in MessagePack's C/C++ library, and I should have said so. This has since been fixed.
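
As a rough illustration of what "zero-copy" means in general, the sketch below decodes a MessagePack binary value by returning a `memoryview` slice of the input buffer instead of copying the payload into a new object. This is not the C/C++ library's code, only the general idea expressed with Python's standard library; `pack_bin` and `unpack_bin_view` are hypothetical helper names.

```python
import struct


def pack_bin(payload: bytes) -> bytes:
    """Encode raw bytes with MessagePack's bin 8 / bin 16 / bin 32 formats."""
    n = len(payload)
    if n <= 0xFF:
        return struct.pack(">BB", 0xC4, n) + payload   # bin 8
    if n <= 0xFFFF:
        return struct.pack(">BH", 0xC5, n) + payload   # bin 16
    if n <= 0xFFFFFFFF:
        return struct.pack(">BI", 0xC6, n) + payload   # bin 32
    raise ValueError("payload too large for bin 32")


def unpack_bin_view(buffer) -> memoryview:
    """Return a view of the payload that shares memory with `buffer`."""
    view = memoryview(buffer)
    tag = view[0]
    if tag == 0xC4:
        (n,) = struct.unpack_from(">B", view, 1)
        start = 2
    elif tag == 0xC5:
        (n,) = struct.unpack_from(">H", view, 1)
        start = 3
    elif tag == 0xC6:
        (n,) = struct.unpack_from(">I", view, 1)
        start = 5
    else:
        raise ValueError("not a MessagePack bin value")
    if start + n > len(view):
        raise ValueError("truncated input")
    return view[start:start + n]   # slicing a memoryview does not copy


if __name__ == "__main__":
    buf = bytearray(pack_bin(b"hello"))
    payload = unpack_bin_view(buf)
    buf[2] = ord("J")              # change the first payload byte in the buffer
    print(bytes(payload))          # the view reflects the change
```

Because the returned view aliases the receive buffer, the buffer must stay alive and unmodified for as long as the decoded value is in use. That is the usual trade-off of this kind of design.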

Of course, there are cases where other serializers (including Protocol Buffers) outperform MessagePack in terms of serialization/deserialization speed. As we move forward, I intend to create and consolidate a well-annotated, detailed set of benchmarks covering both size and speed so that users can have a better understanding of what MessagePack can and cannot offer. (If anyone would like to help us do this, it would be greatly appreciated!)

MessagePack's known use cases

As I mentioned earlier, MessagePack is no panacea. However, I believe it is a great tool for certain tasks, and I would like to share some of MessagePack's use cases to highlight what MessagePack is good for.

  1. Space-efficient storage for Memcache entries (Pinterest).

    At Pinterest, MessagePack is used to serialize a list of 64-bit IDs as a Memcache entry. According to them, MessagePack allows them to compress their 64-bit IDs to 5 bytes on average (5/8 = 62.5% of the original size).

  2. For an RPC mechanism (DotCloud's zerorpc).

    DotCloud recently open-sourced zerorpc, an RPC mechanism built on ZeroMQ and MessagePack. This use case is fairly close to my original intent. When one is designing an RPC system, one of the first tasks is to specify and implement a communication protocol. This process can get pretty hairy, as you need to worry about a lot of low-level issues like endianness. By using MessagePack, one can skip designing and implementing a communication protocol entirely and accelerate development.
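
Both use cases rest on the same property of the format: integers are written in the smallest fitting width, always in big-endian byte order. The following is a minimal sketch using only Python's standard library; the IDs are invented, and `pack_uint`, `pack_array_header`, and `pack_id_list` are hypothetical names for this illustration. An average close to 5 bytes per ID is what you would see if most IDs fit in 32 bits, although the post does not say how Pinterest's IDs are actually distributed.

```python
import struct


def pack_uint(n: int) -> bytes:
    """Encode a non-negative integer in the smallest MessagePack uint format."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if n <= 0x7F:
        return struct.pack("B", n)            # positive fixint
    if n <= 0xFF:
        return struct.pack(">BB", 0xCC, n)    # uint 8
    if n <= 0xFFFF:
        return struct.pack(">BH", 0xCD, n)    # uint 16
    if n <= 0xFFFFFFFF:
        return struct.pack(">BI", 0xCE, n)    # uint 32
    if n <= 0xFFFFFFFFFFFFFFFF:
        return struct.pack(">BQ", 0xCF, n)    # uint 64
    raise ValueError("integer too large for uint 64")


def pack_array_header(count: int) -> bytes:
    """Encode the header that precedes the elements of a MessagePack array."""
    if count <= 15:
        return struct.pack("B", 0x90 | count)     # fixarray
    if count <= 0xFFFF:
        return struct.pack(">BH", 0xDC, count)    # array 16
    if count <= 0xFFFFFFFF:
        return struct.pack(">BI", 0xDD, count)    # array 32
    raise ValueError("too many elements for array 32")


def pack_id_list(ids) -> bytes:
    """Serialize a list of unsigned integer IDs as one MessagePack array."""
    ids = list(ids)
    return pack_array_header(len(ids)) + b"".join(pack_uint(i) for i in ids)


if __name__ == "__main__":
    ids = [3_000_000_000, 70_000, 4_000_000_000]   # made-up IDs below 2**32
    packed = pack_id_list(ids)
    print(len(packed), "bytes packed vs", 8 * len(ids), "bytes as raw 64-bit values")
    # The ">" in the struct formats fixes the byte order, so the output is
    # identical on little-endian and big-endian hosts.
    print(pack_uint(0x01020304).hex())
```

Since the byte order is part of the format, an RPC layer built on top of it does not need its own rules for integer layout.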

A few words on MessagePack's culture

To wrap things up, let me describe my view of MessagePack's culture.

  1. The MessagePack Project is highly decentralized. I originally came up with the format and implemented it for C/C++/Ruby (and I continue to maintain it for these languages), but each language has its own project leader and develops at its own pace. I find this model to be more effective than having a huge, monolithic body of committers as it keeps each project nimble and autonomous.

  2. The MessagePack Project believes in craftsmanship and expertise. While the space efficiency of MessagePack comes from the format, its speed is the result of sustained efforts by expert hackers who relentlessly optimized the implementations for different programming languages. Over the past three years, I have seen them achieve remarkable performance gains that are only possible with an intimate knowledge of language internals. I have a lot of faith in their ability to continue making MessagePack better and faster.

While I am proud of all the things we have achieved with everyone involved in the project, MessagePack still has a long way to go. There are a couple of projects in the pipeline. For example, we are working on a website redesign as well as better documentation and repository reorganization.

Thank you for reading this far, and happy hacking!

Sada

@tantalor

The numbered list order breaks when you put a paragraph between the list items,

  1. Like this

Like this

  1. Like this

@maxpert

maxpert commented Jun 11, 2012

I am quite happy to see msgpack finally getting the limelight. This not only means a better community coming up but also a good competition for serialization protocols.

@jeremyong

To fix the broken numbered list order, indent the in-between paragraph with four spaces.

  1. like this

    like this

  2. like this

@frsyuki
Author

frsyuki commented Jun 11, 2012

I fixed the numbered list. Thank you!

@kiyoto

kiyoto commented Jun 11, 2012

jeremyong: Do you mean (just gzipping JSON) vs. (just msgpacking)? I have tried this with production data (I used to work at a transactional advertisement startup, and we had a lot of metadata persisted as blobs in MySQL and memcached) with msgpack-php, and msgpack was consistently faster (at least 1.5 to 2 times) for data of every size I tried. Size-wise, (just gzipping JSON) was smaller though.

@mlawren

mlawren commented Jun 11, 2012

Glad to see some repository reorganisation in the works. The "decentralized" part of the MessagePack project (or simply having all languages in the same repository) does present a bit of a confusing view. I notice that several issues have been fixed in the code but the tickets themselves have been left to languish. It is not always clear who is responsible for which part. I might jump on the mailing list and see what I can do to help.

@frsyuki
Author

frsyuki commented Jun 12, 2012

mlawren: I understand the problem with how the project is run. Because all implementation projects share some resources (including the git repository), responsibility tends to be unclear.
So we have been working on making each project more independent. I'm collecting comments on my plan:
"Dear MessagePack developers, Updates on the MessagePack Project": https://gist.github.com/2892856

@ant6n

ant6n commented Feb 22, 2014

Is there any test suite, or maybe just a set of commonly used files, that can be used to test different serialization formats? I've tried to find large examples to compare the performance/size of serialization formats, but have only come up with micro-examples.

A test suite could really help put these claims in perspective, and make it easier for potential users to choose a format based on the kind of data they intend to transmit.

@amcgregor

A little late to the party, but I'm shopping around for a serialization format for extremely low-level RPC. A useful benchmark for someone in my market is memory efficiency of the packing/unpacking steps. (Stack/heap peak usage, etc.)

@loretoparisi

Two things.

  • Just linking the interesting Pinterest post "Memcache Games":

https://github.com/pinterest/pinterest.github.com/blob/master/_posts/2012-01-20-memcache-games.md

  • Are there any benchmarks of MessagePack implementations out there? For example, for an Objective-C client, MPMessagePack versus NSJSONSerialization / JSONKit?

In most cases we are not interested in clients using JSON.stringify / JSON.parse, but in native JSON encoders for mobile.

As far as I know, interesting benchmarks can be found here:
https://github.com/johnezang/JSONKit
http://theburningmonk.com/benchmarks/

Of course, this test might not make sense, since we would be comparing binary serializers against JSON serializers.

@nipun2

nipun2 commented Jul 3, 2015

How can we deserialize (unpack) a MessagePack byte array that was generated (packed) in C# into a JavaScript/JSON object?

@markand

markand commented Dec 4, 2018

Link to graph is broken.

@jobs-git

jobs-git commented Feb 4, 2019

If there is criticism, that means the code is gaining the interest of more people.

@earonesty

earonesty commented Oct 10, 2019

Just an FYI: msgpack supports "bytes"... which is a tragically missing feature in JSON requiring awkward workarounds in many languages. Half the time I use msgpack in python/js, it's because of this 1 feature.... nothing else.
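
A small sketch of the difference, using only Python's standard library: `json.dumps` rejects `bytes` outright, so a common workaround is to base64-encode the payload into a string, whereas MessagePack's bin formats store the raw bytes behind a short header. `pack_bin` and `json_with_base64` are hypothetical helper names, and the bin encoding is hand-rolled here from the MessagePack specification rather than taken from a library.

```python
import base64
import json
import struct


def pack_bin(payload: bytes) -> bytes:
    """Encode raw bytes with MessagePack's bin 8 / bin 16 / bin 32 formats."""
    n = len(payload)
    if n <= 0xFF:
        return struct.pack(">BB", 0xC4, n) + payload   # bin 8
    if n <= 0xFFFF:
        return struct.pack(">BH", 0xC5, n) + payload   # bin 16
    if n <= 0xFFFFFFFF:
        return struct.pack(">BI", 0xC6, n) + payload   # bin 32
    raise ValueError("payload too large for bin 32")


def json_with_base64(payload: bytes) -> bytes:
    """One common JSON workaround: carry the bytes as a base64 string."""
    text = base64.b64encode(payload).decode("ascii")
    return json.dumps(text).encode("utf-8")


if __name__ == "__main__":
    payload = bytes(range(30))
    try:
        json.dumps(payload)
    except TypeError as exc:
        print("JSON cannot encode bytes directly:", exc)
    print("JSON + base64:  ", len(json_with_base64(payload)), "bytes")
    print("MessagePack bin:", len(pack_bin(payload)), "bytes")
```

Besides the size overhead of roughly one third, the base64 route means the receiver has to know out of band which strings are really binary data.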

@gulbanana

I have built an RPC system around MessagePack, and it works well for me.

@willsmith9182

I have built an RPC system around MessagePack, and it works well for me.

Me three. MsgPack + RabbitMQ

@amcgregor

amcgregor commented Dec 4, 2022

Impressive that this still lives. Thread necro is real.

Me not four. I replaced RabbitMQ, ZeroMQ, PostgreSQL, Redis, and Memcache/Membase on one project (the first one I was brought on to after moving across the country) with MongoDB. Many → one. BSON serialization is not terrible, achieving a hair shy of ten million dRPC calls per second 11 years ago. My comment from 2014 was shopping around for alternative comms/serialization.

Which was not found.

If it's being called out for being bad, it must be good! — @jobs-git

This sentiment is concerning. Rather think it's past time to unsubscribe from this.

@collimarco

Is there anything similar to https://jsonlines.org but for msgpack?

Also, is there any way to compress a msgpack file that contains thousands of rows (like a log file) with gzip or zstd? Or should I simply apply the compression/decompression and then use msgpack separately?
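
One possible approach, sketched under assumptions: every MessagePack value carries its own type and length in its header, so values can be written back to back with no separator (unlike JSON Lines, no newline is needed), and the resulting byte stream can be compressed as a whole. The sketch below handles only string rows and uses Python's standard library; `pack_str`, `iter_strs`, `write_log`, and `read_log` are hypothetical names, and a real log would more likely use one map or array per row.

```python
import gzip
import struct


def pack_str(text: str) -> bytes:
    """Encode a string with MessagePack's fixstr / str 8 / str 16 formats."""
    data = text.encode("utf-8")
    n = len(data)
    if n <= 31:
        return struct.pack("B", 0xA0 | n) + data      # fixstr
    if n <= 0xFF:
        return struct.pack(">BB", 0xD9, n) + data     # str 8
    if n <= 0xFFFF:
        return struct.pack(">BH", 0xDA, n) + data     # str 16
    raise ValueError("string too long for this sketch")


def iter_strs(buffer: bytes):
    """Yield each string from a buffer of concatenated MessagePack strings."""
    pos = 0
    while pos < len(buffer):
        tag = buffer[pos]
        if 0xA0 <= tag <= 0xBF:
            n, start = tag & 0x1F, pos + 1
        elif tag == 0xD9:
            n, start = buffer[pos + 1], pos + 2
        elif tag == 0xDA:
            (n,) = struct.unpack_from(">H", buffer, pos + 1)
            start = pos + 3
        else:
            raise ValueError(f"unsupported type byte 0x{tag:02x}")
        yield buffer[start:start + n].decode("utf-8")
        pos = start + n   # the next value begins right after this one


def write_log(rows) -> bytes:
    """Concatenate the packed rows, then gzip the whole stream."""
    return gzip.compress(b"".join(pack_str(row) for row in rows))


def read_log(blob: bytes):
    """Decompress the stream and decode the rows back in order."""
    return list(iter_strs(gzip.decompress(blob)))


if __name__ == "__main__":
    rows = [f"GET /item/{i} 200" for i in range(1000)]
    blob = write_log(rows)
    assert read_log(blob) == rows
    print(len(blob), "bytes compressed for", len(rows), "rows")
```

Compressing the concatenated stream rather than each row separately lets the compressor exploit repetition across rows, which is where most of the savings in a log file tend to come from.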
