@frsyuki
Last active December 10, 2015 02:58

MessagePack for Ruby optimized for Rubinius

I released MessagePack for Ruby 0.5.1. This version is optimized for Rubinius 2.0.0rc1 as well as CRuby 1.9. I also improved the ruby-serializers benchmark so that it generates simple graphical reports using the Google Chart Tools API. (See "MessagePack for Ruby version 5" for the optimizations I implemented for CRuby.)

Here is the result with Rubinius 2.0.0rc1:

Twitter

Twitter - Rubinius

Library    Serialize    Deserialize          Size
   json|  249.28 sec|     58.38 sec|   3946 bytes
   yajl|   29.42 sec|     98.49 sec|   3946 bytes
     oj|   26.47 sec|     63.08 sec|   3946 bytes
msgpack|   19.15 sec|     30.09 sec|   3390 bytes
   bson|   37.23 sec|     97.01 sec|   3682 bytes

Instruments, a tool bundled with Xcode, is very useful for performance optimization. According to its CPU profiling results, calls to the rb_hash_foreach function consumed a lot of CPU time.

So I optimized this function call, and the most time-consuming work shifted to memory allocation and release (as the following screenshot shows). Improving that will require some fundamental changes.

[Screenshot: Instruments]

By the way, Rubinius's MRI-compatible C-API implementation is awesome. I needed only a few changes to support Rubinius.

Geo

Geo - Rubinius

Library    Serialize    Deserialize          Size
   json|  647.87 sec|     66.53 sec|   2451 bytes
   yajl|   55.51 sec|    107.27 sec|   2451 bytes
     oj|   83.15 sec|     67.57 sec|   2451 bytes
msgpack|   25.40 sec|     41.65 sec|   2411 bytes
   bson|  283.76 sec|     99.90 sec|   3667 bytes

Rubinius implements some core libraries in Ruby. In contrast to CRuby, that makes it difficult to reduce function call overhead for extensions written in C (without tricky techniques?). I guess that's why performance is worse than in the benchmark with CRuby 1.9. I once prototyped a pure-Ruby implementation, but it was much slower than the C extensions (about a hundred times slower). Even though C extensions grab the GVL and degrade concurrency, I think a C extension is the better way to implement a serialization library.

Image

Image - Rubinius

 Library    Serialize    Deserialize          Size
    json|   18.75 sec|     12.26 sec|   8278 bytes
    yajl|   15.13 sec|     14.53 sec|   8278 bytes
      oj|    7.79 sec|      8.32 sec|   8278 bytes
 msgpack|    5.34 sec|      1.72 sec|   6008 bytes
    bson|   12.82 sec|      5.51 sec|   8153 bytes
protobuf|   11.94 sec|      9.27 sec|   5988 bytes

Serializing or deserializing a large binary object spends most of its time copying data, while function calls dominate with small objects. I tried to optimize the copying but gave up this time; I need to understand the architecture of Rubinius better to implement zero-copy. So the advantage of msgpack in this benchmark mostly comes from a shared memory pool.

I used the following environment for the benchmark:

  • OS: Mac OS X 10.8.2
  • CPU: Intel Core i7 2.7GHz
  • Memory: 16GB 1600MHz DDR3
  • rubinius 2.0.0rc1 (1.9.3 a06055d7 2012-11-02 JI) [x86_64-apple-darwin12.2.0]
  • gems
    • json 1.7.5
    • yajl-ruby 1.1.0
    • oj 1.4.7
    • msgpack 0.5.1
    • bson_ext 1.8.0
    • protobuf 2.5.5
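
A measurement of the kind shown in the tables above (serialize time, deserialize time, encoded size) can be sketched roughly as follows. This is not the ruby-serializers code: the `measure` helper, the sample payload, and the iteration count are assumptions made for illustration. Only `JSON.generate`/`JSON.parse` and, if the gem is installed, `MessagePack.pack`/`MessagePack.unpack` are real library calls.

```ruby
require 'benchmark'
require 'json'

# Hypothetical helper (not from ruby-serializers): times `iterations`
# encode calls and `iterations` decode calls, and records the encoded size.
def measure(name, object, iterations, encode:, decode:)
  encoded = encode.call(object)
  serialize_sec   = Benchmark.realtime { iterations.times { encode.call(object) } }
  deserialize_sec = Benchmark.realtime { iterations.times { decode.call(encoded) } }
  { name: name, serialize: serialize_sec, deserialize: deserialize_sec,
    size: encoded.bytesize }
end

# Each entry maps a library name to an [encoder, decoder] pair.
serializers = {
  "json" => [->(o) { JSON.generate(o) }, ->(s) { JSON.parse(s) }],
}

begin
  require 'msgpack'
  serializers["msgpack"] = [->(o) { MessagePack.pack(o) },
                            ->(s) { MessagePack.unpack(s) }]
rescue LoadError
  # The msgpack gem is not installed, so only json is measured.
end

# Placeholder payload; the real benchmark uses Twitter, Geo and Image data.
sample = { "id" => 1, "text" => "hello", "tags" => ["a", "b"] }

serializers.each do |name, (encode, decode)|
  r = measure(name, sample, 10_000, encode: encode, decode: decode)
  printf("%8s| %8.2f sec| %8.2f sec| %6d bytes\n",
         r[:name], r[:serialize], r[:deserialize], r[:size])
end
```

Passing the encoder and decoder as lambdas keeps the timing loop identical for every library, so the differences in the output come from the serializers rather than from the harness.
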
@dbussink

Awesome! If you have anything that we should improve in Rubinius, let us know! I'm sure there are plenty of places in the C-API that could be improved performance-wise. Also, C extensions don't have a global lock anymore in Rubinius, so they should run concurrently.

Regarding a pure Ruby implementation, this is something we would love to investigate performance-wise. Our goal is for Ruby to be a viable option in these situations too. It probably won't happen overnight, but having benchmarks to work with would be great.

Zero copy is trickier. We do have RSTRING_NOT_MODIFIED, which can be defined if a C extension doesn't modify the data that RSTRING_PTR() points at. This can result in nice performance improvements, but it comes with the restriction that you can't modify that string data directly.

Don't hesitate to contact us to see how we can improve things :). Usually a good place is to ask around on IRC at irc://freenode.net/rubinius.

@frsyuki

frsyuki commented Dec 25, 2012

Thank you!

I didn't know that even C extensions don't have a global lock in Rubinius. That's really awesome. But I guess C extensions then have to take care of thread safety. I'll consider putting the memory pools in thread-local storage.
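
The thread-local pool idea can be illustrated at the Ruby level with a minimal sketch. It is only an analogy: msgpack-ruby's actual pools live in the C extension, and the `BufferPool` class and its key name are hypothetical. The sketch uses `Thread#thread_variable_get` and `Thread#thread_variable_set` (Ruby 2.0 and later), which are truly per-thread, unlike `Thread#[]`, which is fiber-local.

```ruby
# Hypothetical per-thread buffer pool. Each thread gets its own pool,
# so acquire/release never need a lock.
class BufferPool
  KEY = :sketch_buffer_pool # made-up key name for this illustration

  # Returns the pool that belongs to the calling thread,
  # creating it on first use.
  def self.current
    pool = Thread.current.thread_variable_get(KEY)
    unless pool
      pool = new
      Thread.current.thread_variable_set(KEY, pool)
    end
    pool
  end

  def initialize
    @free = []
  end

  # Hands out a recycled buffer if one is available, else a new binary String.
  def acquire
    @free.pop || String.new
  end

  # Empties the buffer and keeps it for reuse by the same thread.
  def release(buffer)
    buffer.clear
    @free.push(buffer)
  end
end

buf = BufferPool.current.acquire
buf << "payload"
BufferPool.current.release(buf)
```

The trade-off is memory: every thread that serializes keeps its own free list, in exchange for avoiding synchronization on a shared pool.
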

RSTRING_NOT_MODIFIED should work fine because msgpack-ruby doesn't modify data that comes from String objects. Instead, it allocates and manages its own buffers and copies them to create Strings. So it will improve performance significantly. Great.
Regarding zero copy, msgpack-ruby just uses the rb_str_substr function to slice a string, because CRuby (and JRuby) optimize that function so that it does not copy the string. I think the VM core can omit the copy if RSTRING_NOT_MODIFIED is set, because it's guaranteed that the strings are not modified. I'll definitely try the flag.
