Update the google/protobuf 3.0-alpha library in 2 concrete areas, and 1 opinionated way:
- remove inconsistencies between the c- and java-backed ruby libraries. Most inconsistencies are bugs in one implementation and not the other, but there was at least one helper method which existed in one and not the other.
- Improve the test suite. Since the bulk of the logic exists in lower-level C and java libraries, items which usually aren't tested because they come from the standard lib, like
#push
, should be tested as this library uses plenty of low-level handcrafted code - Opinionated changes -- make it feel more ruby-like. This means being able to accept string or symbols in initializers, returning
nil
instead of raising an exception when an array index out-of-bounds occurs, have common names and aliases for methods (e.g.#size
and#length
), and havingRepeatedFields
quack like an array, andMap
quack like a hash. There may be other examples.
First off, I'm very excited to see this library take shape. Having a canonical protobuf library is a good thing. We use python, go, ruby (MRI & jruby), and java in our stack, and having a common library that has similar behavior and functionality across languages is toast-worthy.
-
Performance optimizations: Ruby, historically, does not put performance as a first-level concern and rather puts its emphasis elsewhere. Many of us who use ruby on a regular basis are fine with this trade off. The google/protobuf library has put an unofficial stake in the ground on performance. The challenge will be keeping the app maintainable and 'ruby-like' while keeping performance as a top-level concern.
-
There are three levels of code: C, Java, and Ruby. In an ideal world, the main library will be ruby with optimized and optional sub-components in Java and C. However, moving even basic functionality to ruby and out of native, like
#to_h
which iterates through the fields and adds them to a hash, can be upwards of 9x slower gist.
- make the library exhibit the same behavior under both jRuby and C
- hook up the ruby library into travis and test Ruby 2.0.0, 2.1.0, 2.2.0, jruby-1.7.19, and jruby-9.0.0.0.pre2
- significantly beef up the test suite, and move it from being a happy-path focused suite to the conanical definition for the ruby library. And because much of the functionality is outside the ruby core and stdlib, unit testing of array and map-like functionality should be deeply tested
- evaluate three approaches to make the library behave more like standard Array and Map objects:
- move common function functionality to ruby.
- write helper methods that are 100% API-compatible with Array and Maps
- in the Java/C libraries, move to using Hash and Array objects that can then be exposed into the Ruby lib and then delegated against.
Summary I think the choice for making it more 'ruby-like' will either be:
- Do it up to, but not including ducktyping
RepeatedFields
andMap
, or - add the functionality in a ruby module that is shared across both C and Java.
I am starting to understand why ducktyping RepeatedFields
and Map
is a challenging target, as it may add a lot of work. But I also feel that not having it behave similarly to existing protobuf libs will be a significant pain point. I'm guessing, so having a much higher degree of confidence before taking on the burden of being 100% Array and Map api compatible, would be great.
- I'm fixing inconsistencies as I find them, though without a comprehensive test suite I'm sure I'm missing some
- I experimented with a few different approaches for duck-typing:
- moving common functionality to ruby: I got this to work quite well, but the performance was noticeably bad compared to the performance as it exists today. We're talking 10x faster.
- I explored writing a ruby-only version, which would replace the jruby implementation and make the C-exts as optional. wow, was this painful, and it quickly becomes unattractive, if for no other reason than a full protobuf encoder/decoder would need to be maintained in parallel with the primary c-lib. That seems untenable to me.
- I modified RepeatedFields.c to populate a rb_ary object in parallel. I looked at replacing the internal data structure with a rb_ary, but that was a non-starter as it uses upd, which is appropriate. Also, there are performance enhancements, such as memcpy that are not available if rb_ary becomes the primary data model. Having rb_ary as a secondary data model that mirrors the internal structures works, and works well, BUT I strongly suspect it is quite efficient. I stopped before doing detail benchmarking or analysis of memory usage, so I don't know if there will be a lot of mem copying, pointers, references, etc... And it still runs into the problem of #! methods... keeping a secondary
rb_ary
will make any non-modifying array behavior much faster than to keep calling#to_ary
, but it won't help any modification methods. - I started to add methods that would allow 100% api-compatibility with Array and Map. This is a combination of C/Java functionality and helper methods in ruby. The risks are keeping this inline with different versions of ruby, and the #! methods. The #! can be implemented in ruby, but they will really not be efficient. I don't know what 'good enough' is. And, test cases will be needed to double check all this functionality; functionality which is not a part of the core protobuf lib
Thanks for this summary and braindump of what you're thinking and hoping to do! I like the general direction, and overall we've been really happy to have community feedback like this and get improvements.
A few more specific thoughts:
inspect
and the like is nice.nil
rather than exception on out-of-bounds: I am probably much less deep into the ways of Ruby than you are, so I may be missing some subtlety that makes this a hard choice, but honestly it seems perfectly reasonable to me to go withnil
. Throwing an exception on out-of-bounds was the default "safe" behavior we came to. It's technically a breaking API change, but it's probably unlikely that someone is relying on that exception yet, as we're still alpha.int64_t
/int32_t
).