skippy/notes.md

## notes.md

      
    Raw
  

              notes.md
            
          
    tl;dr

Update the google/protobuf 3.0-alpha library in 2 concrete areas, and 1 opinionated way:

remove inconsistencies between the c- and java-backed ruby libraries.  Most inconsistencies are bugs in one implementation and not the other, but there was at least one helper method which existed in one and not the other.
Improve the test suite.  Since the bulk of the logic exists in lower-level C and java libraries, items which usually aren't tested because they come from the standard lib, like #push, should be tested as this library uses plenty of low-level handcrafted code
Opinionated changes -- make it feel more ruby-like. This means being able to accept string or symbols in initializers, returning nil instead of raising an exception when an array index out-of-bounds occurs, have common names and aliases for methods (e.g. #size and #length), and having RepeatedFields quack like an array, and Map quack like a hash.  There may be other examples.

Disclaimer

First off, I'm very excited to see this library take shape.  Having a canonical protobuf library is a good thing.  We use python, go, ruby (MRI & jruby), and java in our stack, and having a common library that has similar behavior and functionality across languages is toast-worthy.
Challenge:


Performance optimizations: Ruby, historically, does not put performance as a first-level concern and rather puts its emphasis elsewhere.  Many of us who use ruby on a regular basis are fine with this trade off.  The google/protobuf library has put an unofficial stake in the ground on performance.  The challenge will be keeping the app maintainable and 'ruby-like' while keeping performance as a top-level concern.


There are three levels of code: C, Java, and Ruby.  In an ideal world, the main library will be ruby with optimized and optional sub-components in Java and C.  However, moving even basic functionality to ruby and out of native, like #to_h which iterates through the fields and adds them to a hash, can be upwards of 9x slower gist.


Proposal


make the library exhibit the same behavior under both jRuby and C
hook up the ruby library into travis and test Ruby 2.0.0, 2.1.0, 2.2.0, jruby-1.7.19, and jruby-9.0.0.0.pre2
significantly beef up the test suite, and move it from being a happy-path focused suite to the conanical definition for the ruby library.  And because much of the functionality is outside the ruby core and stdlib, unit testing of array and map-like functionality should be deeply tested
evaluate three approaches to make the library behave more like standard Array and Map objects:

move common function functionality to ruby.
write helper methods that are 100% API-compatible with Array and Maps
in the Java/C libraries, move to using Hash and Array objects that can then be exposed into the Ruby lib and then delegated against.


Summary I think the choice for making it more 'ruby-like' will either be:

Do it up to, but not including ducktyping RepeatedFields and Map, or
add the functionality in a ruby module that is shared across both C and Java.

I am starting to understand why ducktyping RepeatedFields and Map is a challenging target, as it may add a lot of work.  But I also feel that not having it behave similarly to existing protobuf libs will be a significant pain point.  I'm guessing, so having a much higher degree of confidence before taking on the burden of being 100% Array and Map api compatible, would be great.
Work done to date


I'm fixing inconsistencies as I find them, though without a comprehensive test suite I'm sure I'm missing some
I experimented with a few different approaches for duck-typing:

moving common functionality to ruby: I got this to work quite well, but the performance was noticeably bad compared to the performance as it exists today.  We're talking 10x faster.
I explored writing a ruby-only version, which would replace the jruby implementation and make the C-exts as optional.  wow, was this painful, and it quickly becomes unattractive, if for no other reason than a full protobuf encoder/decoder would need to be maintained in parallel with the primary c-lib.  That seems untenable to me.
I modified RepeatedFields.c to populate a rb_ary object in parallel.  I looked at replacing the internal data structure with a rb_ary, but that was a non-starter as it uses upd, which is appropriate.  Also, there are performance enhancements, such as memcpy that are not available if rb_ary becomes the primary data model.  Having rb_ary as a secondary data model that mirrors the internal structures works, and works well, BUT I strongly suspect it is quite efficient.  I stopped before doing detail benchmarking or analysis of memory usage, so I don't know if there will be a lot of mem copying, pointers, references, etc... And it still runs into the problem of #! methods... keeping a secondary rb_ary will make any non-modifying array behavior much faster than to keep calling #to_ary, but it won't help any modification methods.
I started to add methods that would allow 100% api-compatibility with Array and Map.  This is a combination of C/Java functionality and helper methods in ruby.  The risks are keeping this inline with different versions of ruby, and the #! methods.  The #! can be implemented in ruby, but they will really not be efficient.  I don't know what 'good enough' is. And, test cases will be needed to double check all this functionality; functionality which is not a part of the core protobuf lib