@frsyuki
Last active June 2, 2016 08:05
Idea to optimize EventStream of Fluentd using MessagePack operators

Problems

  • Deserialization of Ruby objects is slow.
  • Filter plugins are slow.

Optimization of a record object

A Record class is implemented in C. It holds Ruby objects as a binary serialized with MessagePack. When a field is accessed (e.g. record['key']), it returns the deserialized object for that field, so only the values that are actually needed are deserialized. If only a few fields are accessed, this improves performance.

For further optimization, when a field access returns a String, it uses copy-on-write as follows:

  • A Record stores the msgpack binary in memory as a String object. The String object should be frozen.
  • When a field is accessed and the value is a string, the Record calls String#substr (rb_str_substr in CRuby's C API; there is no Ruby-level String#substr) to return a slice of the entire binary. CRuby can let such a substring share the parent's buffer copy-on-write. Thus, as long as later processing doesn't modify the object, no data is copied.
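As a rough illustration of the lazy field access and string slicing described above, here is a minimal pure-Ruby sketch. `LazyRecord` is a hypothetical name, not an existing Fluentd or msgpack-ruby class, and it decodes only a small MessagePack subset (fixmap, fixstr, positive fixint) by hand so that it runs without any gems. Whether `String#byteslice` really shares the parent's buffer is a CRuby implementation detail and may vary by Ruby version.

```ruby
# Hypothetical pure-Ruby stand-in for the C-level Record described above.
# Assumption: only fixmap, fixstr and positive fixint are supported.
class LazyRecord
  def initialize(binary)
    @buf = binary.b.freeze   # keep the serialized form; decode nothing up front
  end

  # Scan the top-level map and decode only the value stored under `key`.
  def [](key)
    head = @buf.getbyte(0)
    raise ArgumentError, 'expected a fixmap' unless head && (head & 0xf0) == 0x80
    pos = 1
    (head & 0x0f).times do
      k, pos = read(pos)
      return read(pos).first if k == key
      pos = skip(pos)          # step over the value without decoding it
    end
    nil
  end

  private

  # Decode the object at `pos`; returns [value, next_pos].
  def read(pos)
    b = @buf.getbyte(pos)
    if b <= 0x7f                      # positive fixint
      [b, pos + 1]
    elsif (b & 0xe0) == 0xa0          # fixstr: low 5 bits hold the byte length
      len = b & 0x1f
      # Slice of the frozen buffer; modifying the result never touches @buf.
      [@buf.byteslice(pos + 1, len).force_encoding(Encoding::UTF_8), pos + 1 + len]
    elsif (b & 0xf0) == 0x80          # nested fixmap: decode it fully
      hash = {}
      pos += 1
      (b & 0x0f).times do
        k, pos = read(pos)
        v, pos = read(pos)
        hash[k] = v
      end
      [hash, pos]
    else
      raise NotImplementedError, format('unsupported type byte 0x%02x', b)
    end
  end

  # Return the offset just past the object at `pos` without building it.
  def skip(pos)
    b = @buf.getbyte(pos)
    if b <= 0x7f
      pos + 1
    elsif (b & 0xe0) == 0xa0
      pos + 1 + (b & 0x1f)
    elsif (b & 0xf0) == 0x80
      pos += 1
      ((b & 0x0f) * 2).times { pos = skip(pos) }
      pos
    else
      raise NotImplementedError, format('unsupported type byte 0x%02x', b)
    end
  end
end

# {"a" => 1, "b" => "hello", "c" => {"d" => 2}}, encoded by hand
rec = LazyRecord.new("\x83\xa1a\x01\xa1b\xa5hello\xa1c\x81\xa1d\x02".b)
rec['b']   #=> "hello"  ('a' is skipped; 'c' is never looked at)
rec['a']   #=> 1
```

A real implementation would do the scan in C and cover the full MessagePack type set; the point here is only that a lookup touches the bytes before the requested field and nothing after it.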

Optimization of a record stream

A Page class is implemented in C. It holds a sequence of Records (or possibly a map of String to Record) as a binary serialized with MessagePack. Creating a Page from a msgpack binary is fast because no objects have to be deserialized.

A Page also has a list of Proc objects that convert records. Those Proc objects are called only when records are actually used.

record1 = {'a' => 1, 'b' => 2}   # string keys ({'a': 1} would create Symbol keys in Ruby)
record2 = {'c' => 1, 'e' => 2}
page = Page.build([record1, record2])   # internally, Page holds a msgpack binary
p page[1]  #=> {'c' => 1, 'e' => 2}   # this deserializes record2 without deserializing record1
page.map! {|r| {'modified' => 1} }  # this proc is not called here
p page[1]  #=> {'modified' => 1}    # the proc above is called now: record2 is deserialized (record1 is not), the proc is applied, and the result is returned
page.map! {|r| {'modified' => 1} }  # each map! call appends its proc to the internal list
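The deferred `map!` behaviour above can be sketched in plain Ruby. `LazyPage` is a hypothetical name, and two simplifications are assumed: `Marshal` stands in for MessagePack so the example needs no gems, and each record is kept as its own serialized chunk rather than as one contiguous binary.

```ruby
# Hypothetical stand-in for the proposed Page class.
# Assumption: Marshal replaces MessagePack, one serialized chunk per record.
class LazyPage
  def self.build(records)
    new(records.map { |r| Marshal.dump(r).freeze })
  end

  def initialize(chunks)
    @chunks = chunks   # serialized records, never decoded eagerly
    @procs  = []       # pending conversions, applied on read
  end

  # Register a conversion; no record is touched yet.
  def map!(&block)
    @procs << block
    self
  end

  # Deserialize only record `index`, then run the pending procs on it in order.
  def [](index)
    chunk = @chunks[index]
    return nil if chunk.nil?
    @procs.reduce(Marshal.load(chunk)) { |record, pr| pr.call(record) }
  end
end

calls = 0
page = LazyPage.build([{ 'a' => 1, 'b' => 2 }, { 'c' => 1, 'e' => 2 }])
page.map! { |r| calls += 1; r.merge('modified' => 1) }
calls     # still 0: the block has not run
page[1]   #=> {"c"=>1, "e"=>2, "modified"=>1}
calls     # now 1: only record 1 was deserialized and converted
```

This sketch re-runs the procs on every read. Caching converted records would be a separate design decision that the note does not address.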

More optimization using special lookup methods

record1 = {'a' => 1, 'b' => {'c' => 1}}
page = Page.build([record1, record2])   # record2 as defined above

p page.dig('b', 'c')  # this deserializes only the value 1, without deserializing the enclosing Hash
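One way such a lookup could avoid building the intermediate Hash is to walk the serialized bytes directly. The sketch below works on a single record's binary; `RawDig` and its method names are hypothetical, and only fixmap, fixstr and positive fixint are handled (an assumption made to keep it short).

```ruby
# Hypothetical helper: follow keys through nested fixmaps in a MessagePack
# binary and decode only the final scalar value.
module RawDig
  module_function

  def dig(buf, *keys)
    buf = buf.b                      # treat the input as raw bytes
    pos = 0
    keys.each do |key|
      pos = value_pos(buf, pos, key)
      return nil if pos.nil?
    end
    decode(buf, pos)
  end

  # Offset of the value stored under `key` in the fixmap at `pos`, or nil.
  def value_pos(buf, pos, key)
    head = buf.getbyte(pos)
    return nil unless head && (head & 0xf0) == 0x80   # not a fixmap
    pos += 1
    (head & 0x0f).times do
      found = decode(buf, pos) == key   # keys are assumed to be scalars
      pos = skip(buf, pos)              # move past the key
      return pos if found
      pos = skip(buf, pos)              # step over the unwanted value
    end
    nil
  end

  # Decode a scalar (positive fixint or fixstr) at `pos`.
  def decode(buf, pos)
    b = buf.getbyte(pos)
    return b if b <= 0x7f
    if (b & 0xe0) == 0xa0
      return buf.byteslice(pos + 1, b & 0x1f).force_encoding(Encoding::UTF_8)
    end
    raise NotImplementedError, format('unsupported type byte 0x%02x', b)
  end

  # Offset just past the object at `pos`, without building any Ruby object.
  def skip(buf, pos)
    b = buf.getbyte(pos)
    return pos + 1 if b <= 0x7f
    return pos + 1 + (b & 0x1f) if (b & 0xe0) == 0xa0
    raise NotImplementedError, format('unsupported type byte 0x%02x', b) unless (b & 0xf0) == 0x80
    pos += 1
    ((b & 0x0f) * 2).times { pos = skip(buf, pos) }
    pos
  end
end

# {"a" => 1, "b" => {"c" => 1}}, encoded by hand
bin = "\x82\xa1a\x01\xa1b\x81\xa1c\x01".b
RawDig.dig(bin, 'b', 'c')   #=> 1
RawDig.dig(bin, 'x')        #=> nil
```

The only Ruby objects created are the keys being compared and the final value. A C implementation could compare key bytes in place and avoid even those allocations.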

More optimization using special modify methods

record1 = {'a' => 1, 'b' => {'c' => 1}}
page = Page.build([record1, record2])   # record2 as defined above

page.dig!('b', 'c') {|v| v + 1 }        # this proc is not called here
p page[0]  #=> {'a' => 1, 'b' => {'c' => 2}}  # the proc above is called here