logstash, why jruby?

Long story short: I'm totally open to supporting more rubies if possible. Details follow.

Related issue: http://code.google.com/p/logstash/issues/detail?id=37

Summary:

  • ruby core and stdlib change violently, without notice and without backwards compatibility. I want nothing of that.
  • need a cross-ruby date library that isn't part of stdlib (see previous point) and is also good.
  • need an easy way to use multiple cpus that is cross-ruby (threads are not it).

Details:

Mainly, the ruby core/stdlib API changes between ruby 1.8 and 1.9 are very poorly done. Some are documented while others are not. Some changes make sense, while others do not. That was the main reason for originally deciding to use jruby.

JRuby lets me use Java libraries in place of crappy ruby ones. For example, there are some undocumented changes to datetime between ruby 1.8 and 1.9, so the logstash 'date' filter uses Joda-Time instead of ruby's stdlib datetime.

Further, JRuby's performance options are currently much better than those of MRI or YARV. At worst, during benchmarks, JRuby performs on par with YARV 1.9.2, but since JRuby has actual threads, we can use more cpus more easily and pretty much beat plain ruby.
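To make the threading point concrete, here is a minimal sketch of a worker pool fed by a queue. It is not logstash's actual code: `process_event` and `run_workers` are hypothetical names made up for illustration. The same script runs on any ruby, but only on an implementation with real parallel threads (such as JRuby) would the workers be expected to spread across cpus.

```ruby
require 'thread' # provides Queue on older rubies; harmless on newer ones

# Hypothetical stand-in for a filter: turn a raw line into an event hash.
def process_event(line)
  { :message => line, :length => line.length }
end

# Hypothetical helper: process all lines using worker_count threads.
def run_workers(lines, worker_count)
  input = Queue.new
  output = Queue.new

  workers = (1..worker_count).map do
    Thread.new do
      # A nil on the queue tells one worker to shut down.
      while (line = input.pop)
        output << process_event(line)
      end
    end
  end

  lines.each { |line| input << line }
  worker_count.times { input << nil }
  workers.each { |worker| worker.join }

  # All workers have finished, so the output queue can be drained safely.
  results = []
  results << output.pop until output.empty?
  results
end

events = run_workers(["a", "bb", "ccc"], 2)
```

The order of `events` depends on thread scheduling, so anything that needs ordering has to sort or tag the events itself.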

Additionally, java debugging tools are quite excellent: jvisualvm, jstack, etc.

Lastly, I can very easily ship a single 'executable' that should work on most platforms with java - see the monolithic jar logstash releases. I can't easily do this with other rubies.

There are some parts of logstash that explicitly require java currently - the date filter, elasticsearch support, and thread support.

The code is also only tested under ruby 1.8.7, and the performance difference between JRuby and MRI 1.8.7 is pretty huge. It might get better if you try REE, but that's not really the same ruby everyone's going to have.

The date filter can be made ruby-friendly if someone writes a non-crappy date parsing library in ruby. The ones that ship with stdlib are not fast or safe to use (ruby core changes them wildly without notice).

ElasticSearch support is much faster in jruby/jvm than it was using pure ruby, because we are now using the java API for elasticsearch. Previously we were using the HTTP/REST api via EventMachine and em-http-request, which has much lower throughput.

Lastly, jruby supports proper threading so logstash can process events on multiple CPU cores. MRI and YARV Ruby cannot do this without forking and message passing.
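For comparison, here is a rough sketch of what "forking and message passing" can look like on MRI. Everything in it is an assumption for illustration (`process_event`, `process_in_children`, and the choice of pipes plus Marshal as the message format); it is not how logstash does anything. Note that `fork` is not available on JRuby or on Windows, which is part of why this approach is not cross-ruby.

```ruby
# Hypothetical stand-in for cpu-bound event processing.
def process_event(line)
  line.upcase
end

# Hypothetical helper: one child process per batch, results sent back over a pipe.
def process_in_children(batches)
  children = batches.map do |batch|
    reader, writer = IO.pipe
    pid = fork do
      # Child: do the work, serialize the result, and send it to the parent.
      reader.close
      writer.write(Marshal.dump(batch.map { |line| process_event(line) }))
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close # parent keeps only the read end
    [pid, reader]
  end

  # Parent: collect each child's result in order, then reap the child.
  children.map do |pid, reader|
    result = Marshal.load(reader.read)
    reader.close
    Process.wait(pid)
    result
  end
end

results = process_in_children([["a", "b"], ["c"]])
```

Every event crosses a process boundary twice (in via the fork, out via the pipe), so the serialization cost is something a threaded design avoids.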

The downside to using JRuby is a possibly higher memory footprint.

Again, I'm open to supporting non-JRuby rubies, but there need to be answers for some of the above.

mhvenkat May 19, 2014

Thanks for an excellent blog on choosing jruby!
I would like to pose a question: what are the limitations in implementing Logstash as a pure java solution?

jillesvangurp May 20, 2014

With you on most of this; using jruby as well. One issue that you don't fully address is memory usage. Currently this is not yet an issue for me, but I could imagine using cheap vms where sacrificing a quarter or more of RAM to logstash is a non-starter. In my current setup, I could see myself needing a replacement for the bit of logstash plumbing that is responsible for gathering collectd data and logs from various files and delivering them to elasticsearch. I don't see why that should take 0.5GB of RAM instead of a few MB. For now it is a fair compromise, but it does mean I have to reconsider my architecture when we move to using a lot of cheap amazon boxes for our frontend.

UnitedMarsupials Jul 7, 2014

Sir! Thank you very much for publishing your reasoning. I too was wondering why anybody would use JRuby if natively-compiled Ruby is available. This post explains it.

However, I take issue with one of your points:

"Lastly, I can very easily ship a single 'executable' that should work on most platforms with java - see the monolithic jar logstash releases. I can't easily do this with other rubies."

I wish you wouldn't do that -- providing other people's code that is quite likely to already exist on the system or be independently available. Thankfully, you don't bundle your own Java (some people do!), but you should not be providing your own JRuby JAR, nor log4j, nor anything else that's freely available from 3rd parties. Simply list the requirements (along with versions, if important) -- the way you require Java -- and have the packagers (be they FreeBSD ports maintainers or RedHat RPM authors, or what have you) create the proper port/package for their respective OS.

By bundling the 3rd-party JARs and Ruby libraries, you simply increase the size of your distribution -- and only for the sake of "out of the box" readiness... a readiness that, in my opinion, is rather superficial. One may use it as a proof of concept, but for a deployment across multiple systems, one would (or should!) create a package anyway...

mhvenkat Jul 8, 2014

While there are excellent justifications for using JRuby, it does not have an Apache 2 license. It recently switched to the EPL. See http://mmilinkov.wordpress.com/2013/02/13/jruby-moves-to-the-epl/
This limits wider adoption of logstash in large companies.
I would recommend that logstash be re-implemented as a pure java solution (like Apache Flume).

shurane Jul 31, 2014

Really, all I want is the fast prototyping that plain Ruby offers. If logstash supported SIGUSR1 to reload configs and had clear instructions for starting the REPL to test features, I would be a very happy person. Waiting 10 seconds to start logstash, even with drip, is very counter-productive when writing out a config.
