brief summary of massive performance improvements with JRuby
# There will be more information here when I share the entire problem space I'm working on, but
# in short, this is preview material for my second talk in a series called "What Computer Scientists Know".
# The first talk is on recursion and walks through several examples, leading up to a problem based
# on a simple puzzle that, by initial estimates extrapolated from the performance of a previous puzzle,
# would take years to solve on modern computers with the techniques shown in Ruby. That sets the stage
# for improving the performance of that problem with threading, concurrency, and related tuning.
#
# The second talk is on threading and concurrency, touching on algorithmic performance as well.
# Using some knowledge of the problem (board symmetry, illegal moves, etc.), we reduce the problem space
# to about 0.5% of what we initially thought it was. Still, the initial single-threaded solution took more
# than 24 hours to find the first solution. Watching that run with the tool 'htop', we can see that only
# one core of my 8-core laptop was busy. So we introduce threading. Even so, with Ruby 1.9.3/2.0 we can't
# get much better performance: we use a breadth-first search to decompose the problem into 40 sub-problems
# and give each of those sub-problems its own thread (roughly as sketched below), but Ruby's GIL keeps
# only one of them running Ruby code at a time.
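# As a hedged illustration (my addition, not the original code; breadth_first_decompose and initial_state
# are hypothetical stand-ins for the real decomposition step), the MRI attempt looked roughly like this:
# one thread per sub-problem, all joined at the end.
subproblems = breadth_first_decompose(initial_state)  # hypothetical helper yielding ~40 independent sub-problems
workers = subproblems.map do |subproblem|
  Thread.new { subproblem.solve }                     # CPU-bound work, so MRI's GIL serializes these threads
end
workers.each(&:join)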
#
# Switching to JRuby gave us a huge threading performance boost, but knowing what I know about Ruby and Java,
# I suspected the implementation wasn't thread-safe with Ruby's built-in datatypes. Sure enough, I could prove
# this with a subset of the problem: the JRuby version, while faster, found fewer solutions. How fast is
# wrong?
#
# I switched to using native Java collections, and started getting ConcurrentModificationExceptions. I
# then switched to the thread-safe (synchronized) versions, but that reduced performance, because the
# locking in Java's synchronized collections isn't much better than Ruby's GIL (sketched below).
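# To make that intermediate step concrete (my addition; the original doesn't say which Java collection it
# used, so ArrayList and the synchronizedList wrapper here are illustrative), the change looked something
# like this:
java_import "java.util.ArrayList"
java_import "java.util.Collections"

hash_collection = ArrayList.new                                 # fast, but unsafe under concurrent writes
hash_collection = Collections.synchronizedList(ArrayList.new)   # safe, but one lock serializes every access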
#
# So I got out my trusty copy of Brian Goetz's "Java Concurrency in Practice", and looked up the use of
# some of the java.util.concurrent classes. In particular, I had an array of hashes that was created
# in Ruby with this line of code:
hash_collection = []
# I replaced that with:
java_import "java.util.concurrent.ConcurrentSkipListSet"
hash_collection = ConcurrentSkipListSet.new
# No other code changed, and there was a huge improvement: on the order of dozens of times faster. (I'll have
# real numbers and hardware specs in a final version of this analysis.)
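# A note on the swap above (my addition, hedged): ConcurrentSkipListSet keeps its elements sorted, so whatever
# goes in must be mutually comparable; in exchange, many threads can add to it without any external locking.
# A quick sanity check, with integers standing in for the real board hashes:
set = ConcurrentSkipListSet.new
4.times.map { |i|
  Thread.new { 1_000.times { |n| set.add(i * 1_000 + n) } }
}.each(&:join)
puts set.size   # => 4000, with no locks and nothing lost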
# I also had a simple count of game moves processed - the worst kind of shared state - an integer that
# every thread was constantly incrementing. I got huge improvements by taking it out, but then
# I had no visibility into how fast things were processing. I replaced the counter, created in
# Ruby with this line:
@@count = 0
# with this:
java_import "java.util.concurrent.atomic.AtomicInteger"
@@count = AtomicInteger.new
# and changed the code that incremented it from this:
@@count = @@count + 1
# with this:
@@count.incrementAndGet
# and that got me about 80% of the performance of not having the counter at all.
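# (My addition, a small usage note:) incrementAndGet returns the new value, and .get reads the current value
# without locking, which is what the throughput measurement further down relies on:
counter = AtomicInteger.new   # 'counter' is just an illustrative name
counter.incrementAndGet       # => 1
counter.addAndGet(10)         # => 11
counter.get                   # => 11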
# Finally, I had been running about 40 threads on a machine with 16 cores.
# In an effort to further reduce the problem space as well as create more threads, I iterated
# one more time with the breadth-first search and had about 2700 initial states I could explore
# in parallel. I did the naive thing and just created that many threads:
threads = []
things.each do |thing|
  threads << Thread.new do
    thing.solve
  end
end
# and that was BLAZINGLY fast, but ran out of memory on my machine (16 gigabytes!) after several
# hours. That many threads, each recursing, and the stack frames just got to be too much.
# So I researched thread pools: jobs sit in a queue until a thread is available to take them on,
# and the number of threads can be tuned to the number of cores, independently of the number of
# jobs. That led to this code, based on java.util.concurrent.Executors:
java_import "java.util.concurrent.Executors"
executor = Executors.newFixedThreadPool(64)
things.each do |thing|
  executor.submit do
    thing.solve
  end
end
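# (My addition, a hedged follow-up rather than part of the original code:) once everything is submitted,
# the pool can be told to stop accepting new work while the main thread waits for the queue to drain:
java_import "java.util.concurrent.TimeUnit"

executor.shutdown                              # no new jobs accepted; queued jobs still run
executor.awaitTermination(7, TimeUnit::DAYS)   # block until the work is done or the timeout expires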
# The end result is blazingly fast... tens of thousands of times faster than Ruby 1.9.3 with threads.
# I'll have real metrics when this is official, but the Ruby solution toiled for 24 hours and found
# only 1 solution; the initial JRuby solution found 6 solutions in 24 hours; this most recent
# solution has found over 270 solutions in two hours, and is evaluating over 40 million game
# positions a minute:
irb(main):287:0> x = Board.evaluated; sleep 60; puts Board.evaluated - x
40488395
# The 16 cores on the fastest machine I have access to are pegged with this solution, as shown by
# this output from htop:
# 1 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||98.7%] 5 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||98.1%] 9 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||95.6%] 13 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||98.7%]
# 2 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||98.7%] 6 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||97.5%] 10 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||98.1%] 14 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||96.2%]
# 3 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||98.7%] 7 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||97.5%] 11 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||98.7%] 15 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||96.2%]
# 4 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||95.6%] 8 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||97.5%] 12 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||98.7%] 16 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||96.8%]
# Mem[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 5122/11901MB] Tasks: 53, 98 thr; 65 running
# Swp[ 0/4095MB] Load average: 56.92 56.51 45.51
#
# Finally, I was using an array (and .dup-ing it) to keep track of the evaluated moves while recursing
# down the game tree. I switched this to my own immutable linked list (since previous game moves
# don't change), removed the dup, and got another 15% performance improvement (sketched below).
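# (My addition, a minimal sketch of the idea rather than the original class; MoveList and cons are
# illustrative names:) an immutable cons-style list lets every recursive call share the existing history
# and prepend its own move in O(1), with nothing to dup:
class MoveList
  attr_reader :move, :rest

  def initialize(move, rest = nil)
    @move = move
    @rest = rest
    freeze                       # a node never changes after construction
  end

  def cons(move)
    MoveList.new(move, self)     # new head node; the shared tail is untouched
  end
end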
#
# There are potentially two other changes I can make to get similar performance improvements.
#
# Again, I realize this data in its current form is largely anecdotal; when this is cleaned up for a real
# presentation I'll have several code examples and their evolution, and performance characteristics on
# a few different machines.
#
# Most of the small improvements (the linked list, for example) would apply to MRI Ruby as well, but the
# two most massive improvements come from real threads and concurrent data structures. That's where
# the JVM underneath JRuby shows how incredible it is.
@bwalsh commented Oct 3, 2013

nice work

@flajann2

At some point I should try running RubyNEAT on JRuby. The thought of being able to leverage cores for NEAT is very alluring. I'll just have to make it JRuby-aware. Other than that, not much should have to change. Hmmmmm.....
