Ruby Ractors vs Threads benchmark
require 'benchmark'
require 'etc'

# Spawn one throwaway Ractor up front so one-time initialization
# (and the "Ractor is experimental" warning) doesn't skew the benchmark
Ractor.new { :warmup } if defined?(Ractor)

# Naive recursive Fibonacci: a purely CPU-bound workload with no I/O
def fibonacci(n)
  return n if (0..1).include? n

  fibonacci(n - 1) + fibonacci(n - 2)
end

NUMBER = 25
TIMES = 400
CPU_CORES = 2 # hardcoded per run; Etc.nprocessors reports the logical core count
PER_CORE = TIMES / CPU_CORES

raise 'TIMES is not divided evenly by CPU cores' unless TIMES.modulo(CPU_CORES).zero?

Benchmark.bmbm do |x|
  # Baseline: all work done sequentially on the main thread
  x.report('inline') { TIMES.times { fibonacci(NUMBER) } }

  x.report('thread inline') do
    Thread.new { TIMES.times { fibonacci(NUMBER) } }.join
  end

  x.report("threads per #{CPU_CORES} cores") do
    Array.new(CPU_CORES) do
      Thread.new { PER_CORE.times { fibonacci(NUMBER) } }
    end.each(&:join)
  end

  x.report('per-task threads at once') do
    Array.new(TIMES) do
      Thread.new { fibonacci(NUMBER) }
    end.each(&:join)
  end

  x.report("per-task threads batches per #{CPU_CORES} cores") do
    CPU_CORES.times do
      Array.new(PER_CORE) do
        Thread.new { fibonacci(NUMBER) }
      end.each(&:join)
    end
  end

  # Same shapes again, but with Ractors instead of Threads
  if defined?(Ractor)
    x.report('ractor inline') do
      Ractor.new { TIMES.times { fibonacci(NUMBER) } }.take
    end

    x.report("ractors per #{CPU_CORES} cores") do
      Array.new(CPU_CORES) do
        Ractor.new { PER_CORE.times { fibonacci(NUMBER) } }
      end.each(&:take)
    end

    x.report('per-task ractors at once') do
      Array.new(TIMES) do
        Ractor.new { fibonacci(NUMBER) }
      end.each(&:take)
    end

    x.report("per-task ractor batches per #{CPU_CORES} cores") do
      CPU_CORES.times do
        Array.new(PER_CORE) do
          Ractor.new { fibonacci(NUMBER) }
        end.each(&:take)
      end
    end
  end
end

Kukunin commented Apr 25, 2021

Ruby v3.0.1

Ruby Ractors are a new way to get true parallelism. They are at a very early stage but look very promising.

One of the caveats is that each Ractor spawns its own OS thread, rather than being dispatched onto a fixed pool of workers (as the Erlang VM does, for example). It's also not an event loop (like Node.js). From the [documentation](https://github.com/ruby/ruby/blob/master/doc/ractor.md#multiple-ractors-in-an-interpreter-process):

> The overhead of creating a Ractor is similar to overhead of one Thread creation.
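
To get a rough feel for that overhead, a minimal sketch along these lines could time spawning a batch of no-op Threads versus no-op Ractors (the batch size of 500 is an arbitrary choice, not something from the gist):

    require 'benchmark'

    N = 500 # arbitrary batch size

    Benchmark.bm(10) do |x|
      # Spawn and join N no-op Threads
      x.report('threads') { Array.new(N) { Thread.new { :noop } }.each(&:join) }

      # Spawn and collect N no-op Ractors (each one is backed by its own OS thread)
      x.report('ractors') { Array.new(N) { Ractor.new { :noop } }.each(&:take) }
    end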

Results are from my ThinkPad X230 (Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz):

4 cores:

                                           user     system      total        real
inline                                12.608468   0.029125  12.637593 ( 12.688897)
thread inline                         11.750990   0.000048  11.751038 ( 11.774610)
threads per 4 cores                   12.099049   0.009560  12.108609 ( 12.233338)
per-task threads at once              11.968960   0.039850  12.008810 ( 12.051534)
per-task threads batches per 4 cores  11.934967   0.019740  11.954707 ( 12.027466)
ractor inline                         11.825349   0.003244  11.828593 ( 11.846111)
ractors per 4 cores                   28.544691   0.045398  28.590089 (  7.787499)
per-task ractors at once              29.194077   0.062446  29.256523 (  7.671101)
per-task ractor batches per 4 cores   29.407297   0.048708  29.456005 (  7.785071)

2 cores:

                                           user     system      total        real
inline                                12.052309   0.016563  12.068872 ( 12.129856)
thread inline                         12.707273   0.009835  12.717108 ( 12.778038)
threads per 2 cores                   12.111228   0.003231  12.114459 ( 12.180608)
per-task threads at once              11.984674   0.029761  12.014435 ( 12.071126)
per-task threads batches per 2 cores  11.937880   0.055926  11.993806 ( 12.045331)
ractor inline                         12.367448   0.003263  12.370711 ( 12.414733)
ractors per 2 cores                   14.228111   0.006461  14.234572 (  7.212094)
per-task ractors at once              29.542887   0.059049  29.601936 (  7.719956)
per-task ractor batches per 2 cores   30.161583   0.088713  30.250296 (  8.052886)

Also, just for clarity, here are the benchmarks on JRuby (jruby 9.2.17.0 (2.5.8)), where threads run with true parallelism:

4 cores:

                                           user     system      total        real
inline                                 7.580000   0.010000   7.590000 (  7.602248)
thread inline                          7.310000   0.000000   7.310000 (  7.323234)
threads per 4 cores                   16.960000   0.050000  17.010000 (  4.624139)
per-task threads at once              16.650000   0.150000  16.800000 (  4.603958)
per-task threads batches per 4 cores  16.940000   0.110000  17.050000 (  4.699659)

2 cores:

                                           user     system      total        real
inline                                 7.270000   0.000000   7.270000 (  7.269524)
thread inline                          7.530000   0.010000   7.540000 (  7.547916)
threads per 2 cores                    8.970000   0.000000   8.970000 (  4.560888)
per-task threads at once              16.970000   0.120000  17.090000 (  4.707308)
per-task threads batches per 2 cores  17.180000   0.120000  17.300000 (  4.722010)

Conclusions that I can draw:

  • Ractors provide true parallelism, which Threads under MRI's GVL do not (a stripped-down sketch follows this list).
  • You can notice the effect of Hyper-Threading: my laptop reports 4 logical cores but has only 2 physical ones. That's why the difference between the 4-worker and 2-worker runs is not that big.
  • JRuby is faster on CPU-intensive tasks (not that relevant for typical Rails projects).
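
A stripped-down sketch of the first point (assuming Ruby 3.0+ on a multi-core machine; the loop size is arbitrary): two Threads running CPU-bound Ruby code take roughly as long as one, because the GVL lets only one of them execute Ruby at a time, while two Ractors roughly halve the wall-clock time.

    require 'benchmark'

    # CPU-bound busy work with no I/O, so the GVL is never released voluntarily
    threads = Benchmark.realtime do
      Array.new(2) { Thread.new { 20_000_000.times { |i| i * i } } }.each(&:join)
    end

    ractors = Benchmark.realtime do
      Array.new(2) { Ractor.new { 20_000_000.times { |i| i * i } } }.each(&:take)
    end

    puts format('2 threads: %.2fs, 2 ractors: %.2fs', threads, ractors)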

Kukunin commented Apr 25, 2021

The easiest way to understand Ractors is that Ractor.new == Thread.new, but with more limitations (no shared objects) and running in a truly parallel manner. Nothing more (at least for now).
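
A minimal sketch of the "no shared objects" part (behavior as of Ruby 3.0; the variable names are just illustrative): a Ractor block cannot capture outer variables at all, and anything it should work on has to be passed in explicitly, where it gets deep-copied (or moved, or must be a shareable object).

    data = [1, 2, 3]

    # A Thread shares objects with its parent freely
    Thread.new { data << 4 }.join
    p data #=> [1, 2, 3, 4]

    # A Ractor block is isolated: referencing an outer variable fails at creation time
    begin
      Ractor.new { data << 5 }
    rescue ArgumentError => e
      puts e.message # e.g. "can not isolate a Proc because it accesses outer variables (data)"
    end

    # Objects are handed over explicitly as arguments (deep-copied by default)
    r = Ractor.new(data) { |arr| arr.sum }
    p r.take #=> 10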
