Skip to content

Instantly share code, notes, and snippets.

@davidrunger
Last active December 17, 2015 20:34
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save davidrunger/2e38719d074d84d53519 to your computer and use it in GitHub Desktop.
Save davidrunger/2e38719d074d84d53519 to your computer and use it in GitHub Desktop.
benchmarking Ruby string slicing
require 'benchmark'
# generate some random strings
n = 5_000_000
string_n = ''
string_2n = ''
letters = ('a'..'z').to_a
n.times { string_n << letters.sample }
(n * 2).times { string_2n << letters.sample }
n = 1_000
puts 'Slice half of string:'
Benchmark.bm do |b|
b.report('includes last character; n') do
n.times do
string_n[(string_n.length / 2)..-1]
end
end
b.report('includes last character; 2n') do
n.times do
string_2n[(string_2n.length / 2)..-1]
end
end
b.report("doesn't include last character; n") do
n.times do
string_n[((string_n.length / 2) - 1)..-2]
end
end
b.report("doesn't include last character; 2n") do
n.times do
string_2n[((string_2n.length / 2) - 1)..-2]
end
end
end
Slice half of string:
user system total real
includes last character; n 0.000000 0.000000 0.000000 ( 0.000378)
includes last character; 2n 0.000000 0.000000 0.000000 ( 0.000343)
doesn't include last character; n 0.540000 0.430000 0.970000 ( 1.028334)
doesn't include last character; 2n 1.270000 0.760000 2.030000 ( 2.069830)
@ruggeri
Copy link

ruggeri commented Dec 8, 2015

Cool find!

I compiled Ruby with SHARABLE_MIDDLE_SUBSTRING=1 and doing so suggests that indeed the string is shared (no copy occurs) even when excluding the last character. The first run is with the option on, and the second is my default ruby, where presumably middle sharing is disabled.

/usr/local/bin$ ./ruby ~/string_slicing_benchmark.rb 
Slice half of string:
       user     system      total        real
includes last character; n  0.000000   0.000000   0.000000 (  0.000543)
includes last character; 2n  0.000000   0.000000   0.000000 (  0.000415)
doesn't include last character; n  0.000000   0.000000   0.000000 (  0.000293)
doesn't include last character; 2n  0.000000   0.000000   0.000000 (  0.000300)
/usr/local/bin$ ruby ~/string_slicing_benchmark.rb 
Slice half of string:
       user     system      total        real
includes last character; n  0.000000   0.000000   0.000000 (  0.000338)
includes last character; 2n  0.000000   0.000000   0.000000 (  0.000305)
doesn't include last character; n  0.360000   0.340000   0.700000 (  0.697888)
doesn't include last character; 2n  0.780000   0.870000   1.650000 (  1.656923)

@davidrunger
Copy link
Author

@ruggeri Oh, wow, so awesome! I only just now saw this comment. Super cool. Thanks for sharing those results. I wonder how much of the current Ruby implementation of String is incompatible with SHARABLE_MIDDLE_SUBSTRING=1. In other words, I wonder what sort of bugs would have cropped up if you had continued playing around with your custom build of the Ruby interpreter. Presumably at least some of the current Ruby String implementation is incompatible with sharable middle substrings; otherwise the Ruby gods would have already given us this fast, new behavior as the default.

It's really cool to see a concrete example of one way that an enormous performance gain could/can be squeezed out of Ruby (at least for programs that are doing a lot of string slicing). Thanks again for sharing.

@Haseeb-Qureshi
Copy link

We experimented a little bit, but I suspect that all of the Ruby methods work fine even with this macro switched on. I think the reason why this is switched off by default is not the Ruby standard library, but because of the fact that many Ruby applications are using other C modules or libraries which expect to manipulate null-terminated strings. When they're passed Ruby in-memory Ruby strings that are not properly null-terminated, I'm guessing that there will be some unpredictable errors (especially if those errors only show up for sliced strings that do not include the final character, which might be a nightmare to debug).

We posited that's why it's switched off by default.

@davidrunger
Copy link
Author

Interesting idea, and very plausible. I suspect you might be right. That would indeed be a bear to debug. If correct, it's tantalizing to think about the performance gain that would be possible for Ruby, but for the (relatively small, I would think) set of Ruby applications that need to use C libraries to work with Ruby-sliced strings. On the other hand, I guess if one has to slice a 2.5 megabyte substring 1,000 times just for it to take 1 second on a laptop computer, then maybe there aren't really too many noteworthy, real-world performance gains being left on the table, except for a very small set of apps that are doings a lot of slicing of large strings. Back on the other hand, every little bit helps. And transforming an operation from O(n) to O(1) is just so satisfying. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment