@evotopid
Last active August 29, 2015 14:06
Ruby Benchmark: String#bytesize vs String#size
# String#bytesize vs String#size
require 'benchmark'
N = 20_000_000 # iterations per report
short_utf8_string = "ääää"
long_utf8_string = "äääääääääääääääääääääääääääääääääääääääääääääääääääääääää"
short_ascii_string = "aaaa"
long_ascii_string = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
short_mixed_string = "aäaä"
long_mixed_string = "aäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäa"
# bmbm runs a rehearsal pass first, then the measured pass.
Benchmark.bmbm do |x|
  x.report("utf8.short.size")      { N.times { short_utf8_string.size } }
  x.report("utf8.short.bytesize")  { N.times { short_utf8_string.bytesize } }
  x.report("utf8.long.size")       { N.times { long_utf8_string.size } }
  x.report("utf8.long.bytesize")   { N.times { long_utf8_string.bytesize } }
  x.report("ascii.short.size")     { N.times { short_ascii_string.size } }
  x.report("ascii.short.bytesize") { N.times { short_ascii_string.bytesize } }
  x.report("ascii.long.size")      { N.times { long_ascii_string.size } }
  x.report("ascii.long.bytesize")  { N.times { long_ascii_string.bytesize } }
  x.report("mixed.short.size")     { N.times { short_mixed_string.size } }
  x.report("mixed.short.bytesize") { N.times { short_mixed_string.bytesize } }
  x.report("mixed.long.size")      { N.times { long_mixed_string.size } }
  x.report("mixed.long.bytesize")  { N.times { long_mixed_string.bytesize } }
end
Conclusion:
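- When speed matters, avoid calling size on a string that might
contain multi-byte UTF-8 characters: in this run it was slower
than bytesize for the all-UTF-8 strings and the long mixed string.
Note that for such strings the two methods return different
values (characters vs. bytes).
- You'll probably be fine using bytesize everywhere, but if you
know the string contains only ASCII characters, size was
slightly faster in this run.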
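Since the conclusion treats size and bytesize as alternatives, it may help to see where they agree and where they differ. The following is a minimal sketch, not part of the original benchmark; the variable names are illustrative, and the strings are written with \u escapes on the assumption that "ä" is the precomposed character U+00E4 encoded as UTF-8.

```ruby
# Illustrative strings only (not the benchmark's variables).
# "\u00E4" is "ä" as a single precomposed code point, 2 bytes in UTF-8.
ascii = "aaaa"
utf8  = "\u00E4" * 4
mixed = "a\u00E4a\u00E4"

[ascii, utf8, mixed].each do |s|
  # size counts characters; bytesize counts the bytes of the encoded string.
  puts "#{s.inspect}: size=#{s.size} bytesize=#{s.bytesize} ascii_only=#{s.ascii_only?}"
end
```

The two methods only return the same number for the ASCII-only string, so swapping one for the other is safe only when the byte count is what is actually needed or the content is known to be ASCII.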
Output:
                           user     system      total        real
utf8.short.size        2.910000   0.010000   2.920000 (  2.916396)
utf8.short.bytesize    2.530000   0.000000   2.530000 (  2.539148)
utf8.long.size         3.140000   0.010000   3.150000 (  3.138874)
utf8.long.bytesize     2.270000   0.000000   2.270000 (  2.267662)
ascii.short.size       2.030000   0.000000   2.030000 (  2.032265)
ascii.short.bytesize   2.460000   0.000000   2.460000 (  2.464350)
ascii.long.size        2.200000   0.000000   2.200000 (  2.201368)
ascii.long.bytesize    2.470000   0.010000   2.480000 (  2.480259)
mixed.short.size       2.780000   0.000000   2.780000 (  2.789172)
mixed.short.bytesize   2.920000   0.010000   2.930000 (  2.928942)
mixed.long.size        3.340000   0.000000   3.340000 (  3.347672)
mixed.long.bytesize    2.610000   0.000000   2.610000 (  2.609181)
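To compare the pairs more directly, the "real" column above can be turned into size/bytesize ratios. This is only a rough sketch: the hash name `real` and the labels are made up for illustration, and the timings are copied from this single run, so the ratios say nothing about other Ruby versions or machines.

```ruby
# "real" times in seconds copied from the output above: [size, bytesize].
# The hash name and keys are illustrative, not part of the benchmark.
real = {
  "utf8.short"  => [2.916396, 2.539148],
  "utf8.long"   => [3.138874, 2.267662],
  "ascii.short" => [2.032265, 2.464350],
  "ascii.long"  => [2.201368, 2.480259],
  "mixed.short" => [2.789172, 2.928942],
  "mixed.long"  => [3.347672, 2.609181],
}

real.each do |label, (size_t, bytesize_t)|
  # A ratio above 1.0 means size took longer than bytesize in this run.
  printf("%-12s size/bytesize = %.2f\n", label, size_t / bytesize_t)
end
```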