@guilhermesimoes
Last active September 11, 2023 09:36
Ruby Benchmark: Counting the number of lines of a file
# gem install benchmark-ips
require "benchmark/ips"

path = "lib/rubycritic/cli/options.rb"

Benchmark.ips do |x|
  x.report("read + each_line") { File.read(path).each_line.count }
  # Note: the two File.open variants never close the file explicitly;
  # the handles are only released when the File objects are garbage collected.
  x.report("open + each_line") { File.open(path, "r").each_line.count }
  x.report("open + readlines") { File.open(path, "r").readlines.length }
  x.report("foreach + count") { File.foreach(path).count }
  x.report("foreach + inject") { File.foreach(path).inject(0) { |count, _line| count + 1 } }
  x.report("foreach") { count = 0; File.foreach(path) { count += 1 }; count }
  # Shells out to wc, so each iteration pays for spawning a subprocess.
  x.report("wc") { `wc -l < "#{path}"`.to_i }
  x.compare!
end
Calculating -------------------------------------
read + each_line 4.272k i/100ms
open + each_line 4.075k i/100ms
open + readlines 4.050k i/100ms
foreach + count 3.569k i/100ms
foreach + inject 2.979k i/100ms
foreach 3.397k i/100ms
wc 26.000 i/100ms
-------------------------------------------------
read + each_line 44.535k (± 2.7%) i/s - 226.416k
open + each_line 42.232k (± 2.7%) i/s - 211.900k
open + readlines 41.585k (± 5.8%) i/s - 210.600k
foreach + count 38.254k (± 1.1%) i/s - 192.726k
foreach + inject 31.478k (± 0.8%) i/s - 157.887k
foreach 35.819k (± 3.2%) i/s - 180.041k
wc 260.809 (± 3.8%) i/s - 1.326k
Comparison:
read + each_line: 44534.7 i/s
open + each_line: 42232.2 i/s - 1.05x slower
open + readlines: 41584.8 i/s - 1.07x slower
foreach + count: 38254.1 i/s - 1.16x slower
foreach: 35818.6 i/s - 1.24x slower
foreach + inject: 31478.2 i/s - 1.41x slower
wc: 260.8 i/s - 170.76x slower
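
Before comparing speeds, it can help to confirm that the benchmarked variants actually return the same number. The following is a minimal sketch, not part of the original gist: the `line_counts` helper and the temporary sample file are hypothetical, the `wc` variant is left out for portability, and the `File.open` variants use the block form so the handles are closed.

```ruby
require "tempfile"

# Hypothetical helper: runs the pure-Ruby variants from the benchmark
# against one file and returns each result keyed by its report label.
def line_counts(path)
  foreach_total = 0
  File.foreach(path) { foreach_total += 1 }

  {
    "read + each_line" => File.read(path).each_line.count,
    "open + each_line" => File.open(path, "r") { |f| f.each_line.count },
    "open + readlines" => File.open(path, "r") { |f| f.readlines.length },
    "foreach + count" => File.foreach(path).count,
    "foreach + inject" => File.foreach(path).inject(0) { |count, _line| count + 1 },
    "foreach" => foreach_total
  }
end

# Illustrative three-line sample file (an assumption, not the gist's input).
sample = Tempfile.new("line_count_sample")
sample.write("first\nsecond\nthird\n")
sample.close # flushes to disk; the file stays until unlink

counts = line_counts(sample.path)
sample.unlink

counts.each { |label, total| puts "#{label}: #{total}" }
```

If any variant disagreed, its speed would not be comparable with the others.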

ilyazub commented Mar 14, 2023

Thank you! Found this gist while trying to reduce the memory usage of the serpapi.com backend.

count($/) is about 1.5 times faster than the lines-based variants and doesn't allocate an array of lines. Checked on Ruby 2.7.2.

$ ruby -v
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]

Performance

Results

Warming up --------------------------------------
                size    31.000  i/100ms
              length    75.000  i/100ms
               count    77.000  i/100ms
   each_line + count    81.000  i/100ms
           count($/)   196.000  i/100ms
Calculating -------------------------------------
                size      1.529k (±33.9%) i/s -      4.774k in   5.015361s
              length      1.434k (±38.8%) i/s -      5.025k in   5.139834s
               count      1.335k (±40.7%) i/s -      4.697k in   5.079353s
   each_line + count      1.411k (±39.5%) i/s -      5.022k in   5.110146s
           count($/)      2.231k (± 2.6%) i/s -     11.172k in   5.012323s

Comparison:
           count($/):     2230.5 i/s
                size:     1529.0 i/s - 1.46x  (± 0.00) slower
              length:     1434.2 i/s - 1.56x  (± 0.00) slower
   each_line + count:     1411.0 i/s - 1.58x  (± 0.00) slower
               count:     1334.9 i/s - 1.67x  (± 0.00) slower

Code

require "benchmark/ips"

html = File.read(Rails.root.join("spec/data/google/superhero-movies-mobile-63f582a0defa1345501c6b50-2023-02-23.html"))

Benchmark.ips do |x|
  x.report("size") { html.lines.size < 50 }
  x.report("length") { html.lines.length < 50 }
  x.report("count") { html.lines.count < 50 }

  x.report("each_line + count") { html.each_line.count < 50 }

  x.report("count($/)") { html.count($/) < 50 }

  x.compare!
end
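
One caveat when swapping `lines.count` for `count($/)`: the two are not always equal. `String#count` counts separator characters, while `lines` also yields a final line that has no trailing newline. The sketch below uses made-up sample strings (not the HTML fixture above) and assumes `$/` still has its default value of `"\n"`.

```ruby
# Illustrative inputs; these strings are assumptions, not data from the benchmark.
with_trailing_newline = "a\nb\nc\n"
without_trailing_newline = "a\nb\nc"

results = {
  with_trailing: [
    with_trailing_newline.lines.count, # 3 lines
    with_trailing_newline.count($/)    # 3 newline characters
  ],
  without_trailing: [
    without_trailing_newline.lines.count, # still 3 lines
    without_trailing_newline.count($/)    # only 2 newline characters
  ]
}

# String#count treats its argument as a set of characters, so a
# multi-character separator such as "\r\n" counts "\r" and "\n" separately.
crlf_hits = "a\r\nb".count("\r\n")

puts results.inspect
puts crlf_hits
```

For a threshold check such as `< 50` the off-by-one rarely matters, but it does if an exact line count is needed.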

Memory usage

Unlike lines/each_line/etc., count($/) doesn't allocate new line strings or an array to hold them.

Clarifying reports:

  • tmp/html_length_vs_count_vs_size_bench.rb:4 is File.read
  • tmp/html_length_vs_count_vs_size_bench.rb:6 is lines.size
lines/readlines/each_line/etc.
$ bundle exec heapy read ./tmp/lines_count/allocated.heap 49 --lines=10

Analyzing Heap (Generation: 49)
-------------------------------

allocated by memory (204879705) (in bytes)
==============================
  204872652  tmp/html_length_vs_count_vs_size_bench.rb:6
       2568  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/deprecation/proxy_wrappers.rb:172
       1010  /usr/local/lib/ruby/2.7.0/pathname.rb:42
        800  /usr/local/lib/ruby/2.7.0/pathname.rb:46
        392  tmp/html_length_vs_count_vs_size_bench.rb:4
        252  /usr/local/lib/ruby/2.7.0/pathname.rb:351
        218  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/logger.rb:89
        217  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/deprecation/reporting.rb:70
        217  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/tagged_logging.rb:23
        192  /usr/local/lib/ruby/2.7.0/pathname.rb:358

object count (5406)
==============================
  5301  tmp/html_length_vs_count_vs_size_bench.rb:6
    27  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/deprecation/proxy_wrappers.rb:172
    22  /usr/local/lib/ruby/2.7.0/pathname.rb:42
    20  /usr/local/lib/ruby/2.7.0/pathname.rb:46
     5  tmp/html_length_vs_count_vs_size_bench.rb:4
     4  /usr/local/lib/ruby/2.7.0/pathname.rb:351
     3  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/deprecation/reporting.rb:87
     3  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/inflector/methods.rb:272
     2  /usr/local/lib/ruby/2.7.0/pathname.rb:410
     2  /usr/local/lib/ruby/2.7.0/pathname.rb:390

High Ref Counts
==============================

  5300  tmp/html_length_vs_count_vs_size_bench.rb:6
    73  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/deprecation/proxy_wrappers.rb:172
    20  /usr/local/lib/ruby/2.7.0/pathname.rb:46
count($/)
$ bundle exec heapy read ./tmp/count_nl/allocated.heap 48 --lines=10

Analyzing Heap (Generation: 48)
-------------------------------

allocated by memory (2547465) (in bytes)
==============================
  2540804  tmp/html_length_vs_count_vs_size_bench.rb:4
     2568  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/deprecation/proxy_wrappers.rb:172
     1010  /usr/local/lib/ruby/2.7.0/pathname.rb:42
      800  /usr/local/lib/ruby/2.7.0/pathname.rb:46
      252  /usr/local/lib/ruby/2.7.0/pathname.rb:351
      218  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/logger.rb:89
      217  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/tagged_logging.rb:23
      217  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/deprecation/reporting.rb:70
      192  /usr/local/lib/ruby/2.7.0/pathname.rb:358
      192  /usr/local/lib/ruby/2.7.0/pathname.rb:357

object count (105)
==============================
  27  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/deprecation/proxy_wrappers.rb:172
  22  /usr/local/lib/ruby/2.7.0/pathname.rb:42
  20  /usr/local/lib/ruby/2.7.0/pathname.rb:46
   5  tmp/html_length_vs_count_vs_size_bench.rb:4
   4  /usr/local/lib/ruby/2.7.0/pathname.rb:351
   3  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/inflector/methods.rb:272
   3  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/deprecation/reporting.rb:87
   2  /usr/local/lib/ruby/2.7.0/pathname.rb:410
   2  /usr/local/lib/ruby/2.7.0/pathname.rb:389
   2  /usr/local/lib/ruby/2.7.0/pathname.rb:390

High Ref Counts
==============================

  73  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/deprecation/proxy_wrappers.rb:172
  20  /usr/local/lib/ruby/2.7.0/pathname.rb:46
   4  /usr/local/lib/ruby/2.7.0/pathname.rb:358
   3  tmp/html_length_vs_count_vs_size_bench.rb:4
   3  /usr/local/lib/ruby/2.7.0/pathname.rb:351
   3  /usr/local/lib/ruby/2.7.0/pathname.rb:390
Code
require "heap-profiler"

HeapProfiler.report(Rails.root.join('tmp/lines_count')) do
  html = File.read(Rails.root.join("spec/data/google/superhero-movies-mobile-63f582a0defa1345501c6b50-2023-02-23.html"))

  100.times { html.lines.count < 50 }
  # 100.times { html.count($/) < 50 }
end
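
For a rough cross-check without heap-profiler, the number of objects allocated by each approach can be read from `GC.stat`. This is only a sketch under stated assumptions: the `allocated_objects` helper and the synthetic 1,000-line string are hypothetical stand-ins for the real HTML fixture, and the numbers give object counts, not bytes.

```ruby
# Hypothetical helper: returns how many objects were allocated while the
# block ran, based on the VM-wide :total_allocated_objects counter.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Synthetic input (an assumption); the original uses a large HTML file.
text = "some line of text\n" * 1_000

lines_allocations = allocated_objects { text.lines.count }
count_allocations = allocated_objects { text.count($/) }

puts "lines.count: #{lines_allocations} objects"
puts "count($/):   #{count_allocations} objects"
```

The `lines` variant should report at least one object per line plus the array, while `count($/)` should stay close to zero.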
Script to compare

Comment out one of the two 100.times lines, uncomment the other, and replace the paths in the commands below accordingly. (Not super convenient. heapy diff doesn't provide a diff.)

# Profile heap
bundle exec rails r tmp/html_length_vs_count_vs_size_bench.rb

# Read summary of heap allocations
bundle exec heapy read ./tmp/count_nl/allocated.heap

# Read a specific generation (48) limiting number of lines to output (10)
bundle exec heapy read ./tmp/count_nl/allocated.heap 48 --lines=10


ilyazub commented Sep 11, 2023

Updated benchmark results:

$ bundle exec rails r tmp/html_length_vs_count_vs_size_bench.rb
Warming up --------------------------------------
        lines + size    19.000  i/100ms
      lines + length   130.000  i/100ms
       lines + count   122.000  i/100ms
   each_line + count   127.000  i/100ms
           count($/)   172.000  i/100ms
each_line + take + count
                       159.000  i/100ms
each_line + lazy + take + count
                       155.000  i/100ms
each_line + lazy + take + count
                       145.000  i/100ms
Calculating -------------------------------------
        lines + size      1.205k (±28.3%) i/s -      5.548k in   5.007348s
      lines + length      1.117k (±24.5%) i/s -      4.940k in   5.074537s
       lines + count    999.275  (±43.0%) i/s -      3.904k in   5.142585s
   each_line + count    962.420  (±42.7%) i/s -      3.937k in   5.386932s
           count($/)      1.696k (± 3.4%) i/s -      8.600k in   5.077876s
each_line + take + count
                          1.050k (±42.9%) i/s -      4.134k in   5.084343s
each_line + lazy + take + count
                        918.077  (±50.5%) i/s -      3.565k in   5.057191s
each_line + lazy + take + count
                        941.035  (±49.4%) i/s -      3.770k in   5.265527s

Comparison:
           count($/):     1695.7 i/s
        lines + size:     1205.1 i/s - 1.41x  (± 0.00) slower
      lines + length:     1116.8 i/s - 1.52x  (± 0.00) slower
each_line + take + count:     1050.0 i/s - 1.61x  (± 0.00) slower
       lines + count:      999.3 i/s - 1.70x  (± 0.00) slower
   each_line + count:      962.4 i/s - 1.76x  (± 0.00) slower
each_line + lazy + take + count:      941.0 i/s - 1.80x  (± 0.00) slower

Code:

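require "benchmark/ips"

# Same fixture as in the earlier benchmark.
html = File.read(Rails.root.join("spec/data/google/superhero-movies-mobile-63f582a0defa1345501c6b50-2023-02-23.html"))
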
Benchmark.ips do |x|
  x.report("lines + size") { html.lines.size < 50 }
  x.report("lines + length") { html.lines.length < 50 }
  x.report("lines + count") { html.lines.count < 50 }

  x.report("count($/)") { html.count($/) < 50 }

  x.report("each_line + count") { html.each_line.count < 50 }
  x.report("each_line + take + count") { html.each_line.take(50).count < 50 }
  x.report("each_line + lazy + take + count") { html.each_line.lazy.take(50).count < 50 }
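  # Note: this report reuses the previous label, which is why that label shows up
  # twice in the results; the second entry is this drop(49).any? variant.
  x.report("each_line + lazy + take + count") { !html.each_line.lazy.drop(49).any? }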

  x.compare!
end
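
The variants above are only comparable if they answer the same question, namely whether the string has fewer than 50 lines. A small sketch to check that on two boundary inputs follows. The `fewer_than_50_variants` helper, the label for the `drop`/`any?` variant, and the synthetic strings are all assumptions made for illustration.

```ruby
# Hypothetical helper: evaluates each "fewer than 50 lines?" predicate
# from the benchmark on the given text and returns the answers by label.
def fewer_than_50_variants(text)
  {
    "lines + size" => text.lines.size < 50,
    "lines + length" => text.lines.length < 50,
    "lines + count" => text.lines.count < 50,
    "count($/)" => text.count($/) < 50,
    "each_line + count" => text.each_line.count < 50,
    "each_line + take + count" => text.each_line.take(50).count < 50,
    "each_line + lazy + take + count" => text.each_line.lazy.take(50).count < 50,
    "each_line + lazy + drop + any?" => !text.each_line.lazy.drop(49).any?
  }
end

# Boundary inputs (assumed): 49 lines should pass the check, 50 should not.
short_answers = fewer_than_50_variants("x\n" * 49)
long_answers = fewer_than_50_variants("x\n" * 50)

puts short_answers.values.uniq.inspect
puts long_answers.values.uniq.inspect
```

Both inputs end with a newline, so `count($/)` agrees with the line-based variants here.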
