Skip to content

Instantly share code, notes, and snippets.

@benolee
Created January 13, 2018 04:30
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save benolee/dfbedcd2793b4a013f10611526d4847f to your computer and use it in GitHub Desktop.
Save benolee/dfbedcd2793b4a013f10611526d4847f to your computer and use it in GitHub Desktop.
Ruby's performance tuning way

This is Japanese-English translation of my diary articles

These articles demonstration how to find out memory issues and tuning.

  • How to estimate memory consumption
  • How to use allocation_tracer gem to know object creation details
  • How to optimize this typical issue in VM optimization
  • How to use gc_tracer gem to know GC behaviour
  • How to use valgrind/massif to know memory consumption details in memory

I hope this article help your Ruby life.


Ruby's performance tuning way.

Matz pointed out [this article] (http://kokizzu.blogspot.jp/2015/02/c-java-hhvm-ruby-nodejsrhinojscspidermo.html) for us and said "I want to reduce memory consumption of Ruby interpreter". It is an order to [our team] (https://blog.heroku.com/archives/2013/3/6/matz_highlights_ruby_2_0_at_waza).

Let's quote target program.

##############################
# scomb.rb
def newGap gap
    gap /= 1.3;
    gap = gap.to_i;
    return 11 if gap == 9 or gap == 10
    return 1 if gap < 1
    gap
end
def combSort a
    len = a.length
    gap = len
    swapped = false
    begin
        swapped = false
        gap = newGap gap
        (0...(len-gap)).each do |i|
            if a[i] > a[i+gap]
                swapped = true
                a[i], a[i+gap] = a[i+gap], a[i]
            end
        end
    end while gap > 1 or swapped
end
N = 10000000;
arr = N.times.map{|x| (N-x).to_s }
combSort arr
(1...N).each{ |z| print '!' if arr[z]<arr[z-1] }

This program is simple. Let's quote an evaluation result, too.

$ ruby --version
ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-linux]
$ time ruby scomb.rb
CPU: 114.67s    Real: 115.21s   RAM: 870612KB

I evaluated this program on my own environment (ruby trunk on Ubuntu on VirtualBox on Windows 7).

CPU: 108.87s    Real: 109.11s   RAM: 860996KB

Almost similar.

At first, understand what happens.

At last lines, this program makes an array containing 10000000 = 10_000_000 = 10M entries. On 64bit CPU, we can estimate required memory comsmption:

  • An array having 10M items -> 1 entry needs 8B (pointer size) * 10M items = 80MB
  • 10M short strings = 40 byte (size of RVALUE) * 10M = 400MB

About 500MB. When 10M slots (slot is an basic space for each objects) are required, then Ruby's GC allocates x1.8 slots, so 10M * 1.8 = 18M RVALUE will be allocated. So total slots consumes 18M * 40B = 720MB. 720MB + 80MB = 800MB. It is similar to the evaluation result: RAM: 860,996KB = 860MB.

Let's consider using GC tuning (Ruby's GC can tune by using several environment variables). Allocatiing 1.8 times slots seems too huge, doesn't it? Let's try this parameter to use 1.2. RUBY_GC_HEAP_GROWTH_FACTOR environ variable specify this factor. Let's try.

$ RUBY_GC_HEAP_GROWTH_FACTOR=1.2 /usr/bin/time -f "\nCPU: %Us\tReal: %es\tRAM: %MKB" ./miniruby -I../../trunk/lib -I. -I.ext/common   ../../trunk/test.rb

CPU: 108.32s    Real: 108.49s   RAM: 745388KB

Ok. Memory consumption is reduced. But not so big reduction.

Originally, which part of program consumes memory? A method combSort() seems no memory consuming parts. Let's try run prgoram without calling combSort(arr).

CPU: 3.09s      Real: 3.16s     RAM: 527452KB

This is 500MB which we estimated for this program consuming memory. Therefore, some parts of calling combSort(arr) seems to be consuming memory. Which part of this program make object?, and which type of objects?

In this case, we can use [allocation tracer] (https://github.com/ko1/allocation_tracer) gem.

Like that:

##############################
# scomb.rb
def newGap gap
    gap /= 1.3;
    gap = gap.to_i;
    return 11 if gap == 9 or gap == 10
    return 1 if gap < 1
    gap
end
def combSort a
    len = a.length
    gap = len
    swapped = false
    begin
        swapped = false
        gap = newGap gap
        (0...(len-gap)).each do |i|
            if a[i] > a[i+gap]
                swapped = true
                a[i], a[i+gap] = a[i+gap], a[i]
            end
        end
    end while gap > 1 or swapped
end

require 'allocation_tracer'
require 'pp'

pp ObjectSpace::AllocationTracer.trace{
  N = 1000#0000;
  arr = N.times.map{|x| (N-x).to_s }
  combSort arr
  (1...N).each{ |z| print '!' if arr[z]<arr[z-1] }
}

Only surrounding ObjectSpace::AllocationTracer.trace block. Btw, we only know a trend, making 1K entries instead of 10M entries.

Results are here:

{["../../trunk/test.rb", 31]=>[1004, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 17]=>[24, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 20]=>[2546, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 33]=>[1, 0, 0, 0, 0, 0]}

This means, for example, the result is [1004, 0, 0, 0, 0, 0] at the location ["../../trunk/test.rb", 31] (of course, it means ../../trunk/test.rb:31). It is difficult without any information.

  • 1st column means 1004 objects are created at this line.
  • 2nd column means 0 objects (created at this line) are promoted to old objects.
  • 3rd column means total age of objects created at this line is 0.
  • 4th column means the minimum age of objects created at this line is 0.
  • 5th column means the maximum age of objects created at this line is 0.
  • 6th column means that total memory consumption without RVALUE.

1st column (creation number) is important and enough for this situation.

Line 31 making an array by map method so 1004 objects are created at this line. It is reasonable.

We need to check line 20. This line creats 2546 objects. This line is "a[i], a[i+gap] = a[i+gap], a[i]". What happen on this line?

To know more details, we can use allocation tracer gem like this:

ObjectSpace::AllocationTracer.setup(%i{path line type})
pp ObjectSpace::AllocationTracer.trace{
  ...
}

We can collect statistics for each types (types are MRI specific types). A result is:

{["../../trunk/test.rb", 31, :T_DATA]=>[1, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 31, :T_ARRAY]=>[1, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 31, :T_NODE]=>[2, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 31, :T_STRING]=>[1000, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 17, :T_STRUCT]=>[24, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 20, :T_ARRAY]=>[2546, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 33, :T_STRUCT]=>[1, 0, 0, 0, 0, 0]}

Using this technique, we can understand T_ARRAY, that is to say array objects are created. Whay arrays?

Actually, ruby's multiple assignment (parallel assignment) returns an array. Maybe you don't know about that. So let's see the example.

x = (a, b = 1, 2)
p x #=> [1, 2]

A variable x has a reference to the result of an expression (a, b = 1, 2). And it is [1, 2]. We can check it with allocation tracer gem.

require 'allocation_tracer'
require 'pp'
ObjectSpace::AllocationTracer.setup(%i{path line type})
pp ObjectSpace::AllocationTracer.trace{
  a, b = 1, 2
}

and

{["../../trunk/test.rb", 5, :T_ARRAY]=>[1, 0, 0, 0, 0, 0]}

Okay, line 5, or a, b = 1, 2 creates an array.

You want to say that you don't want to use created array object [1, 2]. In past I also had thought same thing, we optimize it. If it is clear that we don't need to use an array object, then we elimite to create an array object.

Let's try.

require 'allocation_tracer'
require 'pp'
ObjectSpace::AllocationTracer.setup(%i{path line type})
pp ObjectSpace::AllocationTracer.trace{
  a, b = 1, 2
  nil
}

Result:

{}

Result shows that no objects are created.

Original program returns a result value of multiple assignment as a block result value. But we can specify nil for block result value, and we can eliminate creating array objects.

Okay, let's insert nil and try it again.

##############################
# scomb.rb
def newGap gap
    gap /= 1.3;
    gap = gap.to_i;
    return 11 if gap == 9 or gap == 10
    return 1 if gap < 1
    gap
end
def combSort a
    len = a.length
    gap = len
    swapped = false
    begin
        swapped = false
        gap = newGap gap
        (0...(len-gap)).each do |i|
            if a[i] > a[i+gap]
                swapped = true
                a[i], a[i+gap] = a[i+gap], a[i]
                nil                               # Insert nil here
            end
        end
    end while gap > 1 or swapped
end

require 'allocation_tracer'
require 'pp'
ObjectSpace::AllocationTracer.setup(%i{path line type})
pp ObjectSpace::AllocationTracer.trace{
  N = 1000#0000;
  arr = N.times.map{|x| (N-x).to_s }
  combSort arr
  (1...N).each{ |z| print '!' if arr[z]<arr[z-1] }
}

But...

{["../../trunk/test.rb", 32, :T_DATA]=>[1, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 32, :T_ARRAY]=>[1, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 32, :T_NODE]=>[2, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 32, :T_STRING]=>[1000, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 17, :T_STRUCT]=>[24, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 20, :T_ARRAY]=>[2546, 0, 0, 0, 0, 0],   <-- here
 ["../../trunk/test.rb", 34, :T_STRUCT]=>[1, 0, 0, 0, 0, 0]}

Array objects are created. Why?

In fact, multiple assignments includes method calling like a[x], a[y] = 1, 2 create an array naively. This is because method invocation can cause side-effect and we need to sort execution order correctly. For example, this program should invoke a[x] = 1; a[y] = 2 one by one.

In reverse, let's consider the cases we don't need to care about an execution order.

a, b = 1, 2

In this case, Ruby's stack machine VM exceute like that.

push 1
push 2
set b # b = 2
set a # a = 1

In this case, an assignment order is b and a. However, it is possible that a[x], a[y] has a side effect, a logging for example. So we can't do same thing like:

push 1
push 2
set a[y] # a[y] = 2
set a[x] # a[x] = 1

Because of this reason, we make an intermediate array.

push 1
push 2
newarray 2    # Make [1, 2]
expandarray 2 # Expand [1, 2] reversely.
set a[x]      # a[x] = 1
set a[y]      # a[y] = 2

We can complete multiple assignment.

Now, we understand everything. Let's rewrite multiple assignment like that:

##############################
# scomb.rb
def newGap gap
    gap /= 1.3;
    gap = gap.to_i;
    return 11 if gap == 9 or gap == 10
    return 1 if gap < 1
    gap
end
def combSort a
    len = a.length
    gap = len
    swapped = false
    begin
        swapped = false
        gap = newGap gap
        (0...(len-gap)).each do |i|
            if a[i] > a[i+gap]
                swapped = true
                x, y = a[i+gap], a[i] # Rewrite here, not cool way
                a[i] = x
                a[i+gap] = y
                nil
            end
        end
    end while gap > 1 or swapped
end

require 'allocation_tracer'
require 'pp'
ObjectSpace::AllocationTracer.setup(%i{path line type})
pp ObjectSpace::AllocationTracer.trace{
  N = 1000#0000;
  arr = N.times.map{|x| (N-x).to_s }
  combSort arr
  (1...N).each{ |z| print '!' if arr[z]<arr[z-1] }
}

and result is:

{["../../trunk/test.rb", 34, :T_DATA]=>[1, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 34, :T_ARRAY]=>[1, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 34, :T_NODE]=>[2, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 34, :T_STRING]=>[1000, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 17, :T_STRUCT]=>[24, 0, 0, 0, 0, 0],
 ["../../trunk/test.rb", 36, :T_STRUCT]=>[1, 0, 0, 0, 0, 0]}

Congraturations, we don't make an array object any more at line 20.

Let's try with 10M entries.

$ /usr/bin/time -f "\nCPU: %Us\tReal: %es\tRAM: %MKB" ./miniruby -I../../trunk/lib -I. -I.ext/common   ../../trunk/test.rb

CPU: 102.89s    Real: 103.14s   RAM: 527712KB

Congraturations, it is estimated 500MB.


Based on this analysis, Nobu made a patch to tune this behavior in [r49614] (http://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=49614).

For a[0], a[1] = x, y, the compiled byte code from:

push x           # stack: x
push y           # stack: x y
newarray 2       # stack: [x, y]
expandarray 2, 0 # stack: y, x
set a[0]         # stack: y  # a[0] = x
set a[1]         # stack:    # a[1] = y

to

push x           # stack: x
push y           # stack: x y
swap             # stack: y x
set a[0]         # stack: y # a[0] = x
set a[1]         # stack:   # a[1] = y

It is only for two variables case, but most freqent case. We have an idea to introduce reverse instruction for over two variables, but it is not frequent case. So that we only introduce this technique using an exsiting swap instruction.

Btw, not balanced cases aren't supported, such as a, b, c = x, y. They are exceptions.

Let's try on trunk.

##############################
# scomb.rb
def newGap gap
    gap /= 1.3;
    gap = gap.to_i;
    return 11 if gap == 9 or gap == 10
    return 1 if gap < 1
    gap
end
def combSort a
    len = a.length
    gap = len
    swapped = false
    begin
        swapped = false
        gap = newGap gap
        (0...(len-gap)).each do |i|
            if a[i] > a[i+gap]
                swapped = true
                a[i], a[i+gap] = a[i+gap], a[i]
                nil                                # there is still nil
            end
        end
    end while gap > 1 or swapped
end

N = 10000000;
arr = N.times.map{|x| (N-x).to_s }
combSort arr
(1...N).each{ |z| print '!' if arr[z]<arr[z-1] }

and

CPU: 103.15s    Real: 103.30s   RAM: 527492KB

Congratulations, we can optimize completely.


nil is not cool. How about to use GC tuning again?

We can estimate that required objects (old objects) are 10M + alpha objects. Let's try with RUBY_GC_HEAP_GROWTH_FACTOR=1.01 to restrict heap extension. However, using this parameter, we need huge overhead for initial code (making a 10M entries array). So that prepare 10M+alpha slots, in this case, 12M slots at initial time using RUBY_GC_HEAP_INIT_SLOTS=12000000.

Let's try:

$ RUBY_GC_HEAP_INIT_SLOTS=12000000 RUBY_GC_HEAP_GROWTH_FACTOR=1.01 /usr/bin/time -f "\nCPU: %Us\tReal: %es\tRAM: %MKB" ./miniruby -I../../trunk/lib -I. -I.ext/common   ../../trunk/test.rb

CPU: 110.43s    Real: 110.65s   RAM: 670824KB

It is better than last trial "RAM: 745388KB".

However, we prepare 12M slots at first, it should be enough but 740MB is higher than estimated (12M * 40B = 480MB: slots) + (80MB: array) = 560MB. What happens?

In this case, we can use [gc_tracer] (https://github.com/ko1/gc_tracer) gem.

At the beggining of file, you only need to write:

require 'gc_tracer'
GC::Tracer.start_logging

and program outputs GC internal information at every events, like that.

type    tick    count   heap_allocated_pages    heap_sorted_length      heap_allocatable_pages  heap_available_slots    heap_live_slots heap_free_slots    heap_final_slots        heap_marked_slots       heap_swept_slots        heap_eden_pages heap_tomb_pages total_allocated_pages   total_freed_pages  total_allocated_objects total_freed_objects     malloc_increase_bytes   malloc_increase_bytes_limit     minor_gc_count  major_gc_count  remembered_wb_unprotected_objects  remembered_wb_unprotected_objects_limit old_objects     old_objects_limit       oldmalloc_increase_bytes        oldmalloc_increase_bytes_limit     major_by        gc_by   have_finalizer  immediate_sweep state
start   1424126502809096        1       29411   29411   0       11988251        11986265        1986    0       0       0       29411   0       29411      0       11988251        1986    0       33554432        0       0       0       0       0       0       0       16777216        0       newobj     0       0       none
end_mark        1424126503057311        1       29411   29411   0       11988251        11986265        1986    0       10008677        0       29411      0       29411   0       11988251        1986    0       33554432        1       0       164     0       6099    0       0       16777216  0newobj  0       0       marking
end_sweep       1424126503416026        1       29411   29411   2859    11988251        10415829        1572422 0       10008677        1979573 26552      2859    29411   0       12395402        1979573 0       33554432        1       0       164     0       6099    0       0       16777216  0newobj  0       0       sweeping
start   1424126503416614        2       29411   29411   2859    11988251        10415829        1572422 0       10008677        1979573 26552   2859       29411   0       12395402        1979573 0       32883343        1       0       164     0       6099    0       0       16777216        oldgen     newobj  0       0       none
end_mark        1424126505920376        2       29412   29705   293     11988659        11988228        431     0       10011540        1979573 29412      0       29412   0       13967801        1979573 0       32883343        1       1       164     328     6099    12198   0       16777216  oldgen   newobj  0       0       marking
end_sweep       1424126509653076        2       29704   29705   1426    12107679        11526419        581260  0       10011540        2096662 28278      1426    29704   0       15482679        3956260 0       32883343        1       1       164     328     6099    12198   0       16777216  oldgen   newobj  0       0       sweeping
start   1424126510720439        3       29704   29705   0       12107679        12107679        0       0       10011540        2096662 29704   0 29704    0       16063939        3956260 0       32225676        1       1       164     328     6099    12198   0       16777216        0       newobj     0       0       none
end_mark        1424126511016490        3       29705   30001   296     12108087        12107679        408     0       10008677        2096662 29705      0       29705   0       16063939        3956260 0       32225676        2       1       173     328     10008482        12198   0       16777216   0       newobj  0       0       marking
end_sweep       1424126511495631        3       29705   30001   3035    12108087        10439327        1668760 0       10008677        2220177 26670      3035    29705   0       16494588        6055261 0       32225676        2       1       173     328     10008482        12198   0       16777216   0       newobj  0       0       sweeping
start   1424126511496421        4       29705   30001   3035    12108087        10439327        1668760 0       10008677        2220177 26670   3035       29705   0       16494588        6055261 0       31581162        2       1       173     328     10008482        12198   0       16777216  oldgen   newobj  0       0       none
end_mark        1424126515062702        4       29706   30002   296     12108494        12108087        407     0       10011715        2220177 29706      0       29706   0       18163348        6055261 0       31581162        2       2       173     346     10008483        20016966        0 16777216 oldgen  newobj  0       0       marking
end_sweep       1424126517959578        4       30001   30002   1514    12228736        11611552        617184  0       10011715        2217546 28487      1514    30001   0       19763184        8151632 0       31581162        2       2       173     346     10008483        20016966        0 16777216 oldgen  newobj  0       0       sweeping
start   1424126518937223        5       30001   30002   0       12228736        12228736        0       0       10011715        2217546 30001   0 30001    0       20380368        8151632 0       30949538        2       2       173     346     10008483        20016966        0       16777216  0newobj  0       0       none
end_mark        1424126518939379        5       30002   30301   299     12229144        12228736        408     0       10008677        2217546 30002      0       30002   0       20380368        8151632 0       30949538        3       2       173     346     10008483        20016966        0 16777216 0       newobj  0       0       marking
end_sweep       1424126520330323        5       30002   30301   3714    12229144        10715191        1513953 0       10008677        2342458 26288      3714    30002   0       21086881        10371690        0       30949538        3       2       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126522730700        6       30002   30301   0       12229144        12229144        0       0       10008677        2342458 30002   0 30002    0       22600834        10371690        0       30330547        3       2       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126522733421        6       30003   30302   299     12229552        12229144        408     0       10008677        2342458 30003      0       30003   0       22600834        10371690        0       30330547        4       2       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126523700184        6       30003   30302   3715    12229552        10715222        1514330 0       10008677        2342866 26288      3715    30003   0       23307378        12592156        0       30330547        4       2       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126526054966        7       30003   30302   0       12229552        12229552        0       0       10008677        2342866 30003   0 30003    0       24821708        12592156        0       29723936        4       2       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126526057021        7       30004   30303   299     12229959        12229552        407     0       10008677        2342866 30004      0       30004   0       24821708        12592156        0       29723936        5       2       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126527206651        7       30004   30303   3715    12229959        10715610        1514349 0       10008677        2343273 26289      3715    30004   0       25528640        14813030        0       29723936        5       2       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126529599621        8       30004   30303   0       12229959        12229959        0       0       10008677        2343273 30004   0 30004    0       27042989        14813030        0       29129457        5       2       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126529601939        8       30004   30303   0       12229959        12229959        0       0       10008677        2343273 30004      0       30004   0       27042989        14813030        0       29129457        6       2       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126530234409        8       30004   30303   3213    12229959        10464694        1765265 0       10008677        2221281 26791      3213    30004   0       27499005        17034311        0       29129457        6       2       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126530235030        9       30004   30303   3213    12229959        10464768        1765191 0       10008677        2221281 26791   3213       30004   0       27499079        17034311        0       28546867        6       2       173     346     10008483        20016966        0 16777216 nofree  newobj  0       0       none
end_mark        1424126533329440        9       30005   30304   299     12230366        12229959        407     0       10011894        2221281 30005      0       30005   0       29264270        17034311        0       28546867        6       3       173     346     10008483        20016966  016777216        nofree  newobj  0       0       marking
end_sweep       1424126535800255        9       30303   30304   1604    12351831        11697946        653885  0       10011894        2340463 28699      1604    30303   0       30950321        19252375        0       28546867        6       3       173     346     10008483        20016966  016777216        nofree  newobj  0       0       sweeping
start   1424126536666685        10      30303   30304   0       12351831        12351831        0       0       10011894        2340463 30303   0 30303    0       31604206        19252375        0       27975929        6       3       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126536668752        10      30304   30606   302     12352239        12351831        408     0       10008677        2340463 30304      0       30304   0       31604206        19252375        0       27975929        7       3       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126537715101        10      30304   30606   3938    12352239        10746976        1605263 0       10008677        2466777 26366      3938    30304   0       32342504        21595528        0       27975929        7       3       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126539759913        11      30304   30606   0       12352239        12352239        0       0       10008677        2466777 30304   0 30304    0       33947767        21595528        0       27416410        7       3       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126539762060        11      30305   30607   302     12352646        12352239        407     0       10008677        2466777 30305      0       30305   0       33947767        21595528        0       27416410        8       3       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126540779275        11      30305   30607   3938    12352646        10747410        1605236 0       10008677        2467184 26367      3938    30305   0       34686499        23939089        0       27416410        8       3       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126542891783        12      30305   30607   0       12352646        12352646        0       0       10008677        2467184 30305   0 30305    0       36291735        23939089        0       26868081        8       3       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126542894134        12      30306   30608   302     12353054        12352646        408     0       10008677        2467184 30306      0       30306   0       36291735        23939089        0       26868081        9       3       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126543915024        12      30306   30608   3938    12353054        10747832        1605222 0       10008677        2467592 26368      3938    30306   0       37030889        26283057        0       26868081        9       3       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126545944224        13      30306   30608   0       12353054        12353054        0       0       10008677        2467592 30306   0 30306    0       38636111        26283057        0       26330719        9       3       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126545946459        13      30306   30608   0       12353054        12353054        0       0       10008677        2467592 30306      0       30306   0       38636111        26283057        0       26330719        10      3       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126546575020        13      30306   30608   3393    12353054        10489565        1863489 0       10008677        2344376 26913      3393    30306   0       39116998        28627433        0       26330719        10      3       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126546575643        14      30306   30608   3393    12353054        10489639        1863415 0       10008677        2344376 26913   3393       30306   0       39117072        28627433        0       25804104        10      3       173     346     10008483        20016966        0 16777216 nofree  newobj  0       0       none
end_mark        1424126549474776        14      30307   30609   302     12353462        12353054        408     0       10012072        2344376 30307      0       30307   0       40980487        28627433        0       25804104        10      4       173     346     10008483        20016966  016777216        nofree  newobj  0       0       marking
end_sweep       1424126551736504        14      30608   30609   1696    12476150        11784748        691402  0       10012072        2464605 28912      1696    30608   0       42753162        30968414        0       25804104        10      4       173     346     10008483        20016966  016777216        nofree  newobj  0       0       sweeping
start   1424126552572440        15      30608   30609   0       12476150        12476150        0       0       10012072        2464605 30608   0 30608    0       43444564        30968414        0       25288021        10      4       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126552574628        15      30609   30914   305     12476557        12476150        407     0       10008677        2464605 30609      0       30609   0       43444564        30968414        0       25288021        11      4       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126553574614        15      30609   30914   4163    12476557        10779627        1696930 0       10008677        2592319 26446      4163    30609   0       44215513        33435886        0       25288021        11      4       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126555693333        16      30609   30914   0       12476557        12476557        0       0       10008677        2592319 30609   0 30609    0       45912443        33435886        0       24782260        11      4       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126555695618        16      30610   30915   305     12476965        12476557        408     0       10008677        2592319 30610      0       30610   0       45912443        33435886        0       24782260        12      4       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126556637992        16      30610   30915   4164    12476965        10779632        1697333 0       10008677        2592727 26446      4164    30610   0       46683397        35903765        0       24782260        12      4       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126558691606        17      30610   30915   0       12476965        12476965        0       0       10008677        2592727 30610   0 30610    0       48380730        35903765        0       24286614        12      4       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126558693863        17      30611   30916   305     12477372        12476965        407     0       10008677        2592727 30611      0       30611   0       48380730        35903765        0       24286614        13      4       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126559661498        17      30611   30916   4165    12477372        10779603        1697769 0       10008677        2593134 26446      4165    30611   0       49151655        38372052        0       24286614        13      4       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126561720964        18      30611   30916   0       12477372        12477372        0       0       10008677        2593134 30611   0 30611    0       50849424        38372052        0       23800881        13      4       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126561723441        18      30611   30916   0       12477372        12477372        0       0       10008677        2593134 30611      0       30611   0       50849424        38372052        0       23800881        14      4       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126562369895        18      30611   30916   3576    12477372        10514399        1962973 0       10008677        2468694 27035      3576    30611   0       51355145        40840746        0       23800881        14      4       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126562370577        19      30611   30916   3576    12477372        10514473        1962899 0       10008677        2468694 27035   3576       30611   0       51355219        40840746        0       23324863        14      4       173     346     10008483        20016966        0 16777216 nofree  newobj  0       0       none
end_mark        1424126565561276        19      30612   30917   305     12477780        12477372        408     0       10012256        2468694 30612      0       30612   0       53318118        40840746        0       23324863        14      5       173     346     10008483        20016966  016777216        nofree  newobj  0       0       marking
end_sweep       1424126567940458        19      30916   30917   1787    12601690        11873250        728440  0       10012256        2589963 29129      1787    30916   0       55179111        43305861        0       23324863        14      5       173     346     10008483        20016966  016777216        nofree  newobj  0       0       sweeping
start   1424126568805749        20      30916   30917   0       12601690        12601690        0       0       10012256        2589963 30916   0 30916    0       55907551        43305861        0       22858365        14      5       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126568808097        20      30917   31225   308     12602098        12601690        408     0       10008677        2589963 30917      0       30917   0       55907551        43305861        0       22858365        15      5       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126569823157        20      30917   31225   4391    12602098        10812225        1789873 0       10008677        2719084 26526      4391    30917   0       56711098        45898873        0       22858365        15      5       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126571993522        21      30917   31225   0       12602098        12602098        0       0       10008677        2719084 30917   0 30917    0       58500971        45898873        0       22401197        15      5       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126571995939        21      30918   31226   308     12602506        12602098        408     0       10008677        2719084 30918      0       30918   0       58500971        45898873        0       22401197        16      5       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126572995120        21      30918   31226   4391    12602506        10812573        1789933 0       10008677        2719492 26527      4391    30918   0       59304866        48492293        0       22401197        16      5       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126575233588        22      30918   31226   0       12602506        12602506        0       0       10008677        2719492 30918   0 30918    0       61094799        48492293        0       21953173        16      5       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126575236602        22      30919   31227   308     12602913        12602506        407     0       10008677        2719492 30919      0       30919   0       61094799        48492293        0       21953173        17      5       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126576237601        22      30919   31227   4393    12602913        10812191        1790722 0       10008677        2719899 26526      4393    30919   0       61898312        51086121        0       21953173        17      5       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126578559341        23      30919   31227   0       12602913        12602913        0       0       10008677        2719899 30919   0 30919    0       63689034        51086121        0       21514109        17      5       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126578561550        23      30919   31227   0       12602913        12602913        0       0       10008677        2719899 30919      0       30919   0       63689034        51086121        0       21514109        18      5       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126579347093        23      30919   31227   3762    12602913        10539261        2063652 0       10008677        2594235 27157      3762    30919   0       64219617        53680356        0       21514109        18      5       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126579347996        24      30919   31227   3762    12602913        10539335        2063578 0       10008677        2594235 27157   3762       30919   0       64219691        53680356        0       21083826        18      5       173     346     10008483        20016966        0 16777216 nofree  newobj  0       0       none
end_mark        1424126582822956        24      30920   31228   308     12603320        12602913        407     0       10012442        2594235 30920      0       30920   0       66283269        53680356        0       21083826        18      6       173     346     10008483        20016966  016777216        nofree  newobj  0       0       marking
end_sweep       1424126585323074        24      31227   31228   1878    12728454        11962886        765568  0       10012442        2716541 29349      1878    31227   0       68233712        56270826        0       21083826        18      6       173     346     10008483        20016966  016777216        nofree  newobj  0       0       sweeping
start   1424126586349636        25      31227   31228   0       12728454        12728454        0       0       10012442        2716541 31227   0 31227    0       68999280        56270826        0       20662149        18      6       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126586351756        25      31228   31539   311     12728861        12728454        407     0       10008677        2716541 31228      0       31228   0       68999280        56270826        0       20662149        19      6       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126587374898        25      31228   31539   4621    12728861        10845248        1883613 0       10008677        2847071 26607      4621    31228   0       69835850        58990602        0       20662149        19      6       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126589528536        26      31228   31539   0       12728861        12728861        0       0       10008677        2847071 31228   0 31228    0       71719463        58990602        0       20248906        19      6       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126589530688        26      31229   31540   311     12729269        12728861        408     0       10008677        2847071 31229      0       31229   0       71719463        58990602        0       20248906        20      6       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126590517575        26      31229   31540   4622    12729269        10845240        1884029 0       10008677        2847479 26607      4622    31229   0       72556025        61710785        0       20248906        20      6       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126592712832        27      31229   31540   0       12729269        12729269        0       0       10008677        2847479 31229   0 31229    0       74440054        61710785        0       19843927        20      6       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126592715386        27      31230   31541   311     12729677        12729269        408     0       10008677        2847479 31230      0       31230   0       74440054        61710785        0       19843927        21      6       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126593675368        27      31230   31541   4623    12729677        10845232        1884445 0       10008677        2847887 26607      4623    31230   0       75276608        64431376        0       19843927        21      6       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126595912772        28      31230   31541   0       12729677        12729677        0       0       10008677        2847887 31230   0 31230    0       77161053        64431376        0       19447048        21      6       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126595915096        28      31230   31541   0       12729677        12729677        0       0       10008677        2847887 31230      0       31230   0       77161053        64431376        0       19447048        22      6       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126596550020        28      31230   31541   3948    12729677        10564079        2165598 0       10008677        2720999 27282      3948    31230   0       77716454        67152375        0       19447048        22      6       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126596550695        29      31230   31541   3948    12729677        10564079        2165598 0       10008677        2720999 27282   3948       31230   0       77716454        67152375        0       19058107        22      6       173     346     10008483        20016966        0 16777216 nofree  newobj  0       0       none
end_mark        1424126599572099        29      31231   31542   311     12730084        12729677        407     0       10012627        2720999 31231      0       31231   0       79882052        67152375        0       19058107        22      7       173     346     10008483        20016966  016777216        nofree  newobj  0       0       marking
end_sweep       1424126601548503        29      31541   31542   1973    12856442        12052174        804268  0       10012627        2844344 29568      1973    31541   0       81921598        69869424        0       19058107        22      7       173     346     10008483        20016966  016777216        nofree  newobj  0       0       sweeping
start   1424126602263195        30      31541   31542   0       12856442        12856442        0       0       10012627        2844344 31541   0 31541    0       82725866        69869424        0       18676944        22      7       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126602265341        30      31542   31856   314     12856849        12856442        407     0       10008677        2844344 31542      0       31542   0       82725866        69869424        0       18676944        23      7       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126603175460        30      31542   31856   4854    12856849        10878266        1978583 0       10008677        2976283 26688      4854    31542   0       83595454        72717188        0       18676944        23      7       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126605113748        31      31542   31856   0       12856849        12856849        0       0       10008677        2976283 31542   0 31542    0       85574037        72717188        0       18303405        23      7       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126605115927        31      31543   31857   314     12857257        12856849        408     0       10008677        2976283 31543      0       31543   0       85574037        72717188        0       18303405        24      7       173     346     10008483        20016966  016777216        0       newobj  0       0       marking
end_sweep       1424126605955606        31      31543   31857   4855    12857257        10878238        1979019 0       10008677        2976691 26688      4855    31543   0       86443597        75565359        0       18303405        24      7       173     346     10008483        20016966  016777216        0       newobj  0       0       sweeping
start   1424126607549599        32      31543   31857   0       12857257        12857257        0       0       10008677        2976691 31543   0 31543    0       88422616        75565359        0       17937336        24      7       173     346     10008483        20016966        0       16777216   0       newobj  0       0       none
end_mark        1424126607551824        32      31544   31858   314     12857664        12857257        407     0       10008677        2976691 31544      0       31544   0       88422616        75565359        0       17937336        25      7       173     346     10008483        20016966  016777216        0       newobj  0       0       marking

CPU: 114.08s    Real: 114.44s   RAM: 672072KB

It is hard to read. So I uploaded google apps sheet (https://docs.google.com/spreadsheets/d/1lVbgPWdumC19UMm-n_VxcsRUR9RR6G_E0gVTBxscvzs/edit?usp=sharing).

Introducing first colums:

  • type Event type
  • tick Phisical time
  • count GC count

Event types means:

  • start Beggining of GC
  • end_mark End of making for this GC
  • end_sweep End of sweeping for this GC == End of GC

Btw, there are other events.

Let's see heap_marked_slots numbers. This numbers show how many objects are marked, in other words, numbers of living objects. They are stable for 10.0M. However, the numbers of heap_available_slots, this numbers show how many slots are available, are increasing. Why?

Actually, Ruby's GC extends heap if there are no enough free slots. "Enough free slots" is available slots * 0.3. Let's check with real numbers.

  • available: 11988251
  • marked: 10008677
  • free slots: 11988251 - 10008677 = 1979574
  • free slot ratio: 1979574/11988251 = 0.16...

So it is less than 0.3. So that Ruby's GC decide to extend heap. Extending by RUBY_GC_HEAP_GROWTH_FACTOR=1.01, so that available slots are increased from 11988659 to 12107679 for one GC. It is x1.009, almost x1.01.

To prevent heap extension, we need to prepare available slots * 0.7 > marked objects => available slots > marked objects / 0.7 = marked objects * 1.43. Marked objects are stable with 10.1M objects. So we need to prepare, x1.5 (a bit bigger than x1.43) = 15.15M.

Let's try.

$ RUBY_GC_HEAP_INIT_SLOTS=15150000 /usr/bin/time -f "\nCPU: %Us\tReal: %es\tRAM: %MKB" ruby ../../trunk/test.rb...
CPU: 132.72s    Real: 133.03s   RAM: 770084KB

Checking GC trace result, there are no extension for heap. Congrats!


Wait. 770MB seems too big.

  • Available slots: 15.15M * 40B = 606MB
  • 10M Array = 10M * 8B = 80MB
  • Total: 686MB.

Where remaining 80MB comes from?


Let's check using [valgrind] (http://valgrind.org/). It is very powerful tool to understand the behavior of program. There is a tool named massif to check memory consumption. Let's use it.

Using valgrind/massif (and using my own esoteric script), we can get the following figure. Btw, current ruby doesn't run on valgrind without a trick.

[Fig. result of massif] (http://www.atdot.net/fp_store/f.h4owjn/file.copipa-temp-image.png)

X-axis shows time and Y-axis shows memory consumption (stacking line chart).

Analysis:

We need to check maximum value. Bingo! 600MB + 80MB + 80MB = 760MB, it is almost equals to our previous evaluation.

What is last "gc_mark@gc.c consumes 80 MB in maximum, 50 MB in average"?


Using my own script, we can get more.

[Fig. result of massif / details] (http://www.atdot.net/fp_store/f.foowjn/file.copipa-temp-image.png)

We can find that the function stack_chunk_alloc() consumes 50MB to 80MB. This function is allocator function for marking stack to process marking phase. It is surprising result.

The following pseudo code shows how marking stack used.

def mark_all
  mark_stack = [root_set...]
  while obj = mark_stack.pop
    mark(obj)
    obj.reachable_object{|refered_by_obj|
      mark_stack.push refered_by_obj unless refered_by_obj.marked?
    }
  end
end

In this case, we need to mark an array object contains 10M entries. In this time, 10M entries (strings) are pushed onto the marking stack. Marking stack size will be 8B * 10M = 80MB. It is measured value.

We can solve all of mysteries.


This is demostration how to survey the behavior of Ruby programs. You can use several tools. If you have a trouble, please consider to use them. If you ask us, please give us information, at least, gc_tracer log.

Thanks, Koichi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment