View short_story.md

Starting with version 2.2.0 (including its previews), Ruby (MRI) can cause your server to crash with an OOM. My long, months-long saga of hunting a mysterious bug, which I will describe in detail in the next article, ended with me writing a patch for Ruby. For now I would like to explain the essence of this bug and of the patch.

I would not have found anything without the help of my friends: Ravil Bayramgalin, who walked the whole path with me, from analyzing the first crash graphs to introducing me to all the hidden subtleties of debugging Ruby applications, and Vladimir Menshakov, who can do whatever he wants in C and helped me with the debugging on the C side of running Ruby applications.

View main.rb
require 'sidekiq/cli'
require 'sidekiq/api'
require 'celluloid'
puts Sidekiq::Stats.new.queues
Sidekiq::Queue.new('default').clear
Sidekiq::RetrySet.new.clear
module Count
View script.sh
# Sort the first 50 lines of the dump by memsize (replace `head -50` with `cat` to process the whole file)
head -50 input.json | jq --slurp '[.[] | {address: .address, type: .type, file: .file, line: .line, method: .method, memsize: .memsize }] | sort_by(.memsize) | map([.address, .type, .file, "\(.line)", .method, "\(.memsize)"] | join(" ")) | reverse' | head -20
# Group objects by type-class-struct-name, count them, count sum of their memsize and sort them by count
# Useful when you didn't trace allocations
cat input_file | jq --slurp '[.[] | {type: "\(.type) \(.class) \(.struct) \(.name)", memsize: .memsize}] | group_by(.type) | map(reduce .[] as $item ({type: .[0].type, memsize: .[0].memsize, count: 0}; {type: .type, memsize: (.memsize + $item.memsize), count: (.count + 1)})) | sort_by(.count) | map(["\(.count)", "\(.memsize)", .type] | join(" : ")) | reverse' > output_file
# Group objects by allocation file-line, count them and their sum of memsize and sort by memsize
cat input_file | jq --slurp '[.[] | {address: .address, type:
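The jq commands above read a heap dump with one JSON object per line. A minimal sketch of producing such a file, assuming the input comes from MRI's `objspace` extension (the file name `heap_dump.json` and the `sample` allocations are hypothetical, chosen only for illustration):

```ruby
require 'objspace'
require 'json'
require 'tmpdir'

# Record file/line/method for every allocation made from here on,
# so the dump carries the fields that the jq scripts group by.
ObjectSpace.trace_object_allocations_start

# Hypothetical workload: allocate some strings to have traced objects.
sample = Array.new(100) { |i| "traced string number #{i}" }

# Write the whole heap as JSON lines (one object per line).
dump_path = File.join(Dir.tmpdir, 'heap_dump.json')
File.open(dump_path, 'w') { |f| ObjectSpace.dump_all(output: f) }

ObjectSpace.trace_object_allocations_stop

# Each line is an independent JSON document.
entries = File.foreach(dump_path).map { |line| JSON.parse(line) }
traced  = entries.select { |e| e.key?('file') }
puts "dumped #{entries.size} objects, #{traced.size} with allocation info"
```

Without `trace_object_allocations_start` the `file`, `line`, and `method` fields are absent, which is the case the type-based grouping command is meant for.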
View Why you should upgrade Celluloid ASAP to 0.17.2 (or downgrade to 0.16.0).md

Long story short, Celluloid versions 0.17+ have a memory leak.

The reason is that completed Celluloid threads are never cleaned up.

We discovered that our Sidekiq process leaks memory when many tasks fail because of exceptions. Unfortunately, a large number of failed tasks is typical of our application: we have a lot of small queued jobs that work with social network APIs and other external services.

You can reproduce the problem with this: https://gist.github.com/gazay/3aa78e515ab05cb79f76
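Independently of the linked reproduction, one rough way to check whether finished threads are being retained is to compare the number of `Thread` objects still on the heap with the number that are alive. The sketch below does not use Celluloid; the `RETAINED` array is a hypothetical stand-in for any structure that keeps references to completed threads.

```ruby
require 'objspace'

# Count Thread objects that survive a GC run, split into alive and dead.
# Dead threads that are still counted here are being referenced by something.
def thread_stats
  GC.start # drop unreferenced threads first, so only retained ones remain
  all = ObjectSpace.each_object(Thread).to_a
  alive = all.count(&:alive?)
  { total: all.size, alive: alive, dead: all.size - alive }
end

# Hypothetical stand-in for a pool that never releases finished threads.
RETAINED = []
10.times { RETAINED << Thread.new { :done } }
RETAINED.each(&:join)

stats = thread_stats
puts stats.inspect
```

If the `dead` count keeps growing while jobs are processed, something is holding on to finished threads, and their memory cannot be reclaimed.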

View THE_PROBLEM.md

If you run this with `ruby test.rb` under MRI or JRuby, or in irb under MRI, the result is: true

If you run it in irb under JRuby, the result is: false

I think it may be related to the following failing Rails tests, but I'm not sure:

  3) Error:
AttrInternalTest#test_naming_format:
NoMethodError: private method `foo=' called for #<#<Class:0x64ea9235>:0x6af12899>
View Errors in rails tests
___________________1__________________
3:activesupport:[master ✗]$ bundle exec rake test --trace
uri:classloader:/jruby/kernel/kernel.rb:28: warning: unsupported exec option: close_others
** Invoke test (first_time)
** Execute test
/Users/alex/code/opensource/jruby/bin/jruby -w -I"lib:test" --dev -I"/Users/alex/.gem/jruby/2.2.2/gems/rake-10.4.2/lib" "/Users/alex/.gem/jruby/2.2.2/gems/rake-10.4.2/lib/rake/rake_test_loader.rb" "test/**/*_test.rb"
/Users/alex/.gem/jruby/2.2.2/gems/minitest-5.3.3/lib/minitest.rb:46: warning: (...) interpreted as grouped expression
uri:classloader:/jruby/bigdecimal.rb:1: warning: loading in progress, circular require considered harmful - bigdecimal.jar
require at org/jruby/RubyKernel.java:966
View another_strange_behavior
0:activesupport:[no_fork_issue]$ jruby -v
jruby 9.0.0.0-SNAPSHOT (2.2.2) 2015-04-21 891f12e Java HotSpot(TM) 64-Bit Server VM 25.31-b07 on 1.8.0_31-b13 +jit [darwin-x86_64]
0:activesupport:[no_fork_issue]$ irb
irb(main):001:0> IO.popen({'PATH' => 'some'}, 'echo $PATH').read
=> "some\n"
[ERROR] Failed to disable echo
java.io.IOException: Cannot run program "sh": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at java.lang.Runtime.exec(Runtime.java:620)
at java.lang.Runtime.exec(Runtime.java:485)
View output
0:activesupport:[no_fork_issue]$ jruby -v
jruby 9.0.0.0-SNAPSHOT (2.2.2) 2015-04-21 891f12e Java HotSpot(TM) 64-Bit Server VM 25.31-b07 on 1.8.0_31-b13 +jit [darwin-x86_64]
0:activesupport:[no_fork_issue]$ jruby -S gem install bundler
RuntimeError:
you might need to reinstall the gem which depends on the missing jar or in case there is Jars.lock then JARS_RESOLVE=true will install the missing jars
no such file to load -- org/yaml/snakeyaml/1.14/snakeyaml-1.14 (LoadError)
do_require at /Users/alex/.rubies/jruby-9.0.0.0-SNAPSHOT/lib/ruby/stdlib/jar_dependencies.rb:261
require_jar at /Users/alex/.rubies/jruby-9.0.0.0-SNAPSHOT/lib/ruby/stdlib/jar_dependencies.rb:207
View reproduction_steps.rb
require 'libxml'
require 'nokogiri'
loop do
  nodes = Nokogiri::HTML::DocumentFragment.parse 'z<', 'utf-8'
  node = nodes.children.first
  node.replace node.content
end
View build
androidbuilder@aa3718480a89:~/android-haskell-activity$ arm-linux-androideabi-cabal build --verbose=3
Using internal setup method with build-type Simple and args:
["build","--verbose=3","--builddir=dist","--jobs=8","--with-gcc=/home/androidbuilder/.ghc/android-14/arm-linux-androideabi-4.8/bin/arm-linux-androideabi-gcc","--with-ghc=/home/androidbuilder/.ghc/android-14/arm-linux-androideabi-4.8/bin/arm-unknown-linux-androideabi-ghc","--with-ghc-pkg=/home/androidbuilder/.ghc/android-14/arm-linux-androideabi-4.8/bin/arm-unknown-linux-androideabi-ghc-pkg","--with-hsc2hs=/home/androidbuilder/.ghc/android-14/arm-linux-androideabi-4.8/bin/arm-unknown-linux-androideabi-hsc2hs","--with-ld=/home/androidbuilder/.ghc/android-14/arm-linux-androideabi-4.8/bin/arm-linux-androideabi-ld","--with-strip=/home/androidbuilder/.ghc/android-14/arm-linux-androideabi-4.8/bin/arm-linux-androideabi-strip","--hsc2hs-option=--cross-compile"]
("/home/androidbuilder/.ghc/android-14/arm-linux-androideabi-4.8/bin/arm-linux-androideabi-gcc",["