I recently came across the stackprof gem for Ruby 2.1.0 and decided to give it a spin by analyzing a webgen run of the webgen website.
StackProf is a sampling call-stack profiler like Google perftools, but built using only functionality available in Ruby 2.1 itself. It is very fast; the overhead is barely noticeable.
The webgen website is probably the most complex webgen website I currently use. It uses all (or nearly all) features of webgen and is therefore perfect for the task.
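For readers who have not used it, here is a minimal sketch of how a block of code can be profiled with StackProf. The `busy_work` method is a hypothetical stand-in for the real workload (it is not how webgen was actually profiled), and the sketch simply runs the workload unprofiled if the gem is not installed.

```ruby
begin
  require 'stackprof'
rescue LoadError
  # The stackprof gem is optional for this sketch; see the fallback below.
end

# Hypothetical CPU-bound workload standing in for a real webgen run.
def busy_work
  (1..200_000).reduce(0) { |sum, i| sum + i.to_s.length }
end

if defined?(StackProf)
  # Sample the call stack in :cpu mode; interval is given in microseconds.
  profile = StackProf.run(mode: :cpu, interval: 1000) { busy_work }
  puts "mode=#{profile[:mode]} samples=#{profile[:samples]}" if profile
else
  busy_work
  puts 'stackprof is not installed; ran the workload without profiling'
end
```

Passing `out: 'some/file.dump'` to `StackProf.run` writes the results to a file instead, which can then be inspected with the `stackprof` command-line tool that ships with the gem.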
Here are some numbers on pre-optimization performance:
ruby 2.0.0p247 38.50sec
ruby 2.1.0p0 31.00sec
What astonished me was that Ruby 2.1 is much faster than Ruby 2.0 when running webgen (31.00 versus 38.50 seconds, about 19% less time). This may be due to the new garbage collector, since webgen creates quite a lot of objects.
After using StackProf to find the culprits that took the most time and optimizing them, overall performance is much better:
ruby 2.0.0p247 33.08sec
ruby 2.1.0p0 26.70sec
So we shaved off between 4.3 and 5.4 seconds (about 14%), which is quite good!
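The savings can be re-derived from the four timings above. The following is just a small arithmetic sketch; the `TIMINGS` constant and the `savings` helper are names made up for illustration.

```ruby
# Timings in seconds, copied from the measurements above.
TIMINGS = {
  'ruby 2.0.0p247' => { before: 38.50, after: 33.08 },
  'ruby 2.1.0p0'   => { before: 31.00, after: 26.70 }
}.freeze

# Returns [seconds saved, percentage saved relative to the original time].
def savings(before, after)
  saved = before - after
  [saved.round(2), (saved / before * 100).round(1)]
end

TIMINGS.each do |ruby, t|
  saved, percent = savings(t[:before], t[:after])
  puts format('%-15s saved %.2fs (%.1f%%)', ruby, saved, percent)
end
```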
The main problem was the uri standard library that is used in central parts of webgen (Webgen::Path and Webgen::Node). By caching some expensive operations and optimizing some other parts we achieved this performance gain without much fuss.
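To illustrate the general idea of caching an expensive uri operation, here is a minimal memoization sketch. It is not webgen's actual code: the `CachedURI` module is hypothetical, and it assumes that parsing the same string over and over is the expensive part.

```ruby
require 'uri'

# Hypothetical helper (not part of webgen): parses each distinct string once
# and hands out the same URI object on every later request.
module CachedURI
  @cache = {}
  @misses = 0

  class << self
    # Number of times URI.parse actually had to run.
    attr_reader :misses

    def parse(string)
      @cache.fetch(string) do
        @misses += 1
        # Cached objects are shared, so callers must not mutate them.
        @cache[string] = URI.parse(string)
      end
    end
  end
end

3.times { CachedURI.parse('/documentation/index.html') }
puts CachedURI.misses
```

The trade-off is memory for speed: the cache grows with the number of distinct strings, which is acceptable when the same paths are requested many times during one run.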
Another speed-up came from overriding an as-yet unreleased method in the kramdown library to cache the normalized versions of pre-defined link definitions. Without caching, 832 link definitions times 115 pages results in 95680 normalizations; with caching only 832 are needed, a 115-fold reduction.
StackProf was quite useful for finding poorly performing methods. I had used ruby-prof before, but for the use case at hand StackProf was better and faster for me.
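The effect of this kind of caching can be sketched as follows. This is not kramdown's API or the actual override: the `LinkDefinitionCache` class, the `sample_link_defs` helper, and the normalization rule (collapse whitespace, downcase) are assumptions made purely for illustration.

```ruby
# Hypothetical cache: normalizes all pre-defined link definitions once and
# reuses the result for every page that is rendered afterwards.
class LinkDefinitionCache
  attr_reader :normalize_calls

  def initialize(link_defs)
    @link_defs = link_defs
    @normalize_calls = 0
    @normalized = nil
  end

  # Builds the normalized Hash on first use, then returns the same Hash.
  def normalized
    @normalized ||= @link_defs.each_with_object({}) do |(id, target), result|
      result[normalize_id(id)] = target
    end
  end

  private

  # Assumed normalization rule, standing in for the real one.
  def normalize_id(id)
    @normalize_calls += 1
    id.strip.gsub(/\s+/, ' ').downcase
  end
end

# Hypothetical sample data: `count` link definitions with untidy IDs.
def sample_link_defs(count)
  (1..count).map { |i| ["Link  #{i}", "/target/#{i}.html"] }.to_h
end

cache = LinkDefinitionCache.new(sample_link_defs(832))
115.times { cache.normalized } # one lookup per page
puts cache.normalize_calls
```

With 832 definitions and 115 pages, the counter stays at 832 instead of reaching 95680.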