Skip to content

Instantly share code, notes, and snippets.

@NickLaMuro
Last active November 7, 2019 02:38
Show Gist options
  • Save NickLaMuro/3b54c57e1ffeb21316f38529fa93ea67 to your computer and use it in GitHub Desktop.
Save NickLaMuro/3b54c57e1ffeb21316f38529fa93ea67 to your computer and use it in GitHub Desktop.
Testing interval sizes (and hang cases) with Stackprof
# Uncomment below if using this script within the stackprof repo
#
# stackprof_path = File.expand_path File.join("..", "lib"), __FILE__
# $LOAD_PATH.unshift(stackprof_path) unless $LOAD_PATH.include?(stackprof_path)
#
require 'stackprof'
def recurse(num = 1, depth = 1)
if depth == 900
num
else
recurse(num * 2, depth + 1)
end
rescue Interrupt
puts ">>>>> SIGINT: Caller length in #{Process.pid} is #{caller.count}"
puts
exit
end
def test_stackprof test_num, desc, interval = 1000, sleep_length = 5
5.times do |i|
puts "Test #{test_num}, Iteration #{i+1}: #{desc}"
pid = fork do
result = StackProf.run(mode: :wall, interval: interval) {recurse}
puts "child process #{Process.pid} done!"
puts
StackProf::Report.new(result).print_text false, 5
puts
end
sleep sleep_length
# kill the process if needed
begin
puts "killing process #{pid}..."
Process.kill "INT", pid # send SIGINT to pid (if it exists)
Process.wait pid # wait for it to finish "killing"
rescue Errno::ESRCH
# process didn't exist with first `Process.kill`, so move on
end
end
end
# Tests
puts "Current PID: #{Process.pid}"
test_stackprof "1", "Default interval"
test_stackprof "2", "1 micro second inverval, wait 5 seconds", 1
test_stackprof "3", "1 micro second inverval, wait 30 seconds", 1, 30

Run 1: Only using Tests 1 and 2

$ ruby hang_test.rb
Current PID:  12918
Test 1, Iteration 1:  Default interval
child process 12919 done!
killing process 12919...
Test 1, Iteration 2:  Default interval
child process 12920 done!
killing process 12920...
Test 1, Iteration 3:  Default interval
child process 12992 done!
killing process 12992...
Test 1, Iteration 4:  Default interval
child process 12993 done!
killing process 12993...
Test 1, Iteration 5:  Default interval
child process 12996 done!
killing process 12996...
Test 2, Iteration 1:  1 micro second inverval, wait 5 seconds
killing process 13032...
>>>>> SIGINT: Caller length in 13032 is 658
child process 13032 done!
Test 2, Iteration 2:  1 micro second inverval, wait 5 seconds
killing process 13035...
>>>>> SIGINT: Caller length in 13035 is 541
child process 13035 done!
Test 2, Iteration 3:  1 micro second inverval, wait 5 seconds
killing process 13039...
>>>>> SIGINT: Caller length in 13039 is 550
child process 13039 done!
Test 2, Iteration 4:  1 micro second inverval, wait 5 seconds
killing process 13074...
>>>>> SIGINT: Caller length in 13074 is 771
child process 13074 done!
Test 2, Iteration 5:  1 micro second inverval, wait 5 seconds
killing process 13077...
>>>>> SIGINT: Caller length in 13077 is 555
child process 13077 done!

Run 2: Add test 3 (commented out tests 1 and 2)

$ ruby hang_test.rb
Current PID:  13575
Test 3, Iteration 1:  1 micro second inverval, wait 30 seconds
killing process 13576...
>>>>> SIGINT: Caller length in 13576 is 890
child process 13576 done!
Test 3, Iteration 2:  1 micro second inverval, wait 30 seconds
killing process 13725...
>>>>> SIGINT: Caller length in 13725 is 904
child process 13725 done!
Test 3, Iteration 3:  1 micro second inverval, wait 30 seconds
killing process 13805...
>>>>> SIGINT: Caller length in 13805 is 902
child process 13805 done!
Test 3, Iteration 4:  1 micro second inverval, wait 30 seconds
killing process 13918...
>>>>> SIGINT: Caller length in 13918 is 895
child process 13918 done!
Test 3, Iteration 5:  1 micro second inverval, wait 30 seconds
killing process 13995...
>>>>> SIGINT: Caller length in 13995 is 871
child process 13995 done!

Run 3: Drop depth to 900

$ ruby hang_test.rb
Current PID:  14222
Test 3, Iteration 1:  1 micro second inverval, wait 30 seconds
child process 14223 done!
killing process 14223...
Test 3, Iteration 2:  1 micro second inverval, wait 30 seconds
killing process 14300...
>>>>> SIGINT: Caller length in 14300 is 885
child process 14300 done!
Test 3, Iteration 3:  1 micro second inverval, wait 30 seconds
killing process 14413...
>>>>> SIGINT: Caller length in 14413 is 863
child process 14413 done!
Test 3, Iteration 4:  1 micro second inverval, wait 30 seconds
child process 14490 done!
killing process 14490...
Test 3, Iteration 5:  1 micro second inverval, wait 30 seconds
killing process 14567...
>>>>> SIGINT: Caller length in 14567 is 875
child process 14567 done!

Run 4: Extra debugging output added

$ ruby hang_test.rb
Current PID:  32622
Test 3, Iteration 1:  1 micro second inverval, wait 30 seconds
child process 32623 done!

==================================
  Mode: wall(1)
  Samples: 427083 (88.31% miss rate)
  GC: 241 (0.06%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
    426842  (99.9%)      426842  (99.9%)     Object#recurse
       241   (0.1%)         241   (0.1%)     (garbage collection)
    426842  (99.9%)           0   (0.0%)     Object#test_stackprof
    426842  (99.9%)           0   (0.0%)     <main>
    426842  (99.9%)           0   (0.0%)     <main>

killing process 32623...
Test 3, Iteration 2:  1 micro second inverval, wait 30 seconds
child process 32737 done!

==================================
  Mode: wall(1)
  Samples: 367676 (84.44% miss rate)
  GC: 333 (0.09%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
    367343  (99.9%)      367343  (99.9%)     Object#recurse
       333   (0.1%)         333   (0.1%)     (garbage collection)
    367343  (99.9%)           0   (0.0%)     Object#test_stackprof
    367343  (99.9%)           0   (0.0%)     <main>
    367343  (99.9%)           0   (0.0%)     <main>

killing process 32737...
Test 3, Iteration 3:  1 micro second inverval, wait 30 seconds
killing process 32814...
>>>>> SIGINT: Caller length in 32814 is 892

Test 3, Iteration 4:  1 micro second inverval, wait 30 seconds
child process 32931 done!

==================================
  Mode: wall(1)
  Samples: 512515 (90.01% miss rate)
  GC: 231 (0.05%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
    512284 (100.0%)      512284 (100.0%)     Object#recurse
       231   (0.0%)         231   (0.0%)     (garbage collection)
    512284 (100.0%)           0   (0.0%)     Object#test_stackprof
    512284 (100.0%)           0   (0.0%)     <main>
    512284 (100.0%)           0   (0.0%)     <main>

killing process 32931...
Test 3, Iteration 5:  1 micro second inverval, wait 30 seconds
killing process 33008...
>>>>> SIGINT: Caller length in 33008 is 839

Note: I ended up capturing these samples while also testing out the following patch:

diff --git a/ext/stackprof/stackprof.c b/ext/stackprof/stackprof.c
index ae3467d..f434d2c 100644
--- a/ext/stackprof/stackprof.c
+++ b/ext/stackprof/stackprof.c
@@ -29,6 +29,7 @@ static struct {
     int running;
     int raw;
     int aggregate;
+    int in_signal_handler;
 
     VALUE mode;
     VALUE interval;
@@ -78,6 +79,8 @@ stackprof_start(int argc, VALUE *argv, VALUE self)
     if (_stackprof.running)
 	return Qfalse;
 
+    _stackprof.in_signal_handler = 0;
+
     rb_scan_args(argc, argv, "0:", &opts);
 
     if (RTEST(opts)) {
@@ -536,30 +539,27 @@ stackprof_record_gc_samples()
 static void
 stackprof_gc_job_handler(void *data)
 {
-    static int in_signal_handler = 0;
-    if (in_signal_handler) return;
     if (!_stackprof.running) return;
 
-    in_signal_handler++;
     stackprof_record_gc_samples();
-    in_signal_handler--;
+    _stackprof.in_signal_handler--;
 }
 
 static void
 stackprof_job_handler(void *data)
 {
-    static int in_signal_handler = 0;
-    if (in_signal_handler) return;
     if (!_stackprof.running) return;
 
-    in_signal_handler++;
     stackprof_record_sample();
-    in_signal_handler--;
+    _stackprof.in_signal_handler--;
 }
 
 static void
 stackprof_signal_handler(int sig, siginfo_t *sinfo, void *ucontext)
 {
+    if (_stackprof.in_signal_handler) return;
+
+    _stackprof.in_signal_handler++;
     _stackprof.overall_signals++;
     if (rb_during_gc()) {
 	_stackprof.unrecorded_gc_samples++;

As a result, their actually were more samples in the output then after I reverted it an tried again, but that could have been from some bad coding on my part. Effectively I was just checking to see if wrapping the rb_postponed_job_register_one with a global _stackprof.in_signal_handler would have any effect on the process hanging.

Answer... no, sadly.

Run 5: More runs with debugging output, patch removed

$ ruby hang_test.rb
Current PID:  37574
Test 3, Iteration 1:  1 micro second inverval, wait 30 seconds
child process 37575 done!

==================================
  Mode: wall(1)
  Samples: 301107 (84.07% miss rate)
  GC: 176 (0.06%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
    300931  (99.9%)      300931  (99.9%)     Object#recurse
       176   (0.1%)         176   (0.1%)     (garbage collection)
    300931  (99.9%)           0   (0.0%)     Object#test_stackprof
    300931  (99.9%)           0   (0.0%)     <main>
    300931  (99.9%)           0   (0.0%)     <main>

killing process 37575...
Test 3, Iteration 2:  1 micro second inverval, wait 30 seconds
killing process 37688...
>>>>> SIGINT: Caller length in 37688 is 899

Test 3, Iteration 3:  1 micro second inverval, wait 30 seconds
child process 37804 done!

==================================
  Mode: wall(1)
  Samples: 205186 (85.94% miss rate)
  GC: 246 (0.12%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
    204940  (99.9%)      204940  (99.9%)     Object#recurse
       246   (0.1%)         246   (0.1%)     (garbage collection)
    204940  (99.9%)           0   (0.0%)     Object#test_stackprof
    204940  (99.9%)           0   (0.0%)     <main>
    204940  (99.9%)           0   (0.0%)     <main>

killing process 37804...
Test 3, Iteration 4:  1 micro second inverval, wait 30 seconds
killing process 37881...
>>>>> SIGINT: Caller length in 37881 is 852

Test 3, Iteration 5:  1 micro second inverval, wait 30 seconds
child process 38259 done!

==================================
  Mode: wall(1)
  Samples: 305836 (88.43% miss rate)
  GC: 189 (0.06%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
    305647  (99.9%)      305647  (99.9%)     Object#recurse
       189   (0.1%)         189   (0.1%)     (garbage collection)
    305647  (99.9%)           0   (0.0%)     Object#test_stackprof
    305647  (99.9%)           0   (0.0%)     <main>
    305647  (99.9%)           0   (0.0%)     <main>

killing process 38259...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment