Skip to content

Instantly share code, notes, and snippets.

@coderplay
Last active April 11, 2018 15:56
Show Gist options
  • Save coderplay/c2e2dd89e33c302b581716f5758245c4 to your computer and use it in GitHub Desktop.
Save coderplay/c2e2dd89e33c302b581716f5758245c4 to your computer and use it in GitHub Desktop.
user/kernel space c++ method tracing

frequent YGC.

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 535K of event 'cycles'
# Event count (approx.): 1819606527710
#
# Children      Self  Command         Shared Object          Symbol
# ........  ........  ..............  .....................  ............................................................................................................
#
    25.03%     0.00%  java            libjvm.so              [.] java_start
            |
            ---java_start
               |
               |--13.78%--GangWorker::loop
               |          |
               |           --13.56%--G1ParTask::work
               |                     |
               |                     |--10.38%--G1ParEvacuateFollowersClosure::do_void
               |                     |          |
               |                     |          |--8.76%--G1ParScanThreadState::trim_queue
               |                     |          |          |
               |                     |          |           --4.66%--G1ParScanThreadState::copy_to_survivor_space
               |                     |          |                     |
               |                     |          |                     |--1.81%--__memcpy_ssse3_back
               |                     |          |                     |
               |                     |          |                      --0.81%--InstanceKlass::oop_oop_iterate_backwards_nv
               |                     |          |
               |                     |           --1.26%--GenericTaskQueueSet<OverflowTaskQueue<StarTask, (MemoryType)5, 131072u>, (MemoryType)5>::steal_best_of_2
               |                     |
               |                      --3.17%--G1RootProcessor::scan_remembered_sets
               |                                G1RemSet::oops_into_collection_set_do
               |                                |
               |                                |--1.99%--G1RemSet::scanRS
               |                                |          |
               |                                |           --1.99%--G1CollectedHeap::collection_set_iterate_from
               |                                |                     |
               |                                |                      --1.99%--ScanRSClosure::doHeapRegion
               |                                |                                |
               |                                |                                |--1.15%--DirtyCardToOopClosure::do_MemRegion
               |                                |                                |          |
               |                                |                                |           --0.51%--HeapRegionDCTOC::walk_mem_region
               |                                |                                |
               |                                |                                 --0.67%--HeapRegionRemSetIterator::has_next
               |                                |
               |                                 --1.18%--G1CollectedHeap::iterate_dirty_card_closure
               |                                           |
               |                                            --1.16%--DirtyCardQueueSet::apply_closure_to_completed_buffer
               |                                                      |
               |                                                       --1.16%--DirtyCardQueueSet::apply_closure_to_completed_buffer_helper
               |                                                                 |
               |                                                                  --1.16%--RefineRecordRefsIntoCSCardTableEntryClosure::do_card_ptr
               |                                                                            |
               |                                                                             --1.14%--G1RemSet::refine_card
               |                                                                                       |
               |                                                                                        --0.80%--HeapRegion::oops_on_card_seq_iterate_careful
               |
                --11.25%--ConcurrentG1RefineThread::run
                          |
                           --11.25%--DirtyCardQueueSet::apply_closure_to_completed_buffer
                                     |
                                      --11.24%--DirtyCardQueueSet::apply_closure_to_completed_buffer_helper
                                                |
                                                 --11.22%--RefineCardTableEntryClosure::do_card_ptr
                                                           |
                                                            --11.06%--G1RemSet::refine_card
                                                                      |
                                                                       --9.87%--HeapRegion::oops_on_card_seq_iterate_careful
                                                                                 |
                                                                                 |--3.96%--ObjArrayKlass::oop_oop_iterate_nv_m
                                                                                 |          |
                                                                                 |           --3.39%--OtherRegionsTable::add_reference
                                                                                 |
                                                                                 |--2.33%--G1BlockOffsetArray::forward_to_block_containing_addr_slow
                                                                                 |          |
                                                                                 |           --1.21%--HeapRegion::block_size
                                                                                 |                     |
                                                                                 |                      --0.51%--HeapRegion::block_is_obj
                                                                                 |
                                                                                 |--1.97%--InstanceKlass::oop_oop_iterate_nv
                                                                                 |          |
                                                                                 |           --1.29%--OtherRegionsTable::add_reference
                                                                                 |
                                                                                  --0.67%--G1BlockOffsetArrayContigSpace::block_start_unsafe

    25.03%     0.00%  java            libpthread-2.17.so     [.] start_thread
            |
            ---start_thread
               java_start
               |
               |--13.78%--GangWorker::loop
               |          |
               |           --13.56%--G1ParTask::work
               |                     |
               |                     |--10.38%--G1ParEvacuateFollowersClosure::do_void
               |                     |          |
               |                     |          |--8.76%--G1ParScanThreadState::trim_queue
               |                     |          |          |
               |                     |          |           --4.66%--G1ParScanThreadState::copy_to_survivor_space
               |                     |          |                     |
               |                     |          |                     |--1.81%--__memcpy_ssse3_back
               |                     |          |                     |
               |                     |          |                      --0.81%--InstanceKlass::oop_oop_iterate_backwards_nv
               |                     |          |
               |                     |           --1.26%--GenericTaskQueueSet<OverflowTaskQueue<StarTask, (MemoryType)5, 131072u>, (MemoryType)5>::steal_best_of_2
               |                     |
               |                      --3.17%--G1RootProcessor::scan_remembered_sets
               |                                G1RemSet::oops_into_collection_set_do
               |                                |
               |                                |--1.99%--G1RemSet::scanRS
               |                                |          |
               |                                |           --1.99%--G1CollectedHeap::collection_set_iterate_from
               |                                |                     |
               |                                |                      --1.99%--ScanRSClosure::doHeapRegion
               |                                |                                |
               |                                |                                |--1.15%--DirtyCardToOopClosure::do_MemRegion
               |                                |                                |          |
               |                                |                                |           --0.51%--HeapRegionDCTOC::walk_mem_region
               |                                |                                |
               |                                |                                 --0.67%--HeapRegionRemSetIterator::has_next
               |                                |
               |                                 --1.18%--G1CollectedHeap::iterate_dirty_card_closure
               |                                           |
               |                                            --1.16%--DirtyCardQueueSet::apply_closure_to_completed_buffer
               |                                                      |
               |                                                       --1.16%--DirtyCardQueueSet::apply_closure_to_completed_buffer_helper
               |                                                                 |
               |                                                                  --1.16%--RefineRecordRefsIntoCSCardTableEntryClosure::do_card_ptr
               |                                                                            |
               |                                                                             --1.14%--G1RemSet::refine_card
               |                                                                                       |
               |                                                                                        --0.80%--HeapRegion::oops_on_card_seq_iterate_careful
               |
                --11.25%--ConcurrentG1RefineThread::run
                          |
                           --11.25%--DirtyCardQueueSet::apply_closure_to_completed_buffer
                                     |
                                      --11.24%--DirtyCardQueueSet::apply_closure_to_completed_buffer_helper
                                                |
                                                 --11.22%--RefineCardTableEntryClosure::do_card_ptr
                                                           |
                                                            --11.06%--G1RemSet::refine_card
                                                                      |
                                                                       --9.87%--HeapRegion::oops_on_card_seq_iterate_careful
                                                                                 |
                                                                                 |--3.96%--ObjArrayKlass::oop_oop_iterate_nv_m
                                                                                 |          |
                                                                                 |           --3.39%--OtherRegionsTable::add_reference
                                                                                 |
                                                                                 |--2.33%--G1BlockOffsetArray::forward_to_block_containing_addr_slow
                                                                                 |          |
                                                                                 |           --1.21%--HeapRegion::block_size
                                                                                 |                     |
                                                                                 |                      --0.51%--HeapRegion::block_is_obj
                                                                                 |
                                                                                 |--1.97%--InstanceKlass::oop_oop_iterate_nv
                                                                                 |          |
                                                                                 |           --1.29%--OtherRegionsTable::add_reference
  ...

Oracle hotspot jvm 1.8.0u144

$ sudo perf probe -v  -x /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so --funcs | head -n 10
(anonymous namespace)::get_global()
ASConcurrentMarkSweepGeneration::cms_size_policy() const
ASConcurrentMarkSweepGeneration::gc_adaptive_policy_counters()
ASConcurrentMarkSweepGeneration::kind()
ASConcurrentMarkSweepGeneration::resize(unsigned long, unsigned long)
ASConcurrentMarkSweepGeneration::short_name() const
ASConcurrentMarkSweepGeneration::shrink_by(unsigned long)
ASConcurrentMarkSweepGeneration::update_counters()
ASConcurrentMarkSweepGeneration::update_counters(unsigned long)
ASConcurrentMarkSweepPolicy::initialize_gc_policy_counters()

function to be traced -- jint JvmtiExport::load_agent_library(AttachOperation* op, outputStream* st)

Check existence

$ sudo perf probe -v  -x /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so --funcs | grep load_agent
JvmtiExport::load_agent_library(AttachOperation*, outputStream*)

First attempt to add probe


 sudo perf probe -x /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so --add 'JvmtiExport::load_agent_library(AttachOperation*, outputStream*)'
Semantic error :There is non-digit char in line number.

...

Try mangled symbol instead

$ readelf -s --wide /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so | grep load_agen
 39207: 000000000079cd00  1389 FUNC    LOCAL  HIDDEN    13 _ZN11JvmtiExport18load_agent_libraryEP15AttachOperationP12outputStream

$ sudo perf probe -x /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so --add '_ZN11JvmtiExport18load_agent_libraryEP15AttachOperationP12outputStream'
Failed to find symbol _ZN11JvmtiExport18load_agent_libraryEP15AttachOperationP12outputStream in /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
  Error: Failed to add events.

3rd attempt, add probe through function address. The address got from readelf or objdump

$ objdump -tT /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so | grep _ZN11JvmtiExport18load_agent_libraryEP15AttachOperationP12outputStream | awk '{print $1}'
000000000079cd00
$ sudo perf probe -x /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so --add '0x000000000079cd00'
Added new event:
  probe_libjvm:abs_79cd00 (on 0x000000000079cd00 in /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so)

You can now use it in all perf tools, such as:

        perf record -e probe_libjvm:abs_79cd00 -aR sleep 1

Now try this event

$ sudo perf record -e probe_libjvm:abs_79cd00 -a -g
Lowering default frequency rate to 750.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
 sudo perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 2  of event 'probe_libjvm:abs_79cd00'
# Event count (approx.): 2
#
# Children      Self  Trace output
# ........  ........  ..............
#
   100.00%   100.00%  (7f304aad6d00)
            |
            ---start_thread
               java_start
               JavaThread::run
               JavaThread::thread_main_inner
               JvmtiExport::load_agent_library

system wide tracing which file is being read

sudo perf probe 'vfs_read file->f_path.dentry->d_name.name:string'
  1. find the address of trace point
objdump -tT  $JAVA_HOME/jre/lib/amd64/server/libjvm.so | grep gc_begin

00000000008982e0 l     F .text  00000000000000ee              .hidden _ZN15GCMemoryManager8gc_beginEbbb
0000000000ac7970 l     F .text  0000000000000006              .hidden _ZN15VM_GC_Operation15notify_gc_beginEb
000000000089baf0 l     F .text  0000000000000079              .hidden _ZN13MemoryService8gc_beginEbbbbb
  1. add tracepoint to perf
sudo perf probe -x /$JAVA_HOME/jre/lib/amd64/server/libjvm.so --add '0x00000000008982e0'
Added new event:
  probe_libjvm:abs_8982e0 (on 0x00000000008982e0 in $JAVA_HOME/jre/lib/amd64/server/libjvm.so)

You can now use it in all perf tools, such as:

        perf record -e probe_libjvm:abs_8982e0 -aR sleep 1

3.start gc_begin event recording

$ sudo perf record -e probe_libjvm:abs_8982e0 -a -g
^C
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.304 MB perf.data (9 samples) ]
  1. show the call stack
$ sudo perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 9  of event 'probe_libjvm:abs_8982e0'
# Event count (approx.): 9
#
# Children      Self  Trace output
# ........  ........  ..............
#
   100.00%   100.00%  (7fa71870d2e0)
            |
            ---start_thread
               java_start
               VMThread::run
               VMThread::loop
               VMThread::evaluate_operation
               VM_Operation::evaluate
               VM_GenCollectForAllocation::doit
               GenCollectorPolicy::satisfy_failed_allocation
               GenCollectedHeap::do_collection
               GCMemoryManager::gc_begin

with debuginfo

sudo perf probe -x /lib64/libc-2.17.so  --add  fwrite
sudo  perf record -e probe_libc:fwrite -a -g  -p $java_pid
$ sudo perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 297  of event 'probe_libc:fwrite'
# Event count (approx.): 297
#
# Children      Self  Trace output
# ........  ........  ..............
#
   100.00%   100.00%  (7f112397bf50)
            |
            ---start_thread
               java_start
               VMThread::run
               VMThread::loop
               |
               |--55.22%--VMThread::evaluate_operation
               |          VM_Operation::evaluate
               |          VM_G1IncCollectionPause::doit
               |          G1CollectedHeap::do_collection_pause_at_safepoint
               |          |
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 297  of event 'probe_libc:fwrite'
# Event count (approx.): 297
#
# Children      Self  Trace output
# ........  ........  ..............
#
   100.00%   100.00%  (7f112397bf50)
            |
            ---start_thread
               java_start
               VMThread::run
               VMThread::loop
               |
               |--55.22%--VMThread::evaluate_operation
               |          VM_Operation::evaluate
               |          VM_G1IncCollectionPause::doit
               |          G1CollectedHeap::do_collection_pause_at_safepoint
               |          |
               |          |--36.36%--G1CollectedHeap::log_gc_footer
               |          |          |
               |          |          |--32.32%--G1GCPhaseTimes::print
               |          |          |          |
               |          |          |          |--16.16%--G1GCParPhasePrinter::print_multi_length
               |          |          |          |          LineBuffer::append_and_print_cr
               |          |          |          |          outputStream::print_cr
               |          |          |          |          _IO_fwrite
               |          |          |          |
               |          |          |          |--9.43%--G1GCPhaseTimes::print_stats
               |          |          |          |          LineBuffer::append_and_print_cr
               |          |          |          |          outputStream::print_cr
               |          |          |          |          _IO_fwrite
               |          |          |          |
               |          |          |          |--5.39%--G1GCPhaseTimes::print_stats
               |          |          |          |          LineBuffer::append_and_print_cr
               |          |          |          |          outputStream::print_cr
               |          |          |          |          _IO_fwrite
               |          |          |          |
               |          |          |           --1.35%--G1GCPhaseTimes::print_stats
               |          |          |                     LineBuffer::append_and_print_cr
               |          |          |                     outputStream::print_cr
               |          |          |                     _IO_fwrite
               |          |          |
               |          |          |--1.35%--outputStream::print_cr
               |          |          |          _IO_fwrite
               |          |          |
               |          |          |--1.35%--G1CollectorPolicy::print_detailed_heap_transition
               |          |          |          outputStream::print
               |          |          |          _IO_fwrite
               |          |          |
               |          |           --1.35%--_IO_fwrite
               |          |
               |          |--9.43%--G1CollectedHeap::log_gc_header
               |          |          |
               |          |          |--8.08%--outputStream::gclog_stamp
               |          |          |          _IO_fwrite
               |          |          |
               |          |           --1.35%--outputStream::print
               |          |                     _IO_fwrite
               |          |
               |          |--6.73%--G1CollectorPolicy::record_collection_pause_start
               |          |          ageTable::compute_tenuring_threshold
               |          |          |
               |          |          |--5.39%--outputStream::print_cr
               |          |          |          _IO_fwrite
               |          |          |
               |          |           --1.35%--_IO_fwrite
               |          |
               |           --2.69%--TraceCPUTime::~TraceCPUTime
               |                     |
               |                     |--1.35%--outputStream::print
               |                     |          _IO_fwrite
               |                     |
               |                      --1.35%--_IO_fwrite
               |
                --44.78%--SafepointSynchronize::end
                          RuntimeService::record_safepoint_end
                          |
                          |--19.19%--outputStream::date_stamp
                          |          _IO_fwrite
                          |
                          |--19.19%--outputStream::stamp
                          |          _IO_fwrite
                          |
                           --6.40%--outputStream::print_cr
                                     _IO_fwrite

$ objdump -tTC $JAVA_HOME/jre/lib/amd64/server/libjvm.so | grep GCCause::to_string
00000000005d2fd0 l F .text 00000000000001c9 .hidden GCCause::to_string(GCCause::Cause)
$ sudo perf probe -x $JAVA_HOME/jre/lib/amd64/server/libjvm.so '0x00000000005d2fd0%return +0($retval):string'
Added new event:
probe_libjvm:abs_5d2fd0 (on 0x00000000005d2fd0%return in $omit_java_home/jre/lib/amd64/server/libjvm.so with +0($retval):string)
You can now use it in all perf tools, such as:
perf record -e probe_libjvm:abs_5d2fd0 -aR sleep 1
$ sudo perf record -e probe_libjvm:abs_5d2fd0 -a
$ sudo perf script
java 5260 [000] 2190542.672893: probe_libjvm:abs_5d2fd0: (7f0ce8c4ffd0 <- 7f0ce9144f4d) arg1="No GC"
java 5260 [000] 2190542.672898: probe_libjvm:abs_5d2fd0: (7f0ce8c4ffd0 <- 7f0ce9144f61) arg1="Allocation Failure"
java 5260 [000] 2190542.672903: probe_libjvm:abs_5d2fd0: (7f0ce8c4ffd0 <- 7f0ce8b0464d) arg1="Allocation Failure"
java 5260 [000] 2190542.672906: probe_libjvm:abs_5d2fd0: (7f0ce8c4ffd0 <- 7f0ce8b04663) arg1="Allocation Failure"
java 5260 [000] 2190542.672930: probe_libjvm:abs_5d2fd0: (7f0ce8c4ffd0 <- 7f0ce8c6825c) arg1="Allocation Failure"
java 5260 [000] 2190542.672962: probe_libjvm:abs_5d2fd0: (7f0ce8c4ffd0 <- 7f0ce8fbf214) arg1="Allocation Failure"
java 5260 [000] 2190542.692693: probe_libjvm:abs_5d2fd0: (7f0ce8c4ffd0 <- 7f0ce8b046bf) arg1="Allocation Failure"
java 5260 [000] 2190542.692697: probe_libjvm:abs_5d2fd0: (7f0ce8c4ffd0 <- 7f0ce8b046d3) arg1="Allocation Failure"
java 5260 [000] 2190542.692699: probe_libjvm:abs_5d2fd0: (7f0ce8c4ffd0 <- 7f0ce9144f9f) arg1="Allocation Failure"
java 5260 [000] 2190542.692701: probe_libjvm:abs_5d2fd0: (7f0ce8c4ffd0 <- 7f0ce9144fb3) arg1="No GC"

Trace the 2nd argument of JNIEXPORT jint JNICALL Agent_OnAttach(JavaVM* vm, char *args, void * reserved)

download perf-tools from https://github.com/brendangregg/perf-tools/

$ sudo bin/uprobe 'p:/usr/java/jdk1.8.0_144/jre/lib/amd64/libinstrument.so:Agent_OnAttach args=+0(%si):string'
Tracing uprobe Agent_OnAttach (p:Agent_OnAttach /usr/java/jdk1.8.0_144/jre/lib/amd64/libinstrument.so:0x27b0 args=+0(%si):string). Ctrl-C to end.
           <...>-61491 [012] d... 8823844.123721: Agent_OnAttach: (0x7f0f63df87b0) args="/home/ec2-user/.greys/lib/1.7.6.6/greys/greys-agent.jar=/home/ec2-user/.greys/lib/1.7.6.6/greys/greys-core.jar;;targetIp=127.0.0.1;greysCore=/home/ec2-user/.greys/lib/1.7.6.6/greys/greys-core.jar;greysAgent=/home/ec2-user/.greys/lib/1.7.6.6/greys/greys-agent.jar;connectTimeout=6000;targetPort=3658;javaPid=61228;"

For more complex parameters, like struct pointer, object pointer. Dumping the object layout first, this can be done through clang. https://eli.thegreenplace.net/2012/12/17/dumping-a-c-objects-memory-layout-with-clang

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment