Skip to content

Instantly share code, notes, and snippets.

@shipilev
Last active August 13, 2018 19:35
Show Gist options
  • Save shipilev/8038826 to your computer and use it in GitHub Desktop.
Save shipilev/8038826 to your computer and use it in GitHub Desktop.
How to stop tweeting and investigate HotSpot
*** Figuring out what "no vm operation" means here in -XX:+PrintSafepointStatistics:
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
0.320: Deoptimize [ 7 0 0 ] [ 0 0 0 0 0 ] 0
1.320: no vm operation [ 8 1 1 ] [ 0 0 0 0 0 ] 1
4.061: EnableBiasedLocking [ 8 1 1 ] [ 0 0 0 0 0 ] 1
5.222: RevokeBias [ 8 0 1 ] [ 0 0 0 0 0 ] 0
5.831: RevokeBias [ 8 0 0 ] [ 0 0 0 0 0 ] 0
6.440: RevokeBias [ 8 0 0 ] [ 0 0 0 0 0 ] 0
7.049: RevokeBias [ 8 0 0 ] [ 0 0 0 0 0 ] 0
7.659: RevokeBias [ 8 0 0 ] [ 0 0 0 0 0 ] 0
8.267: RevokeBias [ 8 0 1 ] [ 0 0 0 0 0 ] 0
8.875: RevokeBias [ 8 0 0 ] [ 0 0 0 0 0 ] 0
9.482: RevokeBias [ 8 0 0 ] [ 0 0 0 0 0 ] 0
10.091: RevokeBias [ 8 0 0 ] [ 0 0 0 0 0 ] 0
10.700: RevokeBias [ 8 0 0 ] [ 0 0 0 0 0 ] 0
11.308: RevokeBias [ 8 0 0 ] [ 0 0 0 0 0 ] 0
11.354: no vm operation [ 6 1 1 ] [ 0 0 0 0 0 ] 0
1) Getting our hands dirty! Let's patch the VM to figure out what is going on.
First of, we figure out that "no vm operation" is printed when vm_operation() is NULL.
So, let's proactively crash the VM on that condition. Second, let's see all the uses
of the safepoint enter sequence, and mark them up to figure out which one leads to NULL:
diff -r 5832cdaf89c6 src/share/vm/runtime/safepoint.cpp
--- a/src/share/vm/runtime/safepoint.cpp Mon Dec 16 08:24:33 2013 -0500
+++ b/src/share/vm/runtime/safepoint.cpp Thu Dec 19 16:38:40 2013 +0400
@@ -382,6 +382,10 @@
// Update the count of active JNI critical regions
GC_locker::set_jni_lock_count(_current_jni_active_count);
+ if (VMThread::vm_operation() == NULL) {
+ ShouldNotReachHere();
+ }
+
if (TraceSafepoint) {
VM_Operation *op = VMThread::vm_operation();
tty->print_cr("Entering safepoint region: %s", (op != NULL) ? op->name() : "no vm operation");
diff -r 5832cdaf89c6 src/share/vm/runtime/vmThread.cpp
--- a/src/share/vm/runtime/vmThread.cpp Mon Dec 16 08:24:33 2013 -0500
+++ b/src/share/vm/runtime/vmThread.cpp Thu Dec 19 16:38:40 2013 +0400
@@ -285,6 +285,7 @@
}
// 4526887 let VM thread exit at Safepoint
+ tty->print("MARK 1: %s\n", vm_operation() == NULL ? "null" : vm_operation()->name());
SafepointSynchronize::begin();
if (VerifyBeforeExit) {
@@ -453,6 +454,7 @@
// 'GuaranteedSafepointInterval' milliseconds. This will run all
// the clean-up processing that needs to be done regularly at a
// safepoint
+ tty->print("MARK 2: %s\n", vm_operation() == NULL ? "null" : vm_operation()->name());
SafepointSynchronize::begin();
#ifdef ASSERT
if (GCALotAtAllSafepoints) InterfaceSupport::check_gc_alot();
@@ -493,6 +495,7 @@
_vm_queue->set_drain_list(safepoint_ops); // ensure ops can be scanned
+ tty->print("MARK 3: %s\n", vm_operation() == NULL ? "null" : vm_operation()->name());
SafepointSynchronize::begin();
evaluate_operation(_cur_vm_operation);
// now process all queued safepoint ops, iteratively draining
@@ -574,6 +577,7 @@
bool max_time_exceeded = GuaranteedSafepointInterval != 0 && (interval > GuaranteedSafepointInterval);
if (SafepointALot || max_time_exceeded) {
HandleMark hm(VMThread::vm_thread());
+ tty->print("MARK 4: %s\n", vm_operation() == NULL ? "null" : vm_operation()->name());
SafepointSynchronize::begin();
SafepointSynchronize::end();
}
@@ -662,6 +666,7 @@
_cur_vm_operation = op;
if (op->evaluate_at_safepoint() && !SafepointSynchronize::is_at_safepoint()) {
+ tty->print("MARK 5: %s\n", vm_operation() == NULL ? "null" : vm_operation()->name());
SafepointSynchronize::begin();
op->evaluate();
SafepointSynchronize::end();
2) Build the VM, run the any long workload:
# Fork: N/A, test runs in the existing VM
# Warmup: 1 iterations, 100 ms each
# Measurement: 5 iterations, 100 ms each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Running: org.openjdk.jmh.benchmarks.BlackholeConsumeCPUBench.consume_00001
# Warmup Iteration 1: 2.304 ns/op
Iteration 1: 1.641 ns/op
Iteration 2: MARK 2: null
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc: SuppressErrorAt=/safepoint.cpp:386
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/home/shade/trunks/hotspot-rt/hotspot/src/share/vm/runtime/safepoint.cpp:386), pid=8456, tid=140425471211264
# Error: ShouldNotReachHere()
#
# JRE version: OpenJDK Runtime Environment (8.0) (build 1.8.0-internal-fastdebug-shade_2013_12_19_15_53-b00)
# Java VM: OpenJDK 64-Bit Server VM (25.0-b64-fastdebug mixed mode linux-amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/shade/projects/jmh/hs_err_pid8456.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
#
Current thread is 140425471211264
Dumping core ...
Aborted (core dumped)
3) Got it, this is our assert. The last VM operation before null was "MARK 2".
Looking back at HotSpot code, we are stumbled with this block:
if (timedout && (SafepointALot ||
SafepointSynchronize::is_cleanup_needed())) {
MutexUnlockerEx mul(VMOperationQueue_lock,
Mutex::_no_safepoint_check_flag);
// Force a safepoint since we have not had one for at least
// 'GuaranteedSafepointInterval' milliseconds. This will run all
// the clean-up processing that needs to be done regularly at a
// safepoint
tty->print("MARK 2: %s\n", vm_operation() == NULL ? "null" : vm_operation()->name());
SafepointSynchronize::begin();
#ifdef ASSERT
if (GCALotAtAllSafepoints) InterfaceSupport::check_gc_alot();
#endif
SafepointSynchronize::end();
}
4) Read out the comment, understand this is the collateral clean up VM op.
5) Let's confirm with -XX:GuaranteedSafepointInterval=90000000.
Done, no "no vm operation"-s in the log.
@gvsmirnov
Copy link

Gosh! Shouldn't it be reflected in the logs properly?

(Go ahead and submit a patch, Gleb)

@shipilev
Copy link
Author

Go ahead and submit a patch, Gleb.

@cheremin
Copy link

Actually, real cleanup is in
void SafepointSynchronize::do_cleanup_tasks() {
jlong cleanup_time;

// Update fat-monitor pool, since this is a safepoint.
if (TraceSafepoint) {
cleanup_time = os::javaTimeNanos();
}

ObjectSynchronizer::deflate_idle_monitors();
InlineCacheBuffer::update_inline_caches();
if(UseCounterDecay && CounterDecay::is_decay_needed()) {
CounterDecay::decay();
}
NMethodSweeper::sweep();

if (TraceSafepoint) {
tty->print_cr("do_cleanup_tasks takes "INT64_FORMAT_W(6) "ms",
(os::javaTimeNanos() - cleanup_time) / MICROUNITS);
}
}

I see monitor deflation -- ok, understandable.

Clearing "inline caches" (http://code.metager.de/source/xref/openjdk/jdk6/hotspot/src/share/vm/code/icBuffer.cpp), and NMethodSweeper::sweep (http://cr.openjdk.java.net/~stefank/6989984.1/raw_files/old/src/share/vm/runtime/sweeper.cpp) seems like it manages JIt-ed methods blobs

By any way, GuaranteedSafepointInterval=1000 by default -- do we really need monitor deflation and JIT cleanup so often?

@gvsmirnov
Copy link

Go ahead and submit a patch, @cheremin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment