
@fedidat
Created May 15, 2018 05:43
TL;DW for https://youtu.be/oH4_unx8eJQ (JVM Internals)
  • There are multiple tiers of compilation: a hot method will first be compiled with a quick, instrumented compile, then (if the method stays hot), it can be optimized further with more expensive compiles.

  • OpenJDK counts method invocations and backedges (backward loop branches) to decide when to JIT a block of code. A hot method, a hot loop, or a warm loop in a warm method will get JITed.

  • HotSpot doesn't start measuring until the program has had a little time to start up and begin its normal workload.

  • When JITing a loop while it is running, the interpreter's stack frame must be translated (on-stack replacement, or OSR), since the compiled code may use a different memory layout.

  • The earlier tiers of compilation generate various levels of instrumentation (which has overhead) to decide whether it's worth continuing to measure and optimize and also to determine what speculative optimizations may be useful (if a virtual call always goes to the same target, that target could be inlined).

  • The compiler generally runs in another thread while the code is still being interpreted.

  • A caller will look up the compiled instance of a function and cache the reference. When a new version of the function is compiled, the JIT will atomically modify the beginning of the old function so that it causes the caller to discard its cached reference and look up the new version.

  • If the JIT makes some assumptions about the code's behavior and an assumption is violated, execution falls back to the interpreter (the code is deoptimized). After more measurement, the code may be compiled again.

  • When compiling Java source code, you get a lot of null-checks and array bounds checks, which can often be optimized out.

  • The Java caller always checks that it isn't calling a method on null, so the callee doesn't need to verify "this != null".
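
A small demo of that caller-side check (class and method names are made up for illustration): `invokevirtual` tests the receiver before the method body ever runs, so the NPE is raised at the call site even though `greet` never reads a field.

```java
public class CallerNullCheck {
    static class Greeter {
        // never dereferences a field, but invoking it on a null
        // receiver still fails: the check happens at the call site
        String greet() { return "hi"; }
    }

    static String callOn(Greeter g) {
        try {
            return g.greet();
        } catch (NullPointerException e) {
            return "NPE at the call site";
        }
    }

    public static void main(String[] args) {
        System.out.println(callOn(new Greeter())); // hi
        System.out.println(callOn(null));          // NPE at the call site
    }
}
```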

  • The compiler keeps track of the range of possible values for a variable, allowing it to optimize away some bounds checks.
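
A sketch of the loop shape this helps (the elimination itself happens inside the JIT; this Java just shows the two cases). In `sum`, `i` is provably in `[0, a.length)`, so the per-access bounds check on `a[i]` can be dropped; in `pick` the index comes from outside, so the check must stay unless the JIT can prove the range some other way.

```java
public class BoundsChecks {
    // i is provably in [0, a.length), so the JIT can elide the
    // bounds check that each a[i] would otherwise require
    static long sum(int[] a) {
        long total = 0;
        for (int i = 0; i < a.length; i++) {
            total += a[i];
        }
        return total;
    }

    // the index comes from the caller, so the range is unknown
    // and the bounds check (and possible exception) must remain
    static int pick(int[] a, int i) {
        return a[i];
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3})); // 6
    }
}
```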

  • a >= b is changed to the canonical form !(a < b), which makes it easier to find common subexpressions.

  • Some null-checks can be avoided by allowing the code to segfault if a pointer is null, then handling the segfault and generating a NullPointerException. If the code generates lots of NullPointerExceptions, the JVM will deoptimize.

  • If code throws a lot of exceptions, the JIT can optimize for that. It can re-use an exception object and avoid generating a stack trace every time.
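
HotSpot does this automatically for hot implicit exceptions (controlled by the `-XX:-OmitStackTraceInFastThrow` flag). Since Java 7 you can also opt out of stack trace capture by hand via the four-argument `Throwable` constructor — a hand-rolled version of the same idea (the class name here is illustrative):

```java
public class FastThrow {
    // the four-argument Throwable constructor disables both suppression
    // and stack trace capture, making the throw much cheaper to construct
    static class FastException extends RuntimeException {
        FastException(String message) {
            super(message, null, false, false);
        }
    }

    public static void main(String[] args) {
        FastException e = new FastException("reusable");
        System.out.println(e.getStackTrace().length); // 0: no trace was filled in
    }
}
```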

  • When the garbage collector wants to stop all the threads, it unmaps a certain page of memory. Compiled code (and the interpreter) periodically reads from that page (a safepoint poll) and segfaults if the GC wants it to stop. In the common case, this is faster than a branch.

  • One strategy for optimizing loops: "peel" the first iteration of the loop off and execute it before entering the rest of the loop. This helps hoist conditions out of the loop: a conditional inside the loop can be checked during the first iteration, then the result can be re-used inside the rest of the loop.
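
A hand-written sketch of what peeling achieves (the JIT does this transformation itself; the method names are made up). The first-iteration condition in `before` is tested on every pass; after peeling iteration 0 in `after`, the remaining loop body is condition-free:

```java
public class LoopPeeling {
    // before: the first-iteration condition is tested every pass
    static long before(int[] a) {
        long total = 0;
        for (int i = 0; i < a.length; i++) {
            if (i == 0) total += 100;   // only ever true on the first pass
            total += a[i];
        }
        return total;
    }

    // after peeling iteration 0, the loop body carries no condition
    static long after(int[] a) {
        long total = 0;
        if (a.length > 0) {
            total += 100;               // peeled first iteration
            total += a[0];
            for (int i = 1; i < a.length; i++) {
                total += a[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        System.out.println(before(a) + " " + after(a)); // 106 106
    }
}
```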

  • If a variable is not declared "volatile", the compiler is allowed to assume that other threads won't change it.
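
A minimal sketch of the classic consequence (names are illustrative): without `volatile`, the JIT may hoist the read of `running` out of the worker's loop and spin forever; `volatile` forces a fresh read on every pass, so the write is guaranteed to become visible.

```java
public class VolatileFlag {
    // drop the volatile and this program is allowed to hang:
    // the JIT may read `running` once and cache the result
    static volatile boolean running = true;

    static boolean demo() {
        Thread worker = new Thread(() -> {
            while (running) {
                Thread.onSpinWait();   // hint that this is a spin loop (Java 9+)
            }
        });
        worker.start();
        running = false;               // guaranteed visible to the worker
        try {
            worker.join();             // terminates because the write is seen
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(demo() ? "worker stopped" : "worker still spinning");
    }
}
```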

  • If a variable is declared "final", it may still be changed via reflection.
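
A sketch of that loophole (class and field names are made up; under the module system, `setAccessible` may need `--add-opens` for classes outside your own module, and it is refused outright for record and hidden-class fields):

```java
import java.lang.reflect.Field;

public class FinalViaReflection {
    static class Holder {
        private final int value;
        Holder(int value) { this.value = value; }
        int value() { return value; }
    }

    // rewrites the "final" field and returns the new value
    static int rewrite(Holder h, int v) {
        try {
            Field f = Holder.class.getDeclaredField("value");
            f.setAccessible(true);   // suppresses the access check on the final field
            f.setInt(h, v);
            return h.value();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rewrite(new Holder(1), 42)); // 42
    }
}
```

This is one reason the JIT can't blindly constant-fold reads of final fields.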

  • Inlining called functions helps the compiler identify cases where the callee isn't doing evil things like reflection. Inlining also allows constant-folding and other analysis.
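
A tiny sketch of inlining enabling constant folding (method names are made up). Once `square(3)` is inlined, the JIT sees `3 * 3` and can fold it to the constant `9`:

```java
public class InlineFold {
    // small enough to inline; once inlined, square(3) becomes 3 * 3
    static int square(int x) {
        return x * x;
    }

    static int nine() {
        return square(3);   // after inlining: return 3 * 3; after folding: return 9
    }

    public static void main(String[] args) {
        System.out.println(nine()); // 9
    }
}
```

You can watch HotSpot's actual inlining decisions with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`.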

  • javac intentionally doesn't optimize much - that's the JIT's job.

  • HotSpot won't JIT large methods (>8000 bytes of bytecode). Small methods (up to ~325 bytes) can be inlined by the JIT, so write many small functions.

  • for (var : collection) keeps a local reference to collection, so the JIT doesn't need to worry about its value changing between loop iterations.
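
A small demo of that snapshot (names are illustrative): the enhanced for desugars to a single `iterator()` call at loop entry, so reassigning the variable mid-loop has no effect on the iteration.

```java
import java.util.ArrayList;
import java.util.List;

public class ForEachSnapshot {
    static int sum(List<Integer> xs) {
        int total = 0;
        // desugars to: for (Iterator<Integer> it = xs.iterator(); it.hasNext(); ) ...
        // the hidden iterator holds its own reference to the collection
        for (int x : xs) {
            xs = null;      // harmless: the loop never re-reads the variable
            total += x;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new ArrayList<>(List.of(1, 2, 3)))); // 6
    }
}
```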

  • Sometimes the JIT needs to compile code that references a class that hasn't even been loaded (and might not even be loadable). For dynamic cases like this, the JITed code generates an "uncommon trap" to fall back to the interpreter. Data on the stack must be translated back to the format expected by the interpreter. All new callers (even in other threads) will be redirected to the interpreter.

  • To speculatively convert a dynamic function call into something static, HotSpot has 4 strategies: static analysis, class hierarchy analysis (using assumptions that can be broken if more classes are loaded), TypeProfile measured from less-optimized execution of the code, and Unique Concrete Method.

  • If only one possible callee of the right type is loaded, HotSpot just inlines it without checking. If another possible callee is loaded later, there's a stop-the-world operation to deoptimize the methods. This even affects threads with the method on their stack: when they return into the method, they need to deoptimize.

  • If TypeProfile finds most calls go to the same target, the call can be rewritten to explicitly check the type and use the inlined code. If the type doesn't match, it deoptimizes the code. It doesn't try to dynamically call because the callee could do evil things with reflection and violate assumptions in the optimized codepath. 90% to 95% of calls go to one or two types.
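
A hand-written picture of the guarded call the JIT emits (this is plain Java standing in for generated machine code; the method name is made up): a cheap class check guards the inlined fast path, and the mismatch branch stands in for the uncommon trap.

```java
import java.util.ArrayList;
import java.util.List;

public class GuardedDevirt {
    // roughly what a monomorphic call site becomes after profiling
    static int sizeOf(List<?> list) {
        if (list.getClass() == ArrayList.class) {
            // fast path: the JIT would inline ArrayList.size() right here
            return ((ArrayList<?>) list).size();
        }
        // slow path: in compiled code this would be an uncommon trap
        // that deoptimizes back to the interpreter
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeOf(new ArrayList<>(List.of(1, 2)))); // 2
    }
}
```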

  • Occasionally you can make code more JIT-friendly by disallowing troublesome inputs and explicitly checking for them early in your code so that the JIT doesn't need to emit code that handles those cases.
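
One way that shape can look in practice (a sketch; the method name is made up): validate inputs once at the top, so the hot loop below carries no null handling and stays a simple pattern for the JIT to match.

```java
import java.util.Objects;

public class EarlyChecks {
    static long totalLength(String[] words) {
        // reject troublesome inputs up front...
        Objects.requireNonNull(words, "words");
        for (String w : words) {
            Objects.requireNonNull(w, "element");
        }
        // ...so the hot loop needs no per-element null handling
        long total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalLength(new String[] {"ab", "c"})); // 3
    }
}
```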

  • Many optimizations are like regular expressions: they match some pattern in the code and handle it. HotSpot optimizes for common patterns. Small functions with immutable values and local variables (untouchable by other functions and threads) are JIT-friendly. Native methods aren't JIT-friendly because the JIT can't make many assumptions about what they might do.

  • The compiler typically does a lot of work when the program is warming up, then stops using CPU as all the hot spots are compiled, but there's still some memory overhead.
