#JIT Compiler Tuning
##Hotspot Compilation Mechanism Selection
- -client - client compiler (C1) - begins compiling earlier -> optimizes startup time. Default on 32-bit OSes; choose if heap < 3GB.
- -server - server compiler (C2) - better optimizes performance for long-running apps. 2 subtypes:
  - -server 32-bit: 5-20% faster, but total process size must be < 4GB
  - -d64 64-bit: DEFAULT on modern machines
- -XX:+TieredCompilation - tiered compilation (combines both client and server) - the best perf, but requires more native memory for extra compiled code. DEFAULT.
##Advanced JIT Tuning:
- -XX:ReservedCodeCacheSize=N - reserves the (max) space for the code cache - increase with tiered compilation, or on a depleted code cache warning. Default=240 MB with tiered compilation
- -XX:InitialCodeCacheSize=N - allocates the initial space for the code cache - rarely needed.
- -XX:CompileThreshold=N - sets #times a method/loop is executed before compiling it -> lower it to cause more methods to be compiled, & sooner. Default = 10,000 before OSR (on-stack replacement)
- -XX:+PrintCompilation - diagnostic log of JIT compiler operations - Inspect the compilation eg check if an important method is being compiled
- -XX:CICompilerCount=N - sets #threads used by the JIT compiler. Reduce if too many compiler threads are being started (e.g. when running multiple JVMs on a machine).
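
To check which values the JIT flags above actually resolved to in a running JVM, one option is to query them programmatically. The sketch below assumes a HotSpot JVM (it relies on `com.sun.management.HotSpotDiagnosticMXBean`); the class name `JitFlags` is made up for illustration, and flags the JVM does not expose are reported as `n/a`.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Sketch: print the effective values of the JIT flags listed above.
// Assumes a HotSpot JVM; the class name JitFlags is hypothetical.
final class JitFlags {
    static final String[] NAMES = {
        "TieredCompilation", "ReservedCodeCacheSize", "InitialCodeCacheSize",
        "CompileThreshold", "CICompilerCount"
    };

    // Returns the current value of a -XX flag, or "n/a" if this JVM does not expose it.
    static String read(String flag) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean == null ? "n/a" : bean.getVMOption(flag).getValue();
        } catch (RuntimeException e) {
            // getVMOption throws IllegalArgumentException for unknown flags
            return "n/a";
        }
    }

    public static void main(String[] args) {
        for (String name : NAMES) {
            System.out.println(name + " = " + read(name));
        }
    }
}
```

The same values are visible from the command line with -XX:+PrintFlagsFinal; the programmatic route is mainly useful inside an already-running app.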
#Garbage Collection Tuning
Consider GC tuning if the GC log indicates >= 4% of time spent in GC. First tweak the desired pause time, then increase heap size (which may increase pause time) or the young gen ratio.
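
The 4% threshold is normally read off the GC log, but a rough in-process estimate is also possible. The sketch below is an approximation under stated assumptions: it uses the standard `GarbageCollectorMXBean` counters instead of the GC log, and for concurrent collectors the reported collection time may not equal pause time. The class name `GcOverhead` is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: estimate the share of JVM uptime spent in GC via JMX counters.
final class GcOverhead {
    // Percentage of elapsed time spent in GC; 0 if uptime is not positive.
    static double percent(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    // Sums the accumulated collection time over all collectors in this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        double pct = percent(totalGcMillis(), uptime);
        System.out.printf("GC overhead: %.2f%% %s%n", pct, pct >= 4.0 ? "(consider tuning)" : "");
    }
}
```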
##Sizing the Heap
Heap size is a space/time tradeoff: smaller -> more frequent GC, larger -> longer pauses. The sum of heap sizes for all JVMs must be smaller than (physical memory - 1GB); do not include swapfile size!!! (swapping can cause e.g. concurrent mode failure). Setting distinct initial & max heap sizes lets the JVM tune itself according to workload. Instead of hand-sizing further, specify performance goals for the GC algorithm: tolerable pause times, % of time to spend in GC. Rule of thumb: size the heap to be 30% occupied after a full GC (force one with jcmd & observe how much memory is used afterwards).
- -Xms - initial heap size - default [linux:Min(512MB, 1/64 RAM), osx:64MB]
- -Xmx - max heap size - default [linux: min(32GB, 1/4 RAM) osx: min (1GB, 1/4 RAM)]
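
The two sizing rules in this section (30% occupancy after a full GC, and all heaps fitting in physical memory minus 1GB) reduce to simple arithmetic. The sketch below is illustrative only: `HeapSizing` and its method names are hypothetical, and `System.gc()` is merely a request, so the live-size reading in `main` is approximate (`jcmd <pid> GC.run` is the external way to force a collection).

```java
// Sketch of the heap sizing rules of thumb from this section.
final class HeapSizing {
    static final long GB = 1024L * 1024 * 1024;

    // Heap size at which the live data (occupancy after a full GC) is 30% of the heap.
    static long recommendedHeapBytes(long liveBytesAfterFullGc) {
        return liveBytesAfterFullGc * 100 / 30;
    }

    // True if the heaps of all JVMs on the machine fit in physical RAM minus 1 GB.
    static boolean fitsInRam(long physicalBytes, long... heapBytes) {
        long sum = 0;
        for (long h : heapBytes) {
            sum += h;
        }
        return sum < physicalBytes - GB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // only a request for a collection, not a guarantee
        long live = rt.totalMemory() - rt.freeMemory();
        System.out.println("live after GC: " + live + " bytes");
        System.out.println("suggested max heap: " + recommendedHeapBytes(live) + " bytes");
    }
}
```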
##GC Collection Goal Hints
- -XX:MaxGCPauseMillis=N - how long pauses should be. Main tuning point. Default=200ms (G1). Be realistic. Takes precedence over GCTimeRatio.
- -XX:GCTimeRatio=N - how much time to spend in GC. N = Throughput / (1 - Throughput). Default=99 -> 1% of the time in GC. For a throughput goal of 95% (0.95), this equation yields a GCTimeRatio of 19.
- -XX:+AggressiveHeap - Enables a set of tuning flags optimized for high-memory machines running a single JVM with a large heap.
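
The GCTimeRatio arithmetic can be checked with a few lines of code. This is a minimal sketch of the formula N = Throughput / (1 - Throughput) and its inverse, GC fraction = 1 / (1 + N); the class and method names are hypothetical, and rounding to the nearest integer is an assumption, since the flag takes whole numbers.

```java
// Sketch: convert between a throughput goal and a -XX:GCTimeRatio value.
final class GcTimeRatio {
    // GCTimeRatio = throughput / (1 - throughput), rounded to the nearest integer.
    static long fromThroughput(double throughput) {
        if (throughput <= 0.0 || throughput >= 1.0) {
            throw new IllegalArgumentException("throughput must be in (0, 1)");
        }
        return Math.round(throughput / (1.0 - throughput));
    }

    // Fraction of total time the goal allows in GC: 1 / (1 + GCTimeRatio).
    static double gcFraction(long ratio) {
        return 1.0 / (1.0 + ratio);
    }

    public static void main(String[] args) {
        System.out.println("95% throughput -> GCTimeRatio " + fromThroughput(0.95));
        System.out.println("GCTimeRatio 99 -> GC fraction " + gcFraction(99));
    }
}
```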
##GC algorithm Selection
###Serial
- -XX:+UseSerialGC - simple, single-threaded GC algorithm. default for client JIT Compiler. use for single CPU, < 100MB Heap.
###Throughput (parallel)
DEFAULT for server. Use for best AVERAGE response time, if the app can tolerate small full GC pauses.
- -XX:+UseParallelOldGC - Uses multiple threads to collect OLD gen while app threads are stopped. When app can tolerate occasional long pauses, maximize throughput while minimizing CPU usage.
- -XX:+UseParallelGC - Uses multiple threads to collect the YOUNG gen while app threads are stopped. Use with UseParallelOldGC.
###CMS (Concurrent Mark & Sweep)
Scans without pausing, at the cost of more CPU, to minimize the effect of pauses on response times. Uses a background thread to periodically scan old gen & discard unused objects; only short pauses during minor GC. The heap can fragment, since there is no compaction. Young gen is never resized unless a full GC occurs; CMS aims to never have a full collection, so (if tuned correctly) it never resizes its young gen.
Concurrent mode failure (CMF) - the concurrent collection of the tenured gen did not finish before the tenured gen became full. To avoid: increase heap size, run the background scan more often via CMSInitiatingOccupancyFraction, or increase #background threads.
- -XX:+UseConcMarkSweepGC - Uses BACKGROUND thread(s) to collect OLD gen with minimal pauses. short GC pauses, but requires extra core for background thread, suitable for a relatively small heap.
- -XX:+UseParNewGC - Uses multiple threads to collect young gen while app threads are stopped. Use with UseConcMarkSweepGC.
###G1 (Garbage First)
Designed to process large heaps (> 4GB) divided into regions - can move objects between them, partially compacting the heap without a pause. The tuning goal is to avoid full GC: increase old gen size (ratio or total heap size), increase #background threads & how often they run.
- -XX:+UseG1GC - Uses multiple threads to collect young gen while app threads are stopped, and background thread(s) to collect old gen with minimal pauses. short GC pauses for a relatively large heap, but requires extra core for background thread.
##Sizing the Generations
GC generations: young eden, young survivor spaces (S0, S1), old (tenured). Minor GC - on young gen; always stop-the-world. Full GC - on the entire heap. Metaspace - class metadata used by the JIT compiler & GC.
- -XX:NewRatio=N - initial ratio of old gen to young gen. DEFAULT=2 (young gen = 1/3 of heap). Note adaptive sizing (enabled by default) -> the proportion will change (except for CMS, where the young gen size is constant). If a generation's size is reduced, it will experience more GCs.
- -XX:NewSize - init size of young gen. DEFAULT = 1/3 Xms
- -XX:MaxNewSize - max size of young gen.
- -Xmn - Sets both init and max size of young gen.
- -XX:MetaspaceSize=N - (PermSize for pre-JDK 8) - initial size of metaspace. Increase for apps that use lots of classes.
- -XX:MaxMetaspaceSize=N - (MaxPermSize for pre-JDK 8) - max size of the metaspace. Reduce to limit the amount of native space used by class metadata.
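
How NewRatio splits a heap can be shown with a short calculation. The sketch below assumes NewRatio=N means old:young = N:1, which matches the 1/3 default noted for NewSize; `GenerationSizing` and its methods are hypothetical names, and adaptive sizing may move the real boundary at run time.

```java
// Sketch: initial generation sizes implied by -XX:NewRatio for a given heap size.
final class GenerationSizing {
    // Young gen is 1 part out of (newRatio + 1) parts of the heap.
    static long youngGenBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    // Old gen takes the remainder of the heap.
    static long oldGenBytes(long heapBytes, int newRatio) {
        return heapBytes - youngGenBytes(heapBytes, newRatio);
    }

    public static void main(String[] args) {
        long heapMb = 3072; // example: a 3 GB heap, expressed in MB
        System.out.println("young = " + youngGenBytes(heapMb, 2) + " MB");
        System.out.println("old   = " + oldGenBytes(heapMb, 2) + " MB");
    }
}
```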
##Advanced GC Tuning
###Adaptive Sizing
- -XX:+UseAdaptiveSizePolicy - Default: JVM will resize heaps to meet GC goals. Turn off if heap sizes have been finely tuned, if Xms == Xmx, or for apps that go through phases with different profiles.
- -XX:+PrintAdaptiveSizePolicy - Add gen resize info to GC log. check output for G1 to see if full GCs are triggered by humongous object allocation.
###Tenuring and Survivor Space Hints
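- -XX:+PrintTenuringDistribution - adds the tenuring (object age) distribution to the GC log
- -XX:InitialSurvivorRatio=N - initial share of young gen reserved for survivor spaces: each survivor space = young gen / (N + 2). Lower N (larger survivor spaces) if short-lived objects are promoted into old gen too often
- -XX:MinSurvivorRatio=N - same formula, used as the bound on survivor space size under adaptive sizing.
- -XX:TargetSurvivorRatio=N - target % of survivor space occupied after a minor GC.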
- -XX:InitialTenuringThreshold=N - initial #GC-cycles to keep an object in survivor spaces.
- -XX:MaxTenuringThreshold=N - max #GC-cycles to keep an object in survivor spaces.
###CMS collector hints
- -XX:CMSInitiatingOccupancyFraction=N - when (% of old gen occupied) to begin background scanning of old gen. Reduce on CMF
- -XX:+UseCMSInitiatingOccupancyOnly - use only CMSInitiatingOccupancyFraction to determine when to start CMS background scanning.
- -XX:ConcGCThreads=N - #threads to use for CMS background scanning. Use on high CPU machine with CMF.
- -XX:+CMSPermGenSweepingEnabled - sweep the permgen - use if performing lots of class unloading.
- -XX:CMSInitiatingPermOccupancyFraction=N - when to scan permgen - use if full GCs occur because permgen is filling too fast.
- -XX:+CMSClassUnloadingEnabled - unload classes after permgen is scanned.
- -XX:+CMSIncrementalMode - Use on low CPU machine
- -XX:CMSIncrementalModeSafetyFactor=N - affect frequency of incremental CMS background threads - increase on CMF
- -XX:CMSIncrementalDutyCycleMin=N - ditto
- -XX:CMSIncrementalDutyCycleMax=N - ditto
- -XX:CMSIncrementalDutyCycle=N - ditto
###G1 collector hints
- -XX:ConcGCThreads=N - #threads to use for background scanning. Use on high CPU machine with CMFs
- -XX:InitiatingHeapOccupancyPercent=N - threshold to begin background scanning - reduce on concurrent mode failures.
- -XX:G1MixedGCCountTarget=N - #mixed GCs for freeing garbage old gen regions. Reduce on CMF, increase if mixed GC cycles take too long.
- -XX:G1HeapRegionSize=N - size of a G1 region. Increase for very large heaps, or when allocating huge objects.
##Out of memory Errors
- -XX:+HeapDumpOnOutOfMemoryError - Generates a heap dump when JVM throws out of memory error. ENABLE!
- -XX:HeapDumpPath= - filepath for the automatic heap dump (default file name: java_pid<pid>.hprof)
- -XX:SoftRefLRUPolicyMSPerMB=N - Controls how long soft references survive after being used. Decrease in low-memory machines.
- -XX:MaxDirectMemorySize=N - Controls how much native memory (NIO) can be allocated via ByteBuffer.allocateDirect()
##Large Pages
- page mappings are held in a global page table
- most frequently used mappings are held in translation lookaside buffers (TLBs) - fast cache maximizes hit rate
- **grep Hugepagesize /proc/meminfo** - determine the huge page size that the kernel supports - based on CPU & boot params. Typically 2048 KB
- calculate HugePageCount needed: (JVM Heap size / Hugepagesize) * 1.1
- echo $HugePageCount > /proc/sys/vm/nr_hugepages
- in /etc/sysctl.conf , vm.nr_hugepages=HugePageCount
- in /etc/security/limits.conf , add soft / hard memlock entries for user permissions to modify
- enable Transparent large pages: echo always > /sys/kernel/mm/transparent_hugepage/enabled
- -XX:+UseLargePages - increase page size - JVM will allocate pages from the OS's large page system - ENABLE!
- -XX:LargePageSizeInBytes=N - Solaris only
- -XX:StringTableSize=N - size of the hashtable used to hold interned strings.
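
The huge page count formula from the list above, (JVM heap size / Hugepagesize) * 1.1, works out as follows. This is a minimal sketch: `HugePages.count` is a hypothetical helper, and rounding up to a whole page is an assumption, since a fractional page cannot be reserved.

```java
// Sketch: number of huge pages to reserve for a given heap, with 10% headroom.
final class HugePages {
    // (heap / huge page size) * 1.1, rounded up to a whole page.
    static long count(long heapBytes, long hugePageBytes) {
        return (long) Math.ceil((double) heapBytes / hugePageBytes * 1.1);
    }

    public static void main(String[] args) {
        long heap = 4L * 1024 * 1024 * 1024; // example: 4 GB heap
        long page = 2048L * 1024;            // typical Hugepagesize: 2048 KB
        System.out.println("nr_hugepages = " + count(heap, page));
    }
}
```

For a 4 GB heap and 2048 KB pages this gives 2048 * 1.1 = 2252.8, so 2253 pages.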
##TLABs (thread local allocation buffers)
- for frequent creation of large objects
- -XX:+PrintTLAB - TLAB summary in GC log - diagnostic. ENABLE!
- -XX:TLABSize=N - size of TLABs. When the app is performing a lot of allocation outside of TLABs, use this value to increase the TLAB size.
- -XX:-ResizeTLAB - Disables resizing of TLABs.
- -XX:ParallelGCThreads=N - Control GC parallelism: sets #threads used by GC. Reduce on multi-JVM systems. Increase for large heaps, decrease for small heaps. Default: with C CPUs, the JVM uses C threads if C < 8, else 8 + ((C - 8) * 5 / 8)
- -Xss - size of thread native stack - only decrease on 32-bit JVMs to make more memory available for other parts of the JVM. Default: 64-bit: 1MB / 32-bit: 320KB
- -XX:-UseBiasedLocking - Disables the biased locking algorithm of the JVM to improve performance of threadpool-based apps. Note: the Java-level priority of a thread has very little effect
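
The default GC thread count described for ParallelGCThreads can be reproduced from the CPU count. The sketch below assumes the formula uses integer division (so the result is truncated); `GcThreads` is a hypothetical name, and the value the JVM actually picks can be confirmed with -XX:+PrintFlagsFinal.

```java
// Sketch: default ParallelGCThreads as a function of the number of CPUs.
final class GcThreads {
    // One thread per CPU below 8 CPUs; beyond that, 5/8 of a thread per extra CPU.
    static int defaultParallelGcThreads(int cpus) {
        return cpus < 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(cpus + " CPUs -> " + defaultParallelGcThreads(cpus) + " GC threads");
    }
}
```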
#Miscellaneous JVM flags
- -XX:+AlwaysLockClassLoader - disable parallel classloading on low CPU machines to improve startup performance
- -XX:+PrintFlagsFinal - show defaults for all flags
- -XX:-StackTraceInThrowable - Prevents stack traces on thrown exception. Enable if deep stacks or frequently thrown exceptions (that cannot be addressed)
- -XX:-RestrictContended - set to FALSE to allow non-JDK code to use the @Contended annotation to pad variables to prevent false sharing.
- -XX:+UseCompressedOops - compressed ordinary object pointers - use 32-bit addresses within a 64-bit JVM - enabled by default for heaps between 4 GB and 32 GB. To compensate for the GC impact of uncompressed oops, add 20% to planned heap size ???
- -XX:+AggressiveOpts - enable 'experimental default' optimizations. Unnecessary in JVM 1.8 ?
#Diagnostic Flags
##GC Diagnostic Logging
- **-verbose:gc** - Enables basic GC logging. ENABLE!
- -Xloggc: - Directs GC log to filepath rather than stdout. ENABLE!
- -XX:+PrintGC - enables basic GC logging. ENABLE!
- -XX:+PrintGCDetails - enables detailed GC logging. (overhead is minimal). ENABLE!
- -XX:+PrintGCTimeStamps - Prints a relative timestamp for each entry in GC log.
- -XX:+PrintGCDateStamps - Prints a readable time-of-day stamp for each entry in GC log. slightly more overhead
- -XX:+PrintReferenceGC - Prints information about soft and weak reference processing during GC - use to determine their effect on GC overhead.
- -XX:+UseGCLogFileRotation - Enables rotations of GC log to conserve file space.
- -XX:NumberOfGCLogFiles=N - When logfile rotation is enabled, indicates the number of logfiles to retain.
- -XX:GCLogFileSize=N - size trigger of logfile before rotating it.
##Java Flight Recorder
- -XX:+UnlockCommercialFeatures - Allows JVM to use Flight Recorder (non open source)
- -XX:+FlightRecorder - ENABLE! tiny overhead when idle, small overhead when recording
- -XX:FlightRecorderOptions= - recording options