@abhiesa
Created May 15, 2012 18:18
server compiler
The -server compiler
The -XX:+UseParallelGC parallel (throughput) garbage collector
The -Xms initial heap size is 1/64th of the machine's physical memory
The -Xmx maximum heap size is 1/4th of the machine's physical memory (up to 1 GB max).
The -XX:+UseParallelGC parallel (throughput) garbage collector, or
The -XX:+UseConcMarkSweepGC concurrent (low pause time) garbage collector (also known as CMS)
The -XX:+UseSerialGC serial garbage collector (for smaller applications and systems)
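Which of these defaults a given machine actually ends up with can be checked from inside the JVM. The following is a minimal sketch, not part of the original text: the class name JvmDefaultsReport and its helper methods are made up for illustration, and the collector names it prints vary by JVM version and vendor.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: reports what the running JVM selected.
class JvmDefaultsReport {
    /** Names of the garbage collectors registered by the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Maximum heap the JVM will attempt to use, in megabytes (roughly the effective -Xmx). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Collectors:    " + collectorNames());
        System.out.println("Max heap (MB): " + maxHeapMb());
        System.out.println("Processors:    " + Runtime.getRuntime().availableProcessors());
    }
}
```

Running it once with no options and once with an explicit collector flag such as -XX:+UseSerialGC should make the difference between the default and the explicit choice visible.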
4.2.1 Tuning Example 1: Tuning for Throughput
Here is an example of specific command line tuning for a server application running on a system with 4 GB of memory and capable of running 32 threads simultaneously (CPUs and cores or contexts).
java -Xmx3800m -Xms3800m -Xmn2g -Xss128k -XX:+UseParallelGC -XX:ParallelGCThreads=20
Comments:
-Xmx3800m -Xms3800m
Configures a large Java heap to take advantage of the large memory system.
-Xmn2g
Configures a large heap for the young generation (which can be collected in parallel), again taking advantage of the large memory system. It helps prevent short lived objects from being prematurely promoted to the old generation, where garbage collection is more expensive.
-Xss128k
Reduces the default maximum thread stack size, which allows more of the process' virtual memory address space to be used by the Java heap.
-XX:+UseParallelGC
Selects the parallel garbage collector for the new generation of the Java heap (note: this is generally the default on server-class machines)
-XX:ParallelGCThreads=20
Reduces the number of garbage collection threads. The default would be equal to the processor count, which would probably be unnecessarily high on a 32 thread capable system.
4.2.2 Tuning Example 2: Try the parallel old generation collector
This example is similar to Example 1, but here we want to test the impact of the parallel old generation collector.
java -Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC
Comments:
-Xmx3550m -Xms3550m
Sizes have been reduced. The ParallelOldGC collector has additional native, non-Java heap memory requirements and so the Java heap sizes may need to be reduced when running a 32-bit JVM.
-XX:+UseParallelOldGC
Use the parallel old generation collector. Certain phases of an old generation collection can be performed in parallel, speeding up an old generation collection.
4.2.3 Tuning Example 3: Try 256 MB pages
This tuning example is specific to those Solaris-based systems that would support the huge page size of 256 MB.
java -Xmx2506m -Xms2506m -Xmn1536m -Xss128k -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:LargePageSizeInBytes=256m
Comments:
-Xmx2506m -Xms2506m
Sizes have been reduced because using the large page setting causes the permanent generation and code cache sizes to be 256 MB, and this reduces the memory available for the Java heap.
-Xmn1536m
The young generation heap is often sized as a fraction of the overall Java heap size. Typically we suggest you start tuning with a young generation size of 1/4th the overall heap size. The young generation was reduced in this case to maintain a ratio between young generation and old generation sizes similar to the one used in the previous example.
-XX:LargePageSizeInBytes=256m
Causes the Java heap, including the permanent generation, and the compiled code cache to use as a minimum size one 256 MB page (for those platforms which support it).
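The ratio mentioned for -Xmn1536m can be worked out directly from the flag values. The sketch below is illustrative only: the class HeapSplit and its methods are hypothetical, and it assumes the old generation is simply the total heap minus the young generation, which ignores the permanent generation and any alignment the JVM applies.

```java
// Hypothetical helper: compares the young/old split of Examples 2 and 3.
class HeapSplit {
    /** Old generation size in MB, assuming old = total heap - young generation. */
    static long oldGenMb(long heapMb, long youngMb) {
        return heapMb - youngMb;
    }

    /** Fraction of the total heap given to the young generation. */
    static double youngFraction(long heapMb, long youngMb) {
        return (double) youngMb / heapMb;
    }

    public static void main(String[] args) {
        // {heap MB, young MB}: Example 2 uses -Xmx3550m -Xmn2g, Example 3 uses -Xmx2506m -Xmn1536m.
        long[][] examples = { {3550, 2048}, {2506, 1536} };
        for (long[] e : examples) {
            System.out.printf("heap=%dm young=%dm old=%dm young/heap=%.2f%n",
                    e[0], e[1], oldGenMb(e[0], e[1]), youngFraction(e[0], e[1]));
        }
    }
}
```

Both examples give the young generation a little under two thirds of the heap, well above the 1/4th starting point suggested above, which is why the two configurations are described as having a similar ratio.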
4.2.4 Tuning Example 4: Try -XX:+AggressiveOpts
This tuning example is similar to Example 2, but adds the AggressiveOpts option.
java -Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:+AggressiveOpts
Comments:
-Xmx3550m -Xms3550m
Sizes have been increased back to the level of Example 2 since we are no longer using huge pages.
-Xmn2g
Sizes have been increased back to the level of Example 2 since we are no longer using huge pages.
-XX:+AggressiveOpts
Turns on point performance optimizations that are expected to be on by default in upcoming releases. The changes grouped by this flag are minor changes to JVM runtime compiled code and not distinct performance features (such as BiasedLocking and ParallelOldGC). This is a good flag for trying the JVM engineering team's latest performance tweaks for upcoming releases. Note: this option is experimental! The specific optimizations enabled by this option can change from release to release and even build to build. You should reevaluate the effects of this option prior to deploying a new release of Java.
4.2.5 Tuning Example 5: Try Biased Locking
This tuning example builds on Example 4, and adds the Biased Locking option.
java -Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:+AggressiveOpts -XX:+UseBiasedLocking
Comments:
-XX:+UseBiasedLocking
Enables a technique for improving the performance of uncontended synchronization. An object is "biased" toward the thread which first acquires its monitor via a monitorenter bytecode or synchronized method invocation; subsequent monitor-related operations performed by that thread are relatively much faster on multiprocessor machines. Some applications with significant amounts of uncontended synchronization may attain significant speedups with this flag enabled; some applications with certain patterns of locking may see slowdowns, though attempts have been made to minimize the negative impact.
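To make "uncontended synchronization" concrete, here is a small sketch of the access pattern this flag targets: one thread repeatedly entering the same monitor with no competition. The class UncontendedCounter is hypothetical, and the timing it prints is only a rough indication, since the JIT compiler may coarsen or remove the locking and newer JDK releases have deprecated biased locking altogether.

```java
// Hypothetical demo class: a monitor that only one thread ever acquires.
class UncontendedCounter {
    private long value;

    // Each call enters and exits this object's monitor.
    synchronized void increment() {
        value++;
    }

    synchronized long get() {
        return value;
    }

    /** Performs the given number of uncontended lock acquisitions; returns elapsed nanoseconds. */
    static long timeIncrements(UncontendedCounter counter, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            counter.increment();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        UncontendedCounter counter = new UncontendedCounter();
        long nanos = timeIncrements(counter, 10_000_000);
        System.out.println("count=" + counter.get() + " elapsedNs=" + nanos);
    }
}
```

Comparing runs with -XX:+UseBiasedLocking and -XX:-UseBiasedLocking on a JVM that supports the flag gives a first impression, but a real application workload is the better judge.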
4.2.6 Tuning Example 6: Tuning for low pause times and high throughput
This tuning example is similar to Example 2, but uses the concurrent garbage collector (instead of the parallel throughput collector).
java -Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:ParallelGCThreads=20 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=31
Comments:
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
Selects the Concurrent Mark Sweep collector. This collector may deliver better response time properties for the application (i.e., low application pause time). It is a parallel and mostly-concurrent collector and can be a good match for the threading ability of large multi-processor systems.
-XX:SurvivorRatio=8
Sets survivor space ratio to 1:8, resulting in larger survivor spaces (the smaller the ratio, the larger the space). Larger survivor spaces allow short lived objects a longer time period to die in the young generation.
-XX:TargetSurvivorRatio=90
Allows 90% of the survivor spaces to be occupied instead of the default 50%, allowing better utilization of the survivor space memory.
-XX:MaxTenuringThreshold=31
Allows short lived objects a longer time period to die in the young generation (and hence, avoid promotion). A consequence of this setting is that minor GC times can increase due to additional objects to copy. This value and survivor space sizes may need to be adjusted so as to balance overheads of copying between survivor spaces versus tenuring objects that are going to live for a long time. The default settings for CMS are SurvivorRatio=1024 and MaxTenuringThreshold=0 which cause all survivors of a scavenge to be promoted. This can place a lot of pressure on the single concurrent thread collecting the tenured generation. Note: when used with -XX:+UseBiasedLocking, this setting should be 15.
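The survivor settings above translate into concrete sizes. As a rough sketch (the class SurvivorSizing is hypothetical, and a real JVM rounds these values to its own alignment), a SurvivorRatio of 8 means eden is 8 times the size of one survivor space, so the young generation is split into 8 + 2 equal parts:

```java
// Hypothetical helper: approximate young generation layout for Example 6.
class SurvivorSizing {
    /** Size of one survivor space, given young = eden + 2 survivors and eden = ratio * survivor. */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Eden is what remains of the young generation after the two survivor spaces. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    /** Survivor occupancy targeted after a minor collection, for a TargetSurvivorRatio in percent. */
    static long targetOccupancyBytes(long survivorBytes, int targetSurvivorRatioPercent) {
        return survivorBytes * targetSurvivorRatioPercent / 100;
    }

    public static void main(String[] args) {
        long young = 2L * 1024 * 1024 * 1024; // -Xmn2g
        long survivor = survivorBytes(young, 8); // -XX:SurvivorRatio=8
        System.out.println("survivor bytes (each): " + survivor);
        System.out.println("eden bytes:            " + edenBytes(young, 8));
        System.out.println("target occupancy:      " + targetOccupancyBytes(survivor, 90));
    }
}
```

With -Xmn2g this works out to roughly 205 MB per survivor space and about 1.6 GB of eden, with about 184 MB of each survivor space usable before objects are tenured early.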
4.2.7 Tuning Example 7: Try AggressiveOpts for low pause times and high throughput
This tuning example builds on Example 6, and adds the AggressiveOpts option.
java -Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:ParallelGCThreads=20 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=31 -XX:+AggressiveOpts
Comments:
-XX:+AggressiveOpts
Turns on point performance optimizations that are expected to be on by default in upcoming releases. The changes grouped by this flag are minor changes to JVM runtime compiled code and not distinct performance features (such as BiasedLocking and ParallelOldGC). This is a good flag for trying the JVM engineering team's latest performance tweaks for upcoming releases. Note: this option is experimental! The specific optimizations enabled by this option can change from release to release and even build to build. You should reevaluate the effects of this option prior to deploying a new release of Java.
Concurrency Utilities
Overview
Introduction
The Java 2 platform includes a new package of concurrency utilities. These are classes which are designed to be used as building blocks in building concurrent classes or applications. Just as the Collections Framework greatly simplified the organization and manipulation of in-memory data by providing implementations of commonly used data structures, the Concurrency Utilities aims to simplify the development of concurrent classes by providing implementations of building blocks commonly used in concurrent designs. The Concurrency Utilities include a high-performance, flexible thread pool; a framework for asynchronous execution of tasks; a host of collection classes optimized for concurrent access; synchronization utilities such as counting semaphores; atomic variables; locks; and condition variables.
Using the Concurrency Utilities, instead of developing components such as thread pools yourself, offers a number of advantages:
Reduced programming effort. It is far easier to use a standard class than to develop it yourself.
Increased performance. The implementations in the Concurrency Utilities were developed and peer-reviewed by concurrency and performance experts; these implementations are likely to be faster and more scalable than a typical implementation, even by a skilled developer.
Increased reliability. Developing concurrent classes is difficult -- the low-level concurrency primitives provided by the Java language (synchronized, volatile, wait(), notify(), and notifyAll()) are difficult to use correctly, and errors using these facilities can be difficult to detect and debug. By using standardized, extensively tested concurrency building blocks, many potential sources of threading hazards such as deadlock, starvation, race conditions, or excessive context switching are eliminated. The concurrency utilities have been carefully audited for deadlock, starvation, and race conditions.
Improved maintainability. Programs which use standard library classes are easier to understand and maintain than those which rely on complicated, homegrown classes.
Increased productivity. Developers are likely to already understand the standard library classes, so there is no need to learn the API and behavior of ad-hoc concurrent components. Additionally, concurrent applications are far simpler to debug when they are built on reliable, well-tested components.
In short, using the Concurrency Utilities to implement a concurrent application can help you make your program clearer, shorter, faster, more reliable, more scalable, easier to write, easier to read, and easier to maintain.
The Concurrency Utilities include:
Task Scheduling Framework - The Executor framework is a framework for standardizing invocation, scheduling, execution, and control of asynchronous tasks according to a set of execution policies. Implementations are provided that allow tasks to be executed within the submitting thread, in a single background thread (as with events in Swing), in a newly created thread, or in a thread pool, and developers can create customized implementations of Executor supporting arbitrary execution policies. The built-in implementations offer configurable policies such as queue length limits and saturation policy which can improve the stability of applications by preventing runaway resource consumption.
Concurrent Collections - Several new Collections classes have been added, including the new Queue, BlockingQueue and BlockingDeque interfaces, and high-performance, concurrent implementations of Map, List, and Queue. See the Collections Framework Guide for more details.
Atomic Variables - Classes for atomically manipulating single variables (primitive types or references), providing high-performance atomic arithmetic and compare-and-set methods. The atomic variable implementations in java.util.concurrent.atomic offer higher performance than would be available by using synchronization (on most platforms), making them useful for implementing high-performance concurrent algorithms as well as conveniently implementing counters and sequence number generators.
Synchronizers - General purpose synchronization classes, including semaphores, mutexes, barriers, latches, and exchangers, which facilitate coordination between threads.
Locks - While locking is built into the Java language via the synchronized keyword, there are a number of inconvenient limitations to built-in monitor locks. The java.util.concurrent.locks package provides a high-performance lock implementation with the same memory semantics as synchronization, but which also supports specifying a timeout when attempting to acquire a lock, multiple condition variables per lock, non-nested ("hand-over-hand") holding of multiple locks, and support for interrupting threads which are waiting to acquire a lock.
Nanosecond-granularity timing - The System.nanoTime method enables access to a nanosecond-granularity time source for making relative time measurements, and methods which accept timeouts (such as BlockingQueue.offer, BlockingQueue.poll, Lock.tryLock, Condition.await, and Thread.sleep) can take timeout values in nanoseconds. The actual precision of System.nanoTime is platform-dependent.
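Several of the building blocks listed above can be combined in a few lines. The following is a minimal sketch rather than a recommended design: the class ConcurrencyTour and its method names are invented for illustration, while the library calls (Executors.newFixedThreadPool, Future.get, AtomicLong.incrementAndGet, ReentrantLock.tryLock with a timeout, System.nanoTime) are standard java.util.concurrent and java.lang APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class touching the executor, atomic, lock, and timing utilities.
class ConcurrencyTour {
    /** Counts tasks that have run; updated atomically without explicit locking. */
    static final AtomicLong completedTasks = new AtomicLong();

    /** Sums 1..tasks by submitting one Callable per number to a fixed-size thread pool. */
    static long sumWithPool(int tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 1; i <= tasks; i++) {
                final long n = i;
                Callable<Long> task = () -> {
                    completedTasks.incrementAndGet();
                    return n;
                };
                futures.add(pool.submit(task));
            }
            long sum = 0;
            for (Future<Long> f : futures) {
                sum += f.get(); // blocks until that task has finished
            }
            return sum;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("task execution failed", e);
        } finally {
            pool.shutdown();
        }
    }

    /** Runs the action only if the lock can be acquired within timeoutMs milliseconds. */
    static boolean runIfLockAvailable(ReentrantLock lock, long timeoutMs, Runnable action) {
        try {
            if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false; // gave up instead of blocking indefinitely
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime(); // relative time source, not wall-clock time
        System.out.println("sum = " + sumWithPool(100, 4));
        System.out.println("tasks run = " + completedTasks.get());
        boolean ran = runIfLockAvailable(new ReentrantLock(), 50, () -> System.out.println("lock held"));
        System.out.println("ran = " + ran + ", elapsedNs = " + (System.nanoTime() - start));
    }
}
```

A fixed-size pool is used here only because it is the simplest bounded policy; the other Executor implementations mentioned above can be swapped in without changing the submitting code.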