A Cassandra service might have the following included in its "shell initialization" script:
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-Xloggc:/var/log/cassandra/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M
source: cassandra/jvm.option
So, to enable logging, make it rotate, and give that rotation some options:
...
-Xloggc:/var/log/<discovery-environment-service-name>/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M
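As a rough sanity check on those rotation settings, the sketch below multiplies the file count by the per-file size to estimate how much disk the rotated GC logs can occupy. The helper name `gc_log_disk_bound` is hypothetical, and the result is only an approximate ceiling, assuming the JVM rotates close to the configured size.

```python
import re

# Multipliers for the size suffixes accepted in this sketch (binary units).
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}

def gc_log_disk_bound(flags):
    """Hypothetical helper: NumberOfGCLogFiles * GCLogFileSize in bytes.

    Returns None if either flag is missing from the list.
    """
    count = size = None
    for flag in flags:
        m = re.fullmatch(r"-XX:NumberOfGCLogFiles=(\d+)", flag)
        if m:
            count = int(m.group(1))
        m = re.fullmatch(r"-XX:GCLogFileSize=(\d+)([KMG]?)", flag, re.IGNORECASE)
        if m:
            size = int(m.group(1)) * _UNITS[m.group(2).upper()]
    if count is None or size is None:
        return None
    return count * size

flags = [
    "-Xloggc:/var/log/cassandra/gc.log",
    "-XX:+UseGCLogFileRotation",
    "-XX:NumberOfGCLogFiles=10",
    "-XX:GCLogFileSize=10M",
]
# 10 files of 10 MiB each
print(gc_log_disk_bound(flags) // 1024**2, "MiB")
```

With the values above, the logs should stay at roughly 100 MiB per service, which is usually cheap enough to leave enabled permanently.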
The important information to log from the JVM:
-XX:+PrintGCDetails # more verbose information regarding the collection
-XX:+PrintGCDateStamps # adds the calendar date and time to each entry, rather than only seconds since JVM start
The next options are more related to what you'd like to know when you start tuning:
-XX:+PrintTenuringDistribution # information about objects as they move from generation to generation
-XX:+PrintGCApplicationStoppedTime # important: it will help find the worst stop-the-world pauses
-XX:+PrintPromotionFailure # helps identify the cause of full garbage collections
The -XX:+PrintGCApplicationStoppedTime option helped me zero in on GCs that did not end in OutOfMemory but were causing disturbingly long pauses.
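To show how that option can be used in practice, here is a minimal sketch that scans GC log lines for the "Total time for which application threads were stopped" entries and lists the longest ones. It assumes the Java 8 log format with -XX:+PrintGCDateStamps enabled; the function name `worst_pauses` and the one-second threshold are illustrative choices, and the sample lines are made up. Note that these entries cover every safepoint pause, not only garbage collections.

```python
import re

# Matches the line written by -XX:+PrintGCApplicationStoppedTime.
# Assumption: Java 8 style output, with a leading date stamp or uptime stamp.
STOPPED = re.compile(
    r"^(?P<stamp>\S+): .*Total time for which application threads were stopped: "
    r"(?P<secs>\d+[.,]\d+) seconds"
)

def worst_pauses(lines, threshold=1.0, top=5):
    """Return up to `top` (seconds, stamp) pairs at or above `threshold`, longest first."""
    hits = []
    for line in lines:
        m = STOPPED.search(line)
        if m:
            # Some locales print a decimal comma; normalise it before parsing.
            secs = float(m.group("secs").replace(",", "."))
            if secs >= threshold:
                hits.append((secs, m.group("stamp")))
    return sorted(hits, reverse=True)[:top]

# Made-up sample lines in the assumed format.
sample = [
    "2016-01-01T12:00:00.123+0000: 12.345: Total time for which application threads were stopped: 0.0012345 seconds, Stopping threads took: 0.0000456 seconds",
    "2016-01-01T12:05:10.456+0000: 322.678: Total time for which application threads were stopped: 4.2500000 seconds, Stopping threads took: 0.0001000 seconds",
    "2016-01-01T12:09:30.789+0000: 582.901: Total time for which application threads were stopped: 1.5000000 seconds, Stopping threads took: 0.0002000 seconds",
]
for secs, stamp in worst_pauses(sample):
    print(f"{stamp}  {secs:.3f}s")
```

To run it against a real log, pass an open file instead of the sample list, for example `worst_pauses(open("/var/log/cassandra/gc.log"))`.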
Once you have a problem, consider getting a "heap dump" of an active service. Alternatively, harvest any dumps that were created when a service failed with "OutOfMemory", which these options enable:
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=<path>
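For the "harvest" step, a small sketch like the following can list the dump files left under the dump path, newest first. It assumes the dumps end in `.hprof` (when the path is a directory, the JVM names them `java_pid<pid>.hprof`); the function name `find_heap_dumps` is hypothetical, and the demo runs against a throwaway directory with fake files.

```python
import os
import tempfile
from pathlib import Path

def find_heap_dumps(dump_dir):
    """Return the .hprof files under dump_dir, newest first."""
    dumps = [p for p in Path(dump_dir).rglob("*.hprof") if p.is_file()]
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)

# Demo: fake dump files with fixed modification times in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name, mtime in [
        ("java_pid101.hprof", 1_000),
        ("java_pid202.hprof", 2_000),
        ("gc.log", 3_000),  # not a heap dump, so it should be ignored
    ]:
        path = Path(tmp, name)
        path.write_bytes(b"")
        os.utime(path, (mtime, mtime))
    found = [p.name for p in find_heap_dumps(tmp)]
    print(found)
```

Heap dumps can be as large as the heap itself, so it is worth checking free space on the dump path before relying on it.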
A quick reference (somewhat old) for the different JVM logging options can be found here (in the "GC logging options" section).
The amount of effort that goes into tuning ParNew/CMS can be seen in CASSANDRA-8150. The new default for Cassandra is G1GC.
A nice outline of the process of tuning the JVM can be found here.
You can add something like jHiccup to help measure pauses.
There is an example of the jamm agent being added in cassandra.sh.
Also, you might want to include a metrics jar, such as metrics-core, for taking measurements.