Generating Flame Graphs for Apache Spark
Flame graphs are a nifty debugging tool to determine where CPU time is being spent. Using the Java Flight Recorder, you can do this for Java processes without adding significant runtime overhead.
When are flame graphs useful?
Shivaram Venkataraman and I have found these flame recordings to be useful for diagnosing coarse-grained performance problems. We started using them at the suggestion of Josh Rosen, who quickly made one for the Spark scheduler when we were talking to him about why the scheduler caps out at a throughput of a few thousand tasks per second. Josh generated a graph similar to the one below, which illustrates that a significant amount of time is spent in serialization (if you click in the top right hand corner and search for "serialize", you can see that 78.6% of the sampled CPU time was spent in serialization). We used this insight to speed up the scheduler by switching to use Kryo instead of the Java serializer.
Installing Oracle JDK
The flight recorder is only available in the HotSpot JVM, so if you're using OpenJDK, you'll need to replace it with the Oracle JDK (version 7u40 or higher). These instructions describe how to replace OpenJDK on Red Hat.
To see which JDK you're using, run:
If you have OpenJDK, first remove it:
yum -y remove java*
Now if you try java -version again, it should say "No such file or directory".
Next, download the Oracle JDK and use the Red Hat package manager to install it (the extra wget flags are necessary to avoid an error saying you need to accept Oracle's license):
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/7u60-b19/jdk-7u60-linux-x64.rpm
rpm -ivh jdk-7u60-linux-x64.rpm
You may also need to fix JAVA_HOME, which Spark uses, to point to the correct location. If you're using EC2 instances launched by the spark-ec2 script, the easiest way to do this is to create a symlink from the expected JAVA_HOME to the new installation:
ln -s /usr/java/jdk1.7.0_60/ /usr/lib/jvm/java-1.7.0
You may also need to add $JAVA_HOME/bin to your path so that Java commands like jps work correctly.
If you're running on a cluster of machines, you'll need to update Java on all of the machines. The easiest way to do this, if you're using the spark-ec2 scripts, is to put the 4 commands above into a file (e.g., upgrade_java.sh), copy that file to all of the machines on the cluster:
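The copy command itself isn't preserved above; with a spark-ec2 cluster, one approach is the copy-dir helper (also used later in this post) to push the script to every machine. A sketch, assuming the script lives at /root/upgrade_java.sh:

```shell
# Hypothetical path: copy-dir rsyncs a path from the master to every worker.
/root/spark-ec2/copy-dir /root/upgrade_java.sh
```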
and then use the helpful slaves.sh script to run the script on all of the workers:
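The original command isn't preserved here; slaves.sh (which ships in Spark's sbin directory and runs its arguments on every worker) would be invoked along these lines, assuming the script was copied to /root/upgrade_java.sh:

```shell
# Hypothetical invocation: run the upgrade script on each worker over ssh.
/root/spark/sbin/slaves.sh sh /root/upgrade_java.sh
```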
Next, you'll need to re-compile Spark, copy the updated Spark to all of the machines:
/root/spark-ec2/copy-dir --delete /root/spark
and re-start Spark:
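The restart command isn't preserved here; with the standalone scripts that ship with Spark, a restart of the cluster looks like:

```shell
/root/spark/sbin/stop-all.sh
/root/spark/sbin/start-all.sh
```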
Enabling the flight recorder
To enable the flight recorder, you need to add -XX:+UnlockCommercialFeatures -XX:+FlightRecorder when you start the JVM. For Spark, you can do this by adding the following to conf/spark-defaults.conf:
# Enable flight recording on the driver.
spark.driver.extraJavaOptions -XX:+UnlockCommercialFeatures -XX:+FlightRecorder
# Enable flight recording on the executors.
spark.executor.extraJavaOptions -XX:+UnlockCommercialFeatures -XX:+FlightRecorder
You can do this after you've started the master, but you need to do it before starting your application.
Collecting a flight recording
To collect a flight recording from your running Spark job, you'll first need to figure out the process ID. If you want to collect a flight recording from an executor, for example, run jps on the worker machine:
$ jps
18977 Worker
19163 CoarseGrainedExecutorBackend
5465 DataNode
19224 Jps
You want the ID of CoarseGrainedExecutorBackend (this will have a different name if you're using YARN or Mesos -- look for the process name that includes ExecutorBackend). Then use jcmd to create a flight recording (the example command below creates a 10 second recording). Note that this runs asynchronously, so the command will return immediately, but the file won't be available for 10 seconds.
$ jcmd 19163 JFR.start filename=spark.jfr duration=10s
19163:
Started recording 1. The result will be written to:
/root/spark/work/app-20160628234739-0001/1/spark.jfr
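Since the recording runs asynchronously, it can be handy to check whether it has finished. The JFR.check diagnostic command, available in the same HotSpot builds that provide JFR.start, lists the recordings for a process (using the example process ID from above):

```shell
jcmd 19163 JFR.check
```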
When you run jcmd, if you get an error that says:
java.lang.IllegalArgumentException: Unknown diagnostic command
it probably means that the extra Java options didn't get passed to the JVM correctly. To see all of the valid commands for a particular process, you can use jcmd with the help command:
$ jcmd 19163 help
The following commands are available:
VM.native_memory
VM.commercial_features
ManagementAgent.stop
ManagementAgent.start_local
ManagementAgent.start
Thread.print
GC.class_histogram
GC.heap_dump
GC.run_finalization
GC.run
VM.uptime
VM.flags
VM.system_properties
VM.command_line
VM.version
help
In the output above, JFR.start isn't listed as a valid command. You'll also notice that VM.flags is a valid command; you can use this to check which options got passed to the JVM:
$ jcmd 19163 VM.flags
Double-check that the output of the above command includes the Java options that you added above. In Spark, you can also check that the Java options got added correctly by clicking the "Environment" tab in the UI and checking the values of the spark.driver.extraJavaOptions and spark.executor.extraJavaOptions properties.
Generating a flame graph
First, if you created the recording on a remote machine, copy the file back to your laptop (this will make it easier to view the flame graph). Next, follow the instructions at this GitHub repo to generate a FlameGraph.
In order to run the install-mc-jars.sh script to install the flame graph tools, you'll need to have the JAVA_HOME environment variable set correctly. On a Mac, you can figure out where Java is installed with:
/usr/libexec/java_home -v 1.7
Set JAVA_HOME to whatever was returned by that command before running the install script. The GitHub repository linked above describes all necessary remaining steps to create a flame graph.
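For example, the two steps can be combined into one line (macOS only, since /usr/libexec/java_home is a Mac-specific utility):

```shell
export JAVA_HOME=$(/usr/libexec/java_home -v 1.7)
```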
While flame graphs can be useful for spotting big performance issues, we've found them to be less useful for fine-grained performance issues. After fixing some of the major issues with the scheduler, for example, we found the flame graph to be minimally useful, because it doesn't convey anything about the critical path, and many of the later scheduler performance issues we found were subtle issues along the critical path.
Also, flame graphs only display time that the JVM spent using the CPU. They will not show time spent blocked waiting on I/O, nor will they display time spent in native code. This means that, for example, they're not useful for diagnosing when network bandwidth to ship tasks to workers becomes a bottleneck.
References and Credits
The Flame Graph tool was written by Brendan Gregg, and is described (including a video about how to interpret them) on his blog. He has a blog post on how to generate flame graphs that include CPU time spent in the kernel and in system libraries (in addition to the CPU time spent in Java) here. He also wrote an ACMQueue article about flame graphs.
Isuru Perera wrote the tool to generate a Flame Graph from Java Flight Recordings, which is described in his blog.
Marcus Hirt's blog has some useful writeups about using the Java flight recorder and Java mission control -- e.g., http://hirt.se/blog/?p=364. There's also a great writeup about Java Mission Control on Takipi's blog.
This page describes in more detail how to remove OpenJDK and install Oracle JDK.