@RooseveltAdvisors
Created September 1, 2017 12:00
Debugging Spark

To connect a debugger to the driver

Append the following to your spark-submit (or gatk-launch) options:

(Replace 5005 with a different available port if necessary.)

--driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005
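As a minimal sketch, the option might be passed to spark-submit like this (the jar name `my-tool.jar` is a placeholder, not from the original):

```shell
# Hypothetical launch command showing the driver debug option in context.
# The JVM will pause at startup ("suspend=y") until a debugger attaches on port 5005.
DEBUG_PORT=5005
DRIVER_DEBUG_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=${DEBUG_PORT}"
printf '%s\n' "spark-submit --driver-java-options ${DRIVER_DEBUG_OPTS} my-tool.jar"
```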

This will suspend the driver until it receives a remote connection from IntelliJ.

Configure a new IntelliJ remote debugging configuration as follows:

  • Select Run -> Edit Configurations.
  • Hit the + to add a new configuration.
  • Choose Remote.
  • Set Mode to Attach.
  • Set Host to your driver node name, e.g. dataflow01.broadinstitute.org.
  • Set Port to the port you used above.
  • Click OK.

Now start your Spark tool and then run your debug configuration.

To debug an executor

Add the following to your gatk-launch command:

  --num-executors 1 --executor-cores 1 --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=n,address=wm1b0-8ab.broadinstitute.org:5005,suspend=n"

Replace the given address with your local computer's address and port. (IntelliJ's remote debug configuration screen will show you the address if you're not sure what it is.)

(It's important to set --num-executors to 1, or each executor will try to connect to your debugger, causing problems.)

Note that this does not suspend the executor (if it did, the Spark program would crash when run). Instead, set the Mode in your IntelliJ run configuration to Listen. Start your debug configuration before you start the Spark program, and it will wait for a connection from the executor.
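The executor-side options above can be sketched as follows. This is a hypothetical assembly of the option string; `my-laptop.example.org` is a placeholder for your own machine's address, not a real host:

```shell
# Hypothetical sketch: building the executor-side JDWP options.
# "server=n" makes the executor JVM connect OUT to your listening debugger,
# and "suspend=n" keeps the executor running rather than pausing at startup.
DEBUG_HOST="my-laptop.example.org"   # placeholder: your local machine's address
DEBUG_PORT=5005
EXEC_DEBUG_OPTS="-agentlib:jdwp=transport=dt_socket,server=n,address=${DEBUG_HOST}:${DEBUG_PORT},suspend=n"
printf '%s\n' "--num-executors 1 --executor-cores 1 --conf \"spark.executor.extraJavaOptions=${EXEC_DEBUG_OPTS}\""
```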
