sparkui

I will have to think about a sensible place to put this, but here's how you can get the Spark UI for a Glue job:

# Assuming GlueJob comes from the etl_manager package (this import is an assumption);
# bucket and my_role are defined elsewhere
from etl_manager.etl import GlueJob

job = GlueJob('my_dir/', bucket=bucket, job_role=my_role,
              job_arguments={'--test_arg': 'some_string',
                             '--enable-spark-ui': 'true',
                             '--spark-event-logs-path': 's3://alpha-data-linking/glue_test_delete/logsdelete'})
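
The --enable-spark-ui and --spark-event-logs-path arguments are AWS Glue's standard special job parameters for Spark UI monitoring: they make Glue write Spark event logs to the given S3 path.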

Then sync the files from s3://alpha-data-linking/glue_test_delete/logsdelete to a local folder called events.
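
For example, with the AWS CLI (assuming your credentials are already configured for the bucket):

aws s3 sync s3://alpha-data-linking/glue_test_delete/logsdelete ./events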

Then build a Docker image for the Spark history server, using this Dockerfile:

# Base image from the Kubernetes spark-operator project (Alpine-based, hence apk)
ARG SPARK_IMAGE=gcr.io/spark-operator/spark:v2.4.0
FROM ${SPARK_IMAGE}

RUN apk --update add coreutils

# /tmp/spark-events is the history server's default event log directory
RUN mkdir /tmp/spark-events
ENTRYPOINT ["/opt/spark/sbin/start-history-server.sh"]

docker build -t shs .

Then run it, mapping the local events folder into the container:

docker run -v ${PWD}/events:/tmp/spark-events -p 18080:18080 shs

What’s happening?

  • The Glue job writes Spark event logs to the path given in --spark-event-logs-path.
  • These logs are all the Spark UI needs to let you analyse what went on in a job. Copy them to a directory on your local computer called events.
  • We start the Spark history server in Docker. By default it looks for new logs in /tmp/spark-events.
  • We map our local events directory onto that directory in the container using -v.
  • By default the history server UI is on port 18080, so we map this using -p.
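
Once the container is running, open http://localhost:18080 in a browser to see the Spark UI for the job.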