sparkui

I will have to think about a sensible place to put this, but here's how you can get the Spark UI for a Glue job:

# Assuming GlueJob comes from the etl_manager package (this import is an assumption);
# bucket and my_role are defined elsewhere
from etl_manager.etl import GlueJob

job = GlueJob('my_dir/', bucket=bucket, job_role=my_role,
              job_arguments={'--test_arg': 'some_string',
                             '--enable-spark-ui': 'true',
                             '--spark-event-logs-path': 's3://alpha-data-linking/glue_test_delete/logsdelete'})
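
The --enable-spark-ui and --spark-event-logs-path arguments are AWS Glue's standard special job parameters for Spark UI monitoring: they make Glue write Spark event logs to the given S3 path.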

Then sync the files from s3://alpha-data-linking/glue_test_delete/logsdelete to a local folder called events.
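
For example, with the AWS CLI (assuming your credentials are already configured for the bucket):

aws s3 sync s3://alpha-data-linking/glue_test_delete/logsdelete ./events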

Then build a Docker image for the Spark history server, using this Dockerfile:

# Base image from the Kubernetes spark-operator project (Alpine-based, hence apk)
ARG SPARK_IMAGE=gcr.io/spark-operator/spark:v2.4.0
FROM ${SPARK_IMAGE}

RUN apk --update add coreutils

# /tmp/spark-events is the history server's default event log directory
RUN mkdir /tmp/spark-events
ENTRYPOINT ["/opt/spark/sbin/start-history-server.sh"]

docker build -t shs .

Then run it, mapping the local events folder into the container:

docker run -v ${PWD}/events:/tmp/spark-events -p 18080:18080 shs

What’s happening?

  • The Glue job writes Spark event logs to the path given in --spark-event-logs-path.
  • These logs are all the Spark UI needs to let you analyse what went on in a job. Copy them to a directory on your local computer called events.
  • We start the Spark history server in Docker. By default it looks for new logs in /tmp/spark-events.
  • We map our local events directory onto that directory in the container using -v.
  • By default the history server UI is on port 18080, so we map this using -p.
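
Once the container is running, open http://localhost:18080 in a browser to see the Spark UI for the job.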