@dfparker2002
Forked from gwhitelaw/aws-glue-zepplin.md
Created January 26, 2022 01:39
Easily connect to an AWS Glue Dev endpoint

This is how I quickly got an Apache Zeppelin notebook running against an AWS Glue Dev endpoint. None of the guides out there seemed concise, and I found some custom Docker containers doing what you can easily do yourself. This approach keeps you in control: it sets up SSH port forwarding and runs the official Docker image.

  1. Create your Glue Dev endpoint (this involves creating a keypair; I just used ssh-keygen).
  2. Once its status is READY, select it and copy the "SSH tunnel to remote interpreter" command.
  • e.g. ssh -i <private-key.pem> -vnNT -L :9007:169.254.76.1:9007 glue@..compute.amazonaws.com
  3. Connect to the endpoint in a terminal session, modifying the above to match (note the forward target is now 127.0.0.1): ssh -i ~/.ssh/glue-dev -vnNT -L :9007:127.0.0.1:9007 glue@<ec2-endpoint>.<region>.compute.amazonaws.com
  4. Run the Apache Zeppelin Docker container: docker run -p 8080:8080 --rm -v $PWD/logs:/logs -v $PWD/notebook:/notebook -e ZEPPELIN_LOG_DIR='/logs' -e ZEPPELIN_NOTEBOOK_DIR='/notebook' --name zeppelin apache/zeppelin:0.7.3
  5. Update your interpreters to use the existing process (the AWS Glue endpoint):
  • Find the interpreter of choice
  • Click edit at the top right
  • Check "Connect to existing process"
  • Set Host to: host.docker.internal
  • Set Port to: 9007
  6. You should now be able to create a notebook and get started!
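
The two commands from the steps above can be collected into a small dry-run helper. This is only a sketch: it prints the commands rather than running them, so you can check them before pasting. The variable names (KEY, ENDPOINT, PORT) and function names (tunnel_cmd, zeppelin_cmd) are my own placeholders, not anything AWS or Zeppelin defines, and the default ENDPOINT is the same placeholder used above, so you would need to substitute your endpoint's real address.

```shell
#!/bin/sh
# Dry-run sketch: prints the SSH tunnel and Docker commands from the steps
# above without executing them. All names here are hypothetical placeholders.

KEY="${KEY:-$HOME/.ssh/glue-dev}"                                  # private key path (assumed)
ENDPOINT="${ENDPOINT:-<ec2-endpoint>.<region>.compute.amazonaws.com}"  # replace with your endpoint
PORT=9007                                                          # interpreter port from the steps above

tunnel_cmd() {
  # The leading ':' in -L leaves the bind address empty, which tells ssh to
  # listen on all local interfaces rather than only on localhost.
  printf 'ssh -i %s -vnNT -L :%s:127.0.0.1:%s glue@%s\n' \
    "$KEY" "$PORT" "$PORT" "$ENDPOINT"
}

zeppelin_cmd() {
  # $PWD is printed literally (single quotes) so it expands in whichever
  # directory you eventually run the command from.
  printf '%s ' 'docker run -p 8080:8080 --rm'
  printf '%s ' '-v $PWD/logs:/logs -v $PWD/notebook:/notebook'
  printf '%s ' '-e ZEPPELIN_LOG_DIR=/logs -e ZEPPELIN_NOTEBOOK_DIR=/notebook'
  printf '%s\n' '--name zeppelin apache/zeppelin:0.7.3'
}

# Print both commands for review.
tunnel_cmd
zeppelin_cmd
```

Once the output looks right, run each printed command in its own terminal, since the tunnel stays in the foreground (-N opens no remote shell) and the container runs attached.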