@gwhitelaw
Last active January 26, 2022 01:39
Easily connect to an AWS Glue Dev endpoint

This is how I quickly got an Apache Zeppelin notebook running against an AWS Glue Dev endpoint. None of the guides out there seemed concise, and I found some custom Docker containers doing what you can easily do yourself. This approach keeps you in control: it sets up port forwarding and runs the official Docker image.

  1. Create your Glue Dev endpoint (this involves creating a keypair; I just used ssh-keygen).
  2. Once it is READY, select it and copy the "SSH tunnel to remote interpreter" command.
  • e.g.: ssh -i <private-key.pem> -vnNT -L :9007:169.254.76.1:9007 glue@..compute.amazonaws.com
  3. Connect to the endpoint in a terminal session, modifying the above to match your key and endpoint, with 127.0.0.1 as the forward target: ssh -i ~/.ssh/glue-dev -vnNT -L :9007:127.0.0.1:9007 glue@<ec2-endpoint>.<region>.compute.amazonaws.com
  4. Run the Apache Zeppelin Docker container: docker run -p 8080:8080 --rm -v $PWD/logs:/logs -v $PWD/notebook:/notebook -e ZEPPELIN_LOG_DIR='/logs' -e ZEPPELIN_NOTEBOOK_DIR='/notebook' --name zeppelin apache/zeppelin:0.7.3
  5. Update your interpreters to use the existing process (the AWS Glue endpoint).
  • Find the interpreter of choice
  • Hit edit at the top right
  • Check "Connect to existing process"
  • Set Host to: host.docker.internal
  • Set Port to: 9007
  6. You should now be able to create a notebook and get started!
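If you do this often, the tunnel and Docker commands from the steps above can be wrapped in a small helper. This is only a rough sketch: the function names (tunnel_cmd, zeppelin_cmd) and the variables (KEY_FILE, FORWARD_TARGET, ZEPPELIN_TAG, GLUE_HOST) are made up for illustration and are not part of AWS Glue, ssh, or Zeppelin. It prints the commands instead of running them, so you can check them before pasting.

```shell
#!/bin/sh
# Sketch only: all names below are illustrative assumptions.
# Defaults mirror the values used in the steps above.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/glue-dev}"
FORWARD_TARGET="${FORWARD_TARGET:-127.0.0.1}"  # the console's copied command uses 169.254.76.1
ZEPPELIN_TAG="${ZEPPELIN_TAG:-0.7.3}"          # image tag; override if your Glue version needs another
PORT=9007

# Print the ssh port-forward command for the endpoint hostname given as $1.
# The leading ":" in -L leaves the bind address empty, so the forwarded port
# listens on all interfaces and the container can reach it.
tunnel_cmd() {
  printf 'ssh -i %s -vnNT -L :%s:%s:%s glue@%s\n' \
    "$KEY_FILE" "$PORT" "$FORWARD_TARGET" "$PORT" "$1"
}

# Print the docker run command for the chosen Zeppelin image tag.
zeppelin_cmd() {
  printf 'docker run -p 8080:8080 --rm -v %s/logs:/logs -v %s/notebook:/notebook -e ZEPPELIN_LOG_DIR=/logs -e ZEPPELIN_NOTEBOOK_DIR=/notebook --name zeppelin apache/zeppelin:%s\n' \
    "$PWD" "$PWD" "$ZEPPELIN_TAG"
}

# Only print when an endpoint hostname is supplied via GLUE_HOST.
if [ -n "${GLUE_HOST:-}" ]; then
  tunnel_cmd "$GLUE_HOST"
  zeppelin_cmd
fi
```

Run it with GLUE_HOST set to your endpoint's public DNS name, then paste the first printed command into one terminal and the second into another. The interpreter settings in Zeppelin still have to be changed by hand.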
@mohemed2087

It is failing with the error below:
java.lang.RuntimeException: Fail to callRemoteFunction, because connection is broken
at org.apache.zeppelin.interpreter.remote.PooledRemoteClient.callRemoteFunction(PooledRemoteClient.java:108)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:98)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:159)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:126)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:271)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:444)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:72)
at org.apache.zeppelin.scheduler.Job.run(Job.java:172)
at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:132)

@matthinea

Note that for Glue 1.0 and later, use Zeppelin v0.8.1, not 0.7.3 as stated in the script.
