@microamp
Last active April 29, 2021 11:15
Using Emacs for AWS Glue PySpark Development

Emacs AWS Glue PySpark Development Setup

  1. Create a new developer endpoint

    • Specify $HOME/.ssh/jumphost.pub as your public key
  2. Update $HOME/.ssh/config accordingly:

    Host gluepyspark3
      Hostname ec2-???-???-???-???.ap-southeast-2.compute.amazonaws.com
      User glue
      IdentityFile ~/.ssh/jumphost
      PreferredAuthentications publickey
    
  3. Try connecting to the developer endpoint:

    ssh gluepyspark3
    
           __|  __|_  )
           _|  (     /   Amazon Linux AMI
          ___|\___|___|
    
    https://aws.amazon.com/amazon-linux-ami/2018.03-release-notes/
    53 package(s) needed for security, out of 88 available
    Run "sudo yum update" to apply all updates.
    
    EEEEEEEEEEEEEEEEEEEE MMMMMMMM           MMMMMMMM RRRRRRRRRRRRRRR
    E::::::::::::::::::E M:::::::M         M:::::::M R::::::::::::::R
    EE:::::EEEEEEEEE:::E M::::::::M       M::::::::M R:::::RRRRRR:::::R
      E::::E       EEEEE M:::::::::M     M:::::::::M RR::::R      R::::R
      E::::E             M::::::M:::M   M:::M::::::M   R:::R      R::::R
      E:::::EEEEEEEEEE   M:::::M M:::M M:::M M:::::M   R:::RRRRRR:::::R
      E::::::::::::::E   M:::::M  M:::M:::M  M:::::M   R:::::::::::RR
      E:::::EEEEEEEEEE   M:::::M   M:::::M   M:::::M   R:::RRRRRR::::R
      E::::E             M:::::M    M:::M    M:::::M   R:::R      R::::R
      E::::E       EEEEE M:::::M     MMM     M:::::M   R:::R      R::::R
    EE:::::EEEEEEEE::::E M:::::M             M:::::M   R:::R      R::::R
    E::::::::::::::::::E M:::::M             M:::::M RR::::R      R::::R
    EEEEEEEEEEEEEEEEEEEE MMMMMMM             MMMMMMM RRRRRRR      RRRRRR
    
    [glue@ip-172-31-0-183 ~]$
    
  4. Do initial setup:

    • Python dependencies:

      python3 -m pip install --user 'python-language-server[all]'
    • Optional:

      python3 -m pip install --user --upgrade pip
      pip3 --version
      # pip 21.1 from /home/glue/.local/lib/python3.6/site-packages/pip (python 3.6)
  5. Create a temporary file for interactive development (e.g. spark_job.py):

    touch $HOME/spark_job.py
  6. Open the file using TRAMP:

    1. Create .dir-locals.el under /ssh:gluepyspark3:/home/glue:

      ;;; Directory Local Variables
      ;;; For more information see (info "(emacs) Directory Variables")
      
      ((python-mode . ((python-shell-interpreter-args . "")
                       (python-shell-interpreter . "/usr/bin/gluepyspark3"))))
      
    2. Make sure pyls is in the $PATH:

      which pyls
      # ~/.local/bin/pyls
    3. Configure pyls to work when using TRAMP in Emacs:

      (lsp-register-client
       (make-lsp-client :new-connection (lsp-tramp-connection "~/.local/bin/pyls")
                        :major-modes '(python-mode)
                        :remote? t
                        :server-id 'pyls-remote))
      
    4. Open spark_job.py in Emacs:

      C-x C-f /ssh:gluepyspark3:/home/glue/spark_job.py
      
    5. Make sure python-shell-interpreter has been set to "/usr/bin/gluepyspark3" as per .dir-locals.el above:

      C-h v python-shell-interpreter
      
    6. Run M-x run-python to start a Python shell within Emacs for interactive development

      • NOTE: elpy-shell-switch-to-shell does not work in this setup (cause unknown); use run-python instead.
    7. ???

    8. Profit!
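
Before opening the file over TRAMP, the remote prerequisites from the steps above can be checked with a small shell function run on the developer endpoint. This is only a sketch: the function name check_glue_dev_env and its argument order are made up for illustration, and the default paths are simply the ones that appear in the steps (~/.local/bin/pyls, /usr/bin/gluepyspark3, $HOME/spark_job.py).

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper; not part of Glue, Emacs, or lsp-mode.
# Arguments (all optional): pyls path, interpreter path, scratch file path.
check_glue_dev_env() {
  local pyls_path="${1:-$HOME/.local/bin/pyls}"
  local interpreter="${2:-/usr/bin/gluepyspark3}"
  local job_file="${3:-$HOME/spark_job.py}"
  local status=0

  # The language server is started by path, so it has to be executable.
  if [ -x "$pyls_path" ]; then
    echo "ok: pyls at $pyls_path"
  else
    echo "missing: pyls at $pyls_path"
    status=1
  fi

  # The interpreter named in .dir-locals.el has to be executable for run-python.
  if [ -x "$interpreter" ]; then
    echo "ok: interpreter at $interpreter"
  else
    echo "missing: interpreter at $interpreter"
    status=1
  fi

  # The scratch file only needs to exist as a regular file.
  if [ -f "$job_file" ]; then
    echo "ok: job file at $job_file"
  else
    echo "missing: job file at $job_file"
    status=1
  fi

  return "$status"
}
```

After sourcing the file on the endpoint, running check_glue_dev_env with no arguments checks the default paths and returns a non-zero status if anything is missing.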

@microamp
Author

Make sure lsp-mode is configured correctly in your $HOME/.emacs.d/init.el, for example:

(use-package lsp-mode
  :ensure t
  :commands (lsp lsp-deferred)
  :hook ((python-mode . lsp-deferred))
  :config
  (lsp-register-client
   (make-lsp-client :new-connection (lsp-tramp-connection "~/.local/bin/pyls")
                    :major-modes '(python-mode)
                    :remote? t
                    :server-id 'pyls-remote)))

Also, add

(setq enable-remote-dir-locals t)

so that Emacs applies directory-local variables to remote files. Without it, the .dir-locals.el on the remote machine (the "developer endpoint" in Glue terms) is ignored.
