nblauch/Python_for_Jupyter_notebooks_development_on_a_remote_cluster.md

## Python_for_Jupyter_notebooks_development_on_a_remote_cluster.md

      
Setting up Python for Jupyter notebooks development on a remote cluster

This gist will cover:

Setting up Python on your local machine using Anaconda
Running local Jupyter notebooks
Launching a Jupyter notebook on a remote computing cluster. This will allow you to take advantage of greater computational resources than you have available on your machine, while making beautiful visualizations through the web browser connection, even if your remote server is a headless system without X11 graphics support.
Using conda environments

Step 1: Install Anaconda

Follow these instructions
Miniconda is the better choice for people who do not have a lot of available disk space. You can always download the rest of the packages later with conda install or pip install.
Step 1b. Verify your installation

Make sure that your installation is working. Open the Anaconda Navigator application and try to launch the Spyder IDE or a local Jupyter Notebook server.
Now open up a terminal (Mac or Linux) or Anaconda command prompt (Windows).
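# check that your system is now using Anaconda's python
# (in the Windows Anaconda prompt, use "where python" instead)
which python

# and check which Python version you installed (3.7 at the time of writing)
python -V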
Don't worry too much about the version. It's good for your base install to have the newest version of Python, and you can always create virtual environments later with earlier versions, e.g. to build on someone else's code that is only compatible with an earlier version (e.g. Python 2.7, how are people still using this??).
Step 2: Install JupyterLab

Within your terminal/command prompt, you now have access to the conda package manager. You should read up about it here
Use conda to install JupyterLab:
# jupyterlab is located on the conda-forge channel, along with many other packages
conda install -c conda-forge jupyterlab
Step 3: Launch a local JupyterLab server

Launching a JupyterLab server is now as easy as:
jupyter-lab
# now open a browser and go to the address given by the jupyter-lab terminal output
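Explore the interface. It is much better than regular Jupyter Notebooks, although it is still lacking in some areas. For that, we can install (or even develop!) extensions. I recommend installing the following three, which are sure to improve your experience. Note that jupyter labextension install is the mechanism used by the JupyterLab versions current when this was written; on JupyterLab 3 and later the table of contents is built in, and extensions are usually distributed as pip or conda packages instead.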
# we will need nodejs for the extensions
conda install nodejs

# a file tree
jupyter labextension install jupyterlab_filetree

# table of contents for notebooks (super awesome!!!)
jupyter labextension install @jupyterlab/toc

# variable explorer (similar to those offered in various IDEs)
jupyter labextension install @lckr/jupyterlab_variableinspector

After you install the extensions, you might need to restart your JupyterLab server for them to take effect. If they are working, you will see extra icons on the left side of JupyterLab.
Many more useful extensions can be found here
Step 4: Set up a remote JupyterLab server

For this step, you need a remote machine to SSH into. We will pretend that you have an environment variable called SERVER_ID pointing to its address. This approach was developed for the Carnegie Mellon MIND cluster, but it should work well on other SLURM-based clusters and could be adapted to any computing cluster. Launching a remote JupyterLab server is incredibly useful: it lets you run interactive jobs and make plots in one environment while taking advantage of your cluster's resources. For example, you can work on code that requires a GPU or more RAM than your local machine has available. Cool. It is the same basic idea as launching a local server, with a couple of added ingredients:

An interactive job launched on the remote server in which you can run JupyterLab, ideally within a screen or tmux session so that the JupyterLab server keeps running after you disconnect your SSH session.
SSH port forwarding (called piping here). We will assume you have an environment variable such as PORT=8888, but we will refer to it by the variable name so you can use whichever port is available.
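Since the port only works if nothing else is using it, one way to pick it is sketched below. This is a minimal sketch that assumes bash (it relies on bash's /dev/tcp feature); port_in_use and first_free_port are hypothetical helper names, and the range 8888 to 8898 is arbitrary. It only checks your local machine, so the same port must also be free on the cluster.

```shell
# Sketch: find the first port in a range that nothing is listening on locally.
# port_in_use and first_free_port are hypothetical helpers, not part of SSH or Jupyter.
port_in_use() {
    # In bash, opening /dev/tcp/HOST/PORT attempts a TCP connection;
    # it succeeds only if something is already listening on that port.
    (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

first_free_port() {
    local port="$1" last="$2"
    while [ "$port" -le "$last" ]; do
        if ! port_in_use "$port"; then
            echo "$port"
            return 0
        fi
        port=$((port + 1))
    done
    # every port in the range was busy
    return 1
}

PORT=$(first_free_port 8888 8898)
echo "Using port $PORT"
```

If the chosen port turns out to be taken on the remote side, rerun with a different starting port.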

Follow these instructions to get it going (credit for the approach taken here goes to Jayanth Koushik).
# log into the remote cluster, forwarding local port $PORT to port $PORT on the server
ssh -L $PORT:localhost:$PORT $SERVER_ID
# we are now on the server; $PORT must be set in this remote shell too (e.g. export PORT=8888)
# we want to be able to end our SSH session and resume where we left off.
# let's use a screen session (tmux would also work). we will name the session jupyterlab for easy access later
screen -S jupyterlab
# we are now in the screen session
# let's run an interactive CPU job (the partition name and resource limits here are specific to this cluster)
srun -p cpu --cpus-per-task=6 --gres=gpu:0 --mem=20GB --time=14-00:00:00 --pty bash
# we are now in the interactive job
# the forwarded port stops at the login node, so open a reverse tunnel from this compute node back to it
ssh -N -f -R $PORT:localhost:$PORT $SLURM_SUBMIT_HOST
# now when we launch the jupyterlab server, we will be able to access it locally!
jupyter-lab --no-browser --port=$PORT
Now, open a browser and go to localhost:$PORT/lab, typing the actual port number in place of $PORT.
You can kill your SSH session, turn off your computer, etc., and the JupyterLab server will still be there (until it is killed remotely or your interactive job expires). Just remember to forward the correct port when you SSH in again.
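Reconnecting later comes down to two commands, sketched below. reconnect_cmds is a hypothetical helper that only prints the commands rather than running them, and my-cluster is a placeholder for your own SERVER_ID. The session name jupyterlab matches the one given to screen -S above.

```shell
# Sketch: print the commands needed to reconnect to a running remote JupyterLab.
# reconnect_cmds is a hypothetical helper; it does not run ssh or screen itself.
reconnect_cmds() {
    local port="$1" server="$2" session="${3:-jupyterlab}"
    # 1) forward the local port to the same port on the login node
    printf 'ssh -L %s:localhost:%s %s\n' "$port" "$port" "$server"
    # 2) once logged in, reattach to the named screen session
    printf 'screen -r %s\n' "$session"
}

# "my-cluster" is a placeholder used when SERVER_ID is not set
reconnect_cmds "${PORT:-8888}" "${SERVER_ID:-my-cluster}"
```

The first command alone is enough to use the browser again. Reattaching to the screen session is only needed to read the server log or stop the server, and Ctrl-a followed by d detaches from it without stopping anything.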
Step 5: Create a conda virtual environment

For simple Python use, one base installation is usually fine. But once you start working on multiple projects, or interacting with other people's code, doing everything in one environment gets risky: packages with conflicting requirements can break each other. It also makes reproducing your results much more difficult. Virtual environments were developed to solve this problem. Python has virtual environments natively, but I recommend using conda environments, which are even more isolated (at the expense of greater disk use) and can control the environment beyond just the Python packages. For more insight into the differences between conda and standard virtual environments, do some googling. Conda also has some nice integration with Jupyter, as will be illustrated below with nb_conda_kernels.
Open a terminal/command prompt and create a conda virtual environment:
# an empty environment; add packages (including python) to it later with conda install
conda create --name cool_env
# or: Python plus the full anaconda distribution
conda create --name cool_env anaconda
# if for some reason you need python 2 (note: python=2.7 is a package spec, not a flag)
conda create --name cool_env_2 python=2.7
# for help on environment creation and management
conda create --help
conda env --help
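To help with the reproducibility point above, an environment can be written out to a file and rebuilt on another machine. The following is a minimal sketch: snapshot_env is a hypothetical helper name, cool_env is the example environment from above, and environment.yml is just a conventional file name.

```shell
# Sketch: snapshot a conda environment so it can be rebuilt elsewhere.
# snapshot_env is a hypothetical helper, not a conda command.
snapshot_env() {
    local env_name="$1" out_file="${2:-environment.yml}"
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found on PATH" >&2
        return 1
    fi
    # --name selects the environment without needing to activate it first
    conda env export --name "$env_name" > "$out_file"
}

# Usage:
#   snapshot_env cool_env environment.yml
#   conda env create --file environment.yml   # on the other machine
```

The exported file pins exact package builds, so it recreates the environment most reliably on the same operating system.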
I recommend installing the nb_conda_kernels extension, which makes it easy to switch between different virtual environments (e.g. for different projects with different requirements) within the same JupyterLab server.
# assuming you are launching jupyterlab from the base environment
conda install nb_conda_kernels
# then just install ipykernel in other environments to allow them to show up in jupyterlab from the base env
conda activate cool_env
conda install ipykernel
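If nb_conda_kernels is not an option, a different approach is to register an environment as a kernel by hand with ipykernel. This is a sketch of that alternative, not the method used in this gist; register_kernel is a hypothetical helper name, and it assumes you run it inside the activated environment with ipykernel already installed.

```shell
# Sketch: register the active environment as a named Jupyter kernel by hand.
# register_kernel is a hypothetical helper; run it inside the environment to expose.
register_kernel() {
    local name="$1"
    if ! command -v python >/dev/null 2>&1; then
        echo "python not found on PATH" >&2
        return 1
    fi
    # ipykernel must be installed in the environment being registered
    python -m ipykernel install --user --name "$name" --display-name "Python ($name)"
}

# Usage (after conda activate cool_env and conda install ipykernel):
#   register_kernel cool_env
#   jupyter kernelspec list    # the new kernel should be listed
```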