Skip to content

Instantly share code, notes, and snippets.

@mangecoeur
Created November 5, 2019 16:19
Show Gist options
  • Star 4 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save mangecoeur/d1a6317985b5fa1b8bf5fa4c07da2833 to your computer and use it in GitHub Desktop.
Save mangecoeur/d1a6317985b5fa1b8bf5fa4c07da2833 to your computer and use it in GitHub Desktop.
Install Jupyterhub and Jupyterlab The Hard Way

Install Jupyterhub and Jupyterlab The Hard Way

The combination of Jupyterhub and Jupyterlab is a great way to make shared computing resources available to group.

These instruction are a guide for a manual, 'bare metal' install of Jupyterhub and Jupyterlab. This is ideal for running on a single server: build a beast of a machine and share it within your lab, or use a virtual machine from any VPS or cloud provider.

This guide has similar goals to that of The Littlest Jupyerhub setup script. However, instead of bundling all these step for you into one installer, we will perform every step manually - The Hard Way. This makes it easy to customize any part (e.g. if you want to run other services on the same system and need to make them work together), as well as giving you full control and understanding of your setup.

Prerequisites

Your own server (obviously) with administrator (root) access. Each user who will need access to Jupyter should have a standard user account on the machine. The install will be done through the command line - useful if you log into your machine remotely using SSH.

This tutorial was tested on Ubuntu 18.04. No other Linux distributions have been tested, but the instructions should be reasonably straightforward to adapt.

Goals

Jupyterlab enables access to a multiple 'kernels', each one being a given environment for a given language. The most common is a Python environment, for scientific computing usually one managed by the conda package manager.

This guide will set up Jupyterhub and Jupyterlab seperately from the Python environment. In other words, we treat Jupyterhub+Jupyterlab as a 'app' or webservice, which will connect to kernels available on the system. Specifically:

  • We will create an installation of Jupyterhub and Jupyterlab using a virtualenv under /opt using the system Python.

  • We will install conda globally.

  • We will create a shared conda environment which can be used (but not modified) by all users.

  • We will show how users can create their own private conda environments, where they can install whatever they like.

User accounts for Jupyter are simply the user accounts of the system - this makes it easy for everyone to manage data in their home folder and to mix and match different services and access methods (e.g. SSH) which all work using the Linux system user accounts.

Another goal of this guide is to use system provided packages wherever possible. This has the advantage that these packages get automatic patches and security updates (be sure to turn on automatic updates in Ubuntu). This means less maintenance work and a more reliable system.

Part 1: Jupyterhub and Jupyterlab

Setup the Jupyterhub and Jupyterlab in a virtual environment

First we create a virtual environment under '/opt/jupyterhub'. The '/opt' folder is apps not belonging to the operating system are commonly installed. Both jupyterlab and jupyterhub will be installed into this virtualenv. Create it with the command:

sudo python3 -m venv /opt/jupyterhub/

Now we use pip to install the required Python packages into the new virtual environment. Be sure to install wheel first. Since we are separating the user interface from the computing kernels, we don't install any Python scientific packages here. The only exception is ipywidgets because this is needed to allow connection between interactive tools running in the kernel and the user interface.

Note that we use /opt/jupyterhub/bin/python3 -m pip install each time - this makes sure that the packages are installed to the correct virtual environment.

Perform the install using the following commands:

sudo /opt/jupyterhub/bin/python3 -m pip install wheel
sudo /opt/jupyterhub/bin/python3 -m pip install jupyterhub jupyterlab
sudo /opt/jupyterhub/bin/python3 -m pip install ipywidgets

Jupyterhub also currently defaults to requiring configurable-http-proxy, which needs nodejs and npm. The versions of these available in Ubuntu therefore need to be installed first (they are a bit old but this is ok here):

sudo apt install nodejs npm

Then install configurable-http-proxy:

npm install -g configurable-http-proxy

Create the configuration for Jupyterhub

Now we start creating configuration files. To keep everything together, we put all the configuration into the folder created for the virtualenv, under /opt/jupyterhub/etc/. For each thing needing configuration, we will create a further subfolder and necessary files.

First create the folder for Jpyterhub configuration and navigate to it:

sudo mkdir -p /opt/jupyterhub/etc/jupyterhub/
cd /opt/jupyterhub/etc/jupyterhub/

Then generate the default configuration file

sudo /opt/jupyterhub/bin/jupyterhub --generate-config

This will produce the default configuration file /opt/jupyterhub/etc/jupyterhub/jupyterhub_config.py

Setup Systemd service

We will setup Jupyterhub to run as a system service using Systemd (which is responsible for managing all services and servers that run on startup in Ubuntu). We will create a service file in a suitable location in the virtualenv folder and then link it to the system services. First create the folder for the service file:

mkdir -p /opt/jupyterhub/etc/systemd

Then create a text file using your favourite editor at

/opt/jupyterhub/etc/systemd/jupyterhub.service

Paste the following into the file:

[Unit]
Description=Jupyterhub
After=syslog.target network.target

[Service]
User=root
Environment="PATH=/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/jupyterhub/venv/bin"
ExecStart=/opt/jupyterhub/bin/jupyterhub -f /opt/jupyterhub/etc/jupyterhub/jupyterhub_config.py

[Install]
WantedBy=multi-user.target

This sets up the environment to use the virtual environment we created, tells Systemd how to start jupyterhub using the configuration file we created, specifies that jupyterhub will be started as the root user (needed so that it can start jupyter on behalf of other logged in users), and specifies that jupyterhub should start on boot after the network is enabled.

Finally, we need to make systemd aware of our service file. First we symlink our file into systemd's directory:

sudo ln -s /opt/jupyterhub/etc/systemd/jupyterhub.service /etc/systemd/system/jupyterhub.service

Then tell systemd to reload its configuration files

sudo systemctl daemon-reload

And finally enable the service

sudo systemctl enable jupyterhub.service

The service will start on reboot, but we can start it straight away using:

sudo systemctl start jupyterhub.service

...and check that it's running using:

sudo systemctl status jupyterhub.service

You should now be already be able to access jupyterhub using <your servers ip>:8000 (assuming you haven't already set up a firewall or something). However, when you log in the jupyter notebooks will be trying to use the Python virtualenv that was created to install Jupyterhub, this is not what we want. So on to part 2

Part 2: Conda environments

Install conda for the whole system

We will use conda to manage Python environments. We will install the officially maintained conda packages for Ubuntu, this means they will get automatic updates with the rest of the system. Setup repo for the official Conda debian packages, instructions are copied from here:

Install Anacononda public gpg key to trusted store

curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg 
sudo install -o root -g root -m 644 conda.gpg /etc/apt/trusted.gpg.d/

Add Debian repo

sudo echo "deb [arch=amd64] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list

Install conda

sudo apt update
sudo apt install conda

This will install conda into the folder /opt/conda/, with the conda command available at /opt/conda/bin/conda.

Finally, we can make conda more easily available to users by symlinking the conda shell setup script to the profile 'drop in' folder so that it gets run on login

sudo ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh

Install a default conda environment for all users

First create a folder for conda envs (might exist already):

sudo mkdir /opt/conda/envs/

Then create a conda environment to your liking within that folder. Here we have called it 'python' because it will be the obvious default - call it whatever you like. You can install whatever you like into this environment, but you MUST at least install ipykernel.

sudo /opt/conda/bin/conda create --prefix /opt/conda/envs/python python=3.7 ipykernel

Once your env is set up as desired, make it visible to Jupyter by installing the kernel spec. There are two options here:

1 ) Install into the jupyterhub virtualenv - this ensures it overrides the default python version.

sudo /opt/conda/envs/python/bin/python -m ipykernel install --prefix=/opt/jupyterhub/ --name 'python' --display-name "Python (default)"

2 ) Install it system-wide by putting it into /usr/local, where any Jupyter install will look for kernels

sudo /opt/conda/envs/python/bin/python -m ipykernel install --prefix /usr/local/ --name 'python' --display-name "Python (default)"

Setting up users' own conda environments

There is relatively little to do here, users will have to set up their own environments using the shell. On login they should run conda init or /opt/conda/bin/conda. The can then use conda however they like to set up their environment, although they must also install ipykernel. Once done, they can enable their kernel using:

/path/to/kernel/env/bin/python -m ipykernel install --name 'python-my-env' --display-name "Python My Env"

This will place the kernel spec into their home folder, where Jupyter will look for it on startup.

Epilogue: Setting up a reverse proxy using Nginx

The guide so far results in jupyterhub running on port 8000. It is not generally advisable to run open web services in this way - instead, use a reverse proxy running on standard HTTP/HTTPS ports.

Nginx is a mature and established web server and reverse proxy and is easy to install using sudo apt install nginx. Details on using Nginx as a reverse proxy can be found elsewhere.

Often a useful thing to do is to setup jupyterhub to work at a given url path e.g. <your-server-address>/jupyter. This could be useful for example if you are running several services on the server (e.g. you might have RStudio server running also).

To achieve this needs a few tweaks to both the Jupyterhub configuration and the Nginx config. First, edit the configuration file /opt/jupyterhub/etc/jupyterhub/jupyterhub_config.py and add the line:

c.JupyterHub.bind_url = 'http://:8000/jupyter'

where /jupyter will be the relative URL of the server.

Now Nginx must be configured with a to pass all traffic from /jupyter to the the local address 127.0.0.1:8000. Add the following snippet to your nginx configuration file (e.g. /etc/nginx/sites-available/default). You will need to restart nginx for the new configuration to take effect.

  location /jupyter/ {    
    # NOTE important to also set base url of jupyterhub to /jupyter in its config
    proxy_pass http://127.0.0.1:8000;

    proxy_redirect   off;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    # websocket headers
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;

  }
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment