This short tutorial will get you set up with your programming environment, so that you can get started with your programming assignments.

Setting up the Docker container

We will be using a prebuilt Docker image from NVIDIA's container registry (NGC), as it simplifies setup for us.

Run the following command. Replace <local_dir> with a local directory path you would like to access from within the container.

docker run -p 9989:8888 --ulimit memlock=-1 --ulimit stack=67108864 --shm-size=8g --gpus all -it -v <local_dir>:/workspace/ nvcr.io/nvidia/pytorch:23.05-py3

You can change the tag after nvcr.io/nvidia/pytorch: to use a different version of the image. 23.05-py3 is the latest version from 2023. I have personally tested 21.10-py3 (from 2021) and found it to work well. If you face any issues with the provided version, you can try the older 21.10-py3.

This will pull and set up a Docker container with CUDA libraries and PyTorch already installed and ready to use. The first run of this command usually takes a couple of minutes. Once the first run has completed, you can use the same command later on to log back into your Docker container. There are a lot of arguments in this command, some of which you might want to change. Let's discuss some of them here:

  1. -v <local_dir>:/workspace/: This 'mounts' the local directory <local_dir> at the path /workspace inside the container; you can change the path if you wish. Docker containers are isolated from the host system, so this directory serves as an easy way to make data and models accessible from within the container. Docker containers are also designed to be ephemeral, so any changes you make to the container's own filesystem are reverted when you close and re-launch the container later. If there are packages or Python modules you would like to always be available from within the container, you can install them into this mounted directory and access them later from within the container. You can also mount multiple such directories by passing the -v argument multiple times.

  2. --shm-size=8g: This is helpful if you are planning to use parallel computing. The default shared memory allotted by Docker is usually too little; this argument raises the limit to 8 GB. You can increase it further if you think it would help.

  3. --gpus all: Makes all the GPUs on the host system accessible from within the container. You can also choose to expose only certain GPUs if you wish.

  4. -p 9989:8888: This is a port forward, which we will use for accessing our Jupyter instance: the container's port 8888 is mapped to port 9989 on the host. More detailed instructions are given in the "Jupyter" section down below.

You can learn more about this container on NVIDIA's website.
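As a concrete variant, here is a sketch of the same command exposing only specific GPUs and mounting two directories. The device indices and host paths below are placeholders of my own, not values from this tutorial:

```shell
# Sketch: expose only GPUs 0 and 1, and mount two host directories.
# /home/me/data and /home/me/code are example paths; adjust to your setup.
docker run -p 9989:8888 \
    --ulimit memlock=-1 --ulimit stack=67108864 --shm-size=8g \
    --gpus '"device=0,1"' \
    -v /home/me/data:/workspace/data \
    -v /home/me/code:/workspace/code \
    -it nvcr.io/nvidia/pytorch:23.05-py3
```

Inside the container, the two mounts then appear as /workspace/data and /workspace/code.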

Verify installation

Some basic tests you can run to make sure everything is working properly:

  1. Verify that all the GPUs are visible to the Docker instance:
nvidia-smi

This should print a list of all the GPUs you made available to the container (all of them, if all was specified).

  2. Verify that the CUDA libraries are set up correctly:
nvcc -V

This should print the installed CUDA version on the console.

  3. Verify that common Python packages are installed:
conda list | grep -E "numpy|jupyter|scipy|torch|scikit"

Examine the output to check the packages and their versions.

  4. Verify that the GPUs are visible to PyTorch:
python -c "import torch; print(torch.cuda.is_available())"

This should print True.
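The last check can be expanded into a small self-contained diagnostic script. This is just a sketch (the function name cuda_diagnostics is my own), and it degrades gracefully if PyTorch is not installed:

```python
def cuda_diagnostics():
    """Return a dict describing PyTorch/CUDA visibility; never raises."""
    info = {"torch_installed": False, "cuda_available": False, "gpus": []}
    try:
        import torch  # available inside the NGC container (or any torch env)
    except ImportError:
        return info
    info["torch_installed"] = True
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        # One entry per visible GPU, e.g. "NVIDIA A100-SXM4-40GB".
        info["gpus"] = [torch.cuda.get_device_name(i)
                        for i in range(torch.cuda.device_count())]
    return info

if __name__ == "__main__":
    for key, value in cuda_diagnostics().items():
        print(f"{key}: {value}")
```

If torch_installed is True but cuda_available is False, the container cannot see the GPUs; re-check the --gpus argument of your docker run command.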

Installing packages

The container ships with the Anaconda package manager. In addition to PyTorch, the base environment also comes with some common machine learning packages pre-installed, such as scikit-learn, numpy, matplotlib, jupyterlab, and opencv.

If you would like to install any additional packages, there are a few ways to do this.

  1. (Recommended) Create a requirements.txt with all the packages you need and install them at the start of each Docker session:
pip install -r requirements.txt

  2. (Usually not recommended by the Docker community) Perform the package installation once inside the container, 'commit' the changes permanently, and use the resulting image in the future. See here for a step-by-step guide if you're interested: Link.

  3. (Also a recommended method in other contexts, but I personally had issues getting it to work with NVIDIA's Docker images) Create a new environment and save it to the shared directory. Since the base environment already comes with a set of packages pre-configured, you might also want to 'clone' all the packages from the base environment:

conda create -p /workspace/myenv --clone base

This will create a new environment inside the /workspace/myenv folder. Since this directory is shared between the Docker instance and the host system, the environment will persist between your Docker sessions.

To activate your environment, use:

conda activate /workspace/myenv

If conda complains about the shell not being configured properly for using conda activate, run the following to fix the issue:

conda init bash && source ~/.bashrc

You should now be able to activate the environment without any issues.

To install packages, you can now simply use either conda install or pip install.
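For method 1 above, the requirements file itself is just a plain list of packages. Keeping it in the mounted directory means it survives between sessions; the package names below are illustrative examples of my own, not packages this tutorial requires:

```shell
# Create a requirements.txt in the mounted directory so it persists
# between Docker sessions (package names are illustrative examples).
cat > /workspace/requirements.txt <<'EOF'
einops
timm
wandb
EOF

# Run this at the start of each Docker session:
pip install -r /workspace/requirements.txt
```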

Jupyter

Using the Jupyter lab suite is a good way of interacting with the computational server. The setup instructions are:

  1. Expose the docker port 8888 to a port of your choice on the parent system (the computational server) by adding the -p <parent port>:8888 argument in your docker command. For example, if you want to use the port 9989, you can use -p 9989:8888 so that the 8888 port from your docker instance is now connected to the port 9989 of the parent system.

  2. Start jupyter lab inside the docker instance:

jupyter lab --ip 0.0.0.0 --no-browser
  3. Your Jupyter instance should now be up and running. To access it from your local system, forward your chosen port from the computational server to a port on your local computer. It might sound complicated, but it is simply a matter of adding another port forward, this time to your ssh command. Following the same example, you would add -L 9989:localhost:9989 to the ssh command you used to connect to the computational server. This makes Jupyter accessible on port 9989 of your local system.

  4. You should now be able to navigate to http://localhost:9989 to access your Jupyter lab instance.
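Putting the Jupyter steps together, the full chain of forwards might look like this; the username and hostname are placeholders:

```shell
# On your local machine: connect to the server, forwarding local port 9989
# (user@compute.example.com is a placeholder for your actual login).
ssh -L 9989:localhost:9989 user@compute.example.com

# On the server: start the container with -p 9989:8888, then inside it run:
jupyter lab --ip 0.0.0.0 --no-browser

# Finally, open http://localhost:9989 in your local browser.
```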

@IAmSuyogJadhav (Author):
Rootless Docker Setup for the computational server

  1. Log on to the computational server.
  2. Download the rootless Docker setup script:
wget https://cloud.suyogjadhav.com/index.php/s/jzM2qpYx8Z6kFn3/download/setup_docker.sh
  3. Make it executable:
chmod +x setup_docker.sh
  4. Run the script:
./setup_docker.sh

You should now be able to use the docker command without any issues.
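If you want to confirm the rootless daemon works before pulling the large NGC image, a quick smoke test using Docker's standard test image:

```shell
# Pulls a tiny test image and prints a greeting if Docker is working.
docker run --rm hello-world
```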

@IAmSuyogJadhav (Author):

If you are having issues accessing the Jupyter lab, you can have a look at third-party tunneling solutions such as ngrok or tunnelmole.

I will detail the tunnelmole setup steps below, as it does not require registration to use.

Installation is simple:

curl -s https://tunnelmole.com/sh/install-linux.sh | bash 

Once installed, you can get a publicly accessible web address for your jupyter lab (running at 8888), by running:

tmole 8888

If you already have Jupyter lab running in your terminal, you will not be able to run the above command in the same terminal. You can either log in to the server through a new terminal instance, or use tmux or a similar tool.
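With tmux, for instance, both processes can share one SSH session; the session name below is arbitrary:

```shell
# Start a named tmux session and launch Jupyter inside it.
tmux new -s jupyter
jupyter lab --ip 0.0.0.0 --no-browser

# Detach with Ctrl-b d, then run tunnelmole in the freed terminal:
tmole 8888

# Re-attach to the Jupyter session later with:
tmux attach -t jupyter
```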

@ikpsthakur commented Jun 15, 2023 via email.