@skseth
Last active January 31, 2023 12:12
Setting up a Debian Bullseye GPU Dev Machine

Create a machine

Choose

  • GPU (T4)
  • image (Debian 11 Bullseye with a 50 GB boot disk)
  • add an extra 100 GB disk: set it to be deleted when the VM is deleted, and give it a custom device name so it is easy to identify
  • add an SSH key if you want, with the user name you want to log in as
  • make this a spot instance to save money
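The same choices can also be made from the command line. The function below is a rough sketch, not a tested recipe: the instance name, zone, the n1-standard-4 machine type and the data-disk device name are all assumptions to adjust, and the SSH key step is not covered.

```shell
# Sketch: create a spot VM with a T4 GPU, a Debian 11 boot disk (50 GB)
# and an extra 100 GB data disk that is deleted together with the VM.
# $1: instance name (hypothetical, e.g. gpu-dev)
# $2: zone (hypothetical, pick one that offers T4 GPUs)
create_gpu_vm() {
  name=$1
  zone=$2
  gcloud compute instances create "$name" \
    --zone="$zone" \
    --machine-type=n1-standard-4 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --maintenance-policy=TERMINATE \
    --provisioning-model=SPOT \
    --image-family=debian-11 \
    --image-project=debian-cloud \
    --boot-disk-size=50GB \
    --create-disk=size=100GB,device-name=data-disk,auto-delete=yes
}

# Usage (assumed values):
#   create_gpu_vm gpu-dev us-central1-a
```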

Now, ssh in

  • use gcloud compute instances list | grep <instance name> to see the public IP
  • ssh-keygen -R <public IP> # to remove any existing entries in known_hosts
  • ssh <user name>@<public IP>
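Spot instances tend to come back with a new IP, so these three steps get repeated a lot. A small helper along the following lines may save some typing. It is a sketch: the function names are made up, and it assumes the VM has a single network interface with an external IP.

```shell
# Print the external IP of a VM.
# $1: instance name, $2: zone (both are whatever you chose at creation)
vm_external_ip() {
  gcloud compute instances describe "$1" --zone "$2" \
    --format='get(networkInterfaces[0].accessConfigs[0].natIP)'
}

# Drop any stale known_hosts entry for that IP, then connect.
# $3: the user name given with the SSH key
connect_vm() {
  instance=$1
  zone=$2
  login_user=$3
  ip=$(vm_external_ip "$instance" "$zone") || return 1
  ssh-keygen -R "$ip" > /dev/null 2>&1
  ssh "$login_user@$ip"
}

# Usage (assumed values):
#   connect_vm gpu-dev us-central1-a myuser
```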

Setup the extra disk

From this link

sudo lsblk
sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb
sudo mkdir -p /mnt/disks/data
sudo mount -o discard,defaults /dev/sdb /mnt/disks/data
sudo chmod a+w /mnt/disks/data
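mkfs.ext4 wipes whatever is on the device, and /dev/sdb is only the usual name for the first extra disk, so it is worth checking the lsblk output before formatting. One possible guard is sketched below; the function names are hypothetical and the mkfs options are copied from the command above.

```shell
# Succeeds only when lsblk reports no filesystem on the device
# (or on any of its partitions). Returns 2 if lsblk itself fails.
is_blank_disk() {
  fstype=$(lsblk -no FSTYPE "$1") || return 2
  [ -z "$(printf '%s' "$fstype" | tr -d '[:space:]')" ]
}

# Format the device only if it looks blank.
format_if_blank() {
  if is_blank_disk "$1"; then
    sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard "$1"
  else
    echo "Refusing to format $1: it has a filesystem or could not be read" >&2
    return 1
  fi
}

# Usage:
#   format_if_blank /dev/sdb
```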

Setup Python

Verify python installed

  • python3 -V
  • sudo apt install python3-pip
  • pip3 --version

Move pip cache to extra disk (From this link)

# find the config file location under variant "global"
pip config list -v

# create the file and add
[global]
cache-dir=/path/to/dir

# test if it worked
pip config list
pip cache dir
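The "create the file" step can be scripted. This is a minimal sketch with a hypothetical function name; note that it overwrites the config file, so it only suits a fresh machine with no other pip settings. Writing to a system-wide location needs root.

```shell
# Write a minimal pip config that points the cache at another directory.
# $1: config file path (take it from `pip config list -v`)
# $2: cache directory, e.g. a folder on the extra disk
write_pip_conf() {
  conf_file=$1
  cache_dir=$2
  # Create both the config directory and the cache directory if missing.
  mkdir -p "$(dirname "$conf_file")" "$cache_dir"
  printf '[global]\ncache-dir=%s\n' "$cache_dir" > "$conf_file"
}

# Usage (the cache path is an assumed example):
#   write_pip_conf /path/to/pip.conf /mnt/disks/data/pip-cache
```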

Setup /etc/fstab

Get the UUID of the new disk, e.g.

  • sudo blkid /dev/sdb

Backup the /etc/fstab file

  • sudo cp /etc/fstab /etc/fstab.backup

Add the following line to the /etc/fstab file

  • UUID=<UUID_VALUE> /mnt/disks/<MOUNT_DIR> ext4 discard,defaults,nofail 0 2

Restart the VM to verify that the mount point is OK
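Adding the line by hand is easy to get wrong or to do twice. The sketch below builds the same entry from a UUID and a mount directory and appends it only if that mount directory is not already listed. The function names are hypothetical, and the file path is a parameter so it can be tried on a copy of fstab first.

```shell
# Print the fstab line for a disk UUID and mount directory,
# using the same options as the entry above.
fstab_line() {
  printf 'UUID=%s %s ext4 discard,defaults,nofail 0 2\n' "$1" "$2"
}

# Append the line to an fstab file unless the mount directory
# already appears in its second column.
# $1: fstab file, $2: UUID, $3: mount directory
add_fstab_line() {
  fstab_file=$1
  uuid=$2
  mount_dir=$3
  if awk -v d="$mount_dir" '$2 == d { found = 1 } END { exit !found }' \
      "$fstab_file" 2> /dev/null; then
    echo "$mount_dir is already listed in $fstab_file" >&2
    return 0
  fi
  fstab_line "$uuid" "$mount_dir" >> "$fstab_file"
}

# Usage: try it on a copy, inspect the result, then copy it back with sudo.
#   cp /etc/fstab /tmp/fstab.new
#   add_fstab_line /tmp/fstab.new <UUID_VALUE> /mnt/disks/data
```

On Bullseye, sudo findmnt --verify should flag syntax problems in /etc/fstab before you reboot.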

Setup Docker

From this link

sudo apt update
sudo apt install apt-transport-https ca-certificates curl gnupg lsb-release -y
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt -y install docker-ce docker-ce-cli containerd.io

You can verify that Docker is set up correctly with

  • sudo docker version
  • sudo systemctl status docker

Add your user to docker group

  • sudo usermod -aG docker <your user name>

Log out and log back in for the group change to take effect
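To check whether the current session has picked up the new group, something like this sketch can help (the function name is made up). id -nG without a user name reports the groups of the running shell, which is what matters for running docker without sudo.

```shell
# Succeeds if the current shell session is a member of the docker group.
in_docker_group() {
  id -nG | tr ' ' '\n' | grep -qx docker
}

# Usage:
#   in_docker_group && echo "ok" || echo "log out and back in first"
```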

Move Docker Root Directory to another location

From this link

# Shut down docker
sudo systemctl stop docker
sudo systemctl stop docker.socket
sudo systemctl stop containerd
# Move /var/lib/docker
sudo mkdir -p /mnt/disks/data/<base_dir> # if not existing
sudo mv /var/lib/docker /mnt/disks/data/<base_dir>

Edit /etc/docker/daemon.json :

{
  "data-root": "/mnt/disks/data/<base_dir>/docker"
}

Restart docker

  • sudo systemctl start docker

Verify

  • docker info # look for Docker Root Dir
  • docker run hello-world
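Instead of scanning the docker info output by eye, the root directory can be read directly with a format template. A small sketch, with hypothetical function names:

```shell
# Print the directory Docker currently uses as its data root.
docker_root_dir() {
  docker info --format '{{ .DockerRootDir }}'
}

# Compare it with the path you put in daemon.json.
# $1: expected data-root
check_docker_root() {
  expected=$1
  actual=$(docker_root_dir) || return 1
  if [ "$actual" = "$expected" ]; then
    echo "OK: Docker root is $actual"
  else
    echo "Mismatch: expected $expected, got $actual" >&2
    return 1
  fi
}

# Usage:
#   check_docker_root "/mnt/disks/data/<base_dir>/docker"
```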

Setup Docker-Compose

From this link

sudo apt update
sudo apt install -y curl wget
curl -s https://api.github.com/repos/docker/compose/releases/latest | grep browser_download_url  | grep docker-compose-linux-x86_64 | cut -d '"' -f 4 | wget -qi -
chmod +x docker-compose-linux-x86_64
sudo mv docker-compose-linux-x86_64 /usr/local/bin/docker-compose
docker-compose version

Install contrib & non-free apt repos

First we need add-apt-repository

sudo apt install software-properties-common
sudo apt update
sudo add-apt-repository contrib
sudo add-apt-repository non-free

Install pciutils

  • sudo apt-get -y install pciutils

Verify which GPUs / CPUs are installed:

  • lspci
  • lscpu
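The lspci output is long, so filtering it for NVIDIA devices is a quick way to confirm the GPU is attached before installing drivers. A sketch, with a hypothetical function name:

```shell
# List PCI devices whose description mentions NVIDIA.
# Exits non-zero when none are found.
list_nvidia_devices() {
  lspci | grep -i 'nvidia'
}

# Usage:
#   list_nvidia_devices || echo "no NVIDIA device visible on the PCI bus"
```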

Install nvidia drivers

From this link

# install the nvidia repository
sudo apt install dirmngr ca-certificates software-properties-common apt-transport-https dkms curl -y
curl -fSsL https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/3bf863cc.pub | sudo gpg --dearmor | sudo tee /usr/share/keyrings/nvidia-drivers.gpg > /dev/null 2>&1
echo 'deb [signed-by=/usr/share/keyrings/nvidia-drivers.gpg] https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/ /' | sudo tee /etc/apt/sources.list.d/nvidia-drivers.list
# enable contrib repository
sudo add-apt-repository contrib
sudo apt update

To install the latest drivers, run

sudo apt install nvidia-driver nvidia-kernel-open-dkms cuda nvidia-smi nvidia-settings
sudo reboot now

Now verify the installation :

  • nvidia-smi

Finally, update ~/.bashrc with

export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin
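One caveat with the lines above: if LD_LIBRARY_PATH is empty when ~/.bashrc runs, the result starts with a colon, and the dynamic linker treats that empty entry as the current directory. A variant that avoids this, as a sketch assuming the same /usr/local/cuda location:

```shell
CUDA_HOME=/usr/local/cuda
export CUDA_HOME

# ${VAR:+...} expands to "$VAR:" only when VAR is set and non-empty,
# so an empty LD_LIBRARY_PATH does not leave a stray leading colon.
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$CUDA_HOME/lib64:$CUDA_HOME/extras/CUPTI/lib64"
export PATH="$PATH:$CUDA_HOME/bin"
```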

Uninstalling & Reinstalling NVidia/Cuda drivers

sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get remove --purge '^libnvidia-.*'
sudo apt-get remove --purge '^cuda-.*'
sudo apt-get install linux-headers-$(uname -r)

Then

sudo apt install nvidia-driver nvidia-kernel-open-dkms cuda nvidia-smi nvidia-settings
sudo reboot now

Nvidia Docker Support

From this link

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2

The installation may report a conflict because it wants to update daemon.json. If so, let the installation overwrite the daemon.json file, and then add back your changes. The final /etc/docker/daemon.json file should look something like the following:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "data-root": "/mnt/disks/data/<your-docker-data-dir>"
}
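Hand-merging daemon.json makes it easy to leave a stray comma behind, and Docker will not start with an invalid file. A quick syntax check before restarting might look like this sketch; it assumes python3 is available (it was verified in the Python step) and the function name is made up.

```shell
# Check that a file parses as JSON, using Python's built-in json.tool.
# $1: path to the JSON file
check_daemon_json() {
  if python3 -m json.tool "$1" > /dev/null; then
    echo "$1: valid JSON"
  else
    echo "$1: invalid JSON, fix it before restarting Docker" >&2
    return 1
  fi
}

# Usage:
#   check_daemon_json /etc/docker/daemon.json
```

This only checks syntax. It does not tell you whether the keys are ones Docker understands.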

Then,

sudo systemctl restart docker
# test
sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

Setup Chrome Remote Desktop

From this link - just the chrome remote desktop part

sudo apt update
sudo apt install --assume-yes wget tasksel
wget https://dl.google.com/linux/direct/chrome-remote-desktop_current_amd64.deb
sudo apt-get install --assume-yes ./chrome-remote-desktop_current_amd64.deb
sudo DEBIAN_FRONTEND=noninteractive apt install --assume-yes xfce4 desktop-base dbus-x11 xscreensaver
sudo bash -c 'echo "exec /etc/X11/Xsession /usr/bin/xfce4-session" > /etc/chrome-remote-desktop-session'
sudo systemctl disable lightdm.service

Go to the remote desktop site, https://remotedesktop.google.com/headless, from your local machine. Then, move to: Set up another computer > Begin > Next > Authorize.

Copy the command for Debian Linux and run it in the VM. Provide a PIN when asked.

Then, connect using the "Remote Access" link on the left, and login using your PIN.

If you encounter an error "Authentication is required to create a color profile/managed device", you can fix this using the technique provided at this link.

Create the file /etc/polkit-1/localauthority/50-local.d/45-allow-colord.pkla (for example with sudo nano) and enter:

[Allow Colord all Users]
Identity=unix-user:*
Action=org.freedesktop.color-manager.create-device;org.freedesktop.color-manager.create-profile;org.freedesktop.color-manager.delete-device;org.freedesktop.color-manager.delete-profile;org.freedesktop.color-manager.modify-device;org.freedesktop.color-manager.modify-profile
ResultAny=no
ResultInactive=no
ResultActive=yes

Save and reboot, then try logging in again.

Install gcloud CLI

sudo apt-get install apt-transport-https ca-certificates gnupg
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
sudo apt-get update && sudo apt-get install google-cloud-cli

gcloud auth configure-docker

Install poetry
