@Glavin001
Last active April 28, 2024 16:48
How to update CUDA version for TensorDock

Problem

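TensorDock instances come with CUDA 10.1 pre-installed, which is old.

Many use cases, such as Flash Attention 2, require a newer version of CUDA.

One symptom is that nvcc and nvidia-smi show different CUDA versions. nvidia-smi reports the highest CUDA version the installed driver supports, while nvcc reports the version of the installed toolkit: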

Command      Reported CUDA version
nvidia-smi   ✅ 12.2
nvcc -V      ❌ 10.1

$ nvidia-smi
Tue Sep  5 00:50:23 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:05:00.0 Off |                  N/A |
| 42%   38C    P0             103W / 390W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
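
To check for this mismatch without reading the two outputs by eye, something like the following sketch could be used. The function names `smi_cuda_version` and `nvcc_cuda_version` are made up for this example, and the patterns assume the output formats shown above, which may differ between driver and toolkit releases.

```shell
#!/bin/sh
# Extract "12.2" from the nvidia-smi header line containing "CUDA Version: 12.2".
smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Extract "10.1" from "Cuda compilation tools, release 10.1, V10.1.243" (nvcc -V).
nvcc_cuda_version() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Only query the tools when both are actually installed.
if command -v nvidia-smi >/dev/null 2>&1 && command -v nvcc >/dev/null 2>&1; then
    driver=$(nvidia-smi | smi_cuda_version)
    toolkit=$(nvcc -V | nvcc_cuda_version)
    echo "driver supports CUDA $driver, nvcc is CUDA $toolkit"
    [ "$driver" = "$toolkit" ] || echo "versions differ"
fi
```

On the machine described above, this would be expected to report that the versions differ.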

Solution/Fixes

1. Unhold NVIDIA libraries

By default, the NVIDIA driver packages are held to prevent them from updating automatically. Release the hold:

sudo apt-mark unhold nvidia* libnvidia*
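
To see which NVIDIA packages are currently held (before or after running the command above), a small sketch along these lines may help. `apt-mark showhold` lists held packages. The `nvidia_packages` filter is a helper invented for this example, and it assumes the relevant package names start with `nvidia` or `libnvidia`.

```shell
#!/bin/sh
# Keep only NVIDIA driver packages from a list of package names on stdin.
# "|| true" keeps the exit status zero when nothing matches.
nvidia_packages() {
    grep -E '^(lib)?nvidia' || true
}

# Print held NVIDIA packages; no output means none are held.
if command -v apt-mark >/dev/null 2>&1; then
    apt-mark showhold | nvidia_packages
fi
```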

2. Install latest NVIDIA CUDA

Go to the CUDA downloads page and select your target platform. You'll be given commands like:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda

Running these commands adds NVIDIA's package repository, so that apt-get can find the newer CUDA packages.
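
The URL above contains `ubuntu2004/x86_64`, which matches Ubuntu 20.04 on a 64-bit Intel/AMD machine. As a rough sketch, the matching directory name for the current machine can be guessed from `/etc/os-release`. The helper `cuda_repo_distro` is hypothetical, and the naming rule (distribution ID followed by the version without the dot) is an assumption inferred from that one URL, so confirm the real path on the CUDA downloads page.

```shell
#!/bin/sh
# Build a repo directory name, e.g. ID=ubuntu VERSION_ID=20.04 -> ubuntu2004.
# Assumption: NVIDIA's naming is "<id><version without dots>".
cuda_repo_distro() {
    printf '%s%s\n' "$1" "$(printf '%s' "$2" | tr -d '.')"
}

# /etc/os-release defines ID and VERSION_ID on most Linux distributions.
if [ -r /etc/os-release ]; then
    . /etc/os-release
    echo "likely repo directory: $(cuda_repo_distro "${ID:-unknown}" "${VERSION_ID:-}")/$(uname -m)"
fi
```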

3. Install newer CUDA toolkit

For example:

sudo apt-get install cuda-toolkit-12-2

By now you should see /usr/local/cuda-12.2 (or your version) installed:

$ ls -l /usr/local/
total 36
drwxr-xr-x  2 root root 4096 Sep  5 00:48 bin
lrwxrwxrwx  1 root root   22 Sep  5 00:48 cuda -> /etc/alternatives/cuda
lrwxrwxrwx  1 root root   25 Sep  5 00:48 cuda-12 -> /etc/alternatives/cuda-12
drwxr-xr-x 15 root root 4096 Sep  5 00:48 cuda-12.2
drwxr-xr-x  2 root root 4096 Jun 19 21:39 etc
drwxr-xr-x  2 root root 4096 Jun 19 21:39 games
drwxr-xr-x  2 root root 4096 Jun 19 21:39 include
drwxr-xr-x  3 root root 4096 Jun 19 21:39 lib
lrwxrwxrwx  1 root root    9 Jun 19 21:39 man -> share/man
drwxr-xr-x  2 root root 4096 Jun 19 21:39 sbin
drwxr-xr-x  5 root root 4096 Jul  5 03:43 share
drwxr-xr-x  2 root root 4096 Jun 19 21:39 src

Unfortunately, nvcc still resolves to the old toolkit, so the reported version is still 10.1.
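
To see which toolkits are installed and which nvcc binary the shell actually picks up, a sketch like this can be used. `list_cuda_toolkits` is a helper name made up here, and it assumes toolkits live in `cuda-X.Y` directories under `/usr/local`, as in the listing above.

```shell
#!/bin/sh
# List versioned CUDA toolkit directories (cuda-X.Y) under a prefix.
# The prefix defaults to /usr/local.
list_cuda_toolkits() {
    prefix=${1:-/usr/local}
    for dir in "$prefix"/cuda-[0-9]*.[0-9]*; do
        # Skip the unexpanded pattern (no match) and any symlinks.
        [ -d "$dir" ] && [ ! -L "$dir" ] && basename "$dir"
    done
    return 0
}

list_cuda_toolkits

# Show the real file behind the nvcc found first on PATH.
if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc resolves to: $(readlink -f "$(command -v nvcc)")"
fi
```

If the resolved path is not under the new `cuda-12.2` directory, the shell is still finding the old nvcc first on PATH.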

4. Update environment variables for CUDA

Append to ~/.bashrc the following:

CUDA_VERSION="12.2"
export PATH=/usr/local/cuda-${CUDA_VERSION}/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-${CUDA_VERSION}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Then either create a new shell or update current shell with:

$ source ~/.bashrc 
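
Sourcing ~/.bashrc repeatedly prepends the same directories each time. One possible variant, sketched below, adds a directory only when it is not already listed. `prepend_once` is a hypothetical helper, not part of bash or CUDA.

```shell
#!/bin/sh
# Prepend DIR to a colon-separated list unless it is already present.
# Usage: prepend_once DIR LIST  (prints the resulting list)
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;            # already listed: unchanged
        *)        printf '%s\n' "$1${2:+:$2}" ;;   # prepend; no trailing colon if LIST is empty
    esac
}

CUDA_VERSION="12.2"
PATH=$(prepend_once "/usr/local/cuda-${CUDA_VERSION}/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_once "/usr/local/cuda-${CUDA_VERSION}/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

Note that a directory already present later in the list is left where it is, so this only helps when the CUDA directory was not on PATH at all.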

5. Profit

You're good now!

✅ Release 12.2

(tensorml) user@7ff6481e-fbd6-4dda-b12b-ac7b7c1ca4b2:~/axolotl$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
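
As a final sanity check, the toolkit version can be compared against the CUDA version the driver reports. As a rule of thumb, the toolkit should not be newer than what nvidia-smi shows. The sketch below assumes GNU `sort -V` is available (it is on Ubuntu), and `version_le` is a helper name invented for this example.

```shell
#!/bin/sh
# Succeed when version $1 <= version $2, using GNU sort's version ordering.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

if command -v nvcc >/dev/null 2>&1 && command -v nvidia-smi >/dev/null 2>&1; then
    # Same parsing assumptions as the output formats shown in this gist.
    toolkit=$(nvcc -V | sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1)
    driver=$(nvidia-smi | sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1)
    if version_le "$toolkit" "$driver"; then
        echo "OK: toolkit $toolkit <= driver-supported $driver"
    else
        echo "Warning: toolkit $toolkit is newer than driver-supported $driver"
    fi
fi
```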
@Glavin001
Author

$ sudo apt install nvidia-cuda-toolkit
Reading package lists... Done
Building dependency tree       
Reading state information... Done
nvidia-cuda-toolkit is already the newest version (10.1.243-3).
The following packages were automatically installed and are no longer required:
  linux-headers-5.4.0-152 linux-headers-5.4.0-152-generic linux-image-5.4.0-152-generic linux-modules-5.4.0-152-generic
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 37 not upgraded.
$ sudo apt-get install cuda-toolkit-12-2

This worked!

$ ls -l /usr/local/
total 36
drwxr-xr-x  2 root root 4096 Sep  5 00:48 bin
lrwxrwxrwx  1 root root   22 Sep  5 00:48 cuda -> /etc/alternatives/cuda
lrwxrwxrwx  1 root root   25 Sep  5 00:48 cuda-12 -> /etc/alternatives/cuda-12
drwxr-xr-x 15 root root 4096 Sep  5 00:48 cuda-12.2
drwxr-xr-x  2 root root 4096 Jun 19 21:39 etc
drwxr-xr-x  2 root root 4096 Jun 19 21:39 games
drwxr-xr-x  2 root root 4096 Jun 19 21:39 include
drwxr-xr-x  3 root root 4096 Jun 19 21:39 lib
lrwxrwxrwx  1 root root    9 Jun 19 21:39 man -> share/man
drwxr-xr-x  2 root root 4096 Jun 19 21:39 sbin
drwxr-xr-x  5 root root 4096 Jul  5 03:43 share
drwxr-xr-x  2 root root 4096 Jun 19 21:39 src

Now cuda-12.2 is installed in /usr/local/!

@Glavin001
Author

Still out of sync:

$ nvidia-smi
Tue Sep  5 00:50:23 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:05:00.0 Off |                  N/A |
| 42%   38C    P0             103W / 390W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

@Glavin001
Author

Append to ~/.bashrc the following:

CUDA_VERSION="12.2"
export PATH=/usr/local/cuda-${CUDA_VERSION}/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-${CUDA_VERSION}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

And now it works!

$ source ~/.bashrc 
(tensorml) user@7ff6481e-fbd6-4dda-b12b-ac7b7c1ca4b2:~/axolotl$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
