TensorDock machines come with CUDA 10.1 pre-installed, which is old.
Many use cases, such as Flash Attention 2, require a newer version of CUDA.
One symptom of this is that `nvcc` and `nvidia-smi` show different CUDA versions:
`nvidia-smi` | `nvcc -V` |
---|---|
✅ 12.2 | ❌ 10.1 |
$ nvidia-smi
Tue Sep 5 00:50:23 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:05:00.0 Off | N/A |
| 42% 38C P0 103W / 390W | 2MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
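The comparison above can be scripted. The following is a minimal sketch, assuming the output formats shown in the two listings; `driver_cuda_version` and `toolkit_cuda_version` are hypothetical helper names, not part of any NVIDIA tool. Note that `nvidia-smi` reports the highest CUDA version the driver supports, while `nvcc -V` reports the installed toolkit release, so a mismatch is a hint rather than an error in itself.

```shell
# Pull the "CUDA Version: X.Y" field out of nvidia-smi output read from stdin.
driver_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Pull the "release X.Y" field out of `nvcc -V` output read from stdin.
toolkit_cuda_version() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Only query the real tools when both are installed.
if command -v nvidia-smi >/dev/null 2>&1 && command -v nvcc >/dev/null 2>&1; then
  driver=$(nvidia-smi | driver_cuda_version)
  toolkit=$(nvcc -V | toolkit_cuda_version)
  if [ "$driver" = "$toolkit" ]; then
    echo "OK: driver and toolkit both report CUDA $driver"
  else
    echo "Mismatch: nvidia-smi reports $driver, nvcc reports $toolkit"
  fi
fi
```

Both helpers read from stdin, so they can also be tried on saved output, for example `nvcc -V | toolkit_cuda_version`.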
By default, the NVIDIA driver packages are held to prevent them from updating automatically. Remove the hold so that apt-get can update them:
sudo apt-mark unhold nvidia* libnvidia*
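To check which NVIDIA packages are held before or after running the command above, `apt-mark showhold` lists every held package. A minimal sketch, where `nvidia_holds` is a hypothetical helper that filters that list:

```shell
# Keep only NVIDIA-related names from a list of packages (one per line on stdin).
nvidia_holds() {
  # `|| true` keeps the exit status clean when nothing matches.
  grep -E '^(lib)?nvidia' || true
}

# Only call apt-mark on systems that have it.
if command -v apt-mark >/dev/null 2>&1; then
  apt-mark showhold | nvidia_holds
fi
```

An empty result after the unhold step suggests no NVIDIA packages remain held.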
Go to the CUDA downloads page and select your target platform. You'll be given commands like:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda
You can also install a specific toolkit version by naming its package. For example:
sudo apt-get install cuda-toolkit-12-2
By now you should see `/usr/local/cuda-12.2` (or your version) installed:
$ ls -l /usr/local/
total 36
drwxr-xr-x 2 root root 4096 Sep 5 00:48 bin
lrwxrwxrwx 1 root root 22 Sep 5 00:48 cuda -> /etc/alternatives/cuda
lrwxrwxrwx 1 root root 25 Sep 5 00:48 cuda-12 -> /etc/alternatives/cuda-12
drwxr-xr-x 15 root root 4096 Sep 5 00:48 cuda-12.2
drwxr-xr-x 2 root root 4096 Jun 19 21:39 etc
drwxr-xr-x 2 root root 4096 Jun 19 21:39 games
drwxr-xr-x 2 root root 4096 Jun 19 21:39 include
drwxr-xr-x 3 root root 4096 Jun 19 21:39 lib
lrwxrwxrwx 1 root root 9 Jun 19 21:39 man -> share/man
drwxr-xr-x 2 root root 4096 Jun 19 21:39 sbin
drwxr-xr-x 5 root root 4096 Jul 5 03:43 share
drwxr-xr-x 2 root root 4096 Jun 19 21:39 src
Unfortunately, `nvcc` still won't use it, because the shell still finds the old binary on `PATH`, so the reported version is still outdated.
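To see which `nvcc` the shell actually picks up, you can list every match on `PATH` in lookup order. This is a minimal sketch; `list_on_path` is a hypothetical helper written for illustration, and the optional second argument exists only so a different search path can be tried.

```shell
# Print every executable named "$1" found on a colon-separated search path,
# in lookup order. The first line printed is the one the shell would run.
# Usage: list_on_path <name> [search-path]   (defaults to $PATH)
list_on_path() (
  name=$1
  search=${2:-$PATH}
  IFS=:
  for dir in $search; do
    candidate="${dir:-.}/$name"   # an empty PATH entry means the current directory
    if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
      printf '%s\n' "$candidate"
    fi
  done
)

list_on_path nvcc
```

If the first line printed is not under `/usr/local/cuda-12.2/bin`, the older toolkit is still taking precedence.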
Append the following to `~/.bashrc`:
CUDA_VERSION="12.2"
export PATH=/usr/local/cuda-${CUDA_VERSION}/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-${CUDA_VERSION}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
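If you'd rather script this edit, the sketch below appends equivalent lines only once, so re-running it doesn't pile up duplicates. The function name `add_cuda_to_rc` and the marker comment it writes are assumptions made for illustration; the two `export` lines mirror the ones above with the version filled in.

```shell
# Append CUDA PATH settings to a shell rc file, at most once per version.
# Usage: add_cuda_to_rc <cuda-version> [rc-file]   (defaults to ~/.bashrc)
add_cuda_to_rc() (
  version=$1
  rc=${2:-$HOME/.bashrc}
  marker="# CUDA ${version} toolkit paths"
  # Skip the append when a previous run already wrote the marker line.
  if ! grep -qF "$marker" "$rc" 2>/dev/null; then
    # \$ keeps the PATH expansions literal so they are evaluated when the
    # rc file is sourced; only ${version} is filled in now.
    cat >> "$rc" <<EOF
$marker
export PATH=/usr/local/cuda-${version}/bin\${PATH:+:\${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-${version}/lib64\${LD_LIBRARY_PATH:+:\${LD_LIBRARY_PATH}}
EOF
  fi
)

# Example (commented out so nothing is modified by accident):
# add_cuda_to_rc 12.2
```

The `${PATH:+:${PATH}}` form adds the `:` separator only when `PATH` is already set and non-empty, which avoids a dangling colon.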
Then either open a new shell or update the current one with:
$ source ~/.bashrc
You're good now!
✅ Release 12.2
(tensorml) user@7ff6481e-fbd6-4dda-b12b-ac7b7c1ca4b2:~/axolotl$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
And now it works!