@kmhofmann
Last active October 13, 2024 14:55

Installing the NVIDIA driver, CUDA and cuDNN on Linux (Ubuntu 20.04)

This is a companion piece to my instructions on building TensorFlow from source. The aim is to install the NVIDIA graphics driver (450.57), CUDA (11.0.2), and cuDNN (8.0.2) on an Ubuntu Linux system, specifically Ubuntu 20.04.

At the time of writing (2020-08-06), these were the latest available versions. As a disclaimer, please note that I am not interested in running an outdated Ubuntu version or installing a CUDA/cuDNN version that is not the latest. Therefore, the below instructions may or may not be useful to you. Please also note that the instructions are likely outdated, since I only update them occasionally. Don't just copy these instructions, but check what the respective latest versions are and use these instead!

Installing the NVIDIA driver

Download and install the latest NVIDIA graphics driver from here: https://www.nvidia.com/en-us/drivers/unix/. Note that every CUDA version requires a minimum version of the driver; check this beforehand.

Ubuntu 20.04 currently offers installation of the NVIDIA driver version 440.100 through its built-in 'Additional Drivers' mechanism, which should be sufficient for CUDA 10.2. CUDA 11.0 appears to require a newer version of the NVIDIA driver, so we're going to install this manually.
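
If you want to see what the built-in mechanism would offer before deciding, you can list the driver candidates from the command line (this assumes the ubuntu-drivers tool from the ubuntu-drivers-common package is present, which it usually is on a desktop install):

$ ubuntu-drivers devices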

Once downloaded, run the installer:

$ sudo sh NVIDIA-Linux-x86_64-450.57.run
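
After the installer finishes (a reboot may be needed), a quick sanity check is to run nvidia-smi, which should report the installed driver version and list your GPU:

$ nvidia-smi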

The CUDA runfile also includes a version of the NVIDIA graphics driver, but I like to keep the two installations separate, as the driver version supplied with CUDA is not necessarily the latest one.

Installing CUDA

Download the latest CUDA version from https://developer.nvidia.com/cuda-downloads. For example, I downloaded:

$ wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda_11.0.2_450.51.05_linux.run

Thankfully, CUDA 11 currently supports the up-to-date Ubuntu version, 20.04, so we don't need to jump through hoops to deal with an unsupported GNU version error as in previous versions of this document. Simply install as per the official instructions:

$ sudo sh cuda_11.0.2_450.51.05_linux.run

You may need to confirm that the display driver is already installed, and de-select installation of the display driver.
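
As a side note, the runfile can also be driven non-interactively; a sketch that installs only the toolkit and skips the bundled driver would be the following (the exact flags may differ between CUDA versions, so check the runfile's --help output):

$ sudo sh cuda_11.0.2_450.51.05_linux.run --silent --toolkit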

Once finished, you should see a summary like this:

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-11.0/
Samples:  Installed in /home/michael/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-11.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-11.0/lib64, or, add /usr/local/cuda-11.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-11.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 450.00 is required for CUDA 11.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Follow the instructions given in the summary and add the listed directories to your PATH and LD_LIBRARY_PATH, for example by adding the following lines to your .bashrc, .zshrc, or whichever shell configuration file you are using:

export PATH=/usr/local/cuda-11.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64:$LD_LIBRARY_PATH
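
After opening a new shell (or re-sourcing your configuration file), you can check that the toolkit is found, for example by asking the CUDA compiler for its version:

$ nvcc --version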

Installing cuDNN

Just go to https://developer.nvidia.com/cudnn and follow the instructions. You'll have to log in, so downloading the right cuDNN binary packages cannot easily be automated. Meh.

Once downloaded, un-tar the file and copy the contents to their respective locations:

$ tar -xzvf cudnn-11.0-linux-x64-v8.0.2.39.tgz

$ sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
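
To double-check which cuDNN version ended up in place, you can grep the version macros from the header; for cuDNN 8.x these live in cudnn_version.h (older releases kept them in cudnn.h):

$ grep -A 2 'define CUDNN_MAJOR' /usr/local/cuda/include/cudnn_version.h
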
@tyuvraj commented May 22, 2021

Update:

Solved the problem by removing all libcudnn*.so and libcudnn*.so.8 files and then creating the symlinks for libcudnn*.so.8.0.5 again. Look at the answer given below:
https://askubuntu.com/questions/1025928/why-do-i-get-sbin-ldconfig-real-usr-local-cuda-lib64-libcudnn-so-7-is-not-a

Hey, I downloaded it using the package manager. Everything went great and I was able to use TensorFlow on the GPU. But while running ldconfig, I see the following errors:
/sbin/ldconfig.real: /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8 is not a symbolic link
/sbin/ldconfig.real: /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8 is not a symbolic link
/sbin/ldconfig.real: /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8 is not a symbolic link
/sbin/ldconfig.real: /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8 is not a symbolic link
/sbin/ldconfig.real: /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8 is not a symbolic link
/sbin/ldconfig.real: /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8 is not a symbolic link
/sbin/ldconfig.real: /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8 is not a symbolic link

To check I went to:
/usr/local/cuda-11.0/targets/x86_64-linux/lib
and did:
ls -ln

Among all the symlinks I got in my output, I saw:
-rwxr-xr-x 1 0 0 98957080 Nov 18 13:54 libcudnn_adv_infer.so
-rwxr-xr-x 1 0 0 98957080 Nov 18 13:54 libcudnn_adv_infer.so.8
-rwxr-xr-x 1 0 0 98957080 Nov 18 13:54 libcudnn_adv_infer.so.8.0.5
-rwxr-xr-x 1 0 0 65344120 Nov 18 13:54 libcudnn_adv_train.so
-rwxr-xr-x 1 0 0 65344120 Nov 18 13:54 libcudnn_adv_train.so.8
-rwxr-xr-x 1 0 0 65344120 Nov 18 13:54 libcudnn_adv_train.so.8.0.5
-rwxr-xr-x 1 0 0 1288305728 Nov 18 13:55 libcudnn_cnn_infer.so
-rwxr-xr-x 1 0 0 1288305728 Nov 18 13:55 libcudnn_cnn_infer.so.8
-rwxr-xr-x 1 0 0 1288305728 Nov 18 13:55 libcudnn_cnn_infer.so.8.0.5
-rwxr-xr-x 1 0 0 58705816 Nov 18 13:55 libcudnn_cnn_train.so
-rwxr-xr-x 1 0 0 58705816 Nov 18 13:55 libcudnn_cnn_train.so.8
-rwxr-xr-x 1 0 0 58705816 Nov 18 13:55 libcudnn_cnn_train.so.8.0.5
-rwxr-xr-x 1 0 0 251390696 Nov 18 13:55 libcudnn_ops_infer.so
-rwxr-xr-x 1 0 0 251390696 Nov 18 13:55 libcudnn_ops_infer.so.8
-rwxr-xr-x 1 0 0 251390696 Nov 18 13:55 libcudnn_ops_infer.so.8.0.5
-rwxr-xr-x 1 0 0 26002104 Nov 18 13:55 libcudnn_ops_train.so
-rwxr-xr-x 1 0 0 26002104 Nov 18 13:55 libcudnn_ops_train.so.8
-rwxr-xr-x 1 0 0 26002104 Nov 18 13:55 libcudnn_ops_train.so.8.0.5
-rwxr-xr-x 1 0 0 158264 Nov 18 13:55 libcudnn.so
-rwxr-xr-x 1 0 0 158264 Nov 18 13:55 libcudnn.so.8
-rwxr-xr-x 1 0 0 158264 Nov 18 13:55 libcudnn.so.8.0.5

I got confused about whether these files are to be completely removed or symlinks are to be created for them, and while copy-pasting they got messed up.
While searching the web for an answer I came across one command for checking cuDNN:
/sbin/ldconfig -N -v $(sed ‘s/:/ /’ <<< $LD_LIBRARY_PATH) 2>/dev/null | grep libcudnn

I ran the above command (I don't know what it means; I thought I'd understand how it works after seeing the output, but got even more confused), and got something like below:
sed: -e expression #1, char 1: unknown command: '�'
libcudnn_ops_train.so.8 -> libcudnn_ops_train.so.8.0.5
libcudnn_ops_infer.so.8 -> libcudnn_ops_infer.so.8.0.5
libcudnn_adv_train.so.8 -> libcudnn_adv_train.so.8.0.5
libcudnn_cnn_infer.so.8 -> libcudnn_cnn_infer.so.8.0.5
libcudnn_cnn_train.so.8 -> libcudnn_cnn_train.so.8.0.5
libcudnn_adv_infer.so.8 -> libcudnn_adv_infer.so.8.0.5
libcudnn.so.8 -> libcudnn.so.8.0.5

Now I don't know what to do: whether to generate the symlinks or to remove the libcudnn* files from /usr/local/cuda-11.0/targets/x86_64-linux/lib.
Kindly help me.

Thank you in advance :)

PS: If I have to create symlinks, it would be helpful to get an example using one of the ones that have to be created. I just started using Linux and am not too familiar with it. :)

I know I am way late to answer this

cd  /usr/local/cuda/lib64
sudo ln -sf libcudnn_adv_infer.so.8.2.0 libcudnn_adv_infer.so.8 && sudo ln -sf libcudnn_adv_infer.so.8 libcudnn_adv_infer.so
sudo ln -sf libcudnn_ops_infer.so.8.2.0 libcudnn_ops_infer.so.8 && sudo ln -sf libcudnn_ops_infer.so.8 libcudnn_ops_infer.so
sudo ln -sf libcudnn_adv_train.so.8.2.0 libcudnn_adv_train.so.8 && sudo ln -sf libcudnn_adv_train.so.8 libcudnn_adv_train.so
sudo ln -sf libcudnn_cnn_infer.so.8.2.0 libcudnn_cnn_infer.so.8 && sudo ln -sf libcudnn_cnn_infer.so.8 libcudnn_cnn_infer.so
sudo ln -sf libcudnn_cnn_train.so.8.2.0 libcudnn_cnn_train.so.8 && sudo ln -sf libcudnn_cnn_train.so.8 libcudnn_cnn_train.so
sudo ln -sf libcudnn.so.8.2.0 libcudnn.so.8 && sudo ln -sf libcudnn.so.8 libcudnn.so
sudo ln -sf libcudnn_ops_train.so.8.2.0 libcudnn_ops_train.so.8 && sudo ln -sf libcudnn_ops_train.so.8 libcudnn_ops_train.so
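
After recreating the links (adjust the 8.2.0 suffix to whatever libcudnn*.so.8.x.y files are actually present on your system), rerunning the linker cache update should no longer print the "is not a symbolic link" warnings:

sudo ldconfig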

@Wallkhan commented May 27, 2021

Hello!
I am going to start a project on object detection, so I want to use the TensorFlow framework, but does TensorFlow support CUDA 11.0, or do I need to install a different version of CUDA?
Can anyone brief me on how I should set up my development environment? I am using Ubuntu 20.10 with an NVIDIA GPU.

You can use CUDA 11.0; it is supported now, as per https://www.tensorflow.org/install/source#gpu

Thank you for your response!

I successfully installed the cuda-11.0 and it is working great with tensorflow.

I think the best configuration is:

* [CUDA 11.0](https://developer.nvidia.com/cuda-11.0-download-archive)

* [cuDNN 8.0.4](https://developer.nvidia.com/cudnn)

* TensorFlow 2.4+

This worked for me

Hey, I followed your steps for TensorFlow 2.4, but when I installed CUDA 11.0.2 and cuDNN 8.0.4 and checked the NVIDIA driver, I get the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 165... On | 00000000:57:00.0 Off | N/A |
| N/A 43C P8 3W / N/A | 465MiB / 3911MiB | 11% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1581 G /usr/lib/xorg/Xorg 80MiB |
| 0 N/A N/A 2191 G /usr/lib/xorg/Xorg 156MiB |
| 0 N/A N/A 2365 G /usr/bin/gnome-shell 149MiB |
| 0 N/A N/A 5048 G ...f_4654.log --shared-files 14MiB |
| 0 N/A N/A 12847 C /usr/bin/python3 53MiB |
+-----------------------------------------------------------------------------+

and when I check TensorFlow GPU support with
import tensorflow as tf
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

I get the output
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 4696811581786634953
]
Num GPUs Available: 0

UPDATE:
It was the latest NVIDIA driver that was causing it. I had to downgrade to 450.
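
For reference, on Ubuntu one way to move to the 450 driver series is via the packaged driver, assuming the nvidia-driver-450 metapackage is available in your configured repositories:

$ sudo apt install nvidia-driver-450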

@brc7 commented Aug 4, 2021

Thank you for putting this up! Very helpful, and it worked for me.

@johndpope commented Aug 4, 2021

Pro tip: once you get a working system, make a backup using Timeshift:
https://github.com/teejee2008/timeshift
Work files are deliberately excluded, and that way you can roll back the OS / NVIDIA drivers etc. to a previous working restore point. The restore process will prompt you with a list of files that will be deleted / recovered. I have used it 5-6 times this year, as some frameworks / machine learning projects error out with the latest RTX 30x0 hardware. Some need gcc 10.2, but the OS wants to update to 10.3 (which breaks CUDA toolkit 11.4 for some ML stuff). I wasted so many hours faffing around, when all you want to do is have it work. I've also had cuDNN bizarrely go missing, even though it's clearly in the checkpoint, where you can browse files and restore them.
The NVIDIA driver also got downgraded, which has forced me to disable updates.

I have PyTorch 1.8 + cudatoolkit 11.4 / the 470 driver all working for me (today).
(As NVIDIA labs has abandoned TensorFlow in favour of PyTorch, I've steered away from those projects.)

@gourav2001k

Tip: adding the -av flag while copying the cuDNN libs solves the problem of the symlinks. The exact commands would be:

$ sudo cp -av cuda/include/cudnn*.h /usr/local/cuda/include
$ sudo cp -av cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

@MeAmarP commented Feb 17, 2022

Thanks

@russelljjarvis

POP

I run PopOS. I downloaded and installed the CUDA version of PopOS onto a Dell. I find that in practice the CUDA driver isn't compatible with the gcc compiler, and repeating the CUDA install instructions doesn't help with this either.

I am exploring ways to install CUDA besides the PopOS-supported ways.
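
One workaround that can help with nvcc/gcc version mismatches in general is pointing nvcc at a compatible host compiler explicitly. A sketch, assuming gcc-10 is installed at /usr/bin/gcc-10 and test.cu is a placeholder source file:

$ nvcc -ccbin /usr/bin/gcc-10 -o test test.cu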

@ggtux commented Nov 1, 2023

Hi, is there any newer installation guide for Ubuntu 22?
