Skip to content

Instantly share code, notes, and snippets.

@Mahedi-61
Last active November 16, 2024 10:21
Show Gist options
  • Save Mahedi-61/2a2f1579d4271717d421065168ce6a73 to your computer and use it in GitHub Desktop.
Save Mahedi-61/2a2f1579d4271717d421065168ce6a73 to your computer and use it in GitHub Desktop.
Instructions for CUDA v11.8 and cuDNN 8.9.7 installation on Ubuntu 22.04 for PyTorch 2.1.2
#!/bin/bash
### steps ####
# Verify the system has a cuda-capable gpu
# Download and install the nvidia cuda toolkit and cudnn
# Setup environmental variables
# Verify the installation
###
### to verify your gpu is cuda enable check
lspci | grep -i nvidia
### If you have previous installation remove it first.
sudo apt-get purge nvidia*
sudo apt remove nvidia-*
sudo rm /etc/apt/sources.list.d/cuda*
sudo apt-get autoremove && sudo apt-get autoclean
sudo rm -rf /usr/local/cuda*
# system update
sudo apt-get update
sudo apt-get upgrade
# install other import packages
sudo apt-get install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev
# first get the PPA repository driver
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
# install nvidia driver with dependencies
sudo apt install libnvidia-common-470
sudo apt install libnvidia-gl-470
sudo apt install nvidia-driver-470
# installing CUDA-11.8
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda
# setup your paths
echo 'export PATH=/usr/local/cuda-11.8/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
sudo ldconfig
# install cuDNN v8.9.7
# First register here: https://developer.nvidia.com/developer-program/signup
CUDNN_TAR_FILE="cudnn-linux-x86_64-8.9.7.29_cuda11-archive.tar.xz"
wget https://developer.nvidia.com/downloads/compute/cudnn/secure/8.9.7/local_installers/11.x/cudnn-linux-x86_64-8.9.7.29_cuda11-archive.tar.xz
tar -xvf ${CUDNN_TAR_FILE}
# copy the following files into the cuda toolkit directory.
sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include
$ sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
# Finally, to verify the installation, check
nvidia-smi
nvcc -V
# install Pytorch (an open source machine learning framework)
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
@apiszcz
Copy link

apiszcz commented Jan 1, 2021

Unpacking xfonts-base (1:1.0.4+nmu1) ...
Errors were encountered while processing:
/tmp/apt-dpkg-install-0cZNOW/106-libnvidia-compute-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/107-libnvidia-decode-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/108-libnvidia-encode-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/109-libnvidia-fbc1-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/110-libnvidia-gl-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/111-libnvidia-ifr1-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/112-nvidia-compute-utils-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/117-libnvidia-extra-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/118-nvidia-utils-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/119-libnvidia-cfg1-460_460.27.04-0ubuntu1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

@hongweilibran
Copy link

the best tutorial that saves my life!!!!!

@guruvishnuvardan
Copy link

Yes, It's the best Tensorflow Installation Guide, I had resolved all my previous issues.

@MAMOMIMOMU
Copy link

If you run this on shell, tensorflow recognizes gpus?
I ran this shell script, and seemed like there was no problem running it, but tensorflow-gpu still doesn't recognize gpus.
Tensorflow-gpu version is 2.3.0 and this version must also be compatible with cuda 10.1 and cudnn 7.6.

@jrincon23
Copy link

If you run this on shell, tensorflow recognizes gpus?
I ran this shell script, and seemed like there was no problem running it, but tensorflow-gpu still doesn't recognize gpus.
Tensorflow-gpu version is 2.3.0 and this version must also be compatible with cuda 10.1 and cudnn 7.6.

Try this
sudo apt-get install -y --no-install-recommends
cuda-10-1
libcudnn7=7.6.0.64-1+cuda10.1
libcudnn7-dev=7.6.0.64-1+cuda10.1;

sudo apt-get install -y --no-install-recommends
libnvinfer6=6.0.1-1+cuda10.1
libnvinfer-dev=6.0.1-1+cuda10.1
libnvinfer-plugin6=6.0.1-1+cuda10.1;

also as someone above said, cuda 10.1 install some cuda 10.2 components

@KnightOfTheMoonlight
Copy link

it works on my pc. Thanks!

@riyaj8888
Copy link

can you tell what changes should be done for cuda 11.1 in 18.04 ubuntu system.

Thanks

@kikefdezl
Copy link

Was having issues getting the TensorFlow Object Detection API to work without errors. This guide worked for Ubuntu 20.04, CUDA 11.2, CuDNN 8.1.0 and TensorFlow 2.6.

Thanks a lot!

@yummyKnight
Copy link

@mnielsen There is extra i in sudo apt install libnividia-gl-470. I think it should be sudo apt install libnvidia-gl-470

@Mahedi-61
Copy link
Author

@yummyKnight Thanks for your correction.

@asadolahzade
Copy link

Can someone tell me is sudo ubuntu-drivers autoinstall the same as three following commands? Do they do the same job?

sudo apt install libnvidia-common-470
sudo apt install libnvidia-gl-470
sudo apt install nvidia-driver-470

@hiraksarkar
Copy link

After installing this I was getting the following (non-fatal) warning

>>> import tensorflow as tf
>>> print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
2021-11-24 09:01:58.877869: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-24 09:01:58.899255: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-24 09:01:58.900051: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Num GPUs Available:  1

I resolved it by following tensorflow/tensorflow#53184
for a in /sys/bus/pci/devices/*; do echo 0 | sudo tee -a $a/numa_node; done

@AI-Hunter
Copy link

Thanks for nice repo
I have installed using your instruction. but when type nvidia-smi it shows 11.5. Why, how can I install 11.2?

@apanagar
Copy link

apanagar commented Feb 13, 2022

Works great up till cuDNN, and then I get the following

$ wget https://developer.nvidia.com/compute/machine-learning/cudnn/secure/8.1.1.33/11.2_20210301/cudnn-11.2-linux-x64-v8.1.1.33.tgz
--2022-02-13 13:24:40--  https://developer.nvidia.com/compute/machine-learning/cudnn/secure/8.1.1.33/11.2_20210301/cudnn-11.2-linux-x64-v8.1.1.33.tgz
Resolving developer.nvidia.com (developer.nvidia.com)... 152.195.19.142
Connecting to developer.nvidia.com (developer.nvidia.com)|152.195.19.142|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2022-02-13 13:24:41 ERROR 403: Forbidden.

EDIT: This link worked: wget https://developer.download.nvidia.com/compute/redist/cudnn/v8.1.1/cudnn-11.2-linux-x64-v8.1.1.33.tgz

@addy1997
Copy link

addy1997 commented Apr 3, 2022

I want to install CUDA 11.3 or higher version on Ubuntu 18.04 (which is installed using a Virtual Machine). Which instructions should I follow?

@dademiller360
Copy link

Thanks for nice repo I have installed using your instruction. but when type nvidia-smi it shows 11.5. Why, how can I install 11.2?

I had to implement the end of this tutorial:
https://towardsdatascience.com/installing-multiple-cuda-cudnn-versions-in-ubuntu-fcb6aa5194e2

I used his edit of bash so tensorflow (in my case) can choose what cuda toolkit use, and it worked.

Thank you very much @Mahedi-61, much appreciated

@initialneil
Copy link

initialneil commented Jun 22, 2022

RTX 3090 requires driver version of 515 (not 470).

# install nvidia driver with dependencies
sudo apt install libnvidia-common-515
sudo apt install libnvidia-gl-515
sudo apt install nvidia-driver-515

@addy1997
Copy link

I am wondering whether these work for installing cuda 11.3 on ubuntu 22.04 also?

@saravananpsg
Copy link

Will it work for nvidia-server on ubuntu 20.04 server ?

install nvidia driver with dependencies

sudo apt install libnvidia-common-470-server
sudo apt install libnvidia-gl-470-server
sudo apt install nvidia-driver-470-server

@initialneil
Copy link

Will it work for nvidia-server on ubuntu 20.04 server ?

install nvidia driver with dependencies

sudo apt install libnvidia-common-470-server sudo apt install libnvidia-gl-470-server sudo apt install nvidia-driver-470-server

@saravananpsg It's works for server. I tested. I also changed 470 to 515 to support 3090.

@jackkolb
Copy link

jackkolb commented Jul 22, 2022

I also had to change the version from 470 to 515 for a 1070 TI.

sudo apt install libnvidia-common-515
sudo apt install libnvidia-gl-515
sudo apt install nvidia-driver-515

After installing, if nvidia-smi gives a kernel/client version mismatch error, reboot.

@mshajarrazip
Copy link

This helped A LOT! Thanks!

@Ahanio
Copy link

Ahanio commented Apr 27, 2023

Thank you! It was veeery helpful!

@lolo912
Copy link

lolo912 commented Jun 29, 2023

Thank you verry much you just forgotten a star character after cudnn here :
sudo cp -P cuda/include/cudnn*.h /usr/local/cuda-11.3/include

Verry important because else an error can be encountered while compiling for example pytorch "cudnn_version.h" not found.
Regards

@abpani
Copy link

abpani commented Jul 25, 2024

tar -xvf cudnn-linux-x86_64-8.9.7.29_cuda11-archive.tar.xz
xz: (stdin): File format not recognized
tar: Child returned status 1
tar: Error is not recoverable: exiting now
I get this error
I have my tar and xz installed

@abnassiri
Copy link

with this command :
$ sudo cp cudnn--archive/include/cudnn.h /usr/local/cuda/include
message error :
cp: cannot stat 'cudnn--archive/include/cudnn.h': No such file or directory
the same with the other commands :
$ sudo cp cudnn--archive/include/cudnn.h /usr/local/cuda/include
====> cp: cannot stat 'cudnn--archive/include/cudnn.h': No such file or directory
$sudo cp cudnn--archive/include/cudnn.h /usr/local/cuda/include
====> cp: cannot stat 'cudnn--archive/include/cudnn.h': No such file or directory

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment