
@Mahedi-61
Last active November 16, 2024 10:21
Instructions for CUDA v11.8 and cuDNN 8.9.7 installation on Ubuntu 22.04 for PyTorch 2.1.2
#!/bin/bash
### steps ####
# Verify the system has a CUDA-capable GPU
# Download and install the NVIDIA CUDA toolkit and cuDNN
# Set up environment variables
# Verify the installation
###
### Verify that your GPU is CUDA-capable
lspci | grep -i nvidia
### If you have a previous installation, remove it first.
sudo apt-get purge 'nvidia*'
sudo apt remove 'nvidia-*'
sudo rm /etc/apt/sources.list.d/cuda*
sudo apt-get autoremove && sudo apt-get autoclean
sudo rm -rf /usr/local/cuda*
# system update
sudo apt-get update
sudo apt-get upgrade
# install other important packages
sudo apt-get install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev
# first, add the graphics-drivers PPA repository
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
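# install the NVIDIA driver with dependencies
# NOTE: the CUDA 11.8 release notes list driver >= 520.61.05; the 470 branch relies on
# CUDA minor-version compatibility, and some GPUs need a newer branch (see comments below)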
sudo apt install libnvidia-common-470
sudo apt install libnvidia-gl-470
sudo apt install nvidia-driver-470
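# install CUDA 11.8 (use the ubuntu2204 repository; the wsl-ubuntu one is meant for WSL)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
# cuda-toolkit-11-8 pins CUDA 11.8 and leaves the driver installed above in place;
# the unversioned "cuda" package installs the newest CUDA release and its driver
sudo apt-get -y install cuda-toolkit-11-8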
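# set up your paths
echo 'export PATH=/usr/local/cuda-11.8/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
# Ubuntu's default ~/.bashrc returns early in non-interactive shells, so also export here
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH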
sudo ldconfig
# install cuDNN v8.9.7
# First register here: https://developer.nvidia.com/developer-program/signup
# The link below requires a logged-in session; if wget does not fetch the archive,
# download it in a browser and place it in the current directory.
CUDNN_TAR_FILE="cudnn-linux-x86_64-8.9.7.29_cuda11-archive.tar.xz"
wget https://developer.nvidia.com/downloads/compute/cudnn/secure/8.9.7/local_installers/11.x/${CUDNN_TAR_FILE}
tar -xvf ${CUDNN_TAR_FILE}
# copy the following files into the CUDA toolkit directory
sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
# Finally, to verify the installation, check
nvidia-smi
nvcc -V
# install PyTorch 2.1.2 (an open-source machine learning framework) built for CUDA 11.8
pip3 install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
@AndreiMiculita

@supersexy I would need an abstract for that version, meaning what the script should do, how, and when, which wasn't provided yet, and I am not motivated enough to try to reverse-engineer it.

Wow, zero willingness to help, zero willingness to give more detail, just complaining about linter warnings on some code you found on the internet. And malware, talk about crying wolf.

I'll address your complaints:

  • the paths will be reappended every time the script runs source ~/.bashrc. So, once. Hardly overwhelming, and not at all spam. If you care so much about having an aesthetically-pleasing $PATH variable, there are scripts on the internet that can remove duplicates, and it gets reset to $(getconf PATH) anyway (probably on reboot).
  • the expressions in lines 45-46 are supposed to be non-expanding since they are written to .bashrc, not executed

There is zero reverse-engineering needed here as you have full access to the sources and documentation (if we're taking ourselves so seriously about a 71-line script).
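If the duplicate PATH entries are still a concern, the two `echo ... >> ~/.bashrc` lines can be made idempotent. A minimal sketch, assuming a hypothetical helper `append_once` (not part of the gist) that appends a line only when that exact line is not already in the file:

```shell
# append_once LINE FILE
# Hypothetical helper (not part of the gist): append LINE to FILE only if
# FILE does not already contain exactly that line, so re-running the
# install script does not add duplicate exports to ~/.bashrc.
append_once() {
    local line="$1" file="$2"
    # -q quiet, -x whole-line match, -F fixed string (no regex), -- ends options;
    # a missing FILE makes grep fail, so the line is appended and FILE created
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

For example, `append_once 'export PATH=/usr/local/cuda-11.8/bin:$PATH' ~/.bashrc` can stand in for the first echo line; the single quotes keep `$PATH` unexpanded in ~/.bashrc, as in the original.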

@rnrpereira

rnrpereira commented Dec 8, 2020

What if I want to install CUDA 10.0? Is it the same sequence of instructions? Thanks!

@apiszcz

apiszcz commented Jan 1, 2021

Unpacking xfonts-base (1:1.0.4+nmu1) ...
Errors were encountered while processing:
/tmp/apt-dpkg-install-0cZNOW/106-libnvidia-compute-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/107-libnvidia-decode-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/108-libnvidia-encode-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/109-libnvidia-fbc1-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/110-libnvidia-gl-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/111-libnvidia-ifr1-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/112-nvidia-compute-utils-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/117-libnvidia-extra-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/118-nvidia-utils-460_460.27.04-0ubuntu1_amd64.deb
/tmp/apt-dpkg-install-0cZNOW/119-libnvidia-cfg1-460_460.27.04-0ubuntu1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

@hongweilibran

the best tutorial that saves my life!!!!!

@guruvishnuvardan

Yes, it's the best TensorFlow installation guide; it resolved all my previous issues.

@MAMOMIMOMU

If you run this in a shell, does TensorFlow recognize the GPUs?
I ran this shell script and it seemed to run without problems, but tensorflow-gpu still doesn't recognize the GPUs.
My tensorflow-gpu version is 2.3.0, which requires CUDA 10.1 and cuDNN 7.6.

@jrincon23

If you run this in a shell, does TensorFlow recognize the GPUs?
I ran this shell script and it seemed to run without problems, but tensorflow-gpu still doesn't recognize the GPUs.
My tensorflow-gpu version is 2.3.0, which requires CUDA 10.1 and cuDNN 7.6.

Try this:

sudo apt-get install -y --no-install-recommends \
    cuda-10-1 \
    libcudnn7=7.6.0.64-1+cuda10.1 \
    libcudnn7-dev=7.6.0.64-1+cuda10.1

sudo apt-get install -y --no-install-recommends \
    libnvinfer6=6.0.1-1+cuda10.1 \
    libnvinfer-dev=6.0.1-1+cuda10.1 \
    libnvinfer-plugin6=6.0.1-1+cuda10.1

Also, as someone above said, installing CUDA 10.1 pulls in some CUDA 10.2 components.

@KnightOfTheMoonlight

it works on my pc. Thanks!

@riyaj8888

Can you tell me what changes are needed for CUDA 11.1 on an Ubuntu 18.04 system?

Thanks

@kikefdezl

I was having issues getting the TensorFlow Object Detection API to work without errors. This guide worked for Ubuntu 20.04, CUDA 11.2, cuDNN 8.1.0 and TensorFlow 2.6.

Thanks a lot!

@yummyKnight

@mnielsen There is an extra i in sudo apt install libnividia-gl-470. I think it should be sudo apt install libnvidia-gl-470.

@Mahedi-61
Author

@yummyKnight Thanks for your correction.

@asadolahzade

Can someone tell me whether sudo ubuntu-drivers autoinstall is the same as the following three commands? Do they do the same job?

sudo apt install libnvidia-common-470
sudo apt install libnvidia-gl-470
sudo apt install nvidia-driver-470

@hiraksarkar

After installing this I was getting the following (non-fatal) warning

>>> import tensorflow as tf
>>> print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
2021-11-24 09:01:58.877869: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-24 09:01:58.899255: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-24 09:01:58.900051: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Num GPUs Available:  1

I resolved it by following tensorflow/tensorflow#53184
for a in /sys/bus/pci/devices/*; do echo 0 | sudo tee -a $a/numa_node; done
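The loop above writes 0 to every PCI device's numa_node, including devices that already report a valid node. A slightly more conservative sketch, assuming you only want to touch entries that currently read -1 (the value the log message complains about); like the original loop, it writes to sysfs, so the change does not survive a reboot. `fix_numa_nodes` and the `SUDO` override are hypothetical names:

```shell
# fix_numa_nodes [ROOT]
# Hypothetical helper: write 0 to numa_node only for devices that report -1.
# ROOT defaults to /sys/bus/pci/devices. SUDO (an assumed override) defaults
# to "sudo"; set SUDO= to write without it, e.g. on a scratch directory.
fix_numa_nodes() {
    local root="${1:-/sys/bus/pci/devices}" dev node_file
    for dev in "$root"/*; do
        node_file="$dev/numa_node"
        [ -r "$node_file" ] || continue    # skip entries without numa_node
        if [ "$(cat "$node_file")" = "-1" ]; then
            # unquoted on purpose: an empty SUDO expands to nothing
            echo 0 | ${SUDO-sudo} tee "$node_file" > /dev/null
        fi
    done
}
```

Run as `fix_numa_nodes` after each boot, or `SUDO= fix_numa_nodes /some/test/tree` to try it on a scratch directory first.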

@AI-Hunter

Thanks for the nice repo.
I installed CUDA using your instructions, but when I type nvidia-smi it shows 11.5. Why? How can I install 11.2?
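A likely explanation, stated as general CUDA behavior rather than anything specific to this gist: the "CUDA Version" in the nvidia-smi header is the newest CUDA version the installed driver supports, while `nvcc -V` reports the toolkit actually on PATH. Also, the unversioned `cuda` apt package tracks the newest release, whereas versioned packages such as `cuda-11-2` stay at that version. A minimal sketch with hypothetical helpers `nvcc_release` and `check_cuda_release` that make a script fail when the toolkit is not the expected one:

```shell
# nvcc_release
# Hypothetical helper: read `nvcc -V` output on stdin and print the toolkit
# release from the "Cuda compilation tools, release X.Y, ..." line.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# check_cuda_release EXPECTED
# Hypothetical helper: succeed only if the nvcc found on PATH reports
# release EXPECTED (e.g. 11.8); print a message and fail otherwise.
check_cuda_release() {
    local expected="$1" actual
    actual=$(nvcc -V 2>/dev/null | nvcc_release)
    if [ "$actual" != "$expected" ]; then
        echo "expected CUDA $expected, nvcc reports '${actual:-none}'" >&2
        return 1
    fi
}
```

For example, `check_cuda_release 11.8 || exit 1` right after the PATH setup stops the script before cuDNN is copied into the wrong toolkit.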

@apanagar

apanagar commented Feb 13, 2022

Works great up till cuDNN, and then I get the following

$ wget https://developer.nvidia.com/compute/machine-learning/cudnn/secure/8.1.1.33/11.2_20210301/cudnn-11.2-linux-x64-v8.1.1.33.tgz
--2022-02-13 13:24:40--  https://developer.nvidia.com/compute/machine-learning/cudnn/secure/8.1.1.33/11.2_20210301/cudnn-11.2-linux-x64-v8.1.1.33.tgz
Resolving developer.nvidia.com (developer.nvidia.com)... 152.195.19.142
Connecting to developer.nvidia.com (developer.nvidia.com)|152.195.19.142|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2022-02-13 13:24:41 ERROR 403: Forbidden.

EDIT: This link worked: wget https://developer.download.nvidia.com/compute/redist/cudnn/v8.1.1/cudnn-11.2-linux-x64-v8.1.1.33.tgz

@addy1997

addy1997 commented Apr 3, 2022

I want to install CUDA 11.3 or higher version on Ubuntu 18.04 (which is installed using a Virtual Machine). Which instructions should I follow?

@dademiller360

Thanks for the nice repo. I installed CUDA using your instructions, but when I type nvidia-smi it shows 11.5. Why? How can I install 11.2?

I had to implement the end of this tutorial:
https://towardsdatascience.com/installing-multiple-cuda-cudnn-versions-in-ubuntu-fcb6aa5194e2

I used his edit to the bash configuration so that TensorFlow (in my case) can choose which CUDA toolkit to use, and it worked.

Thank you very much @Mahedi-61, much appreciated
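The linked article's exact edit is not reproduced here. As a rough sketch of the same idea, assuming toolkits are installed side by side as /usr/local/cuda-X.Y, hypothetical functions `strip_cuda_entries` and `use_cuda` (for example, defined in ~/.bashrc) remove any existing CUDA entries from PATH and LD_LIBRARY_PATH and prepend the requested version; `CUDA_ROOT` is an assumed override for the install prefix:

```shell
# strip_cuda_entries LIST ROOT
# Hypothetical helper: print the colon-separated LIST with every entry
# under ROOT/cuda* (e.g. /usr/local/cuda-11.2/bin) removed.
strip_cuda_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
        case $entry in
            "$2"/cuda*) ;;                 # drop CUDA toolkit entries
            *) printf '%s\n' "$entry" ;;   # keep everything else
        esac
    done | paste -sd: -
}

# use_cuda VERSION
# Hypothetical helper: point PATH and LD_LIBRARY_PATH at ROOT/cuda-VERSION
# for the current shell, where ROOT is $CUDA_ROOT if set, else /usr/local.
use_cuda() {
    local root="${CUDA_ROOT:-/usr/local}"
    local cuda_home="$root/cuda-$1" rest
    if [ ! -d "$cuda_home" ]; then
        echo "use_cuda: $cuda_home not found" >&2
        return 1
    fi
    rest=$(strip_cuda_entries "$PATH" "$root")
    PATH="$cuda_home/bin${rest:+:$rest}"
    rest=$(strip_cuda_entries "${LD_LIBRARY_PATH:-}" "$root")
    LD_LIBRARY_PATH="$cuda_home/lib64${rest:+:$rest}"
    export PATH LD_LIBRARY_PATH
}
```

After `use_cuda 11.2`, `nvcc -V` in that shell should report release 11.2; other shells keep whatever ~/.bashrc set up.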

@initialneil

initialneil commented Jun 22, 2022

RTX 3090 requires driver version of 515 (not 470).

# install nvidia driver with dependencies
sudo apt install libnvidia-common-515
sudo apt install libnvidia-gl-515
sudo apt install nvidia-driver-515
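The three driver packages differ only in the branch number (470 in the gist, 515 here) and, for the server variants mentioned further down, a -server suffix. A minimal sketch with a hypothetical `driver_packages` helper so the branch is chosen in one place; check which branches your Ubuntu release and the PPA actually provide before installing:

```shell
# driver_packages BRANCH [SUFFIX]
# Hypothetical helper: print the three NVIDIA driver package names used in
# this gist for a driver branch (e.g. 470 or 515), with an optional SUFFIX
# such as "-server".
driver_packages() {
    local branch="$1" suffix="${2:-}" pkg
    for pkg in libnvidia-common libnvidia-gl nvidia-driver; do
        printf '%s\n' "$pkg-$branch$suffix"
    done
}
```

For example, `sudo apt install $(driver_packages 515)`, or `$(driver_packages 470 -server)` for the server packages; the command substitution is left unquoted on purpose so each name becomes a separate argument.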

@addy1997

I am wondering whether these also work for installing CUDA 11.3 on Ubuntu 22.04.

@saravananpsg

Will it work with the nvidia-server drivers on Ubuntu 20.04 server?

# install nvidia driver with dependencies
sudo apt install libnvidia-common-470-server
sudo apt install libnvidia-gl-470-server
sudo apt install nvidia-driver-470-server

@initialneil

Will it work with the nvidia-server drivers on Ubuntu 20.04 server?

# install nvidia driver with dependencies
sudo apt install libnvidia-common-470-server
sudo apt install libnvidia-gl-470-server
sudo apt install nvidia-driver-470-server

@saravananpsg It works for the server drivers; I tested it. I also changed 470 to 515 to support the 3090.

@jackkolb

jackkolb commented Jul 22, 2022

I also had to change the version from 470 to 515 for a 1070 Ti.

sudo apt install libnvidia-common-515
sudo apt install libnvidia-gl-515
sudo apt install nvidia-driver-515

After installing, if nvidia-smi gives a kernel/client version mismatch error, reboot.

@mshajarrazip

This helped A LOT! Thanks!

@Ahanio

Ahanio commented Apr 27, 2023

Thank you! It was veeery helpful!

@lolo912

lolo912 commented Jun 29, 2023

Thank you very much! You just forgot a star character after cudnn here:
sudo cp -P cuda/include/cudnn*.h /usr/local/cuda-11.3/include

This is very important, because otherwise compiling, for example, PyTorch can fail with "cudnn_version.h" not found.
Regards
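Since cuDNN 8, the version macros (CUDNN_MAJOR, CUDNN_MINOR, CUDNN_PATCHLEVEL) are defined in cudnn_version.h rather than cudnn.h, which is why copying only cudnn.h breaks builds. A minimal sketch, with a hypothetical `cudnn_header_version` helper and assuming the /usr/local/cuda/include location used in the gist, that reads the copied header back to confirm the version:

```shell
# cudnn_header_version [HEADER]
# Hypothetical helper: print MAJOR.MINOR.PATCH from cuDNN's version header
# (default /usr/local/cuda/include/cudnn_version.h); fail if the header is
# unreadable or defines no CUDNN_MAJOR.
cudnn_header_version() {
    local header="${1:-/usr/local/cuda/include/cudnn_version.h}"
    [ -r "$header" ] || { echo "cannot read $header" >&2; return 1; }
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major == "") exit 1; print major "." minor "." patch }
    ' "$header"
}
```

After the copy step in the gist, `cudnn_header_version` should print 8.9.7 for the cuDNN archive it downloads.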

@abpani

abpani commented Jul 25, 2024

tar -xvf cudnn-linux-x86_64-8.9.7.29_cuda11-archive.tar.xz
xz: (stdin): File format not recognized
tar: Child returned status 1
tar: Error is not recoverable: exiting now
I get this error, even though tar and xz are installed.
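A "File format not recognized" error from xz generally means the file does not start with an xz header. One plausible cause here is that the secure cuDNN link returned an HTML page or an error instead of the archive (compare the 403 response reported above for an older cuDNN link). A minimal sketch with a hypothetical `is_xz` helper that checks the xz magic bytes (FD 37 7A 58 5A 00) before extracting:

```shell
# is_xz FILE
# Hypothetical helper: succeed only if FILE exists and starts with the
# xz magic bytes FD 37 7A 58 5A 00.
is_xz() {
    local magic
    [ -f "$1" ] || return 1
    # dump the first 6 bytes as hex and strip od's spaces and newline
    magic=$(head -c 6 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "fd377a585a00" ]
}
```

In the script, `is_xz "${CUDNN_TAR_FILE}" || { echo "not an xz archive; download it from a browser" >&2; exit 1; }` could go right before the tar line.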

@abnassiri

With this command:
$ sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include
I get this error:
cp: cannot stat 'cudnn-*-archive/include/cudnn*.h': No such file or directory
The same happens with the other commands:
$ sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include
====> cp: cannot stat 'cudnn-*-archive/include/cudnn*.h': No such file or directory
$ sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include
====> cp: cannot stat 'cudnn-*-archive/include/cudnn*.h': No such file or directory
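The "cannot stat" error means the glob matched nothing, so bash passed the pattern to cp unchanged; usually the archive was not extracted, or the commands were run from a directory other than the one containing the extracted cudnn-*-archive folder. A minimal sketch with a hypothetical `copy_cudnn` helper (taking the extracted folder and the CUDA prefix as arguments, plus an assumed `SUDO` override) that reports this explicitly instead of failing inside cp:

```shell
# copy_cudnn SRC_DIR [CUDA_DIR]
# Hypothetical helper: copy cuDNN headers and libraries from an extracted
# archive folder into the CUDA toolkit (default /usr/local/cuda), failing
# early with a clear message when the globs match nothing. SUDO is an
# assumed override that defaults to "sudo".
copy_cudnn() {
    local src="$1" dest="${2:-/usr/local/cuda}"
    # an unmatched glob stays literal, so test whether the first match exists
    set -- "$src"/include/cudnn*.h
    if [ ! -e "$1" ]; then
        echo "copy_cudnn: no cudnn*.h under $src/include; was the archive extracted here?" >&2
        return 1
    fi
    ${SUDO-sudo} cp "$@" "$dest/include/" || return 1
    set -- "$src"/lib/libcudnn*
    if [ ! -e "$1" ] && [ ! -L "$1" ]; then
        echo "copy_cudnn: no libcudnn* under $src/lib" >&2
        return 1
    fi
    # -P copies symlinks such as libcudnn.so -> libcudnn.so.8 as symlinks
    ${SUDO-sudo} cp -P "$@" "$dest/lib64/" || return 1
    ${SUDO-sudo} chmod a+r "$dest"/include/cudnn*.h "$dest"/lib64/libcudnn*
}
```

For example, `copy_cudnn cudnn-linux-x86_64-8.9.7.29_cuda11-archive`, assuming the archive extracts to a folder of that name (which the gist's cudnn-*-archive glob expects).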
