Instructions for CUDA v11.8 and cuDNN 8.7 installation on Ubuntu 22.04 for PyTorch 2.0.0
#!/bin/bash
### steps ####
# verify the system has a cuda-capable gpu
# download and install the nvidia cuda toolkit and cudnn
# setup environmental variables
# verify the installation
###
### To verify that your GPU is CUDA-capable, check:
lspci | grep -i nvidia
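### If you have a previous installation, remove it first.
# Quote the patterns so the shell does not expand them against files in the current directory.
sudo apt purge 'nvidia*' -y
sudo apt remove 'nvidia-*' -y
sudo rm -f /etc/apt/sources.list.d/cuda*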
sudo apt autoremove -y && sudo apt autoclean -y
sudo rm -rf /usr/local/cuda*
# system update
sudo apt update && sudo apt upgrade -y
# install other important packages
sudo apt install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev
# first get the PPA repository driver
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
# find recommended driver versions for you
ubuntu-drivers devices
# install nvidia driver with dependencies
sudo apt install libnvidia-common-515 libnvidia-gl-515 nvidia-driver-515 -y
# reboot
sudo reboot now
# verify that the following command works
nvidia-smi
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"
# Update and upgrade
sudo apt update && sudo apt upgrade -y
# installing CUDA-11.8
sudo apt install cuda-11-8 -y
# setup your paths
echo 'export PATH=/usr/local/cuda-11.8/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
sudo ldconfig
# install cuDNN v8.7.0 (the build for CUDA 11.x)
# First register here: https://developer.nvidia.com/developer-program/signup
CUDNN_TAR_FILE="cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz"
sudo wget https://developer.download.nvidia.com/compute/redist/cudnn/v8.7.0/local_installers/11.8/cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz
sudo tar -xvf ${CUDNN_TAR_FILE}
sudo mv cudnn-linux-x86_64-8.7.0.84_cuda11-archive cuda
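# copy the cuDNN headers and libraries into the cuda toolkit directory.
# cudnn.h includes the other cudnn_*.h headers, so copy all of them.
sudo cp -P cuda/include/cudnn*.h /usr/local/cuda-11.8/include
sudo cp -P cuda/lib/libcudnn* /usr/local/cuda-11.8/lib64/
sudo chmod a+r /usr/local/cuda-11.8/include/cudnn*.h /usr/local/cuda-11.8/lib64/libcudnn*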
# Finally, to verify the installation, check
nvidia-smi
nvcc -V
# install PyTorch (an open source machine learning framework)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
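
The script verifies the driver and the compiler but not cuDNN itself. A minimal sketch of one way to check it, assuming cudnn_version.h was copied into the toolkit's include directory alongside cudnn.h; the function name and default path are illustrative, not part of the gist:

```shell
#!/bin/sh
# Print the cuDNN version recorded in a cudnn_version.h header as MAJOR.MINOR.PATCH.
# The default path assumes the CUDA 11.8 prefix used in this gist.
cudnn_header_version() {
    header="${1:-/usr/local/cuda-11.8/include/cudnn_version.h}"
    [ -r "$header" ] || { echo "cannot read $header" >&2; return 1; }
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major == "") exit 1; print major "." minor "." patch }
    ' "$header"
}
```

Calling cudnn_header_version with no argument after the install should print the version of the archive that was unpacked; a "cannot read" message suggests the header was not copied.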
@filmo

filmo commented Jul 31, 2023

Hi,

I followed this exactly and got the following error when I reached sudo apt install cuda-11-8:

Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 libnvidia-extra-525 : Conflicts: libnvidia-extra
 libnvidia-extra-535 : Conflicts: libnvidia-extra
E: Error, pkgProblemResolver::Resolve generated breaks, this may be caused by held packages.

I think libnvidia-extra-525 is added when running the install nvidia-driver-515 command. To fix this I had to insert a sudo apt upgrade into the gist.

sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"
sudo apt-get update

#### Added This Line #####
sudo apt upgrade
#######################

 # installing CUDA-11.8
sudo apt install cuda-11-8

After which running sudo apt install cuda-11-8 seems to have worked.

At the end I get:

nvcc -V
(base) philglau@ubuntu-22LTS:~/Downloads$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

But when I run nvidia-smi it shows:
Driver Version: 535.54.03 CUDA Version: 12.2

Not sure why it says CUDA 12.2 instead of 11.8 in nvidia-smi?? Perhaps this is only related to the graphics driver??

in /usr/local I see:

drwxr-xr-x  2 root root 4096 Jul 30 21:09 bin/
lrwxrwxrwx  1 root root   22 Jul 30 21:09 cuda -> /etc/alternatives/cuda/
lrwxrwxrwx  1 root root   25 Jul 30 21:09 cuda-11 -> /etc/alternatives/cuda-11/
drwxr-xr-x 15 root root 4096 Jul 30 21:09 cuda-11.8/

In the /usr/local/cuda/version.json file it lists:

   "nvidia_driver" : {
      "name" : "NVIDIA Linux Driver",
      "version" : "520.61.05"
   },

So perhaps running 'apt upgrade' was the wrong thing to do?? Do I need to downgrade my driver to 520.61.05 in order to make all this work correctly??

@mkabatek

mkabatek commented Aug 1, 2023

@filmo I got it working with the following step:

# Run sudo apt upgrade first, or the next step won't work
sudo apt upgrade

# installing CUDA-11.8
sudo apt install cuda-11-8

I think this is the proper way to install it.

@priyamshah112

priyamshah112 commented Aug 4, 2023

-> sudo cp -P cuda/lib/libcudnn* /usr/local/cuda-11.8/lib64/
cp: target '/usr/local/cuda-11.8/lib64/' is not a directory

Fix: sudo mkdir /usr/local/cuda-11.8/lib64

And if nvidia-smi reports:

priyamshah@priyamshah-System-Product-Name:~$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.86

do a sudo reboot. This fixes nvidia-smi, but nvcc -V is still broken.

Trying sudo apt install nvidia-cuda-toolkit nvidia-cuda-toolkit-gcc gives:

The following packages have unmet dependencies:
libcuinj64-11.5 : Depends: libnvidia-compute-495 (>= 495) but it is not going to be installed or
libnvidia-compute-495-server (>= 495) but it is not installable or
libcuda.so.1 (>= 495) or
libcuda-11.5-1
libnvidia-ml-dev : Depends: libnvidia-compute-495 (>= 495) but it is not going to be installed or
libnvidia-compute-495-server (>= 495) but it is not installable or
libnvidia-ml.so.1 (>= 495)
nvidia-cuda-dev : Depends: libnvidia-compute-495 (>= 495) but it is not going to be installed or
libnvidia-compute-495-server (>= 495) but it is not installable or
libcuda.so.1 (>= 495) or
libcuda-11.5-1
Recommends: libnvcuvid1 but it is not installable

Follow this link: https://stackoverflow.com/questions/66380789/nvidia-driver-installation-unmet-dependencies
(Unchecking the CUDA repo in Software & Updates did the trick.)

Then try sudo apt install nvidia-cuda-toolkit nvidia-cuda-toolkit-gcc again.

This should fix nvcc -V.

@emlcpfx

emlcpfx commented Aug 19, 2023

Thank you for this thread, I would have never figured this out.

@toebee82

Fantastic! Thanks so much! Had to do it a couple times. Ended up just replacing 515 with 535 for the NVIDIA drivers and it worked!

Avoided some of the other elaborate schemes, including NVIDIA's own very confusing and lengthy guide.

@noamsgl

noamsgl commented Aug 30, 2023

Worked like a charm (with the fix mentioned by @filmo and @mkabatek)

@wassname

wassname commented Oct 6, 2023

Perhaps this regex would work better, since it also catches libnvidia, kernel modules, etc. (single quotes keep the shell from interpreting the pattern):

sudo apt-get purge '.*nvidia.*'
sudo apt remove '.*nvidia.*'

@qinchuanhui

I just used the 535 NVIDIA drivers mentioned by @toebee82. Running nvidia-smi after the whole installation showed "Failed to initialize NVML: Driver/library version mismatch".
Then I rebooted the machine and everything worked, but with driver version 520 (not 535). I guess that is to align with the CUDA 11.8 runtime toolkit.
By the way, about the different CUDA versions shown by nvidia-smi and nvcc, there is an answer here: https://stackoverflow.com/questions/53422407/different-cuda-versions-shown-by-nvcc-and-nvidia-smi
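
The "Driver/library version mismatch" error typically appears when the driver packages were upgraded while an older kernel module is still loaded, which is why a reboot clears it. A minimal sketch for checking this before rebooting, assuming /proc/driver/nvidia/version contains its usual "NVRM version:" line and that modinfo can find the nvidia module; the function names are illustrative:

```shell
#!/bin/sh
# Extract the driver version from the text of /proc/driver/nvidia/version (read on stdin).
loaded_module_version() {
    awk '/^NVRM version:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+(\.[0-9]+)?$/) { print $i; exit }
    }'
}

# Compare the kernel module that is loaded with the one installed on disk.
# Returns 0 if they match, 1 if a reboot or module reload is pending, 2 if unknown.
check_driver_reload_needed() {
    loaded=$(loaded_module_version < /proc/driver/nvidia/version) || return 2
    ondisk=$(modinfo -F version nvidia) || return 2
    if [ "$loaded" = "$ondisk" ]; then
        echo "loaded module $loaded matches the installed driver"
    else
        echo "loaded module $loaded differs from installed $ondisk: reboot or reload the module"
        return 1
    fi
}
```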

@rsmath

rsmath commented Nov 21, 2023

One should never do this
sudo rm -rf /usr/local/cuda*
Apt gets confused about what it expects to be there and what is actually there. If something needs to be removed, use apt purge, similar to pip uninstall.
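
As a sketch of the apt-based cleanup suggested here, the following helper lists the installed packages that look like NVIDIA driver, CUDA, or cuDNN packages so they can be reviewed before purging. The function name and the prefix list are assumptions, and the list may need adjusting for a given system:

```shell
#!/bin/sh
# Read `dpkg -l` output on stdin and print the names of installed ("ii")
# packages whose names start with a CUDA or NVIDIA related prefix.
list_nvidia_cuda_packages() {
    awk '$1 == "ii" && $2 ~ /^(cuda|libcudnn|libnvidia|nvidia)/ {
        sub(/:.*/, "", $2)   # drop an architecture suffix such as :amd64
        print $2
    }' | sort -u
}

# Usage (review the list first, then purge through apt rather than rm -rf):
#   dpkg -l | list_nvidia_cuda_packages
#   dpkg -l | list_nvidia_cuda_packages | xargs -r sudo apt purge -y
```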

@wbreslin951

Does not work with the 545 drivers. I just used the 515 drivers in the command (which show up as 525 in nvidia-smi?) and it seems to be working now. Thanks for the thread. I've been through every tutorial and this is the only one that's been successful.

@rsmath

rsmath commented Nov 26, 2023

@wbreslin951, curious, do your nvidia-smi and nvcc --version show the same cuda version being used? If so, which version is it?

@cirr8765

cirr8765 commented Jan 3, 2024

SOLVED: https://forums.developer.nvidia.com/t/ubuntu-cuda-11-8-package-wrong-dependency-on-cuda-drivers/238891
When running sudo apt install cuda -y you can specify the current nvidia driver version, preventing the installer from upgrading:

sudo apt install cuda-11-8 cuda-drivers=535.129.03-1

I need to run the 535 drivers, but after sudo apt install cuda-11-8 -y it automatically switches over to 545 which then causes:

$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 545.23

But when I go into "Software and Updates" and try to switch back, it complains about unmet dependencies, and all files in /usr/local/cuda-11.8/ except for ./targets/ are automatically deleted at this stage!?
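
To pin cuda-drivers as described, the exact package version string has to exist in the repository. A minimal sketch for looking it up, assuming the usual "package | version | source" layout of apt-cache madison output; the function name is illustrative, and it returns the first listed version on the requested driver branch:

```shell
#!/bin/sh
# Read `apt-cache madison cuda-drivers` output on stdin and print the first
# version that belongs to the given driver branch (e.g. 535).
pick_driver_version() {
    branch="$1"
    awk -F'|' -v b="$branch" '
        { gsub(/ /, "", $2) }                  # strip padding around the version field
        index($2, b ".") == 1 { print $2; exit }
    '
}

# Usage:
#   ver=$(apt-cache madison cuda-drivers | pick_driver_version 535)
#   sudo apt install cuda-11-8 "cuda-drivers=$ver"
```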

@cirr8765

cirr8765 commented Jan 3, 2024

Regarding @priyamshah112's suggestion above to fix a broken nvcc -V by running sudo apt install nvidia-cuda-toolkit nvidia-cuda-toolkit-gcc:

I am not sure this is a good solution: you have just installed the CUDA toolkit, and if you do this you risk running into dependency problems. The problem could just be, as it was for me, that you didn't successfully add /usr/local/cuda-11.8/bin to $PATH. First take a look in /usr/local/cuda-11.8/bin; if nvcc is in there, just try to add it again, i.e. run

export PATH=/usr/local/cuda-11.8/bin:$PATH

and check your path with echo $PATH to see if it's in there. If this works, simply add the export line at the bottom of your ~/.bashrc to make it permanent.
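
Re-running the export line, or sourcing ~/.bashrc repeatedly, prepends the same directory again each time. A small sketch of a helper that adds the directory only when it is missing; the function name is illustrative:

```shell
#!/bin/sh
# Prepend a directory to PATH only if it is not already listed.
path_prepend_once() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Usage:
#   path_prepend_once /usr/local/cuda-11.8/bin
#   command -v nvcc    # prints the nvcc location if it is now found
```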

@joseagraz

Thanks for such a great tutorial. I made my own, referencing yours:
https://github.com/Kidney-Science/install_RTXA4000_Driver_CUDA_cudNN_Ubuntu_22

@samuponz

@filmo "Not sure why it says CUDA 12.2 instead of 11.8 in nvidia-smi?? Perhaps this is only related to the graphics driver??"

You are right. nvidia-smi shows the latest version of CUDA supported by your GPU drivers, not the installed version of CUDA. Check this.
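
A rough sketch of that relationship in shell: the toolkit release reported by nvcc should not be newer than the CUDA version the driver reports in nvidia-smi. The function names are illustrative, sort -V assumes GNU coreutils (as on Ubuntu), and this is only a coarse check rather than NVIDIA's full compatibility rules:

```shell
#!/bin/sh
# Extract the toolkit release (e.g. 11.8) from `nvcc -V` output on stdin.
nvcc_release() {
    sed -n 's/^.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*$/\1/p'
}

# Extract the "CUDA Version" field from `nvidia-smi` output on stdin.
smi_cuda_version() {
    sed -n 's/^.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Succeed if the driver-supported version ($1) is at least the toolkit version ($2).
toolkit_supported() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | tail -n 1)" = "$1" ]
}

# Usage:
#   toolkit_supported "$(nvidia-smi | smi_cuda_version)" "$(nvcc -V | nvcc_release)" \
#       && echo "driver supports the installed toolkit"
```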

@zizimars

Thank you! After lots of days, it works! Instead of 515, I put 525.

@kazmifactor

kazmifactor commented Feb 7, 2024

I get this error even after trying the fixes given by @filmo and @mkabatek.


inp@inp-Z790-GAMING-X:~$ sudo apt install cuda-11-8
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 cuda-11-8 : Depends: cuda-runtime-11-8 (>= 11.8.0) but it is not going to be installed
             Depends: cuda-demo-suite-11-8 (>= 11.8.86) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

I have even tried installing the 515, 525, and 535 drivers.

I have Ubuntu 20.04 installed.
Can anyone please help? @MihailCosmin @filmo @mkabatek
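
One thing worth checking here, offered as a guess rather than a diagnosis: the repository lines in the gist point at the ubuntu2204 directory, while this system runs Ubuntu 20.04. A minimal sketch that derives the directory name from the release number, assuming NVIDIA's naming follows the pattern seen in the gist's URLs; the function name is illustrative:

```shell
#!/bin/sh
# Build the CUDA repo directory name (e.g. ubuntu2204) from an Ubuntu
# VERSION_ID such as 22.04.
cuda_repo_slug() {
    printf 'ubuntu%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Usage:
#   . /etc/os-release
#   echo "https://developer.download.nvidia.com/compute/cuda/repos/$(cuda_repo_slug "$VERSION_ID")/x86_64/"
```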

@joseagraz

I ran into the same error very early on. Try my recipe at the link below on a fresh Ubuntu install. The recipe was tested on a different PC, but with the same GPU. Good luck!
https://github.com/Kidney-Science/install_RTXA4000_Driver_CUDA_cudNN_Ubuntu_22

@SonOfSkywalker

Thank you so much, this script is marvelous !

@hawkiyc

hawkiyc commented Apr 19, 2024

Gosh, you saved my day. I finally solved my computing env. Thank you very much.
