Install CUDA 10 on Ubuntu 18.04
# WARNING: These steps seem to not work anymore!
#!/bin/bash
# Purge existing CUDA first
sudo apt --purge remove "cublas*" "cuda*"
sudo apt --purge remove "nvidia*"
# Install CUDA Toolkit 10
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub && sudo apt update
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt update
sudo apt install -y cuda
# Install CuDNN 7 and NCCL 2
wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo dpkg -i nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt update
sudo apt install -y libcudnn7 libcudnn7-dev libnccl2 libc-ares-dev
sudo apt autoremove
sudo apt upgrade
# Link libraries to standard locations
sudo mkdir -p /usr/local/cuda-10.0/nccl/lib
sudo ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/nccl/lib/
sudo ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so.7 /usr/local/cuda-10.0/lib64/
echo 'If everything worked fine, reboot now.'
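After the reboot, a quick sanity check that the driver and toolkit are in place might look like this (standard commands only; the version.txt path reflects how the 10.0 deb packages usually lay things out, so treat it as an assumption):
nvidia-smi                                  # driver loaded and GPU visible
/usr/local/cuda-10.0/bin/nvcc --version     # should report CUDA release 10.0
cat /usr/local/cuda-10.0/version.txt        # toolkit version file, if present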
@vbaryshev4 vbaryshev4 commented Jul 1, 2019

Works nicely and easily. Thanks!

@cheremushkin cheremushkin commented Jul 22, 2019

Do I need to install drivers separately?

@SaidaSaad SaidaSaad commented Jul 22, 2019

Hello,
I am very new to CUDA.
I have Ubuntu 18.6 and CUDA 9, and I ran this script yesterday.
When I try to run my code in PyCharm, it says: no NVIDIA GPU device is present: /dev/nvidia0 does not exist

Even the nvidia-smi command is not working, and when I checked the CUDA version it is still 9. Can anyone help me, at least to get my system back?

@nik1806 nik1806 commented Jul 25, 2019

Did you get any error message during installation?

A possible reason for the error is having multiple versions of CUDA; it is always advised to remove all existing CUDA versions before installing a new one.

Try removing all CUDA versions by running the following commands:
sudo apt --purge remove "cublas*" "cuda*"
sudo apt --purge remove "nvidia*"

Then run the script again.

@tuzhucheng tuzhucheng commented Aug 6, 2019

The fix is to run this before installing CUDA: mkdir -p /usr/share/man/man1. Ref: geerlingguy/ansible-role-java#64
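For context, a rough sketch of where that workaround fits (the directory is missing on some minimal or container images, which can make package post-install scripts fail; this is an illustration, not part of the original comment):
sudo mkdir -p /usr/share/man/man1   # create the man-page directory some minimal systems lack
sudo apt update
sudo apt install -y cuda            # then continue with the install as in the gist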

@stevedough123 stevedough123 commented Aug 12, 2019

This did not work for me. How do I undo all of these steps?

@vbaryshev4 vbaryshev4 commented Aug 19, 2019

This did not work for me. How do I undo all of these steps?

sudo apt --fix-broken install

sudo dpkg --configure -a

sudo apt-get clean

dpkg -l | grep cuda- | awk '{print $2}' | xargs -n1 sudo dpkg --purge

df -h

sudo apt-get purge nvidia*

sudo apt-get -f install

sudo apt autoremove

sudo apt-get --purge remove "cublas" "cuda*"

sudo apt-get --purge remove "nvidia"

@SK1417 SK1417 commented Aug 27, 2019

I ran this script and restarted my PC. Now Ubuntu isn't loading the display at all; it stops right after the login screen. What should I do?

@flo-Ty flo-Ty commented Sep 11, 2019

When we run:
sudo ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/nccl/lib/

We get:
ln: target '/usr/local/cuda/nccl/lib/' is not a directory: No such file or directory

@ali1234 ali1234 commented Sep 11, 2019

These steps don't work because Nvidia's repo forces you to install nvidia-driver-418, which does not support Turing cards, which are probably the reason you are trying to upgrade CUDA in the first place.

Edit: to avoid getting the 418 driver, install one of the toolkit packages, e.g. cuda-toolkit-10-1, instead of the cuda package. Then you can keep your existing, working driver version. This avoids the black screen when your card is too new for 418.
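A minimal sketch of that toolkit-only approach (package name taken from the comment above; adjust the suffix to the CUDA version you actually need):
sudo apt update
sudo apt install -y cuda-toolkit-10-1   # installs the toolkit without pulling in nvidia-driver-418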

@nyck33 nyck33 commented Sep 12, 2019

@bogdan-kulynych
Don't we need an NVIDIA driver first so nvidia-smi shows something?
So I ran:
sudo ubuntu-drivers autoinstall before starting these instructions, and now I get:

sudo ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/nccl/lib/
ln: target '/usr/local/cuda/nccl/lib/' is not a directory: No such file or directory
@lfaino lfaino commented Oct 24, 2019

Hi,
If I run the code I get 10.1 installed and not 10.0. Any idea why?

@espoirMur espoirMur commented Oct 25, 2019

Hi,
If I run the code I get 10.1 installed and not 10.0. Any idea why?

Same thing here.

@espoirMur espoirMur commented Oct 25, 2019

@lfaino, you can try the runfile installer at this link.
Rename the file with:
mv cuda_10.0.130_410.48_linux cuda_10.0.130_410.48_linux.run
then install it with:
sudo sh cuda_10.0.130_410.48_linux.run
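If you go the runfile route, it can also be run non-interactively. The flags below are an assumption based on the options of the old-style 10.0 runfile installer, so check sudo sh cuda_10.0.130_410.48_linux.run --help first:
sudo sh cuda_10.0.130_410.48_linux.run --silent --toolkit   # toolkit only, keeps the existing driver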

@John-Almardeny John-Almardeny commented Oct 25, 2019

Hi,
If I run the code I get 10.1 installed and not 10.0. Any idea why?

Same here!

@John-Almardeny John-Almardeny commented Oct 25, 2019

@lfaino, you can try the runfile installer at this link.
Rename the file with:
mv cuda_10.0.130_410.48_linux cuda_10.0.130_410.48_linux.run
then install it with:
sudo sh cuda_10.0.130_410.48_linux.run

It did not work!

@John-Almardeny John-Almardeny commented Oct 25, 2019

This is a working solution specifically for CUDA 10.0 that worked for me: https://gist.github.com/Mahedi-61/2a2f1579d4271717d421065168ce6a73#file-cuda_10-0_installation_on_ubuntu_18-04

@istiaq28 istiaq28 commented Nov 11, 2019

This worked for me but it installs CUDA 10.1 instead of 10.0. Any fix for that?

@daijin12 daijin12 commented Nov 14, 2019

@istiaq28 Just replace
sudo apt install -y cuda
with
sudo apt install -y cuda-10-0

@keukpa keukpa commented Nov 18, 2019

Hi,
If I run the code I get 10.1 installed and not 10.0. Any idea why?

Same thing here.

I was having the same issue: 'apt update' and 'apt upgrade' will replace cuda-10.0 with cuda-10.1. To resolve this I had to run:

sudo apt install cuda=10.0.130-1

Don't run the upgrade command; if you do, then add cuda-10.1 to held (blocked) packages or rerun the above command after removing cuda.
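One way to express "blocked packages" with apt is a hold; roughly (a sketch, with the cuda-10-0 meta-package name assumed from NVIDIA's repo):
sudo apt install -y cuda=10.0.130-1
sudo apt-mark hold cuda cuda-10-0    # stop apt upgrade from pulling in 10.1
sudo apt-mark showhold               # confirm the hold is in place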

@jayninja jayninja commented Dec 18, 2019

Hi,
If I run the code I get 10.1 installed and not 10.0. Any idea why?

Same thing here.

I was having the same issue: 'apt update' and 'apt upgrade' will replace cuda-10.0 with cuda-10.1. To resolve this I had to run:

sudo apt install cuda=10.0.130-1

Don't run the upgrade command; if you do, then add cuda-10.1 to held (blocked) packages or rerun the above command after removing cuda.

and if you want to see available versions:
apt-cache showpkg cuda
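apt-cache policy gives a slightly more readable view of the installed and candidate versions:
apt-cache policy cuda
apt-cache madison cuda   # lists the versions available from each repository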

@YodaEmbedding YodaEmbedding commented Dec 20, 2019

It might be useful to add this as the very first line:

sudo apt list > ~/apt_list_backup.txt

...in case people need to undo some of the changes.
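If you keep that backup, a later comparison could look roughly like this (file names follow the comment's example):
apt list > ~/apt_list_after.txt
diff ~/apt_list_backup.txt ~/apt_list_after.txt   # shows which packages the script added or changed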

@ploncker ploncker commented Mar 9, 2020

Finally, to verify the installation, check

nvidia-smi
nvcc -V

@kkorovesis kkorovesis commented Mar 10, 2020

sudo ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/nccl/lib/

ln: target '/usr/local/cuda/nccl/lib/' is not a directory: No such file or directory

At what point is nvcc installed?

@jurjukn jurjukn commented Apr 18, 2020

@kkorovesis

sudo ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/nccl/lib/

ln: target '/usr/local/cuda/nccl/lib/' is not a directory: No such file or directory

At what point is nvcc installed?

I did "sudo apt-get cuda toolkit" before this script and then it worked out.

@jpison jpison commented Jul 3, 2020

Very useful! Thanks

I needed to downgrade CUDA from 10.2 to 10.0 because PyTorch 1.5.1 does not support Tesla K40 GPUs...

I reinstalled PyTorch 1.2.0 with:
conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch

Important steps before the PyTorch installation:

  1. Use sudo apt install cuda=10.0.130-1 instead of sudo apt install cuda
  2. Don't use sudo apt upgrade
  3. Include in ~/.bashrc this line: export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
  4. Warning: nvidia-smi shows 'CUDA Version 11.0' but v10.0 is really installed and working correctly. This is expected: nvidia-smi reports the highest CUDA version the installed driver supports, not the toolkit version you installed.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.06    Driver Version: 450.36.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K40c          On   | 00000000:03:00.0 Off |                    0 |
| 28%   57C    P0    66W / 235W |   1058MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
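The mismatch in point 4 can be checked directly: nvidia-smi reports the driver's maximum supported CUDA version, while the installed toolkit reports its own version. A quick comparison (standard commands; the version.txt location is assumed from the 10.0 deb layout):
/usr/local/cuda-10.0/bin/nvcc --version   # toolkit actually installed (10.0 here)
cat /usr/local/cuda-10.0/version.txt      # toolkit version file, if present
nvidia-smi                                # 'CUDA Version' here is the driver's ceiling, not the toolkit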
@nikparmar nikparmar commented Jul 5, 2020

(Quoting @jpison's steps above.)

Worked, thanks!

@libinruan libinruan commented Jul 18, 2020

Here is the version that works for me. Credits to @jpison and @bogdan-kulynych.

#!/bin/bash

# Purge existing CUDA first
sudo apt --purge remove "cublas*" "cuda*"
sudo apt --purge remove "nvidia*"

# Install CUDA Toolkit 10
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub && sudo apt update
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb

sudo apt update
sudo apt install -y cuda=10.0.130-1

# Install CuDNN 7 and NCCL 2
wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo dpkg -i nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb

sudo apt update
sudo apt install -y libcudnn7 libcudnn7-dev libnccl2 libc-ares-dev

sudo apt autoremove
# sudo apt upgrade

# Link libraries to standard locations
# sudo mkdir -p /usr/local/cuda-10.0/nccl/lib
# sudo ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/nccl/lib/
# sudo ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so.7 /usr/local/cuda-10.0/lib64/

echo 'export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'If everything worked fine, reboot now.'
  • Since we are using the Deb installation method, we don't need to change the LD_LIBRARY_PATH variable (reference).
  • After running this script, you need to reboot the system.
  • Type nvidia-smi and nvcc --version to verify your installation.
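After rebooting and opening a new shell (or sourcing ~/.bashrc), a quick check that the PATH line took effect might look like this sketch:
source ~/.bashrc    # pick up the new PATH entry in the current shell
which nvcc          # should print /usr/local/cuda-10.0/bin/nvcc
nvcc --version      # should report release 10.0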
@flowtide flowtide commented Aug 9, 2020

I spent the whole day looking for a way to install CUDA 10.0, but every method I found ended up with the message "cuda : Depends: cuda-10-0 (>= 10.0.130) but it is not going to be installed".
I am quite sure the above procedure is the only way to install the CUDA 10.0 version that tensorflow-gpu==1.14 requires on Ubuntu 18.04.5 LTS.
The very important thing is to never install an "nvidia-driver-***" package yourself.
The required NVIDIA drivers are installed as part of sudo apt install -y cuda=10.0.130-1

In addition, for me the following commands were not necessary:

# It seems there is no need for the following:
sudo mkdir -p /usr/local/cuda-10.0/nccl/lib
sudo ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/nccl/lib/
sudo ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so.7 /usr/local/cuda-10.0/lib64/

The output below shows my GPU running, about twice as fast as TensorFlow on CPU.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    On   | 00000000:01:00.0 Off |                  N/A |
| N/A   55C    P0    N/A /  N/A |   3972MiB /  4046MiB |     66%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1133      G   /usr/lib/xorg/Xorg                139MiB |
|    0   N/A  N/A      1388      G   /usr/bin/gnome-shell              120MiB |
|    0   N/A  N/A     17390      C   python                           3706MiB |
+-----------------------------------------------------------------------------
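For anyone wanting to confirm that tensorflow-gpu==1.14 actually picks up the GPU after this install, a minimal check (a sketch, not part of the original comment) is:
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
python -c "import tensorflow as tf; print(tf.test.gpu_device_name())"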
@shankha117 shankha117 commented Oct 18, 2020

(Quoting @flowtide's comment above.)

I have a Tesla K80 (Linux) GPU and I want to use TF 1.14.
On my dev GPU I have installed driver 435.21 and CUDA 10; I can see the PIDs and the memory spike there. However, on a different Kubernetes pod that comes with nvidia-driver 450 preinstalled, I am not able to see any processes when running nvidia-smi (CUDA 10).

I am using the same Docker image to install CUDA, cuDNN, and tensorflow_gpu==1.14.0, which worked with driver version 435.21.

Does anyone have any idea what is going wrong?

@szlaur szlaur commented Jan 8, 2021

This crashed my server.

@VoNhatVinh VoNhatVinh commented Mar 25, 2021

sudo ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/nccl/lib/

ln: target '/usr/local/cuda/nccl/lib/' is not a directory: No such file or directory

At what point is nvcc installed?

You can get nvcc by installing the NVIDIA CUDA toolkit: "sudo apt install nvidia-cuda-toolkit". Now you can check the directory again. Good luck.
