@bogdan-kulynych
Last active December 5, 2023 10:26
Install CUDA 10 on Ubuntu 18.04
#!/bin/bash
# WARNING: These steps no longer seem to work!
# Purge existing CUDA first
sudo apt --purge remove "cublas*" "cuda*"
sudo apt --purge remove "nvidia*"
# Install CUDA Toolkit 10
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub && sudo apt update
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt update
sudo apt install -y cuda
# Install CuDNN 7 and NCCL 2
wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo dpkg -i nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt update
sudo apt install -y libcudnn7 libcudnn7-dev libnccl2 libc-ares-dev
sudo apt autoremove
sudo apt upgrade
# Link libraries to standard locations
sudo mkdir -p /usr/local/cuda-10.0/nccl/lib
sudo ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda-10.0/nccl/lib/
sudo ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so.7 /usr/local/cuda-10.0/lib64/
echo 'If everything worked fine, reboot now.'
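After rebooting, a quick check along the following lines can confirm that the driver and the toolkit are reachable. This is only a sketch and not part of the original script: `report_cmd` is a hypothetical helper, and the `/usr/local/cuda-10.0/bin` location for `nvcc` is an assumption based on the default install prefix used above.

```shell
#!/bin/bash
# Post-reboot sanity check (a sketch, not part of the original script).

# report_cmd NAME: prints where NAME was found on PATH, or that it is missing.
# Returns 0 if found, 1 otherwise. (Hypothetical helper.)
report_cmd() {
    local location
    if location=$(command -v "$1" 2>/dev/null); then
        echo "$1: found at $location"
    else
        echo "$1: missing"
        return 1
    fi
}

# Assumption: the toolkit binaries live under /usr/local/cuda-10.0/bin,
# which is not necessarily on PATH after installation.
export PATH="/usr/local/cuda-10.0/bin:$PATH"

# Driver check: nvidia-smi prints the driver version and visible GPUs.
if report_cmd nvidia-smi; then
    nvidia-smi
fi

# Toolkit check: nvcc prints the CUDA compiler release it belongs to.
if report_cmd nvcc; then
    nvcc --version
fi
```

If either command is reported as missing, the corresponding package was probably not installed, or its directory is not on `PATH`.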
@flowtide
flowtide commented Aug 9, 2020

I spent a whole day looking for a way to install CUDA 10.0, but every method I found ended with the message "cuda : Depends: cuda-10-0 (>= 10.0.130) but it is not going to be installed".
As far as I can tell, the procedure above is the only way to install the CUDA 10.0 version that tensorflow-gpu==1.14 requires on "Ubuntu 18.04.5 LTS".
The most important point: never install an "nvidia-driver-***" package yourself.
The required NVIDIA drivers are installed when you run sudo apt install -y cuda=10.0.130-1

In addition, the following commands were not necessary for me.

# These do not seem to be needed:
sudo mkdir -p /usr/local/cuda-10.0/nccl/lib
sudo ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/nccl/lib/
sudo ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so.7 /usr/local/cuda-10.0/lib64/
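
One way to judge whether those symlinks are needed on a given machine is to ask the dynamic loader whether it already knows about the libraries. The sketch below assumes that the consuming program (for example TensorFlow) finds cuDNN and NCCL through the normal loader cache; `lib_in_cache` is a hypothetical helper that reads `ldconfig -p` output on stdin.

```shell
#!/bin/bash
# Check whether cuDNN and NCCL are already visible to the dynamic loader.

# lib_in_cache SONAME: succeeds if SONAME appears in the `ldconfig -p`
# listing supplied on stdin. (Hypothetical helper.)
lib_in_cache() {
    grep -qF -- "$1"
}

for lib in libcudnn.so.7 libnccl.so.2; do
    if ldconfig -p 2>/dev/null | lib_in_cache "$lib"; then
        echo "$lib: already visible to the dynamic loader"
    else
        echo "$lib: not in the loader cache"
    fi
done
```

If both libraries are reported as visible, the extra links under /usr/local/cuda-10.0 are probably redundant for programs that rely on the loader cache.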

Here is my nvidia-smi output; with this driver, TensorFlow runs about twice as fast on the GPU as it does on the CPU.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    On   | 00000000:01:00.0 Off |                  N/A |
| N/A   55C    P0    N/A /  N/A |   3972MiB /  4046MiB |     66%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1133      G   /usr/lib/xorg/Xorg                139MiB |
|    0   N/A  N/A      1388      G   /usr/bin/gnome-shell              120MiB |
|    0   N/A  N/A     17390      C   python                           3706MiB |
+-----------------------------------------------------------------------------+

@shankha117
I have a Tesla K80 GPU (Linux) and I want to use TF 1.14.
On my dev GPU machine I installed driver 435.21 and CUDA 10, and there I can see the PIDs and the memory spike in nvidia-smi. However, on a different Kubernetes pod that comes with nvidia-driver 450 preinstalled, I cannot see any processes when running nvidia-smi (CUDA 10).

I am using the same Docker image to install CUDA, cuDNN, and tensorflow_gpu==1.14.0, which worked with driver version 435.21.

Does anyone have any idea what is going wrong?

@szlaur
szlaur commented Jan 8, 2021

This crashed my server.

@VoNhatVinh
sudo ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/nccl/lib/

ln: target '/usr/local/cuda/nccl/lib/' is not a directory: No such file or directory

At what point is nvcc installed?

You can get nvcc by installing the NVIDIA CUDA toolkit: "sudo apt install nvidia-cuda-toolkit". Then check the directory again. Good luck.
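
The "is not a directory" error from ln means the target directory does not exist at the moment the link is created. A minimal sketch of one way around it is to create the directory first and then link into it. `link_into_dir` is a hypothetical helper, and the demo runs in a temporary directory so it needs no sudo; for the real paths under /usr/local the same two commands would need sudo.

```shell
#!/bin/bash
# Create the target directory before linking, so ln cannot fail with
# "target ... is not a directory".

# link_into_dir SOURCE DIR: creates DIR if needed, then places a symlink
# named after SOURCE inside DIR, replacing an existing link of that name.
# (Hypothetical helper.)
link_into_dir() {
    local src=$1 dir=$2
    mkdir -p "$dir" && ln -sf "$src" "$dir/$(basename "$src")"
}

# Demo in a throwaway directory; the link may dangle if NCCL is not installed.
demo=$(mktemp -d)
link_into_dir /usr/lib/x86_64-linux-gnu/libnccl.so.2 "$demo/nccl/lib"
ls -l "$demo/nccl/lib"
rm -rf "$demo"
```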

@tazz4843
While running sudo apt install -y cuda=10.0.130-1 I ran into an error: cuda : Depends: cuda-10-0 (>= 10.0.130) but it is not going to be installed. Running sudo dpkg --force-all -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb fixed it for me. Note, though, that this will overwrite any existing versions.