@bzamecnik
Last active January 9, 2020 15:35
Installing CUDA 8.0 on a PC with GTX 980 Ti and Ubuntu 16.04

State as of 2017-07-31.

Recently I installed CUDA on an Azure NC6 VM with a Tesla K80. Later the same day I also upgraded my personal machine with a GTX 980 Ti from Ubuntu 15.10 to 16.04 and completely reinstalled CUDA.

I had NVIDIA driver 361, CUDA 7.5 and cuDNN 4 or 5, and wanted CUDA 8.0 for the new TensorFlow 1.2.1, so I also had to upgrade Ubuntu.

Preparing the plan

  • remove old NVIDIA drivers, CUDA toolkit, cuDNN and related packages
  • install NVIDIA drivers, CUDA toolkit, cuDNN
  • verify it works OK

Target versions:

  • NVIDIA driver: 375.82 (long-lived stable)
    • latest: 384.59 (probably fine, but I chose the stable version to be safe)
  • CUDA toolkit: 8.0
  • cuDNN: 5.1
    • latest: 6.0 (not supported by latest TensorFlow 1.2.1, only by 1.3.0-rc1)

Secure boot

One thing changes with the upgrade from Ubuntu 15.10: Ubuntu 16.04 enforces the Secure Boot mechanism (originally used to protect Windows from viruses), which only lets the kernel load digitally signed drivers. Unfortunately this doesn't work well with third-party binary drivers such as NVIDIA's. The option that worked for me was to disable Secure Boot entirely; otherwise Ubuntu is not able to load the drivers. The complication is that this cannot be done over SSH, only at boot time in the UEFI BIOS settings. Since my computer is on the other side of the world, I had to call somebody to modify the BIOS settings for me.

Example procedure for my motherboard ASUS Z170-E (video):

  • restart and press F8 during boot to enter the BIOS
  • Boot -> Secure Boot (down) -> Key Management -> Clear Secure Boot Keys
  • Boot -> Secure Boot -> Secure Boot state - should be disabled
  • Exit -> Save changes & Reset
  • restart

I recommend doing this before upgrading Ubuntu.
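
The Secure Boot state can also be checked from the running system, which helps when the machine is remote. This is a minimal sketch, assuming the `mokutil` tool is installed (it is not part of the procedure above); the helper name `sb_state_from_text` is made up for illustration.

```shell
# Classify the text printed by `mokutil --sb-state` (passed as $1).
sb_state_from_text() {
    case "$1" in
        *"SecureBoot enabled"*)  echo enabled ;;
        *"SecureBoot disabled"*) echo disabled ;;
        *)                       echo unknown ;;
    esac
}

# Assumption: mokutil is available; otherwise fall back to the UEFI settings.
if command -v mokutil >/dev/null 2>&1; then
    sb_state_from_text "$(mokutil --sb-state 2>&1)"
else
    echo "mokutil not found; check the UEFI settings instead" >&2
fi
```

If this prints `enabled`, the NVIDIA kernel module will most likely not load until Secure Boot is switched off in the firmware.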

What GPU do we have?

PC with ASUS Z170-E motherboard and GTX 980 Ti GPU.

$ lspci | grep -i NVIDIA
01:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX 980 Ti] (rev a1)

Upgrading Ubuntu to 16.04

It was quite easy:

sudo do-release-upgrade

Then resolve a few conflicts in files under /etc. It would have been easier if I had been able to disable Secure Boot first.

It took about an hour.
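
To confirm that the upgrade really landed on 16.04, the release can be read from `/etc/os-release`. This is a small sketch; the helper name `release_version` is hypothetical and the file path is assumed to be the standard one.

```shell
# Print VERSION_ID from an os-release style file (default: /etc/os-release).
release_version() {
    f=${1:-/etc/os-release}
    [ -r "$f" ] || return 1
    # source the file in a subshell so its variables do not leak
    ( . "$f" && printf '%s\n' "$VERSION_ID" )
}

if [ "$(release_version)" = "16.04" ]; then
    echo "running Ubuntu 16.04"
else
    echo "not on 16.04: $(release_version)"
fi
```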

Removing old drivers

If the NVIDIA driver was installed via a deb package:

sudo apt-get remove --purge 'nvidia*' 'cuda*' 'libcuda1*' 'libcudnn5*' libxnvctrl0

If the NVIDIA driver was installed via the *.run file:

sudo nvidia-uninstall
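
Before purging (or afterwards, to confirm that nothing is left), it can help to list the installed packages that match the same name patterns. This is a sketch only; the function name `list_nvidia_packages` is made up, and it assumes the usual `dpkg -l` column layout where `ii` marks an installed package.

```shell
# Read `dpkg -l` style output on stdin and print the names of installed
# packages whose names start with one of the purged prefixes.
list_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /^(nvidia|cuda|libcuda1|libcudnn5|libxnvctrl0)/ { print $2 }'
}

if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | list_nvidia_packages
fi
```

An empty result after the purge suggests that the old driver stack is gone.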

Installing NVIDIA driver

There are two options: the Debian package from the PPA or the official NVIDIA installer.

Although NVIDIA recommends only the official installer, it is a hassle to work with and it didn't work well for me. The Debian package (not recommended by NVIDIA) worked like a charm and without tons of questions. So my recommendation is to use the Debian package.

Debian package from PPA

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-375

NVIDIA installer

Although I don't recommend it, here are the steps for the NVIDIA installer:

NV_VERSION=375.82
wget http://us.download.nvidia.com/XFree86/Linux-x86_64/${NV_VERSION}/NVIDIA-Linux-x86_64-${NV_VERSION}.run
chmod +x NVIDIA-Linux-x86_64-${NV_VERSION}.run

If an X11 server (GUI) is running, you need to stop it during the installation.

## ERROR: You appear to be running an X server; please exit X before installing.
sudo service lightdm stop # display manager
sudo systemctl stop vncserver@1 # also stop VNC server if present
sudo init 3 # run level without GUI
sudo ./NVIDIA-Linux-x86_64-${NV_VERSION}.run
sudo reboot

CUDA toolkit

https://developer.nvidia.com/cuda-downloads

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
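
After the toolkit is installed, the release can be read back from `nvcc`. This is a minimal sketch, assuming the toolkit is reachable under `/usr/local/cuda` (the same prefix used for the library path later on); the helper name `cuda_release` is hypothetical.

```shell
# Extract the release number from `nvcc --version` output given on stdin.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

NVCC=/usr/local/cuda/bin/nvcc   # assumed default location of the toolkit
if [ -x "$NVCC" ]; then
    "$NVCC" --version | cuda_release
else
    echo "nvcc not found at $NVCC" >&2
fi
```

For this setup the printed release should be 8.0.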

cuDNN

TensorFlow 1.2.1 needs cuDNN 5.1 (not 6.0).

It has to be downloaded with a registered NVIDIA account: https://developer.nvidia.com/rdp/cudnn-download

The package can be downloaded in a browser and then copied to the target machine via SCP:

https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v5.1/prod_20161129/8.0/libcudnn5_5.1.10-1+cuda8.0_amd64-deb

sudo dpkg -i libcudnn5_5.1.10-1+cuda8.0_amd64-deb
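
One way to check that the cuDNN library is visible to the dynamic linker is to look for it in the `ldconfig` cache. This is a sketch under the assumption that the package registers `libcudnn.so.5` there; the helper name `has_cudnn5` is made up.

```shell
# Succeed if `ldconfig -p` style output on stdin lists cuDNN 5.
has_cudnn5() {
    grep -q 'libcudnn\.so\.5'
}

# ldconfig may live in /sbin, which is not always on PATH for normal users.
if command -v ldconfig >/dev/null 2>&1; then
    if ldconfig -p | has_cudnn5; then
        echo "cuDNN 5 found"
    else
        echo "cuDNN 5 is not registered with the dynamic linker"
    fi
fi
```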

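Add the export line to ~/.profile, then reload the profile in the current shell:

# add this line to ~/.profile
export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH
# then reload the profile in the current shell
. ~/.profile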
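
When the setup is scripted or repeated, it is easy to end up with the same export line several times in the profile. This is a minimal sketch of an idempotent append; the helper `append_once` and the variable `CUDA_LINE` are hypothetical names.

```shell
# Append a line to a file unless the exact line is already there.
append_once() {
    file=$1
    line=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Single quotes keep $LD_LIBRARY_PATH literal, so it is expanded at login time.
CUDA_LINE='export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH'
```

It would be called as `append_once ~/.profile "$CUDA_LINE"`; running it twice leaves a single copy of the line.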

Everything can be installed together (assuming both .deb files have already been downloaded)

sudo apt-get remove --purge 'nvidia*' 'cuda*' 'libcuda1*' 'libcudnn5*' libxnvctrl0
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo dpkg -i libcudnn5_5.1.10-1+cuda8.0_amd64-deb
sudo apt-get update
sudo apt-get install nvidia-375 cuda
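
The combined sequence fails halfway if one of the downloaded packages is missing, so a small preflight check can be run first. This is a sketch only; `require_files` is a made-up helper, and the file names are the ones used in the commands above.

```shell
# Report which of the given files are missing; return non-zero if any is.
require_files() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

require_files cuda-repo-ubuntu1604_8.0.61-1_amd64.deb \
              'libcudnn5_5.1.10-1+cuda8.0_amd64-deb' \
    && echo "ready to install"
```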

Reboot

sudo reboot

Test that it's working

We should see the GPU information:

nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.82                 Driver Version: 375.82                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980 Ti  Off  | 0000:01:00.0      On |                  N/A |
| 22%   39C    P8    19W / 250W |     92MiB /  6075MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1096    G   /usr/lib/xorg/Xorg                              53MiB |
|    0      2388    G   /usr/bin/gnome-shell                            37MiB |
+-----------------------------------------------------------------------------+
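
For a scripted check, the driver version can be queried in machine-readable form and compared with the target. This is a minimal sketch; `EXPECTED_DRIVER` and `check_driver` are hypothetical names, and the expected value is the target version from the plan.

```shell
EXPECTED_DRIVER="375.82"   # target driver version from the plan

# Compare a reported driver version ($1) with the expected one.
check_driver() {
    if [ "$1" = "$EXPECTED_DRIVER" ]; then
        echo "driver OK: $1"
    else
        echo "driver mismatch: got '$1', expected $EXPECTED_DRIVER"
        return 1
    fi
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # one line per GPU; the first one is enough on a single-GPU machine
    check_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" \
        || echo "check the driver installation"
fi
```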

Let's run a simple "hello world" MNIST MLP in Keras/TensorFlow:

pip install tensorflow-gpu==1.2.1 keras==2.0.6
wget https://raw.githubusercontent.com/fchollet/keras/master/examples/mnist_mlp.py
python mnist_mlp.py

We should see that it uses the GPU and trains properly:

Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
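
The same check can be automated by saving the training output and searching it for the device line. This is a sketch, assuming the output was captured to a file named `train.log` (a made-up name); `log_uses_gpu` is a hypothetical helper.

```shell
# Succeed if the log file ($1) shows TensorFlow creating a GPU device.
log_uses_gpu() {
    grep -q 'Creating TensorFlow device (/gpu:[0-9]*)' "$1"
}

# Assumed usage: capture the training output first, then inspect it.
#   python mnist_mlp.py 2>&1 | tee train.log
if [ -f train.log ]; then
    if log_uses_gpu train.log; then
        echo "GPU in use"
    else
        echo "no GPU device in the log"
    fi
fi
```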

That's it. Happy training!
