This is a collection of notes that explains how to configure linux for hardware-accelerated deep learning in R. The goal is to have linux drive the main display from an Intel integrated GPU, while reserving an Nvidia GPU strictly for computation using CUDA. It skips some of the more straightforward steps while detailing any "gotchas" I ran into.
- ElementaryOS Loki 0.4.1 (an Ubuntu 16.04 LTS derivative; so these notes apply to Ubuntu as well)
- Latest Nvidia drivers
- CUDA Toolkit
- cuDNN
- R/Rstudio
- R Keras (which includes tensorflow)
Before starting this whole process, check which CUDA versions are compatible with tensorflow and vice versa. Similarly, check which versions of cuDNN are compatible with the CUDA version that you're going to install. Here are all the versions that played nicely together for me:
- Nvidia Drivers 390.25
- CUDA 9.0
- cuDNN v7.0.5 (Dec 5, 2017), for CUDA 9.0
- tensorflow-gpu 1.6.0

Note that the latest version of CUDA (9.1) is NOT compatible with the latest version of tensorflow-gpu (1.6.0).
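Once everything below is installed, a quick sanity check like the following can confirm that the versions actually match. This is just a sketch; each command falls back to a message if the component isn't installed yet.

```shell
# Report each component's version; fall back to a message if it is missing.
nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null \
  || echo "nvidia driver not found"
nvcc --version 2>/dev/null | grep release || echo "CUDA toolkit not found"
python -c "import tensorflow as tf; print(tf.__version__)" 2>/dev/null \
  || echo "tensorflow not found"
```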
- Before installation, configure your BIOS to use the integrated GPU as the default display driver
- Connect display to integrated GPU HDMI (typically the motherboard HDMI, not external GPU HDMI)
- Install ElementaryOS
- Optional: Install Intel microcode drivers via apt-get (do not install any Nvidia drivers!)
We need the latest proprietary Nvidia drivers to use the CUDA libraries. The CUDA installer will offer to install drivers as its first step, but the bundled drivers are out of date and failed to install for me. We'll install the latest drivers separately first, after a few setup steps.
Download the latest Nvidia drivers.
By default, when linux sees an Nvidia GPU, it will load nouveau, an open source Nvidia driver maintained by the linux community. Nouveau will conflict with the proprietary Nvidia drivers and we must block linux from loading nouveau. Create the following file:
/etc/modprobe.d/blacklist-nouveau.conf
# inside blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
Load the new blacklist and reboot:
sudo update-initramfs -u
sudo reboot
Launch into text-only mode by pressing Ctrl-Alt-F1 at the login screen. The Nvidia driver installer will complain if we attempt to install the driver while the desktop environment is active. Disable your desktop environment:
sudo service lightdm stop
If you manually install drivers and the linux kernel gets updated, those drivers will need to be recompiled. We can tell linux to recompile the drivers automatically with dkms. This is generally a good idea and will keep the drivers from breaking when minor updates are made to the kernel. Install dkms:
sudo apt-get install dkms
Lastly, install the driver:
cd ~/Downloads/
chmod +x NVIDIA-Linux-x86_64-390.25.run
sudo ./NVIDIA-Linux-x86_64-390.25.run --dkms --no-opengl-files
The --no-opengl-files flag skips installation of Nvidia's OpenGL libraries, so X-server will never run on the Nvidia GPU. The --dkms flag enables automatic driver recompilation as described above. The installer will ask several questions. Ignore any warnings about an unsupported system (the installer won't recognize ElementaryOS, for example). Ignore the question about 32-bit compatibility libraries (everything is 64-bit these days). Say NO when the installer asks to run nvidia-xconfig. We do not want the installer attempting to configure X-server to run on the GPU.
If the installation was successful, reboot, open a terminal, and run nvidia-smi. You should see something like this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.25 Driver Version: 390.25 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 105... Off | 00000000:01:00.0 Off | N/A |
| 30% 27C P0 N/A / 75W | 0MiB / 4040MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
If you see No running processes found, you've set things up correctly. Again, we don't want to see X-server running (or consuming memory) here.
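If you want to check this from a script rather than by eyeballing the table, nvidia-smi's query flags can list compute processes directly. A minimal sketch (the `--query-compute-apps` flags are standard nvidia-smi options):

```shell
# Query the list of compute processes directly; empty output means the GPU
# is idle (no X-server, no stray compute jobs).
apps=$(nvidia-smi --query-compute-apps=pid --format=csv,noheader 2>/dev/null)
if [ -z "$apps" ]; then
  echo "GPU is idle"
else
  echo "GPU in use by PIDs: $apps"
fi
```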
The CUDA toolkit is really just a collection of libraries. There's no reason to install these via apt-get or a deb package. Just download the runfile from Nvidia's website.
Execute the CUDA toolkit runfile installer. When the installer asks you to install drivers say NO, otherwise you'll overwrite the drivers we just installed above.
cd ~/Downloads/
chmod +x cuda_9.0.176_384.81_linux.run
sudo ./cuda_9.0.176_384.81_linux.run
On Ubuntu, library paths must be specified at the system level using ld.so.conf.d. Attempting to set these paths in .profile or .bashrc will work for the command line, but will fail in IDEs like Rstudio. Create the following file and add two lines specifying the library paths:
/etc/ld.so.conf.d/cuda.conf
/usr/local/cuda/lib64
/usr/local/cuda/extras/CUPTI/lib64
Then re-link the libraries:
sudo ldconfig
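You can verify that the loader picked up the new paths by searching its cache. If the grep below prints nothing, the paths in cuda.conf are wrong:

```shell
# The loader cache should now list the CUDA runtime and CUPTI libraries;
# empty output means the paths in cuda.conf are wrong.
ldconfig -p 2>/dev/null | grep -E 'libcudart|libcupti' \
  || echo "CUDA libraries not in the loader cache"
```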
Now update your PATH and add some CUDA-specific environment variables to .profile. These definitions must go in .profile (not .bashrc) for an R session in Rstudio to load them correctly.
~/.profile
export PATH="/usr/local/cuda/bin:$PATH"
export CUDA_HOME=/usr/local/cuda
export CUDA_VISIBLE_DEVICES=0 # Only use GPU 0, this will differ for multi-GPU setups
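Because .profile is only read by login shells, a new terminal tab won't necessarily pick these up. One way to test without logging out is to start a login shell explicitly:

```shell
# .profile is read by login shells only, so test with `bash -l` rather than
# a plain subshell. After logging out and back in, both should resolve:
bash -lc 'echo "CUDA_HOME=$CUDA_HOME"; command -v nvcc || echo "nvcc not on PATH"'
```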
Run the following to compile a simple CUDA utility:
cd ~/NVIDIA_CUDA-9.0_Samples/1_Utilities/bandwidthTest/
make
./bandwidthTest
If you see "PASS", you've successfully configured and installed CUDA.
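The deviceQuery sample is also worth running: it prints the GPU's compute capability and confirms the runtime can talk to the driver. The path below assumes the default samples location chosen by the CUDA installer:

```shell
# deviceQuery reports the GPU's compute capability; "Result = PASS" at the
# end means the CUDA runtime can talk to the driver.
cd ~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery/ 2>/dev/null \
  && make && ./deviceQuery \
  || echo "CUDA samples not found"
```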
cuDNN is an additional set of CUDA libraries specifically for deep neural networks. These libraries are a dependency for tensorflow, so we'll have to install them next.
tar xzvf cudnn-9.0-linux-x64-v7.tgz
sudo cp -P cuda/include/cudnn.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
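To confirm the copied header reports the cuDNN release you expect, check the version macros defined near the top of cudnn.h (v7.0.5 would show MAJOR 7, MINOR 0, PATCHLEVEL 5):

```shell
# The CUDNN_MAJOR/MINOR/PATCHLEVEL macros in cudnn.h record the installed
# cuDNN version.
grep -A 2 'define CUDNN_MAJOR' /usr/local/cuda/include/cudnn.h 2>/dev/null \
  || echo "cudnn.h not found"
```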
The version of R in the main Ubuntu repositories is ancient. Add the CRAN repository and its signing key, then install the latest version of R:
sudo add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu xenial/"
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
sudo apt-get update
sudo apt-get install r-base
Download the latest Rstudio *.deb package and install it:
sudo apt install gdebi-core
sudo gdebi rstudio-xenial-1.1.423-amd64.deb
By default, Keras will attempt to install tensorflow in a python virtual environment. Make sure that you have pip and virtualenv installed:
sudo apt-get install python-pip python-dev python-virtualenv
Next, open up Rstudio, and in the R console run the following:
install.packages("keras")
library(keras)
install_keras(tensorflow = "gpu")
Assuming you've configured your paths correctly, you should now have Keras with the GPU-accelerated tensorflow backend installed.
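As a final check, you can ask tensorflow whether it can see the GPU. `tf$test$is_gpu_available()` mirrors the `tf.test.is_gpu_available()` call in tensorflow's 1.x Python API; running it from a terminal (rather than Rstudio) surfaces any CUDA library loading errors in full:

```shell
# Prints TRUE when the tensorflow backend can see the GPU; run from a
# terminal so CUDA loading errors are printed rather than swallowed.
Rscript -e 'library(tensorflow); print(tf$test$is_gpu_available())' 2>/dev/null \
  || echo "R tensorflow not installed or GPU not visible"
```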
Problems I encountered and some workarounds.
If your R session in Rstudio unexpectedly crashes when running keras/tensorflow, open up a terminal and run the same commands in R from the terminal. R tensorflow should give you an informative error message when it crashes. In my case, I had installed a version of the cuDNN libraries not compatible with the version of tensorflow.
I initially tried to install the (somewhat older) drivers bundled with the CUDA installer. This failed no matter what I tried. Instead, I installed the latest drivers from Nvidia separately. This worked fine. I still have no idea why.
These blog posts were really helpful: