htcai/Fedora-27-tf-gpu.md

## Fedora-27-tf-gpu.md

      
            
          
Acknowledgement

This post owes a great deal to the guidance of mdda and wangruohui.
Notice

Desktop environment sensitivity

The strategy below targets GNOME 3, the default desktop environment of Fedora 27, and may not work with others such as KDE Plasma. KDE, for instance, tends to crash after the NVIDIA drivers are installed; my own recent experience with Fedora KDE Plasma confirms this.
Optimus? NO need for Bumblebee!

You may have heard from various sources that machines with Optimus should use Bumblebee. This is very likely unnecessary now. You only need to confirm that your machine contains an NVIDIA GPU, for instance:
$ sudo lspci | grep -i NVIDIA
01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce GTX 950M] (rev a2)
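
The same check can be scripted. Below is a minimal Python sketch, not part of the original setup: the helper name `find_nvidia_devices` is made up for illustration, the Intel line in the sample is an invented example of an integrated GPU entry, and the function only filters text in the format that `lspci` printed above.

```python
def find_nvidia_devices(lspci_output):
    """Return the lspci lines that mention NVIDIA (case-insensitive)."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if "nvidia" in line.lower()
    ]


# Sample text in the lspci format shown above. The first line is a
# hypothetical integrated GPU entry, added only to show the filtering.
# On a real machine the input could come from subprocess.run(["lspci"], ...).
sample = """\
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce GTX 950M] (rev a2)
"""
devices = find_nvidia_devices(sample)
print(devices if devices else "No NVIDIA GPU found")
```

An empty result would mean there is no NVIDIA device to install a driver for, so there is no point in continuing.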

Main Strategy

The NVIDIA driver will be installed from negativo17, whereas CUDA and cuDNN will be downloaded directly from NVIDIA's official website.
Motivation

Why not NVIDIA all the way?

Although wangruohui has provided detailed descriptions of how to install the NVIDIA driver via a runfile, the runfile may still refuse to run because an X server is running on your machine. Despite the many discussions on sites such as Stack Overflow, finding a way to shut down every X server can still be a huge pain.
In contrast, negativo17 provides a smooth experience in both installation and usage.
Why not negativo17 all the way?

mdda installs the NVIDIA driver, CUDA and cuDNN all via negativo17. However, the versioning of CUDA and cuDNN has been troublesome since the release of CUDA 9.1. Unless you are willing to compile tensorflow-gpu from source, TF 1.6 only supports CUDA 9.0, and CUDA 9.0 is no longer available from negativo17. Note that the following check of cuda-devel only works after the negativo17 repo has been added.
$ dnf search --showduplicates cuda-devel
=============================== Name Exactly Matched: cuda-devel ===============================
cuda-devel-1:9.1.85-3.fc27.x86_64 : Development files for cuda
cuda-devel-1:9.1.85-4.fc27.x86_64 : Development files for cuda
cuda-devel-1:9.1.85.3-5.fc27.x86_64 : Development files for cuda

Such a version mismatch may well occur again in the future. It is therefore a better option to use the CUDA runfile downloaded from NVIDIA's website, where a large variety of versions is available.
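
To make the mismatch concrete, here is a rough Python sketch that reads a `dnf search` listing like the one above and reports whether a given CUDA release is offered. The function names and the regular expression are my own assumptions about the output format (name, optional epoch, version, release), not anything dnf provides.

```python
import re

# Matches lines such as "cuda-devel-1:9.1.85-3.fc27.x86_64 : ..." and
# captures the upstream version (here "9.1.85"), skipping the epoch "1:".
_PKG_RE = re.compile(r"^cuda-devel-(?:\d+:)?(\d+(?:\.\d+)*)-")


def cuda_devel_versions(dnf_output):
    """Extract the cuda-devel versions listed in `dnf search` output."""
    versions = []
    for line in dnf_output.splitlines():
        match = _PKG_RE.match(line.strip())
        if match:
            versions.append(match.group(1))
    return versions


def repo_has_cuda(dnf_output, wanted="9.0"):
    """True if any listed version equals `wanted` or starts with `wanted.`."""
    return any(
        v == wanted or v.startswith(wanted + ".")
        for v in cuda_devel_versions(dnf_output)
    )


# The three package lines from the listing above.
listing = """\
cuda-devel-1:9.1.85-3.fc27.x86_64 : Development files for cuda
cuda-devel-1:9.1.85-4.fc27.x86_64 : Development files for cuda
cuda-devel-1:9.1.85.3-5.fc27.x86_64 : Development files for cuda
"""
print(cuda_devel_versions(listing))   # ['9.1.85', '9.1.85', '9.1.85.3']
print(repo_has_cuda(listing, "9.0"))  # False
```

With only 9.1 builds in the repo, the check for 9.0 fails, which is exactly why the runfile route is used for CUDA here.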
Step-by-step installation

Add negativo17 repo

$ sudo dnf config-manager --add-repo=https://negativo17.org/repos/fedora-nvidia.repo

Install NVIDIA driver

Credits go to mdda.
$ sudo dnf install kernel-devel dkms-nvidia nvidia-driver-cuda
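
Before moving on, it may be worth confirming that the driver is actually usable. The sketch below assumes that `nvidia-smi` ends up on the PATH after the install above; the helper name `driver_version` is hypothetical. It returns `None` rather than raising when the tool is missing or fails.

```python
import shutil
import subprocess


def driver_version(executable="nvidia-smi"):
    """Return the driver version reported by nvidia-smi, or None."""
    if shutil.which(executable) is None:
        return None  # tool not on PATH: driver utilities are not installed
    result = subprocess.run(
        [executable, "--query-gpu=driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return None  # e.g. the kernel module is not loaded yet
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


print(driver_version() or "NVIDIA driver not detected")
```

If this prints "not detected" right after installation, one possible cause is that the freshly built kernel module has not been loaded yet; a reboot is the usual thing to try.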

Install CUDA

As wangruohui pointed out, installing CUDA is quite smooth, as it just involves copying files to target directories. Take CUDA 9.0 as an example. Download the CUDA runfile to the home directory (ignore the fact that the CUDA 9.0 download page only appears to support Fedora 25; just pick the only Fedora version offered), make it executable with chmod +x if necessary, and extract its contents:
$ ./cuda_9.0.176_384.81_linux.run --extract=$HOME

Three extracted files will be added to the home directory. The only one that is needed is cuda-linux.9.0.176-22781540.run.
$ sudo ./cuda-linux.9.0.176-22781540.run
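
A quick sanity check after the installer finishes could look like the following. This is only a sketch: it assumes the default prefix `/usr/local/cuda-9.0` (the same one used for the library path later on) and checks for two files I would expect there, `bin/nvcc` and `lib64/libcudart.so.9.0`. The helper name `missing_cuda_files` is made up.

```python
import os


def missing_cuda_files(prefix="/usr/local/cuda-9.0",
                       expected=("bin/nvcc", "lib64/libcudart.so.9.0")):
    """Return the expected files that are absent under `prefix`."""
    return [
        rel for rel in expected
        if not os.path.exists(os.path.join(prefix, rel))
    ]


missing = missing_cuda_files()
print("CUDA 9.0 looks complete" if not missing else "Missing: %s" % missing)
```

If you chose a different install location when the runfile asked, pass it as `prefix`.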

Install (or copy?) cuDNN

TF 1.6 requires cuDNN 7.0, which can be downloaded from NVIDIA's website.
After extracting the tgz file, copy or move the files in the two sub-folders include and lib64 to the corresponding system directories. On Fedora 27, these are /usr/include and /usr/lib64.
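
The copy step can be sketched in Python as below. Treat it as an illustration under assumptions: the function name `copy_cudnn` is invented, the archive is assumed to unpack into a folder with `include` and `lib64` sub-folders as described above, and writing to the system directories requires root.

```python
import glob
import os
import shutil


def copy_cudnn(extracted_root, include_dst="/usr/include", lib_dst="/usr/lib64"):
    """Copy cuDNN headers and libraries; return the destination paths."""
    copied = []
    for subdir, dst_dir in (("include", include_dst), ("lib64", lib_dst)):
        for src in sorted(glob.glob(os.path.join(extracted_root, subdir, "*"))):
            dst = os.path.join(dst_dir, os.path.basename(src))
            if os.path.lexists(dst):
                os.remove(dst)  # replace an older copy or a stale symlink
            # follow_symlinks=False recreates symlinks between library
            # names as symlinks instead of duplicating the library file.
            shutil.copy2(src, dst, follow_symlinks=False)
            copied.append(dst)
    return copied


# Example (run as root; "~/cuda" is an assumed extraction folder):
# copy_cudnn(os.path.expanduser("~/cuda"))
```

Passing other destination directories makes it easy to try the function on a scratch folder first.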
Set the library path

If tensorflow-gpu is installed without further configuration, importing it raises an ImportError:
ImportError: libcudart.so.9.0: cannot open shared object file: No such file or directory

The fix is straightforward: add the following line to .bashrc (credit goes to abrahamrhoffman).
export LD_LIBRARY_PATH="/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
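
The `${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}` part of that line is easy to misread, so here is a small Python sketch of what it does. The function name `prepend_library_path` is mine, used purely for illustration.

```python
import os


def prepend_library_path(new_dir, current):
    """Mimic "new_dir${current:+:${current}}" from the export line."""
    # ${VAR:+:...} expands to ":..." only when VAR is set and non-empty,
    # which avoids a trailing ":" (an empty entry means the current directory).
    if current:
        return new_dir + ":" + current
    return new_dir


cuda_lib = "/usr/local/cuda-9.0/lib64"
print(prepend_library_path(cuda_lib, os.environ.get("LD_LIBRARY_PATH")))
```

Note that this only computes the value. The dynamic loader reads LD_LIBRARY_PATH when a process starts, so the variable has to be set in the shell (hence .bashrc) before Python is launched.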

Install tensorflow-gpu

Just use Anaconda and pip. Assume the environment is named py35 and uses Python 3.5.
(py35) $ pip install --ignore-installed --upgrade \
https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.6.0-cp35-cp35m-linux_x86_64.whl
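
To confirm that TensorFlow really sees the GPU, something along these lines should do. It is a sketch for TF 1.x that relies on `device_lib.list_local_devices()`; the wrapper name `list_gpu_devices` is hypothetical, and it returns `None` if TensorFlow cannot be imported (for example because of the libcudart error shown earlier).

```python
def list_gpu_devices():
    """Return names of GPUs TensorFlow can see, or None if TF is unusable."""
    try:
        # device_lib is the TF 1.x way to enumerate local devices.
        from tensorflow.python.client import device_lib
    except ImportError as err:
        print("TensorFlow import failed:", err)
        return None
    return [
        d.name for d in device_lib.list_local_devices()
        if d.device_type == "GPU"
    ]


print(list_gpu_devices())
```

An empty list means TensorFlow imported fine but fell back to the CPU, which usually points at the driver, CUDA or cuDNN steps above.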

Enjoy.