@awwong1
Forked from statsmaths/gpu_setup.md
Created April 2, 2019 00:41
Setting up Nvidia Titan X on Ubuntu 17.10

Deep Learning Set-up

Here, I am documenting the steps I took to get my GPU set up. I do not expect these to be interesting to anyone else, but am making them public in case they may help anyone. Note that I did not start with a cold system, so these notes are likely incomplete. Summary of the fully installed system:

  • OS: Ubuntu 17.10
  • Card: TITAN Xp (memory: 12 GB)
  • Driver: nvidia-387/artful (387.34-0ubuntu0~gpu17.10.2)
  • CUDA Version: 9.1 (9.1.85-1_amd64)
  • cuDNN Version: 7.1
  • TensorFlow Version: 1.4.1 (from source)
  • Bazel Version: 0.9.0
  • GCC Version: gcc-6 (Ubuntu 6.4.0-8ubuntu1) 6.4.0 20171010
  • Python Version: 3.6.3
  • Keras: 2.1.5
  • OpenCV Version: 3.4.0
  • dlib Version: 19.10.0

I managed to get this working thanks to the help of several online tutorials.

NVidia Drivers

It is a good idea to start by purging any existing drivers (they conflict with one another when running TensorFlow):

sudo apt-get purge nvidia*

Next, add the repository that holds NVidia drivers, update the cache, and install the nvidia-387 drivers (note that it took some trial and error to figure out which of these was needed):

sudo add-apt-repository ppa:graphics-drivers
sudo apt-get update
sudo apt-get install nvidia-387

You then must reboot the computer. Check that the correct version of the driver is running:

cat /proc/driver/nvidia/version
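
As an extra sanity check, the driver and card can also be queried from Python via nvidia-smi (a minimal sketch, assuming nvidia-smi was installed along with the driver package and is on the PATH):

import subprocess

# Query the card name and driver version through nvidia-smi.
# Assumes nvidia-smi is on the PATH (it ships with the driver package).
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"])
print(out.decode().strip())  # e.g. "TITAN Xp, 387.34"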

CUDA

The next step is to install CUDA. I grabbed the download link by clicking through the options on NVIDIA's CUDA download page.

wget https://developer.nvidia.com/compute/cuda/9.1/Prod/local_installers/cuda-repo-ubuntu1704-9-1-local_9.1.85-1_amd64
mv cuda-repo-ubuntu1704-9-1-local_9.1.85-1_amd64 cuda-repo-ubuntu1704-9-1-local_9.1.85-1_amd64.deb

Next, install the package (note that you will have to change the 9-1-local in the second line if using a different version of CUDA):

sudo dpkg -i cuda-repo-ubuntu1704-9-1-local_9.1.85-1_amd64.deb
sudo apt-key add /var/cuda-repo-9-1-local/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
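
To confirm the toolkit is visible in a fresh shell, here is a minimal check in python3 (assuming the PATH export above has taken effect):

import subprocess

# Print the nvcc version banner; it should report release 9.1.
# Assumes /usr/local/cuda/bin is on the PATH as exported above.
print(subprocess.check_output(["nvcc", "--version"]).decode())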

cuDNN

The next step is to install cuDNN, which you can download from NVIDIA's cuDNN page. You have to sign up for NVIDIA's somewhat obnoxious developer program to get access to the download. Then, unzip the file and install:

tar xvf cudnn*.tgz
cd cuda
sudo cp */*.h /usr/local/cuda/include/
sudo cp */libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

Of all the steps, this was the most painless.
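
A quick way to confirm which cuDNN version landed in place is to read the version macros out of the copied header (a minimal sketch, assuming the default /usr/local/cuda install prefix used above):

# Print the cuDNN version macros from the installed header.
with open("/usr/local/cuda/include/cudnn.h") as f:
    for line in f:
        if line.startswith(("#define CUDNN_MAJOR",
                            "#define CUDNN_MINOR",
                            "#define CUDNN_PATCHLEVEL")):
            print(line.strip())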

Python

I installed the following in accordance with the tutorials linked above. Likely only some of these are needed, but I don't want to remove something crucial as I have no good way to test that I am not breaking something.

sudo apt-get update
sudo apt-get install -y python-numpy python-scipy python-nose \
  python-h5py python-skimage python-matplotlib python-pandas \
  python-sklearn python-sympy

sudo apt-get clean && sudo apt-get autoremove

sudo apt-get update
sudo apt-get install git python-dev python3-dev python-numpy python3-numpy \
  build-essential python-pip python3-pip python-virtualenv swig python-wheel \
  libcurl3-dev
sudo apt-get install -y libfreetype6-dev libpng12-dev
pip3 install -U matplotlib ipython[all] jupyter pandas scikit-image
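
A quick import check in python3 confirms that the packages installed above are visible (a minimal sketch; only the python3 packages are checked here):

# Confirm the python3 scientific stack imports cleanly and report versions.
import numpy, pandas, matplotlib, skimage
print("numpy", numpy.__version__)
print("pandas", pandas.__version__)
print("scikit-image", skimage.__version__)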

OpenBLAS

OpenBLAS is open source and easy to install. Just run the following:

git clone https://github.com/xianyi/OpenBLAS.git
cd OpenBLAS
make FC=gfortran -j16
sudo make PREFIX=/usr/local install
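
Note that the numpy installed above may or may not be linked against this hand-built OpenBLAS, depending on how the numpy package itself was compiled. A minimal check of what is actually in use:

# Report which BLAS/LAPACK libraries numpy was built against.
import numpy
numpy.show_config()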

Installing TensorFlow

Now for the most painful step. I had to install TensorFlow from source because the pip version did not match my version of CUDA. First, get libcupti:

sudo apt-get install libcupti-dev
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

And then, download and install bazel:

sudo apt-get install openjdk-8-jdk
wget https://github.com/bazelbuild/bazel/releases/download/0.9.0/bazel_0.9.0-linux-x86_64.deb
sudo dpkg -i bazel_0.9.0-linux-x86_64.deb
source ~/.bashrc
sudo ldconfig
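
A minimal check that the expected bazel release is on the PATH:

import subprocess

# Print the bazel version banner; it should report 0.9.0.
print(subprocess.check_output(["bazel", "version"]).decode())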

Finally, install TensorFlow. This gets tricky because the configure script asks for a number of options. See How to install Tensorflow GPU for the full details; the major adjustments I had to make were to point it at Python 3 and at GCC-6 (it does not support GCC-7). You also, of course, need to enter the correct CUDA and cuDNN versions. Finally, you have to decide which compute capabilities are needed; I picked just 6.1 for now.

wget https://github.com/tensorflow/tensorflow/archive/v1.4.1.zip
unzip v1.4.1.zip
cd tensorflow-1.4.1
./configure

Then, create this symlink, which works around a header-related compile error with CUDA 9.1:

sudo ln -s /usr/local/cuda/include/crt/math_functions.hpp /usr/local/cuda/include/math_functions.hpp

Now, build (this takes a little while):

bazel build --config=opt --config=cuda --incompatible_load_argument_is_label=false //tensorflow/tools/pip_package:build_pip_package

Finally, build the python wheel and install the TensorFlow Python package:

bazel-bin/tensorflow/tools/pip_package/build_pip_package tensorflow_pkg
pip3 install tensorflow_pkg/tensorflow*.whl

Keras

Installing keras is easy:

sudo pip3 install keras

I am not sure you even need to re-do this if you have already installed Keras (that is, I think Keras does not have a specific GPU version; it's the backend that matters), but I did just in case.
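
To double-check that Keras is sitting on the TensorFlow backend and that TensorFlow can see the GPU, here is a minimal sketch (device_lib is an internal TensorFlow module, but it is a handy way to list devices):

# Confirm the Keras backend and list the devices TensorFlow can see.
import keras
from tensorflow.python.client import device_lib

print(keras.backend.backend())  # should print "tensorflow"
print([d.name for d in device_lib.list_local_devices()])  # should include a GPU entry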

Testing

To test that this all works, run the following in python3:

import tensorflow as tf
a = tf.constant(5)
b = tf.constant(6)
sess = tf.Session()
sess.run(a+b)

The second-to-last line (the tf.Session() call) should spit out something about the GPU being used.
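
For a more explicit confirmation, the same test can be run with device placement logging turned on, so each op reports whether it landed on the GPU (a small variation on the snippet above):

import tensorflow as tf

# Rerun the test with device placement logging enabled.
a = tf.constant(5)
b = tf.constant(6)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(a + b))  # prints 11, with per-op placement info logged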

OpenCV

I originally tried to compile OpenCV myself to support CUDA. However, for some reason my compiled version was very slow, particularly the function cvtColor; it was some 100x slower than the version I had on my laptop. I thought this might be a problem with the GPU code, but it persisted even when I compiled without CUDA support. So, I instead switched to just grabbing a prebuilt Python wheel:

pip3 install opencv-python

And that fixed the problem (but doesn't support CUDA). I'll try to figure out what my build missed; I'll leave the original notes below for when I do.

Here are those original notes. We want to get OpenCV working with both Python and CUDA 9.1 support, which means building from source. The first tutorial suggested the following dependencies, so I grabbed the same:

sudo apt-get install libjpeg8-dev libtiff5-dev libpng-dev
sudo apt install qtbase5-dev
sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
sudo apt-get install libgtk2.0-dev
sudo apt-get install libatlas-base-dev gfortran
sudo apt-get install libhdf5-serial-dev

Now grab the source and checkout the most recent stable release (currently 3.4.1):

git clone https://github.com/opencv/opencv
cd opencv
git checkout 3.4.1

Create the configuration file (note: you have to point it at GCC-6, as higher GCC versions are not supported):

mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D CMAKE_CXX_COMPILER=/usr/bin/g++-6 \
-D CMAKE_C_COMPILER=/usr/bin/gcc-6 \
-D WITH_CUDA=ON \
-D WITH_CUBLAS=ON -D WITH_TBB=ON \
-D WITH_V4L=ON -D WITH_QT=ON \
-D WITH_OPENGL=ON \
-D BUILD_PERF_TESTS=OFF \
-D BUILD_TESTS=OFF \
-DCUDA_NVCC_FLAGS="-D_FORCE_INLINES --expt-relaxed-constexpr" ..

And compile (it takes a while, and the percentages do not seem to match the actual clock time, probably due to what can and cannot be parallelised):

make -j16
sudo make install

Test it by running this in python3:

import cv2
print(cv2.getBuildInformation())

It should print something out about CUDA support being turned on.
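
The build information string is long; this small helper just filters it down to the CUDA-related lines (a minimal convenience sketch):

import cv2

# Print only the CUDA-related lines of the OpenCV build report.
for line in cv2.getBuildInformation().splitlines():
    if "CUDA" in line:
        print(line.strip())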

dlib: Python

The face detector I am using depends on dlib and is very slow without the GPU (and very fast with it). To get this working with a GPU, I again had to compile from source:

git clone https://github.com/davisking/dlib
cd dlib
sudo python3 setup.py --clean install --yes USE_AVX_INSTRUCTIONS \
  --set CUDA_HOST_COMPILER=/usr/bin/gcc-6

The difficult bit was just getting it to use the right compiler for CUDA (gcc-6 rather than gcc-7). After those lines, it should generate the configuration files and build the project. Just make sure that you see this line somewhere in the configuration output:

-- Enabling CUDA support for dlib.  DLIB WILL USE CUDA

There is, annoyingly, no easy way to check that it works because dlib provides no user-level indication of whether CUDA support was installed. Your best bet is to run the face detector and see how fast it runs:

import face_recognition as fr
import cv2
import time

path = ""  # set path to an image file
img = cv2.imread(path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # face_recognition expects RGB

start_time = time.time()
faces = fr.face_locations(img, 1, model="cnn")  # CNN detector; uses CUDA if available
end_time = time.time()
print("Elapsed time was %g seconds" % (end_time - start_time))

On a CPU this takes about 4 seconds for a standard-definition image. On my GPU the first call takes a little less than a second, and subsequent calls take around 0.06 seconds.
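
Depending on the dlib version, the Python bindings may also expose a DLIB_USE_CUDA flag (I have not verified this on every release); if it is present, it gives a quicker check than timing the detector:

import dlib

# Hedged check: newer dlib builds expose a DLIB_USE_CUDA boolean.
# If the attribute is missing on this build, fall back to the timing test above.
print(getattr(dlib, "DLIB_USE_CUDA", "flag not available in this build"))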
