We'll be setting up our Ubuntu 18.04 LTS (long-term support) environment to support ML/DL via the GPU in our GeForce GTX 1060 3GB card. Using lshw (see man lshw) via the command sudo lshw -C display | grep product we can find out our GPU details such as its make, model number, etc. Armed with those details, head over to Nvidia's CUDA GPUs site and select the GPU family that you have (in our case GeForce); you'll be taken to a page showing the card's "Compute Capability" score (in our case, a 6.1 Compute Capability score).
Once you've got your Compute Capability score, click on the score value to be taken to a GPU-card-specific landing page (again, in our case a GeForce GTX 1060). Scrolling down, we note that our 3GB card is built on the Pascal architecture, with 1152 CUDA cores, 3GB of GDDR5 RAM (graphics double data rate type 5 synchronous RAM), a listed memory speed of 8 Gbps, and a Boost Clock speed of 1708 MHz.
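If you script this for several machines, the Compute Capability score alone is enough to recover the architecture generation. Here's a minimal illustrative lookup (not an official Nvidia API; the major-version-to-architecture pairs below are taken from Nvidia's public CUDA GPUs table):

```python
# Illustrative mapping from Compute Capability major version to
# architecture generation, per Nvidia's public CUDA GPUs table.
CAPABILITY_TO_ARCH = {
    "3": "Kepler",
    "5": "Maxwell",
    "6": "Pascal",
    "7": "Volta/Turing",
}

def arch_for_capability(score: str) -> str:
    """Return the architecture family for a score like '6.1'."""
    major = score.split(".")[0]
    return CAPABILITY_TO_ARCH.get(major, "unknown")

print(arch_for_capability("6.1"))  # our GTX 1060 -> Pascal
```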
A copy of Ubuntu 18.04
If you're not sure of your processor type, run lscpu to get the details about your processor (32- or 64-bit) and vendor (AMD, Intel, etc.), then pick up a copy of Ubuntu 18.04 that suits it. We select the AMD64 version as we have an x86_64 Intel processor, but you can find all available versions on the Ubuntu 18.04 Netboot page. If you want a non-netboot installer, go here: https://www.ubuntu.com/download/desktop . Then, using the Startup Disk Creator (if you're already running Ubuntu), create a bootable USB with the 18.04 image you just downloaded.
Create an Nvidia Developer Program account
If you don't already have an Nvidia Developer Program account (you'll need one to download a whole host of CUDA drivers and programs that take advantage of GPU-accelerated applications and/or libraries), create one at https://developer.nvidia.com/join
Determine your GPU's "Compute Capability" score
Use the following command to get the details about your GPU: sudo lshw -C display | grep product and then check its "Compute Capability" at Nvidia's CUDA GPUs page.
DETOUR 1: creating system snapshots and restore points
Install your copy of Ubuntu 18.04 LTS. The installer will set up some generic, one-size-fits-all GPU drivers (via the Ubuntu Default Recommended Driver). While these drivers work most of the time now (a welcome change from using Linux as one's dev environment even 5 years ago), we'll install the drivers via a ppa (personal package archive) instead. Note that you can install drivers for your Nvidia GPU in several ways, which are summarized in this StackExchange answer. In the past I tried using the official drivers from the Nvidia site, but installation and setup was a major PITA, so let's use the good ol' ppa method to get our CUDA GPU up and running under Ubuntu 18.04 LTS.
Then install the Synaptic package manager:
sudo apt-get install synaptic
Now that you've got your shiny, new Ubuntu 18.04 LTS setup, let's use the ppa method of setting up our GPU. The ppa method is nice in that it doesn't auto-magically update the drivers (you'll be prompted to approve/deny an update if one becomes available), and this method has fewer issues than other methods such as the Nouveau / default drivers or the official Nvidia site drivers:
sudo add-apt-repository ppa:graphics-drivers/ppa
To install the driver, we choose version 390 as it's more stable (the StackExchange answer chooses 396, a newer driver):
sudo apt install nvidia-driver-390
Then verify driver installation:
nvidia-smi
which should give you a nice little table showing your graphics card's name, driver number, and any processes currently running on your graphics card (probably X-Server / your GUI and a few others).
Then, freeze your driver so that it doesn't automatically update itself:
sudo apt-mark hold nvidia-driver-390
You should receive a confirmation message saying nvidia-driver-390 set on hold.
(Note: we also installed some temp. sensor / monitoring utilities: https://itsfoss.com/check-laptop-cpu-temperature-ubuntu/)
Check your gcc and g++ versions:
gcc -v
gcc --version
g++ -v
Our gcc version is:
gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)
Then, install your kernel headers. You can think of a header file as a description of a function, its parameters, etc. This forward declaration is required in C, and since the Linux kernel is written in C, you can thank header files for allowing you to access functions in the kernel:
sudo apt install linux-headers-$(uname -r)
Grab the CUDA package from the Nvidia site, being sure to select the appropriate OS, architecture, etc.:
sudo dpkg -i cuda-something_something_version_numbers_blahblah.deb
You will then be prompted to install the GPG public key with which the CUDA install will be verified:
sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
You'll receive an OK as confirmation that the public key was successfully added to your keychain.
Then run sudo apt update followed by sudo apt install cuda and let the installation run. (You'll see done and done as confirmation.) After that step you'll want to hold your CUDA package from being updated: sudo apt-mark hold cuda. (You'll see cuda set on hold as confirmation.)
Export your PATH variable by adding the following line to your .bashrc (no sudo needed to edit your own .bashrc):
nano ~/.bashrc
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
While you're in there, it's also worth adding the library path from Nvidia's post-installation instructions so CUDA's shared libraries can be found at runtime:
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Verify that the installation was successful:
cat /proc/driver/nvidia/version
You should see something like the following:
data-janitor@datajanitor-desktop:~/Downloads$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 390.87 Tue Aug 21 12:33:05 PDT 2018
GCC version: gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)
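If you want to sanity-check that line in a script rather than by eye, a small sketch (the sample string is copied from the output above; on a real machine you would read it from /proc/driver/nvidia/version, which only exists once the Nvidia kernel module is loaded):

```python
import re

# Sample line copied from the walkthrough output above.
nvrm = "NVRM version: NVIDIA UNIX x86_64 Kernel Module  390.87  Tue Aug 21 12:33:05 PDT 2018"

# Pull the driver version out of the NVRM line.
match = re.search(r"Kernel Module\s+(\d+)\.(\d+)", nvrm)
major, minor = int(match.group(1)), int(match.group(2))
print(major, minor)  # 390 87

# We held nvidia-driver-390 earlier, so the major version should agree.
assert major == 390
```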
Note that we'll need gcc 6, not 7, but we'll install that in a minute...
Install the Nvidia CUDA Compiler by typing sudo apt-get install nvidia-cuda-toolkit
Then check your Nvidia CUDA Compiler version:
nvcc --version
Which should give you something like:
data-janitor@datajanitor-desktop:~/Downloads$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
cuDNN 7.0.5 is compatible with TensorFlow, so we pick that version. You can download the archived 7.0.5 cuDNN libraries for CUDA 9.0 here: https://developer.nvidia.com/rdp/cuDNN-download . Make sure to download the runtime library, the developer library, and the code samples. The libraries will be labeled 'cuDNN v7.0.5 [library name] for Ubuntu 16.04', but don't worry: they work with 18.04.
Note that installing the nvidia-cuda-toolkit should have installed gcc 6, but if not, make sure you have gcc 6 installed:
sudo apt install gcc-6 g++-6
We get:
data-janitor@datajanitor-desktop:~/Downloads$ sudo apt install gcc-6 g++-6
[sudo] password for data-janitor:
Reading package lists... Done
Building dependency tree
Reading state information... Done
g++-6 is already the newest version (6.5.0-2ubuntu1~18.04).
g++-6 set to manually installed.
gcc-6 is already the newest version (6.5.0-2ubuntu1~18.04).
gcc-6 set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Navigate to where you downloaded your three packages and start installing:
Runtime file:
sudo dpkg -i libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb
Developer file:
sudo dpkg -i libcudnn7-dev_7.0.5.15-1+cuda9.0_amd64.deb
Documentation file (contains the MNIST data to test your cuDNN installation):
sudo dpkg -i libcudnn7-doc_7.0.5.15-1+cuda9.0_amd64.deb
We installed gcc 6 before, so now let's update the symlinks so that CUDA can find gcc 6:
sudo ln -s /usr/bin/gcc-6 /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-6 /usr/local/cuda/bin/g++
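If the symlink trick is new to you, here's a toy demonstration of what those two ln -s commands accomplish, using throwaway files (paths here are hypothetical stand-ins; on the real system the link lives in /usr/local/cuda/bin and points at /usr/bin/gcc-6):

```python
import os
import tempfile

# Throwaway directory standing in for the real filesystem.
tmp = tempfile.mkdtemp()
target = os.path.join(tmp, "gcc-6")   # stand-in for /usr/bin/gcc-6
open(target, "w").close()
link = os.path.join(tmp, "gcc")       # stand-in for /usr/local/cuda/bin/gcc
os.symlink(target, link)              # same effect as `sudo ln -s`

# Anything invoking 'gcc' through the link actually reaches gcc-6.
print(os.path.realpath(link) == target)  # True
```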
Test your installation using the MNIST data in /usr/src/cudnn_samples_v7/mnistCUDNN:
cp -r /usr/src/cudnn_samples_v7/ $HOME/cuDNN_test_dir/
cd $HOME/cuDNN_test_dir/mnistCUDNN
make clean && make
Which should - if everything is successfully installed - give you the following back:
data-janitor@datajanitor-desktop:~/cuDNN_test_dir/mnistCUDNN$ make clean && make
rm -rf *o
rm -rf mnistCUDNN
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -IFreeImage/include -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o fp16_dev.o -c fp16_dev.cu
g++ -I/usr/local/cuda/include -IFreeImage/include -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -IFreeImage/include -o mnistCUDNN.o -c mnistCUDNN.cpp
/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o -LFreeImage/lib/linux/x86_64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm
Now let's test our cuDNN convolutional neural net using the MNIST numbers dataset:
./mnistCUDNN
Provided you've installed things correctly you'll get a Results of classification: (some integers) and a Test passed! response.
Don't forget to hold your packages and delete your sample MNIST project directory:
sudo apt-mark hold libcudnn7 libcudnn7-dev libcudnn7-doc
which confirms that the hold succeeded by returning three set on hold messages.
rm -r cuDNN_test_dir/
Now that we have CUDA and cuDNN working, let's 'save' our work by setting up system snapshots in case we break anything in the next step of this walkthrough, when we'll be installing Python, a Python package manager (pip, Anaconda, etc.), and TensorFlow as well as a few other ML and DL frameworks.
Timeshift for Ubuntu leverages rsync to create backups and restore points for your system.
We type sudo apt-add-repository ppa:teejee2008/ppa followed by sudo apt-get update followed by sudo apt-get install timeshift. Then launch Timeshift from the launcher, enjoy the GUI, and set up your system snapshots and restore points.
Upon starting Timeshift we're prompted to choose a snapshot type. Weirdly, BTRFS is a selectable option, but since we're not using BTRFS we can't/shouldn't choose it as our snapshot type. Thus, we're left with the 'choice' of RSYNC; select RSYNC as your snapshot type and hit 'Next'.
You'll see an "Estimating system size" message for a few seconds, followed by a "Select Snapshot Location" screen. Timeshift tells us that devices with Windows file systems (NTFS, FAT, etc.) are not supported. (We have ext4, so we're good to go.) Timeshift also tells us that remote and network locations for saving the backups are not supported. (TODO: in the future we'll want to move these snapshots off of our machine; consider encrypting them and uploading them to glacial storage somewhere.)
The next screen lets us choose the "Snapshot Levels", i.e. the frequency with which to perform backups (monthly, weekly, daily, hourly, etc.) and how many backups to retain. We choose daily backups and retain the last 5 days' worth of backups.
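To make the retention policy concrete, here's a toy model of what "daily snapshots, keep the last 5" means for pruning (dates are made up for illustration; this is not how Timeshift itself is implemented):

```python
from datetime import date, timedelta

# Hypothetical snapshot history: one snapshot per day for 9 days.
today = date(2019, 1, 10)
snapshots = [today - timedelta(days=d) for d in range(9)]

# With "keep the last 5", pruning retains only the 5 most recent.
kept = sorted(snapshots)[-5:]
print(len(kept))  # 5
```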
After some deliberation, we decided to go with (mini)conda as our package and environment manager. Though we work mostly in Python, some projects have dependencies outside of the Python ecosystem, and since conda allows us to manage a wide range of dependencies, even those outside of Python-land, we decided to go with conda instead of pip/PyPI + virtualenv, etc. For me, the advantage of being able to use a multi-language stack for my projects, and to manage all of that from one place, means that conda has more utility than pip + virtualenv for package and environment management. (If for some reason you want to go back to using pip and virtualenv, this page is a great reminder resource!)
To install (mini)conda we get the installation shell script directly from Continuum, the original devs of conda, make the script executable with chmod, and then run it:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
You'll get some big EULA to mash the Enter key 'to read'. Type yes and Miniconda will be installed to the default directory, /home/your_user_name/miniconda3. Type yes again when prompted to add miniconda3 to your .bashrc file:
Do you wish the installer to initialize Miniconda3
in your /home/data-janitor/.bashrc ? [yes|no]
To create a new virtual environment called my_env using a specific version of Python (3.4) with some specific versions of packages:
conda create -n my_env python=3.4 scipy=0.15.0 astroid numpy
If you have a YAML file, you can create your environment that way too; see 'Creating an environment manually':
conda env create -f environment.yml
You can clone environments as follows:
conda create --name myClone --clone myInitialEnv
You can build identical environments (locally or on another machine) by first finding the specific packages in your extant environment:
conda list --explicit > your_spec_file.txt
lets you produce a spec(ification) list which can then be used to build your environment elsewhere via conda create --name <env> --file your_spec_file.txt
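The spec file is just plain text, so it's easy to inspect before shipping it to another machine. A small sketch of reading one (the format below is what a typical conda list --explicit file looks like; the package URLs are made up for illustration):

```python
# Hypothetical contents of your_spec_file.txt: comment lines, the
# @EXPLICIT marker, then one pinned package URL per line.
spec = """# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
@EXPLICIT
https://repo.anaconda.com/pkgs/main/linux-64/numpy-1.15.4-py36.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/scipy-1.1.0-py36.tar.bz2
"""

# Keep only the package URLs, skipping comments and the marker line.
packages = [line for line in spec.splitlines()
            if line and not line.startswith(("#", "@"))]
print(len(packages))  # 2 pinned packages in this spec
```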
A few other handy commands:
To deactivate the current environment: source deactivate
To list your environments (the current one is starred): conda info --envs or conda env list
To list the packages in a given environment: conda list -n myenv
For more, see https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
In this next step we'll get TensorFlow, PyTorch, and a few other ML/DL frameworks up and running (with GPU support!). We create a directory where our projects will live, /home/data-janitor/Documents/work, and initialize a new environment:
conda create -n dev_env python=3.6 pip numpy scipy pandas matplotlib seaborn
Let those packages install and then type source activate dev_env or conda activate dev_env. Next we'll install TensorFlow in our dev_env:
pip3 install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.8.0-cp36-cp36m-linux_x86_64.whl
First, check that TensorFlow installed:
conda list | grep tensor
And then, using TensorFlow, give the ol' Hello World a whirl in tf:
$ python
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello)) # this should give back b'Hello, TensorFlow!'
>>> exit()
Deactivate this env. Create a new env for pytorch. And so on...
(pytorch_env) data-janitor@datajanitor-desktop:~/Documents/work$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.current_device()
0
>>> torch.cuda.device(0)
<torch.cuda.device object at 0x7f5acfe313c8>
>>> torch.cuda.get_device_name(0)
'GeForce GTX 1060 3GB'
>>> exit()
(pytorch_env) data-janitor@datajanitor-desktop:~/Documents/work$ source deactivate