planetceres/tf_1-8_keras_2-1_gpu_setup.md

## tf_1-8_keras_2-1_gpu_setup.md

      
    Raw
  

              tf_1-8_keras_2-1_gpu_setup.md
            
          
    Requirements


OS

Ubuntu 16.04


CUDA

CUDA 9.0
NVCC 9.0.176
CUDNN 7.1.4


Tensorflow

GPU 1.8.0


Keras

2.1.5


Python

3.6


Begin provisioning

Update Ubuntu and reboot (you will need to login again):
sudo apt-get update -y
sudo reboot

Mount disk:

https://askubuntu.com/questions/956470/add-additional-hdd-in-ubuntu-16-04
http://www.srimax.com/index.php/mount-a-drive-in-centos/
sudo lsblk
sudo fdisk -l
fdisk -l |grep Disk |grep bytes
mount | grep ' \/ '

mkfs -t ext4 /dev/sdb1

sudo mkdir /data1
sudo mount /dev/sdb1 /data1
sudo chmod 755 /data1 ## make sensible permissions
cd /data1 ## change to the mount directory
ls -a ## list content of the new hard drive

#unmount
sudo umount /dev/sdb1

#Fstab
sudo blkid
sudo cp /etc/fstab /etc/fstab.bak
sudo nano /etc/fstab

#add to last line of fstab
UUID="<UUID>" /data ext4 defaults 0 2

mount -a

# Create user `<username>`:
adduser <username>

# Set password:
passwd <username>

# add user to sudo group
usermod -aG sudo <username>

# login as user
su - <username>

Add git ssh

mkdir /home/<ubuntu_username>/.ssh/github
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"

Enter file in which to save the key (/home/<ubuntu_username>/.ssh/id_rsa): /home/<ubuntu_username>/.ssh/github/id_rsa

Login to https://www.github.com and add ssh key from:
cat ~/.ssh/github/id_rsa.pub

Start ssh agent and add key:
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/github/id_rsa

Install gcc:

https://gist.github.com/application2000/73fd6f4bf1be6600a2cf9f56315a2d91#gistcomment-2569893
sudo rm /boot/grub/menu.lst
sudo update-grub-legacy-ec2 -y
sudo apt-get dist-upgrade -qq --force-yes

sudo apt-get update -y &&
sudo apt-get upgrade -y &&
sudo apt-get dist-upgrade -y &&
sudo apt-get install build-essential software-properties-common -y &&
sudo add-apt-repository ppa:ubuntu-toolchain-r/test -y &&
sudo apt-get update -y &&
sudo apt-get install gcc-8 g++-8 -y &&
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 60 --slave /usr/bin/g++ g++ /usr/bin/g++-8 &&
sudo update-alternatives --config gcc

# sudo yum group install "Development Tools"
whereis gcc

CUDA installation

Install cuda:
http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
sudo apt-get install pciutils

Check compatibility:
lspci | grep -i nvidia
uname -m && cat /etc/*release
uname -r

Install Cuda 9.0
wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64-deb
sudo dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64-deb
sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda -y

Confirm installation successful:
nvidia-smi

Add to path:
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
echo 'export PATH=$PATH:$CUDA_HOME/bin' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib64' >> ~/.bashrc
source ~/.bashrc

Environment Setup:
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64\
                         ${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

or to add to ~/.bash_profile:
echo 'export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}' >> ~/.bash_profile
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bash_profile
source ~/.bash_profile

Update /usr/lib/systemd/system/nvidia-persistenced.service

# NVIDIA Persistence Daemon Init Script
#
# Copyright (c) 2013 NVIDIA Corporation
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
#
# This is a sample systemd service file, designed to show how the NVIDIA
# Persistence Daemon can be started.
#


# Defaults
#[Unit]
#Description=NVIDIA Persistence Daemon
#Wants=syslog.target

[Service]
#Type=forking
#ExecStart=/usr/bin/nvidia-persistenced --user root
#ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced

#[Install]
#WantedBy=multi-user.target


[Unit]
Description=NVIDIA Persistence Daemon
Wants=syslog.target

[Service]
Type=forking
PIDFile=/var/run/nvidia-persistenced/nvidia-persistenced.pid
Restart=always
ExecStart=/usr/bin/nvidia-persistenced --verbose
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced

[Install]
WantedBy=multi-user.target

Reboot machine (from root):
reboot

Start persistence daemon:
/usr/bin/nvidia-persistenced --verbose

Test installation:
cuda-install-samples-9.0.sh ./
cd ~/NVIDIA_CUDA-9.0_Samples
sudo make
#./deviceQueryDrv

or test with:
cd ~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQueryDrv
#sudo make
./deviceQueryDrv

Success (Note that Result at bottom == pass):
CUDA Device Query (Driver API) statically linked version
Detected 1 CUDA Capable device(s)

Device 0: "Tesla P100-PCIE-16GB"
  CUDA Driver Version:                           9.0
  CUDA Capability Major/Minor version number:    6.0
  Total amount of global memory:                 16276 MBytes (17066885120 bytes)
  (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1329 MHz (1.33 GHz)
  Memory Clock rate:                             715 Mhz
  Memory Bus Width:                              4096-bit
  L2 Cache Size:                                 4194304 bytes
  Max Texture Dimension Sizes                    1D=(131072) 2D=(131072, 65536) 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):    (2147483647, 65535, 65535)
  Texture alignment:                             512 bytes
  Maximum memory pitch:                          2147483647 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 7
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Result = PASS

Install cuDNN:

see: https://gist.github.com/mjdietzx/0ff77af5ae60622ce6ed8c4d9b419f45
download https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.1.2/prod/9.0_20180316/cudnn-9.0-linux-x64-v7.1 from
https://developer.nvidia.com/rdp/cudnn-download
cd /usr/local
sudo tar -xzvf cudnn-9.0-linux-x64-v7.1.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

Check Installation
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

Install Miniconda

Install latest miniconda:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Accept option:
Do you wish the installer to prepend the Miniconda3 install location
to PATH in your /home/<username>/.bashrc ? [yes|no]

>> yes

Check installation and remove installer
source ~/.bashrc

# check installation
which conda

# Remove installer
rm -f Miniconda3-latest-Linux-x86_64.sh

Now create a new conda environment and activate it.
Install Tensorflow/Keras

sudo apt -y install gcc gcc-c++ python-pip python-devel atlas atlas-devel gcc-gfortran openssl-devel libffi-devel

Install Tensorflow 1.8.0 cuDNN:
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.8.0-cp36-cp36m-linux_x86_64.whl --no-cache-dir


Keras will install non-GPU version Tensorflow if the --no-deps flag is not used:
pip install --upgrade --no-deps keras==2.1.5
pip show tensorflow-gpu