We'll be setting up our Ubuntu 18.04 LTS (long-term support) environment to support ML/DL via the GPU in our GeForce GTX 1060 3GB card. Using lshw (see man lshw) via the command sudo lshw -C display | grep product we can find out our GPU details such as its make, model number, etc. Armed with those details, head over to Nvidia's CUDA GPUs site and select the GPU family that you have (in our case GeForce); you'll be taken to a page showing the card's "Compute Capability" score (in our case, a 6.1 Compute Capability score).
Once you've got your Compute Capability score, click on the score value to be taken to a GPU-card-specific landing page (again, in our case a GeForce GTX 1060). Scrolling down, we note that our 3GB card is built on the Pascal architecture, with 1152 CUDA cores, 3GB of GDDR5 RAM (graphics double data rate type 5 synchronous RAM), a listed memory speed of 8 Gbps, and a Boost Clock speed of 1708 MHz.
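If you script this for several machines, the Compute Capability score alone is enough to recover the architecture generation. Here's a minimal illustrative lookup (not an official Nvidia API; the major-version-to-architecture pairs below are taken from Nvidia's public CUDA GPUs table):

```python
# Illustrative mapping from Compute Capability major version to
# architecture generation, per Nvidia's public CUDA GPUs table.
CAPABILITY_TO_ARCH = {
    "3": "Kepler",
    "5": "Maxwell",
    "6": "Pascal",
    "7": "Volta/Turing",
}

def arch_for_capability(score: str) -> str:
    """Return the architecture family for a score like '6.1'."""
    major = score.split(".")[0]
    return CAPABILITY_TO_ARCH.get(major, "unknown")

print(arch_for_capability("6.1"))  # our GTX 1060 -> Pascal
```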
A copy of Ubuntu 18.04
If you're not sure of your processor type, run lscpu to get the details about your processor (32- or 64-bit) and vendor (AMD, Intel, etc.), then pick up a copy of Ubuntu 18.04 that suits it. We select the AMD64 version as we have an x86_64 Intel processor, but you can find all available versions on the Ubuntu 18.04 Netboot page. If you want a non-netboot installer, go here: https://www.ubuntu.com/download/desktop . Then, using the Startup Disk Creator (if you're already running Ubuntu), create a bootable USB with the 18.04 image you just downloaded.
Create an Nvidia Developer Program account
If you don't already have an Nvidia Developer Program account (you'll need one to download a whole host of CUDA drivers and programs that take advantage of GPU-accelerated applications and/or libraries), create one at https://developer.nvidia.com/join
Determine your GPU's "Compute Capability" score
Use the following command to get the details about your GPU: sudo lshw -C display | grep product and then check its "Compute Capability" at Nvidia's CUDA GPUs page.
DETOUR 1: creating system snapshots and restore points
Install your copy of Ubuntu 18.04 LTS. The installer will set up some generic, one-size-fits-all GPU drivers (via the Ubuntu Default Recommended Driver). While these drivers work most of the time now (a welcome change from using Linux as one's dev environment even 5 years ago), we'll install the drivers via a ppa (personal package archive) instead. Note that you can install drivers for your Nvidia GPU in several ways, which are summarized in this StackExchange answer. In the past I tried using the official drivers from the Nvidia site, but installation and setup was a major PITA, so let's use the good ol' ppa method to get our CUDA GPU up and running under Ubuntu 18.04 LTS.
Then install the Synaptic package manager:
sudo apt-get install synaptic
Now that you've got your shiny, new Ubuntu 18.04 LTS setup, let's use the ppa method of setting up our GPU. The ppa method is nice in that it doesn't auto-magically update the drivers (you'll be prompted to approve/deny an update if one becomes available), and this method has fewer issues than other methods such as the Nouveau / default drivers or the official Nvidia site drivers:
sudo add-apt-repository ppa:graphics-drivers/ppa
To install the driver, we choose version 390 as it's more stable (the StackExchange answer chooses 396, a newer driver):
sudo apt install nvidia-driver-390
Then verify driver installation:
nvidia-smi
which should give you a nice little table showing your graphics card's name, driver number, and any processes currently running on your graphics card (probably X-Server / your GUI and a few others).
Then, freeze your driver so that it doesn't automatically update itself:
sudo apt-mark hold nvidia-driver-390
You should receive a confirmation message saying nvidia-driver-390 set on hold.
(Note: we also installed some temp. sensor / monitoring utilities: https://itsfoss.com/check-laptop-cpu-temperature-ubuntu/)
Check your gcc and g++ versions:
gcc -v
gcc --version
g++ -v
Our gcc version is:
gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)
Then, install your kernel headers. You can think of a header file as a description of a function, its parameters, etc. This forward declaration is required in C, and since the Linux kernel is written in C, you can thank header files for allowing you to access functions in the kernel:
sudo apt install linux-headers-$(uname -r)
Grab the CUDA package from the Nvidia site, being sure to select the appropriate OS, architecture, etc.:
sudo dpkg -i cuda-something_something_version_numbers_blahblah.deb
You will then be prompted to install the GPG public key with which the CUDA install will be verified:
sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
You'll receive an OK as confirmation that the public key was successfully added to your keychain.
Then run sudo apt update followed by sudo apt install cuda and let the installation run. (You'll see done and done as confirmation.) After that step you'll want to hold your CUDA package from being updated: sudo apt-mark hold cuda. (You'll see cuda set on hold as confirmation.)
Export your PATH variable by adding the following line to your .bashrc (no sudo needed to edit your own .bashrc):
nano ~/.bashrc
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
While you're in there, it's also worth adding the library path from Nvidia's post-installation instructions so CUDA's shared libraries can be found at runtime:
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Verify that the installation was successful:
cat /proc/driver/nvidia/version
You should see something like the following:
data-janitor@datajanitor-desktop:~/Downloads$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 390.87 Tue Aug 21 12:33:05 PDT 2018
GCC version: gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)
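If you want to sanity-check that line in a script rather than by eye, a small sketch (the sample string is copied from the output above; on a real machine you would read it from /proc/driver/nvidia/version, which only exists once the Nvidia kernel module is loaded):

```python
import re

# Sample line copied from the walkthrough output above.
nvrm = "NVRM version: NVIDIA UNIX x86_64 Kernel Module  390.87  Tue Aug 21 12:33:05 PDT 2018"

# Pull the driver version out of the NVRM line.
match = re.search(r"Kernel Module\s+(\d+)\.(\d+)", nvrm)
major, minor = int(match.group(1)), int(match.group(2))
print(major, minor)  # 390 87

# We held nvidia-driver-390 earlier, so the major version should agree.
assert major == 390
```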
Note that we'll need gcc 6, not 7, but we'll install that in a minute...
Install the Nvidia CUDA Compiler by typing sudo apt-get install nvidia-cuda-toolkit
Then check your Nvidia CUDA Compiler version:
nvcc --version
Which should give you something like:
data-janitor@datajanitor-desktop:~/Downloads$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
cuDNN 7.0.5 is compatible with TensorFlow, so we pick that version. You can download the archived 7.0.5 cuDNN libraries for CUDA 9.0 here: https://developer.nvidia.com/rdp/cuDNN-download . Make sure to download the runtime library, the developer library, and the code samples. The libraries will be labeled 'cuDNN v7.0.5 [library name] for Ubuntu 16.04', but don't worry: they work with 18.04.
Note that installing the nvidia-cuda-toolkit should have installed gcc 6, but if not, make sure you have gcc 6 installed:
sudo apt install gcc-6 g++-6
We get:
data-janitor@datajanitor-desktop:~/Downloads$ sudo apt install gcc-6 g++-6
[sudo] password for data-janitor:
Reading package lists... Done
Building dependency tree
Reading state information... Done
g++-6 is already the newest version (6.5.0-2ubuntu1~18.04).
g++-6 set to manually installed.
gcc-6 is already the newest version (6.5.0-2ubuntu1~18.04).
gcc-6 set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Navigate to where you downloaded your three packages and start installing:
Runtime file:
sudo dpkg -i libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb
Developer file:
sudo dpkg -i libcudnn7-dev_7.0.5.15-1+cuda9.0_amd64.deb
Documentation file (contains the MNIST data to test your cuDNN installation):
sudo dpkg -i libcudnn7-doc_7.0.5.15-1+cuda9.0_amd64.deb
We installed gcc 6 before, so now let's update the symlinks so that CUDA can find gcc 6:
sudo ln -s /usr/bin/gcc-6 /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-6 /usr/local/cuda/bin/g++
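If the symlink trick is new to you, here's a toy demonstration of what those two ln -s commands accomplish, using throwaway files (paths here are hypothetical stand-ins; on the real system the link lives in /usr/local/cuda/bin and points at /usr/bin/gcc-6):

```python
import os
import tempfile

# Throwaway directory standing in for the real filesystem.
tmp = tempfile.mkdtemp()
target = os.path.join(tmp, "gcc-6")   # stand-in for /usr/bin/gcc-6
open(target, "w").close()
link = os.path.join(tmp, "gcc")       # stand-in for /usr/local/cuda/bin/gcc
os.symlink(target, link)              # same effect as `sudo ln -s`

# Anything invoking 'gcc' through the link actually reaches gcc-6.
print(os.path.realpath(link) == target)  # True
```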
Test your installation using the MNIST data in /usr/src/cudnn_samples_v7/mnistCUDNN:
cp -r /usr/src/cudnn_samples_v7/ $HOME/cuDNN_test_dir/
cd $HOME/cuDNN_test_dir/mnistCUDNN
make clean && make
Which should - if everything is successfully installed - give you the following back:
data-janitor@datajanitor-desktop:~/cuDNN_test_dir/mnistCUDNN$ make clean && make
rm -rf *o
rm -rf mnistCUDNN
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -IFreeImage/include -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o fp16_dev.o -c fp16_dev.cu
g++ -I/usr/local/cuda/include -IFreeImage/include -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -IFreeImage/include -o mnistCUDNN.o -c mnistCUDNN.cpp
/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o -LFreeImage/lib/linux/x86_64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm
Now let's test our cuDNN convolutional neural net using the MNIST numbers dataset:
./mnistCUDNN
Provided you've installed things correctly you'll get a Results of classification: (some integers) and a Test passed! response.
Don't forget to hold your packages and delete your sample MNIST project directory:
sudo apt-mark hold libcudnn7 libcudnn7-dev libcudnn7-doc
which confirms that the hold succeeded by returning three set on hold messages.
rm -r cuDNN_test_dir/
Now that we have CUDA and cuDNN working, let's 'save' our work by setting up system snapshots in case we break anything in the next step of this walkthrough, when we'll be installing Python, a Python package manager (pip, Anaconda, etc.), and TensorFlow as well as a few other ML and DL frameworks.
Timeshift for Ubuntu leverages rsync to create backups and restore points for your system.
We type sudo apt-add-repository ppa:teejee2008/ppa followed by sudo apt-get update followed by sudo apt-get install timeshift. Then launch Timeshift from the launcher, enjoy the GUI, and set up your system snapshots and restore points.
Upon starting Timeshift we're prompted to choose a snapshot type. Weirdly, BTRFS is a selectable option, but since we're not using BTRFS we can't/shouldn't choose it as our snapshot type. Thus, we're left with the 'choice' of RSYNC; select RSYNC as your snapshot type and hit 'Next'.
You'll see an "Estimating system size" message for a few seconds, followed by a "Select Snapshot Location" screen. Timeshift tells us that devices with Windows file systems (NTFS, FAT, etc.) are not supported. (We have ext4, so we're good to go.) Timeshift also tells us that remote and network locations for saving the backups are not supported. (TODO: in the future we'll want to move these snapshots off of our machine; consider encrypting them and uploading them to glacial storage somewhere.)
The next screen lets us choose the "Snapshot Levels", i.e. the frequency with which to perform backups (monthly, weekly, daily, hourly, etc.) and how many backups to retain. We choose daily backups and retain the last 5 days' worth of backups.
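To make the retention policy concrete, here's a toy model of what "daily snapshots, keep the last 5" means for pruning (dates are made up for illustration; this is not how Timeshift itself is implemented):

```python
from datetime import date, timedelta

# Hypothetical snapshot history: one snapshot per day for 9 days.
today = date(2019, 1, 10)
snapshots = [today - timedelta(days=d) for d in range(9)]

# With "keep the last 5", pruning retains only the 5 most recent.
kept = sorted(snapshots)[-5:]
print(len(kept))  # 5
```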
After some deliberation, we decided to go with (mini)conda as our package and environment manager. Though we work mostly in Python, some projects have dependencies outside of the Python ecosystem, and since conda allows us to manage a wide range of dependencies, even those outside of Python-land, we decided to go with conda instead of pip/PyPI + virtualenv, etc. For me, the advantage of being able to use a multi-language stack for my projects, and to manage all of that from one place, means that conda has more utility than pip + virtualenv for package and environment management. (If for some reason you want to go back to using pip and virtualenv, this page is a great reminder resource!)
To install (mini)conda we get the installation shell script directly from Continuum, the original devs of conda, make the script executable with chmod, and then run it:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
You'll get some big EULA to mash the Enter key 'to read'. Type yes and Miniconda will be installed to the default directory, /home/your_user_name/miniconda3. Type yes again when prompted to add miniconda3 to your .bashrc file:
Do you wish the installer to initialize Miniconda3
in your /home/data-janitor/.bashrc ? [yes|no]
To create a new virtual environment called my_env using a specific version of Python (3.4) with some specific versions of packages:
conda create -n my_env python=3.4 scipy=0.15.0 astroid numpy
If you have a YAML file, you can create your environment that way too; see 'Creating an environment manually':
conda env create -f environment.yml
You can clone environments as follows:
conda create --name myClone --clone myInitialEnv
You can build identical environments (locally or on another machine) by first finding the specific packages in your extant environment:
conda list --explicit > your_spec_file.txt
lets you produce a spec(ification) list which can then be used to build your environment elsewhere via conda create --name <env> --file your_spec_file.txt
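The spec file is just plain text, so it's easy to inspect before shipping it to another machine. A small sketch of reading one (the format below is what a typical conda list --explicit file looks like; the package URLs are made up for illustration):

```python
# Hypothetical contents of your_spec_file.txt: comment lines, the
# @EXPLICIT marker, then one pinned package URL per line.
spec = """# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
@EXPLICIT
https://repo.anaconda.com/pkgs/main/linux-64/numpy-1.15.4-py36.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/scipy-1.1.0-py36.tar.bz2
"""

# Keep only the package URLs, skipping comments and the marker line.
packages = [line for line in spec.splitlines()
            if line and not line.startswith(("#", "@"))]
print(len(packages))  # 2 pinned packages in this spec
```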
A few other handy commands:
To deactivate the current environment: source deactivate
To list your environments (the current one is starred): conda info --envs or conda env list
To list the packages in a given environment: conda list -n myenv
For more, see https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
In this next step we'll get TensorFlow, PyTorch, and a few other ML/DL frameworks up and running (with GPU support!). We create a directory where our projects will live, /home/data-janitor/Documents/work, and initialize a new environment:
conda create -n dev_env python=3.6 pip numpy scipy pandas matplotlib seaborn
Let those packages install and then type source activate dev_env or conda activate dev_env. Next we'll install TensorFlow in our dev_env:
pip3 install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.8.0-cp36-cp36m-linux_x86_64.whl
First, check that TensorFlow installed:
conda list | grep tensor
And then, using TensorFlow, give the ol' Hello World a whirl in tf:
$ python
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello)) # this should give back b'Hello, TensorFlow!'
>>> exit()
Deactivate this env. Create a new env for pytorch. And so on...
(pytorch_env) data-janitor@datajanitor-desktop:~/Documents/work$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.current_device()
0
>>> torch.cuda.device(0)
<torch.cuda.device object at 0x7f5acfe313c8>
>>> torch.cuda.get_device_name(0)
'GeForce GTX 1060 3GB'
>>> exit()
(pytorch_env) data-janitor@datajanitor-desktop:~/Documents/work$ source deactivate