mangreen/Build tensorflow on OSX with NVIDIA CUDA support.md

## Build tensorflow on OSX with NVIDIA CUDA support.md

      
Build tensorflow on OSX with NVIDIA CUDA support (GPU acceleration)

These instructions are based on Mistobaan's gist but expanded and updated to work
with the latest tensorflow OSX CUDA PR.
Requirements

OS X 10.10 (Yosemite) or newer

I tested these instructions on OS X v10.10.5. They will probably work on
OS X v10.11 (El Capitan), too.
Xcode Command-Line Tools

These instructions assume you have Xcode installed and your machine is already set up
to compile C/C++ code.
If not, simply type gcc into a terminal and it will prompt you to download and
install the Xcode Command-Line Tools.
homebrew

To compile tensorflow on OS X, you need several dependent libraries. The easiest way to
get them is to install them with the homebrew package manager.
If you don't already have brew installed, you can install it like this:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

If you don't want to blindly run a Ruby script loaded from the internet, Homebrew
offers alternate install options.
coreutils, swig, bazel

First, make sure you have brew up to date with the latest available packages:
brew update
brew upgrade
Then install these tools:
brew install coreutils
brew install swig
brew install bazel
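Before moving on, it may help to confirm that the tools actually ended up on your PATH. The following is only a minimal sketch and not part of the original instructions: the helper name `missing_tools` is made up for illustration, and it assumes a Python 3.3+ interpreter is available (for `shutil.which`).

```python
import shutil


def missing_tools(names):
    """Return the tool names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # gcc comes from the Xcode Command-Line Tools; swig and bazel from brew.
    # coreutils is a package of many commands, so it is not checked by name.
    missing = missing_tools(["gcc", "swig", "bazel"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All tools found.")
```

If anything is reported as missing, re-run the matching `brew install` command above before continuing.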
Check the version to make sure you installed bazel 0.1.4 or greater.
bazel 0.1.3 or below will fail when building tensorflow.
$ bazel version

Build label: 0.1.4-homebrew
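If you would rather check the version in a script than by eye, something like the sketch below could work. It assumes the `Build label:` line looks like the sample output above; the function names are hypothetical, and the minimum version is simply the 0.1.4 stated in this guide.

```python
import re

MINIMUM_BAZEL = (0, 1, 4)  # minimum version stated in this guide


def parse_build_label(text):
    """Extract (major, minor, patch) from the 'Build label:' line, or None."""
    match = re.search(r"Build label:\s*(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def bazel_is_new_enough(text, minimum=MINIMUM_BAZEL):
    """True when a version was found and it is at least the minimum."""
    version = parse_build_label(text)
    return version is not None and version >= minimum


if __name__ == "__main__":
    sample = "Build label: 0.1.4-homebrew"  # sample line from the output above
    print(parse_build_label(sample), bazel_is_new_enough(sample))
```

To use it for real, pass in the text printed by `bazel version` instead of the sample string.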
NVIDIA's CUDA libraries

CUDA is also installed through brew:
brew cask install cuda
Check the version to make sure you installed CUDA 7.5. Older versions will fail.
$ brew cask info cuda

cuda: 7.5.20
Nvidia CUDA
NVIDIA's cuDNN library

NVIDIA requires you to sign up and be approved before you can download this.
First, go sign up here:
https://developer.nvidia.com/accelerated-computing-developer
When you sign up, make sure you provide accurate information. A human at NVIDIA will
review your application. If it's a business day, hopefully you'll get approved quickly.
Then go here to download cuDNN:
https://developer.nvidia.com/cudnn
Click 'Download' to fill out their survey and agree to their Terms.
Finally, you'll see the download options.
However, you'll only see download options for cuDNN v4 and cuDNN v3. You'll want to
scroll to the very bottom and click "Archived cuDNN Releases".
This will take you to this page where you can download cuDNN v2:
https://developer.nvidia.com/rdp/cudnn-archive
On that page, download "cuDNN v2 Library for OSX".
Next, you need to manually install it by copying over some files:
tar zxvf ~/Downloads/cudnn-6.5-osx-v2.tar.gz
sudo cp ./cudnn-6.5-osx-v2/cudnn.h /usr/local/cuda/include/
sudo cp ./cudnn-6.5-osx-v2/libcudnn* /usr/local/cuda/lib/
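A quick way to confirm the copy worked is to look for the header and at least one `libcudnn*` file under the CUDA directory. This is a rough sketch, assuming the default `/usr/local/cuda` location used in the commands above; `missing_cudnn_files` is a hypothetical helper name.

```python
import glob
import os


def missing_cudnn_files(cuda_root="/usr/local/cuda"):
    """Return the cuDNN paths or patterns that were not found under cuda_root."""
    missing = []
    header = os.path.join(cuda_root, "include", "cudnn.h")
    if not os.path.isfile(header):
        missing.append(header)
    lib_pattern = os.path.join(cuda_root, "lib", "libcudnn*")
    if not glob.glob(lib_pattern):
        missing.append(lib_pattern)
    return missing


if __name__ == "__main__":
    problems = missing_cudnn_files()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("cuDNN files look present.")
```

An empty result only means the files exist; it does not prove they are the right cuDNN version.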

Finally, you need to make sure the library is in your library load path.
Edit your ~/.bash_profile file and add this line at the bottom:
export DYLD_LIBRARY_PATH="/usr/local/cuda/lib":$DYLD_LIBRARY_PATH

After that, close and reopen your terminal window to apply the change.
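After reopening the terminal, you may want to double-check that the CUDA lib directory really is in the variable. The sketch below is one possible check, not something from the original instructions; `dir_in_path_var` is a hypothetical helper that just splits the value on colons.

```python
import os


def dir_in_path_var(path_value, directory):
    """True if directory is one of the colon-separated entries in path_value."""
    wanted = os.path.normpath(directory)
    entries = [entry for entry in path_value.split(":") if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)


if __name__ == "__main__":
    # Reads the variable as seen by this Python process.
    current = os.environ.get("DYLD_LIBRARY_PATH", "")
    print(dir_in_path_var(current, "/usr/local/cuda/lib"))
```

If this prints `False`, check that the `export` line is in `~/.bash_profile` and that you opened a fresh terminal window.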
Checkout tensorflow

Since OS X CUDA support is still an unmerged pull request (#664), you need to check
out that specific branch:
git clone --recurse-submodules https://github.com/tensorflow/tensorflow
cd tensorflow
git fetch origin pull/664/head:cuda_osx
git checkout cuda_osx

Look up your NVIDIA card's Compute Capability on the CUDA website

Before you start, open up System Report in OSX:
Apple Menu > About this Mac > System Report...

In System Report, click on "Graphics/Displays" and find the exact model of
your NVIDIA card:
NVIDIA GeForce GT 650M:

  Chipset Model:	NVIDIA GeForce GT 650M

Then go to https://developer.nvidia.com/cuda-gpus and find that exact model
name in the list:
 CUDA-Enabled GeForce Products > GeForce GT 650M

There it will list the Compute Capability for your card. For the GeForce GT 650M
used in mid-2012 Retina MacBook Pros, it is 3.0. Write this down, as you will
need this number in the next step.
Configure and Build tensorflow

You will first need to configure the tensorflow build options:
TF_UNOFFICIAL_SETTING=1 ./configure

During the config process, it will ask you a series of questions. You can use
the answers below, but make sure to enter the Compute Capability you looked up
for your NVIDIA card in the previous step:
WARNING: You are configuring unofficial settings in TensorFlow. Because some external libraries are not backward compatible, these settings are largely untested and unsupported.

Please specify the location of python. [Default is /usr/bin/python]:
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify the Cuda SDK version you want to use. [Default is 7.0]: 7.5
Please specify the location where CUDA 7.5 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Default is 6.5]:
Please specify the location where cuDNN 6.5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 3.0
Setting up Cuda include
Setting up Cuda lib
Setting up Cuda bin
Setting up Cuda nvvm
Configuration finished
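The compute capability prompt expects a comma-separated list such as `3.0` or `3.5,5.2`. If you want to sanity-check your answer before typing it in, a small parser like the one below can catch typos. This is only an illustrative sketch: `parse_compute_capabilities` is a hypothetical name and is not part of the configure script.

```python
import re


def parse_compute_capabilities(answer):
    """Parse an answer such as "3.5,5.2" into a list of (major, minor) tuples.

    Raises ValueError if any entry is not of the form <digits>.<digits>.
    """
    capabilities = []
    for entry in answer.split(","):
        entry = entry.strip()
        match = re.match(r"^(\d+)\.(\d+)$", entry)
        if match is None:
            raise ValueError("not a compute capability: %r" % entry)
        capabilities.append((int(match.group(1)), int(match.group(2))))
    return capabilities


if __name__ == "__main__":
    print(parse_compute_capabilities("3.0"))      # the GT 650M answer used above
    print(parse_compute_capabilities("3.5,5.2"))  # the configure default
```

The major and minor numbers should match the `major:` and `minor:` values that tensorflow reports for your device when it starts up.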

Now you can actually build and install tensorflow!
bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-0.6.0-py2-none-any.whl
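The wheel filename includes the tensorflow version and Python tag, so it may not be exactly `tensorflow-0.6.0-py2-none-any.whl` on your machine. A sketch like the following lists whatever wheel was actually produced; `find_wheels` is a hypothetical helper, and the default directory is the `/tmp/tensorflow_pkg` path used above.

```python
import glob
import os


def find_wheels(directory="/tmp/tensorflow_pkg"):
    """Return paths of all tensorflow wheel files in directory, sorted by name."""
    pattern = os.path.join(directory, "tensorflow-*.whl")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Print a ready-to-copy install command for each wheel that was found.
    for wheel in find_wheels():
        print("pip install " + wheel)
```

If nothing is printed, the `build_pip_package` step probably did not finish, so scroll back through its output for errors.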

Verify Installation

You need to exit the tensorflow build folder to test your installation.
cd ~

Now, run python and paste in this test script:
import tensorflow as tf

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

# Runs the op.
print(sess.run(c))
You should get output that looks something like this:
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.7.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.6.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.7.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.dylib locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.7.5.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GT 650M
major: 3 minor: 0 memoryClockRate (GHz) 0.9
pciBusID 0000:01:00.0
Total memory: 1023.69MiB
Free memory: 452.21MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:705] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.00MiB
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0
I tensorflow/core/common_runtime/direct_session.cc:142] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0
b: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:304] b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:304] a: /job:localhost/replica:0/task:0/gpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:304] MatMul: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:73] Allocating 252.21MiB bytes.
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:83] GPU 0 memory begins at 0x700a80000 extends to 0x7106b6000

[[ 22.  28.]
 [ 49.  64.]]
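If you want to convince yourself that the final matrix is correct, you can work out the same product by hand. The sketch below does the multiplication in plain Python with no tensorflow involved; `matmul` here is a throwaway helper written for illustration, not the tensorflow op.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    columns = len(b[0])
    return [
        [sum(a_row[k] * b[k][j] for k in range(inner)) for j in range(columns)]
        for a_row in a
    ]


# The same six values as the test script, laid out row by row.
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # shape [2, 3]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape [3, 2]

print(matmul(a, b))  # [[22.0, 28.0], [49.0, 64.0]]
```

For example, the top-left entry is 1*1 + 2*3 + 3*5 = 22, which matches the GPU result.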

Yay! Now you can train your models using a GPU!
If you are using a Retina MacBook Pro with only a 1GB GeForce GT 650M, you
will probably run into out-of-memory errors with medium to large models. But at
least it will make small-scale experimentation faster.
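For longer logs, you may not want to read every placement line by eye. Assuming you have saved the log text from a run with `log_device_placement=True`, the sketch below pulls out the `op: device` lines in the format shown above and lists any op that did not land on a GPU. The function names are hypothetical and the line format is taken only from the sample output in this guide.

```python
import re

# Matches bare placement lines such as
# "MatMul: /job:localhost/replica:0/task:0/gpu:0".
# Lines with a log prefix ("I tensorflow/...") repeat the same information
# and are ignored.
PLACEMENT = re.compile(r"^([^\s:]+): (/job:\S+/(?:cpu|gpu):\d+)$")


def op_placements(log_text):
    """Return a dict mapping op name to the device string it was placed on."""
    placements = {}
    for line in log_text.splitlines():
        match = PLACEMENT.match(line.strip())
        if match:
            placements[match.group(1)] = match.group(2)
    return placements


def ops_not_on_gpu(log_text):
    """Return the sorted names of ops whose device string is not a GPU."""
    placements = op_placements(log_text)
    return sorted(op for op, device in placements.items() if "/gpu:" not in device)


if __name__ == "__main__":
    sample = (
        "b: /job:localhost/replica:0/task:0/gpu:0\n"
        "a: /job:localhost/replica:0/task:0/gpu:0\n"
        "MatMul: /job:localhost/replica:0/task:0/gpu:0\n"
    )
    print(ops_not_on_gpu(sample))  # [] because all three ops are on gpu:0
```

An empty list means every op in the log was placed on a GPU device.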