How to set up an Nvidia Titan Xp for deep learning on a MacBook Pro with Akitio Node + TensorFlow + Keras

This configuration worked for me; I hope it helps.

It is based on: https://becominghuman.ai/deep-learning-gaming-build-with-nvidia-titan-xp-and-macbook-pro-with-thunderbolt2-5ceee7167f8b

and on: https://stackoverflow.com/questions/44744737/tensorflow-mac-os-gpu-support

Hardware

Software versions

  • macOS Sierra Version 10.12.6
  • GPU Driver Version: 10.18.5 (378.05.05.25f01)
  • CUDA Driver Version: 8.0.61
  • cuDNN v5.1 (Jan 20, 2017), for CUDA 8.0: Need to register and download
  • tensorflow-gpu 1.0.0
  • Keras 2.0.8
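
Since the setup depends on these exact versions, it can help to compare what is installed against the list above. The following is only a sketch: the dictionary keys and the `version_mismatches` helper are illustrative names, and how you collect the installed versions is up to you.

```python
# Versions pinned in the list above; the dictionary keys are illustrative names.
PINNED = {
    "tensorflow-gpu": "1.0.0",
    "keras": "2.0.8",
    "cuda": "8.0.61",
    "cudnn": "5.1",
}


def version_mismatches(installed, pinned=PINNED):
    """Return {name: (installed, pinned)} for every entry that differs or is missing."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in pinned.items()
        if installed.get(name) != wanted
    }


if __name__ == "__main__":
    # Hypothetical example: everything matches except Keras.
    found = {"tensorflow-gpu": "1.0.0", "keras": "2.1.0", "cuda": "8.0.61", "cudnn": "5.1"}
    print(version_mismatches(found))
```

An empty result means every pinned version matches.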

Procedure:

Install GPU driver

  1. Shut down your system, then power it up again while holding the ⌘ and R keys until it boots into Recovery Mode.
  2. From the menu bar, click Utilities > Terminal, type 'csrutil disable; reboot', and press Enter to run the command.
  3. When your Mac has restarted, run this command in Terminal:
cd ~/Desktop; git clone https://github.com/goalque/automate-eGPU.git; chmod +x ~/Desktop/automate-eGPU/automate-eGPU.sh; sudo ~/Desktop/automate-eGPU/./automate-eGPU.sh
  4. Unplug your eGPU from your Mac and restart. This is important: if you do not unplug your eGPU, you may end up with a black screen after restarting.
  5. When your Mac has restarted, open Terminal and run this command: sudo ~/Desktop/automate-eGPU/./automate-eGPU.sh -a
  6. Plug your eGPU into your Mac via Thunderbolt 2.
  7. Restart your Mac.
  8. Go to About This Mac > System Report > Graphics/Displays; you should see the Nvidia card with the correct model.
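
The last check can also be done from a script. This is a rough sketch that assumes the System Report data is available through `system_profiler SPDisplaysDataType` and that each card appears on a "Chipset Model:" line; the helper names are made up for illustration, and the exact label your card reports may differ.

```python
import subprocess


def chipset_models(profiler_text):
    """Extract the values of all 'Chipset Model:' lines from system_profiler output."""
    models = []
    for line in profiler_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "Chipset Model":
            models.append(value.strip())
    return models


def nvidia_models(profiler_text):
    """Keep only the chipset models whose name mentions NVIDIA (case-insensitive)."""
    return [m for m in chipset_models(profiler_text) if "nvidia" in m.lower()]


def read_display_report():
    """Run system_profiler (macOS only) and return its text output."""
    return subprocess.check_output(
        ["system_profiler", "SPDisplaysDataType"], universal_newlines=True
    )


if __name__ == "__main__":
    try:
        print(nvidia_models(read_display_report()))
    except (OSError, subprocess.CalledProcessError) as err:
        print("Could not run system_profiler:", err)
```

An empty list suggests the card was not picked up and the steps above need to be repeated.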

Install CUDA, cuDNN, Tensorflow and Keras

At this moment, Keras 2.0.8 needs TensorFlow 1.0.0. tensorflow-gpu 1.0.0 needs CUDA 8.0, and cuDNN v5.1 is the version that worked for me. I tried other combinations, but they did not seem to work.

  1. Download CUDA Toolkit 8.0 GA2 (Feb 2017).
  2. Install it, following the installer's instructions.
  3. Set the environment variables. Open your bash profile in an editor:
vim ~/.bash_profile
and add these lines:
export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH="$CUDA_HOME/lib:$CUDA_HOME:$CUDA_HOME/extras/CUPTI/lib"
export LD_LIBRARY_PATH=$DYLD_LIBRARY_PATH

(If your .bash_profile does not exist, create it. It is executed every time you open a terminal window.)

  4. Download and install cuDNN (cudnn-8.0-osx-x64-v5.1). You need to register before downloading it.
  5. Copy the cuDNN files into the CUDA directory:

cd ~/Downloads/cuda
sudo cp include/* /usr/local/cuda/include/
sudo cp lib/* /usr/local/cuda/lib/
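
Before moving on, you may want to confirm that the environment variables and the copied cuDNN files are in place. This is a minimal sketch, assuming the header is named `cudnn.h` and the library `libcudnn.5.dylib` (the latter is the name that appears in the TensorFlow log further down); both helper functions are illustrative, not part of any tool.

```python
import os


def expected_library_path(cuda_home="/usr/local/cuda"):
    """Build the DYLD_LIBRARY_PATH value exported in step 3."""
    return ":".join([
        cuda_home + "/lib",
        cuda_home,
        cuda_home + "/extras/CUPTI/lib",
    ])


def missing_cudnn_files(cuda_home="/usr/local/cuda"):
    """Return the expected cuDNN files that are not present under cuda_home."""
    expected = [
        os.path.join(cuda_home, "include", "cudnn.h"),        # assumed header name
        os.path.join(cuda_home, "lib", "libcudnn.5.dylib"),   # name seen in the TF log
    ]
    return [path for path in expected if not os.path.exists(path)]


if __name__ == "__main__":
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    matches = os.environ.get("DYLD_LIBRARY_PATH") == expected_library_path(cuda_home)
    print("DYLD_LIBRARY_PATH matches:", matches)
    print("Missing cuDNN files:", missing_cudnn_files(cuda_home))
```

Run it in a fresh terminal window so that the updated bash profile has been loaded.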
  6. Create a conda environment and install TensorFlow:
conda create -n egpu python=3
source activate egpu
pip install tensorflow-gpu==1.0.0
  7. Verify that it works

Run the following script:

import tensorflow as tf

# Pin the ops to the first GPU; this raises an error if no GPU is registered.
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# Run the graph and print the 2x2 product.
with tf.Session() as sess:
    print(sess.run(c))
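
To know what the script should print, the same product can be worked out without TensorFlow. The sketch below is plain Python, with a small `matmul` helper written just for this check; TensorFlow formats its output differently, but the four values should be the same.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # shape [2, 3], as in the script
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape [3, 2], as in the script
print(matmul(a, b))  # [[22.0, 28.0], [49.0, 64.0]]
```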
  8. Install Keras in the environment and set TensorFlow as the backend:
pip install --upgrade --no-deps keras # --no-deps prevents pip from installing the tensorflow dependency
KERAS_BACKEND=tensorflow python -c "from keras import backend"
Output:
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.8.0.dylib locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.8.0.dylib locally
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcuda.1.dylib. LD_LIBRARY_PATH: /usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/extras/CUPTI/lib
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.dylib locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.8.0.dylib locally
  9. Log after importing Keras in a Jupyter Notebook:
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.8.0.dylib locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.8.0.dylib locally
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcuda.1.dylib. LD_LIBRARY_PATH: /usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/extras/CUPTI/lib
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.dylib locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.8.0.dylib locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: TITAN Xp
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:c3:00.0
Total memory: 12.00GiB
Free memory: 11.79GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN Xp, pci bus id: 0000:c3:00.0)
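
The line to look for in the log above is the final "Creating TensorFlow device" entry. If you capture the log to a file or string, a small parser can check for it. This is a sketch only: the regular expression is written against the single log line shown here, so treat the pattern and the `gpu_devices` helper as assumptions that other TensorFlow versions may not match.

```python
import re

# Matches the "Creating TensorFlow device" line shown in the log above.
DEVICE_LINE = re.compile(
    r"Creating TensorFlow device \((?P<tf_device>/gpu:\d+)\) -> "
    r"\(device: (?P<index>\d+), name: (?P<name>[^,]+), pci bus id: (?P<bus>[^)]+)\)"
)


def gpu_devices(log_text):
    """Return one dict per GPU device that TensorFlow reported creating."""
    return [m.groupdict() for m in DEVICE_LINE.finditer(log_text)]


if __name__ == "__main__":
    line = (
        "I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] "
        "Creating TensorFlow device (/gpu:0) -> "
        "(device: 0, name: TITAN Xp, pci bus id: 0000:c3:00.0)"
    )
    print(gpu_devices(line))
```

An empty list means TensorFlow did not create a GPU device and is running on the CPU only.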
@geoHeil

geoHeil commented Oct 4, 2017

sudo ./automate-eGPU.sh

reboot

sudo ./automate-eGPU.sh -a

fails for me: I only see the generic 256 MB Nvidia chip model, and my 1080 Ti is not recognized.

@jganzabal
Author

What is the output of the second run, with the -a parameter?
Can you see this option?
[Screenshot, 2017-10-04 6:33:05 pm]

@geoHeil

geoHeil commented Oct 5, 2017

You are correct.
Differences are:

  • 1080 Ti
  • MacBook Pro 2017 (Touch Bar) 15
  • OS X build is 10.12.6 (16G29)
sudo ./automate-eGPU.sh
Password:
Hot-plug the Thunderbolt cable and run the script again.

i.e. I need to plug in the eGPU before I can run the script for the first time.
Then the output is:

% sudo ./automate-eGPU.sh
***      automate-eGPU.sh v1.0.1      ***
* (c) 2016, 2017 by Goalque & FricoRico *
*****************************************
Detected eGPU
 GP102 [GeForce GTX 1080 Ti]
Current OS X
 10.12.6 16G29
Previous OS X
 [not found]
Latest installed Nvidia web driver
 Version: 378.05.05.25f01
 Source: 3rd Party
 Install Date: 04.10.17, 20:25

Checking IOPCITunnelCompatible keys...

Searching for matching driver...

Driver [378.05.05.25f01] found from:
https://images.nvidia.com/mac/pkg/378/WebDriver-378.05.05.25f01.pkg
Do you want to download this driver (y/n)?
y
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 62.1M  100 62.1M    0     0  9039k      0  0:00:07  0:00:07 --:--:-- 9220k
Driver downloaded.
Removing validation checks...
Modified package ready. Do you want to install (y/n)?
y
installer: Package name is NVIDIA Web Driver 378.05.05.25f01
installer: Upgrading at base path /
installer: The upgrade was successful.
installer: The install requires restarting now.
Checking IOPCITunnelCompatible keys...

IOPCITunnelCompatible mods done.
Board-id added.
Rebuilding caches...
All ready. Please restart the Mac.
    • When unplugging I get a kernel freeze; rebooting without the card connected is just fine.
  1. Rerunning with -a gives the following output:
 sudo ./automate-eGPU.sh -a
Password:
automate-eGPU-daemon launched.
Background services enabled.
  2. then plugging in the GPU and rebooting, but the card is not identified correctly
  3. uninstall via
sudo ./automate-eGPU.sh -uninstall
Background services unloaded.
Rebuilding caches...
Automate-eGPU uninstall ready.

After rebooting I usually see this window:
[image: 01_gpu_missing]
When the GPU is plugged in, this window does not show up, even though only the generic model is recognized.

As you outlined, the NVIDIA Web Driver is already selected:
[image: 02_gpu]

Here is a screenshot of the GPU:
[image: banners_and_alerts_und_macbook_pro]

and of the Thunderbolt device:
[image: macbook_pro_und_downloads]

@jganzabal
Author

Ohh! I think I see what one of the problems could be:
Your CUDA driver version should be 8.0.61 to work with TensorFlow and Keras.
Anyway, this has nothing to do with the fact that you got "No GPU detected".
I had that problem at one point, and I believe it was fixed when I rebooted my system after the -a command with the eGPU plugged in.
Before rebooting, make sure you have the NVIDIA Web Driver selected and not the OS X default.

Hope it helps!

@jganzabal
Author

jganzabal commented Oct 5, 2017

Can you also check that your NVIDIA Web Driver version is this:
[Screenshot, 2017-10-05 2:01:12 pm]

@geoHeil

geoHeil commented Oct 8, 2017

@jganzabal: the driver seems to be the same.

Driver [378.05.05.25f01] found from:
https://images.nvidia.com/mac/pkg/378/WebDriver-378.05.05.25f01.pkg

@j8

j8 commented Nov 13, 2017

@geoHeil I have the same configuration as you. Any progress on this?

@jganzabal
Author

jganzabal commented Nov 19, 2017

@geoHeil, what I realized is that this won't work if you turn on your computer with the eGPU plugged in.
So what I do to make it work is turn the Mac on, plug in the eGPU (Thunderbolt cable), and restart the Mac (just a restart, not a shutdown).
Yes, it is strange, but it seems that when the eGPU is plugged in, my Mac tries to boot with it or something like that.
Hope it helps

@helloniklas

So has anyone got this working on macOS 10.13.2?

It fails to detect a CUDA device. I've got the eGPU up and running fine, but I suspect this is because NVIDIA specifies macOS 10.12 for its CUDA 8 driver, and only CUDA 9 works with macOS 10.13.2, which in turn won't work with TensorFlow or MXNet since they require CUDA 8.

@climberjase

climberjase commented Jan 14, 2018

I did get this running. You have to install the latest service patch for OS X from the Apple website, then install the most recent 378.10.10.10.25.104 CUDA drivers from NVIDIA.

[Screenshot, 2018-01-13 11:27:03 pm]

After I did this, the web drivers installed correctly. I made these changes with csrutil disable set in Recovery mode.

I have managed to get a signal from the HDMI output on my GTX 1080 and Mantiz Venus to my TV, but the video quality is bad and the drivers are flaky for video on OS X.

However, for the purposes of the current question, I was able to get the CUDA drivers working well on an eGPU with a late-2016 MacBook Pro with Touch Bar. Now on to building a GPU-enabled TensorFlow 1.2 from source.

Helpful resources:
https://devtalk.nvidia.com/default/topic/1025945/mac-cuda-9-0-driver-fully-compatible-with-macos-high-sierra-10-13-error-quot-update-required-quot-solved-/

@minakhan01

InvalidArgumentError (see above for traceback): Cannot assign a device to node 'MatMul': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0

Did anyone get this error?

Would really appreciate your help :)

@hengyuan-hu

Hey, thank you for sharing this. Have you done any speed benchmarks against a desktop with the same graphics card? That would be very valuable.

@awwong1

awwong1 commented Apr 12, 2018

Thanks for the share! Steps worked okay for me. Minor issues with setting up Tensorflow (I originally tried to build 1.6 from source, but am running into a host of GCC/Clang problems due to OSX quirks).

Do you know if you can re-enable csrutil and still use your GPU after the initial setup process?

@raeidsaqur

Did anyone succeed in setting up a Titan Xp with macOS High Sierra (i.e. 10.13 or later)? It seems that the web drivers don’t detect the GPU (Akitio Node).
