Install Tensorflow 1.7 on macOS High Sierra 10.13.3 with CUDA and stock python

Tensorflow 1.7 with CUDA on macOS High Sierra 10.13.3 and default python 2.7

Largely based on the Tensorflow 1.6 gist, this should hopefully simplify things a bit. Mixing homebrew python2/python3 with pip ends up being a mess, so here's an approach that uses the built-in python 2.7.

Requirements

  • NVIDIA Web-Drivers 387.10.10.10.25.156 for 10.13.3
  • CUDA-Drivers 387.178
  • CUDA 9.1 Toolkit
  • cuDNN 7.0.5 (latest release for macOS)
  • Python 2.7
  • XCode 8.3.2
  • bazel 0.10.0
  • Tensorflow 1.7

NVIDIA Graphics driver

Download and install from http://www.nvidia.com/download/driverResults.aspx/130460/en-us

NVIDIA Cuda driver

Download and install from http://www.nvidia.com/object/macosx-cuda-387.178-driver.html

Downgrade to XCode 8.3.2

I was able to compile all of it on XCode9, but tensorflow promptly segfaults if you actually try to do anything on the gpu. You may need a developer account to grab the old version https://developer.apple.com/download/more/

If you have a newer Xcode installed, rename Xcode.app to something like Xcode9.app. Then unpack Xcode 8.3.2 and switch the toolchain over to it:

sudo xcode-select -s /Applications/Xcode.app
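To confirm the switch took effect, a quick check along these lines may help. This is a minimal sketch, assuming `xcodebuild -version` prints a first line of the form `Xcode 8.3.2`; the helper name `xcode_version` is made up for this example.

```shell
# Extract the version number from `xcodebuild -version` output read on stdin.
# Assumes the first line looks like "Xcode 8.3.2".
xcode_version() {
  awk '/^Xcode / { print $2; exit }'
}

if command -v xcodebuild >/dev/null 2>&1; then
  xcode-select -p   # prints the active developer directory
  active="$(xcodebuild -version | xcode_version)"
  echo "Active Xcode: $active"
  [ "$active" = "8.3.2" ] || echo "warning: expected Xcode 8.3.2"
fi
```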

Install Bazel 0.10

Download the binary here

chmod 755 bazel-0.10.0-installer-darwin-x86_64.sh
./bazel-0.10.0-installer-darwin-x86_64.sh
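Since the rest of the build was only tested with bazel 0.10.0, it may be worth checking which version ends up on your PATH. A small sketch, assuming `bazel version` prints a `Build label:` line; the helper name `bazel_label` is hypothetical.

```shell
# Extract the release number from `bazel version` output read on stdin.
# Assumes a line of the form "Build label: 0.10.0".
bazel_label() {
  awk -F': ' '/^Build label:/ { print $2; exit }'
}

if command -v bazel >/dev/null 2>&1; then
  label="$(bazel version 2>/dev/null | bazel_label)"
  echo "bazel on PATH: ${label:-unknown}"
fi
```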

Install CUDA Toolkit 9.1

Download CUDA-9.1

It should be something along the lines of cuda_9.1.128_mac.dmg

Set up your env paths

Edit ~/.bash_profile and add the following:

export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib 
export LD_LIBRARY_PATH=$DYLD_LIBRARY_PATH
export PATH=$DYLD_LIBRARY_PATH:$PATH:/Developer/NVIDIA/CUDA-9.1/bin

You may have to run source ~/.bash_profile for the changes to take effect. Then verify that the LD paths are set:

source ~/.bash_profile
echo $LD_LIBRARY_PATH

pmalik@MacPro:~$ echo $LD_LIBRARY_PATH
/Users/pmalik/lib:/usr/local/opt/libomp/lib:/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib
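Rather than eyeballing the echo output, the check can be scripted. The following is a sketch only; `path_contains` is a hypothetical helper, and the two directories are the ones exported above.

```shell
# Return 0 if directory $2 is an entry in the colon-separated list $1.
path_contains() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report whether each CUDA library directory is present in LD_LIBRARY_PATH.
for dir in /usr/local/cuda/lib /usr/local/cuda/extras/CUPTI/lib; do
  if path_contains "${LD_LIBRARY_PATH:-}" "$dir"; then
    echo "ok: $dir"
  else
    echo "missing: $dir"
  fi
done
```

Wrapping the list in colons before matching avoids false positives on prefixes such as /usr/local/cuda/library.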

Compile Samples

We want to compile a CUDA sample to check that the GPU is correctly recognized and supported.

cd /Developer/NVIDIA/CUDA-9.1/samples
chown -R YOURUSERNAMEHERE *
make -C 1_Utilities/deviceQuery
/Developer/NVIDIA/CUDA-9.1/samples/bin/x86_64/darwin/release/deviceQuery

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1060 6GB"
  CUDA Driver Version / Runtime Version          9.1 / 9.1
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 6144 MBytes (6442254336 bytes)
  (10) Multiprocessors, (128) CUDA Cores/MP:     1280 CUDA Cores
  GPU Max Clock rate:                            1709 MHz (1.71 GHz)
  Memory Clock rate:                             4004 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 195 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
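The compute capability reported here is the value ./configure asks for later. As a sketch, it can be pulled out of the deviceQuery output like this; `compute_capabilities` is a made-up helper name, and the parsing assumes the line format shown above.

```shell
# Print the compute capability of every device found in deviceQuery
# output read on stdin, one per line.
compute_capabilities() {
  awk -F: '/CUDA Capability Major\/Minor version number/ {
    gsub(/[ \t]/, "", $2)   # strip the padding around the value
    print $2
  }'
}

DEVICE_QUERY=/Developer/NVIDIA/CUDA-9.1/samples/bin/x86_64/darwin/release/deviceQuery
if [ -x "$DEVICE_QUERY" ]; then
  # Join multiple devices with commas, the format ./configure expects.
  "$DEVICE_QUERY" | compute_capabilities | paste -sd, -
fi
```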

NVIDIA cuDNN - Deep Learning Primitives

If you haven't already, register at https://developer.nvidia.com/cudnn and download cuDNN 7.0.5.

Change into your download directory and follow the post installation steps.

tar -xzvf cudnn-9.1-osx-x64-v7-ga.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
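To double-check which cuDNN version actually landed in /usr/local/cuda, the version macros in the header can be read back. A sketch, assuming cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as the 7.x headers do; `cudnn_version` is a hypothetical helper.

```shell
# Print MAJOR.MINOR.PATCHLEVEL from the #define lines of the cudnn.h file in $1.
cudnn_version() {
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { if (major != "") print major "." minor "." patch }
  ' "$1"
}

header=/usr/local/cuda/include/cudnn.h
if [ -r "$header" ]; then
  echo "cuDNN header version: $(cudnn_version "$header")"
fi
```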

Install pip for python 2.7

Download get-pip and run it in python. More info here

python get-pip.py

If I remember correctly, pip will automatically install the tensorflow dependencies (wheel, six, etc.)

Clone TensorFlow from Repository

git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout v1.7.0

Apply Patch

Apply the following patch to fix a couple build issues:

git apply xtensorflow17macos.patch
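If you are unsure whether the patch matches your checkout, git can do a dry run first. A minimal sketch using `git apply --check`; the wrapper name `apply_patch_checked` is made up here.

```shell
# Apply the patch in $1 only if it applies cleanly to the current tree.
apply_patch_checked() {
  if ! git apply --check "$1"; then
    echo "patch does not apply cleanly: $1" >&2
    return 1
  fi
  git apply "$1"
}

# Usage, from inside the tensorflow checkout:
#   apply_patch_checked xtensorflow17macos.patch
```

Running it a second time fails at the check step, which is a cheap way to tell that the patch is already applied.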

Configure Build

Apart from CUDA support, the CUDA SDK version and the CUDA compute capabilities, I answered n to the optional features and otherwise accepted the defaults.

./configure
You have bazel 0.10.0 installed.
Please specify the location of python. [Default is /usr/bin/python]: 


Found possible Python library paths:
  /Library/Python/2.7/site-packages
Please input the desired Python library path to use.  Default is [/Library/Python/2.7/site-packages]

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [y/N]: n
No Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.1


Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 


Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]6.1


Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 


Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
	--config=mkl         	# Build with MKL support.
	--config=monolithic  	# Config for mostly static monolithic build.
Configuration finished

Build Process

Takes about 20 minutes on my machine

bazel build --config=cuda --config=opt --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package

Create wheel file and install it

bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/
pip install ~/tensorflow-1.7.0-cp27-cp27m-macosx_10_13_intel.whl

It's useful to leave the .whl file lying around in case you want to install it for another environment.
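The platform tag in the wheel's filename can differ from the one shown above depending on the python and macOS in use, so it may be easier to look the file up than to type the name. A sketch; `find_tf_wheel` is a hypothetical helper that prints the first matching wheel in a directory.

```shell
# Print the first tensorflow wheel found in directory $1 (default: $HOME).
find_tf_wheel() {
  for f in "${1:-$HOME}"/tensorflow-*.whl; do
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done | head -n 1
}

wheel="$(find_tf_wheel "${HOME:-.}")"
if [ -n "$wheel" ]; then
  echo "Found wheel: $wheel"
  # pip install "$wheel"
fi
```

The pip line is left commented out so the snippet only reports what it would install.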

Test Installation

See if everything got linked correctly

>>> import tensorflow as tf
>>> tf.Session()
2018-04-05 23:04:20.457912: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2018-04-05 23:04:20.458122: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:05:00.0
totalMemory: 4.00GiB freeMemory: 2.75GiB
2018-04-05 23:04:20.458143: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-05 23:04:20.821699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-05 23:04:20.821728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2018-04-05 23:04:20.821736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2018-04-05 23:04:20.821856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2467 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
<tensorflow.python.client.session.Session object at 0x10e186990>
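The same check can be run non-interactively. This is a sketch that wraps a short python snippet in a shell function; it assumes the TensorFlow 1.x call `tf.test.is_gpu_available()`, and the `PYTHON` variable and the function name `tf_gpu_check` are my own additions.

```shell
# Run a short TensorFlow 1.x check with the interpreter named in $PYTHON
# (default: python). Prints the version and whether a GPU is usable.
tf_gpu_check() {
  "${PYTHON:-python}" -c '
import tensorflow as tf
print("tensorflow " + tf.__version__)
print("gpu available: " + str(tf.test.is_gpu_available()))
'
}

# Usage:
#   tf_gpu_check
#   PYTHON=/usr/bin/python tf_gpu_check
```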

Test GPU Acceleration

pip install keras
git clone https://github.com/fchollet/keras.git
cd keras/examples
python mnist_cnn.py
Using TensorFlow backend.
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2018-04-05 22:38:30.156464: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2018-04-05 22:38:30.156645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:05:00.0
totalMemory: 4.00GiB freeMemory: 2.98GiB
2018-04-05 22:38:30.156672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-05 22:38:30.519346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-05 22:38:30.519376: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2018-04-05 22:38:30.519383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2018-04-05 22:38:30.519499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2697 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
2018-04-05 22:38:30.649987: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
2018-04-05 22:38:30.693399: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
2018-04-05 22:38:30.761824: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
59648/60000 [============================>.] - ETA: 0s - loss: 0.2698 - acc: 0.91682018-04-05 22:38:42.071923: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered

You can use cuda-smi to watch GPU memory usage. In the case of the mnist example in keras, you should see the free memory drop to maybe 2% and the fans spin up. I'm not quite sure what the grappler/clusters/utils.cc:127 message is about, however.

pmalik@MacPro:~/cuda-smi$ ./cuda-smi 
Device 0 [PCIe 0:5:0.0]: GeForce GTX 1050 Ti (CC 6.1): 2901.6 of 4095.8 MB (i.e. 70.8%) Free
pmalik@MacPro:~/cuda-smi$ ./cuda-smi 
Device 0 [PCIe 0:5:0.0]: GeForce GTX 1050 Ti (CC 6.1): 2893.1 of 4095.8 MB (i.e. 70.6%) Free
pmalik@MacPro:~/cuda-smi$ ./cuda-smi 
Device 0 [PCIe 0:5:0.0]: GeForce GTX 1050 Ti (CC 6.1): 223.86 of 4095.8 MB (i.e. 5.47%) Free
pmalik@MacPro:~/cuda-smi$ ./cuda-smi 
Device 0 [PCIe 0:5:0.0]: GeForce GTX 1050 Ti (CC 6.1): 97.852 of 4095.8 MB (i.e. 2.39%) Free
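For watching the numbers while training, the free percentage can be extracted from each line. A rough sketch, assuming cuda-smi keeps the line format shown above; `gpu_free_percent` is a hypothetical helper.

```shell
# Extract the "percent free" figure from cuda-smi output lines on stdin.
gpu_free_percent() {
  sed -n 's/.*(i\.e\. \([0-9.]*\)%) Free.*/\1/p'
}

# Usage, from the cuda-smi directory (prints one value per second):
#   while true; do ./cuda-smi | gpu_free_percent; sleep 1; done
```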

Tested on a 2010 Mac Pro (Mid 2010) 10.13.3 (17D47) 2 x 2.93 GHz 6-Core Intel Xeon and NVIDIA GeForce GTX 1050 Ti 4 GB

Misc

If you'd like to build tensorflow with OpenMP (multi-cpu support), grab the OpenMP library via homebrew

brew install cliutils/apple/libomp

and uncomment the -lgomp line in third_party/gpus/cuda/BUILD.tpl

You can also build the binary for your specific CPU architecture by adding --copt=-march=native to the build command:

bazel build --config=cuda  --config=opt --copt=-march=native --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package

You can run this command to see what instruction sets are getting built

echo | clang -E - -march=native -###
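The raw output of that command is one long compiler invocation. As a sketch, the enabled features can be listed one per line, assuming clang prints them as quoted `"+name"` tokens after `"-target-feature"`; the helper name `enabled_features` is made up.

```shell
# List the CPU features clang enables, one per line, from the `-###`
# command line read on stdin. Disabled features ("-name") are skipped.
enabled_features() {
  tr ' ' '\n' | sed -n 's/^"+\(.*\)"$/\1/p'
}

if command -v clang >/dev/null 2>&1; then
  # -### writes the command line to stderr, so redirect it into the pipe.
  echo | clang -E - -march=native -### 2>&1 | enabled_features
fi
```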

xtensorflow17macos.patch

diff --git a/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc b/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
index 0f7adaf24a..934ccbada6 100644
--- a/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
+++ b/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
@@ -69,7 +69,7 @@ __global__ void concat_variable_kernel(
IntType num_inputs = input_ptr_data.size;
// verbose declaration needed due to template
- extern __shared__ __align__(sizeof(T)) unsigned char smem[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char smem[];
IntType* smem_col_scan = reinterpret_cast<IntType*>(smem);
if (useSmem) {
diff --git a/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc b/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
index 94989089ec..1d26d4bacb 100644
--- a/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
+++ b/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
@@ -172,7 +172,7 @@ __global__ __launch_bounds__(1024, 2) void DepthwiseConv2dGPUKernelNHWCSmall(
const DepthwiseArgs args, const T* input, const T* filter, T* output) {
assert(CanLaunchDepthwiseConv2dGPUSmall(args));
// Holds block plus halo and filter data for blockDim.x depths.
- extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
T* const shared_data = reinterpret_cast<T*>(shared_memory);
const int num_batches = args.batch;
@@ -452,7 +452,7 @@ __global__ __launch_bounds__(1024, 2) void DepthwiseConv2dGPUKernelNCHWSmall(
const DepthwiseArgs args, const T* input, const T* filter, T* output) {
assert(CanLaunchDepthwiseConv2dGPUSmall(args));
// Holds block plus halo and filter data for blockDim.z depths.
- extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
T* const shared_data = reinterpret_cast<T*>(shared_memory);
const int num_batches = args.batch;
@@ -1118,7 +1118,7 @@ __launch_bounds__(1024, 2) void DepthwiseConv2dBackpropFilterGPUKernelNHWCSmall(
const DepthwiseArgs args, const T* output, const T* input, T* filter) {
assert(CanLaunchDepthwiseConv2dBackpropFilterGPUSmall(args, blockDim.z));
// Holds block plus halo and filter data for blockDim.x depths.
- extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
T* const shared_data = reinterpret_cast<T*>(shared_memory);
const int num_batches = args.batch;
@@ -1388,7 +1388,7 @@ __launch_bounds__(1024, 2) void DepthwiseConv2dBackpropFilterGPUKernelNCHWSmall(
const DepthwiseArgs args, const T* output, const T* input, T* filter) {
assert(CanLaunchDepthwiseConv2dBackpropFilterGPUSmall(args, blockDim.x));
// Holds block plus halo and filter data for blockDim.z depths.
- extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
T* const shared_data = reinterpret_cast<T*>(shared_memory);
const int num_batches = args.batch;
diff --git a/tensorflow/core/kernels/split_lib_gpu.cu.cc b/tensorflow/core/kernels/split_lib_gpu.cu.cc
index 393818730b..58a1294005 100644
--- a/tensorflow/core/kernels/split_lib_gpu.cu.cc
+++ b/tensorflow/core/kernels/split_lib_gpu.cu.cc
@@ -121,7 +121,7 @@ __global__ void split_v_kernel(const T* input_ptr,
int num_outputs = output_ptr_data.size;
// verbose declaration needed due to template
- extern __shared__ __align__(sizeof(T)) unsigned char smem[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char smem[];
IntType* smem_col_scan = reinterpret_cast<IntType*>(smem);
if (useSmem) {
diff --git a/tensorflow/workspace.bzl b/tensorflow/workspace.bzl
index 0ce5cda517..d4dc2235ac 100644
--- a/tensorflow/workspace.bzl
+++ b/tensorflow/workspace.bzl
@@ -361,11 +361,11 @@ def tf_workspace(path_prefix="", tf_repo_name=""):
tf_http_archive(
name = "protobuf_archive",
urls = [
- "https://mirror.bazel.build/github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
- "https://github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
+ "https://mirror.bazel.build/github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz",
+ "https://github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz",
],
- sha256 = "846d907acf472ae233ec0882ef3a2d24edbbe834b80c305e867ac65a1f2c59e3",
- strip_prefix = "protobuf-396336eb961b75f03b25824fe86cf6490fb75e3a",
+ sha256 = "eb16b33431b91fe8cee479575cee8de202f3626aaf00d9bf1783c6e62b4ffbc7",
+ strip_prefix = "protobuf-50f552646ba1de79e07562b41f3999fe036b4fd0",
)
# We need to import the protobuf library under the names com_google_protobuf
diff --git a/third_party/gpus/cuda/BUILD.tpl b/third_party/gpus/cuda/BUILD.tpl
index 2a37c65bc7..43446dd99b 100644
--- a/third_party/gpus/cuda/BUILD.tpl
+++ b/third_party/gpus/cuda/BUILD.tpl
@@ -110,7 +110,7 @@ cc_library(
".",
"cuda/include",
],
- linkopts = ["-lgomp"],
+ #linkopts = ["-lgomp"],
linkstatic = 1,
visibility = ["//visibility:public"],
)
@yunhwankim2

Thank you for your tutorial. I successfully built and installed TF 1.7 following your guide.
One thing I see is the message, "Not found: TF GPU device with id 0 was not registered."
I got same message when I test my installation, but I didn't see the message in previous version of TF.
Can I just ignore the message? Or can you suggest additional guide to get rid of it?
Thank you.

@doncristobal

@ZexuanTHU I had the same issue as you (missing libcudart.xy.dylib, don't remember the version number).
After trying out a lot of things, I could solve the problem by disabling SIP on the Mac. See for example https://www.howtogeek.com/230424/how-to-disable-system-integrity-protection-on-a-mac-and-why-you-shouldnt/

After disabling SIP, tensorflow compiled successfully.

@pavelmalik
Author

pavelmalik commented Apr 24, 2018

@ZexuanTHU It would seem you're missing the libcuda env path - it's complaining about not being able to find the cuda library. Try running source ~/.bash_profile and then echo $LD_LIBRARY_PATH

source ~/.bash_profile
echo $LD_LIBRARY_PATH

pmalik@MacPro:~$ echo $LD_LIBRARY_PATH
/Users/pmalik/lib:/usr/local/opt/libomp/lib:/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib

Here's where the libraries live on my box:

pmalik@MacPro:/usr/local/cuda/lib$ pwd
/usr/local/cuda/lib
pmalik@MacPro:/usr/local/cuda/lib$ ls libcudart*
libcudart.9.1.dylib libcudart.dylib libcudart_static.a

You shouldn't need to disable SIP at all; everything builds on my machine with SIP in place.

@pavelmalik
Author

pavelmalik commented Apr 24, 2018

@yunhwankim2 I get the same message as well on my end, haven't had much time to dig into the source to see what it's actually complaining about. All the math still happens on the GPU regardless; GTX 1050 Ti is more than an order of magnitude faster compared to a build optimized for the 12 core xeon cpus.

@Norod

Norod commented Apr 29, 2018

@pavelmalik Thank you. This info was very useful.
Following your instructions, I managed to build and install Tensorflow r1.8 RC-1 with only slight modifications.
https://gist.github.com/Norod/1f84448c9ab33dfc5b84787c11c9c100
Cheers 👍

@hstdt

hstdt commented May 6, 2018

Thanks for this gist !

Success with macOS 10.12.4, Xcode 8.3.2 :

  1. ./Developer/NVIDIA/CUDA-9.1/samples/bin/x86_64/darwin/release/deviceQuery should be /Developer/NVIDIA/CUDA-9.1/samples/bin/x86_64/darwin/release/deviceQuery

  2. tar -xzvf cudnn-9.1-osx-x64-v7-ga.tgz should be tar -xzvf cudnn-9.1-osx-x64-v7-ga.tar

  3. matplotlib must be installed: python -mpip install matplotlib

  4. wheel must be installed: pip install wheel

  5. disable SIP to avoid bazel build error: Library not loaded: @rpath/libcudart.9.1.dylib

  6. pip install ~/tensorflow-1.7.0-cp27-cp27m-macosx_10_13_intel.whl ===> pip install ~/tensorflow-1.7.0-cp27-cp27m-macosx_10_9_x86_64.whl


For Python3:

brew install python3

  1. Replace every python with python3 and every pip with pip3
  2. Please specify the location of python: /usr/local/bin/python3
  3. pip install ~/tensorflow-1.7.0-cp27-cp27m-macosx_10_13_intel.whl ===> pip install ~/tensorflow-1.7.0-cp36-cp36m-macosx_10_12_x86_64.whl


@toli-belo

toli-belo commented Oct 10, 2018

Hello All,
I'm having some trouble getting the CUDA samples to run, getting the following error for "deviceQuery"
Any advice would be greatly appreciated!

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
MacOS 10.13.6 (MBP 2017)
NVIDIA Web-Drivers 387.10.10.10.40.105  
CUDA-Drivers 387.178
CUDA 9.1 Toolkit
XCode 8.3.2
GPU Card: MSI RTX 2080 Ti attached through eGPU RAZER X

CUDA preferences window shows:

  CUDA driver Version: 387.128 (no GPU Detected)
  GPU Driver Version: No version found

pwd: /Developer/NVIDIA/CUDA-9.1/samples

>>>>>>>>>>>>>>$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Tue_Dec_19_21:36:29_CST_2017
Cuda compilation tools, release 9.1, V9.1.128

>>>>>>>>>>>>>>$ kextstat | grep -i cuda
  185    0 0xffffff7f86eab000 0x2000     0x2000     com.nvidia.CUDA (1.1.0) 4329B052-6C8A-3900-8E83-744487AEDEF1 <4 1>

here is the output of deviceQuery

>>>>>>>>>>>>>>$ ./bin/x86_64/darwin/release/deviceQuery
./bin/x86_64/darwin/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL


@Referor

Referor commented Oct 24, 2018

FAILED: Build did NOT complete successfully
This error occurs because https://github.com/dtrebbien (referenced in the patch) deleted his account, so the protobuf archive can no longer be downloaded.
I found a solution here: tensorflow/tensorflow#17067 (comment)
We need to replace tf_http_archive section in patch file with this text:

tf_http_archive(
      name = "protobuf_archive",
      urls = [
-          "https://mirror.bazel.build/github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
-          "https://github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
+          "https://mirror.bazel.build/github.com/dinever/protobuf/archive/188578878eff18c2148baba0e116d87ce8f49410.tar.gz",
+          "https://github.com/dinever/protobuf/archive/188578878eff18c2148baba0e116d87ce8f49410.tar.gz",
      ],
-      sha256 = "846d907acf472ae233ec0882ef3a2d24edbbe834b80c305e867ac65a1f2c59e3",
-      strip_prefix = "protobuf-396336eb961b75f03b25824fe86cf6490fb75e3a",
+      sha256 = "7a1d96ccdf7131535828cad737a76fd65ed766e9511e468d0daa3cc4f3db5175",
+      strip_prefix = "protobuf-188578878eff18c2148baba0e116d87ce8f49410",
  )

@colmantse

hi can i confirm "Not found: TF GPU device with id 0 was not registered." can be ignored?
