
TensorFlow 1.8 with CUDA on macOS High Sierra 10.13.4

Largely based on the TensorFlow 1.6 gist, the TensorFlow 1.7 gist for Xcode, and the TensorFlow 1.7 gist for eGPU; this should hopefully simplify things a bit.

Requirements

  • NVIDIA Web-Drivers 387.10.10.10.30.106 for 10.13.4 (17E199) (w/o Security Update)
  • CUDA-Drivers 387.128
  • CUDA 9.1 Toolkit
  • cuDNN 7.0.5 (latest for macOS)
  • NCCL 2.1.15 (latest for macOS)
  • Python 2.7
  • Xcode 8.2
  • bazel stable 0.13.0 (latest on Homebrew)
  • Tensorflow 1.8 Source Code

eGPU Only

Check out the eGPU setup before installing (required for eGPU; ignore otherwise)

If you don't know how to set up an eGPU on a Mac, check out these steps. Make sure you have the eGPU working before installation: you should see your specific graphics card name in Apple > About this Mac > System Report ... > Graphics/Displays (or check from the command line, as sketched below).

The remaining steps are the same as for a normal GPU setup.
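
A quick command-line sanity check, assuming the stock macOS system_profiler tool (adjust the grep pattern to match your card's name):

system_profiler SPDisplaysDataType | grep -A 3 "GeForce"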

Prepare

Check for and use a pre-compiled wheel (Optional, risky; please skip if you are unsure)

If, like me, you are using a MacBook Pro (15-inch, 2016) running 10.13.4 (17E199) with an eGPU (NVIDIA GeForce GTX 1080 Ti 11 GiB, or any other compute capability 6.1 card listed on the NVIDIA page), you can, at your own risk, skip the Prepare and Compile steps below, download the .whl from here, and install it:

pip install tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl

Be sure to test after installation, and remember that this shortcut is not guaranteed to be safe.
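
A minimal smoke test, assuming the python on your PATH is the interpreter you installed the wheel into; opening a session should print GPU device lines similar to those in the Test Installation section:

python -c "import tensorflow as tf; print(tf.Session())"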

Install Homebrew (Optional)

For package management; skip this if you already have your own python and wget, or if you prefer to download things manually.

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew install wget

NVIDIA Graphics driver

Download and install from http://www.nvidia.com/download/driverResults.aspx/130460/en-us

NVIDIA Cuda driver

Download and install from http://www.nvidia.com/object/macosx-cuda-387.178-driver.html

Install XCode 8.2

Download XCode_8.2.xip, or find Xcode 8.2 at https://developer.apple.com/download/more/.

Unarchive it and rename Xcode.app to Xcode8.2.app in case you want to keep this version around for future builds.

Install Bazel

If you have Homebrew installed

brew install bazel

or Download the binary here

chmod 755 bazel-0.10.0-installer-darwin-x86_64.sh
./bazel-0.10.0-installer-darwin-x86_64.sh
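
Either way, it is worth confirming which bazel ended up on your PATH, since ./configure reports the version later:

bazel version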

Install CUDA Toolkit 9.1

Download CUDA-9.1

It should be something along the lines of cuda_9.1.128_mac.dmg
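
To sanity-check the toolkit and driver installs, you can list the toolkit directory and, optionally, confirm a CUDA kext is loaded (the kext name may differ between driver versions, so treat the grep as a rough filter):

ls /Developer/NVIDIA/CUDA-9.1/
kextstat | grep -i cuda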

Install NCCL

Download NCCL 2.1.15 (O/S agnostic and CUDA 9) from NVIDIA.

Unarchive it and move it to a permanent place, e.g. /usr/local/nccl.

sudo mkdir -p /usr/local/nccl
cd nccl_2.1.15-1+cuda9.1_x86_64
sudo mv * /usr/local/nccl
sudo mkdir -p /usr/local/include/third_party/nccl
sudo ln -s /usr/local/nccl/include/nccl.h /usr/local/include/third_party/nccl
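
A quick check that the header ended up where the symlink above points (just verifying the link resolves):

ls -l /usr/local/include/third_party/nccl/nccl.h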

Set up your env paths

Edit ~/.bash_profile and add the following:

export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib 
export LD_LIBRARY_PATH=$DYLD_LIBRARY_PATH
export PATH=$DYLD_LIBRARY_PATH:$PATH:/Developer/NVIDIA/CUDA-9.1/bin
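
After reloading the profile, CUDA_HOME should be set and nvcc resolvable; this only sanity-checks the exports above:

source ~/.bash_profile
echo $CUDA_HOME
nvcc --version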

Compile Samples

We want to compile one of the CUDA samples to check that the GPU is correctly recognized and supported.

cd /Developer/NVIDIA/CUDA-9.1/samples
chown -R $(whoami) *
make -C 1_Utilities/deviceQuery
./bin/x86_64/darwin/release/deviceQuery
 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version          9.1 / 9.1
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11264 MBytes (11810963456 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1645 MHz (1.64 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 196 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 1
Result = PASS

NVIDIA cuDNN - Deep Learning Primitives

If you have not already done so, register at https://developer.nvidia.com/cudnn and download cuDNN 7.0.5.

Change into your download directory and follow the post installation steps.

tar -xzvf cudnn-9.1-osx-x64-v7-ga.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
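
You can verify the copy by listing the libraries and reading the version macros out of the header (recent cuDNN releases define CUDNN_MAJOR/MINOR/PATCHLEVEL there):

ls /usr/local/cuda/lib/libcudnn*
grep -A 2 "define CUDNN_MAJOR" /usr/local/cuda/include/cudnn.h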

Install pip for python 2.7 (Optional)

Skip if you have your own idea of which python/pip to use:

$ which python
/usr/local/bin/python
$ which pip
/usr/local/bin/pip

Or download get-pip and run it with python. More info here.

python get-pip.py

pip will automatically install the TensorFlow dependencies (wheel, six, etc.); if it does not, you can install them manually, as sketched below.
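
If you do need to install them by hand, something along these lines should cover the usual suspects (the exact list may vary with the TensorFlow version):

pip install wheel six numpy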

Compile

Clone TensorFlow from Repository

cd /tmp
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout v1.8.0

Apply Patch

Apply the following patch to fix a couple of build issues:

wget https://gist.githubusercontent.com/Willian-Zhang/a3bd10da2d8b343875f3862b2a62eb3b/raw/xtensorflow18macos.patch
git apply xtensorflow18macos.patch
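
If you want to be cautious, git can dry-run the patch first; --check applies nothing and only reports whether it would apply cleanly (see the comments below about a "corrupt patch" error fixed by adding a trailing newline):

git apply --stat xtensorflow18macos.patch
git apply --check xtensorflow18macos.patch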

Configure Build

Apart from CUDA support, the CUDA SDK version, and the CUDA compute capabilities, I left the other settings untouched.

Pay attention to the CUDA compute capabilities; you may want to look up your own GPU's value in NVIDIA's guide.

./configure
You have bazel 0.10.0 installed.
Please specify the location of python. [Default is /usr/bin/python]: 


Found possible Python library paths:
  /Library/Python/2.7/site-packages
Please input the desired Python library path to use.  Default is [/Library/Python/2.7/site-packages]

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]:
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]:
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]:
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [y/N]:
No Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]:
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]:
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]:
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.1


Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 


Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2] (type your own, check on https://developer.nvidia.com/cuda-gpus, mine is 6.1 for GTX 1080 Ti)


Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 


Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
	--config=mkl         	# Build with MKL support.
	--config=monolithic  	# Config for mostly static monolithic build.
Configuration finished

Build Process

Takes about 47 minutes on my machine.

bazel clean
bazel build --config=cuda --config=opt --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package

Create wheel file and install it

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
ls /tmp/tensorflow_pkg
tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl

If you want to use virtualenv or something, now is the time. Or just:

pip install /tmp/tensorflow_pkg/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl
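
If you do want an isolated environment, a minimal virtualenv sketch looks like this (the environment path and python binary here are only examples, pick your own):

pip install virtualenv
virtualenv -p python3 ~/tf18-env      # hypothetical environment location
source ~/tf18-env/bin/activate
pip install /tmp/tensorflow_pkg/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl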

Back up your wheel once everything works (Optional)

Files in /tmp will be cleaned up after a reboot.

cp /tmp/tensorflow_pkg/*.whl ~/

It's useful to leave the .whl file lying around in case you want to install it for another environment.

Test Installation

See if everything got linked correctly

cd ~
python
>>> import tensorflow as tf
>>> tf.Session()
2018-04-08 03:25:15.740635: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2018-04-08 03:25:15.741260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:c4:00.0
totalMemory: 11.00GiB freeMemory: 10.18GiB
2018-04-08 03:25:15.741288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-08 03:25:16.157590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-08 03:25:16.157614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2018-04-08 03:25:16.157620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2018-04-08 03:25:16.157753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9849 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:c4:00.0, compute capability: 6.1)
<tensorflow.python.client.session.Session object at 0x10968ef60>
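
A shorter check, assuming the tf.test helpers behave as in stock TensorFlow 1.8, is to ask whether a GPU device is available at all:

python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"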

Try out the new TensorFlow eager execution feature (Optional)
python
import tensorflow as tf
tf.enable_eager_execution()
tf.executing_eagerly()        # => True

x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m))  # => "hello, [[4.]]"

Test GPU Acceleration

pip install keras
wget https://gist.githubusercontent.com/Willian-Zhang/290dceb96679c8f413e42491c92722b0/raw/mnist-cnn.py
python mnist-cnn.py
/usr/local/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2018-05-11 04:51:10.335377: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2018-05-11 04:51:10.336052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:c4:00.0
totalMemory: 11.00GiB freeMemory: 9.37GiB
2018-05-11 04:51:10.336075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-11 04:51:11.063831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-11 04:51:11.063856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-05-11 04:51:11.063864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
2018-05-11 04:51:11.064768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9065 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:c4:00.0, compute capability: 6.1)
2018-05-11 04:51:11.534095: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
2018-05-11 04:51:11.579370: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
2018-05-11 04:51:11.644835: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
59264/60000 [============================>.] - ETA: 0s - loss: 0.2604 - acc: 0.92082018-05-11 04:51:19.228205: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
60000/60000 [==============================] - 10s 159us/step - loss: 0.2588 - acc: 0.9213 - val_loss: 0.0561 - val_acc: 0.9829
Epoch 2/12
60000/60000 [==============================] - 4s 66us/step - loss: 0.0875 - acc: 0.9742 - val_loss: 0.0427 - val_acc: 0.9857
Epoch 3/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0662 - acc: 0.9803 - val_loss: 0.0356 - val_acc: 0.9875
Epoch 4/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0549 - acc: 0.9839 - val_loss: 0.0325 - val_acc: 0.9896
Epoch 5/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0471 - acc: 0.9859 - val_loss: 0.0309 - val_acc: 0.9901
Epoch 6/12
60000/60000 [==============================] - 4s 68us/step - loss: 0.0421 - acc: 0.9873 - val_loss: 0.0297 - val_acc: 0.9903
Epoch 7/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0377 - acc: 0.9884 - val_loss: 0.0259 - val_acc: 0.9908
Epoch 8/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0357 - acc: 0.9883 - val_loss: 0.0285 - val_acc: 0.9908
Epoch 9/12
60000/60000 [==============================] - 4s 68us/step - loss: 0.0315 - acc: 0.9904 - val_loss: 0.0327 - val_acc: 0.9901
Epoch 10/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0288 - acc: 0.9910 - val_loss: 0.0272 - val_acc: 0.9911
Epoch 11/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0282 - acc: 0.9912 - val_loss: 0.0248 - val_acc: 0.9920
Epoch 12/12
60000/60000 [==============================] - 4s 66us/step - loss: 0.0255 - acc: 0.9923 - val_loss: 0.0283 - val_acc: 0.9912
Test loss: 0.028254894825743667
Test accuracy: 0.9912

You can use cuda-smi to watch GPU memory usage. In the case of the MNIST example in Keras, you should see the free memory drop to maybe 2% and the fans spin up. Not quite sure what the grappler/clusters/utils.cc:127 warning is, however.

$ cuda-smi
Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): 10350 of 11264 MB (i.e. 91.9%) Free
# while the GPU is in use
$ cuda-smi
Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): 1181.1 of 11264 MB (i.e. 10.5%) Free
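
macOS does not ship the watch utility, so a simple shell loop works for keeping an eye on memory while training runs (Ctrl-C to stop):

while true; do cuda-smi; sleep 2; done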

Tested on a MacBook Pro (15-inch, 2016) 10.13.4 (17E199) 2.7 GHz Intel Core i7 and NVIDIA GeForce GTX 1080 Ti 11 GiB

The patch referenced above (xtensorflow18macos.patch):

diff --git a/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc b/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
index 0f7adaf24a..934ccbada6 100644
--- a/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
+++ b/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
@@ -69,7 +69,7 @@ __global__ void concat_variable_kernel(
IntType num_inputs = input_ptr_data.size;
// verbose declaration needed due to template
- extern __shared__ __align__(sizeof(T)) unsigned char smem[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char smem[];
IntType* smem_col_scan = reinterpret_cast<IntType*>(smem);
if (useSmem) {
diff --git a/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc b/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
index 94989089ec..1d26d4bacb 100644
--- a/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
+++ b/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
@@ -172,7 +172,7 @@ __global__ __launch_bounds__(1024, 2) void DepthwiseConv2dGPUKernelNHWCSmall(
const DepthwiseArgs args, const T* input, const T* filter, T* output) {
assert(CanLaunchDepthwiseConv2dGPUSmall(args));
// Holds block plus halo and filter data for blockDim.x depths.
- extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
T* const shared_data = reinterpret_cast<T*>(shared_memory);
const int num_batches = args.batch;
@@ -452,7 +452,7 @@ __global__ __launch_bounds__(1024, 2) void DepthwiseConv2dGPUKernelNCHWSmall(
const DepthwiseArgs args, const T* input, const T* filter, T* output) {
assert(CanLaunchDepthwiseConv2dGPUSmall(args));
// Holds block plus halo and filter data for blockDim.z depths.
- extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
T* const shared_data = reinterpret_cast<T*>(shared_memory);
const int num_batches = args.batch;
@@ -1118,7 +1118,7 @@ __launch_bounds__(1024, 2) void DepthwiseConv2dBackpropFilterGPUKernelNHWCSmall(
const DepthwiseArgs args, const T* output, const T* input, T* filter) {
assert(CanLaunchDepthwiseConv2dBackpropFilterGPUSmall(args, blockDim.z));
// Holds block plus halo and filter data for blockDim.x depths.
- extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
T* const shared_data = reinterpret_cast<T*>(shared_memory);
const int num_batches = args.batch;
@@ -1388,7 +1388,7 @@ __launch_bounds__(1024, 2) void DepthwiseConv2dBackpropFilterGPUKernelNCHWSmall(
const DepthwiseArgs args, const T* output, const T* input, T* filter) {
assert(CanLaunchDepthwiseConv2dBackpropFilterGPUSmall(args, blockDim.x));
// Holds block plus halo and filter data for blockDim.z depths.
- extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
T* const shared_data = reinterpret_cast<T*>(shared_memory);
const int num_batches = args.batch;
diff --git a/tensorflow/core/kernels/split_lib_gpu.cu.cc b/tensorflow/core/kernels/split_lib_gpu.cu.cc
index 393818730b..58a1294005 100644
--- a/tensorflow/core/kernels/split_lib_gpu.cu.cc
+++ b/tensorflow/core/kernels/split_lib_gpu.cu.cc
@@ -121,7 +121,7 @@ __global__ void split_v_kernel(const T* input_ptr,
int num_outputs = output_ptr_data.size;
// verbose declaration needed due to template
- extern __shared__ __align__(sizeof(T)) unsigned char smem[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char smem[];
IntType* smem_col_scan = reinterpret_cast<IntType*>(smem);
if (useSmem) {
diff --git a/tensorflow/workspace.bzl b/tensorflow/workspace.bzl
index 0ce5cda517..d4dc2235ac 100644
--- a/tensorflow/workspace.bzl
+++ b/tensorflow/workspace.bzl
@@ -361,11 +361,11 @@ def tf_workspace(path_prefix="", tf_repo_name=""):
tf_http_archive(
name = "protobuf_archive",
urls = [
- "https://mirror.bazel.build/github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
- "https://github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
+ "https://mirror.bazel.build/github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz",
+ "https://github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz",
],
- sha256 = "846d907acf472ae233ec0882ef3a2d24edbbe834b80c305e867ac65a1f2c59e3",
- strip_prefix = "protobuf-396336eb961b75f03b25824fe86cf6490fb75e3a",
+ sha256 = "eb16b33431b91fe8cee479575cee8de202f3626aaf00d9bf1783c6e62b4ffbc7",
+ strip_prefix = "protobuf-50f552646ba1de79e07562b41f3999fe036b4fd0",
)
# We need to import the protobuf library under the names com_google_protobuf
diff --git a/third_party/gpus/cuda/BUILD.tpl b/third_party/gpus/cuda/BUILD.tpl
index 2a37c65bc7..43446dd99b 100644
--- a/third_party/gpus/cuda/BUILD.tpl
+++ b/third_party/gpus/cuda/BUILD.tpl
@@ -110,7 +110,7 @@ cc_library(
".",
"cuda/include",
],
- linkopts = ["-lgomp"],
+ #linkopts = ["-lgomp"],
linkstatic = 1,
visibility = ["//visibility:public"],
)
@ambowater

ambowater commented Jun 19, 2018

Hi, the patch throws an error at the last line. Not sure why:
corrupt patch at line 99.

@requeima

Add an extra line to the end of the file. This seemed to be the only difference between this patch and Willian-Zhang's tf1.7 patch (https://gist.github.com/Willian-Zhang/088e017774536880bd425178b46b8c17). Worked for me.

@alvaromuir

Getting:
Symbol not found: _ncclAllReduce
Tried to fix and recompile. No dice.

@megagosha

Will this work with all 1080 Ti GPUs? Gigabyte? Or is this not a vendor-specific solution?

@fmoo7

fmoo7 commented Jul 7, 2018

Hi,

I am getting this error:
tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/Users/coredump/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/nccl/python/ops/_nccl_ops.so, 6): Symbol not found: _ncclAllReduce
Referenced from: /Users/coredump/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/nccl/python/ops/_nccl_ops.so
Expected in: flat namespace
in /Users/coredump/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/nccl/python/ops/_nccl_ops.so

NCCL was installed according to the written instructions.

@clarencejlee

I got this working on macOS 10.13.5 (with Titan X Pascal and Sonnet Breakaway 550) with very little modification. Happy to share notes with you if you'd like.

When running the mnist-cnn.py script, I saw the grappler warning "Not found: TF GPU device with id 0 was not registered" as well. However, the GPU acceleration is definitely working as it's only taking 4s per epoch (as compared to minutes when doing it on CPU). One quick question: did you ever observe a "CUDA_ERROR_OUT_OF_MEMORY" warning?

2018-07-12 09:56:17.750631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11425 MB memory) -> physical GPU (device: 0, name: TITAN X (Pascal), pci bus id: 0000:c4:00.0, compute capability: 6.1)
2018-07-12 09:56:17.750895: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 11.16G (11980989184 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-07-12 09:56:17.750995: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 10.04G (10782889984 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

Things still run, but curious why this is happening and if there is an easy fix. Will share config if that's helpful.

@smoothdvd

@clarencejlee Did you fix this issue? I also get this issue, but the script exits.

2018-07-30 10:34:38.688661: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2018-07-30 10:34:38.688838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:c2:00.0
totalMemory: 8.00GiB freeMemory: 7.79GiB
2018-07-30 10:34:38.688865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-30 10:34:39.015266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-30 10:34:39.015288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-07-30 10:34:39.015292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-07-30 10:34:39.015385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7528 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:c2:00.0, compute capability: 6.1)
2018-07-30 10:34:39.015714: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 7.35G (7893910016 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-07-30 10:34:39.015861: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 6.62G (7104518656 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-07-30 10:34:39.872133: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
Segmentation fault: 11

@smoothdvd

python crash report:

Process:               python3.6 [37659]
Path:                  /Users/USER/*/python3.6
Identifier:            python3.6
Version:               ???
Code Type:             X86-64 (Native)
Parent Process:        bash [3766]
Responsible:           python3.6 [37659]
User ID:               501

Date/Time:             2018-07-31 14:05:55.731 +0800
OS Version:            Mac OS X 10.13.6 (17G65)
Report Version:        12
Anonymous UUID:        9F87392A-5C0F-9213-42BD-FACC1EEE0FFB


Time Awake Since Boot: 5500 seconds

System Integrity Protection: disabled

Crashed Thread:        17

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       EXC_I386_GPFLT
Exception Note:        EXC_CORPSE_NOTIFY

Termination Signal:    Segmentation fault: 11
Termination Reason:    Namespace SIGNAL, Code 0xb
Terminating Process:   exc handler [0]

Thread 0:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	0x00007fff7f104a16 __psynch_cvwait + 10
1   libsystem_pthread.dylib       	0x00007fff7f2cd589 _pthread_cond_wait + 732
2   libc++.1.dylib                	0x00007fff7cf08cb0 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
3   libtensorflow_framework.so    	0x0000000120956f3b nsync::nsync_mu_semaphore_p_with_deadline(nsync::nsync_semaphore_s_*, timespec) + 283
4   libtensorflow_framework.so    	0x00000001209537b7 nsync::nsync_cv_wait_with_deadline_generic(nsync::nsync_cv_s_*, void*, void (*)(void*), void (*)(void*), timespec, nsync::nsync_note_s_*) + 423
5   libtensorflow_framework.so    	0x0000000120953f21 nsync::nsync_cv_wait(nsync::nsync_cv_s_*, nsync::nsync_mu_s_*) + 49
6   _pywrap_tensorflow_internal.so	0x000000011373acfb tensorflow::DirectSession::WaitForNotification(tensorflow::Notification*, long long) + 155
7   _pywrap_tensorflow_internal.so	0x00000001137312d6 tensorflow::DirectSession::WaitForNotification(tensorflow::DirectSession::RunState*, tensorflow::CancellationManager*, long long) + 38
8   _pywrap_tensorflow_internal.so	0x0000000113730bba tensorflow::DirectSession::RunInternal(long long, tensorflow::RunOptions const&, tensorflow::CallFrameInterface*, tensorflow::DirectSession::ExecutorsAndKeys*, tensorflow::RunMetadata*) + 2090
9   _pywrap_tensorflow_internal.so	0x0000000113731981 tensorflow::DirectSession::Run(tensorflow::RunOptions const&, std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, tensorflow::Tensor>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, tensorflow::Tensor> > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::vector<tensorflow::Tensor, std::__1::allocator<tensorflow::Tensor> >*, tensorflow::RunMetadata*) + 1473
10  _pywrap_tensorflow_internal.so	0x000000011082069e TF_Run_Helper(tensorflow::Session*, char const*, TF_Buffer const*, std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, tensorflow::Tensor>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, tensorflow::Tensor> > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, TF_Tensor**, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, TF_Buffer*, TF_Status*) + 942
11  _pywrap_tensorflow_internal.so	0x000000011082bf71 TF_SessionRun + 1313
12  _pywrap_tensorflow_internal.so	0x000000011051d2ea tensorflow::TF_SessionRun_wrapper_helper(TF_Session*, char const*, TF_Buffer const*, std::__1::vector<TF_Output, std::__1::allocator<TF_Output> > const&, std::__1::vector<_object*, std::__1::allocator<_object*> > const&, std::__1::vector<TF_Output, std::__1::allocator<TF_Output> > const&, std::__1::vector<TF_Operation*, std::__1::allocator<TF_Operation*> > const&, TF_Buffer*, TF_Status*, std::__1::vector<_object*, std::__1::allocator<_object*> >*) + 794
13  _pywrap_tensorflow_internal.so	0x000000011051d788 tensorflow::TF_SessionRun_wrapper(TF_Session*, TF_Buffer const*, std::__1::vector<TF_Output, std::__1::allocator<TF_Output> > const&, std::__1::vector<_object*, std::__1::allocator<_object*> > const&, std::__1::vector<TF_Output, std::__1::allocator<TF_Output> > const&, std::__1::vector<TF_Operation*, std::__1::allocator<TF_Operation*> > const&, TF_Buffer*, TF_Status*, std::__1::vector<_object*, std::__1::allocator<_object*> >*) + 40
14  _pywrap_tensorflow_internal.so	0x00000001104dafbb _wrap_TF_SessionRun_wrapper(_object*, _object*) + 1339
15  python                        	0x00000001048170e6 _PyCFunction_FastCallDict + 166
16  python                        	0x00000001048a018e call_function + 478
17  python                        	0x0000000104899173 _PyEval_EvalFrameDefault + 4851
18  python                        	0x00000001048a13b8 fast_function + 568
19  python                        	0x00000001048a0169 call_function + 441
20  python                        	0x0000000104899173 _PyEval_EvalFrameDefault + 4851
21  python                        	0x00000001048a0c66 _PyEval_EvalCodeWithName + 2566
22  python                        	0x0000000104897e37 PyEval_EvalCodeEx + 55
23  python                        	0x00000001047f5e8e function_call + 350
24  python                        	0x00000001047cdc15 PyObject_Call + 101
25  python                        	0x00000001048994c7 _PyEval_EvalFrameDefault + 5703
26  python                        	0x00000001048a0c66 _PyEval_EvalCodeWithName + 2566
27  python                        	0x00000001048a1459 fast_function + 729
28  python                        	0x00000001048a0169 call_function + 441
29  python                        	0x0000000104899173 _PyEval_EvalFrameDefault + 4851
30  python                        	0x00000001048a0c66 _PyEval_EvalCodeWithName + 2566
31  python                        	0x00000001048a1459 fast_function + 729
32  python                        	0x00000001048a0169 call_function + 441
33  python                        	0x0000000104899173 _PyEval_EvalFrameDefault + 4851
34  python                        	0x00000001048a13b8 fast_function + 568
35  python                        	0x00000001048a0169 call_function + 441
36  python                        	0x0000000104899173 _PyEval_EvalFrameDefault + 4851
37  python                        	0x00000001048a0c66 _PyEval_EvalCodeWithName + 2566
38  python                        	0x00000001048a1459 fast_function + 729
39  python                        	0x00000001048a0169 call_function + 441
40  python                        	0x0000000104899173 _PyEval_EvalFrameDefault + 4851
41  python                        	0x00000001048a13b8 fast_function + 568
42  python                        	0x00000001048a0169 call_function + 441
43  python                        	0x0000000104899173 _PyEval_EvalFrameDefault + 4851
44  python                        	0x00000001048a0c66 _PyEval_EvalCodeWithName + 2566
45  python                        	0x00000001048a1459 fast_function + 729
46  python                        	0x00000001048a0169 call_function + 441
47  python                        	0x00000001048991fd _PyEval_EvalFrameDefault + 4989
48  python                        	0x00000001048a0c66 _PyEval_EvalCodeWithName + 2566
49  python                        	0x0000000104897df0 PyEval_EvalCode + 48
50  python                        	0x00000001048cc0e9 PyRun_FileExFlags + 185
51  python                        	0x00000001048cb70d PyRun_SimpleFileExFlags + 285
52  python                        	0x00000001048e5c33 Py_Main + 3427
53  python                        	0x00000001047c2718 main + 248
54  libdyld.dylib                 	0x00007fff7efb4015 start + 1

Thread 1:
0   libsystem_kernel.dylib        	0x00007fff7f104a16 __psynch_cvwait + 10
1   libsystem_pthread.dylib       	0x00007fff7f2cd589 _pthread_cond_wait + 732
2   libopenblasp-r0.3.0.dev.dylib 	0x00000001057ff72b blas_thread_server + 187
3   libsystem_pthread.dylib       	0x00007fff7f2cc661 _pthread_body + 340
4   libsystem_pthread.dylib       	0x00007fff7f2cc50d _pthread_start + 377
5   libsystem_pthread.dylib       	0x00007fff7f2cbbf9 thread_start + 13

Thread 2:
0   libsystem_kernel.dylib        	0x00007fff7f104a16 __psynch_cvwait + 10
1   libsystem_pthread.dylib       	0x00007fff7f2cd589 _pthread_cond_wait + 732
2   libopenblasp-r0.3.0.dev.dylib 	0x00000001057ff72b blas_thread_server + 187
3   libsystem_pthread.dylib       	0x00007fff7f2cc661 _pthread_body + 340
4   libsystem_pthread.dylib       	0x00007fff7f2cc50d _pthread_start + 377
5   libsystem_pthread.dylib       	0x00007fff7f2cbbf9 thread_start + 13

Thread 3:
0   libsystem_kernel.dylib        	0x00007fff7f104a16 __psynch_cvwait + 10
1   libsystem_pthread.dylib       	0x00007fff7f2cd589 _pthread_cond_wait + 732
2   libopenblasp-r0.3.0.dev.dylib 	0x00000001057ff72b blas_thread_server + 187
3   libsystem_pthread.dylib       	0x00007fff7f2cc661 _pthread_body + 340
4   libsystem_pthread.dylib       	0x00007fff7f2cc50d _pthread_start + 377
5   libsystem_pthread.dylib       	0x00007fff7f2cbbf9 thread_start + 13

Thread 4:
0   libsystem_kernel.dylib        	0x00007fff7f104a16 __psynch_cvwait + 10
1   libsystem_pthread.dylib       	0x00007fff7f2cd589 _pthread_cond_wait + 732
2   libc++.1.dylib                	0x00007fff7cf08cb0 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
3   libtensorflow_framework.so    	0x0000000120530836 Eigen::EventCount::CommitWait(Eigen::EventCount::Waiter*) + 278
4   libtensorflow_framework.so    	0x00000001205304ac Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) + 828
5   libtensorflow_framework.so    	0x000000012052fad8 Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) + 568
6   libtensorflow_framework.so    	0x000000012052f79f std::__1::__function::__func<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'(), std::__1::allocator<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'()>, void ()>::operator()() + 47
7   libtensorflow_framework.so    	0x0000000120556530 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::function<void ()> > >(void*) + 48
8   libsystem_pthread.dylib       	0x00007fff7f2cc661 _pthread_body + 340
9   libsystem_pthread.dylib       	0x00007fff7f2cc50d _pthread_start + 377
10  libsystem_pthread.dylib       	0x00007fff7f2cbbf9 thread_start + 13

Thread 5:
0   libsystem_kernel.dylib        	0x00007fff7f104a16 __psynch_cvwait + 10
1   libsystem_pthread.dylib       	0x00007fff7f2cd589 _pthread_cond_wait + 732
2   libc++.1.dylib                	0x00007fff7cf08cb0 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
3   libtensorflow_framework.so    	0x0000000120530836 Eigen::EventCount::CommitWait(Eigen::EventCount::Waiter*) + 278
4   libtensorflow_framework.so    	0x00000001205304ac Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) + 828
5   libtensorflow_framework.so    	0x000000012053006a Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) + 1994
6   libtensorflow_framework.so    	0x000000012052f79f std::__1::__function::__func<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'(), std::__1::allocator<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'()>, void ()>::operator()() + 47
7   libtensorflow_framework.so    	0x0000000120556530 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::function<void ()> > >(void*) + 48
8   libsystem_pthread.dylib       	0x00007fff7f2cc661 _pthread_body + 340
9   libsystem_pthread.dylib       	0x00007fff7f2cc50d _pthread_start + 377
10  libsystem_pthread.dylib       	0x00007fff7f2cbbf9 thread_start + 13

Thread 6:
0   libsystem_kernel.dylib        	0x00007fff7f104a16 __psynch_cvwait + 10
1   libsystem_pthread.dylib       	0x00007fff7f2cd589 _pthread_cond_wait + 732
2   libc++.1.dylib                	0x00007fff7cf08cb0 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
3   libtensorflow_framework.so    	0x0000000120530836 Eigen::EventCount::CommitWait(Eigen::EventCount::Waiter*) + 278
4   libtensorflow_framework.so    	0x00000001205304ac Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) + 828
5   libtensorflow_framework.so    	0x000000012053006a Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) + 1994
6   libtensorflow_framework.so    	0x000000012052f79f std::__1::__function::__func<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'(), std::__1::allocator<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'()>, void ()>::operator()() + 47
7   libtensorflow_framework.so    	0x0000000120556530 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::function<void ()> > >(void*) + 48
8   libsystem_pthread.dylib       	0x00007fff7f2cc661 _pthread_body + 340
9   libsystem_pthread.dylib       	0x00007fff7f2cc50d _pthread_start + 377
10  libsystem_pthread.dylib       	0x00007fff7f2cbbf9 thread_start + 13

Thread 7:
0   libsystem_kernel.dylib        	0x00007fff7f104a16 __psynch_cvwait + 10
1   libsystem_pthread.dylib       	0x00007fff7f2cd589 _pthread_cond_wait + 732
2   libc++.1.dylib                	0x00007fff7cf08cb0 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
3   libtensorflow_framework.so    	0x0000000120530836 Eigen::EventCount::CommitWait(Eigen::EventCount::Waiter*) + 278
4   libtensorflow_framework.so    	0x00000001205304ac Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) + 828
5   libtensorflow_framework.so    	0x000000012053006a Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) + 1994
6   libtensorflow_framework.so    	0x000000012052f79f std::__1::__function::__func<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'(), std::__1::allocator<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'()>, void ()>::operator()() + 47
7   libtensorflow_framework.so    	0x0000000120556530 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::function<void ()> > >(void*) + 48
8   libsystem_pthread.dylib       	0x00007fff7f2cc661 _pthread_body + 340
9   libsystem_pthread.dylib       	0x00007fff7f2cc50d _pthread_start + 377
10  libsystem_pthread.dylib       	0x00007fff7f2cbbf9 thread_start + 13

Thread 8:
0   libsystem_kernel.dylib        	0x00007fff7f104a16 __psynch_cvwait + 10
1   libsystem_pthread.dylib       	0x00007fff7f2cd589 _pthread_cond_wait + 732
2   libc++.1.dylib                	0x00007fff7cf08cb0 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
3   libtensorflow_framework.so    	0x0000000120530836 Eigen::EventCount::CommitWait(Eigen::EventCount::Waiter*) + 278
4   libtensorflow_framework.so    	0x00000001205304ac Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) + 828
5   libtensorflow_framework.so    	0x000000012053006a Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) + 1994
6   libtensorflow_framework.so    	0x000000012052f79f std::__1::__function::__func<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'(), std::__1::allocator<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'()>, void ()>::operator()() + 47
7   libtensorflow_framework.so    	0x0000000120556530 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::function<void ()> > >(void*) + 48
8   libsystem_pthread.dylib       	0x00007fff7f2cc661 _pthread_body + 340
9   libsystem_pthread.dylib       	0x00007fff7f2cc50d _pthread_start + 377
10  libsystem_pthread.dylib       	0x00007fff7f2cbbf9 thread_start + 13

Thread 9:
0   libsystem_kernel.dylib        	0x00007fff7f104a16 __psynch_cvwait + 10
1   libsystem_pthread.dylib       	0x00007fff7f2cd589 _pthread_cond_wait + 732
2   libc++.1.dylib                	0x00007fff7cf08cb0 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
3   libtensorflow_framework.so    	0x0000000120530836 Eigen::EventCount::CommitWait(Eigen::EventCount::Waiter*) + 278
4   libtensorflow_framework.so    	0x00000001205304ac Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) + 828
5   libtensorflow_framework.so    	0x000000012052fad8 Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) + 568
6   libtensorflow_framework.so    	0x000000012052f79f std::__1::__function::__func<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'(), std::__1::allocator<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'()>, void ()>::operator()() + 47
7   libtensorflow_framework.so    	0x0000000120556530 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::function<void ()> > >(void*) + 48
8   libsystem_pthread.dylib       	0x00007fff7f2cc661 _pthread_body + 340
9   libsystem_pthread.dylib       	0x00007fff7f2cc50d _pthread_start + 377
10  libsystem_pthread.dylib       	0x00007fff7f2cbbf9 thread_start + 13

Thread 10:
0   libsystem_kernel.dylib        	0x00007fff7f0fb20a mach_msg_trap + 10
1   libsystem_kernel.dylib        	0x00007fff7f0fa724 mach_msg + 60
2   libcuda_387.10.10.10_mercury.dylib	0x000000014773c54e 0x1475dc000 + 1443150
3   libcuda_387.10.10.10_mercury.dylib	0x0000000147797d3c 0x1475dc000 + 1817916
4   libcuda_387.10.10.10_mercury.dylib	0x000000014773e0d9 0x1475dc000 + 1450201
5   libsystem_pthread.dylib       	0x00007fff7f2cc661 _pthread_body + 340
6   libsystem_pthread.dylib       	0x00007fff7f2cc50d _pthread_start + 377
7   libsystem_pthread.dylib       	0x00007fff7f2cbbf9 thread_start + 13

Thread 11:
0   libsystem_kernel.dylib        	0x00007fff7f104a16 __psynch_cvwait + 10
1   libsystem_pthread.dylib       	0x00007fff7f2cd589 _pthread_cond_wait + 732
2   libcuda_387.10.10.10_mercury.dylib	0x000000014773df37 0x1475dc000 + 1449783
3   libcuda_387.10.10.10_mercury.dylib	0x00000001476f578c 0x1475dc000 + 1152908
4   libcuda_387.10.10.10_mercury.dylib	0x000000014773e0d9 0x1475dc000 + 1450201
5   libsystem_pthread.dylib       	0x00007fff7f2cc661 _pthread_body + 340
6   libsystem_pthread.dylib       	0x00007fff7f2cc50d _pthread_start + 377
7   libsystem_pthread.dylib       	0x00007fff7f2cbbf9 thread_start + 13

Thread 12:
0   libsystem_kernel.dylib        	0x00007fff7f104a16 __psynch_cvwait + 10
1   libsystem_pthread.dylib       	0x00007fff7f2cd589 _pthread_cond_wait + 732
2   libc++.1.dylib                	0x00007fff7cf08cb0 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
3   libtensorflow_framework.so    	0x0000000120956f3b nsync::nsync_mu_semaphore_p_with_deadline(nsync::nsync_semaphore_s_*, timespec) + 283
4   libtensorflow_framework.so    	0x00000001209537b7 nsync::nsync_cv_wait_with_deadline_generic(nsync::nsync_cv_s_*, void*, void (*)(void*), void (*)(void*), timespec, nsync::nsync_note_s_*) + 423
5   libtensorflow_framework.so    	0x0000000120953f21 nsync::nsync_cv_wait(nsync::nsync_cv_s_*, nsync::nsync_mu_s_*) + 49
6   libtensorflow_framework.so    	0x0000000120951c8d tensorflow::EventMgr::PollLoop() + 157
7   libtensorflow_framework.so    	0x000000012053009f Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) + 2047
8   libtensorflow_framework.so    	0x000000012052f79f std::__1::__function::__func<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'(), std::__1::allocator<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'()>, void ()>::operator()() + 47
9   libtensorflow_framework.so    	0x0000000120556530 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::function<void ()> > >(void*) + 48
10  libsystem_pthread.dylib       	0x00007fff7f2cc661 _pthread_body + 340
11  libsystem_pthread.dylib       	0x00007fff7f2cc50d _pthread_start + 377
12  libsystem_pthread.dylib       	0x00007fff7f2cbbf9 thread_start + 13

Thread 13:
0   libsystem_kernel.dylib        	0x00007fff7f104a16 __psynch_cvwait + 10
1   libsystem_pthread.dylib       	0x00007fff7f2cd589 _pthread_cond_wait + 732
2   libc++.1.dylib                	0x00007fff7cf08cb0 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
3   libtensorflow_framework.so    	0x0000000120530836 Eigen::EventCount::CommitWait(Eigen::EventCount::Waiter*) + 278
4   libtensorflow_framework.so    	0x00000001205304ac Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) + 828
5   libtensorflow_framework.so    	0x000000012053006a Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) + 1994
6   libtensorflow_framework.so    	0x000000012052f79f std::__1::__function::__func<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'(), std::__1::allocator<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'()>, void ()>::operator()() + 47
7   libtensorflow_framework.so    	0x0000000120556530 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::function<void ()> > >(void*) + 48
8   libsystem_pthread.dylib       	0x00007fff7f2cc661 _pthread_body + 340
9   libsystem_pthread.dylib       	0x00007fff7f2cc50d _pthread_start + 377
10  libsystem_pthread.dylib       	0x00007fff7f2cbbf9 thread_start + 13

Thread 14:
0   libsystem_kernel.dylib        	0x00007fff7f104a16 __psynch_cvwait + 10
1   libsystem_pthread.dylib       	0x00007fff7f2cd589 _pthread_cond_wait + 732
2   libc++.1.dylib                	0x00007fff7cf08cb0 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
3   libtensorflow_framework.so    	0x0000000120530836 Eigen::EventCount::CommitWait(Eigen::EventCount::Waiter*) + 278
4   libtensorflow_framework.so    	0x00000001205304ac Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) + 828
5   libtensorflow_framework.so    	0x000000012053006a Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) + 1994
6   libtensorflow_framework.so    	0x000000012052f79f std::__1::__function::__func<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'(), std::__1::allocator<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'()>, void ()>::operator()() + 47
7   libtensorflow_framework.so    	0x0000000120556530 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::function<void ()> > >(void*) + 48
8   libsystem_pthread.dylib       	0x00007fff7f2cc661 _pthread_body + 340
9   libsystem_pthread.dylib       	0x00007fff7f2cc50d _pthread_start + 377
10  libsystem_pthread.dylib       	0x00007fff7f2cbbf9 thread_start + 13

Thread 15:
0   libsystem_kernel.dylib        	0x00007fff7f104a16 __psynch_cvwait + 10
1   libsystem_pthread.dylib       	0x00007fff7f2cd589 _pthread_cond_wait + 732
2   libc++.1.dylib                	0x00007fff7cf08cb0 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
3   libtensorflow_framework.so    	0x0000000120530836 Eigen::EventCount::CommitWait(Eigen::EventCount::Waiter*) + 278
4   libtensorflow_framework.so    	0x00000001205304ac Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) + 828
5   libtensorflow_framework.so    	0x000000012053006a Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) + 1994
6   libtensorflow_framework.so    	0x000000012052f79f std::__1::__function::__func<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'(), std::__1::allocator<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'()>, void ()>::operator()() + 47
7   libtensorflow_framework.so    	0x0000000120556530 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::function<void ()> > >(void*) + 48
8   libsystem_pthread.dylib       	0x00007fff7f2cc661 _pthread_body + 340
9   libsystem_pthread.dylib       	0x00007fff7f2cc50d _pthread_start + 377
10  libsystem_pthread.dylib       	0x00007fff7f2cbbf9 thread_start + 13

Thread 16:
0   libsystem_kernel.dylib        	0x00007fff7f104a16 __psynch_cvwait + 10
1   libsystem_pthread.dylib       	0x00007fff7f2cd589 _pthread_cond_wait + 732
2   libc++.1.dylib                	0x00007fff7cf08cb0 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
3   libtensorflow_framework.so    	0x0000000120530836 Eigen::EventCount::CommitWait(Eigen::EventCount::Waiter*) + 278
4   libtensorflow_framework.so    	0x00000001205304ac Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) + 828
5   libtensorflow_framework.so    	0x000000012053006a Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) + 1994
6   libtensorflow_framework.so    	0x000000012052f79f std::__1::__function::__func<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'(), std::__1::allocator<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'()>, void ()>::operator()() + 47
7   libtensorflow_framework.so    	0x0000000120556530 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::function<void ()> > >(void*) + 48
8   libsystem_pthread.dylib       	0x00007fff7f2cc661 _pthread_body + 340
9   libsystem_pthread.dylib       	0x00007fff7f2cc50d _pthread_start + 377
10  libsystem_pthread.dylib       	0x00007fff7f2cbbf9 thread_start + 13

Thread 17 Crashed:
0   libtensorflow_framework.so    	0x0000000120952ad0 void tensorflow::gtl::InlinedVector<tensorflow::EventMgr::InUse, 4>::emplace_back<tensorflow::EventMgr::InUse const&>(tensorflow::EventMgr::InUse const&&&) + 176
1   libtensorflow_framework.so    	0x0000000120951e7c tensorflow::EventMgr::PollEvents(bool, tensorflow::gtl::InlinedVector<tensorflow::EventMgr::InUse, 4>*) + 300
2   libtensorflow_framework.so    	0x00000001208d7612 tensorflow::EventMgr::ThenExecute(perftools::gputools::Stream*, std::__1::function<void ()>) + 194
3   libtensorflow_framework.so    	0x00000001208d813e tensorflow::GPUUtil::CopyCPUTensorToGPU(tensorflow::Tensor const*, tensorflow::DeviceContext const*, tensorflow::Device*, tensorflow::Tensor*, std::__1::function<void (tensorflow::Status const&)>) + 718
4   libtensorflow_framework.so    	0x00000001208d9fd5 tensorflow::GPUDeviceContext::CopyCPUTensorToDevice(tensorflow::Tensor const*, tensorflow::Device*, tensorflow::Tensor*, std::__1::function<void (tensorflow::Status const&)>) const + 117
5   libtensorflow_framework.so    	0x00000001208fcc05 tensorflow::(anonymous namespace)::CopyHostToDevice(tensorflow::Tensor const*, tensorflow::Allocator*, tensorflow::Allocator*, tensorflow::StringPiece, tensorflow::Device*, tensorflow::Tensor*, tensorflow::DeviceContext*, std::__1::function<void (tensorflow::Status const&)>) + 437
6   libtensorflow_framework.so    	0x00000001208fbde2 tensorflow::CopyTensor::ViaDMA(tensorflow::StringPiece, tensorflow::DeviceContext*, tensorflow::DeviceContext*, tensorflow::Device*, tensorflow::Device*, tensorflow::AllocatorAttributes, tensorflow::AllocatorAttributes, tensorflow::Tensor const*, tensorflow::Tensor*, std::__1::function<void (tensorflow::Status const&)>) + 3378
7   libtensorflow_framework.so    	0x0000000120938022 tensorflow::IntraProcessRendezvous::SameWorkerRecvDone(tensorflow::Rendezvous::ParsedKey const&, tensorflow::Rendezvous::Args const&, tensorflow::Rendezvous::Args const&, tensorflow::Tensor const&, tensorflow::Tensor*, std::__1::function<void (tensorflow::Status const&)>) + 1074
8   libtensorflow_framework.so    	0x0000000120938acd std::__1::__function::__func<tensorflow::IntraProcessRendezvous::RecvAsync(tensorflow::Rendezvous::ParsedKey const&, tensorflow::Rendezvous::Args const&, std::__1::function<void (tensorflow::Status const&, tensorflow::Rendezvous::Args const&, tensorflow::Rendezvous::Args const&, tensorflow::Tensor const&, bool)>)::$_0, std::__1::allocator<tensorflow::IntraProcessRendezvous::RecvAsync(tensorflow::Rendezvous::ParsedKey const&, tensorflow::Rendezvous::Args const&, std::__1::function<void (tensorflow::Status const&, tensorflow::Rendezvous::Args const&, tensorflow::Rendezvous::Args const&, tensorflow::Tensor const&, bool)>)::$_0>, void (tensorflow::Status const&, tensorflow::Rendezvous::Args const&, tensorflow::Rendezvous::Args const&, tensorflow::Tensor const&, bool)>::operator()(tensorflow::Status const&, tensorflow::Rendezvous::Args const&, tensorflow::Rendezvous::Args const&, tensorflow::Tensor const&, bool&&) + 813
9   libtensorflow_framework.so    	0x000000012041a50c tensorflow::LocalRendezvousImpl::Send(tensorflow::Rendezvous::ParsedKey const&, tensorflow::Rendezvous::Args const&, tensorflow::Tensor const&, bool) + 380
10  libtensorflow_framework.so    	0x0000000120937a92 tensorflow::IntraProcessRendezvous::Send(tensorflow::Rendezvous::ParsedKey const&, tensorflow::Rendezvous::Args const&, tensorflow::Tensor const&, bool) + 162
11  _pywrap_tensorflow_internal.so	0x0000000113993e2a tensorflow::SendOp::Compute(tensorflow::OpKernelContext*) + 794
12  libtensorflow_framework.so    	0x0000000120945e56 tensorflow::ThreadPoolDevice::Compute(tensorflow::OpKernel*, tensorflow::OpKernelContext*) + 342
13  libtensorflow_framework.so    	0x000000012090bd7d tensorflow::(anonymous namespace)::ExecutorState::Process(tensorflow::(anonymous namespace)::ExecutorState::TaggedNode, long long) + 5181
14  libtensorflow_framework.so    	0x000000012091524a std::__1::__function::__func<std::__1::__bind<void (tensorflow::(anonymous namespace)::ExecutorState::*)(tensorflow::(anonymous namespace)::ExecutorState::TaggedNode, long long), tensorflow::(anonymous namespace)::ExecutorState*, tensorflow::(anonymous namespace)::ExecutorState::TaggedNode const&, long long&>, std::__1::allocator<std::__1::__bind<void (tensorflow::(anonymous namespace)::ExecutorState::*)(tensorflow::(anonymous namespace)::ExecutorState::TaggedNode, long long), tensorflow::(anonymous namespace)::ExecutorState*, tensorflow::(anonymous namespace)::ExecutorState::TaggedNode const&, long long&> >, void ()>::operator()() + 58
15  libtensorflow_framework.so    	0x000000012053009f Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) + 2047
16  libtensorflow_framework.so    	0x000000012052f79f std::__1::__function::__func<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'(), std::__1::allocator<tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'()>, void ()>::operator()() + 47
17  libtensorflow_framework.so    	0x0000000120556530 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::function<void ()> > >(void*) + 48
18  libsystem_pthread.dylib       	0x00007fff7f2cc661 _pthread_body + 340
19  libsystem_pthread.dylib       	0x00007fff7f2cc50d _pthread_start + 377
20  libsystem_pthread.dylib       	0x00007fff7f2cbbf9 thread_start + 13

Thread 17 crashed with X86 Thread State (64-bit):
  rax: 0x0000ffffffffffff  rbx: 0x000070000f82a2a8  rcx: 0x0000000000000004  rdx: 0x0000080000000903
  rdi: 0x000070000f82a2a8  rsi: 0x00007f9a104adae0  rbp: 0x000070000f82a0e0  rsp: 0x000070000f82a0b0
   r8: 0x00007f99faaa95f8   r9: 0x0000000000000040  r10: 0x00007f99faaa95f0  r11: 0xffffffffffffffff
  r12: 0x0000000000000000  r13: 0x0000000000000000  r14: 0x000070000f82a2a8  r15: 0x00007f9a104adae0
  rip: 0x0000000120952ad0  rfl: 0x0000000000010206  cr2: 0x0000000214017000
  
Logical CPU:     0
Error Code:      0x00000000
Trap Number:     13

@llv22

llv22 commented Aug 5, 2018

@smoothdvd, I think you're using an incorrect version of Xcode. Did you use Xcode 8.3.x to build TensorFlow? If not, that's probably the issue.
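In case it helps, a quick way to check which Xcode the command-line tools are actually using and to switch to an 8.3.x copy. The /Applications path below is illustrative; use whatever you renamed your copies to:

xcodebuild -version      # which Xcode is active for builds
xcode-select -p          # where the active developer directory points
sudo xcode-select -s /Applications/Xcode8.3.3.app/Contents/Developer   # switch to the 8.3.x copy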

@shumink

shumink commented Aug 10, 2018

I am having the same CUDA_ERROR_OUT_OF_MEMORY warning as well. I'm building TensorFlow with Xcode 8.3.2.
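For what it's worth, a CUDA_ERROR_OUT_OF_MEMORY warning right at session start is often just TF 1.x reserving nearly all GPU memory up front rather than a broken build. Whether or not that is the cause here, a minimal sketch (plain TF 1.x API, nothing specific to this gist) that asks the allocator to grow on demand:

python - <<'PY'
# TF 1.x: grow GPU memory on demand instead of grabbing almost all of it
# when the session is created.
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
    print(sess.run(tf.add(tf.constant(1.0), tf.constant(2.0))))  # expect 3.0
PY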

@teddymacn

https://github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz

The file above returns 404, and compilation fails because of it. Any idea how to fix this?

@augustye

augustye commented Sep 2, 2018

Same problem. It seems https://github.com/dtrebbien is gone.

@Ciyou

Ciyou commented Sep 2, 2018

Same problem as @teddymacn: https://github.com/dtrebbien is gone.
Did anyone figure out how to fix it?

@antoniopioricciardi

I solved it by installing a pre-compiled wheel directly from here:
https://storage.googleapis.com/74thopen/tensorflow_osx/index.html

@shevious

shevious commented Oct 9, 2018

Great guide!
I've compiled it and run several examples successfully, including mnist-cnn.

The missing dtrebbien repo can be replaced with the following change:

@@ -330,11 +330,11 @@ def tf_workspace(path_prefix="", tf_repo_name=""):
   tf_http_archive(
       name = "protobuf_archive",
       urls = [
-          "https://mirror.bazel.build/github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
-          "https://github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
+          "https://mirror.bazel.build/github.com/dinever/protobuf/archive/188578878eff18c2148baba0e116d87ce8f49410.tar.gz",
+          "https://github.com/dinever/protobuf/archive/188578878eff18c2148baba0e116d87ce8f49410.tar.gz",
       ],
-      sha256 = "846d907acf472ae233ec0882ef3a2d24edbbe834b80c305e867ac65a1f2c59e3",
-      strip_prefix = "protobuf-396336eb961b75f03b25824fe86cf6490fb75e3a",
+      sha256 = "7a1d96ccdf7131535828cad737a76fd65ed766e9511e468d0daa3cc4f3db5175",
+      strip_prefix = "protobuf-188578878eff18c2148baba0e116d87ce8f49410",
   )
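If you'd rather apply that change from the command line, here is a minimal sketch, assuming the protobuf_archive stanza lives in tensorflow/workspace.bzl of your TF 1.8 checkout and still points at the google/protobuf commit shown on the left-hand side of the patch (the checkout path is illustrative):

cd /path/to/tensorflow      # your TensorFlow 1.8 source checkout
# BSD/macOS sed needs the empty '' argument after -i
sed -i '' \
  -e 's|google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a|dinever/protobuf/archive/188578878eff18c2148baba0e116d87ce8f49410|g' \
  -e 's|846d907acf472ae233ec0882ef3a2d24edbbe834b80c305e867ac65a1f2c59e3|7a1d96ccdf7131535828cad737a76fd65ed766e9511e468d0daa3cc4f3db5175|g' \
  -e 's|protobuf-396336eb961b75f03b25824fe86cf6490fb75e3a|protobuf-188578878eff18c2148baba0e116d87ce8f49410|g' \
  tensorflow/workspace.bzl
git diff tensorflow/workspace.bzl   # sanity-check the three replaced values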

I've built with the following differences, in addition to the repo change above:

  • bazel 0.16.1 (bazel 0.17.x produced a compile error)
  • Xcode 8.3.3 (Xcode 9.2 produced the same SEGFAULT as @smoothdvd)
  • macOS 10.13.6
  • Python 3.6.5_1 (installed with brew)

@markedphillips

I applied the above patch and I am still getting the error -
bazel build --config=cuda --config=opt --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package
WARNING: The following configs were expanded more than once: [cuda]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
INFO: Build options have changed, discarding analysis cache.
ERROR: /private/tmp/tensorflow-gpu-macosx/tensorflow/tools/pip_package/BUILD:166:1: error loading package 'tensorflow': Encountered error while reading extension file 'protobuf.bzl': no such package '@protobuf_archive//': java.io.IOException: Error downloading [https://mirror.bazel.build/github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz, https://github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz] to /private/var/tmp/_bazel_mark.phillips/4d87763e897639d9afd39b187c93d110/external/protobuf_archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz: All mirrors are down: [GET returned 404 Not Found] and referenced by '//tensorflow/tools/pip_package:build_pip_package'
ERROR: /private/tmp/tensorflow-gpu-macosx/tensorflow/tools/pip_package/BUILD:166:1: error loading package 'tensorflow': Encountered error while reading extension file 'protobuf.bzl': no such package '@protobuf_archive//': java.io.IOException: Error downloading [https://mirror.bazel.build/github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz, https://github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz] to /private/var/tmp/_bazel_mark.phillips/4d87763e897639d9afd39b187c93d110/external/protobuf_archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz: All mirrors are down: [GET returned 404 Not Found] and referenced by '//tensorflow/tools/pip_package:build_pip_package'
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted: error loading package 'tensorflow': Encountered error while reading extension file 'protobuf.bzl': no such package '@protobuf_archive//': java.io.IOException: Error downloading [https://mirror.bazel.build/github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz, https://github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz] to /private/var/tmp/_bazel_mark.phillips/4d87763e897639d9afd39b187c93d110/external/protobuf_archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz: All mirrors are down: [GET returned 404 Not Found]
INFO: Elapsed time: 2.659s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (3 packages loaded)
    currently loading: tensorflow
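The interesting detail in that log is that bazel is still fetching the dtrebbien URL, so whatever workspace file that checkout (/private/tmp/tensorflow-gpu-macosx) reads was not actually updated. Two quick checks, assuming the standard TF source layout:

# Run from the source tree bazel is actually building (see the BUILD path in the error):
grep -n "protobuf/archive" tensorflow/workspace.bzl
# If the URLs there are already correct, flush bazel's cached external repos and retry:
bazel clean --expunge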

@markedphillips

I am getting this error:

WARNING: The following rc files are no longer being read, please transfer their contents or import their path into one of the standard rc files: /private/tmp/tensorflow/tools/bazel.rc
Starting local Bazel server and connecting to it...
INFO: Options provided by the client: Inherited 'common' options: --isatty=1 --terminal_columns=202
INFO: Reading rc options for 'clean' from /private/tmp/tensorflow/.tf_configure.bazelrc: Inherited 'build' options: --action_env PYTHON_BIN_PATH=/usr/bin/python --action_env PYTHON_LIB_PATH=/Library/Python/2.7/site-packages --force_python=py2 --host_force_python=py2 --python_path=/usr/bin/python --action_env TF_NEED_OPENCL_SYCL=0 --action_env TF_NEED_CUDA=1 --action_env CUDA_TOOLKIT_PATH=/usr/local/cuda --action_env TF_CUDA_VERSION=9.1 --action_env CUDNN_INSTALL_PATH=/usr/local/cuda --action_env TF_CUDNN_VERSION=7 --action_env TF_CUDA_COMPUTE_CAPABILITIES=6.1,3.0 --action_env LD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib --action_env TF_CUDA_CLANG=0 --action_env GCC_HOST_COMPILER_PATH=/usr/bin/gcc --config=cuda --define grpc_no_ares=true --copt=-DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK --host_copt=-DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK
ERROR: Config value cuda is not defined in any .rc file
INFO: Invocation ID: 585a36cb-27ad-40ec-b6a5-625b9e77638a

Resolved by rechecking my Xcode and Python versions... moving on.
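For anyone who hits the same "Config value cuda is not defined in any .rc file" message: the WARNING at the top of that log is the likely culprit. Bazel releases newer than the ones this guide targets stopped reading TensorFlow's tools/bazel.rc, which appears to be where --config=cuda is defined (it is exactly the file bazel says it is no longer reading). Two things worth trying, sketched under that assumption; the /private/tmp path is the one bazel printed, so adjust it to your checkout:

bazel version   # this guide was written against bazel 0.13.0; 0.16.1 also reportedly works (see the comment above)
# Option 1: downgrade bazel to one of those versions so tools/bazel.rc is still picked up.
# Option 2: on a newer bazel, pass the rc file explicitly as a startup option (build flags abbreviated here):
bazel --bazelrc=/private/tmp/tensorflow/tools/bazel.rc build --config=cuda --config=opt //tensorflow/tools/pip_package:build_pip_package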

@farismismar

I applied the above patch and I am still getting the error -
(the same bazel / @protobuf_archive 404 log that @markedphillips posted above)

Same here. The repo https://github.com/dtrebbien/protobuf/ is no longer available. I think the patch file needs to be updated. Until then, I used the pre-compiled .whl file, and it is working for me.
