Install TensorFlow 1.8 on macOS High Sierra 10.13.6 with CUDA


Forked from @William-Zhang's gist

Requirements

  • NVIDIA Web-Drivers 387.10.10.10.30.106 for 10.13.4 (17E199) (w/o Security Update)
  • CUDA-Drivers 387.178 (matching the download link below)
  • CUDA 9.1 Toolkit
  • cuDNN 7.0.5 (latest for macOS)
  • NCCL 2.1.15 (latest for macOS)
  • Python 2.7
  • Xcode 8.2
  • Bazel 0.14.0 (stable; latest on Homebrew)
  • Tensorflow 1.8 Source Code

eGPU Only

Check out the eGPU setup before installing (required for eGPU; ignore otherwise).

I used the eGPU setup called purge-wrangler. My setup is a 2015 MacBook Pro 15".

Install Homebrew (Optional)

Homebrew is used here for package management; skip this if you already have python and wget, or if you prefer to download packages manually.

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew install wget

NVIDIA Graphics driver

Download and install from http://www.nvidia.com/download/driverResults.aspx/130460/en-us

NVIDIA Cuda driver

Download and install from http://www.nvidia.com/object/macosx-cuda-387.178-driver.html

Install Xcode 8.2

Download Xcode_8.2.xip, or find Xcode 8.2 at https://developer.apple.com/download/more/

Unarchive it, rename Xcode.app to Xcode8.2.app, move it into /Applications, and then use xcode-select to make it the active Xcode version, as sketched below.
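
A minimal sketch of those steps, assuming the archive unpacked to Xcode.app in ~/Downloads:

mv ~/Downloads/Xcode.app /Applications/Xcode8.2.app
sudo xcode-select -s /Applications/Xcode8.2.app
xcodebuild -version   # should report Xcode 8.2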

Install Bazel

If you have Homebrew installed

brew install bazel
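
The requirements above call for Bazel 0.14.0, while the configure log later in this guide happens to show an older 0.10.0; either way, it is worth checking what actually got installed:

bazel version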

Install CUDA Toolkit 9.1

Download CUDA-9.1

It should be something along the lines of cuda_9.1.128_mac.dmg

Install NCCL

Download NCCL 2.1.15 O/S agnostic and CUDA 9 from NVIDIA.

Unarchive it and move it to a permanent place, e.g. /usr/local/nccl:

sudo mkdir -p /usr/local/nccl
cd nccl_2.1.15-1+cuda9.1_x86_64
sudo mv * /usr/local/nccl
sudo mkdir -p /usr/local/include/third_party/nccl
sudo ln -s /usr/local/nccl/include/nccl.h /usr/local/include/third_party/nccl

Set up your env paths

Edit ~/.bash_profile and add the following:

export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib 
export LD_LIBRARY_PATH=$DYLD_LIBRARY_PATH
export PATH=$DYLD_LIBRARY_PATH:$PATH:/Developer/NVIDIA/CUDA-9.1/bin
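
To check that the new paths took effect (assuming the toolkit installed to its default /Developer/NVIDIA/CUDA-9.1 location):

source ~/.bash_profile
nvcc --version   # should report Cuda compilation tools, release 9.1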

Compile Samples

We want to compile one of the bundled CUDA samples to check that the GPU is correctly recognized and supported.

cd /Developer/NVIDIA/CUDA-9.1/samples
sudo chown -R $(whoami) *
make -C 1_Utilities/deviceQuery
./bin/x86_64/darwin/release/deviceQuery
 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version          9.1 / 9.1
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11264 MBytes (11810963456 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1645 MHz (1.64 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 196 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 1
Result = PASS

NVIDIA cuDNN - Deep Learning Primitives

If not already done, register at https://developer.nvidia.com/cudnn, then download cuDNN 7.0.5.

Change into your download directory and follow the post-installation steps:

tar -xzvf cudnn-9.1-osx-x64-v7-ga.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
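
To confirm the copy worked, you can read the version macros out of the installed header (a quick check; CUDNN_MAJOR/MINOR/PATCHLEVEL are defined in cudnn.h itself):

grep -A 2 "define CUDNN_MAJOR" /usr/local/cuda/include/cudnn.h   # expect 7 / 0 / 5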

Install mock and wheel for python 2.7

I just use easy_install since it ships with the system. Before you use easy_install, make sure it comes from the same directory as python:

$ which python
/usr/bin/python
$ which easy_install
/usr/bin/easy_install
$ sudo easy_install mock wheel

Compile

Clone the TensorFlow Repository

cd /tmp
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout v1.8.0

Apply Patch

Apply the following patch to fix a couple of build issues (the patch is reproduced in full at the end of this gist):

wget https://gist.githubusercontent.com/edc/57d125e50c4941bc8750691a4285d3ce/raw/xtensorflow18macos.patch
git apply xtensorflow18macos.patch
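
If you want to verify the patch before applying it, git apply has a dry-run flag:

git apply --check xtensorflow18macos.patch   # exits silently if the patch applies cleanly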

Configure Build

Apart from CUDA support, the CUDA SDK version, and the CUDA compute capabilities, I left all other settings untouched.

Pay attention to the CUDA compute capabilities; you will want to look up the right value for your own GPU at https://developer.nvidia.com/cuda-gpus.
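
If you built deviceQuery earlier, it also reports the capability directly (reusing the binary compiled in the samples step above):

/Developer/NVIDIA/CUDA-9.1/samples/bin/x86_64/darwin/release/deviceQuery | grep "CUDA Capability"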

./configure
You have bazel 0.10.0 installed.
Please specify the location of python. [Default is /usr/bin/python]: 


Found possible Python library paths:
  /Library/Python/2.7/site-packages
Please input the desired Python library path to use.  Default is [/Library/Python/2.7/site-packages]

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]:
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]:
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]:
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [y/N]:
No Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]:
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]:
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]:
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.1


Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 


Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2] (type your own, check on https://developer.nvidia.com/cuda-gpus, mine is 6.1 for GTX 1080 Ti)


Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 


Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
	--config=mkl         	# Build with MKL support.
	--config=monolithic  	# Config for mostly static monolithic build.
Configuration finished

Build Process

bazel clean
bazel build --config=cuda --config=opt --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package

The build will fail because of an issue with protobuf. Edit the offending header (the compiler error names the exact path; a commenter below identifies it as protobuf's arena.h) and delete the CreateMessageInternal definition from line 868 to 873:

868   template <typename T, typename Arg> GOOGLE_PROTOBUF_ATTRIBUTE_ALWAYS_INLINE
869   T* CreateMessageInternal(const Arg& arg) {
870     return InternalHelper<T>::Construct(
871         AllocateInternal<T>(InternalHelper<T>::is_destructor_skippable::value),
872         this, arg);
873   }
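
If you prefer to script the deletion (a sketch; $ARENA_H is a hypothetical placeholder for the path reported in the compiler error):

sed -i '' '868,873d' "$ARENA_H"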

Now run the same build command again:

bazel build --config=cuda --config=opt --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package

Create wheel file and install it

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
ls /tmp/tensorflow_pkg
tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl

If you want to use virtualenv or something, now is the time. Or just:

pip install /tmp/tensorflow_pkg/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl

Fix @rpath issue

Simply symlink all libcu*.dylib files from /usr/local/cuda/lib into $VIRTUALENV/lib/python2.7/site-packages/tensorflow, as sketched below.
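
A minimal sketch, assuming $VIRTUALENV points at your virtualenv root (adjust the site-packages path if you installed TensorFlow system-wide):

cd $VIRTUALENV/lib/python2.7/site-packages/tensorflow
ln -sf /usr/local/cuda/lib/libcu*.dylib .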

Fix NCCL issue

If you encounter the _ncclAllReduce issue (Symbol not found: _ncclAllReduce; see the error log in the comments below):

Download nccl_ops.cc from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/nccl/kernels/nccl_ops.cc and compile it into a replacement shared library:

gcc -c -fPIC nccl_ops.cc -o hello_world.o
gcc hello_world.o -shared -o _nccl_ops.so

Then replace the existing _nccl_ops.so at $VIRTUALENV/lib/python2.7/site-packages/tensorflow/contrib/nccl/python/ops/ with the one you just built.

h/t to @Orang-utan

Back up your wheel (Optional)

Files in /tmp are cleaned after a reboot.

cp /tmp/tensorflow_pkg/*.whl ~/

It's useful to leave the .whl file lying around in case you want to install it for another environment.

Test Installation

See if everything got linked correctly

cd ~
python
>>> import tensorflow as tf
>>> tf.Session()
2018-04-08 03:25:15.740635: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2018-04-08 03:25:15.741260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:c4:00.0
totalMemory: 11.00GiB freeMemory: 10.18GiB
2018-04-08 03:25:15.741288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-08 03:25:16.157590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-08 03:25:16.157614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2018-04-08 03:25:16.157620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2018-04-08 03:25:16.157753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9849 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:c4:00.0, compute capability: 6.1)
<tensorflow.python.client.session.Session object at 0x10968ef60>

Try out the new TensorFlow eager execution feature (Optional)

python
import tensorflow as tf
tf.enable_eager_execution()
tf.executing_eagerly()        # => True

x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m))  # => "hello, [[4.]]"

Test GPU Acceleration

pip install keras
wget https://gist.githubusercontent.com/Willian-Zhang/290dceb96679c8f413e42491c92722b0/raw/mnist-cnn.py
python mnist-cnn.py
/usr/local/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2018-05-11 04:51:10.335377: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2018-05-11 04:51:10.336052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:c4:00.0
totalMemory: 11.00GiB freeMemory: 9.37GiB
2018-05-11 04:51:10.336075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-11 04:51:11.063831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-11 04:51:11.063856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-05-11 04:51:11.063864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
2018-05-11 04:51:11.064768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9065 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:c4:00.0, compute capability: 6.1)
2018-05-11 04:51:11.534095: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
2018-05-11 04:51:11.579370: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
2018-05-11 04:51:11.644835: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
59264/60000 [============================>.] - ETA: 0s - loss: 0.2604 - acc: 0.92082018-05-11 04:51:19.228205: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
60000/60000 [==============================] - 10s 159us/step - loss: 0.2588 - acc: 0.9213 - val_loss: 0.0561 - val_acc: 0.9829
Epoch 2/12
60000/60000 [==============================] - 4s 66us/step - loss: 0.0875 - acc: 0.9742 - val_loss: 0.0427 - val_acc: 0.9857
Epoch 3/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0662 - acc: 0.9803 - val_loss: 0.0356 - val_acc: 0.9875
Epoch 4/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0549 - acc: 0.9839 - val_loss: 0.0325 - val_acc: 0.9896
Epoch 5/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0471 - acc: 0.9859 - val_loss: 0.0309 - val_acc: 0.9901
Epoch 6/12
60000/60000 [==============================] - 4s 68us/step - loss: 0.0421 - acc: 0.9873 - val_loss: 0.0297 - val_acc: 0.9903
Epoch 7/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0377 - acc: 0.9884 - val_loss: 0.0259 - val_acc: 0.9908
Epoch 8/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0357 - acc: 0.9883 - val_loss: 0.0285 - val_acc: 0.9908
Epoch 9/12
60000/60000 [==============================] - 4s 68us/step - loss: 0.0315 - acc: 0.9904 - val_loss: 0.0327 - val_acc: 0.9901
Epoch 10/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0288 - acc: 0.9910 - val_loss: 0.0272 - val_acc: 0.9911
Epoch 11/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0282 - acc: 0.9912 - val_loss: 0.0248 - val_acc: 0.9920
Epoch 12/12
60000/60000 [==============================] - 4s 66us/step - loss: 0.0255 - acc: 0.9923 - val_loss: 0.0283 - val_acc: 0.9912
Test loss: 0.028254894825743667
Test accuracy: 0.9912

You can use cuda-smi to watch GPU memory usage. In the case of the Keras MNIST example, you should see the free memory drop to maybe 2% and the fans spin up. I'm not quite sure what the grappler/clusters/utils.cc:127 warning is, however.

$ cuda-smi
Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): 10350 of 11264 MB (i.e. 91.9%) Free
# while the GPU is in use
$ cuda-smi
Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): 1181.1 of 11264 MB (i.e. 10.5%) Free

Tested on a MacBook Pro (15-inch, 2016) running 10.13.4 (17E199), with a 2.7 GHz Intel Core i7 and an NVIDIA GeForce GTX 1080 Ti (11 GiB).

The patch referenced above, xtensorflow18macos.patch:

diff --git a/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc b/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
index 0f7adaf24a..934ccbada6 100644
--- a/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
+++ b/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
@@ -69,7 +69,7 @@ __global__ void concat_variable_kernel(
IntType num_inputs = input_ptr_data.size;
// verbose declaration needed due to template
- extern __shared__ __align__(sizeof(T)) unsigned char smem[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char smem[];
IntType* smem_col_scan = reinterpret_cast<IntType*>(smem);
if (useSmem) {
diff --git a/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc b/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
index 94989089ec..1d26d4bacb 100644
--- a/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
+++ b/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
@@ -172,7 +172,7 @@ __global__ __launch_bounds__(1024, 2) void DepthwiseConv2dGPUKernelNHWCSmall(
const DepthwiseArgs args, const T* input, const T* filter, T* output) {
assert(CanLaunchDepthwiseConv2dGPUSmall(args));
// Holds block plus halo and filter data for blockDim.x depths.
- extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
T* const shared_data = reinterpret_cast<T*>(shared_memory);
const int num_batches = args.batch;
@@ -452,7 +452,7 @@ __global__ __launch_bounds__(1024, 2) void DepthwiseConv2dGPUKernelNCHWSmall(
const DepthwiseArgs args, const T* input, const T* filter, T* output) {
assert(CanLaunchDepthwiseConv2dGPUSmall(args));
// Holds block plus halo and filter data for blockDim.z depths.
- extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
T* const shared_data = reinterpret_cast<T*>(shared_memory);
const int num_batches = args.batch;
@@ -1118,7 +1118,7 @@ __launch_bounds__(1024, 2) void DepthwiseConv2dBackpropFilterGPUKernelNHWCSmall(
const DepthwiseArgs args, const T* output, const T* input, T* filter) {
assert(CanLaunchDepthwiseConv2dBackpropFilterGPUSmall(args, blockDim.z));
// Holds block plus halo and filter data for blockDim.x depths.
- extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
T* const shared_data = reinterpret_cast<T*>(shared_memory);
const int num_batches = args.batch;
@@ -1388,7 +1388,7 @@ __launch_bounds__(1024, 2) void DepthwiseConv2dBackpropFilterGPUKernelNCHWSmall(
const DepthwiseArgs args, const T* output, const T* input, T* filter) {
assert(CanLaunchDepthwiseConv2dBackpropFilterGPUSmall(args, blockDim.x));
// Holds block plus halo and filter data for blockDim.z depths.
- extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
T* const shared_data = reinterpret_cast<T*>(shared_memory);
const int num_batches = args.batch;
diff --git a/tensorflow/core/kernels/split_lib_gpu.cu.cc b/tensorflow/core/kernels/split_lib_gpu.cu.cc
index 393818730b..58a1294005 100644
--- a/tensorflow/core/kernels/split_lib_gpu.cu.cc
+++ b/tensorflow/core/kernels/split_lib_gpu.cu.cc
@@ -121,7 +121,7 @@ __global__ void split_v_kernel(const T* input_ptr,
int num_outputs = output_ptr_data.size;
// verbose declaration needed due to template
- extern __shared__ __align__(sizeof(T)) unsigned char smem[];
+ extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char smem[];
IntType* smem_col_scan = reinterpret_cast<IntType*>(smem);
if (useSmem) {
diff --git a/third_party/gpus/cuda/BUILD.tpl b/third_party/gpus/cuda/BUILD.tpl
index 2a37c65bc7..43446dd99b 100644
--- a/third_party/gpus/cuda/BUILD.tpl
+++ b/third_party/gpus/cuda/BUILD.tpl
@@ -110,7 +110,7 @@ cc_library(
".",
"cuda/include",
],
- linkopts = ["-lgomp"],
+ #linkopts = ["-lgomp"],
linkstatic = 1,
visibility = ["//visibility:public"],
)
@markedphillips

I am getting the following error when trying to do the bazel build:

WARNING: The following rc files are no longer being read, please transfer their contents or import their path into one of the standard rc files:
/private/tmp/tensorflow/tools/bazel.rc
Starting local Bazel server and connecting to it...
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=193
INFO: Reading rc options for 'clean' from /private/tmp/tensorflow/.tf_configure.bazelrc:
  Inherited 'build' options: --action_env PYTHON_BIN_PATH=/usr/bin/python --action_env PYTHON_LIB_PATH=/Library/Python/2.7/site-packages --force_python=py2 --host_force_python=py2 --python_path=/usr/bin/python --action_env TF_NEED_OPENCL_SYCL=0 --action_env TF_NEED_CUDA=1 --action_env CUDA_TOOLKIT_PATH=/usr/local/cuda --action_env TF_CUDA_VERSION=9.1 --action_env CUDNN_INSTALL_PATH=/usr/local/cuda --action_env TF_CUDNN_VERSION=7 --action_env TF_CUDA_COMPUTE_CAPABILITIES=6.1,3.0 --action_env TF_CUDA_CLANG=0 --action_env GCC_HOST_COMPILER_PATH=/usr/bin/gcc --config=cuda --define grpc_no_ares=true --copt=-DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK --host_copt=-DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK
ERROR: Config value cuda is not defined in any .rc file
INFO: Invocation ID: 8ac66600-c480-4d3c-acae-5e0a186225f3

Do you know what is going on?

@markedphillips

I also am unclear on where to make this change in the build section - The build will fail because of some issue with proto_buffer. Edit the file at ``, and delete the CreateMessageInternal definition from line 868 to 873 - thank you.

@pylearndl

Hi, I'm getting the error below. I have no clue what is wrong or where. Please help.

Aagnyas-Air:Nails Aagnya$ python3 custom.py train --dataset=customImages --weights=coco
Using TensorFlow backend.
Weights: coco
Dataset: customImages
Logs: /Users/Aagnya/Documents/AppCodes/Applications/logs

Configurations:
BACKBONE resnet101
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 2
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE None
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.9
DETECTION_NMS_THRESHOLD 0.3
FPN_CLASSIF_FC_LAYERS_SIZE 1024
GPU_COUNT 1
GRADIENT_CLIP_NORM 5.0
IMAGES_PER_GPU 2
IMAGE_MAX_DIM 1024
IMAGE_META_SIZE 14
IMAGE_MIN_DIM 800
IMAGE_MIN_SCALE 0
IMAGE_RESIZE_MODE square
IMAGE_SHAPE [1024 1024 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MEAN_PIXEL [123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME nail
NUM_CLASSES 2
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
STEPS_PER_EPOCH 100
TOP_DOWN_PYRAMID_SIZE 256
TRAIN_BN False
TRAIN_ROIS_PER_IMAGE 200
USE_MINI_MASK True
USE_RPN_ROIS True
VALIDATION_STEPS 50
WEIGHT_DECAY 0.0001

Loading weights /Users/Aagnya/Documents/AppCodes/Applications/mask_rcnn_coco.h5
2019-01-06 22:25:18.813208: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2019-01-06 22:25:18.814461: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1070 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:c3:00.0
totalMemory: 8.00GiB freeMemory: 7.24GiB
2019-01-06 22:25:18.814514: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-01-06 22:25:20.799216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-06 22:25:20.799247: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2019-01-06 22:25:20.799256: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2019-01-06 22:25:20.801938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6990 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070 Ti, pci bus id: 0000:c3:00.0, compute capability: 6.1)
2019-01-06 22:25:20.804368: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 6.83G (7329701632 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-01-06 22:25:22.579117: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
2019-01-06 22:25:23.544083: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
2019-01-06 22:25:25.234273: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
Training network heads
Traceback (most recent call last):
File "custom.py", line 365, in
train(model)
File "custom.py", line 201, in train
layers='heads')
File "/Users/Aagnya/Documents/AppCodes/Applications/untitled folder/Nails/mrcnn/model.py", line 2348, in train
histogram_freq=0, write_graph=True, write_images=False),
File "/usr/local/lib/python3.6/site-packages/keras/callbacks.py", line 745, in init
from tensorflow.contrib.tensorboard.plugins import projector
File "/usr/local/lib/python3.6/site-packages/tensorflow/contrib/init.py", line 36, in
from tensorflow.contrib import distribute
File "/usr/local/lib/python3.6/site-packages/tensorflow/contrib/distribute/init.py", line 22, in
from tensorflow.contrib.distribute.python.cross_tower_ops import *
File "/usr/local/lib/python3.6/site-packages/tensorflow/contrib/distribute/python/cross_tower_ops.py", line 23, in
from tensorflow.contrib.distribute.python import cross_tower_utils
File "/usr/local/lib/python3.6/site-packages/tensorflow/contrib/distribute/python/cross_tower_utils.py", line 23, in
from tensorflow.contrib import nccl
File "/usr/local/lib/python3.6/site-packages/tensorflow/contrib/nccl/init.py", line 30, in
from tensorflow.contrib.nccl.python.ops.nccl_ops import all_max
File "/usr/local/lib/python3.6/site-packages/tensorflow/contrib/nccl/python/ops/nccl_ops.py", line 30, in
resource_loader.get_path_to_datafile('_nccl_ops.so'))
File "/usr/local/lib/python3.6/site-packages/tensorflow/contrib/util/loader.py", line 56, in load_op_library
ret = load_library.load_op_library(path)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/usr/local/lib/python3.6/site-packages/tensorflow/contrib/nccl/python/ops/_nccl_ops.so, 6): Symbol not found: _ncclAllReduce
Referenced from: /usr/local/lib/python3.6/site-packages/tensorflow/contrib/nccl/python/ops/_nccl_ops.so
Expected in: flat namespace
in /usr/local/lib/python3.6/site-packages/tensorflow/contrib/nccl/python/ops/_nccl_ops.so
Aagnyas-Air:Nails Aagnya$

@hvico

hvico commented May 30, 2019

I also am unclear on where to make this change in the build section - The build will fail because of some issue with proto_buffer. Edit the file at ``, and delete the CreateMessageInternal definition from line 868 to 873 - thank you.

I tried to remove that block from arena.h (which seems to be the file that produces this error), with no luck. What worked for me was to replace the block that defines protobuf_archive in tensorflow/tensorflow/workspace.bzl with this:

tf_http_archive(
    name = "protobuf_archive",
    urls = [
        "https://mirror.bazel.build/github.com/dinever/protobuf/archive/188578878eff18c2148baba0e116d87ce8f49410.tar.gz",
        "https://github.com/dinever/protobuf/archive/188578878eff18c2148baba0e116d87ce8f49410.tar.gz",
    ],
    sha256 = "7a1d96ccdf7131535828cad737a76fd65ed766e9511e468d0daa3cc4f3db5175",
    strip_prefix = "protobuf-188578878eff18c2148baba0e116d87ce8f49410",
)

After that I was able to build. Ref: tensorflow/tensorflow#17067

@SanderNotenbaert

Hi, thanks a lot for the guide! I'm pretty new to all of this, so forgive my ignorance. I eventually managed to make a build (after many hurdles) with your guide. It gives me a wheel named tensorflow-1.8.0-cp27-cp27m-macosx_10_13_x86_64.whl, which makes sense given that it was built with Python 2.7. However, yours seems to be tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl; does that mean yours could be installed on Python 3.6? Is that possible? How did you get there? Did I miss something?

@edc (Author)

edc commented Jul 14, 2020

@SanderNotenbaert Sorry, this gist is from over 2 years ago, and I don't have access to the same computer any more. My guess is that it was detecting the default python version.

@SanderNotenbaert

@SanderNotenbaert Sorry, this gist is from over 2 years ago, and I don't have access to the same computer any more. My guess is that it was detecting the default python version.

So building it with the default python version, but with the Python library of 2.7? I'm going to give that a go. Thanks for answering!
