Build TensorFlow 1.3 with SSE4.1/SSE4.2/AVX/AVX2/FMA and NVIDIA CUDA support on macOS Sierra 10.12 (updated October 5, 2017)

These instructions were inspired by Mistobaan's gist, ageitgey's gist, and mattiasarro's tutorial.

Background

I always encountered the following warnings when running my scripts using the precompiled TensorFlow Python package:

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.

I realized I could make these warnings go away, and improve training speed as well, by compiling from source. It was not as easy or straightforward as I expected, but I finally succeeded in creating a working build. Here I outline the steps I took, in the hope that they benefit others who have run into similar challenges.
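Before building, it can help to confirm which of these instruction sets the CPU actually reports. The following is a minimal sketch, not part of the original steps. It assumes that macOS exposes the feature names through the `machdep.cpu.features` and `machdep.cpu.leaf7_features` sysctl keys, and that AVX appears there as `AVX1.0`. The helper names and the feature-to-flag table are hypothetical.

```python
import subprocess
import sys

# Assumed mapping from macOS sysctl feature names to compiler flags.
FEATURE_TO_COPT = {
    "SSE4.1": "-msse4.1",
    "SSE4.2": "-msse4.2",
    "AVX1.0": "-mavx",
    "AVX2": "-mavx2",
    "FMA": "-mfma",
}


def copts_for_features(feature_text):
    """Return a --copt option for each known feature named in feature_text."""
    present = set(feature_text.upper().split())
    return ["--copt=" + flag
            for name, flag in FEATURE_TO_COPT.items()
            if name in present]


def read_macos_cpu_features():
    """Read the CPU feature strings from sysctl (macOS only)."""
    keys = ["machdep.cpu.features", "machdep.cpu.leaf7_features"]
    result = subprocess.run(["sysctl", "-n"] + keys,
                            stdout=subprocess.PIPE,
                            universal_newlines=True,
                            check=True)
    return result.stdout


if __name__ == "__main__":
    if sys.platform == "darwin":
        try:
            print(" ".join(copts_for_features(read_macos_cpu_features())))
        except (OSError, subprocess.CalledProcessError) as exc:
            print("could not query sysctl:", exc)
    else:
        print("these sysctl keys are macOS-specific; skipping")
```

Only pass to bazel the flags that this prints; a flag for an instruction set the CPU lacks can produce a binary that crashes at run time.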

Machine setup

Hardware

  • Model: MacBook Pro (Retina, 15-inch, Mid 2014)
  • Processor: 2.5 GHz Intel Core i7
  • Memory: 16 GB 1600 MHz DDR3
  • Graphics: Intel Iris Pro 1536 MB RAM + NVIDIA GeForce GT 750M 2048 MB RAM

Software

  • OS: macOS Sierra 10.12.6
  • TensorFlow version: 1.3.1
  • Python version: 3.6.2 (conda)
  • Bazel version: 0.6.0-homebrew
  • CUDA/cuDNN version: 8.0/6.0

Prerequisites

macOS Sierra (10.12)

I tested on macOS Sierra 10.12. It may also work on Yosemite (10.10) and El Capitan (10.11), but I have not verified this.

Xcode Command-Line Tools

I successfully compiled using Xcode 8.2.1 (Refer to http://docs.nvidia.com/cuda/cuda-installation-guide-mac-os-x/index.html#system-requirements).

Disable SIP (System Integrity Protection) on Mac

For some reason, I had to disable SIP before bazel build could build the TensorFlow pip package successfully. For security reasons, remember to re-enable SIP after your build.

Steps

Note: Many steps were based on https://www.tensorflow.org/install/install_sources ; I just happened to have a slightly different order that worked out for me.

  • Install homebrew
  • Install bazel
  • Install conda (I wanted a Python environment that would not interfere with the system Python. I downloaded Miniconda for Python 2.7 and then created a Python 3.6 environment inside it)
  • Create and activate Python 3.6 environment
    conda create --name compiletf python=3.6
    # wheel 0.29.0 will already be installed
    source activate compiletf
    conda install numpy six
    # numpy 1.13.1 and six 1.10.0 will have been installed
    Alternatively, you can do:
    conda create --name compiletf python=3.6 anaconda
    # numpy 1.12.1, six 1.10.0, and wheel 0.29.0 will already be installed
    source activate compiletf
    conda update numpy
    # numpy 1.13.1 will have been installed
  • Verify that the following packages are installed:
    • six
    • numpy (has to be at least 1.13 so you don't get a ModuleNotFoundError: No module named 'numpy.lib.mixins' error later on during bazel build)
    • wheel
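The package check above can be scripted. This is a minimal sketch, run inside the activated environment. The function names are hypothetical, and only the numpy minimum of 1.13 comes from this guide; the other packages are checked for presence only.

```python
import importlib


def version_tuple(version):
    """Turn '1.13.1' into (1, 13, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_packages(minimums):
    """Map each package name to 'ok', 'missing', or 'too old (<version>)'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        found = getattr(module, "__version__", "0")
        if version_tuple(found) >= minimum:
            report[name] = "ok"
        else:
            report[name] = "too old (%s)" % found
    return report


if __name__ == "__main__":
    # An empty tuple accepts any installed version.
    wanted = {"six": (), "numpy": (1, 13), "wheel": ()}
    for name, status in check_packages(wanted).items():
        print(name, status)
```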
  • Install CUDA support prerequisites
    • Install GNU coreutils and swig
      brew install coreutils swig
    • Refer to this for more detailed CUDA installation instructions.
    • Install CUDA Toolkit 8.0
    • Install cuDNN 6.0
    • Set environment variable DYLD_LIBRARY_PATH
      export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH
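To sanity-check the variable before configuring, one can look for the CUDA runtime and cuDNN libraries in each directory it lists. This is a rough sketch only: the file name patterns `libcudart*.dylib` and `libcudnn*.dylib` are assumptions about how the libraries are named on macOS, and `missing_libraries` is a hypothetical helper.

```python
import glob
import os


def missing_libraries(lib_dir, patterns):
    """Return the glob patterns that match no file inside lib_dir."""
    return [p for p in patterns
            if not glob.glob(os.path.join(lib_dir, p))]


if __name__ == "__main__":
    # Assumed names for the CUDA runtime and cuDNN dynamic libraries.
    wanted = ["libcudart*.dylib", "libcudnn*.dylib"]
    for entry in os.environ.get("DYLD_LIBRARY_PATH", "").split(":"):
        if entry:
            print(entry, "missing:", missing_libraries(entry, wanted))
```

An empty "missing" list for /usr/local/cuda/lib suggests both libraries are where the later configure prompts expect them.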
  • Clone the TensorFlow repository (instructions): be sure to check out the r1.3 release
    git clone https://github.com/tensorflow/tensorflow
    cd tensorflow
    git checkout r1.3
  • Configure the installation
    bazel clean
    ./configure
    My configure settings (enter N for CUDA support if you do not want CUDA support or do not have an NVIDIA GPU):
    Please specify the location of python. [Default is /Users/phil.wee/miniconda2/envs/compiletf/bin/python]:
    Found possible Python library paths:
      /Users/phil.wee/miniconda2/envs/compiletf/lib/python3.6/site-packages
    Please input the desired Python library path to use.  Default is [/Users/phil.wee/miniconda2/envs/compiletf/lib/python3.6/site-packages]
    
    Using python library path: /Users/phil.wee/miniconda2/envs/compiletf/lib/python3.6/site-packages
    Do you wish to build TensorFlow with MKL support? [y/N]
    No MKL support will be enabled for TensorFlow
    Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
    Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
    No Google Cloud Platform support will be enabled for TensorFlow
    Do you wish to build TensorFlow with Hadoop File System support? [y/N]
    No Hadoop File System support will be enabled for TensorFlow
    Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
    No XLA support will be enabled for TensorFlow
    Do you wish to build TensorFlow with VERBS support? [y/N]
    No VERBS support will be enabled for TensorFlow
    Do you wish to build TensorFlow with OpenCL support? [y/N]
    No OpenCL support will be enabled for TensorFlow
    Do you wish to build TensorFlow with CUDA support? [y/N] Y
    CUDA support will be enabled for TensorFlow
    Do you want to use clang as CUDA compiler? [y/N]
    nvcc will be used as CUDA compiler
    Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]:
    Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
    Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
    Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]:
    Please specify the location where cuDNN 6 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
    ./configure: line 669: /usr/local/cuda/extras/demo_suite/deviceQuery: No such file or directory
    Please specify a list of comma-separated Cuda compute capabilities you want to build with.
    You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
    Please note that each additional compute capability significantly increases your build time and binary size.
    [Default is: "3.5,5.2"]: 3.0
    Do you wish to build TensorFlow with MPI support? [y/N]
    MPI support will not be enabled for TensorFlow
    Configuration finished
    
  • Comment out linkopts = ["-lgomp"], (line 112) in tensorflow/third_party/gpus/cuda/BUILD.tpl (Refer to https://medium.com/@mattias.arro/installing-tensorflow-1-2-from-sources-with-gpu-support-on-macos-4f2c5cab8186)
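Since the line number can shift between releases, matching on the text of the line is more robust than relying on "line 112". A minimal sketch of that edit, assuming it is run from the root of the cloned repository; `comment_out_line` is a hypothetical helper, not part of TensorFlow or bazel.

```python
import os


def comment_out_line(text, needle):
    """Prefix '# ' to every uncommented line of text that contains needle."""
    result = []
    for line in text.splitlines(True):
        stripped = line.lstrip()
        if needle in line and not stripped.startswith("#"):
            indent = line[:len(line) - len(stripped)]
            line = indent + "# " + stripped
        result.append(line)
    return "".join(result)


if __name__ == "__main__":
    # Path relative to the repository root (assumed working directory).
    path = os.path.join("third_party", "gpus", "cuda", "BUILD.tpl")
    if os.path.exists(path):
        with open(path) as handle:
            original = handle.read()
        with open(path, "w") as handle:
            handle.write(comment_out_line(original, 'linkopts = ["-lgomp"],'))
    else:
        print("BUILD.tpl not found; run this from the repository root")
```

The function skips lines that are already commented, so running it twice leaves the file unchanged.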
  • Build the pip package (reference: https://stackoverflow.com/questions/41293077/how-to-compile-tensorflow-with-sse4-2-and-avx-instructions). It took around 35 minutes on my MacBook Pro.
    export CUDA_HOME=/usr/local/cuda
    export DYLD_LIBRARY_PATH=$CUDA_HOME/lib:$CUDA_HOME/extras/CUPTI/lib
    export LD_LIBRARY_PATH=$DYLD_LIBRARY_PATH
    bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse4.1 --copt=-msse4.2 --config=cuda --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH --verbose_failures -k //tensorflow/tools/pip_package:build_pip_package
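The long command line is easier to adapt when it is assembled from its parts, for example when dropping a flag the CPU does not support or building without CUDA. A small sketch that only builds and prints the command; `bazel_build_command` is a hypothetical helper, and tying the two `--action_env` options to the CUDA case is an assumption.

```python
def bazel_build_command(copts, use_cuda=True):
    """Assemble the bazel invocation for the pip package target as a list."""
    command = ["bazel", "build", "-c", "opt"]
    command += ["--copt=" + flag for flag in copts]
    if use_cuda:
        # Assumption: the library path variables only matter for CUDA builds.
        command += ["--config=cuda",
                    "--action_env", "LD_LIBRARY_PATH",
                    "--action_env", "DYLD_LIBRARY_PATH"]
    command += ["--verbose_failures", "-k",
                "//tensorflow/tools/pip_package:build_pip_package"]
    return command


if __name__ == "__main__":
    flags = ["-mavx", "-mavx2", "-mfma", "-msse4.1", "-msse4.2"]
    print(" ".join(bazel_build_command(flags)))
```

With the five flags shown, the printed line matches the command above.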
  • Refer to tensorflow/tensorflow#6729 if you run into any other problems
  • Build the wheel (.whl) file
    bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
  • Install the pip package
    pip install --upgrade --ignore-installed /tmp/tensorflow_pkg/tensorflow-1.3.1-cp36-cp36m-macosx_10_7_x86_64.whl
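The wheel file name depends on the TensorFlow version, the Python version, and the platform, so it may differ from the one shown here. The sketch below lists whatever wheels were written to /tmp/tensorflow_pkg and splits each name into its dash-separated fields. `parse_wheel_name` is a hypothetical helper that handles only the simple five-field form, without the optional build tag.

```python
import glob
import os


def parse_wheel_name(path):
    """Split a five-field wheel file name into its named parts."""
    base = os.path.basename(path)
    if not base.endswith(".whl"):
        raise ValueError("not a wheel: " + base)
    fields = base[:-len(".whl")].split("-")
    if len(fields) != 5:
        raise ValueError("expected 5 fields, got %d" % len(fields))
    keys = ("name", "version", "python", "abi", "platform")
    return dict(zip(keys, fields))


if __name__ == "__main__":
    for wheel in sorted(glob.glob("/tmp/tensorflow_pkg/*.whl")):
        print(parse_wheel_name(wheel))
        print("pip install --upgrade --ignore-installed " + wheel)
```

The `python` field (cp36 in this guide) should match the interpreter in the active conda environment, or pip will refuse to install the wheel.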
  • Validate your installation (instructions)
    • Change directory to any directory on your system other than the tensorflow subdirectory from which you ran ./configure
      cd ~
    • Invoke python interactive shell
      python
    • Type in the following script
      import tensorflow as tf

      # Place the ops explicitly on the first GPU.
      with tf.device('/gpu:0'):
          a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
          b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
          c = tf.matmul(a, b)  # (2x3) times (3x2) gives a 2x2 result

      with tf.Session() as sess:
          print(sess.run(c))
      If you have a supported NVIDIA CUDA GPU, the script should run without a problem and display something similar to this:
      2017-10-05 22:22:27.025606: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] OS X does not support NUMA - returning NUMA node zero
      2017-10-05 22:22:27.025798: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
      name: GeForce GT 750M
      major: 3 minor: 0 memoryClockRate (GHz) 0.9255
      pciBusID 0000:01:00.0
      Total memory: 2.00GiB
      Free memory: 873.57MiB
      2017-10-05 22:22:27.025819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
      2017-10-05 22:22:27.025826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y
      2017-10-05 22:22:27.025842: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
      [[ 22.  28.]
       [ 49.  64.]]
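To see why the last two lines of output are the expected result, the same product can be worked out without TensorFlow. A plain-Python sketch of the 2x3 by 3x2 multiplication; `matmul` here is a hypothetical helper, not the TensorFlow op.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
b = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]

# First entry: 1*1 + 2*3 + 3*5 = 22
print(matmul(a, b))  # [[22.0, 28.0], [49.0, 64.0]]
```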
      

Have fun training your models!

andrescabana86 commented Sep 4, 2017

I commented out linkopts = ["-lgomp"], (line 112) in tensorflow/third_party/gpus/cuda/BUILD.tpl but I am still getting:

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse4.1 --copt=-msse4.2 --config=cuda --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH -k //tensorflow/tools/pip_package:build_pip_package
ERROR: Inconsistent crosstool configuration; no toolchain corresponding to 'local_darwin' found for cpu 'darwin'.
INFO: Elapsed time: 20,340s

Can you help me, please?

philster (author) commented Sep 7, 2017

@andrescabana86 can you try the suggested solutions from tensorflow/tensorflow#9072? Btw, which bazel version do you use?

Also, instead of selecting clang, use /usr/bin/gcc as the compiler (https://stackoverflow.com/a/45509798); on macOS, /usr/bin/gcc actually invokes the Clang that comes with Xcode.

arvindnrbt commented Apr 12, 2018

1.3 did not work for me because of problems with newer bazel versions. For anyone who is trying this out, try git checkout r1.8, which is the latest version as of the day I am writing this comment. With that, it worked like a charm. Thanks for the Gist. :)
