Build TensorFlow from source, for better performance

Building TensorFlow from source on Linux for maximum performance:

TensorFlow is now distributed under an Apache v2 open source license on GitHub.

Step 1. Install NVIDIA CUDA:

To use TensorFlow with NVIDIA GPUs, the first step is to install the CUDA Toolkit.

Step 2. Install NVIDIA cuDNN:

Once the CUDA Toolkit is installed, download cuDNN v5.1 Library for Linux (note that you will need to register for the Accelerated Computing Developer Program).

Once downloaded, uncompress the files and copy them into the CUDA Toolkit directory (assumed here to be in /usr/local/cuda/):

$ sudo tar -xvf cudnn-8.0-* -C /usr/local
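
After extracting, it can help to confirm that the cuDNN files landed where the build expects them. The following is a minimal sketch, not part of the original steps: the helper name find_cudnn is hypothetical, and it assumes the archive places cudnn.h under include/ and the libcudnn shared libraries under lib64/ of the CUDA directory.

```python
import glob
import os


def find_cudnn(cuda_root="/usr/local/cuda"):
    """Return (path to cudnn.h or None, sorted list of libcudnn shared libraries)."""
    header = os.path.join(cuda_root, "include", "cudnn.h")
    libs = sorted(glob.glob(os.path.join(cuda_root, "lib64", "libcudnn.so*")))
    return (header if os.path.isfile(header) else None, libs)


if __name__ == "__main__":
    header, libs = find_cudnn()
    print("cudnn.h:", header or "not found")
    print("libcudnn:", libs or "not found")
```

If either entry is reported as not found, re-check the extraction path before running ./configure.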

Step 3. Install and upgrade PIP:

Here, we are using a custom-built Python binary, loaded via the modules system. We will handle its installation from there.

TensorFlow itself can be installed using the pip package manager. First, make sure that your system has pip installed and updated:

$ sudo apt-get install python-pip python-dev
$ pip install --upgrade pip

Step 4. Install Bazel:

To build TensorFlow from source, the Bazel build system must first be installed as follows.

$ sudo apt-get install software-properties-common swig
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
$ echo "deb http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
$ curl https://storage.googleapis.com/bazel-apt/doc/apt-key.pub.gpg | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install bazel

Step 5. Install TensorFlow

To obtain the best performance with TensorFlow we recommend building it from source.

First, clone the TensorFlow source code repository:

$ git clone https://github.com/tensorflow/tensorflow
$ cd tensorflow
$ git reset --hard a23f5d7 

Then run the configure script as follows:

$ ./configure

Output:

Please specify the location of python. [Default is /usr/bin/python]: [enter]
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] n
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc nvcc should use as the host compiler. [Default is /usr/bin/gcc]: [enter]
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: [enter]
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5
Please specify the location where cuDNN 5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: [enter]
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 5.2,6.1 [see https://developer.nvidia.com/cuda-gpus]
Setting up Cuda include
Setting up Cuda lib64
Setting up Cuda bin
Setting up Cuda nvvm
Setting up CUPTI include
Setting up CUPTI lib64
Configuration finished
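
To make the compute capability answer more concrete, the sketch below shows how a list such as 5.2,6.1 corresponds to nvcc -gencode flags, one per SM architecture. This is an illustration only: gencode_flags is a hypothetical helper, and it does not claim to reproduce how TensorFlow's build actually invokes nvcc.

```python
def gencode_flags(capabilities):
    """Turn a string like '5.2,6.1' into nvcc -gencode flags, one per capability."""
    flags = []
    for cap in capabilities.split(","):
        major, minor = cap.strip().split(".")
        arch = major + minor  # e.g. "5.2" becomes "52"
        flags.append("-gencode=arch=compute_{0},code=sm_{0}".format(arch))
    return flags


if __name__ == "__main__":
    for flag in gencode_flags("5.2,6.1"):
        print(flag)
```

Each extra entry adds another compiled GPU target, which is why the configure prompt warns about build time and binary size.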

Then call bazel to build the TensorFlow pip package:

$ bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda //tensorflow/tools/pip_package:build_pip_package

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

This will build the package with optimizations for FMA, AVX, AVX2 and SSE4.2. The resulting binary will only run on CPUs that support these instructions.
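
Which of these flags are safe depends on the CPU of the machine that will run the binary. The following is a rough sketch, assuming a Linux x86 host where /proc/cpuinfo lists the CPU features on a "flags" line; the helper names and the mapping table are illustrative, not part of Bazel or TensorFlow.

```python
# Hypothetical mapping from /proc/cpuinfo flag names to the GCC options used above.
CPU_FLAG_TO_COPT = [
    ("sse4_2", "-msse4.2"),
    ("avx", "-mavx"),
    ("avx2", "-mavx2"),
    ("fma", "-mfma"),
]


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the set of CPU feature flags, or an empty set if they cannot be read."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except (IOError, OSError):
        pass
    return set()


def copt_args(cpu_flags):
    """Return --copt arguments only for the features present in cpu_flags."""
    return ["--copt=" + opt for name, opt in CPU_FLAG_TO_COPT if name in cpu_flags]


if __name__ == "__main__":
    print(" ".join(copt_args(read_cpu_flags())))
```

Run it on the target machine and drop any --copt flag it does not print.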

Finally, install the TensorFlow pip package.

For Python 2.7:

$ sudo pip install --upgrade /tmp/tensorflow_pkg/tensorflow-*.whl

For Python 3.4:

$ sudo pip3 install --upgrade /tmp/tensorflow_pkg/tensorflow-*.whl

Step 5. Upgrade protobuf:

Upgrade to the latest version of the protobuf package:

For Python 2.7:

$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/protobuf-3.0.0b2.post2-cp27-none-linux_x86_64.whl

For Python 3.4:

$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/protobuf-3.0.0b2.post2-cp34-none-linux_x86_64.whl

Step 6. Test your installation:

To test the installation, open an interactive Python shell and import the TensorFlow module:

$ cd
$ python

>>> import tensorflow as tf
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally

With the TensorFlow module imported, the next step to test the installation is to create a TensorFlow Session, which will initialize the available computing devices and provide a means of executing computation graphs:

>>> sess = tf.Session()

This command will print out some information on the detected hardware configuration. For example, the output on a system containing a Tesla M40 GPU is:

>>> sess = tf.Session()
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: Tesla M40
major: 5 minor: 2 memoryClockRate (GHz) 1.112
pciBusID 0000:04:00.0
Total memory: 11.25GiB
Free memory: 11.09GiB

To manually control which devices are visible to TensorFlow, set the CUDA_VISIBLE_DEVICES environment variable when launching Python. For example, to force the use of only GPU 0:

$ CUDA_VISIBLE_DEVICES=0 python
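
The same restriction can be applied from inside a script, as long as the variable is set before TensorFlow initializes CUDA. The sketch below is one way to do it; restrict_gpus is a hypothetical helper name, and the safest place to call it is before importing tensorflow.

```python
import os


def restrict_gpus(device_ids):
    """Expose only the given GPU indices to CUDA; an empty list hides all GPUs."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Set the variable first, then import TensorFlow.
restrict_gpus([0])
# import tensorflow as tf
```

Setting the variable after a session has already been created has no effect on that process.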

You should now be able to run a Hello World application:

>>> hello_world = tf.constant("Hello, TensorFlow!")
>>> print sess.run(hello_world)
Hello, TensorFlow!
>>> print sess.run(tf.constant(123)*tf.constant(456))
56088

Nachiket2605 commented Jun 8, 2017

Hi, thanks a lot for the steps. I used expunge_async for the configuration, which speeds up the process quite a bit. Unfortunately, after configuration, Bazel is taking far too long to perform the build. Any suggestions on that?
Thanks!

Owner

Brainiarc7 commented Jun 12, 2017

Hey there @Nachiket2605,

That's most likely dependent on your storage type (SSD vs HDD) and the I/O scheduler in use.

Also, what are the specifications for the build machine?

When a new version of Tensorflow/CUDA/cudnn comes out do you have to rebuild everything again?

Owner

Brainiarc7 commented Jun 26, 2017

Hello there,

You'd have to rebuild from source again.

Remember that the official binaries are compiled with generic optimization flags, so they are not tuned for your processor.

So if you're on Skylake, for example, you won't get AVX-enabled builds.

I ran the ./configure step and hit this error from bazel:
WARNING: Output base '/usr2/username/.cache/bazel/_bazel_username/920d9f5954f35059efc25c5cf6e13b64' is on NFS. This may lead to surprising failures and undetermined behavior.
Extracting Bazel installation...

Searching online suggested that passing the following option would solve the problem: --output_user_root=/local/mnt/workspace/tmp/bazel/

Do you know how I could pass the --output_user_root option to bazel when calling ./configure?

Thanks
RLE

Owner

Brainiarc7 commented Aug 5, 2017

Hello there,

It'd be ideal to pass that to the Bazel build step, unless it's failing at the ./configure phase. Note that --output_user_root is a Bazel startup option, so it goes before the build command:

bazel --output_user_root={your_path} build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda //tensorflow/tools/pip_package:build_pip_package
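
Bazel distinguishes startup options, which are placed before the command name, from command options, which follow it; --output_user_root is a startup option. The following is a small sketch of assembling the invocation with that ordering. The function bazel_build_command and its parameters are hypothetical and only build the argument list; they do not run Bazel.

```python
def bazel_build_command(target, copts, output_user_root=None, config="cuda"):
    """Build a bazel argument list with startup options placed before 'build'."""
    cmd = ["bazel"]
    if output_user_root:
        # Startup option: must come before the command name.
        cmd.append("--output_user_root=" + output_user_root)
    cmd += ["build", "-c", "opt"]
    cmd += ["--copt=" + c for c in copts]
    cmd += ["--config=" + config, target]
    return cmd


if __name__ == "__main__":
    print(" ".join(bazel_build_command(
        "//tensorflow/tools/pip_package:build_pip_package",
        ["-mavx", "-mavx2", "-mfma", "-mfpmath=both", "-msse4.2"],
        output_user_root="/local/mnt/workspace/tmp/bazel/",
    )))
```

The resulting list can be handed to subprocess.check_call from the TensorFlow source directory.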

Hi, I'm wondering how much improvement you get from this over the official TensorFlow release with GPU support. Why would this be better than the official release?

Owner

Brainiarc7 commented Oct 2, 2017

This build enables the native processor instructions available on your platform.

The official release does not; you only get them by building from source.

Hi,

I got this error while running ./configure:
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
........
ERROR:

/home/hamadazahera/.cache/bazel/_bazel_hamadazahera/1d40efb9cd64d5ba27f112122670e970/external/io_bazel_rules_closure/closure/private/defs.bzl:27:16: The set constructor for depsets is deprecated and will be removed. Please use the depset constructor instead. You can temporarily enable the deprecated set constructor by passing the flag --incompatible_disallow_set_constructor=false
ERROR: error loading package '': Extension file 'closure/private/defs.bzl' has errors
ERROR: error loading package '': Extension file 'closure/private/defs.bzl' has errors
Building: no action running


Could you please help me fix this error? I am using Bazel 0.6.1 on Ubuntu 17.

wedesoft commented Nov 3, 2017

Thanks a lot. That should be the Tensorflow main page.

Owner

Brainiarc7 commented Nov 20, 2017

You're welcome, @wedesoft

Owner

Brainiarc7 commented Nov 20, 2017

@hamadazahera,

Consider re-fetching your source.

Owner

Brainiarc7 commented Nov 23, 2017

Thanks @wedesoft

Great tutorial, @Brainiarc7. Just take a look at the formatting in step 6 because it is a bit misconfigured.

RyanRosario commented Dec 15, 2017

Heads up for others. I had a lot of problems using bazel to build TensorFlow. No matter what I did, it seemed that bazel could not find libcublas.so.9.0.

I did two things at the same time and unfortunately I do not know which one solved it. Pretty sure it was:
echo "/usr/local/cuda-9.0/lib64" | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig

from https://devtalk.nvidia.com/default/topic/845363/libcublas-so-7-0-cannot-open-shared-object-file/

I also ran bazel with sudo. One of them made it work.
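
One way to tell whether the ldconfig step took effect is to look for the library in the dynamic linker cache. The following is a minimal sketch, assuming a Linux system where ldconfig -p prints entries in the form "name (arch) => path"; the helper names are hypothetical.

```python
import subprocess


def parse_ldconfig(output, name):
    """Return paths of cached libraries whose file name starts with `name`."""
    paths = []
    for line in output.splitlines():
        if "=>" not in line:
            continue  # skip the header line
        lib, path = line.split("=>", 1)
        if lib.strip().startswith(name):
            paths.append(path.strip())
    return paths


def find_cached_library(name="libcublas.so"):
    """Query the linker cache; assumes ldconfig is on the PATH."""
    output = subprocess.check_output(["ldconfig", "-p"]).decode()
    return parse_ldconfig(output, name)
```

An empty result from find_cached_library() suggests the CUDA lib64 directory is not yet registered with the linker.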

Owner

Brainiarc7 commented Dec 16, 2017

Hello @RyanRosario,

That step you mentioned is mandatory for all projects that require access to the CUDA shared libraries and runtimes, and should be part of the CUDA SDK installation documentation.

On distributions where a repository-based installation method is provided, that step may even be done for you automatically. Ubuntu, for one, does this. However, with manual installation methods, be it from the binary installer for a single user or for multiple users on an HPC cluster (for loading via environment-modules), that step is mandatory.

MarkSonn commented Jan 2, 2018

Thanks for the great read @Brainiarc7

I was just wondering how you find out what the best Bazel (or other) optimisations are for any given hardware setup. Also, are these optimisations just for the CPU or also for other components such as the GPU too?

Here's my system:

  • GPU: 2x NVIDIA GTX 1080 Ti
  • CPU: Intel i7-6850K
  • RAM: 64GB
  • OS: Ubuntu 16.04
Owner

Brainiarc7 commented Jan 5, 2018

Hello @MarkSonn,

These optimizations are to enable native instruction support for the compiler running on the host when building the project.

For example, the stock binaries Google builds lack AVX support because their default build target is a generic processor without these instructions. However, most modern CPUs, Skylake included, support AVX and related extensions, and depending on the workload type, binaries compiled with these instructions enabled will run faster.

For the GPU, it's ideal to target the SM level of your card's CUDA architectures, and to build as few of these PTX targets as possible.

Run the configuration as explained here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/docs_src/install/install_sources.md

MarkSonn commented Jan 7, 2018

> These optimizations are to enable native instruction support for the compiler running on the host when building the project.

> binaries compiled with these instructions enabled will run faster.

This confused me a bit, do these optimisations speed up training/inference, or do they only affect the build time of Tensorflow with Bazel?


What are PTX targets? Are they the Bazel flags such as "--copt=-mavx" or the config options such as "Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] n"?


> it's ideal to target the SM level of your card's CUDA architectures

Also, how do you do this?


Sorry for asking so many questions, I really appreciate your help :)

Owner

Brainiarc7 commented Jan 8, 2018

Hey @MarkSonn,

These optimizations will speed up training. They also increase the initial build time; by how much depends on your exact setup (processor, compiler version, etc.).

CUDA PTX targets are dictated by options passed to the NVCC compiler, which are offered as part of the configuration step when building the project. Refer to this document for more details: http://docs.nvidia.com/cuda/parallel-thread-execution/index.html

By default, NVCC will automatically pick up one PTX version targeted to your GPU's SM architecture, such as 6.1 for Pascal, etc. Only override this if you're building the target for a different host.

For further GCC optimizations (use with care), see https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options

Owner

Brainiarc7 commented Jan 8, 2018

And for available PTX targets depending on your CUDA compiler and the installed GPU, please refer to NVIDIA's documentation here: https://developer.nvidia.com/cuda-gpus
