@Brainiarc7
Last active July 29, 2023 21:28
Build Tensorflow from source, for better performance on Ubuntu.

Building TensorFlow from source on Ubuntu 16.04 LTS for maximum performance:

TensorFlow is now distributed under an Apache v2 open source license on GitHub.

On Ubuntu 16.04 LTS and newer:

Step 1. Install NVIDIA CUDA:

To use TensorFlow with NVIDIA GPUs, the first step is to install the CUDA Toolkit as shown:

wget -c -v -nc https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.2.88-1_amd64.deb

sudo dpkg -i cuda-repo-ubuntu1604_9.2.88-1_amd64.deb

sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub

sudo apt-get update

sudo apt-get install cuda

Keep checking the NVIDIA CUDA webpage for new releases as applicable. This article is accurate as of the time of writing.

Ensure that you have the latest driver:


sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update && sudo apt-get -y upgrade

On Ubuntu 18.04 LTS, this should be enough for the device driver:

sudo apt-get install nvidia-kernel-source-396 nvidia-driver-396

Failure to upgrade the driver as shown will result in a broken driver installation.

When done, create a library configuration file for cupti:

/etc/ld.so.conf.d/cupti.conf

With the content:

/usr/local/cuda/extras/CUPTI/lib64 

Confirm that the library configuration file for CUDA libraries also exists with the correct settings:

/etc/ld.so.conf.d/cuda.conf

The content should be:

/usr/local/cuda/lib64

When done, load the new configuration:

sudo ldconfig -vvvv
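The two config files above can also be generated from the shell. A minimal sketch follows; it writes to a staging directory so it can be tried without root. On a real system, point DEST at /etc/ld.so.conf.d, run the writes under sudo, and finish with sudo ldconfig:

```shell
# Staging directory stands in for /etc/ld.so.conf.d so this runs unprivileged.
DEST="${DEST:-/tmp/ld.so.conf.d}"
mkdir -p "$DEST"

# One library path per file, exactly as described above.
printf '%s\n' '/usr/local/cuda/extras/CUPTI/lib64' > "$DEST/cupti.conf"
printf '%s\n' '/usr/local/cuda/lib64'              > "$DEST/cuda.conf"

cat "$DEST"/*.conf
# On the real path, finish with: sudo ldconfig
```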

Useful environment variables for CUDA:

Edit the /etc/environment file and append the following:

CUDA_HOME=/usr/local/cuda

Now, append the PATH variable with the following:

/usr/local/cuda/bin:$HOME/bin

When done, remember to source the file:

source /etc/environment
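Putting both edits together, the finished /etc/environment might look like the sketch below. The PATH line shown is the stock Ubuntu default plus the CUDA bin directory; your release's original PATH may differ, so append to yours rather than pasting this verbatim:

```shell
# /etc/environment (sketch; keep your distribution's original PATH entries)
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/cuda/bin"
CUDA_HOME=/usr/local/cuda
```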

You can also install CUDA manually. However, take care not to install its bundled driver.

Step 2. Install NVIDIA cuDNN:

Once the CUDA Toolkit is installed, download the latest cuDNN Library for Linux matching the CUDA version you're using. In this case, we're on CUDA 9.2, so we will refer to that version below (note that you will need to register for the Accelerated Computing Developer Program).

Once downloaded, uncompress the files and copy them into the CUDA Toolkit directory (assumed here to be /usr/local/cuda/ on Ubuntu 16.04 LTS):

$ sudo tar -xvf cudnn-9.2-* -C /usr/local

Step 3. Install and upgrade PIP:

TensorFlow itself can be installed using the pip package manager. First, make sure that your system has pip installed and updated:

$ sudo apt-get install python-pip python-dev
$ pip install --upgrade pip

Step 4. Install Bazel:

To build TensorFlow from source, the Bazel build system (and the latest available openjdk) must first be installed as follows.

$ sudo apt-get install software-properties-common swig
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
$ curl https://storage.googleapis.com/bazel-apt/doc/apt-key.pub.gpg | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install bazel

Step 5. Install TensorFlow:

To obtain the best performance with TensorFlow we recommend building it from source.

First, clone the TensorFlow source code repository:

$ git clone https://github.com/tensorflow/tensorflow
$ cd tensorflow

Note: resetting to a fixed commit, as older revisions of this guide advised, is no longer needed:

$ git reset --hard a23f5d7 

Then run the configure script as follows:

$ ./configure

Sample output (the versions shown below are from an older run; answer the prompts with the versions you actually installed):

Please specify the location of python. [Default is /usr/bin/python]: [enter]
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] n
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc nvcc should use as the host compiler. [Default is /usr/bin/gcc]: [enter]
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: [enter]
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5
Please specify the location where cuDNN 5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: [enter]
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 5.2,6.1 [see https://developer.nvidia.com/cuda-gpus]
Setting up Cuda include
Setting up Cuda lib64
Setting up Cuda bin
Setting up Cuda nvvm
Setting up CUPTI include
Setting up CUPTI lib64
Configuration finished
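The compute-capability prompt is the one worth pausing on. As a rough guide, the sketch below maps GPU generations to representative capabilities; the helper name and values are illustrative (individual cards differ within a generation, e.g. GP100 is 6.0 while most consumer Pascal cards are 6.1), so always confirm your exact card at https://developer.nvidia.com/cuda-gpus:

```shell
# Map a GPU generation to a representative compute capability (examples only).
cc_for_arch() {
    case "$1" in
        Kepler)  echo 3.5 ;;
        Maxwell) echo 5.2 ;;
        Pascal)  echo 6.1 ;;
        Volta)   echo 7.0 ;;
        *)       echo unknown ;;
    esac
}

cc_for_arch Pascal   # e.g. a GTX 10-series card builds with 6.1
```

Pass the result at the prompt above, comma-separated if you target several generations.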

Then call bazel to build the TensorFlow pip package:

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda //tensorflow/tools/pip_package:build_pip_package

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

This will build the package with optimizations for FMA, AVX and SSE.

To build the C library as a tarball (which you can install as needed) with the optimizations above:

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda //tensorflow/tools/lib_package:libtensorflow

Which should produce an archive in:

bazel-bin/tensorflow/tools/lib_package/libtensorflow.tar.gz

A stock build would be as such:

bazel build //tensorflow/tools/pip_package:build_pip_package

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

You can use the stock build as shown above if you had passed the configuration flags (for optimization) directly to the configure script above. Use this string:

--copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2

Which will replace -march=native (the default).

If you're on anything from Skylake to Coffee Lake, these are the flags you need.
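Whether each -m flag is safe depends on the build host's CPU. Below is a small sketch that derives the --copt list from a /proc/cpuinfo flags line; the helper name and the hard-coded sample string are illustrative, and on a real host you would feed it your own flags line as shown in the comment:

```shell
# Translate CPU feature flags into the matching bazel --copt options.
flags_to_copts() {
    copts=""
    for f in avx avx2 fma sse4_2 avx512f; do
        case " $1 " in
            *" $f "*) copts="$copts --copt=-m$(echo "$f" | tr '_' '.')" ;;
        esac
    done
    # Unquoted on purpose: collapses the leading space.
    echo $copts
}

# On a real build host: flags_to_copts "$(grep -m1 '^flags' /proc/cpuinfo)"
flags_to_copts "fpu avx avx2 fma sse4_2"
# prints: --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse4.2
```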

And finally install the TensorFlow pip package

For Python 2.7:

$ sudo pip install --upgrade /tmp/tensorflow_pkg/tensorflow-*.whl

For Python 3.4:

$ sudo pip3 install --upgrade /tmp/tensorflow_pkg/tensorflow-*.whl

Step 6. Upgrade protobuf:

Upgrade to the latest version of the protobuf package:

For Python 2.7:

$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/protobuf-3.0.0b2.post2-cp27-none-linux_x86_64.whl

For Python 3.4:

$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/protobuf-3.0.0b2.post2-cp34-none-linux_x86_64.whl

Step 7. Test your installation:

To test the installation, open an interactive Python shell and import the TensorFlow module:

   $ cd
   $ python


>>> import tensorflow as tf
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally

With the TensorFlow module imported, the next step to test the installation is to create a TensorFlow Session, which will initialize the available computing devices and provide a means of executing computation graphs:

>>> sess = tf.Session()

This command will print out some information on the detected hardware configuration. For example, the output on a system containing a Tesla M40 GPU is:

>>> sess = tf.Session()
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: Tesla M40
major: 5 minor: 2 memoryClockRate (GHz) 1.112
pciBusID 0000:04:00.0
Total memory: 11.25GiB
Free memory: 11.09GiB

To manually control which devices are visible to TensorFlow, set the CUDA_VISIBLE_DEVICES environment variable when launching Python. For example, to force the use of only GPU 0:

$ CUDA_VISIBLE_DEVICES=0 python
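The variable only needs to live for the lifetime of the process, so setting it per invocation is cleanest. The subshells below merely echo what the CUDA runtime would see; in real use the command after the assignment would be your python or training invocation:

```shell
# The CUDA runtime reads CUDA_VISIBLE_DEVICES once, at process start.
CUDA_VISIBLE_DEVICES=0   sh -c 'echo "visible: $CUDA_VISIBLE_DEVICES"'  # GPU 0 only
CUDA_VISIBLE_DEVICES=1,0 sh -c 'echo "visible: $CUDA_VISIBLE_DEVICES"'  # both GPUs, device 1 enumerated first
CUDA_VISIBLE_DEVICES=""  sh -c 'echo "visible: $CUDA_VISIBLE_DEVICES"'  # hide all GPUs (CPU only)
```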

You should now be able to run a Hello World application:

    >>> hello_world = tf.constant("Hello, TensorFlow!")
    >>> print(sess.run(hello_world))
    Hello, TensorFlow!
    >>> print(sess.run(tf.constant(123) * tf.constant(456)))
    56088
    

Tips:

To achieve similar results without building the packages, you can deploy nvidia-docker and install tensorflow from NVIDIA's NGC registry.

Use this to deploy nvidia-docker on Ubuntu: https://gist.github.com/Brainiarc7/a8ab5f89494d053003454efc3be2d2ef

Use the NGC to deploy the preconfigured containers. Optimized builds for TensorFlow, Caffe, Torch, etc. are also available: https://www.nvidia.com/en-us/gpu-cloud/deep-learning-containers/

Also see the NGC panel: https://ngc.nvidia.com/registry

@Brainiarc7 (Author)

Hey @MarkSonn,

These optimizations will speed up training performance. However, that also increases the initial build time. To what extent depends on your exact setup (Processor, compiler versions, etc).

CUDA PTX targets are dictated by options passed to the NVCC compiler, which are offered as part of the configuration step when building the project. Refer to this document for more details: http://docs.nvidia.com/cuda/parallel-thread-execution/index.html

By default, NVCC will automatically pick up one PTX version targeted to your GPU's SM architecture, such as 6.1 for Pascal, etc. Only override this if you're building the target for a different host.

For further GCC optimizations (use with care), see https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options

@Brainiarc7 (Author)

And for available PTX targets depending on your CUDA compiler and the installed GPU, please refer to NVIDIA's documentation here: https://developer.nvidia.com/cuda-gpus

@tim37021

Has anyone tried this with r1.2?

I got the following error. Setting incompatible_disallow_uncalled_set_constructor does not work for me; it still outputs the same error:

/home/tim/.cache/bazel/_bazel_tim/439bea30bd5a24f814bf00e0ff130e68/external/io_bazel_rules_closure/closure/stylesheets/closure_css_library.bzl:27:13: The function 'set' has been removed in favor of 'depset', please use the latter. You can temporarily refer to the old 'set' constructor from unexecuted code by using --incompatible_disallow_uncalled_set_constructor=false

@eddywm commented Apr 15, 2018

I am getting this error

$HOME/.cache/bazel/_bazel_eddwm/a970f6278411c8a88863559be597c15c/external/io_bazel_rules_closure/closure/stylesheets/closure_css_library.bzl:27:13: The function 'set' has been removed in favor of 'depset', please use the latter. You can temporarily refer to the old 'set' constructor from unexecuted code by using --incompatible_disallow_uncalled_set_constructor=false
The same error as @tim37021

@peshmerge

Thanks for this gist! One thing: for installing bazel on Ubuntu 18.04 you will need to specify the arch, deb [arch=amd64].
The command should look like this:
echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
https://docs.bazel.build/versions/master/install-ubuntu.html

@Brainiarc7 (Author)

You're correct. I'll update the documentation.

Also, if you'd prefer not to build from source but want similar performance, deploy the instances from NVIDIA's NGC cloud.

See how to get started on nvidia-docker here: https://gist.github.com/Brainiarc7/a8ab5f89494d053003454efc3be2d2ef

@pascalwhoop commented Jun 15, 2018

For anyone coming here and wondering "how much longer" it will take:

I just built TF for CUDA 9.2 on my quad-core i7-7700K @ 4.4 GHz with 16 GB of RAM:

$ bazel build --config=opt --config=cuda --action_env PATH="$PATH" //tensorflow/tools/pip_package:build_pip_package
INFO: Elapsed time: 2900.785s, Critical Path: 126.77s
INFO: 7823 processes, local.
INFO: Build completed successfully, 10065 total actions

It took about 50 minutes. I'll install this and then try again with some optimisations to see the difference in build time and performance.

EDIT 1:

Cannot load tensorflow:

Python 3.6.5 (default, May 11 2018, 04:00:52) 
[GCC 8.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorflow/__init__.py", line 22, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 81, in <module>
    from tensorflow.python import keras
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorflow/python/keras/__init__.py", line 24, in <module>
    from tensorflow.python.keras import activations
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorflow/python/keras/activations/__init__.py", line 22, in <module>
    from tensorflow.python.keras._impl.keras.activations import elu
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/__init__.py", line 21, in <module>
    from tensorflow.python.keras._impl.keras import activations
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/activations.py", line 23, in <module>
    from tensorflow.python.keras._impl.keras import backend as K
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/backend.py", line 38, in <module>
    from tensorflow.python.layers import base as tf_base_layers
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 25, in <module>
    from tensorflow.python.keras.engine import base_layer
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/__init__.py", line 21, in <module>
    from tensorflow.python.keras.engine.base_layer import InputSpec
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 33, in <module>
    from tensorflow.python.keras import backend
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorflow/python/keras/backend/__init__.py", line 22, in <module>
    from tensorflow.python.keras._impl.keras.backend import abs
ImportError: cannot import name 'abs'

I have protobuf 3.6.0, built TF 1.9 with CUDA compute 6.1. Will continue to investigate.

related issue

Working now! I had to manually delete the tensorflow folder inside my site-packages folder of the virtualenv. 1.8.0 -> 1.9.0 moved the keras files around substantially and something was still cached somewhere. If this doesn't help, also delete the __pycache__ directory; that might also help.

@dwSun commented Jun 26, 2018

Is it possible to build multiple TensorFlow versions with a config script, without the interactive configure process?

@Brainiarc7 (Author)

Hey @dwSun,

I'll research that later.

@Brainiarc7 (Author)

Hello @dwSun,

To automate the build via a script, some environment variables must be initialized (so as to respond to the interactive configuration utility), such as these shown here: https://gist.github.com/PatWie/0c915d5be59a518f934392219ca65c3d

All credit goes to @PatWie

@third-meow

Hello

thanks for this guide, this worked for me :)
Can I delete the source folder and package folder after pip installing?

@baregawi commented Oct 2, 2018

Thank you so much! This helped me get through many installation bugs.

I'd like to mention that if you do the git reset --hard step to that commit, then you will need to install bazel 0.5.4 specifically to make this work. You cannot install that version via apt-get at the moment, so you have to do:

export BAZEL_VERSION=0.5.4 \
  && sudo apt-get install -y --no-install-recommends bash-completion g++ zlib1g-dev \
  && curl -LO "https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/bazel_${BAZEL_VERSION}-linux-x86_64.deb" \
  && sudo dpkg -i bazel_*.deb

Which I got from a kind user @drigz on this issue: bazelbuild/continuous-integration#128 (comment)

@baregawi commented Oct 2, 2018

Is it possible to build multiple TensorFlow versions with a config script, without the interactive configure process?

@dwSun
If you haven't figured out yet, you can do that by setting environment variables such as $TF_NEED_JEMALLOC, $TF_ENABLE_XLA, $TF_NEED_CUDA, etc. to 0 for "no" and 1 for "yes".
That way the script won't ask you. You can find the names of those variables in the configure script.
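To make that concrete, here is a sketch of the pre-seeding. The variable names are as they appeared in the configure scripts of this era; grep your checkout's configure script for the full, current list, and adjust the CUDA/cuDNN versions to what you installed:

```shell
# Answer every prompt up front; ./configure then runs without asking.
export PYTHON_BIN_PATH="$(command -v python || command -v python3)"
export TF_NEED_JEMALLOC=1
export TF_ENABLE_XLA=0
export TF_NEED_GCP=0
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=9.2
export TF_CUDNN_VERSION=7
export TF_CUDA_COMPUTE_CAPABILITIES=6.1

# ./configure   # now non-interactive
```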

@baregawi commented Oct 2, 2018

Hello

thanks for this guide, this worked for me :)
Can I delete the source folder and package folder after pip installing?
@third-meow All you need is the .whl file to install it again in the future, so just make sure not to lose that if you decide to delete the source folder, or you'll have to repeat the build.

@motiteux

Hi,

Thanks a lot for the guide!
Would you happen to have benchmarks to support the claim? (I think it is true, but how much do you gain between installing from pip, installing from conda, and building from source?)

A benchmark between pip and conda was discussed recently (https://towardsdatascience.com/stop-installing-tensorflow-using-pip-for-performance-sake-5854f9d9eb0c), but what about the performance gain from building from source?

@patelprateek

'''
Use this string:

--copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2
Which will replace -march=native (the default).
'''

Can you please clarify the above statement?
IIUC, if I am building on a machine that doesn't support AVX with the flags --copt=-mavx and -march=native, will that throw an error, or will it simply not use AVX instructions since they are not supported by the native arch?
Is it necessary to provide both --copt and -march, or does TensorFlow automatically choose the best instructions for a given architecture?
What if I am building for an architecture that supports AVX and AVX2, but I don't provide --copt=-mavx --copt=-mavx2, only -march=native? Is that an optimized build or not?

@patelprateek

On a similar note, I see an option for march=native when I run ./configure, then I see march=native in some .bazelrc file, and the third option is providing -march=native during the bazel build command. Can you please explain the relation between them and which one takes precedence?
I see that ./configure apparently generates a .bazelrc file that bazel build reads as input, but I'm not entirely sure.

@Brainiarc7 (Author)

@motiteux,

Conda builds are indeed better optimized for performance (on Intel platforms), since they build with the Intel MKL libraries.
Building from source overcomes that limitation for scenarios where MKL is either undesired or of no practical use (non-Intel processors, for one).

@patelprateek,

On virtually all modern processors (Skylake and newer, for Intel), these instructions are provided.
For anything older than Haswell (which introduced AVX2 and FMA), omit -mavx2 and -mfma.

For the native target, confirm which optimizations are available to you:

gcc -march=native -Q --help=target

To see target specific optimizations:

gcc -march=native -Q --help=target | grep enable

Should -march=native fail for you, try adding --cxxopt=-march=native instead and retest.

@alex-petrenko

After I built this, my TensorFlow still complains that I don't use AVX512F. Is there a way to build with AVX512F support?

@Brainiarc7 (Author)

@alex-petrenko,

What processor are you using?
And on what Linux-based system? What version of GCC is available to you?
Show me the output of:

gcc -march=native -Q --help=target

With that information present, and assuming that your GCC version is recent enough to implement AVX512F support, pass the appropriate flag to bazel at build time:

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --copt=-mavx512f --config=cuda //tensorflow/tools/pip_package:build_pip_package

For information on tuning options available for x86, see this: http://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

@CovertKoala

@Brainiarc7,

Do you have any numbers to show just how much of a speed up you achieve by doing your own build?

@Brainiarc7 (Author)

@CovertKoala,

I can provide these stats if needed.
The biggest jump observed so far was on a Xeon Platinum 8160, where these AVX-512F enablements really paid off.

@CovertKoala

@Brainiarc7,

I'm a total noob to ML and TensorFlow; the question is more out of my own curiosity (no need to go out of your way to get the numbers if you don't have them). As I play around with different NN architectures, I'm wishing things were a bit faster as I make my tweaks.

I've got an NVIDIA GTX 1080 and Intel 8th gen i7. While it definitely is quick, it's no Xeon Platinum.

Thanks!
