stevexyz/tensorflow_linux_sse41_sse42_avx_avx2_fma.md

## tensorflow_linux_sse41_sse42_avx_avx2_fma.md

      
Build TensorFlow 1.8 with SSE4.1/SSE4.2/AVX/AVX2/FMA on Linux

These instructions were inspired by Mistobaan's gist, ageitgey's gist, and mattiasarro's tutorial.
Background

I always encountered the following warnings when running my scripts using the precompiled TensorFlow Python package:
I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA

I realized I could make these warnings go away by compiling from source, which can also improve training speed. It was not as straightforward as I expected, but I eventually produced a working build. Here I outline the steps I took, in the hope that they help others who have run into the same problem.
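Before committing to a long build, it can help to confirm which of these instruction sets the CPU actually reports. The sketch below reads the flags line of /proc/cpuinfo. It is illustrative only: the helper names `parse_cpu_flags`, `supported` and `WANTED` are made up for this example, and it assumes the kernel spells the flags `sse4_1`, `sse4_2`, `avx`, `avx2` and `fma`.

```python
from pathlib import Path

# Flag names as spelled in /proc/cpuinfo for the instruction sets in the warning.
WANTED = ["sse4_1", "sse4_2", "avx", "avx2", "fma"]


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supported(cpuinfo_text, wanted=WANTED):
    """Map each wanted flag to True/False depending on whether the CPU reports it."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        for name, ok in supported(cpuinfo.read_text()).items():
            print("{}: {}".format(name, "yes" if ok else "no"))
    else:
        print("/proc/cpuinfo not found; this check only works on Linux")
```

Any flag reported as "no" should be left out of the build options, since a binary compiled for an instruction set the CPU lacks will crash at runtime.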
Machine setup

Hardware


Model: Dell XPS
Processor: ...
Memory: ...
Graphics: ...

Software


OS: Ubuntu Linux 16.04
TensorFlow version: 1.8.0rc1
Python version: 3.5.2
Bazel version: 0.12.0
CUDA/cuDNN version: ...

Prerequisites

_
Steps

Note: Many steps were based on https://www.tensorflow.org/install/install_sources

Verify that the following packages are installed:

six
numpy (must be at least 1.13; otherwise bazel build fails later with ModuleNotFoundError: No module named 'numpy.lib.mixins')
wheel
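The package check above can be scripted. This is a minimal sketch rather than an official tool: `version_tuple`, `meets_minimum` and `REQUIRED` are names invented here, and the version comparison only looks at the leading digits of each dotted component.

```python
import importlib


def version_tuple(version):
    """Turn '1.13.3' or '1.14.0rc1' into a tuple of leading integers, e.g. (1, 13, 3)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version, minimum):
    """True if version is at least minimum, comparing numeric components only."""
    return version_tuple(version) >= version_tuple(minimum)


# Package name -> minimum version (None means any version will do).
REQUIRED = {"six": None, "numpy": "1.13", "wheel": None}

if __name__ == "__main__":
    for name, minimum in REQUIRED.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            print("{}: MISSING".format(name))
            continue
        found = getattr(module, "__version__", "unknown")
        ok = minimum is None or meets_minimum(found, minimum)
        print("{}: {} {}".format(name, found, "OK" if ok else "too old, need >= " + minimum))
```

Run it with the same interpreter you plan to give to ./configure, since each Python installation has its own set of packages.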


Clone the TensorFlow repository (instructions) and be sure to check out the r1.8 release
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout r1.8

Configure the installation
bazel clean
./configure
My configure settings (enter N for CUDA support if you do not want it or do not have an NVIDIA GPU):
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Found possible Python library paths:  
  /usr/local/lib/python3.5/dist-packages
  /usr/lib/python3/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python3.5/dist-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: 
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n    
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
No Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: 
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: 
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: 
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: 
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: 
No CUDA support will be enabled for TensorFlow.

Do you wish to download a fresh release of clang? (Experimental) [y/N]: 
Clang will not be downloaded.

Do you wish to build TensorFlow with MPI support? [y/N]: 
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: 
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
    --config=mkl            # Build with MKL support.
    --config=monolithic     # Config for mostly static monolithic build.
Configuration finished


Build the pip package (reference: https://stackoverflow.com/questions/41293077/how-to-compile-tensorflow-with-sse4-2-and-avx-instructions). The build took a long time!
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse4.1 --copt=-msse4.2 --verbose_failures -k //tensorflow/tools/pip_package:build_pip_package

Refer to tensorflow/tensorflow#6729 if you run into any other problems
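If your CPU lacks some of the five instruction sets, the matching --copt options should be dropped from the build command. The following sketch assembles the command from a set of /proc/cpuinfo flag names. The function `bazel_build_command` and the table `COPT_FOR_FLAG` are hypothetical helpers written for this example, and the target name is assumed to be `build_pip_package`, matching the script path used in the next step.

```python
import shlex

# /proc/cpuinfo flag name -> GCC option, matching the --copt values in this step.
COPT_FOR_FLAG = [
    ("avx", "-mavx"),
    ("avx2", "-mavx2"),
    ("fma", "-mfma"),
    ("sse4_1", "-msse4.1"),
    ("sse4_2", "-msse4.2"),
]

# Assumed target name, inferred from the bazel-bin path of the packaging script.
TARGET = "//tensorflow/tools/pip_package:build_pip_package"


def bazel_build_command(cpu_flags):
    """Return the bazel argument list, keeping only options the CPU supports."""
    cmd = ["bazel", "build", "-c", "opt"]
    for flag, copt in COPT_FOR_FLAG:
        if flag in cpu_flags:
            cmd.append("--copt=" + copt)
    cmd += ["--verbose_failures", "-k", TARGET]
    return cmd


if __name__ == "__main__":
    # Example: a CPU that reports SSE4.1, SSE4.2 and AVX but not AVX2 or FMA.
    example_flags = {"sse4_1", "sse4_2", "avx"}
    print(" ".join(shlex.quote(arg) for arg in bazel_build_command(example_flags)))
```

The script only prints the command so you can inspect it before starting the build.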
Build the wheel (.whl) file
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Install the pip package
pip3 install --upgrade --ignore-installed /tmp/tensorflow_pkg/tensorflow-1.8.0rc1-cp35-cp35m-linux_x86_64.whl
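The wheel file name depends on the TensorFlow version, the Python version and the platform, so the exact name above may not match your build. One way to avoid typing it is to look the file up, as in this sketch. `newest_wheel` is a name chosen for this example, and the output directory is assumed to be /tmp/tensorflow_pkg as in the previous step.

```python
import glob
import os


def newest_wheel(directory):
    """Return the most recently modified tensorflow wheel in directory, or None."""
    candidates = glob.glob(os.path.join(directory, "tensorflow-*.whl"))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    wheel = newest_wheel("/tmp/tensorflow_pkg")
    if wheel is None:
        print("no wheel found in /tmp/tensorflow_pkg")
    else:
        # Print the install command instead of running it, so it can be reviewed first.
        print("pip3 install --upgrade --ignore-installed " + wheel)
```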

Validate your installation (instructions)

Change directory to any directory on your system other than the tensorflow subdirectory from which you ran ./configure
cd ~

Invoke the Python interactive shell (use python3 to match the interpreter chosen during ./configure)
python3

Type in the following script
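import tensorflow as tf

# The build configured above has no CUDA support, so place the ops on the CPU.
# Use '/gpu:0' only if you enabled CUDA in ./configure.
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# Run the graph and print the 2x2 matrix product.
with tf.Session() as sess:
    print(sess.run(c))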
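To know what the script should print, the same product can be computed without TensorFlow. This is a plain-Python cross-check, not part of the original instructions; `reshape` and `matmul` are small helpers written for this example, and `reshape` fills rows first, as tf.constant does when given a flat list and a shape.

```python
def reshape(values, rows, cols):
    """Split a flat list into rows (row-major order)."""
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
a = reshape(values, 2, 3)  # [[1, 2, 3], [4, 5, 6]]
b = reshape(values, 3, 2)  # [[1, 2], [3, 4], [5, 6]]
print(matmul(a, b))  # [[22.0, 28.0], [49.0, 64.0]]
```

If the TensorFlow session prints the same four values and the cpu_feature_guard warning no longer lists the instruction sets you compiled in, the build worked.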


Have fun training your models!