@ruslanmv
Forked from taylorpaul/README.md
Created December 16, 2020 17:17
Installing Tensorflow on CENTOS 6.8 Cluster without Root Access

Environment:

OS: CENTOS 6.8 (No root access)

GCC: locally installed 5.2.0 (Cluster default is 4.4.7)

Bazel: 0.4.0-2016-11-06 (@fa407e5)

Tensorflow: v0.11.0rc2

CUDA: 8.0

CUDNN: 5.1.5

Steps:

You should be able to modify the script (buildtf.sh) below to do these steps automatically, but I list out details here as well.

Installing Java Locally:

Follow this Tutorial, or download your preferred JDK 8 release and set the environment variables as described in the tutorial.
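
A minimal sketch of that environment setup for a JDK unpacked in a directory you own. The function name use_local_jdk and the example path are illustrative assumptions, not taken from the tutorial:

```shell
# Hypothetical helper: point the current shell at a locally unpacked JDK.
use_local_jdk() {
    jdk_dir=$1
    if [ ! -d "$jdk_dir/bin" ]; then
        echo "No bin/ directory under $jdk_dir" >&2
        return 1
    fi
    export JAVA_HOME="$jdk_dir"
    export PATH="$JAVA_HOME/bin:$PATH"
}

# Example (path is an assumption; use wherever you extracted the archive):
# use_local_jdk "$HOME/jdk1.8.0_102" && java -version
```

Add the same two exports to ~/.bash_profile if you want them in every session.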

Compiling Bazel, Compiling and Installing Tensorflow:

Great Tutorial that got me to the error below!

Note: After changing the linker line to your local or module GCC, you may get errors about finding ld or other executables that are stored in /usr/bin. Here is the workaround I used (it isn't pretty and you might not need it, but just in case):

  1. Copy your compiler directory (/opt/gcc/5.2.0) to a local directory that you have permissions to modify.

  2. Then run:

cp `which ld` /opt/gcc/5.2.0/bin/ld

Repeat for any command listed in the CROSSTOOL that doesn't already reside in your GCC bin directory.
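
Repeated for several tools, the workaround above might look like the sketch below. The helper name copy_missing_tools and the tool list in the example are assumptions; take the real list from your CROSSTOOL.tpl:

```shell
# Hypothetical helper: copy tools that the CROSSTOOL expects into a writable
# GCC bin directory, skipping any that are already there.
copy_missing_tools() {
    gcc_bin=$1
    shift
    for tool in "$@"; do
        if [ -e "$gcc_bin/$tool" ]; then
            continue                      # already provided by the GCC install
        fi
        src=$(command -v "$tool") || { echo "$tool not found on PATH" >&2; continue; }
        cp "$src" "$gcc_bin/$tool"
    done
}

# Example (tool list is an assumption):
# copy_missing_tools /opt/gcc/5.2.0/bin ld as ar nm objcopy objdump strip
```

Skipping files that already exist keeps the binaries shipped with your GCC from being overwritten by the older system ones.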

Note 2: I downloaded a newer release of Bazel and Tensorflow, as noted above, and the latest versions of the crosstool require fewer changes than described in the tutorial.

  1. Modify /tensorflow/third_party/gpus/crosstool/CROSSTOOL.tpl as described in the tutorial above.

  2. Modify /tensorflow/third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl as described in the tutorial above. I did not change the first line, #!/usr/bin/env python (but the tutorial does!).

Again, these steps led to the error below, which took me a long time to get past:

GLIBCXX_3.4.18 not found error
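
This error usually means that the libstdc++ picked up at run time is older than the one the binary was built against. A quick way to compare two libraries is to list the GLIBCXX version tags each one contains. The sketch below assumes GNU grep; the helper name has_glibcxx and the library paths in the example are illustrative assumptions:

```shell
# Hypothetical helper: succeed if the given libstdc++ advertises the given
# GLIBCXX version tag.
has_glibcxx() {
    lib=$1
    version=$2
    # -a treats the binary as text, -o prints only the matched tags;
    # the final grep does an exact, fixed-string, whole-line comparison.
    grep -a -o 'GLIBCXX_[0-9][0-9.]*' "$lib" | sort -u \
        | grep -x -F "GLIBCXX_$version" >/dev/null
}

# Example (paths are assumptions for a typical 64-bit layout):
# has_glibcxx /usr/lib64/libstdc++.so.6 3.4.18 || echo "system libstdc++ is too old"
# has_glibcxx /opt/gcc/5.2.0/lib64/libstdc++.so.6 3.4.18 && echo "local GCC provides it"
```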

Getting Past GLIBCXX_3.4.18 Error:

As described in gbkedar's comment from Jul 12, you have to find this file:

$INSTALL_PATH/tensorflow/bazel-tensorflow/external/protobuf/protobuf.bzl

Until the compile fails, however, this file is harder to find, because the failure is what creates the shortcut in the /tensorflow directory. (buildtf.sh re-runs the compile after modifying the file following the first failure.) I was running into issues re-attempting the compile and had to run ./configure almost every time, so I had to find this file before the first failure of my compile attempt. After running ./configure from the /tensorflow directory, the file should be located somewhere similar to this:

~/.cache/bazel/_bazel_YOURUSERNAME/YOURHASH/external/protobuf/protobuf.bzl

(YOURHASH looks like f81f1107f96c7515450fc43e0dbb6ed5.)

If you have several hashes, check the files that were modified at the time corresponding to your ./configure run.
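
One way to do that check is to list every cached copy sorted by modification time. This is a sketch only: the helper name find_protobuf_bzl is an assumption, and it defaults to the ~/.cache/bazel location mentioned above:

```shell
# Hypothetical helper: list every cached protobuf.bzl under a Bazel cache
# root, most recently modified first.
find_protobuf_bzl() {
    cache_root=${1:-$HOME/.cache/bazel}
    find "$cache_root" -type f -path '*/external/protobuf/protobuf.bzl' \
        -exec ls -t {} +
}

# Example: take the most recently modified copy
# PROTOFILE=$(find_protobuf_bzl | head -n 1)
```

If the newest file does not correspond to your latest ./configure run, inspect the full list rather than trusting the first line.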

As described in the error link above, search for ctx.action and add env=ctx.configuration.default_shell_env, at the bottom of the call like so:

  if args:
    ctx.action(
        inputs=inputs,
        outputs=ctx.outputs.outs,
        arguments=args + import_flags + [s.path for s in srcs],
        executable=ctx.executable.protoc,
        mnemonic="ProtoCompile",
        env=ctx.configuration.default_shell_env,
    )
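
If you prefer not to edit the file by hand, the same change can be applied with sed. The sketch below assumes GNU sed (for -i) and that the mnemonic line appears exactly as shown above; the helper name patch_protobuf_bzl is an assumption. It places env= on the same line as mnemonic=, which is equivalent to the multi-line form:

```shell
# Hypothetical helper: add the env= argument after the ProtoCompile mnemonic,
# at most once, keeping a backup of the original file.
patch_protobuf_bzl() {
    bzl=$1
    if grep -q 'env=ctx.configuration.default_shell_env' "$bzl"; then
        echo "already patched: $bzl"
        return 0
    fi
    cp "$bzl" "$bzl.orig"                 # backup of the untouched file
    sed -i 's/mnemonic="ProtoCompile",/mnemonic="ProtoCompile", env=ctx.configuration.default_shell_env,/' "$bzl"
}

# Example (path depends on your install; see the locations described above):
# patch_protobuf_bzl "$INSTALL_PATH/tensorflow/bazel-tensorflow/external/protobuf/protobuf.bzl"
```

The grep guard makes the helper safe to re-run, which matters if you end up repeating ./configure and the build several times.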

You will then likely hit this error: error trying to exec 'as': execvp: No such file or directory. As a self-confessed Linux noob, I reused the one trick I already knew (I didn't follow gbkedar's 2nd comment):

cp `which as` /opt/gcc/5.2.0/bin/as

After this change, tensorflow finally compiled successfully for me!

Building .whl file:

Going back to our tutorial I ran this command:

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

and received the bdist_wheel not found error... I solved this by using pip install to install a new version of wheel locally:

pip install --target=/home/thpaul/python27-packages wheel

and then added that directory to my $PYTHONPATH variable:

export PYTHONPATH=/home/thpaul/python27-packages/:$PYTHONPATH
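
If you re-source your profile often, a small guard avoids adding the same directory repeatedly. This is a sketch; the helper name add_to_pythonpath is an assumption:

```shell
# Hypothetical helper: prepend a directory to PYTHONPATH unless it is
# already listed.
add_to_pythonpath() {
    dir=${1%/}                            # drop a trailing slash for comparison
    case ":${PYTHONPATH:-}:" in
        *":$dir:"*) ;;                    # already present, nothing to do
        *) export PYTHONPATH="$dir${PYTHONPATH:+:$PYTHONPATH}" ;;
    esac
}

# Example (directory is the one used above; adjust to your own):
# add_to_pythonpath /home/thpaul/python27-packages
# python -c 'import wheel' && echo "wheel is importable"
```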

Re-running the command builds the proper .whl file which you can install via pip.

Hope this helps anyone trying to compile tensorflow from source!

buildtf.sh:

#!/bin/bash
#Installs TF and all required dependencies except CUDNN* without root!
#*Requires signing up for an account to download! (Pretty easy, but do this first!)
#https://developer.nvidia.com/cudnn
#Original Environment: CENTOS 6.8, non-standard GCC = 5.2.0
#Note: I copied every binary (ld, as, etc.) required by Bazel (see tensorflow CROSSTOOL.tpl)
#into my GCC_DIR!
#TODO: Several TODOs listed below are system-specific!
# Ensure we can load CUDA drivers.
module load cuda/8.0 || { echo 'Failed to load CUDA drivers. Are you not on a compute node?' ; exit 1; }
#TODO: GCC_DIR/LOCAL_INCLUDE/LOCAL_LIBRARY if not standard system gcc (which gcc)
STARTDIR=`pwd`/tf_tools
GCC_DIR=/work/thpaul/gcc/5.2.0
BAZEL_BIN_DIR=/work/thpaul/bin #/bin where to copy bazel binary
PYTHON_INSTALL_DIR=python27
JAVA_DIR=jdk1.8.0_102 #Directory your jdk.tar file extracts to (depends on which version you DL)
LOCAL_INCLUDE=$STARTDIR/include
LOCAL_LIBRARY=$STARTDIR/lib
#VERSIONS
PYTHON_VERSION=2.7.12
JAVA_FILE=jdk-8u102-linux-x64 #Update Java version in DOWNLOADS too...
BAZEL_VERSION=0.4.0 #TAG from github https://github.com/bazelbuild/bazel, don't use if latest release
TF_VERSION=v0.11.0rc2 #TAG from https://github.com/tensorflow/tensorflow/releases, don't use if latest release
#DOWNLOADS
#Python URL with the version above filled in: https://www.python.org/ftp/python/2.7.12/Python-2.7.12.tgz
wget https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz
wget --no-check-certificate https://pypi.python.org/packages/source/s/setuptools/setuptools-1.4.2.tar.gz -O setuptools-1.4.2.tar.gz
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u102-b14/jdk-8u102-linux-x64.tar.gz
wget https://sqlite.org/2016/sqlite-autoconf-3150100.tar.gz #TODO: update if newer version needed (3.8.6)
echo "Building Directories"
mkdir -p $STARTDIR
cd $STARTDIR
mkdir -p $PYTHON_INSTALL_DIR
cd $PYTHON_INSTALL_DIR
PYTHON_INSTALL_DIR=`pwd`
cd ..
# Set tmp directory to userspace
mkdir -p tmp
cd tmp
TMPDIR=`pwd`
cd ..
#Unzip archives
echo "Decompressing archives"
tar zxvf ../Python-$PYTHON_VERSION.tgz
tar --totals -xvf ../setuptools-1.4.2.tar.gz
tar --totals -xvf ../$JAVA_FILE.tar.gz
tar --totals -xvf ../sqlite-autoconf-3150100.tar.gz
cd sqlite-autoconf-3150100
SQLITE_INSTALL_DIR=`pwd`
echo "Installing sqlite3 libs at `pwd`!"
./configure --enable-shared --prefix=$SQLITE_INSTALL_DIR
make
make install
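mkdir -p $LOCAL_INCLUDE $LOCAL_LIBRARY #cp fails if these directories do not exist yet
cp ./include/* $LOCAL_INCLUDE
cp -r ./lib/* $LOCAL_LIBRARY #-r because lib/ may contain subdirectories (e.g. pkgconfig)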
cd ..
cd Python-$PYTHON_VERSION
echo "Installing python at $PYTHON_INSTALL_DIR"
#TODO: Have to change setup.py to look in local include file for sqlite3 libraries
sed -i 's#/usr/local/include/sqlite3#'$LOCAL_INCLUDE'#g' ./setup.py
./configure --enable-shared --prefix=$PYTHON_INSTALL_DIR --enable-loadable-sqlite-extensions #TODO: need sqlite3 for nltk and others
make
make altinstall
export PATH=$PYTHON_INSTALL_DIR/bin:$PATH
cd ..
echo "----- Installing Pip"
cd setuptools-1.4.2
export LD_LIBRARY_PATH=$PYTHON_INSTALL_DIR/lib:$LD_LIBRARY_PATH
$PYTHON_INSTALL_DIR/bin/python2.7 setup.py install
curl https://bootstrap.pypa.io/get-pip.py | $PYTHON_INSTALL_DIR/bin/python2.7 -
pip install --no-cache-dir numpy
pip install -U nltk
cd ..
cd $JAVA_DIR
echo "Installing JAVA at `pwd`"
#Save JAVA variables (we are already inside $JAVA_DIR, so do not append it again):
JAVA_INSTALL_DIR=`pwd`
export JAVA_HOME=$JAVA_INSTALL_DIR
export JAVA_JRE=$JAVA_INSTALL_DIR/jre
export PATH=$PATH:$JAVA_INSTALL_DIR/bin:$JAVA_INSTALL_DIR/jre/bin
cd ..
echo "Compiling bazel in `pwd`/bazel"
git clone https://github.com/bazelbuild/bazel.git
cd bazel #checkout must run inside the cloned repository
git checkout $BAZEL_VERSION #TODO: Format Specific to your git version, only need if not using bazel latest-release
./compile.sh
wait #TODO: Seems to want to compile twice???
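mkdir -p $BAZEL_BIN_DIR
cp ./output/bazel $BAZEL_BIN_DIR/bazel
export PATH=$BAZEL_BIN_DIR:$PATH #so that ./configure and the builds below can find bazel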
cd ..
echo "Compiling tensorflow in `pwd`/tensorflow"
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow #checkout must run inside the cloned repository
git checkout $TF_VERSION #TODO: Format Specific to your git version if not latest-release
TF_INSTALL_DIR=`pwd`
# TODO: Adjust the configure file only if .cache is on an NFS and clean fails:
cp configure configure_orig #just in case
sed -i 's/bazel clean --expunge/bazel clean --expunge_async/g' configure
#Modify tensorflow CROSSTOOL.tpl file:
cp $TF_INSTALL_DIR/third_party/gpus/crosstool/CROSSTOOL.tpl ./third_party/gpus/crosstool/CROSSTOOL_ORIG.tpl
sed -i 's#/usr/bin#'$GCC_DIR'/bin#g' $TF_INSTALL_DIR/third_party/gpus/crosstool/CROSSTOOL.tpl
#Modify tensorflow crosstool_wrapper_driver_is_not_gcc.tpl file
cp $TF_INSTALL_DIR/third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl \
$TF_INSTALL_DIR/third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc_ORIG.tpl
sed -i 's#/usr/bin/gcc#'$GCC_DIR'/bin/gcc#g' \
$TF_INSTALL_DIR/third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl
./configure
wait
#Can't just use the basic call from tensorflow.org install directions:
bazel build -c opt --config=cuda --genrule_strategy=standalone --spawn_strategy=standalone //tensorflow/tools/pip_package:build_pip_package
wait
#TODO: When/If the build fails with the GLIBCXX... error, run this afterwards:
PROTOFILE=$(readlink -f -- "$TF_INSTALL_DIR/bazel-tensorflow/external/protobuf/protobuf.bzl")
cp $TF_INSTALL_DIR/bazel-tensorflow/external/protobuf/protobuf.bzl $TF_INSTALL_DIR/bazel-tensorflow/external/protobuf/ORIG_protobuf.bzl
sed -i 's/mnemonic="ProtoCompile",/mnemonic="ProtoCompile", env=ctx.configuration.default_shell_env,/g' \
$PROTOFILE
bazel build -c opt --config=cuda --genrule_strategy=standalone --spawn_strategy=standalone //tensorflow/tools/pip_package:build_pip_package
wait
bazel-bin/tensorflow/tools/pip_package/build_pip_package $TMPDIR/tensorflow_pkg
#Get name of the created whl file:
for filename in $TMPDIR/tensorflow_pkg/*;
do
export TF_WHEEL_FILE=$filename
done
#Finally install TF!
pip install $TF_WHEEL_FILE
echo "====================CAVEATS=============================="
echo "Don't forget to update the necessary environment variables in .bash_profile!"
echo 'echo "export PATH='$JAVA_INSTALL_DIR'/bin:'$JAVA_INSTALL_DIR'/jre/bin:$PATH" >> ~/.bash_profile'
echo 'echo "export PATH='$PYTHON_INSTALL_DIR'/bin:'$BAZEL_BIN_DIR':$PATH" >> ~/.bash_profile'
echo 'echo "export JAVA_HOME='$JAVA_INSTALL_DIR'" >> ~/.bash_profile'
echo 'echo "export JAVA_JRE='$JAVA_INSTALL_DIR'/jre" >> ~/.bash_profile'