

@sjs7007
Last active September 13, 2016 16:12
Installing and testing distributed TensorFlow

These instructions use this link as the base source. Where a step from the base source didn't work for me, I give modified instructions.

Installing dependencies

TensorFlow

 sudo apt-get install pkg-config zip g++ zlib1g-dev unzip swig git
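To confirm that these tools are available before building, you can look each command up on PATH. The following is a minimal standard-library sketch. It assumes each package provides a command with the name shown (`zlib1g-dev` installs only headers and libraries, so it is skipped). The helper `missing_tools` is hypothetical.

```python
import shutil
from typing import Iterable, List

# Commands provided by the apt packages above. zlib1g-dev ships only
# headers/libraries (no command), so it cannot be checked this way.
BUILD_TOOLS = ["pkg-config", "zip", "g++", "unzip", "swig", "git"]


def missing_tools(tools: Iterable[str] = BUILD_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Still missing:", " ".join(missing))
    else:
        print("All build tools found.")
```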

Java

 sudo apt-get install software-properties-common
 sudo add-apt-repository ppa:webupd8team/java
 sudo apt-get update
 sudo apt-get install oracle-java8-installer
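To check that Java 8 is what `java` on PATH actually runs, you can parse the banner that `java -version` prints. This sketch assumes two things: the banner contains a `version "..."` string, and Java 8 reports itself in the legacy `1.8.0_xx` form. Both function names are hypothetical.

```python
import re
import subprocess
from typing import Optional


def java_major_version(banner: str) -> Optional[int]:
    """Extract the major version from `java -version` output.

    Java 8 and older use the legacy "1.x" scheme (e.g. "1.8.0_101"),
    so "1.x" is reported as x.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major() -> Optional[int]:
    """Run `java -version`; return None if java is not on PATH."""
    try:
        # `java -version` prints its banner to stderr, not stdout.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return java_major_version(result.stderr)


if __name__ == "__main__":
    print("Java major version:", installed_java_major())
```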

Bazel

The Bazel version you need depends on whether you are using TensorFlow 0.8 or 0.10 (source). TensorFlow 0.10 needs Bazel 0.3, and TensorFlow 0.8 needs a Bazel version older than 0.3.

For Bazel 0.2 (TensorFlow 0.8):

 wget https://github.com/bazelbuild/bazel/releases/download/0.2.0/bazel_0.2.0-linux-x86_64.deb
 sudo dpkg -i bazel_0.2.0-linux-x86_64.deb

For Bazel 0.3 (TensorFlow 0.10; source):

 echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
 curl https://storage.googleapis.com/bazel-apt/doc/apt-key.pub.gpg | sudo apt-key add -
 sudo apt-get update && sudo apt-get install bazel
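To check an installed Bazel against the TensorFlow/Bazel pairing described above, one option is to read the `Build label:` line that `bazel version` prints. The sketch below encodes only the two rules from these notes. The function names are hypothetical, and any other TensorFlow version raises an error instead of guessing.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn a dotted version such as "0.3.0" into (0, 3, 0)."""
    return tuple(int(part) for part in text.split("."))


def bazel_version_from_output(output: str) -> Optional[Version]:
    """Read the "Build label: X.Y.Z" line printed by `bazel version`."""
    match = re.search(r"^Build label:\s*(\d+(?:\.\d+)*)", output, re.MULTILINE)
    return parse_version(match.group(1)) if match else None


def bazel_matches_tensorflow(tf_version: str, bazel: Version) -> bool:
    """Apply only the rules from these notes:
    TensorFlow 0.10 -> Bazel 0.3, TensorFlow 0.8 -> Bazel older than 0.3."""
    tf = parse_version(tf_version)[:2]
    if tf == (0, 10):
        return bazel[:2] == (0, 3)
    if tf == (0, 8):
        return bazel[:2] < (0, 3)
    raise ValueError(f"no rule recorded for TensorFlow {tf_version}")
```

For example, if the text printed by `bazel version` contains `Build label: 0.3.0`, the parsed version `(0, 3, 0)` satisfies the TensorFlow 0.10 rule but not the 0.8 rule.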

CUDA and cuDNN

The method in the base source didn't work for me. The commands given here for installing CUDA and cuDNN work well.

gRPC Server

TensorFlow uses gRPC for inter-process communication. To build the server binary, first clone the TensorFlow repository:

 git clone --recurse-submodules https://github.com/tensorflow/tensorflow
NOTE: The initial commit of the open-source distributed TensorFlow runtime is 00986d48bb646daab659503ad3a713919865f32d.

Then cd into the TensorFlow repository and run the ./configure script. You can then build the server binary with:

 bazel build -c opt --config=cuda //tensorflow/core/distributed_runtime/rpc:grpc_tensorflow_server
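Bazel writes build outputs under the `bazel-bin` symlink in the workspace root, and the output path mirrors the package path of the label. The server binary built above should therefore end up at a predictable location. The sketch below shows that mapping. The helper is hypothetical, and it assumes a binary target whose output file is named after the target, which is the usual case for a `cc_binary`.

```python
from pathlib import PurePosixPath


def bazel_bin_path(label: str) -> PurePosixPath:
    """Map a label like //pkg/path:target to bazel-bin/pkg/path/target.

    Assumes a binary target whose output file is named after the target.
    """
    if not label.startswith("//") or ":" not in label:
        raise ValueError(f"expected a label of the form //package:target, got {label!r}")
    package, target = label[2:].split(":", 1)
    return PurePosixPath("bazel-bin", package, target)
```

For the target above, `bazel_bin_path("//tensorflow/core/distributed_runtime/rpc:grpc_tensorflow_server")` gives `bazel-bin/tensorflow/core/distributed_runtime/rpc/grpc_tensorflow_server`, relative to the repository root.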

If you get an error like "ERROR: no such package '@local_config_cuda//crosstool': BUILD file not found on package path", it may be caused by a caching issue. Re-running ./configure fixes it.

You may also see the error "libcudart.so.7.5: cannot open shared object file: No such file or directory" (tensorflow/tensorflow#1501).


sjs7007 commented Sep 6, 2016

To fix the libcuda issue, running "sudo ldconfig /usr/local/cuda/lib64" worked (tensorflow/tensorflow#2986).
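To check whether the fix took effect without starting TensorFlow, you can ask the dynamic loader for the CUDA runtime directly from Python. This is a standard-library sketch. The helper name `can_load` is hypothetical, and the library name assumes CUDA 7.5, matching the error above.

```python
import ctypes
import ctypes.util


def can_load(library: str) -> bool:
    """Return True if the dynamic loader can open `library` (e.g. libcudart.so.7.5)."""
    try:
        ctypes.CDLL(library)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # On Linux, find_library consults the ldconfig cache, so it should return
    # a name once `sudo ldconfig /usr/local/cuda/lib64` has been run.
    print("find_library('cudart'):", ctypes.util.find_library("cudart"))
    print("libcudart.so.7.5 loadable:", can_load("libcudart.so.7.5"))
```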
To test:

import tensorflow as tf

c = tf.constant("Hello World !")
# Connect to the gRPC server, assumed to be running and listening on localhost:2222.
sess = tf.Session("grpc://localhost:2222")
print(sess.run(c))
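If the server is not running, creating the session or calling sess.run may fail or block instead of returning. A plain TCP check before running the test can tell these cases apart. The sketch below uses only the standard library. The helper name is hypothetical, and the default host and port assume the grpc://localhost:2222 target above. A successful check only shows that some process is listening on the port, not that it is the TensorFlow server.

```python
import socket


def grpc_port_open(host: str = "localhost", port: int = 2222, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    This does not confirm that the listener is the TensorFlow gRPC server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    if grpc_port_open():
        print("Something is listening on localhost:2222.")
    else:
        print("Nothing is listening on localhost:2222; start the server first.")
```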
