Skip to content

Instantly share code, notes, and snippets.

@rcompton
Forked from erikbern/install-tensorflow.sh
Last active June 8, 2016 17:32
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save rcompton/d366a80a3eb9f0030cc20a42869b8b38 to your computer and use it in GitHub Desktop.
Save rcompton/d366a80a3eb9f0030cc20a42869b8b38 to your computer and use it in GitHub Desktop.
Installing TensorFlow on EC2
# Note – this is not a bash script (some of the steps require reboot)
# I named it .sh just so Github does correct syntax highlighting.
#
# This is also available as an AMI in us-east-1 (virginia): ami-cf5028a5
#
# The CUDA part is mostly based on this excellent blog post:
# http://tleyden.github.io/blog/2014/10/25/cuda-6-dot-5-on-aws-gpu-instance-running-ubuntu-14-dot-04/
# Install various packages
sudo apt-get update
sudo apt-get upgrade -y # choose “install package maintainers version”
sudo apt-get install -y build-essential python-pip python-dev git python-numpy swig python-dev default-jdk zip zlib1g-dev
# Blacklist Noveau which has some kind of conflict with the nvidia driver
echo -e "blacklist nouveau\nblacklist lbm-nouveau\noptions nouveau modeset=0\nalias nouveau off\nalias lbm-nouveau off\n" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo update-initramfs -u
sudo reboot # Reboot (annoying you have to do this in 2015!)
# Some other annoying thing we have to do
sudo apt-get install -y linux-image-extra-virtual
sudo reboot # Not sure why this is needed
# Install latest Linux headers
sudo apt-get install -y linux-source linux-headers-`uname -r`
# Install CUDA 7.0 (note – don't use any other version)
wget http://developer.download.nvidia.com/compute/cuda/7_0/Prod/local_installers/cuda_7.0.28_linux.run
chmod +x cuda_7.0.28_linux.run
./cuda_7.0.28_linux.run -extract=`pwd`/nvidia_installers
cd nvidia_installers
sudo ./NVIDIA-Linux-x86_64-346.46.run
sudo modprobe nvidia
sudo ./cuda-linux64-rel-7.0.28-19326674.run
cd ~
# Install CUDNN 6.5 (note – don't use any other version)
wget http://developer.download.nvidia.com/compute/redist/cudnn/v2/cudnn-6.5-linux-x64-v2.tgz
tar -xzf cudnn-6.5-linux-x64-v2.tgz
sudo cp cudnn-6.5-linux-x64-v2/libcudnn* /usr/local/cuda/lib64
sudo cp cudnn-6.5-linux-x64-v2/cudnn.h /usr/local/cuda/include/
# At this point the root mount is getting a bit full
# I had a lot of issues where the disk would fill up and then Bazel would end up in this weird state complaining about random things
# Make sure you don't run out of disk space when building Tensorflow!
sudo mkdir /mnt/tmp
sudo chmod 777 /mnt/tmp
sudo rm -rf /tmp
sudo ln -s /mnt/tmp /tmp
# Note that /mnt is not saved when building an AMI, so don't put anything crucial on it
# Install Bazel
#either set JAVA_HOME or upgrade to java 8
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer # warning, interactive prompt...
cd /mnt/tmp
git clone https://github.com/bazelbuild/bazel.git
cd bazel
git checkout tags/0.1.0
./compile.sh
sudo cp output/bazel /usr/bin
# Install TensorFlow
cd /mnt/tmp
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda
git clone --recurse-submodules https://github.com/tensorflow/tensorflow
cd tensorflow
# Patch to support older K520 devices on AWS
# wget "https://gist.githubusercontent.com/infojunkie/cb6d1a4e8bf674c6e38e/raw/5e01e5b2b1f7afd3def83810f8373fbcf6e47e02/cuda_30.patch"
# git apply cuda_30.patch
# According to https://github.com/tensorflow/tensorflow/issues/25#issuecomment-156234658 this patch is no longer needed
# Instead, you need to run ./configure like below (not tested yet)
TF_UNOFFICIAL_SETTING=1 ./configure
bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
# Build Python package
# Note: you have to specify --config=cuda here - this is not mentioned in the official docs
# https://github.com/tensorflow/tensorflow/issues/25#issuecomment-156173717
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
sudo pip install /tmp/tensorflow_pkg/tensorflow-0.5.0-cp27-none-linux_x86_64.whl
# Test it!
cd tensorflow/models/image/cifar10/
python cifar10_multi_gpu_train.py
# On a g2.2xlarge: step 100, loss = 4.50 (325.2 examples/sec; 0.394 sec/batch)
# On a g2.8xlarge: step 100, loss = 4.49 (337.9 examples/sec; 0.379 sec/batch)
# doesn't seem like it is able to use the 4 GPU cards unfortunately :(
@rcompton
Copy link
Author

rcompton commented Jun 8, 2016

also

(v2) ubuntu@ip-10-216-229-126:~/tensorflow$ TF_UNOFFICIAL_SETTING=1 ./configure
Please specify the location of python. [Default is /home/ubuntu/v2/bin/python]: 
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] 
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc nvcc should use as the host compiler. [Default is /usr/bin/gcc]: 
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 7.0
Please specify the location where CUDA 7.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 6.5
Please specify the location where cuDNN 6.5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 3.5
Setting up Cuda include
Setting up Cuda lib64
Setting up Cuda bin
Setting up Cuda nvvm
Setting up CUPTI include
Setting up CUPTI lib64
Configuration finished

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment