@pangyuteng
Forked from yochze/Dockerfile
Last active January 28, 2021 14:24
Dockerfile to build Tensorflow 1.15.0 using nvidia-docker

Motivation for building tensorflow...

The pre-built TensorFlow that is packaged with tf-node-gpu is built for GPUs with a compute capability of 6.0 or higher. If you have an older GPU with a compute capability below 6.0, TF will ignore your GPU and print the warning message below!!

tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Ignoring visible gpu device (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0) with Cuda compute capability 5.0. The minimum required Cuda capability is 6.0.
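
To confirm which capability TensorFlow actually saw, the value can be pulled out of that warning. This is only a minimal sketch, assuming the log line has the same shape as the one quoted above; `tf_ignored_capability` is a made-up helper name, not something TensorFlow provides.

```shell
# Hypothetical helper: reads TensorFlow log lines on stdin and prints the
# compute capability of every GPU that was reported as ignored.
tf_ignored_capability() {
    sed -n 's/.*Ignoring visible gpu device.*compute capability: \([0-9][0-9]*\.[0-9][0-9]*\)).*/\1/p'
}

# Example using the warning quoted above.
line='tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Ignoring visible gpu device (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0) with Cuda compute capability 5.0. The minimum required Cuda capability is 6.0.'
printf '%s\n' "$line" | tf_ignored_capability   # prints 5.0
```

Lines without the warning produce no output, so an empty result suggests the GPU was not rejected for this reason.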

related issues:

tensorflow/tensorflow#36853

https://github.com/tensorflow/tensorflow/issues/38971

Follow the instructions below (work in progress) and use this Dockerfile to build TensorFlow for your own GPU's compute capability!

versions used:

tensorflow_gpu-1.15.0, Python 3.7, GCC 7.3.1, Bazel 0.26.1, cuDNN 7.4, CUDA 10.0
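
The versions actually present in the base image may not match this list exactly, so it can be worth checking inside the container before trusting them. A rough sketch, assuming a cuDNN 7 style `cudnn.h` that defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`; `cudnn_version` is a hypothetical helper name.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from a cudnn.h header.
# Assumes the header uses plain "#define CUDNN_MAJOR <n>" style lines.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { if (major != "") print major "." minor "." patch }' "$1"
}

# Usage inside the container (header location depends on the image):
#   cudnn_version /usr/include/cudnn.h
#   cudnn_version /usr/local/cuda/include/cudnn.h
#   nvcc --version        # reports the CUDA toolkit version
```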

The CUDA compute capability is set in the Dockerfile via TF_CUDA_COMPUTE_CAPABILITIES; it is set to 5.0 for the GPU I'm using (GeForce 940MX).
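
If your GPU has a different capability, the `ENV TF_CUDA_COMPUTE_CAPABILITIES=...` line is the one to change. Editing by hand is fine; the sketch below just scripts that edit. `set_compute_capability` is a hypothetical helper, and it assumes the Dockerfile contains the line exactly in the `ENV NAME=value` form used in this gist.

```shell
# Hypothetical helper: rewrite the active TF_CUDA_COMPUTE_CAPABILITIES line
# in a Dockerfile. Commented-out lines (starting with '#') are left alone.
set_compute_capability() {
    dockerfile="$1"
    capability="$2"
    sed "s/^ENV TF_CUDA_COMPUTE_CAPABILITIES=.*/ENV TF_CUDA_COMPUTE_CAPABILITIES=${capability}/" \
        "$dockerfile" > "$dockerfile.tmp" && mv "$dockerfile.tmp" "$dockerfile"
}

# Usage (5.2 is only an example value; look up the value for your own GPU):
#   set_compute_capability Dockerfile 5.2
```

A comma-separated list such as `5.0,5.2` should also work as the value, at the cost of a longer build.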

INSTRUCTIONS

  • (optional) increase the swap file to 16 GB [3,4]
  • run the command below
docker build -t mytensorflow .
  • OR... go into the container and run the build there.
docker run --gpus all -it -w /tensorflow -v $PWD:/mnt mytensorflow bash
bazel build --config=opt --config=cuda //tensorflow:libtensorflow.so
  • go for a surf/watch a movie/clean the house, since the build will take a long, long time, maybe 4+ hours [10].

  • expect to update the Dockerfile, rinse and repeat... at least we are using docker these days.

  • expect to abandon the above attempt, and just go buy a GPU with a compute capability of 6.0 or higher :D

  • copy built files out

docker run --gpus all -it -w /tensorflow -v $PWD:/mnt mytensorflow bash
cp bazel-bin/tensorflow/libtensorflow.so.1.15.0 /mnt
cp bazel-bin/tensorflow/libtensorflow_framework.so.1.15.0 /mnt
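
The copy step above can also be done without an interactive shell. A minimal sketch, assuming the image is tagged `mytensorflow` and the build outputs are at the paths listed above; `copy_tf_libs` and the `DOCKER` override variable are hypothetical names used only for illustration.

```shell
# Hypothetical helper: copy the two built libraries from the image into a
# host directory, using a throwaway container. No GPU access is requested,
# since the container only runs cp.
copy_tf_libs() {
    image="${1:-mytensorflow}"   # image tag used in `docker build -t ...`
    dest="${2:-$PWD}"            # host directory mounted at /mnt
    "${DOCKER:-docker}" run --rm -w /tensorflow -v "$dest":/mnt "$image" \
        cp bazel-bin/tensorflow/libtensorflow.so.1.15.0 \
           bazel-bin/tensorflow/libtensorflow_framework.so.1.15.0 /mnt/
}

# Usage:
#   copy_tf_libs mytensorflow "$PWD"
```

Setting `DOCKER=echo` prints the command instead of running it, which is handy for checking the mount path first.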

REFERENCES

[1] https://www.tensorflow.org/install/source#tested_build_configurations

[2] https://stackoverflow.com/questions/9727688/how-to-get-the-cuda-version

[3] tensorflow/tensorflow#25965

[4] https://askubuntu.com/questions/1075505/how-do-i-increase-swapfile-in-ubuntu-18-04

[5] https://gist.github.com/yochze/3898e1405bb3a024acfb9bb9eef132c3

[6] tensorflow/tensorflow#21531

[7] https://launchpad.net/~jonathonf/+archive/ubuntu/python-3.6

[8] NVIDIA/nvidia-docker#595

[9] tensorflow/tensorflow#25865

[10] https://stackoverflow.com/questions/54541969/tensorflow-compile-runs-forever

[11] https://gist.github.com/Brainiarc7/6d6c3f23ea057775b72c52817759b25c

HISTORY

  • started from the Dockerfile in [5].

  • ppa:jonathonf/python-3.6 is no longer available [7], so I swapped Python 3.6 for 3.7 and changed the TensorFlow version.

  • got multiple errors [6] due to mismatched versions of gcc/cuda/tensorflow.

  • finally decided to go with the official build instructions [1,8], using the version guideline below

   version               Python version  Compiler    Build tools     cuDNN   CUDA
   tensorflow_gpu-1.13.1 2.7, 3.3-3.7    GCC 4.8     Bazel 0.19.2    7.4     10.0
                                         using GCC 7.x instead.
                                         only because I'm lazy.
  • not building with docker build, since we want GPU access during compilation; thus following [1] to build by first going into the docker container via docker run --gpus all ..., then running bazel build ....

  • (turns out you can just use docker build, and there is no need to use docker run to build; see the last bullet point below)

  • the final Dockerfile and instructions are a blend of [1,5,9,11]

  • attempted to build within docker build (not using 'docker run') and switched TF to v1.15.0. Finally got TF to build successfully.

FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
# install python 3.7 and pip
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
build-essential \
software-properties-common \
pkg-config \
rsync \
curl \
git \
unzip \
zip \
zlib1g-dev \
wget \
vim \
&& add-apt-repository ppa:deadsnakes/ppa \
&& apt-get update \
&& apt-get install -y --no-install-recommends \
python3.7 python3.7-dev python3-pip python3.7-venv \
&& python3.7 -m pip install pip --upgrade \
&& python3.7 -m pip install six numpy wheel mock
RUN ln -s /usr/bin/python3.7 /usr/bin/python
# Set up Bazel.
# Running bazel inside a `docker build` command causes trouble, cf:
# https://github.com/bazelbuild/bazel/issues/134
# The easiest solution is to set up a bazelrc file forcing --batch.
RUN echo "startup --batch" >>/etc/bazel.bazelrc
# Similarly, we need to workaround sandboxing issues:
# https://github.com/bazelbuild/bazel/issues/418
RUN echo "build --spawn_strategy=standalone --genrule_strategy=standalone" \
>>/etc/bazel.bazelrc
# Install a pinned Bazel release (see BAZEL_VERSION below).
ENV BAZEL_VERSION 0.26.1
WORKDIR /
RUN mkdir /bazel && \
cd /bazel && \
curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" -fSsL -O https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" -fSsL -o /bazel/LICENSE.txt https://raw.githubusercontent.com/bazelbuild/bazel/master/LICENSE && \
chmod +x bazel-*.sh && \
./bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
cd / && \
rm -f /bazel/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh
# Download and build TensorFlow.
ENV TENSORFLOW_VERSION v1.15.0
WORKDIR /tensorflow
RUN git clone --branch=${TENSORFLOW_VERSION} --depth=1 https://github.com/tensorflow/tensorflow.git .
# check for cuda, cudnn version
# docker run -it mytensorflow /bin/bash
# nvcc --version
# cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
# cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2
# Configure the build for our CUDA configuration.
ENV CI_BUILD_PYTHON python3.7
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
ENV TF_NEED_CUDA 1
#ENV TF_NEED_TENSORRT 1
ENV TF_NEED_TENSORRT 0
#ENV TF_CUDA_COMPUTE_CAPABILITIES=3.5,5.2,6.0,6.1,7.0
ENV TF_CUDA_COMPUTE_CAPABILITIES=5.0
ENV TF_CUDA_VERSION=10.0
ENV TF_CUDNN_VERSION=7.6
WORKDIR /tensorflow
RUN yes '' | ./configure
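# Symlink the CUDA stub as libcuda.so.1 so it can be found at build time
# (the real driver library is typically not available during `docker build`).
# Note that the export only lasts for this RUN step.
RUN export LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH && \
ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1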
ENV TMP=/tmp
RUN bazel build --config=opt --config=cuda //tensorflow:libtensorflow.so
pangyuteng commented Jul 7, 2020

A tip for those who must go down this path: go with the tested configuration/versions listed in the "tested_build_configurations" section linked below.
https://www.tensorflow.org/install/source#tested_build_configurations
