pliablepixels/GPU steps.md

## GPU steps.md

      
Getting dlandon's docker image to work reliably with a GPU using NVIDIA's pre-built CUDA/cuDNN image

Raw notes on what I did to get GPU/CUDA/CuDNN working in dlandon's docker image. Not optimized.
So I don't forget later.
There are two ways:


Use his docker image, which derives from phusion, and manually install CUDA/cuDNN. This did not work for me: I got all sorts of segfaults, install errors, and driver mismatch errors (even though my host and docker CUDA versions were identical). When things finally worked after various package errors while installing CUDA, the image failed after a restart. I've come to the conclusion that a CUDA/cuDNN install is very sensitive to its environment, and what works for one person may fail dramatically for another, especially when we are trying to link a host GPU to a docker container. May as well leave this to the experts (NVIDIA). So the next option:


Modify his Dockerfile to derive from NVIDIA's pre-installed CUDA/cuDNN image and then add the rest of his steps (plus a step or two needed to get the init process working, which his image relies on). Pro: No need to mess with CUDA/cuDNN, no need to remove/recompile packages after the GPU software install. Con: Possibly a larger/less optimized docker image (I haven't compared), but that is something that can be cleaned up by those who want to.


Step 1: clone his repo
git clone https://github.com/dlandon/zoneminder

Step 2: Grab phusion's my_init script that is used by dlandon in his docker file
wget https://raw.githubusercontent.com/phusion/baseimage-docker/master/image/bin/my_init
chmod a+x ./my_init

Step 3: Modify his Dockerfile
let's call the new one Dockerfile.nvidia:
FIXME: I did not bother figuring out if we could avoid my_init completely. Like I said, my goal was to get things working. So the easiest way for me was to look at how phusion did service init and borrow that script over and any deps (runit-systemd)
I made the following changes:

Switched base from phusion to the pre-created CUDA+CuDNN dev package from NVIDIA. This is needed to compile apps that need CUDA/CuDNN (like OpenCV)
Copied the downloaded ./my_init script to /sbin (the CMD at the end of the Dockerfile runs /sbin/my_init)
Added a couple of packages that the phusion image ships with but the NVIDIA image does not, so that compilation would work

FIXME: Better to put a mark on cuda packages

Removed the apt-get clean and apt-get autoremove parts for now. Note that apt-get autoremove removes packages that were installed automatically as dependencies and that nothing else depends on any more. That is not a good idea for the CUDA/cuDNN libs: when this image builds, nothing depends on them yet, so they get removed, and when we come to compiling OpenCV it obviously doesn't find the dev libraries. What was odd was that even apt-get clean seemed to remove various CUDA libs and caused the same issue. A better solution would likely be to mark all CUDA/cuDNN libs to be excluded.

VERY IMPORTANT: You can't pick a random CUDA version. You MUST match the CUDA version to your GPU driver version, from here. In my case, my driver version, 430.26, requires CUDA 10.1. I can't make it work with CUDA 10.2: compilation etc. will work, but when you actually run an app that uses the GPU via CUDA, it will fail.
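To make the driver/CUDA matching rule concrete, here is a minimal Python sketch that picks the newest CUDA release a given driver can run. The minimum driver numbers in the table are an assumption recalled from NVIDIA's CUDA release notes (Linux drivers), and the function names are made up for illustration; check the official compatibility table before relying on it.

```python
# Minimum Linux driver version for each CUDA toolkit release.
# ASSUMPTION: values recalled from NVIDIA's CUDA release notes; verify them
# against the official compatibility table before relying on them.
MIN_DRIVER_FOR_CUDA = {
    "10.0": (410, 48),
    "10.1": (418, 39),
    "10.2": (440, 33),
}


def parse_driver_version(text):
    """Turn a driver string such as '430.26' into a comparable tuple (430, 26)."""
    return tuple(int(part) for part in text.strip().split("."))


def newest_supported_cuda(driver_version):
    """Return the newest CUDA release in the table that the driver can run, or None."""
    driver = parse_driver_version(driver_version)
    supported = [
        release
        for release, minimum in MIN_DRIVER_FOR_CUDA.items()
        if driver >= minimum
    ]
    # Sort numerically rather than as strings.
    supported.sort(key=lambda release: tuple(int(p) for p in release.split(".")))
    return supported[-1] if supported else None


print(newest_supported_cuda("430.26"))  # prints 10.1
```

With the table above, driver 430.26 lands on 10.1 and not 10.2, which matches what I saw.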

# Use the right cuda version for your driver. change to 10.2 or others as needed
FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04

LABEL maintainer="dlandon"

ENV     DEBCONF_NONINTERACTIVE_SEEN="true" \
        DEBIAN_FRONTEND="noninteractive" \
        DISABLE_SSH="true" \
        HOME="/root" \
        LC_ALL="C.UTF-8" \
        LANG="en_US.UTF-8" \
        LANGUAGE="en_US.UTF-8" \
        TZ="Etc/UTC" \
        TERM="xterm"

ENV     PHP_VERS="7.4" \
        ZM_VERS="1.34" \
        ZMEVENT_VERS="5.7.4" \
        SHMEM="50%" \
        PUID="99" \
        PGID="100"

COPY init/ /etc/my_init.d/
COPY defaults/ /root/
COPY ./my_init  /sbin/

RUN     apt-get update && \
        apt-get -y install --no-install-recommends software-properties-common runit-systemd && \
        add-apt-repository -y ppa:iconnor/zoneminder-$ZM_VERS && \
        add-apt-repository ppa:ondrej/php && \
        apt-get update && \
        apt-get -y upgrade -o Dpkg::Options::="--force-confold" && \
        apt-get -y dist-upgrade -o Dpkg::Options::="--force-confold" && \
        apt-get -y install apache2 mariadb-server && \
        apt-get -y install ssmtp mailutils net-tools wget sudo make && \
        apt-get -y install php$PHP_VERS php$PHP_VERS-fpm libapache2-mod-php$PHP_VERS php$PHP_VERS-mysql php$PHP_VERS-gd && \
        apt-get -y install libcrypt-mysql-perl libyaml-perl libjson-perl libavutil-dev && \
        apt-get -y install --no-install-recommends libvlc-dev libvlccore-dev vlc

RUN     apt-get -y install zoneminder

RUN     rm /etc/mysql/my.cnf && \
        cp /etc/mysql/mariadb.conf.d/50-server.cnf /etc/mysql/my.cnf && \
        adduser www-data video && \
        a2enmod php$PHP_VERS proxy_fcgi ssl rewrite expires headers && \
        a2enconf php$PHP_VERS-fpm zoneminder && \
        echo "extension=apcu.so" > /etc/php/$PHP_VERS/mods-available/apcu.ini && \
        echo "extension=mcrypt.so" > /etc/php/$PHP_VERS/mods-available/mcrypt.ini && \
        perl -MCPAN -e "force install Net::WebSocket::Server" && \
        perl -MCPAN -e "force install LWP::Protocol::https" && \
        perl -MCPAN -e "force install Config::IniFiles" && \
        perl -MCPAN -e "force install Net::MQTT::Simple" && \
        perl -MCPAN -e "force install Net::MQTT::Simple::Auth"

RUN     cd /root && \
        chown -R www-data:www-data /usr/share/zoneminder/ && \
        echo "ServerName localhost" >> /etc/apache2/apache2.conf && \
        sed -i "s|^;date.timezone =.*|date.timezone = ${TZ}|" /etc/php/$PHP_VERS/apache2/php.ini && \
        service mysql start && \
        mysql -uroot < /usr/share/zoneminder/db/zm_create.sql && \
        mysql -uroot -e "grant all on zm.* to 'zmuser'@localhost identified by 'zmpass';" && \
        mysqladmin -uroot reload && \
        mysql -sfu root < "mysql_secure_installation.sql" && \
        rm mysql_secure_installation.sql && \
        mysql -sfu root < "mysql_defaults.sql" && \
        rm mysql_defaults.sql

RUN     mv /root/zoneminder /etc/init.d/zoneminder && \
        chmod +x /etc/init.d/zoneminder && \
        service mysql restart && \
        sleep 5 && \
        service apache2 restart && \
        service zoneminder start

RUN     systemd-tmpfiles --create zoneminder.conf && \
        mv /root/default-ssl.conf /etc/apache2/sites-enabled/default-ssl.conf && \
        mkdir /etc/apache2/ssl/ && \
        mkdir -p /var/lib/zmeventnotification/images && \
        chown -R www-data:www-data /var/lib/zmeventnotification/ && \
        chmod -R +x /etc/my_init.d/ && \
        cp -p /etc/zm/zm.conf /root/zm.conf && \
        echo "#!/bin/sh\n\n/usr/bin/zmaudit.pl -f" >> /etc/cron.weekly/zmaudit && \
        chmod +x /etc/cron.weekly/zmaudit

RUN     rm -rf /tmp/* /var/tmp/* && \
        chmod +x /etc/my_init.d/*.sh

VOLUME ["/config", "/var/cache/zoneminder"]

EXPOSE 80 443 9000

CMD ["/sbin/my_init"]

Step 4: Build the modified dockerfile
docker build -t zoneminder -f Dockerfile.nvidia .

Step 5: Now run it:
Note the --gpus option: it needs nvidia-docker set up on the host. See this
Also note since the docker image starts with CuDNN+CUDA, installing face recognition should automatically use the GPU. I turned it off in my experiments just for setup speed.
DATADIR_BASE=/home/pp/fiddle/docker/appdata/Zoneminder
docker run -d --name="Zoneminder" \
--gpus all \
--net="bridge" \
--privileged="true" \
-p 8443:443/tcp \
-p 9990:9000/tcp \
-e TZ="America/New_York" \
-e SHMEM="70%" \
-e PUID="99" \
-e PGID="100" \
-e INSTALL_HOOK="1" \
-e INSTALL_FACE="0" \
-e INSTALL_TINY_YOLO="1" \
-e INSTALL_YOLO="1" \
-v "${DATADIR_BASE}/config":"/config":rw  \
-v "${DATADIR_BASE}/data":"/var/cache/zoneminder":rw  \
zoneminder

Watch the progress of the first run in the logs:
docker logs -f Zoneminder

Don't rush into compiling OpenCV. A lot of development tools are set up when this image is first run. Monitor the logs and make sure it's done installing all packages.
Step 6: Compile OpenCV with GPU support
Once it is running, open a shell inside the container:
docker exec -it Zoneminder /bin/bash

You can run nvidia-smi and you should see both CUDA (ignore the version it reports: ref) and your GPU. If not, boo. You're screwed. Don't pass Go.
Note that the CUDA version shown in nvidia-smi does not indicate the CUDA toolchain/lib version in the container. To make sure you are using the right cuda library, check version using nvcc --version
In my case:
root@a611be0d9502:/# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
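If you want to check the toolkit version from a script instead of by eye, a small sketch along these lines should do. The helper names are hypothetical, and the regular expression assumes the "release X.Y" wording shown in the output above.

```python
import re
import shutil
import subprocess


def cuda_release_from_nvcc(output):
    """Extract the toolkit release (e.g. '10.1') from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run nvcc if it is on PATH and return its release, otherwise None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], stdout=subprocess.PIPE, universal_newlines=True
    )
    return cuda_release_from_nvcc(result.stdout)


# The output from my container, pasted in as sample input.
sample = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243"""

print(cuda_release_from_nvcc(sample))  # prints 10.1
```

Inside the container, installed_cuda_release() should return the same release as the tag of the base image you picked.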

If you already have the CPU version of opencv installed, remove it (the default docker setup installs the cpu version)
pip3 uninstall opencv-contrib-python

You can now proceed to install OpenCV as per dlandon's repo. Note that since we are using a pre-packaged CUDA+CuDNN install, you'll have to specify a CUDA_ARCH_BIN. Go here to get the value (you are looking for the number under "Compute Capability" for your GPU card)
I've not listed exact commands; look at his opencv.sh script and skip directly to the part where OpenCV's support libraries are installed (ignore all the CUDA/cuDNN steps, as we already have them).
My modified cmake command was: (note the extra CUDA_ARCH_BIN)
cmake -v -D CMAKE_BUILD_TYPE=RELEASE \
	-D CMAKE_INSTALL_PREFIX=/usr/local \
	-D INSTALL_PYTHON_EXAMPLES=OFF \
	-D INSTALL_C_EXAMPLES=OFF \
	-D OPENCV_ENABLE_NONFREE=ON \
	-D WITH_CUDA=ON \
	-D WITH_CUDNN=ON \
	-D OPENCV_DNN_CUDA=ON \
	-D CUDA_ARCH_BIN=6.1 \
	-D ENABLE_FAST_MATH=1 \
	-D CUDA_FAST_MATH=1 \
	-D WITH_CUBLAS=1 \
	-D OPENCV_EXTRA_MODULES_PATH=/<path>/<to>/opencv_contrib/modules/ \
	-D HAVE_opencv_python3=ON \
	-D PYTHON_EXECUTABLE=/usr/bin/python3 \
	-D BUILD_EXAMPLES=OFF ..
make
make install

Step 7: Test GPU and cv2
Compilation can work, but things can break if you actually try using the GPU.
root@5d4231e625c3:~# python3
Python 3.6.9 (default, Nov  7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> print(cv2.getBuildInformation())

This should print out gobs of info including: (your arch/ver may be different)

  NVIDIA CUDA:                   YES (ver 10.1, CUFFT CUBLAS FAST_MATH)
    NVIDIA GPU arch:             61
    NVIDIA PTX archs:

  cuDNN:                         YES (ver 7.6.5)

While the message only means OpenCV was compiled with CUDA/CuDNN, which we already know, now we also know importing cv2 doesn't segfault. Test 1 passed.
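If you would rather check those lines from a script than read through the whole dump, here is a minimal sketch. The helper names are mine, not part of OpenCV, and the parsing assumes the "key: value" layout shown above.

```python
def gpu_build_summary(build_info):
    """Pick the CUDA, GPU arch and cuDNN lines out of cv2.getBuildInformation()."""
    wanted = ("NVIDIA CUDA", "NVIDIA GPU arch", "cuDNN")
    summary = {}
    for line in build_info.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            summary[key.strip()] = value.strip()
    return summary


def built_with_gpu(build_info):
    """True only when both CUDA and cuDNN are reported as enabled."""
    summary = gpu_build_summary(build_info)
    return (
        summary.get("NVIDIA CUDA", "").startswith("YES")
        and summary.get("cuDNN", "").startswith("YES")
    )


# The lines from my build, pasted in as sample input.
sample = """
  NVIDIA CUDA:                   YES (ver 10.1, CUFFT CUBLAS FAST_MATH)
    NVIDIA GPU arch:             61
    NVIDIA PTX archs:

  cuDNN:                         YES (ver 7.6.5)
"""
print(gpu_build_summary(sample))
print(built_with_gpu(sample))  # prints True
```

In the container you would pass in the real thing: built_with_gpu(cv2.getBuildInformation()).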
Now check DNN GPU support:
import cv2
config_file_abs_path='/var/lib/zmeventnotification/models/yolov3/yolov3.cfg'
weights_file_abs_path='/var/lib/zmeventnotification/models/yolov3/yolov3.weights'
net = cv2.dnn.readNet(weights_file_abs_path,config_file_abs_path)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
input("Now do nvidia-smi and you should see this process consume GPU memory")
print("Bye\n")

If you get errors, for example error: (-217:Gpu API call) system has unsupported display driver / cuda driver combination in function 'getCudaEnabledDeviceCount', you used the wrong CUDA version. You need to match your CUDA version to your GPU driver version.
If everything runs fine and you see your GPU memory being consumed, test 2 passed. You're all done.
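As an optional extra check, you can push a single blank frame through the network so that a forward pass actually runs on the CUDA backend. This is only a sketch under a few assumptions: the function name is hypothetical, the 416x416 input size is a common YOLOv3 setting rather than something from these notes, and it needs an OpenCV build that has the CUDA DNN backend.

```python
def run_dummy_inference(config_path, weights_path, size=416):
    """Push one blank frame through the network on the CUDA backend."""
    # Imported here so the function can be defined even where cv2 is absent.
    import cv2
    import numpy as np

    net = cv2.dnn.readNet(weights_path, config_path)
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

    # A black image is enough: the goal is to exercise the GPU, not detect anything.
    frame = np.zeros((size, size, 3), dtype=np.uint8)
    blob = cv2.dnn.blobFromImage(
        frame, 1 / 255.0, (size, size), swapRB=True, crop=False
    )
    net.setInput(blob)

    # Run the forward pass over all output layers and report their shapes.
    outputs = net.forward(net.getUnconnectedOutLayersNames())
    return [out.shape for out in outputs]


# Usage inside the container (same model paths as in the test above):
# shapes = run_dummy_inference(
#     "/var/lib/zmeventnotification/models/yolov3/yolov3.cfg",
#     "/var/lib/zmeventnotification/models/yolov3/yolov3.weights",
# )
# print(shapes)
```

If the call returns a list of shapes without raising a CUDA error, the GPU path works end to end.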