Smarker/CNTK_FasterRCNN_Linux_CUDA_Setup.md

## CNTK_FasterRCNN_Linux_CUDA_Setup.md

      
Run CNTK Faster R-CNN on Your Own Images with Ubuntu 16.04 and CUDA 8.0

We will be referring to the tutorial at Microsoft CNTK Docs.
📹 Download the cntk-setup video or view a slightly lower quality version here.
Requirements


Ubuntu 16.04
CNTK 2.2 (GPU) Docker image (in progress). The image is about 9 GB, so make sure you have at least that much free disk space, plus space for your images.
Azure Standard NC6 GPU VM (6 vCPUs, 56 GB memory), or a CPU-only machine
Docker
Nvidia-Docker
CUDA 8.0
CuDNN 6.0
Python 3.6

Build and Run the Container with nvidia-docker

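Build the image from the directory that contains the Dockerfile, then start a container from it:
nvidia-docker build -t <image name> .
nvidia-docker run -it <image name> bash

To restart and attach to a stopped container later:
nvidia-docker start -ai <container id>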

To see if CNTK is installed:
python -c "import cntk; print(cntk.__version__)"

Verify that the GPU is visible inside the container by running:
nvidia-smi

If nvidia-smi reports No running processes found (see https://devtalk.nvidia.com/default/topic/539632/k20-with-high-utilization-but-no-compute-processes-/), enable persistence mode:
sudo nvidia-persistenced --persistence-mode

Check what version of CUDA you have installed:
nvcc -V
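If you script your environment checks, the CUDA release can be pulled out of the nvcc -V output. This is a minimal sketch, assuming the output contains a line of the form "Cuda compilation tools, release 8.0, ..."; the helper names are hypothetical, and get_cuda_release() simply returns None when nvcc is not on the PATH.

```python
import re
import subprocess


def parse_cuda_release(nvcc_output):
    """Return the CUDA release string (e.g. "8.0") found in `nvcc -V` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return "{}.{}".format(match.group(1), match.group(2))


def get_cuda_release():
    """Run `nvcc -V` and parse its output; return None if nvcc is not installed."""
    try:
        result = subprocess.run(["nvcc", "-V"], stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, universal_newlines=True)
    except FileNotFoundError:
        return None
    return parse_cuda_release(result.stdout)


if __name__ == "__main__":
    release = get_cuda_release()
    if release is None:
        print("nvcc not found, or no release string in its output")
    elif release != "8.0":
        print("Found CUDA {}, but this guide assumes CUDA 8.0".format(release))
    else:
        print("CUDA 8.0 detected")
```

A None result corresponds to the "command not found" case handled next.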

If this command is not found, you can refer to this GitHub resource:
sudo apt-get install nvidia-375 nvidia-modprobe

Get Ready to Run FastRCNN on a Custom Dataset


Make sure you have set up all the requirements above
Follow the FasterRCNN guide to run CNTK with your own images or look at the summary below:

Summary of the FasterRCNN guide


Prepare your image data by annotating it with bounding boxes (I recommend the VoTT tagging tool).
Store your custom images in Examples/Image/DataSets/<custom images directory>
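Before running the annotation helper, it can save time to check the folder layout. This is a minimal sketch, assuming the layout of the Grocery sample: positive, negative and testImages sub-folders, with each annotated image in positive accompanied by <image>.bboxes.tsv and <image>.bboxes.labels.tsv files. The function name and these constants are assumptions; adjust them if your copy of the sample differs.

```python
import os
import sys

# Assumed layout, modeled on the Grocery sample data set; adjust if yours differs.
EXPECTED_SUBFOLDERS = ("positive", "negative", "testImages")
ANNOTATION_SUFFIXES = (".bboxes.tsv", ".bboxes.labels.tsv")
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def check_data_set(data_set_path):
    """Return (missing_subfolders, unannotated_images) for a custom data set folder.

    Only the "positive" folder is checked for annotation files.
    """
    missing = [name for name in EXPECTED_SUBFOLDERS
               if not os.path.isdir(os.path.join(data_set_path, name))]
    unannotated = []
    positive = os.path.join(data_set_path, "positive")
    if os.path.isdir(positive):
        for name in sorted(os.listdir(positive)):
            stem, ext = os.path.splitext(name)
            if ext.lower() not in IMAGE_EXTENSIONS:
                continue  # skip the annotation files themselves
            has_all = all(os.path.isfile(os.path.join(positive, stem + suffix))
                          for suffix in ANNOTATION_SUFFIXES)
            if not has_all:
                unannotated.append(name)
    return missing, unannotated


if __name__ == "__main__" and len(sys.argv) > 1:
    missing, unannotated = check_data_set(sys.argv[1])
    print("Missing sub-folders:", missing or "none")
    print("Images without annotations:", unannotated or "none")
```

Pass the folder you created under Examples/Image/DataSets as the only argument.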
Download the pretrained AlexNet model by running the following from Examples/Image/Detection/FastRCNN:

python install_data_and_model.py


Edit CNTK/Examples/Image/Detection/utils/annotations/annotations_helper.py.

Change from the default Grocery Image Data Set
data_set_path = os.path.join(abs_path, "../../../DataSets/Grocery")

To Your Custom Image Data Set
data_set_path = os.path.join(abs_path, "../../../DataSets/<custom images directory>")


Run Examples/Image/Detection/utils/annotations/annotations_helper.py

python annotations_helper.py


Create a configuration file for your own dataset in Examples/Image/Detection/utils/configs called <custom image dataset name>_config.py.

Edit these parameters:
__C.DATA.DATASET
__C.DATA.MAP_FILE_PATH
__C.DATA.NUM_TRAIN_IMAGES
__C.DATA.NUM_TEST_IMAGES
Example Data Set Config: CustomImageDataSet_config.py

from easydict import EasyDict as edict

__C = edict()
__C.DATA = edict()
cfg = __C

# data set config
__C.DATA.DATASET = "<custom image dataset name>"
__C.DATA.MAP_FILE_PATH = "/cntk/Examples/Image/DataSets/<Custom image data folder>"
__C.DATA.CLASS_MAP_FILE = "class_map.txt"
__C.DATA.TRAIN_MAP_FILE = "train_img_file.txt"
__C.DATA.TRAIN_ROI_FILE = "train_roi_file.txt"
__C.DATA.TEST_MAP_FILE = "test_img_file.txt"
__C.DATA.TEST_ROI_FILE = "test_roi_file.txt"
__C.DATA.NUM_TRAIN_IMAGES = <number of images to train>
__C.DATA.NUM_TEST_IMAGES = <number of images to test>
__C.DATA.PROPOSAL_LAYER_SCALES = [4, 8, 12]
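NUM_TRAIN_IMAGES and NUM_TEST_IMAGES have to agree with the map files that annotations_helper.py writes (a mismatch causes the IndexError described under Common Issues), so it can help to read the counts from those files instead of counting by hand. A sketch, assuming each non-empty line of train_img_file.txt and test_img_file.txt describes one image; the helper names are hypothetical.

```python
import os


def count_map_entries(map_file_path):
    """Count the non-empty lines in a map file (assumed: one image per line)."""
    with open(map_file_path) as map_file:
        return sum(1 for line in map_file if line.strip())


def image_counts(map_file_dir, train_map="train_img_file.txt", test_map="test_img_file.txt"):
    """Return (num_train_images, num_test_images) for the map files in map_file_dir."""
    num_train = count_map_entries(os.path.join(map_file_dir, train_map))
    num_test = count_map_entries(os.path.join(map_file_dir, test_map))
    return num_train, num_test
```

Call image_counts() with the same folder you set as __C.DATA.MAP_FILE_PATH and copy the two numbers into your config.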


Change the dataset_cfg import in the get_configuration() method of CNTK/Examples/Image/Detection/FasterRCNN/run_faster_rcnn.py to:

from utils.configs.<custom image dataset name>_config import cfg as dataset_cfg
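If you switch between data sets often, one option, in the spirit of the "taken from the environment" proposal under the CNTK 2.2 issues, is to pick the config module from an environment variable instead of editing the import each time. This is a hypothetical sketch: CNTK does not read CNTK_DATASET_CONFIG itself, and load_dataset_cfg() is not part of run_faster_rcnn.py. Inside get_configuration() you would call it in place of the hard-coded import.

```python
import importlib
import os

# Hypothetical variable name; CNTK itself does not read it.
DATASET_CONFIG_ENV_VAR = "CNTK_DATASET_CONFIG"
DEFAULT_DATASET_CONFIG = "utils.configs.Grocery_config"


def load_dataset_cfg(default_module=DEFAULT_DATASET_CONFIG):
    """Import the data set config module named by the environment variable
    (falling back to default_module) and return its `cfg` object."""
    module_name = os.environ.get(DATASET_CONFIG_ENV_VAR, default_module)
    module = importlib.import_module(module_name)
    return module.cfg
```

With that in place, a run would look like CNTK_DATASET_CONFIG=utils.configs.<custom image dataset name>_config python run_faster_rcnn.py.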


Edit CNTK/Examples/Image/Detection/FasterRCNN/FasterRCNN_config.py. (More details here)

__C.CNTK.MAKE_MODE = False
__C.CNTK.DEBUG_OUTPUT = True
__C.VISUALIZE_RESULTS = True
__C.USE_GPU_NMS = True
__C.RESULTS_NMS_CONF_THRESHOLD = 0.82
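RESULTS_NMS_CONF_THRESHOLD relates to non-maximum suppression (NMS) of overlapping detections, and USE_GPU_NMS selects a GPU implementation of it. As an illustration of the idea only (not CNTK's implementation), the sketch below assumes the confidence threshold acts as a minimum score: detections below it are discarded, and greedy NMS then keeps the highest-scoring box in any group that overlaps by more than an IoU threshold. It handles a single class, uses (x1, y1, x2, y2) boxes, and the 0.5 IoU default is an assumption.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, conf_threshold=0.82, iou_threshold=0.5):
    """Drop boxes scoring below conf_threshold, then run greedy NMS.

    Returns the indices of the kept boxes, highest score first.
    """
    candidates = [i for i, score in enumerate(scores) if score >= conf_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept
```

Raising conf_threshold, as with the 0.82 above, keeps only the most confident detections; lowering it lets more low-confidence boxes through to the NMS step.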

Train and Test with FastRCNN


Make sure you have followed all the steps in Get Ready to Run FastRCNN on a Custom Dataset.
Make sure that MAKE_MODE is False in CNTK/Examples/Image/Detection/FasterRCNN/FasterRCNN_config.py

__C.CNTK.MAKE_MODE = False

python run_faster_rcnn.py

Test with FastRCNN and a Pretrained Model


Make sure you have obtained a trained model by following the steps in Train and Test with FastRCNN.
Edit CNTK/Examples/Image/Detection/FasterRCNN/FasterRCNN_config.py to skip training. If MAKE_MODE is set to True, training is skipped when a trained model already exists.

__C.CNTK.MAKE_MODE = True


Edit Examples/Image/Detection/utils/configs/<custom image dataset name>_config.py to point to your trained model path. In this case, the trained model is faster_rcnn_eval_AlexNet_e2e.model.

__C.DATA.MODEL_PATH="/cntk/Examples/Image/Detection/FasterRCNN/Output/faster_rcnn_eval_AlexNet_e2e.model"

python run_faster_rcnn.py
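The MAKE_MODE rule described above boils down to a single condition. The hypothetical helper below only restates it, so it is easy to check which path a run will take; it is not CNTK's code.

```python
import os


def should_train(make_mode, model_path):
    """Return True if a run would train a new model.

    Training is skipped only when MAKE_MODE is on and a model file
    already exists at model_path; otherwise the model is (re)trained.
    """
    return not (make_mode and os.path.exists(model_path))
```

For example, with MAKE_MODE = True and faster_rcnn_eval_AlexNet_e2e.model already present in the Output folder, should_train() returns False, so the run goes straight to evaluation.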

Common Issues Faced when Running CNTK


Could not load uvm kernel module. is nvidia-modprobe installed?

sudo apt-get install nvidia-375 nvidia-modprobe


error looking up volume plugin nvidia-docker

sudo nvidia-docker-plugin


no module named cython_modules

Use Python 3.4. The precompiled cython_modules currently contained in the repository are for Python 3.5 on Windows and only Python 3.4 on Linux, all 64-bit. If you need a different version, you can compile them by following the steps described at
Linux: https://github.com/rbgirshick/py-faster-rcnn
There are CNTK Docker images for Python 3.5+, but the prebuilt .so files only work with Python 3.4; with any other Python version you will get many errors from the .so files. Cython dependencies for Linux Python 3.5 and 3.6 were reportedly added, but we still got errors with every Python version other than 3.4. We were using the latest CNTK Docker image, 2.1-gpu-python3.5-cuda8.0-cudnn6.0.
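To see whether the current Python interpreter can load the prebuilt cython modules at all, compare their file names against the extension suffixes that interpreter accepts (importlib.machinery.EXTENSION_SUFFIXES). A sketch, assuming the prebuilt files carry an ABI tag in their names (for example a .cpython-34m.so suffix): a file is reported when its name is not its module name plus one of the accepted suffixes. Point it at whichever folder holds the cython_modules binaries in your checkout; the function name is hypothetical.

```python
import importlib.machinery
import os
import sys


def find_unloadable_extensions(directory):
    """List compiled extension files in `directory` whose names this interpreter
    would not match when importing them (e.g. a suffix built for another Python)."""
    suffixes = importlib.machinery.EXTENSION_SUFFIXES
    unloadable = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith((".so", ".pyd")):
            continue
        module_name = name.split(".", 1)[0]
        if not any(name == module_name + suffix for suffix in suffixes):
            unloadable.append(name)
    return unloadable


if __name__ == "__main__" and len(sys.argv) > 1:
    print("Python", sys.version.split()[0], "accepts:", importlib.machinery.EXTENSION_SUFFIXES)
    print("Not loadable here:", find_unloadable_extensions(sys.argv[1]) or "none")
```

Running it with each interpreter you have installed shows which one matches the shipped binaries.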

ImportError: libSM.so.6: cannot open shared object file

apt-get install libqt4-core libqt4-dev libqt4-gui qt4-dev-tools


libnvidia-ml.so.1: cannot open shared object file: No such file or directory

This error comes from missing NVIDIA libraries in your docker container. Use nvidia-docker to run the container, and verify GPU works in container by running nvidia-smi first.

ImportError: No module named 'PIL'

pip install Pillow

 File "/cntk/Examples/Image/Detection/FasterRCNN/../utils/plot_helpers.py", line 145, in plot_test_set_results
    img_path = img_file_names[i]
IndexError: list index out of range

Make sure the number of test images in your config file is correct. For example, we created a custom Reverb_config.py in /cntk/Examples/Image/Detection/utils/configs. Verify that its __C.DATA.NUM_TEST_IMAGES matches the number of images listed in test_img_file.txt in /cntk/Examples/Image/DataSets/Reverb/labelled-guitars, which was generated by the annotations_helper.py script. (labelled-guitars follows the folder structure specified by CNTK, with positive, negative, etc.)

File "/cntk/Examples/Image/Detection/FasterRCNN/../utils/plot_helpers.py", line 134, in plot_test_set_results
    from matplotlib.pyplot import imsave

This error means that import matplotlib.pyplot is failing. To fix it, make sure matplotlib is up to date; in our case matplotlib was stuck at an older version, 1.5.0:

conda install pyqt
pip uninstall -y matplotlib && pip install -U matplotlib

FasterRCNN CNTK 2.1 Issues to Propose to CNTK Team


The CNTK examples in 2.1 do not match the CNTK guide, which only covers CNTK 2.2
The CNTK FasterRCNN config file should be cleaned up in CNTK 2.1
CNTK 2.2 expects annotated source images in a specific format, but CNTK 2.1 does NOT use annotated images, so there is no way of knowing how to set up images to get CNTK running on 2.1
No makefile in CNTK 2.1

FasterRCNN CNTK 2.2 Issues to Propose to CNTK Team


Update the Dockerfiles to support Python 3.5+ on Linux: the cython_modules binaries only come prepackaged for Windows builds, not Linux builds.
Update the Dockerfiles to include all the required pip packages; we had to install pip packages manually.
Clean up the CNTK libraries so that the location of your own data is taken from the environment. Currently you must change the folder in annotations_helper.py to your data folder after storing your images in the described folder structure and annotating them, and then run python Examples/Image/Detection/utils/annotations/annotations_helper.py
Clean up the CNTK libraries so that the data set config is taken from the environment. Currently you must change the dataset_cfg in the get_configuration() method of run_faster_rcnn.py to from utils.configs.MyDataSet_config import cfg as dataset_cfg
CNTK forces you to have at least one image in each training data folder (positive and negative), even if you only want to test images with a pretrained model. If there are no training images, annotations_helper.py fails and does not write the correct class labels to class_map.txt.
Remove the requirement to specify the number of training images and test images in the config file. CNTK should handle any number of training/test images you provide.
Update the README for CNTK 2.2. It should mention that an Output folder is created after training and explain what to expect afterwards, in particular which new files are generated.
The printed progress status stays at 0.0% even while training is running.
Output descriptive error messages. For example, running CNTK with different test images produced:

Evaluating Faster R-CNN model for 3 images.
Traceback (most recent call last):
  File "run_faster_rcnn.py", line 34, in <module>
    eval_results = compute_test_set_aps(trained_model, cfg)
  File "/cntk/Examples/Image/Detection/FasterRCNN/FasterRCNN_eval.py", line 86, in compute_test_set_aps
    mb_data = minibatch_source.next_minibatch(1, input_map=input_map)
  File "/cntk/Examples/Image/Detection/FasterRCNN/../utils/od_mb_source.py", line 70, in next_minibatch
    img_data, roi_data, img_dims, proposals, label_targets, bbox_targets, bbox_inside_weights = self.od_reader.get_next_input()
  File "/cntk/Examples/Image/Detection/FasterRCNN/../utils/od_reader.py", line 57, in get_next_input
    index = self._get_next_image_index()
  File "/cntk/Examples/Image/Detection/FasterRCNN/../utils/od_reader.py", line 206, in _get_next_image_index
    next_image_index = self._reading_order[self._reading_index]
IndexError: index 0 is out of bounds for axis 0 with size 0


## cntk-setup.mp4

      