@iandees
Last active May 30, 2018 19:07
Detecting Road Signs in Mapillary Images with dlib C++

image

I've been interested in computer vision for a long time, but I hadn't had any free time to make progress until this holiday season. Over Christmas and New Year's I experimented with various approaches in OpenCV to detect road signs and other objects of interest to OpenStreetMap. After some failed experiments with thresholding and feature detection, the excellent /r/computervision suggested using the dlib C++ library because it has more consistently good documentation and its pre-built tools are faster.

After a day or two figuring out how to compile the examples, I finally made some progress:

Compiling dlib C++ on a Mac with Homebrew

  1. Clone dlib from GitHub to your local machine:

    git clone git@github.com:davisking/dlib.git
  2. Install the cmake and libjpeg dependencies:

    brew install cmake libjpeg
  3. As of this writing, dlib won't compile due to weirdness with the system-installed libjpeg, so the developer suggests modifying line 277 of dlib/CMakeLists.txt to look like this:

    if (JPEG_FOUND AND LIBJPEG_IS_GOOD AND NOT APPLE)
    
  4. Compile the example programs that come with dlib (one of which is the classifier training program):

    mkdir dlib/examples/build
    cd dlib/examples/build
    cmake ..
    cmake --build .
  5. You'll also want to compile the imglab tool so you can mark up images to tell the system what you're searching for (imglab opens an X11 window, so on a Mac you'll also need XQuartz installed):

    mkdir dlib/tools/imglab/build
    cd dlib/tools/imglab/build
    cmake ..
    cmake --build .

Train a classifier for road signs

  1. Download at least a dozen images that contain the object you're trying to recognize. For road signs I used Wikimedia Commons, Mapillary, and Google Image Search. Put these images in one directory on your computer. I found that they all had to be converted to JPEG for the next step (I used ImageMagick's convert to do it).

  2. Run the imglab tool once to create an XML list of files you downloaded:

    dlib/tools/imglab/build/imglab -c signs.xml Downloads/sign*.jpg
  3. The file that imglab just created is a very simple XML file that lists relative paths for all the images (there's a sample snippet after this list). The next step is to specify where the objects are in the images. Run the imglab tool once more, but this time specify only the XML file you created above:

    dlib/tools/imglab/build/imglab signs.xml
  4. This will open a window via XWindows/XQuartz:

    image

    Now, for each image you hold down Shift and drag a bounding box around the object to detect. As in the screenshot here, I found that selecting the region immediately inside the black border of the sign resulted in a better model. If you accidentally create a bounding box that you didn't want, double click the border of the bounding box and press Delete. There are more interface details in the Help menu.

  5. When you finish highlighting the regions of interest, save the changes (File -> Save) and exit the imglab tool. Your XML file now contains extra markup that specifies the bounding boxes for the objects of interest.

  6. Next, we'll use the XML file you created to train a classifier:

    dlib/examples/build/train_object_detector -tv signs.xml

    This will run some processing tasks to build the model from your XML file and then test the model against the images you gave it. If the model is excellent, it will match 100% of the original bounding boxes. The output below shows perfect precision, recall, and average precision, indicated by the 1 1 1:

     Saving trained detector to object_detector.svm
     Testing detector on training data...
     Test detector (precision,recall,AP): 1 1 1
     
     Parameters used:
       threads:                 4
       C:                       1
       eps:                     0.01
       target-size:             5000
       detection window width:  65
       detection window height: 77
       upsample this many times : 0

    Your model is now stored in the object_detector.svm file and can be used to predict the location of similar objects in completely new images.
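
For reference, the annotated file that imglab saves is plain XML. The trimmed example below shows the general shape; the file name and pixel coordinates are made up for illustration:

    <?xml version='1.0' encoding='ISO-8859-1'?>
    <dataset>
      <name>imglab dataset</name>
      <images>
        <image file='Downloads/sign1.jpg'>
          <!-- One box per region you drew; coordinates are in pixels. -->
          <box top='74' left='137' width='50' height='62'/>
        </image>
      </images>
    </dataset>

Each box element corresponds to one bounding box you drew in step 4, and train_object_detector uses those boxes as the positive examples during training.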

Detect signs in new images

  1. Find an image that you didn't train with. Run the object detector again with the new image specified as an argument:

    dlib/examples/build/train_object_detector Downloads/new_sign_image.jpg

    This time around the program will use the model you trained to highlight any objects on an XWindows/XQuartz window with the image as a background:

    image
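
If you'd rather call the detector from your own C++ program instead of re-running train_object_detector, the sketch below shows the general idea. It assumes the default FHOG pyramid scanner that dlib's train_object_detector example uses; the image file name is just a placeholder.

    // Minimal sketch: load the trained model and print detections.
    // Assumes the default scanner type from dlib's train_object_detector
    // example; "new_sign_image.jpg" is a placeholder file name.
    #include <dlib/image_processing.h>
    #include <dlib/image_io.h>
    #include <dlib/serialize.h>
    #include <iostream>
    #include <vector>

    int main()
    {
        typedef dlib::scan_fhog_pyramid<dlib::pyramid_down<6> > image_scanner_type;
        dlib::object_detector<image_scanner_type> detector;
        dlib::deserialize("object_detector.svm") >> detector;

        dlib::array2d<unsigned char> img;
        dlib::load_image(img, "new_sign_image.jpg");
        // Optionally upsample small images first (dlib::pyramid_up), similar
        // to the -u command line option mentioned in the comments below.

        // Each rectangle is a detected sign in image coordinates.
        const std::vector<dlib::rectangle> dets = detector(img);
        for (unsigned long i = 0; i < dets.size(); ++i)
            std::cout << dets[i] << std::endl;

        return 0;
    }

The easiest way to build something like this is to drop it next to the other programs in dlib/examples so it picks up the same CMake configuration (including JPEG support).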

@manasdalal

I used the imglab exe to create the file with the boxes. While running the code to build the .svm file, it sometimes fails somewhere, so I checked: when I changed the width and the height to a random value it worked, but that will increase the chances of misclassification. How are the bounding boxes affecting this training process?

There's absolutely no error message; the last checkpoint is when it counts the number of images, and then it crashes.

So is there a certain aspect ratio to be maintained while drawing the bounding box over the object? On certain occasions the default window size of 80 x 80 does not seem to work unless it's changed to 50 x 50. What features should be common: similar height, width, aspect ratio, area, etc.?

@lordsutch

A few issues in the dlib compilation instructions for OS X:

In step 2, cmake should also be installed along with libjpeg.
You also will need XQuartz from http://xquartz.macosforge.org/ if you don't already have X installed.
Also, in step 5, "cmake .." is required before "cmake --build ."

@lordsutch

Now that I have the tools working, one practical note: you need a lot of high-resolution pictures to make this work, which makes it fairly problematic for improving speed-limit tagging in OSM compared to the old "mark waypoints on the GPS at speed limit changes and take notes" approach.

Using my Garmin Virb Elite, my success rate at getting interval photos at its maximum rate (30 frames/minute) with 16-megapixel images that have enough resolution to reliably find a speed limit sign at anything over 30 mph is hit or miss at best. Picture 1 will often be too small, and you've blown by the sign by the time picture 2 shows up. Alas, it won't go faster than 30 frames/minute (1/2 fps) except in video mode, which is limited to 1080p (2 MP) but will time-lapse up to 2 fps.

My dashcam will get me the frame rate (up to 30 fps) but not the image quality (1-2 MP, and subjectively much worse than the Virb even in video mode). Plus it's a ton of image files to deal with, and I'd have to correct the lens distortion, adding extra processing to the mix.

My only other ideas are to train the classifier some more with crappier pictures or to point the camera off-axis. Then I'll have to hack on train_object_detector to do batch output rather than being interactive (should be easy enough) to make it a more practical tool.

Edit: after playing around a bit more, I've found that upsampling the images (using the -u command line parameter) from either the Virb or the dashcam improves the recognition rate at a distance substantially. So apparently the sign images don't have to be quite as spectacular as I thought and thus my initial pessimism may not be justified. 😄

@lordsutch

FYI, I now have a working (but slow) implementation using the Python interface to dlib at https://github.com/lordsutch/MUTCDSpeedSigns

@olympum

olympum commented Jul 23, 2015

Thanks a bunch for the guided steps. In compilation instruction #5 for building the imglab tool, I think you are missing "cmake .." before "cmake --build .".
