A Single Shot Detector (SSD) repo, written in Keras. The Confluence page with technical details and design decisions for our SSD implementation can be found here.
- Setup
- User-facing files
- Config files
- Training
- Evaluation
- Running on a video or webcam stream
- Trained models
- Clone this repo and cd into the root directory:
git clone https://github.com/mythic-ai/ssd-keras.git
cd ssd-keras
- Install all the packages from the requirements.txt file, via:
sudo pip install -r requirements.txt
- Copy the large files that are not checked in (e.g., the base model .hdf5/.h5 files and dataset .pkl files) into your local repo. These live on Puget1 (192.168.101.113), in the directory /ssd/single-shot-detector-files/. You can copy them with scp:
scp -r <your_username>@192.168.101.113:/ssd/single-shot-detector-files/* your/local/path/to/ssd-keras/
- OPTIONAL: Download the pre-trained networks (if you want to evaluate a trained SSD model). See the trained models section for links to these models.
There are four user-facing files you should care about: ssd_api.py, ssd_training.py, ssd_eval.py, and videotest_run.py. Instructions for how to use them are covered in the following sections.
ssd_api.py is the inference "API" of the SSD model, and several other files use it. You supply it with an image, and it returns the final output predictions (bounding boxes and their associated class labels).
To use ssd_api.py, you must first initialize an instance of the class in the following manner:
ssd_model = SSD_API(config_path, input_shapes, model_path=<path/to/.h5/file>)
where the parameters are:
- model_path: The path to the saved SSD model weights (an .h5 or .hdf5 file).
- config_path: The path to the config file that contains hyperparameters and information about deploying this particular SSD model. Config files are covered further in the config files section.
- input_shapes: The list of input image shapes, each given as (height, width, num_channels), e.g., (300, 300, 3), at which every image is run through the SSD model at inference time.
There is one public function in ssd_api.py:
def predict(self, img, threshold_output=True)
This is the external API method to predict boxes for one or more images. It resizes each image to SSD_API.input_shapes (the original input shape and, optionally, some additional higher resolutions) before passing the image through the model. predict() accepts several input types, and its output depends on the type of its input:
- If img is a string, it is the path to the single image to predict on, and the returned preds is an np.ndarray of shape (num_boxes, 4 + num_classes - 1).
- If img is a 3D np.ndarray, it is an image array of shape (height, width, num_channels), and the returned preds is an np.ndarray of shape (num_boxes, 4 + num_classes - 1).
- If img is a 4D np.ndarray, it is an array of images of shape (num_images, height, width, num_channels), and the returned preds is a list of np.ndarrays, each of shape (num_boxes, 4 + num_classes - 1). preds has to be a list in this case because num_boxes is different for each image.
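The return shapes described above can be handled uniformly with a small helper. The following is a minimal sketch, not part of the repo: normalize_preds and split_predictions are hypothetical names, and the column layout (four box values first, then num_classes - 1 class scores) is an assumption inferred from the documented shape (num_boxes, 4 + num_classes - 1).

```python
import numpy as np


def normalize_preds(preds):
    """Return predict() output as a list with one array per image.

    A single image yields one np.ndarray; a 4D batch yields a list of arrays.
    """
    if isinstance(preds, list):
        return preds
    return [preds]


def split_predictions(preds, num_classes):
    """Split one predictions array into (boxes, class_scores).

    ASSUMPTION: the first four columns describe the box and the remaining
    num_classes - 1 columns are per-class scores. Check this against the repo.
    """
    preds = np.asarray(preds)
    expected_cols = 4 + num_classes - 1
    if preds.ndim != 2 or preds.shape[1] != expected_cols:
        raise ValueError(
            "expected shape (num_boxes, {}), got {}".format(expected_cols, preds.shape)
        )
    return preds[:, :4], preds[:, 4:]


# Stand-in for the output of predict() on one image with 21 classes.
fake_preds = np.zeros((5, 4 + 21 - 1))
for per_image in normalize_preds(fake_preds):
    boxes, class_scores = split_predictions(per_image, num_classes=21)
    print(boxes.shape, class_scores.shape)
```

Normalizing to a list first means the same loop works whether predict() was called with a path, a single image array, or a batch.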
Use ssd_training.py to train a new SSD model, either from scratch, from pre-trained weights (e.g., VGG16 trained on ImageNet), or by resuming a previous training run. For information on how to use this file, see the training section.
Use ssd_eval.py to evaluate a trained SSD model, by plotting the predicted detections and/or computing the mean average precision (mAP) on a particular dataset (ideally its completely held-out test set). For information on how to use this file, see the evaluation section.
videotest_run.py runs a pre-trained SSD model on a pre-recorded video (.mp4 file) or on the livestream from a connected webcam. It uses any available GPU to increase the frame rate. It plots color-coded predicted bounding boxes and their class labels for detections above a certain confidence threshold (e.g., 0.6). For information on how to use this file, see the video and webcam section.
Config files contain the hyperparameters and information about deploying a particular SSD model. They are used for both training and inference. See an example of a config file here.
Config files are all stored in the configs/ directory. Beneath that, there is the following directory structure:
--- configs
    |--- <dataset_name>, e.g., pascal_voc/
        |--- ssd<image_size>_<base_model>_config.py, e.g., "ssd300_vgg16_config.py"
The values in a config file are fairly self-explanatory, are documented with comments, and follow most of the parameters in the Caffe .prototxt files. To make a new config file (e.g., for a new dataset, to tweak hyperparameters, or even to use a new base model), the suggested approach is to copy the configs/pascal_voc/ssd300_vgg16_config.py file and modify it.
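The naming convention above can be expressed as a small path builder. This is a sketch for illustration only: config_path_for is a hypothetical helper that does not exist in the repo, and it only assembles the documented pattern configs/<dataset_name>/ssd<image_size>_<base_model>_config.py.

```python
import posixpath


def config_path_for(dataset_name, image_size, base_model, root="configs"):
    """Build a config path following the documented naming convention.

    Hypothetical helper: it does not check that the file exists.
    """
    filename = "ssd{}_{}_config.py".format(image_size, base_model)
    return posixpath.join(root, dataset_name, filename)


print(config_path_for("pascal_voc", 300, "vgg16"))
```

For the PASCAL VOC example this yields the default path used by the training and evaluation scripts, configs/pascal_voc/ssd300_vgg16_config.py.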
To train, run the command:
python ssd_training.py
This command accepts several optional arguments:
- --num_gpus: The number of GPUs to use for training. It defaults to 1 if unspecified.
- --config: The model config file to use for training. It defaults to configs/pascal_voc/ssd300_vgg16_config.py, which is the config file for SSD300 trained on PASCAL VOC 2007+12 trainval and evaluated on PASCAL VOC 2007 test, with VGG16 as its base model.
- --saved_model_weights: The path to the saved model weights to initialize training from, if any. Note: these are NOT the weights of the base model (e.g., VGG16), which are specified in the config file and are always used if --saved_model_weights is not specified.
- --freeze_layers: A list of names of network layers whose weights to freeze. It is None by default (i.e., no layers are frozen). You can specify layers by name, e.g., --freeze_layers conv1_1 conv1_2 conv2_1 conv2_2.
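To make the defaults and the list-valued --freeze_layers option concrete, here is a minimal argparse sketch of the options described above. It is an assumption about how such a parser could look, not a copy of the parser in ssd_training.py; build_training_parser is a hypothetical name.

```python
import argparse


def build_training_parser():
    """Sketch of a parser mirroring the documented training options."""
    parser = argparse.ArgumentParser(description="Sketch of ssd_training.py options")
    parser.add_argument("--num_gpus", type=int, default=1,
                        help="number of GPUs to train on")
    parser.add_argument("--config",
                        default="configs/pascal_voc/ssd300_vgg16_config.py",
                        help="model config file to train with")
    parser.add_argument("--saved_model_weights", default=None,
                        help="saved SSD weights to initialize training from")
    # nargs="+" collects one or more layer names into a list.
    parser.add_argument("--freeze_layers", nargs="+", default=None,
                        help="names of layers whose weights to freeze")
    return parser


# Equivalent to: python ssd_training.py --num_gpus 2 --freeze_layers conv1_1 conv1_2
args = build_training_parser().parse_args(
    ["--num_gpus", "2", "--freeze_layers", "conv1_1", "conv1_2"]
)
print(args.num_gpus, args.config, args.freeze_layers)
```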
To evaluate a model, run the command:
python ssd_eval.py
This command accepts several optional arguments:
- --config: The model config file to use for evaluation. It defaults to configs/pascal_voc/ssd300_vgg16_config.py, which is the config file for SSD300 trained on PASCAL VOC 2007+12 trainval and evaluated on PASCAL VOC 2007 test, with VGG16 as its base model.
- --saved_model_weights: The path to the saved model weights to evaluate, if any. Note: these are NOT the weights of the base model (e.g., VGG16), which are specified in the config file and are always used if --saved_model_weights is not specified.
- --calculate_map: A boolean that, if True, means that we will calculate the mean average precision over the dataset.
- --plot_images: A boolean that, if True, means that we will plot predicted bounding boxes and class labels.
- --plot_gt: A boolean that, if True, means that we will plot ground-truth bounding boxes and class labels.
- --accepted_class_ids: The optional list of object class IDs (e.g., --accepted_class_ids 1 4 15) to output and plot. If this parameter is unspecified, the program will plot all object classes in the dataset.
- --highres_input_shape: The optional, additional high-resolution input shape (e.g., --highres_input_shape 700 700) to which the input image is resized before being passed through the model. If this parameter is unspecified, the program will only evaluate the image at its original resolution (usually (300, 300)). If only one integer is provided, the program will resize the input image to a square image at this resolution. If two integers are provided, they will be interpreted as width first and then height.
- --test_images_dir: The optional path to the directory containing the test images to plot. You must specify one of --test_images_dir or --test_dataset_name. NOTE: if you specify this argument (and not --test_dataset_name), then you cannot evaluate the mAP on these images, because you are essentially saying that these images do not have ground-truth labels.
- --test_dataset_name: The optional name of the dataset whose test images you wish to evaluate the mAP on and/or plot. It must be one of the members of dataset_utils.dataset_names.DatasetName (an Enum). NOTE: if you use this argument (and not --test_images_dir), then the dataset being specified must have a corresponding pkl_files/<gt_file.pkl>.
- --num_images: An integer that, if specified, makes the program evaluate only the first <num_images> images in the dataset.
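Two of the rules above are easy to get wrong: the width-then-height order of --highres_input_shape, and the interaction between --test_images_dir, --test_dataset_name, and mAP calculation. The sketch below encodes both as described in the text. The function names are hypothetical, and returning the shape as (height, width) is an assumption chosen to match the (height, width, num_channels) convention used for input_shapes.

```python
def parse_highres_input_shape(values):
    """Interpret --highres_input_shape values; returns (height, width) or None.

    One integer means a square image; two integers are width first, then height.
    """
    if not values:
        return None
    if len(values) == 1:
        side = int(values[0])
        return (side, side)
    if len(values) == 2:
        width, height = (int(v) for v in values)
        return (height, width)
    raise ValueError("expected one or two integers, got {}".format(len(values)))


def check_test_source(test_images_dir=None, test_dataset_name=None, calculate_map=False):
    """Validate the documented constraints on the test image source."""
    if test_images_dir is None and test_dataset_name is None:
        raise ValueError("specify one of --test_images_dir or --test_dataset_name")
    if calculate_map and test_dataset_name is None:
        # A bare image directory has no ground-truth labels, so mAP is undefined.
        raise ValueError("calculating mAP requires --test_dataset_name")


print(parse_highres_input_shape([700]))       # square image
print(parse_highres_input_shape([640, 480]))  # width 640, height 480
```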
To run an SSD model using a pre-recorded video (.mp4 file), run the command:
python videotest_run.py \
--config=<path_to_config_file> \
--saved_model_weights=<path_to_saved_saved_model_weights> \
--video=<path_to_mp4_file>
To run an SSD model using the live feed from the webcam, run the command:
python videotest_run.py \
--config=<path_to_config_file> \
--saved_model_weights=<path_to_saved_saved_model_weights>
Both cases accept several additional optional command-line arguments:
- --accepted_class_ids: The optional list of object class IDs (e.g., --accepted_class_ids 1 4 15) to output and plot. If this parameter is unspecified, the program will plot all object classes in the dataset.
- --highres_input_shape: The optional, additional high-resolution input shape (e.g., --highres_input_shape 700 700) to which the input image is resized before being passed through the model. If this parameter is unspecified, the program will only evaluate the image at its original resolution (usually (300, 300)). If only one integer is provided, the program will resize the input image to a square image at this resolution. If two integers are provided, they will be interpreted as width first and then height.
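The confidence threshold and the --accepted_class_ids filter used when plotting can be sketched as follows. This is an illustration only: filter_detections is a hypothetical helper, and it assumes a preds layout that this document does not confirm, namely four box columns followed by one score column per non-background class, with class ID k stored in column 4 + (k - 1).

```python
import numpy as np


def filter_detections(preds, confidence_threshold=0.6, accepted_class_ids=None):
    """Keep boxes whose best class score passes the threshold.

    ASSUMPTION: preds has shape (num_boxes, 4 + num_classes - 1), class ID 0
    is background and has no column, so column 4 holds the score of class 1.
    Returns (boxes, class_ids, scores) for the kept detections.
    """
    preds = np.asarray(preds, dtype=float)
    class_scores = preds[:, 4:]
    best_column = class_scores.argmax(axis=1)
    best_score = class_scores[np.arange(len(class_scores)), best_column]
    class_ids = best_column + 1  # shift past the assumed background class

    keep = best_score >= confidence_threshold
    if accepted_class_ids is not None:
        keep &= np.isin(class_ids, list(accepted_class_ids))
    return preds[keep, :4], class_ids[keep], best_score[keep]


# Three fake detections over three non-background classes.
example = np.array([
    [0, 0, 10, 10, 0.90, 0.05, 0.05],
    [1, 1, 5, 5, 0.10, 0.50, 0.20],
    [2, 2, 8, 8, 0.00, 0.10, 0.80],
])
boxes, class_ids, scores = filter_detections(example, accepted_class_ids=[1, 3])
print(class_ids, scores)
```

Taking the highest-scoring class per box is one simple choice; the actual script may treat each class score independently.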
Below are the best pre-trained SSD models we have to date:
- Udacity self-driving car:
  - ssd300v5_vgg16-epoch_190-val_loss_1.69.hdf5 (SSD300 with base model VGG16, trained on one Udacity self-driving car dataset and evaluated on a different Udacity self-driving car dataset).
- PASCAL VOC:
  - ssd300v5_vgg16-epoch_220-val_loss_2.31.hdf5 (SSD300 with base model VGG16, trained on PASCAL VOC 2007+12 trainval and evaluated on PASCAL VOC 2007 test).