YashasSamaga/D0_NOTICE.md

## D0_NOTICE.md

      
### DISCLAIMER

This gist is unofficial. I created it for personal use but have kept it public in case it is useful to others. This document is not updated regularly and may not reflect the current status of the CUDA backend.

  
## D1_Requirements.md

      
### Internal Dependencies

The minimum set of dependencies required to use the CUDA backend in OpenCV DNN is:

- `cudev`
- `opencv_core`
- `opencv_dnn`
- `opencv_imgproc`

You might also require the following to read/write/display images and videos:

- `opencv_imgcodecs`
- `opencv_highgui`
- `opencv_videoio`

You will require the following to run the tests:

- `opencv_ts`
- `opencv_videoio`

You also have to enable the `BUILD_TESTS` and `BUILD_PERF_TESTS` CMake options.

### External Dependencies

The CUDA backend requires CUDA Toolkit (min: 9.2) and cuDNN (min: 7.5) to be installed on the system. CMake will automatically detect CUDA Toolkit and cuDNN when the following options are set:

- `WITH_CUDA`
- `WITH_CUDNN`

The CUDA backend is enabled by setting the following option:

- `OPENCV_DNN_CUDA`
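
The options above can be collected into a single configure command. The sketch below only assembles the argument list and does not run CMake. The helper name `cmake_configure_args` and its parameters are hypothetical, and `OPENCV_EXTRA_MODULES_PATH` is included on the assumption that `cudev` is built from an opencv_contrib checkout.

```python
def cmake_configure_args(source_dir, contrib_modules_dir=None, with_tests=False):
    """Return a cmake argument list that enables the CUDA backend of OpenCV DNN.

    Hypothetical helper: the option names come from this document, the
    function itself is only an illustration.
    """
    args = [
        "cmake",
        "-DWITH_CUDA=ON",        # detect the CUDA Toolkit
        "-DWITH_CUDNN=ON",       # detect cuDNN
        "-DOPENCV_DNN_CUDA=ON",  # enable the CUDA backend in the DNN module
    ]
    if contrib_modules_dir is not None:
        # Assumption: cudev comes from the opencv_contrib modules directory.
        args.append(f"-DOPENCV_EXTRA_MODULES_PATH={contrib_modules_dir}")
    if with_tests:
        # Needed only to build opencv_test_dnn and the perf tests.
        args += ["-DBUILD_TESTS=ON", "-DBUILD_PERF_TESTS=ON"]
    args.append(source_dir)
    return args
```

The returned list can be joined with spaces for copy-pasting into a shell, or passed to `subprocess.run` from inside the build directory.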

### Running tests

1. Clone the opencv_extra repository.
2. `cd opencv_extra/testdata/dnn`
3. `python3 download_models.py`
4. `cd path/to/opencv/repository`
5. `cd build`
6. `export OPENCV_TEST_DATA_PATH=/path/to/opencv_extra/testdata`
7. Run `bin/opencv_test_dnn`.
8. Refer to this guide to use perf tests to compare performance between versions.
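
The last steps can also be scripted. The following is a minimal sketch, assuming the test binary was built at `bin/opencv_test_dnn` inside the build directory as described above. The helper name `dnn_test_invocation` is hypothetical, and the optional `--gtest_filter` argument assumes the usual GoogleTest command-line handling.

```python
import os
from pathlib import Path


def dnn_test_invocation(build_dir, testdata_dir, gtest_filter=None):
    """Return (command, env) for running the DNN tests from a build directory."""
    # Copy the current environment and point the tests at opencv_extra/testdata.
    env = dict(os.environ)
    env["OPENCV_TEST_DATA_PATH"] = str(testdata_dir)

    command = [str(Path(build_dir) / "bin" / "opencv_test_dnn")]
    if gtest_filter is not None:
        # Assumption: standard GoogleTest filter flag to run a subset of tests.
        command.append(f"--gtest_filter={gtest_filter}")
    return command, env


# Usage (not executed here):
#   import subprocess
#   command, env = dnn_test_invocation("build", "/path/to/opencv_extra/testdata")
#   subprocess.run(command, env=env, check=True)
```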

### Usage

The CUDA backend can be selected by choosing one of the following backend/target options:


| Backend | Target |
| --- | --- |
| `DNN_BACKEND_CUDA` | `DNN_TARGET_CUDA` |
| `DNN_BACKEND_CUDA` | `DNN_TARGET_CUDA_FP16` |


A device with compute capability (CC) 5.3 or higher is required to use `DNN_TARGET_CUDA_FP16`. Note that not all CUDA devices offer high FP16 throughput, so `DNN_TARGET_CUDA_FP16` may perform worse than `DNN_TARGET_CUDA`. You can check whether your device supports high FP16 throughput in the CUDA Programming Guide.
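
A minimal Python sketch of selecting these options follows. `setPreferableBackend` and `setPreferableTarget` are the OpenCV DNN calls for this; the helper names, the `prefer_fp16` flag, and passing the compute capability in as a tuple are assumptions made for illustration. Running `use_cuda_backend` requires an OpenCV build with the CUDA backend enabled.

```python
def choose_cuda_target_name(compute_capability, prefer_fp16=True):
    """Pick a target name; FP16 is only eligible on CC 5.3+ devices."""
    if prefer_fp16 and tuple(compute_capability) >= (5, 3):
        return "DNN_TARGET_CUDA_FP16"
    return "DNN_TARGET_CUDA"


def use_cuda_backend(net, compute_capability, prefer_fp16=True):
    """Configure a cv2.dnn network to run on the CUDA backend."""
    import cv2  # imported lazily so the helper above works without OpenCV

    name = choose_cuda_target_name(compute_capability, prefer_fp16)
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(getattr(cv2.dnn, name))
    return name
```

Because FP16 can be slower on devices without high FP16 throughput, it is worth timing a forward pass with `prefer_fp16=True` and `prefer_fp16=False` on the actual hardware before settling on a target.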

### Examples

- YOLOv4
- YOLOv4 mAP evaluation


## D2_SupportMatrix.md

      
### Support Matrix

The CUDA backend uses OpenCV's CPU backend as a fallback for unsupported layers and partially supported layers with unsupported configurations.


| Layer | Status | Note |
| --- | --- | --- |
| Slice | ✔️ | |
| Split | ✔️ | |
| Concat | ✔️ | |
| Reshape | ✔️ | |
| Flatten | ✔️ | |
| Resize, Interp (nearest neighbor, bilinear) | ✔️ | |
| CropAndResize | ✔️ | |
| Convolution 1D | ✔️ (OpenCV 4.5.2) | |
| Convolution 2D | ✔️ | |
| Convolution 3D | ✔️ | |
| Deconvolution 2D | broken | |
| Deconvolution 3D | broken | |
| MaxPooling 1D | ✔️ (OpenCV 4.5.2) | |
| MaxPooling 2D | ✔️ | |
| MaxPooling 3D | ✔️ | |
| AveragePooling 1D | ✔️ (OpenCV 4.5.2) | |
| AveragePooling 2D | ✔️ | |
| AveragePooling 3D | ✔️ | |
| MaxPoolingWithIndices 2D | ✔️ | |
| MaxPoolingWithIndices 3D | ✔️ | |
| MaxUnpool 2D | ✔️ | |
| MaxUnpool 3D | ✔️ | |
| ROI Pooling | ✔️ | |
| PSROI Pooling | ❌ | |
| LRN | ✔️ | |
| InnerProduct (constant weights) | ✔️ | |
| MatMul (runtime blobs) | ✔️ (OpenCV 4.5.3) | |
| Softmax | ✔️ | |
| LogSoftmax | ✔️ | |
| MVN | ✔️ (OpenCV 4.5.0) | |
| ReLU (with configurable negative slope) | ✔️ | |
| ReLU6 (with configurable ceil and floor) | ✔️ | |
| Channelwise Parametric ReLU | ✔️ | |
| Sigmoid | ✔️ | |
| TanH | ✔️ | |
| Swish | ✔️ | |
| Mish | ✔️ | |
| ELU | ✔️ | |
| BNLL | ✔️ | |
| Abs | ✔️ | |
| Power (configurable exp, scale and shift) | ✔️ | |
| Batch Normalization | ✔️ | |
| Const | ✔️ | |
| Crop | ✔️ | |
| Eltwise (sum, product, div, max) | ✔️ | |
| Weighted Eltwise (sum) | ✔️ | |
| Shortcut (sum) | ✔️ (OpenCV 4.3.0) | |
| Permute | ✔️ | |
| ShuffleChannel | ✔️ | |
| PriorBox | ✔️ | |
| Reorg | ✔️ | |
| Region | ✔️ | `scale_xy` parameter added in OpenCV 4.4.0 |
| DetectionOutput | ✔️ (OpenCV 4.5.0) | |
| Normalization (L1, L2) | ✔️ | |
| Shift | ✔️ | |
| Padding (constant padding, reflection101 padding) | ✔️ | |
| Proposal | ❌ | |
| Scale | ✔️ | |
| DataAugmentation | ❌ | |
| Correlation | ❌ | |
| Accum | ❌ | |
| FlowWarp | ❌ | |
| LSTM Layer | ❌ | |
| RNN Layer | ❌ | |