YimianDai/Previewer-for-Multi-Scale-Object-Detector.md

## Previewer-for-Multi-Scale-Object-Detector.md

      
Poor performance on small-size objects shows up as a large number of small-size false positives; the root cause is the inadequacy of low-level features.
The inadequacy of low-level features takes two concrete forms: small receptive field sizes and weak semantic capabilities.
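
To make "small receptive field sizes" concrete, here is a minimal sketch using the standard receptive-field recurrence for stacked conv/pool layers (r_out = r_in + (k - 1) * jump, jump_out = jump * stride). The layer stack below is a hypothetical toy example, not the backbone used in the paper.

```python
def receptive_fields(layers):
    """Return the receptive field (in input pixels) after each layer.

    `layers` is a list of (kernel_size, stride) pairs applied in order.
    """
    r, jump = 1, 1  # one input pixel sees itself; adjacent outputs are 1 px apart
    fields = []
    for kernel, stride in layers:
        r += (kernel - 1) * jump  # each extra kernel tap widens the view by `jump`
        jump *= stride            # striding spreads output positions further apart
        fields.append(r)
    return fields


# Hypothetical toy stack: two 3x3 convs + 2x2 pool, repeated twice.
TOY_STACK = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)]

fields = receptive_fields(TOY_STACK)
shallow_rf = fields[2]   # after the first conv-conv-pool group
deeper_rf = fields[-1]   # after the second group
print(f"shallow: {shallow_rf} px, deeper: {deeper_rf} px")
```

The shallow position sees far fewer input pixels than the deeper one, which is the sense in which low-level features lack context.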
The paper's contribution is to demonstrate that independent predictions from different feature layers on the same region are beneficial for reducing false positives.
We propose a novel light-weight previewer block, which previews the objectness probability for the potential regression region of each prior box, using the stronger features with larger receptive fields and more contextual information for better predictions.
The lack of contextual information leads to unsatisfactory performance of multi-scale detectors on detecting small objects.
Useful contextual information is missed due to the weakness and small receptive field sizes of low-level features.
Moreover, the number of small object priors is large (accounts for over 66% of the whole prior boxes) due to the large resolution of low-level feature layers.
Why are small objects the bottleneck?

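First, small objects are usually handled by shallow layers, whose semantics are weak and whose small receptive fields provide too little contextual information.
Second, the number of anchors assigned to small objects is itself huge (over 66% of the whole prior boxes), because shallow feature maps are larger. As a result, most false positives tend to lie on small priors in multi-scale detectors.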
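
The "over 66%" figure can be checked with quick arithmetic. The sketch below assumes the standard SSD300 layout (six prediction layers with 4 or 6 priors per location); the notes do not state which configuration the paper counted, so treat the layer table as an assumption.

```python
# Assumed SSD300 prediction layers: (feature map side, priors per location).
SSD300_LAYERS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def prior_counts(layers):
    """Number of prior boxes contributed by each prediction layer."""
    return [side * side * per_location for side, per_location in layers]


counts = prior_counts(SSD300_LAYERS)
total = sum(counts)                    # 8732 priors in total
shallow_fraction = counts[0] / total   # share owned by the 38x38 layer

print(counts)
print(f"shallowest layer holds {shallow_fraction:.1%} of all priors")
```

Under this assumption the shallowest (highest-resolution) layer alone holds roughly two thirds of all priors, which matches the figure quoted above.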

Ways to improve small object detection:

Strengthen semantics

A top-down architecture, i.e. a Feature Pyramid, passes information from high-level feature maps down to low-level feature maps,
incorporating contextual information
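
A minimal sketch of the top-down idea: upsample the high-level map and add it element-wise to the low-level map. This is a simplification under stated assumptions: single-channel maps stored as nested lists, nearest-neighbour 2x upsampling, and no 1x1 lateral or 3x3 smoothing convolutions, which real FPN-style networks do include. The function names are made up for illustration.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                             # repeat each row
    return out


def top_down_merge(low, high):
    """Add an upsampled high-level map onto a low-level map of twice the size."""
    up = upsample2x(high)
    if len(up) != len(low) or len(up[0]) != len(low[0]):
        raise ValueError("low-level map must be exactly 2x the high-level map")
    return [[a + b for a, b in zip(low_row, up_row)]
            for low_row, up_row in zip(low, up)]


low_level = [[1] * 4 for _ in range(4)]   # 4x4 map, weak semantics
high_level = [[10, 20], [30, 40]]         # 2x2 map, strong semantics
merged = top_down_merge(low_level, high_level)
for row in merged:
    print(row)
```

Every low-level position now also carries the value of the coarser cell that covers it, which is how semantics and context get passed downward.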


The paper's concrete approach is to explicitly and independently predict all object prior boxes twice with different receptive field sizes by leveraging the CNN’s feature hierarchy. The procedure is:

We first preview the objectness probabilities for the potential regression regions of prior boxes on the deeper feature layers with sufficiently larger receptive fields that involve enough contextual information.
Then we further classify and relocate them on the layers/receptive fields in conventional settings.
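
The two steps above can be sketched as a simple gate. These notes do not record how the paper actually combines the two predictions, so the gating rule, both thresholds, and all names below (`Prior`, `preview_then_classify`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Prior:
    name: str
    preview_objectness: float  # step 1: predicted on a deeper layer (large receptive field)
    class_score: float         # step 2: predicted on the conventional layer


def preview_then_classify(priors, preview_thresh=0.3, score_thresh=0.5):
    """Keep a prior only if both independent predictions accept it.

    Assumed rule: the preview acts as a hard gate before classification.
    """
    kept = []
    for prior in priors:
        if prior.preview_objectness < preview_thresh:
            continue  # rejected by the context-rich preview
        if prior.class_score >= score_thresh:
            kept.append(prior)
    return kept


priors = [
    Prior("real small object", preview_objectness=0.9, class_score=0.8),
    Prior("small false positive", preview_objectness=0.1, class_score=0.9),
    Prior("background", preview_objectness=0.8, class_score=0.2),
]
detections = preview_then_classify(priors)
print([p.name for p in detections])
```

The second prior is the interesting case: the shallow layer alone would have fired on it, but the preview with more context vetoes it.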

Papers arguing that contextual information indeed plays a great role in object detection, especially for small objects:
[6] Xinlei Chen and Abhinav Gupta. 2017. Spatial memory for context reasoning in object detection. In ICCV. IEEE, 2980–2988.
[20] Peiyun Hu and Deva Ramanan. 2017. Finding tiny faces. In CVPR. IEEE, 1522–1530.
[27] Guo-Jun Qi. 2016. Hierarchically gated deep networks for semantic segmentation. In CVPR. 2267–2275.
[28] Guo-Jun Qi, Xian-Sheng Hua, Yong Rui, Jinhui Tang, and Hong-Jiang Zhang. 2010. Image classification with kernelized spatial-context. IEEE TMM 12, 4 (2010), 278–287.
Papers that use a top-down architecture to strengthen the semantics of low-level feature maps:
[23] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. 2017. Feature pyramid networks for object detection. In CVPR, Vol. 1. 4.
[12] Cheng-Yang Fu, Wei Liu, Ananth Ranga, Ambrish Tyagi, and Alexander C Berg. 2017. DSSD: Deconvolutional single shot detector. arXiv preprint arXiv:1701.06659 (2017).
[21] Tao Kong, Fuchun Sun, Anbang Yao, Huaping Liu, Ming Lu, and Yurong Chen. 2017. RON: Reverse connection with objectness prior networks for object detection. In CVPR, Vol. 1. 2.
[24] Songtao Liu, Di Huang, and Yunhong Wang. 2017. Receptive Field Block Net for Accurate and Fast Object Detection. arXiv preprint arXiv:1711.07767 (2017).
[36] Abhinav Shrivastava, Rahul Sukthankar, Jitendra Malik, and Abhinav Gupta. 2016. Beyond skip connections: Top-down modulation for object detection. arXiv preprint arXiv:1612.06851 (2016).
The paper says the drawback of the top-down feature fusion mechanism is that it is time consuming due to the heavy structure, but I don't agree: compared with an image pyramid, a Feature Pyramid costs far less computation.

Feature pyramid networks for object detection, CVPR 2017.

The well-known MS-CNN refers to this paper:
[3] Zhaowei Cai, Quanfu Fan, Rogerio Feris, and Nuno Vasconcelos. 2016. A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection. In ECCV.
A short history of object detection, as told in this paper

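R-CNN was the first to bring CNNs into object detection (I have some doubts: wasn't OverFeat slightly earlier? I don't remember exactly, but OverFeat being the earliest one-stage detector should be right).
The ideas behind the region proposal network (RPN) and prior boxes (anchors) in Faster R-CNN come from the MultiBox paper [39].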

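SSD's anchor strategy is to assign anchors of different scales to different feature layers (which have different receptive fields), so that anchors and receptive fields match.
I think calling anchors "priors" here is a good choice: an anchor really does embody our assumption about object size, so it really is a prior.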
SSD [25] allocates priors at different scales on different feature layers to fitly match the sizes of corresponding receptive field
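
As a rough illustration of "different scales on different feature layers", the sketch below uses the linear scale rule from the SSD paper, s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1), with the paper's s_min = 0.2 and s_max = 0.9. The feature-map sizes paired with the scales are the SSD300 ones and are an assumption here; released SSD implementations tweak the exact scale values.

```python
def ssd_scales(num_layers, s_min=0.2, s_max=0.9):
    """Prior scale (fraction of image size) for each of `num_layers` layers."""
    if num_layers < 2:
        raise ValueError("the linear rule needs at least two layers")
    step = (s_max - s_min) / (num_layers - 1)
    return [s_min + step * k for k in range(num_layers)]


FEATURE_MAP_SIDES = [38, 19, 10, 5, 3, 1]  # assumed SSD300 layout, shallow to deep

scales = ssd_scales(len(FEATURE_MAP_SIDES))
for side, scale in zip(FEATURE_MAP_SIDES, scales):
    print(f"{side:>2}x{side:<2} feature map -> prior scale {scale:.2f}")
```

Shallow, high-resolution layers get the small priors and deep, coarse layers get the large ones, so each prior size roughly follows the receptive field of the layer that predicts it.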