YimianDai/YOLOV3TargetMerger.md

## YOLOV3TargetMerger.md

      
    Contents


1. Overview
2. Code walkthrough

2.1 YOLOV3TargetMerger
2.2 YOLOV3DynamicTargetGeneratorSimple

1. Overview

How is a YOLOV3TargetMerger instance created? It is created inside YOLOV3, as shown below, where ignore_iou_thresh defaults to 0.7:
        if pos_iou_thresh >= 1:
            self._target_generator = YOLOV3TargetMerger(len(classes), ignore_iou_thresh)
        else:
            raise NotImplementedError(
                "pos_iou_thresh({}) < 1.0 is not implemented!".format(pos_iou_thresh))


It is then called in the hybrid_forward of YOLOV3:
                all_targets = self._target_generator(box_preds, *args)

2. Code walkthrough

2.1 YOLOV3TargetMerger

class YOLOV3TargetMerger(gluon.HybridBlock):
    """YOLOV3 target merger that merges the prefetched targets and dynamic targets.

    Parameters
    ----------
    num_class : int
        Number of foreground classes.
    ignore_iou_thresh : float
        Anchors that has IOU in `range(ignore_iou_thresh, pos_iou_thresh)` don't get
        penalized of objectness score.

    """
    def __init__(self, num_class, ignore_iou_thresh, **kwargs):
        super(YOLOV3TargetMerger, self).__init__(**kwargs)
        self._num_class = num_class
        self._dynamic_target = YOLOV3DynamicTargetGeneratorSimple(num_class, ignore_iou_thresh)
        self._label_smooth = False    

Anchors whose IOU lies between ignore_iou_thresh (0.7) and pos_iou_thresh (1.0) are not penalized on their objectness score.


    def hybrid_forward(self, F, box_preds, gt_boxes, obj_t, centers_t, scales_t, weights_t, clas_t):
Inputs:

box_preds is an mx.ndarray of shape (B, H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors, 4)
gt_boxes is an mx.ndarray of shape (B, M_max, 4) in corner encoding [xmin, ymin, xmax, ymax]
obj_t is an mx.ndarray of shape (B, H_3 x W_3 x 3 + H_2 x W_2 x 3 + H_1 x W_1 x 3, 1); without mixup, matched anchors have the value 1
centers_t is an mx.ndarray of shape (B, H_3 x W_3 x 3 + H_2 x W_2 x 3 + H_1 x W_1 x 3, 2)
scales_t is an mx.ndarray of shape (B, H_3 x W_3 x 3 + H_2 x W_2 x 3 + H_1 x W_1 x 3, 2)
weights_t is an mx.ndarray of shape (B, H_3 x W_3 x 3 + H_2 x W_2 x 3 + H_1 x W_1 x 3, 2)
clas_t is an mx.ndarray of shape (B, H_3 x W_3 x 3 + H_2 x W_2 x 3 + H_1 x W_1 x 3, num_class)
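
To make the second dimension of these shapes concrete, here is a small sketch that computes N for a square input. The helper name `num_predictions`, the strides (8, 16, 32) and the 416 x 416 input size are assumptions chosen for illustration; they are not taken from the code above.

```python
def num_predictions(input_size, strides=(8, 16, 32), anchors_per_scale=3):
    """Total number of predicted boxes N across all output scales.

    Hypothetical helper: assumes a square input and one feature map per stride.
    """
    total = 0
    for stride in strides:
        side = input_size // stride  # feature map height == width for a square input
        total += side * side * anchors_per_scale
    return total


n_416 = num_predictions(416)  # 52*52*3 + 26*26*3 + 13*13*3
print(n_416)                  # 10647
```

Under these assumptions, every target tensor listed above would have 10647 rows per image.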

        with autograd.pause():
            dynamic_t = self._dynamic_target(box_preds, gt_boxes)

Code inside with autograd.pause(): does not record gradients.
dynamic_t is a tuple (objness_t, center_t, scale_t, weight_t, class_t) with the following contents:

objness_t is a (B, N, 1) mx.ndarray, N = H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors
center_t is an all-zero (B, N, 2) mx.ndarray, N = H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors
scale_t is an all-zero (B, N, 2) mx.ndarray, N = H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors
weight_t is an all-zero (B, N, 2) mx.ndarray, N = H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors
class_t is a (B, N, num_class) mx.ndarray filled with -1, N = H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors


            # use fixed target to override dynamic targets
            obj, centers, scales, weights, clas = zip(
                dynamic_t, [obj_t, centers_t, scales_t, weights_t, clas_t])

dynamic_t is the tuple (objness_t, center_t, scale_t, weight_t, class_t) produced by YOLOV3DynamicTargetGeneratorSimple.
[obj_t, centers_t, scales_t, weights_t, clas_t] is the list of arguments passed in by the caller.
zip pairs up the corresponding mx.ndarray from both sequences, so each unpacked name is a 2-tuple:

obj is a pair of mx.ndarray: the first element is the objness_t produced by YOLOV3DynamicTargetGeneratorSimple, the second is the obj_t passed in
centers is a pair of mx.ndarray: the first element is the center_t produced by YOLOV3DynamicTargetGeneratorSimple, the second is the centers_t passed in
scales is a pair of mx.ndarray: the first element is the scale_t produced by YOLOV3DynamicTargetGeneratorSimple, the second is the scales_t passed in
weights is a pair of mx.ndarray: the first element is the weight_t produced by YOLOV3DynamicTargetGeneratorSimple, the second is the weights_t passed in
clas is a pair of mx.ndarray: the first element is the class_t produced by YOLOV3DynamicTargetGeneratorSimple, the second is the clas_t passed in
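
The pairing can be checked in plain Python. In this sketch, strings stand in for the arrays; the placeholder names (`dyn_obj` and so on) are made up for illustration.

```python
# Placeholder strings stand in for the mx.ndarray objects.
dynamic_t = ("dyn_obj", "dyn_center", "dyn_scale", "dyn_weight", "dyn_class")
prefetched = ["obj_t", "centers_t", "scales_t", "weights_t", "clas_t"]

# zip yields one (dynamic, prefetched) pair per target type.
obj, centers, scales, weights, clas = zip(dynamic_t, prefetched)

print(obj)   # ('dyn_obj', 'obj_t')
print(clas)  # ('dyn_class', 'clas_t')
```

Index 0 of each pair is therefore always the dynamic target and index 1 the prefetched one, which is how obj[0] and obj[1] are used in the next step.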


            mask = obj[1] > 0
            objectness = F.where(mask, obj[1], obj[0])

obj[0] is the (B, N, 1) mx.ndarray produced by YOLOV3DynamicTargetGeneratorSimple. It computes the IOU (B, N, M) between box_preds and gt_boxes, takes the maximum IOU of each predicted box to get a (B, N, 1) mx.ndarray, and compares it with self._ignore_iou_thresh (0.7 by default): entries above the threshold are set to -1, all others to 0.
obj[1] is the objectness produced by YOLOV3PrefetchTargetGenerator. It computes the IOU between the preset anchors and gt_boxes; the best-matching anchor of each gt box is set to 1 and all other entries are 0, giving a (B, N, 1) mx.ndarray.
mask is derived from obj[1]: where obj[1] is greater than 0 (that is, 1 when mixup is not used), mask is 1, otherwise 0.
objectness: wherever the objectness from YOLOV3PrefetchTargetGenerator (anchors vs. gt_boxes) is greater than 0, that value is used; everywhere else the value from YOLOV3DynamicTargetGeneratorSimple (box_preds vs. gt_boxes) is used. This is what the comment "use fixed target to override dynamic targets" means: positions marked 1 by YOLOV3PrefetchTargetGenerator overwrite the corresponding positions of the dynamic target, and the remaining positions keep the dynamic value.
In the merged objectness, 1 marks the best-matching anchor, 0 marks an IOU at or below ignore_iou_thresh, and -1 marks a box that is not the best match but whose IOU exceeds ignore_iou_thresh, as shown in the figure below.
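
The override can be reproduced on toy data. The following is a minimal sketch that uses NumPy in place of the MXNet `F` API; the function name `merge_objectness` and the values are made up for illustration.

```python
import numpy as np


def merge_objectness(dynamic_obj, prefetch_obj):
    """Prefetched (fixed) objectness overrides the dynamic one wherever it is > 0."""
    mask = prefetch_obj > 0
    return np.where(mask, prefetch_obj, dynamic_obj)


# One image, four predicted boxes, shape (B, N, 1) = (1, 4, 1).
dynamic_obj = np.array([[[-1.0], [-1.0], [0.0], [0.0]]])   # -1: IOU above ignore threshold
prefetch_obj = np.array([[[1.0], [0.0], [0.0], [1.0]]])    # 1: anchor matched to a gt box

objectness = merge_objectness(dynamic_obj, prefetch_obj)
print(objectness.ravel())  # [ 1. -1.  0.  1.]
```

The first box is both matched and above the ignore threshold, and the match wins; the second box stays ignored (-1); the third is a plain negative (0).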


Compare this with the objectness strategy for anchors described in the YOLOv3 paper:

This should be 1 if the bounding box prior overlaps a ground truth object by more than any other bounding box prior. (YOLOV3DynamicTargetGeneratorSimple computes the per-box maximum IOU with ious_max = batch_ious.max(axis=-1, keepdims=True)  # (B, N, 1); the value 1 itself is written by YOLOV3PrefetchTargetGenerator and reaches the result through the override.)
If the bounding box prior is not the best but does overlap a ground truth object by more than some threshold we ignore the prediction, follow faster r-cnn. (This is also handled by YOLOV3DynamicTargetGeneratorSimple: objness_t = (ious_max > self._ignore_iou_thresh) * -1  # use -1 for ignored, which sets every entry above the threshold to -1.)
Only assigns one bounding box prior for each ground truth object. If a bounding box prior is not assigned to a ground truth object it incurs no loss for coordinate or class predictions, only objectness. (This part is implemented by the override, which guarantees that only one bounding box prior is assigned to each ground truth object; which anchor it is gets decided by YOLOV3PrefetchTargetGenerator.)

            mask2 = mask.tile(reps=(2,))
            center_targets = F.where(mask2, centers[1], centers[0])
            scale_targets = F.where(mask2, scales[1], scales[0])
            weights = F.where(mask2, weights[1], weights[0])

In the same way, for anchors whose objness is marked 1 by YOLOV3PrefetchTargetGenerator, the prefetched values override the corresponding positions of the center_targets, scale_targets and weights produced by YOLOV3DynamicTargetGeneratorSimple.
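
The tiling step widens the (B, N, 1) mask so that it matches the last dimension of the target it selects from. A minimal NumPy sketch follows; `override_with_prefetched` and the sample values are hypothetical, and NumPy would also broadcast the mask without tiling, so the explicit tile is kept only to mirror the code above.

```python
import numpy as np


def override_with_prefetched(mask, prefetched, dynamic):
    """Tile a (B, N, 1) mask along the last axis and pick prefetched values where it is set."""
    reps = prefetched.shape[-1]
    mask_tiled = np.tile(mask, reps)  # (B, N, 1) -> (B, N, reps)
    return np.where(mask_tiled, prefetched, dynamic)


mask = np.array([[[True], [False], [True]]])                   # (1, 3, 1)
centers_prefetched = np.array([[[0.2, 0.7], [0.0, 0.0], [0.5, 0.5]]])
centers_dynamic = np.zeros_like(centers_prefetched)            # dynamic centers are all zero

center_targets = override_with_prefetched(mask, centers_prefetched, centers_dynamic)
print(center_targets.shape)  # (1, 3, 2)
```

Because the dynamic center, scale and weight targets are all zero, the merged result is nonzero only on matched anchors.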

            mask3 = mask.tile(reps=(self._num_class,))
            class_targets = F.where(mask3, clas[1], clas[0])

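clas[1], passed in by the caller, is the class_targets produced by YOLOV3PrefetchTargetGenerator: on the anchor that best matches a gt box, the entry of the gt class is 1 and the entries of the other classes are 0; all unmatched anchors are -1.
clas[0] is the class_t produced by YOLOV3DynamicTargetGeneratorSimple, in which every value is -1.
In the same way, for anchors whose objness is marked 1 by YOLOV3PrefetchTargetGenerator, the prefetched values override the corresponding positions of the dynamic class_targets. Concretely, an anchor with objness 1 carries the encoding of its cls id, and every other anchor is -1.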

            smooth_weight = 1. / self._num_class
            if self._label_smooth:
                smooth_weight = min(1. / self._num_class, 1. / 40)
                class_targets = F.where(
                    class_targets > 0.5, class_targets - smooth_weight, class_targets)
                class_targets = F.where(
                    (class_targets < -0.5) + (class_targets > 0.5),
                    class_targets, F.ones_like(class_targets) * smooth_weight)

self._label_smooth defaults to False, so this branch is not analyzed in detail for now.
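
For reference, the effect of this branch can still be traced on a toy array. The sketch below mirrors the two F.where calls with NumPy; the function name `smooth_class_targets` and the 3-class example are assumptions for illustration.

```python
import numpy as np


def smooth_class_targets(class_targets, num_class):
    """NumPy mirror of the label-smoothing branch; targets are 1, 0 or -1."""
    smooth_weight = min(1.0 / num_class, 1.0 / 40)
    # Positives (1) are lowered to 1 - smooth_weight.
    out = np.where(class_targets > 0.5, class_targets - smooth_weight, class_targets)
    # Negatives (0) are raised to smooth_weight; ignored (-1) and positives are kept.
    keep = (out < -0.5) | (out > 0.5)
    return np.where(keep, out, smooth_weight)


# One matched anchor (class 0) and one unmatched anchor, shape (1, 2, 3).
targets = np.array([[[1.0, 0.0, 0.0], [-1.0, -1.0, -1.0]]])
smoothed = smooth_class_targets(targets, num_class=3)
print(smoothed)
```

With 3 classes, smooth_weight is min(1/3, 1/40) = 0.025, so the matched anchor becomes (0.975, 0.025, 0.025) while the ignored row stays at -1.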

            class_mask = mask.tile(reps=(self._num_class,)) * (class_targets >= 0)
            return [F.stop_gradient(x) for x in [objectness, center_targets, scale_targets,
                                                 weights, class_targets, class_mask]]

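Note that the class_targets >= 0 mask looks identical to mask.tile(reps=(self._num_class,)): after the override, only matched anchors hold non-negative class targets.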
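
That observation can be checked on a small example. This is a NumPy sketch under the assumption that mixup is not used, so prefetched class targets are exactly 1, 0 or -1; `merge_class_targets` and the data are hypothetical.

```python
import numpy as np


def merge_class_targets(mask, clas_prefetched, clas_dynamic):
    """Override dynamic class targets on matched anchors and build the class mask."""
    num_class = clas_prefetched.shape[-1]
    mask_tiled = np.tile(mask, num_class)                   # (B, N, 1) -> (B, N, num_class)
    class_targets = np.where(mask_tiled, clas_prefetched, clas_dynamic)
    class_mask = (mask_tiled * (class_targets >= 0)).astype(np.float32)
    return class_targets, class_mask, mask_tiled


mask = np.array([[[True], [False], [False]]])               # only the first anchor is matched
clas_prefetched = np.array([[[0.0, 1.0, 0.0],
                             [-1.0, -1.0, -1.0],
                             [-1.0, -1.0, -1.0]]])
clas_dynamic = -np.ones_like(clas_prefetched)               # dynamic class targets are all -1

class_targets, class_mask, mask_tiled = merge_class_targets(
    mask, clas_prefetched, clas_dynamic)
print(class_mask[0])
```

In this example class_mask equals the tiled mask, which matches the remark above; the extra (class_targets >= 0) factor would only matter if a matched anchor could carry a negative class target.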

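Outputs:

objectness is a (B, N, 1) mx.ndarray, N = H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors. Values: 1 marks the best-matching anchor, 0 marks an IOU at or below ignore_iou_thresh, -1 marks a box that is not the best match but whose IOU exceeds ignore_iou_thresh
center_targets is a (B, N, 2) mx.ndarray, N = H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors. Values: on anchors whose objness is 1, the distance between the gt box center and the top-left corner of its cell, normalized by the cell width or height; on anchors whose objness is not 1, the value is 0
scale_targets is a (B, N, 2) mx.ndarray, N = H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors. Values: the scales_t value from YOLOV3PrefetchTargetGenerator on anchors whose objness is 1, 0 elsewhere
weights is a (B, N, 2) mx.ndarray, N = H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors. Values: the weights_t value from YOLOV3PrefetchTargetGenerator on anchors whose objness is 1, 0 elsewhere
class_targets is a (B, N, num_class) mx.ndarray, N = H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors. Values: the clas_t value from YOLOV3PrefetchTargetGenerator on anchors whose objness is 1, -1 elsewhere
class_mask is a (B, N, num_class) mx.ndarray, N = H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors. Values: 1 on matched anchors, 0 elsewhere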

2.2 YOLOV3DynamicTargetGeneratorSimple

    def hybrid_forward(self, F, box_preds, gt_boxes):
Inputs:

box_preds is an mx.ndarray of shape (B, H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors, 4)
gt_boxes is an mx.ndarray of shape (B, M, 4) in corner encoding [xmin, ymin, xmax, ymax]

        with autograd.pause():
            box_preds = box_preds.reshape((0, -1, 4))
            objness_t = F.zeros_like(box_preds.slice_axis(axis=-1, begin=0, end=1))     
            center_t = F.zeros_like(box_preds.slice_axis(axis=-1, begin=0, end=2))        
            scale_t = F.zeros_like(box_preds.slice_axis(axis=-1, begin=0, end=2))
            weight_t = F.zeros_like(box_preds.slice_axis(axis=-1, begin=0, end=2))
            class_t = F.ones_like(objness_t.tile(reps=(self._num_class))) * -1    
            batch_ious = self._batch_iou(box_preds, gt_boxes)  # (B, N, M)
            ious_max = batch_ious.max(axis=-1, keepdims=True)  # (B, N, 1)            
            objness_t = (ious_max > self._ignore_iou_thresh) * -1  # use -1 for ignored            
        return objness_t, center_t, scale_t, weight_t, class_t            

After the reshape, box_preds is still an mx.ndarray of shape (B, H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors, 4)
objness_t is initialized as an all-zero mx.ndarray of shape (B, H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors, 1)
center_t is an all-zero mx.ndarray of shape (B, H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors, 2)
scale_t is an all-zero mx.ndarray of shape (B, H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors, 2)
weight_t is an all-zero mx.ndarray of shape (B, H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors, 2)
class_t is an mx.ndarray of shape (B, H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors, num_class) filled with -1
batch_ious is a (B, N, M) mx.ndarray, where N is H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors
ious_max is a (B, N, 1) mx.ndarray
objness_t is then recomputed as a (B, N, 1) mx.ndarray. self._ignore_iou_thresh defaults to 0.7, and entries whose maximum IOU exceeds 0.7 are set to -1, meaning ignore. As the docstring puts it, "Anchors that has IOU in range(ignore_iou_thresh, pos_iou_thresh) don't get penalized of objectness score"; ignore means that the anchor is not penalized.

To summarize, the outputs are:

objness_t is a (B, N, 1) mx.ndarray, N = H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors; entries whose IOU exceeds ignore_iou_thresh are -1, all others are 0
center_t is an all-zero (B, N, 2) mx.ndarray, N = H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors
scale_t is an all-zero (B, N, 2) mx.ndarray, N = H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors
weight_t is an all-zero (B, N, 2) mx.ndarray, N = H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors
class_t is a (B, N, num_class) mx.ndarray filled with -1, N = H_1 x W_1 x num_anchors + ... + H_3 x W_3 x num_anchors
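
The whole generator can be approximated in a few lines of NumPy. This is a sketch under stated assumptions, not the GluonCV implementation: `batch_iou` is a hand-written stand-in for self._batch_iou, gt_boxes is assumed to contain no padding rows, and the sample boxes are made up.

```python
import numpy as np


def batch_iou(boxes_a, boxes_b, eps=1e-15):
    """Pairwise IOU of corner-encoded boxes: (B, N, 4) x (B, M, 4) -> (B, N, M)."""
    a = boxes_a[:, :, None, :]  # (B, N, 1, 4)
    b = boxes_b[:, None, :, :]  # (B, 1, M, 4)
    inter_w = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = inter_w * inter_h
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter + eps)


def dynamic_targets(box_preds, gt_boxes, num_class, ignore_iou_thresh=0.7):
    """Zero center/scale/weight targets, -1 class targets, -1 objectness for ignored boxes."""
    box_preds = box_preds.reshape((box_preds.shape[0], -1, 4))
    b, n = box_preds.shape[:2]
    center_t = np.zeros((b, n, 2))
    scale_t = np.zeros((b, n, 2))
    weight_t = np.zeros((b, n, 2))
    class_t = -np.ones((b, n, num_class))
    ious_max = batch_iou(box_preds, gt_boxes).max(axis=-1, keepdims=True)  # (B, N, 1)
    objness_t = np.where(ious_max > ignore_iou_thresh, -1.0, 0.0)          # -1 means ignored
    return objness_t, center_t, scale_t, weight_t, class_t


# Three predictions against one gt box: IOU is about 1.0, 0.5 and 0.0.
preds = np.array([[[0, 0, 10, 10], [0, 0, 10, 5], [20, 20, 30, 30]]], dtype=np.float64)
gts = np.array([[[0, 0, 10, 10]]], dtype=np.float64)
objness_t, center_t, scale_t, weight_t, class_t = dynamic_targets(preds, gts, num_class=2)
print(objness_t.ravel())  # [-1.  0.  0.]
```

Only the first prediction exceeds the 0.7 threshold, so only it is marked -1; the other targets stay at their constant values, as listed in the summary above.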