YimianDai/Class-SSD.md

## Class-SSD.md

      
### `__init__`

            self.cls_decoder = MultiPerClassDecoder(len(self.classes) + 1, thresh=0.01)
This decoder's output has nothing to do with the training loss, so it does not affect training.
### `hybrid_forward`

    def hybrid_forward(self, F, x):
        """Hybrid forward"""
        features = self.features(x)

The resulting features is a list; each element is a feature map of shape (B, C, H_i, W_i).

        cls_preds = [F.flatten(F.transpose(cp(feat), (0, 2, 3, 1)))
                     for feat, cp in zip(features, self.class_predictors)]

Here cp(feat) is the output of each layer's ConvPredictor, with shape (B, num_anchors x (fg_cls + 1), H_i, W_i); H_i and W_i differ from layer to layer because of downsampling.
F.transpose(cp(feat), (0, 2, 3, 1)) rearranges it to (B, H_i, W_i, num_anchors x (fg_cls + 1)).
F.flatten then turns it into (B, H_i x W_i x num_anchors x (fg_cls + 1)).
The resulting cls_preds is a list whose i-th element is a (B, H_i x W_i x num_anchors x (fg_cls + 1)) matrix, the output of layer i.
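
The transpose-then-flatten step can be checked with a small NumPy sketch. The sizes and the helper name `flatten_channel_last` are hypothetical; NumPy stands in for the `F` API here.

```python
import numpy as np

def flatten_channel_last(pred):
    """(B, C, H, W) -> (B, H*W*C), mimicking F.flatten(F.transpose(pred, (0, 2, 3, 1)))."""
    b = pred.shape[0]
    return np.transpose(pred, (0, 2, 3, 1)).reshape(b, -1)

# Hypothetical sizes, chosen only for illustration.
B, num_anchors, fg_cls = 2, 4, 20
H, W = 3, 5

pred = np.arange(B * num_anchors * (fg_cls + 1) * H * W, dtype=np.float32)
pred = pred.reshape(B, num_anchors * (fg_cls + 1), H, W)

flat = flatten_channel_last(pred)
print(flat.shape)  # (2, 1260), i.e. (B, H * W * num_anchors * (fg_cls + 1))

# After the transpose, all channels of one spatial location are contiguous.
print(np.array_equal(flat[0, :84], pred[0, :, 0, 0]))  # True
```

Moving the channel axis last is what keeps each location's predictions together once the tensor is flattened.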

        box_preds = [F.flatten(F.transpose(bp(feat), (0, 2, 3, 1)))
                     for feat, bp in zip(features, self.box_predictors)]

Each feat is an MXNet.NDArray of shape (B, C, H_i, W_i); bp(feat) has shape (B, num_anchors x 4, H_i, W_i); F.transpose(bp(feat), (0, 2, 3, 1)) has shape (B, H_i, W_i, num_anchors x 4); and after flatten the result has shape (B, H_i x W_i x num_anchors x 4).
So the resulting box_preds is a list whose i-th element is a (B, H_i x W_i x num_anchors x 4) matrix, the output of layer i.

        anchors = [F.reshape(ag(feat), shape=(1, -1))
                   for feat, ag in zip(features, self.anchor_generators)]

The input features is a list; each element is a feature map of shape (B, C, H_i, W_i).
ag(feat) outputs an MXNet.NDArray of shape (1, H_i x W_i x num_anchors, 4).
F.reshape(ag(feat), shape=(1, -1)) turns it into an MXNet.NDArray of shape (1, H_i x W_i x num_anchors x 4).
The resulting anchors is a list of MXNet.NDArray, each of shape (1, H_i x W_i x num_anchors x 4).

        cls_preds = F.concat(*cls_preds, dim=1).reshape((0, -1, self.num_classes + 1))

F.concat(*cls_preds, dim=1) produces a matrix of shape (B, H_1 x W_1 x num_anchors x (fg_cls + 1) + ... + H_6 x W_6 x num_anchors x (fg_cls + 1)).
reshape((0, -1, self.num_classes + 1)) then gives (B, H_1 x W_1 x num_anchors + ... + H_6 x W_6 x num_anchors, fg_cls + 1), where self.num_classes is the fg_cls used in the shapes above.
So cls_preds is a matrix of shape (B, H_1 x W_1 x num_anchors + ... + H_6 x W_6 x num_anchors, fg_cls + 1).
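
A NumPy sketch of the concat-and-reshape step, assuming three layers with made-up sizes. MXNet's `reshape` treats `0` as "copy this dimension from the input"; NumPy has no such code, so `B` is passed explicitly.

```python
import numpy as np

# Hypothetical sizes: three prediction layers instead of six.
B, num_anchors, num_classes = 2, 4, 20
sizes = [(4, 4), (2, 2), (1, 1)]  # (H_i, W_i) per layer

rng = np.random.default_rng(0)
raw = [rng.standard_normal((B, num_anchors * (num_classes + 1), h, w))
       for h, w in sizes]

# Per-layer transpose + flatten, then concat along axis 1 and regroup per anchor.
flat = [np.transpose(r, (0, 2, 3, 1)).reshape(B, -1) for r in raw]
cls_preds = np.concatenate(flat, axis=1).reshape(B, -1, num_classes + 1)

total_anchors = sum(h * w * num_anchors for h, w in sizes)
print(cls_preds.shape, total_anchors)  # (2, 84, 21) 84

# Row 0 holds the class scores of the first anchor at location (0, 0) of layer 1.
print(np.array_equal(cls_preds[0, 0], raw[0][0, :num_classes + 1, 0, 0]))  # True
```

The last check only holds because the channel axis was moved last before flattening; otherwise the reshape would mix scores from different locations.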

        box_preds = F.concat(*box_preds, dim=1).reshape((0, -1, 4))

F.concat(*box_preds, dim=1) produces an MXNet.NDArray of shape (B, H_1 x W_1 x num_anchors x 4 + H_2 x W_2 x num_anchors x 4 + ... + H_6 x W_6 x num_anchors x 4).
After reshape((0, -1, 4)), box_preds is an MXNet.NDArray of shape (B, H_1 x W_1 x num_anchors + H_2 x W_2 x num_anchors + ... + H_6 x W_6 x num_anchors, 4).

        anchors = F.concat(*anchors, dim=1).reshape((1, -1, 4))

F.concat(*anchors, dim=1) produces an MXNet.NDArray of shape (1, H_1 x W_1 x num_anchors x 4 + ... + H_6 x W_6 x num_anchors x 4).
After reshape((1, -1, 4)), anchors is an MXNet.NDArray of shape (1, H_1 x W_1 x num_anchors + ... + H_6 x W_6 x num_anchors, 4).

        if autograd.is_training():
            return [cls_preds, box_preds, anchors]

So in training mode the return value is a list of MXNet.NDArray, as follows:
cls_preds is an MXNet.NDArray of shape (B, H_1 x W_1 x num_anchors + ... + H_6 x W_6 x num_anchors, fg_cls + 1)
box_preds is an MXNet.NDArray of shape (B, H_1 x W_1 x num_anchors + ... + H_6 x W_6 x num_anchors, 4)
anchors   is an MXNet.NDArray of shape (1, H_1 x W_1 x num_anchors + ... + H_6 x W_6 x num_anchors, 4)
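
To get a feel for the size of the middle dimension, here is the arithmetic for the standard SSD300 configuration. These notes do not say which model is used, so the feature map sizes and anchors per location below are an assumption. The notes write a single num_anchors, but in that configuration the count differs per layer.

```python
# Assumed: (feature map side, anchors per location) for the six prediction
# layers of the standard SSD300 configuration.
ssd300_layers = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

# H_i x W_i x num_anchors for each layer, then the sum over all six layers.
per_layer = [side * side * k for side, k in ssd300_layers]
total = sum(per_layer)

print(per_layer)  # [5776, 2166, 600, 150, 36, 4]
print(total)      # 8732
```

Under that assumption, cls_preds would be (B, 8732, fg_cls + 1), box_preds (B, 8732, 4), and anchors (1, 8732, 4).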

        bboxes = self.bbox_decoder(box_preds, anchors)

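bbox_decoder takes the (cx, cy, w, h) center encoding, combined with the anchors, and converts it to corner coordinates x_{min}, y_{min}, x_{max}, y_{max}.
The resulting bboxes is still an MXNet.NDArray of shape (B, H_1 x W_1 x num_anchors + ... + H_6 x W_6 x num_anchors, 4), only now encoded as x_{min}, y_{min}, x_{max}, y_{max}.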
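
A minimal NumPy sketch of what such a decoder might compute, assuming the usual SSD parameterization (center offsets scaled by the anchor size, width and height predicted in log space). The function name `decode_center_boxes` and the std values are assumptions of this sketch, not taken from the code above.

```python
import numpy as np

def decode_center_boxes(box_preds, anchors, stds=(0.1, 0.1, 0.2, 0.2)):
    """Decode (B, N, 4) offsets against (1, N, 4) center-format anchors.

    Returns (B, N, 4) boxes as (x_min, y_min, x_max, y_max).
    The stds are assumed values for illustration.
    """
    a_cx, a_cy, a_w, a_h = np.split(anchors, 4, axis=-1)
    t_x, t_y, t_w, t_h = np.split(box_preds, 4, axis=-1)
    # Shift the anchor center by the scaled offset.
    cx = t_x * stds[0] * a_w + a_cx
    cy = t_y * stds[1] * a_h + a_cy
    # Scale the anchor size in log space and keep half of it.
    half_w = np.exp(t_w * stds[2]) * a_w / 2
    half_h = np.exp(t_h * stds[3]) * a_h / 2
    return np.concatenate([cx - half_w, cy - half_h, cx + half_w, cy + half_h], axis=-1)

anchors = np.array([[[0.5, 0.5, 0.2, 0.4]]])  # (1, 1, 4): cx, cy, w, h
zero_offsets = np.zeros((2, 1, 4))            # (B, N, 4)
boxes = decode_center_boxes(zero_offsets, anchors)
print(boxes.shape)   # (2, 1, 4)
print(boxes[0, 0])   # approximately [0.4 0.3 0.6 0.7]
```

Zero offsets return the anchor itself in corner form, and the batch dimension of 1 on anchors broadcasts against B.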

        cls_ids, scores = self.cls_decoder(F.softmax(cls_preds, axis=-1))

F.softmax(cls_preds, axis=-1) converts the raw prediction scores in cls_preds (the pre-softmax values) into prediction probabilities; the shape is still (B, H_1 x W_1 x num_anchors + ... + H_6 x W_6 x num_anchors, self.num_classes + 1).
The resulting cls_ids is a (B, H_1 x W_1 x num_anchors + ... + H_6 x W_6 x num_anchors, fg_cls) matrix: an entry is -1 if the corresponding score falls below thresh, and the class id otherwise.
The resulting scores is also a (B, H_1 x W_1 x num_anchors + ... + H_6 x W_6 x num_anchors, fg_cls) matrix: an entry is 0 if the corresponding score falls below thresh, and that class's probability otherwise.
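
The behaviour described above can be sketched in NumPy. This assumes the background class sits at index 0 of the last axis; `per_class_decode` is a hypothetical stand-in for the decoder, not its actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def per_class_decode(probs, thresh=0.01):
    """probs: (B, N, fg_cls + 1), background assumed at index 0."""
    scores = probs[..., 1:]  # drop background -> (B, N, fg_cls)
    cls_ids = np.broadcast_to(
        np.arange(scores.shape[-1], dtype=scores.dtype), scores.shape)
    mask = scores > thresh
    # Below-threshold entries become id -1 and score 0.
    return np.where(mask, cls_ids, -1.0), np.where(mask, scores, 0.0)

logits = np.array([[[5.0, 0.0, 0.0],    # anchor 0: confident background
                    [0.0, 6.0, 0.0]]])  # anchor 1: confident class 0
cls_ids, scores = per_class_decode(softmax(logits))
print(cls_ids.tolist())  # [[[-1.0, -1.0], [0.0, -1.0]]]
print(scores.shape)      # (1, 2, 2)
```

Every anchor keeps one slot per foreground class, which is why the output has fg_cls columns rather than a single argmax.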

        results = []
        for i in range(self.num_classes):
            cls_id = cls_ids.slice_axis(axis=-1, begin=i, end=i+1)
            score = scores.slice_axis(axis=-1, begin=i, end=i+1)
            # per class results
            per_result = F.concat(*[cls_id, score, bboxes], dim=-1)
            results.append(per_result)
        result = F.concat(*results, dim=1)
For each iteration of the loop:

cls_id is a (B, H_1 x W_1 x num_anchors + ... + H_6 x W_6 x num_anchors, 1) matrix
score  is a (B, H_1 x W_1 x num_anchors + ... + H_6 x W_6 x num_anchors, 1) matrix
bboxes is a (B, H_1 x W_1 x num_anchors + ... + H_6 x W_6 x num_anchors, 4) matrix

Therefore:

per_result is a (B, H_1 x W_1 x num_anchors + ... + H_6 x W_6 x num_anchors, 6) matrix; of the 6 values, the first is the class id, the second is the probability, and the third through sixth are the bbox coordinates
result is a (B, H_1 x W_1 x num_anchors x num_classes + ... + H_6 x W_6 x num_anchors x num_classes, 6) matrix
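
The same loop in NumPy, with tiny made-up inputs, to make the layout of result concrete:

```python
import numpy as np

# Hypothetical sizes and values for illustration.
B, N, num_classes = 1, 3, 2
cls_ids = np.array([[[0, -1], [-1, 1], [0, 1]]], dtype=np.float32)  # (B, N, num_classes)
scores = np.array([[[0.9, 0.0], [0.0, 0.8], [0.3, 0.4]]], dtype=np.float32)
bboxes = np.arange(B * N * 4, dtype=np.float32).reshape(B, N, 4)    # (B, N, 4)

results = []
for i in range(num_classes):
    cls_id = cls_ids[..., i:i + 1]  # (B, N, 1)
    score = scores[..., i:i + 1]    # (B, N, 1)
    # Each row: [class id, score, x_min, y_min, x_max, y_max]
    results.append(np.concatenate([cls_id, score, bboxes], axis=-1))
result = np.concatenate(results, axis=1)  # (B, N * num_classes, 6)

print(result.shape)  # (1, 6, 6)
print(result[0, 0])  # class 0, anchor 0
print(result[0, 3])  # class 1, anchor 0: same box, different id and score
```

The rows are ordered class by class: the first N rows belong to class 0, the next N to class 1, and every box appears once per class.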

        if self.nms_thresh > 0 and self.nms_thresh < 1:
            result = F.contrib.box_nms(
                result, overlap_thresh=self.nms_thresh, topk=self.nms_topk, valid_thresh=0.01,
                id_index=0, score_index=1, coord_start=2, force_suppress=False)
            if self.post_nms > 0:
                result = result.slice_axis(axis=1, begin=0, end=self.post_nms)
        ids = F.slice_axis(result, axis=2, begin=0, end=1)
        scores = F.slice_axis(result, axis=2, begin=1, end=2)
        bboxes = F.slice_axis(result, axis=2, begin=2, end=6)
        return ids, scores, bboxes            

After box_nms and slice_axis, result is a B x post_nms x 6 matrix.
Note that the returned ids, scores, and bboxes always have a fixed size: self.post_nms is 100, so 100 predictions are returned. Most of them have probabilities like 0.02 or 0.03, though. demo_ssd.py applies a thresh (for example 0.5) and draws only the boxes scoring above it, which suppresses most of the background.
ids is a B x post_nms x 1 matrix
scores is a B x post_nms x 1 matrix
bboxes is a B x post_nms x 4 matrix
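
A sketch of the kind of post-filtering a demo script might apply to one image's fixed-size output. The helper `keep_confident` and the sample values are hypothetical; the -1 row assumes that suppressed or invalid entries are filled with -1.

```python
import numpy as np

def keep_confident(ids, scores, bboxes, thresh=0.5):
    """Filter one image's detections.

    ids and scores are (post_nms, 1); bboxes is (post_nms, 4).
    """
    mask = scores[:, 0] > thresh
    return ids[mask, 0].astype(int), scores[mask, 0], bboxes[mask]

# Made-up output for a single image with post_nms = 4.
ids = np.array([[14.0], [6.0], [14.0], [-1.0]])
scores = np.array([[0.93], [0.61], [0.03], [-1.0]])
bboxes = np.array([[10, 20, 110, 220],
                   [50, 60, 90, 100],
                   [0, 0, 5, 5],
                   [-1, -1, -1, -1]], dtype=np.float32)

kept_ids, kept_scores, kept_boxes = keep_confident(ids, scores, bboxes)
print(kept_ids.tolist())  # [14, 6]
print(kept_boxes.shape)   # (2, 4)
```

The network output stays a fixed post_nms rows per image; only this last step reduces it to the boxes worth drawing.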