
@YimianDai
Last active April 23, 2020 20:52
GluonCV Segmentation Train Script

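A toolbox is both a blessing, letting those of us with only a half-baked understanding get started quickly, and a curse, because there are many details to take care of. Some notes on gluon-cv/scripts/segmentation/train.py.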

  • parse_args
    • model
    • dataset
    • training hyper params
    • cuda and logging
    • checkpoint
    • evaluation only
    • synchronized Batch Normalization
  • Trainer
    • __init__
      1. image transform
      2. dataset and dataloader
      3. create network
        • model = get_model
        • net = DataParallelModel(model, ctx)
        • evaluator = DataParallelModel(SegEvalModel(model), ctx)
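          • Compared with net, evaluator adds an extra SegEvalModel wrapper. SegEvalModel calls the model's evaluate method rather than forward. The difference: forward may return a tuple (for FCN, the separate outputs of the c4 and c3 heads), whereas evaluate returns self.forward(x)[0], i.e., always just the first element of the tuple that forward returns.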
      4. resume checkpoint if needed
      5. create criterion
      6. optimizer and lr scheduling
      7. evaluation metrics
    • training (One Epoch)
      • For Every Mini-Batch:
        1. self.net forward to get outputs
        2. given outputs and labels, calc loss
        3. autograd.backward calc gradient
        4. optimizer.step to update parameter
        5. accumulate train loss
    • validation
      • For Every Mini-Batch:
        1. self.evaluator forward to get outputs
        2. given outputs and labels, update metric
        3. get pixAcc and mIoU
  • save_checkpoint
    • prepare filename and filepath
    • net.save_parameters
    • if is_best
      • net.save_parameters with suffix model_best
  • __main__
    • trainer = Trainer(args), create Trainer object
    • if args.eval, Evaluation Mode
      • call trainer.validation
    • else, Train Mode
      • For Every Epoch:
        1. call trainer.training
        2. if val, call trainer.validation
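To make the forward/evaluate distinction from the SegEvalModel note concrete, here is a minimal pure-Python sketch with no MXNet. ToySegModel and SegEvalModelSketch are hypothetical stand-ins written for illustration, not GluonCV classes; the strings take the place of the main (c4) and auxiliary (c3) predictions, and the DataParallelModel wrapping is left out.

```python
class ToySegModel:
    """Hypothetical stand-in for a GluonCV segmentation network (no MXNet).

    forward() returns a tuple, mimicking FCN's (main c4 head, auxiliary c3 head).
    """

    def forward(self, x):
        main_out = ("main", x)  # placeholder for the c4-head prediction
        aux_out = ("aux", x)    # placeholder for the auxiliary c3-head prediction
        return main_out, aux_out

    def __call__(self, x):
        return self.forward(x)

    def evaluate(self, x):
        # Keep only the first element of the tuple returned by forward().
        return self.forward(x)[0]


class SegEvalModelSketch:
    """Hypothetical wrapper that routes calls to evaluate() instead of forward()."""

    def __init__(self, module):
        self.module = module

    def __call__(self, *inputs, **kwargs):
        return self.module.evaluate(*inputs, **kwargs)


model = ToySegModel()
net = model                            # training path (real script: DataParallelModel(model, ctx))
evaluator = SegEvalModelSketch(model)  # evaluation path: main output only

print(net("img"))        # (('main', 'img'), ('aux', 'img'))
print(evaluator("img"))  # ('main', 'img')
```

The training path keeps every head so the loss can also use the auxiliary output when there is one, while the evaluation path only ever sees the main prediction.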

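In the original code, save_checkpoint only saves the parameters of every epoch; it never writes model_best.params. To make it work, make the following changes:

  1. Comment out the save_checkpoint call in the training function.
  2. In Trainer's __init__, add self.best_IoU = 0 and self.is_best = False.
  3. Add the following code at the end of validation:
        if mIoU > self.best_IoU:
            # remember the new best; otherwise every epoch above 0 would count as best
            self.best_IoU = mIoU
            self.is_best = True
        else:
            self.is_best = False
        # save every epoch; also saves model_best.params when is_best is True
        save_checkpoint(self.net.module, self.args, self.is_best)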
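A self-contained sketch of the best-checkpoint logic above, assuming a hypothetical FakeNet whose save_parameters just writes a text file, and a simplified save_checkpoint that takes the target directory directly instead of args. The file name checkpoint.params is an assumption. It also shows why best_IoU has to be updated: without that line, epoch 2 (mIoU 0.40) would still beat the initial 0 and overwrite the epoch-1 best (0.45).

```python
import os
import shutil
import tempfile


class FakeNet:
    """Hypothetical stand-in for a Gluon block; save_parameters writes text."""

    def __init__(self):
        self.epoch = 0

    def save_parameters(self, filename):
        with open(filename, "w") as f:
            f.write("params after epoch %d\n" % self.epoch)


def save_checkpoint(net, directory, is_best=False):
    """Save every epoch to checkpoint.params; copy it to model_best.params if best."""
    os.makedirs(directory, exist_ok=True)
    filepath = os.path.join(directory, "checkpoint.params")
    net.save_parameters(filepath)
    if is_best:
        shutil.copyfile(filepath, os.path.join(directory, "model_best.params"))


def run(mious, directory):
    """Simulate one validation result per epoch and track the best mIoU."""
    net = FakeNet()
    best_iou = 0.0
    for epoch, miou in enumerate(mious):
        net.epoch = epoch
        is_best = miou > best_iou
        if is_best:
            best_iou = miou  # the update the original snippet was missing
        save_checkpoint(net, directory, is_best)
    return best_iou


with tempfile.TemporaryDirectory() as d:
    best = run([0.30, 0.45, 0.40], d)
    with open(os.path.join(d, "model_best.params")) as f:
        print(best, f.read().strip())  # 0.45 params after epoch 1
```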

How to load already-trained parameters and continue training from them

Evaluation-only mode (compute the test metrics only)

python demo_tiny_fcn.py --epoch 2 --batch-size 10 --lr 0.005 --resume './runs/iceberg/unet/default/model_best.params' --eval
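Evaluation mode only reports pixAcc and mIoU. The sketch below shows what those two numbers measure, in plain Python over flat lists of per-pixel class ids. The function seg_metrics and its ignore_index convention are hypothetical; GluonCV's own SegmentationMetric works on NDArrays and may treat ignored pixels and absent classes differently.

```python
def seg_metrics(preds, labels, num_classes, ignore_index=-1):
    """Return (pixel accuracy, mean IoU) for flat lists of per-pixel class ids.

    Pixels whose label equals ignore_index are skipped entirely. Classes that
    appear in neither prediction nor label (union == 0) are left out of the mean.
    """
    correct = labeled = 0
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(preds, labels):
        if t == ignore_index:
            continue
        labeled += 1
        if p == t:
            correct += 1
            inter[t] += 1
            union[t] += 1  # a correct pixel counts once in its class's union
        else:
            union[p] += 1  # a wrong pixel enlarges the union of both classes
            union[t] += 1
    pix_acc = correct / labeled if labeled else 0.0
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    miou = sum(ious) / len(ious) if ious else 0.0
    return pix_acc, miou


pix_acc, miou = seg_metrics([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(pix_acc, round(miou, 4))  # 0.75 0.5833
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, so mIoU is their mean, 7/12.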

Mode that saves all test segmentation results

Train Script

$ python demo_tiny_fcn.py --epoch 20 --batch-size 16 --lr 0.001 --resume './runs/iceberg/unet/default/model_best.params' --syncbn

How does it handle multiple GPUs?

$ python train_tiny_fcn.py --epoch 100 --batch-size 20 --test-batch-size 20 --train-split train --val-split val --lr 0.01 --dataset DENTIST --model FCN
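In the outline, multi-GPU support comes from wrapping the model as DataParallelModel(model, ctx), which splits each mini-batch across the contexts and runs the model on each slice. The sketch below mimics only the splitting step in plain Python. split_batch is a hypothetical helper, and its rule (equal slices, remainder on the last one) is my reading of mxnet.gluon.utils.split_data with even_split=False, so treat it as an assumption. The 3-GPU count is made up.

```python
def split_batch(batch, num_devices):
    """Split `batch` along its first axis into one contiguous slice per device.

    Each slice gets len(batch) // num_devices samples and the last slice also
    takes the remainder (assumed to match split_data with even_split=False).
    """
    size = len(batch)
    if size == 0 or num_devices < 1:
        raise ValueError("need a non-empty batch and at least one device")
    if size < num_devices:
        # Fewer samples than devices: only the first `size` devices get work.
        num_devices = size
    step = size // num_devices
    return [
        batch[i * step:(i + 1) * step] if i < num_devices - 1 else batch[i * step:]
        for i in range(num_devices)
    ]


# Hypothetical: a batch of 20 (as in the command above) spread over 3 GPUs.
slices = split_batch(list(range(20)), 3)
print([len(s) for s in slices])  # [6, 6, 8]
```

Because each GPU only sees its own slice, plain BatchNorm statistics are computed per device; the --syncbn flag is what turns on synchronized Batch Normalization across devices.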

On the debug machine:

python train_tiny_fcn.py --epoch 1000 --batch-size 10 --test-batch-size 10 --train-split train --val-split val --lr 0.001 --dataset DENTIST --model FCN --metric mAP

On the full-run machine:

python train_tiny_fcn.py --epoch 1000 --batch-size 32 --test-batch-size 32 --train-split train --val-split val --lr 0.01 --dataset Iceberg --model FCN --metric mAP --syncbn --resume tmp_FCN_Iceberg_best_mAP.params

python train_tiny_mask_fcn.py --epoch 1000 --batch-size 40 --test-batch-size 40 --train-split train --val-split val --lr 0.01 --dataset Iceberg --model FCN --metric mAP --syncbn --gpus 0,1,2,3

--resume tmp_FCN_Iceberg_best_mAP.params

nohup version

$ nohup python train_tiny_fcn.py --epoch 1000 --batch-size 40 --test-batch-size 40 --train-split train --val-split val --lr 0.001 --dataset Iceberg --model FCN --metric mAP > FCN_Iceberg_best_mAP.out &

Code walkthrough

    outputs = self.net(data.astype(args.dtype, copy=False))

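outputs is a tuple of tuples of mx.ndarray: the outer tuple has one entry per device (context), and each inner tuple holds that device's model outputs (for FCN, the main and auxiliary heads). Each mx.ndarray has shape (B, C, H, W), where B is that device's share of the batch.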
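To make the nesting concrete, a sketch with shape tuples standing in for mx.ndarray; the two devices, two heads, and 480x480 size are made-up numbers, and the helper names are hypothetical. outputs[i][j] is device i, head j, so the main predictions are outputs[i][0] for every i.

```python
# Shapes stand in for mx.ndarray: (B_i, C, H, W). Two devices, two heads each
# (main c4 head, auxiliary c3 head); all numbers are made up.
outputs = (
    ((10, 2, 480, 480), (10, 2, 480, 480)),  # device 0: (main, aux)
    ((10, 2, 480, 480), (10, 2, 480, 480)),  # device 1: (main, aux)
)


def main_head_outputs(outputs):
    """Keep only the first (main) head from each device's tuple."""
    return [device_out[0] for device_out in outputs]


def total_batch(outputs):
    """Add up the per-device batch sizes B_i of the main head."""
    return sum(shape[0] for shape in main_head_outputs(outputs))


print(len(outputs), len(outputs[0]))  # 2 2
print(total_batch(outputs))           # 20
```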
