@mmgaggle
Last active August 3, 2020 19:53
_CUDA_COMPAT_STATUS=CUDA Driver UNAVAILABLE (cuInit(0) returned 100)
NVIDIA_PYTORCH_VERSION=19.05
MOFED_VERSION=4.4-1.0.0
COCOAPI_VERSION=2.0+nv0.3.1
CUDNN_VERSION=7.6.0.64
HOSTNAME=ssd
DATADIR=/ocs-ml-data/coco
NVIDIA_REQUIRE_CUDA=cuda>=5.0
KUBERNETES_PORT=tcp://172.30.0.1:443
KUBERNETES_PORT_443_TCP_PORT=443
TERM=xterm
NSIGHT_SYSTEMS_VERSION=2019.3.1
CUBLAS_VERSION=10.2.0.163
LIBRARY_PATH=/usr/local/cuda/lib64/stubs:
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_HOST=172.30.0.1
NEXP=1
LC_ALL=C.UTF-8
PYTHONIOENCODING=utf-8
LD_LIBRARY_PATH=/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
NVIDIA_VISIBLE_DEVICES=all
ENV=/etc/shinit
_CUDA_COMPAT_PATH=/usr/local/cuda/compat
CUDA_CACHE_DISABLE=1
NVIDIA_DRIVER_CAPABILITIES=compute,utility
TRT_VERSION=5.1.5.0
CUDA_DRIVER_VERSION=418.67
NVIDIA_BUILD_ID=6411784
PATH=/opt/conda/bin:/usr/local/mpi/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/workspace/single_stage_detector
PYTORCH_VERSION=1.1.0a0+828a6a3
PYTORCH_BUILD_VERSION=1.1.0a0+828a6a3
CUDA_VERSION=10.1.163
OMPI_MCA_btl_vader_single_copy_mechanism=none
SHLVL=1
HOME=/root
DALI_VERSION=0.9.1
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_PORT_HTTPS=443
DALI_BUILD=719215
OPENMPI_VERSION=3.1.3
NCCL_VERSION=2.4.6
INSLURM=0
BASH_ENV=/etc/bash.bashrc
LOGDIR=/ocs-ml-data/logs
NSS_SDB_USE_CACHE=no
OPENCV_FOR_THREADS_NUM=1
OMP_NUM_THREADS=1
PYTORCH_BUILD_NUMBER=0
KUBERNETES_PORT_443_TCP_ADDR=172.30.0.1
KUBERNETES_PORT_443_TCP=tcp://172.30.0.1:443
_=/usr/bin/printenv
Run vars: id 31478 gpus 4 mparams
STARTING TIMING RUN AT 2020-08-03 06:32:03 PM
+ NUMEPOCHS=80
running benchmark
+ echo 'running benchmark'
+ export DATASET_DIR=/data/coco2017
+ DATASET_DIR=/data/coco2017
+ export TORCH_MODEL_ZOO=/data/torchvision
+ TORCH_MODEL_ZOO=/data/torchvision
+ python3 -m bind_launch --nsockets_per_node 1 --ncores_per_socket 16 --nproc_per_node 4 train.py --use-fp16 --nhwc --pad-input --jit --delay-allreduce --opt-loss --epochs 80 --warmup-factor 0 --no-save --threshold=0.23 --data /data/coco2017 --evaluation 120000 160000 180000 200000 220000 240000 260000 280000 --batch-size 120 --eval-batch-size 160 --warmup 650 --lr 2.92e-3 --wd 1.6e-4 --use-nvjpeg --use-roi-decode
:::MLL 1596479525.114 init_start: {"value": null, "metadata": {"file": "train.py", "lineno": 833}}
:::MLL 1596479525.114 init_start: {"value": null, "metadata": {"file": "train.py", "lineno": 833}}
:::MLL 1596479525.116 init_start: {"value": null, "metadata": {"file": "train.py", "lineno": 833}}
BN group: 1
BN group: 1
BN group: 1
:::MLL 1596479525.120 init_start: {"value": null, "metadata": {"file": "train.py", "lineno": 833}}
BN group: 1
0 Using seed = 3396354582
1 Using seed = 3396354583
3 Using seed = 3396354585
2 Using seed = 3396354584
:::MLL 1596479529.327 max_samples: {"value": 1, "metadata": {"file": "utils.py", "lineno": 465}}
Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /data/torchvision/resnet34-333f7ec4.pth
Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /data/torchvision/resnet34-333f7ec4.pth
Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /data/torchvision/resnet34-333f7ec4.pth
Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /data/torchvision/resnet34-333f7ec4.pth
87306240it [00:02, 37778057.99it/s]
87306240it [00:02, 38328804.28it/s]
87306240it [00:02, 39160723.67it/s]
/opt/conda/lib/python3.6/site-packages/torch/nn/_reduction.py:46: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
/opt/conda/lib/python3.6/site-packages/torch/nn/_reduction.py:46: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
/opt/conda/lib/python3.6/site-packages/torch/nn/_reduction.py:46: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
Delaying allreduces to the end of backward()
:::MLL 1596479532.474 model_bn_span: {"value": 120, "metadata": {"file": "train.py", "lineno": 480}}
:::MLL 1596479532.475 global_batch_size: {"value": 480, "metadata": {"file": "train.py", "lineno": 481}}
:::MLL 1596479532.486 opt_base_learning_rate: {"value": 0.045, "metadata": {"file": "train.py", "lineno": 511}}
:::MLL 1596479532.486 opt_weight_decay: {"value": 0.00016, "metadata": {"file": "train.py", "lineno": 513}}
:::MLL 1596479532.486 opt_learning_rate_warmup_steps: {"value": 650, "metadata": {"file": "train.py", "lineno": 516}}
:::MLL 1596479532.487 opt_learning_rate_warmup_factor: {"value": 0, "metadata": {"file": "train.py", "lineno": 518}}
87306240it [00:00, 96442733.39it/s]
/opt/conda/lib/python3.6/site-packages/torch/nn/_reduction.py:46: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
epoch nbatch loss
:::MLL 1596479540.646 init_stop: {"value": null, "metadata": {"file": "train.py", "lineno": 604}}
:::MLL 1596479540.647 run_start: {"value": null, "metadata": {"file": "train.py", "lineno": 610}}
loading annotations into memory...
loading annotations into memory...
loading annotations into memory...
loading annotations into memory...
Done (t=0.66s)
creating index...
Done (t=0.66s)
creating index...
Done (t=0.66s)
creating index...
Done (t=0.66s)
creating index...
time_check a: 1596479542.946700573
time_check b: 1596479547.190808773
:::MLL 1596479548.002 block_start: {"value": null, "metadata": {"first_epoch_num": 1, "epoch_count": 32.74606450292497, "file": "train.py", "lineno": 669}}
:::MLL 1596479548.003 epoch_start: {"value": null, "metadata": {"epoch_num": 1, "file": "train.py", "lineno": 673}}
Iteration: 0, Loss function: 23.010, Average Loss: 0.023, avg. samples / sec: 36.97
Iteration: 20, Loss function: 20.652, Average Loss: 0.450, avg. samples / sec: 1945.46
Iteration: 40, Loss function: 20.099, Average Loss: 0.849, avg. samples / sec: 2148.80
Iteration: 60, Loss function: 16.421, Average Loss: 1.207, avg. samples / sec: 2166.96
Iteration: 80, Loss function: 11.545, Average Loss: 1.445, avg. samples / sec: 2160.84
Iteration: 100, Loss function: 10.326, Average Loss: 1.621, avg. samples / sec: 2168.89
Iteration: 120, Loss function: 9.471, Average Loss: 1.782, avg. samples / sec: 2163.36
Iteration: 140, Loss function: 8.934, Average Loss: 1.929, avg. samples / sec: 2173.41
Iteration: 160, Loss function: 8.868, Average Loss: 2.068, avg. samples / sec: 2181.67
Iteration: 180, Loss function: 8.540, Average Loss: 2.198, avg. samples / sec: 2176.20
Iteration: 200, Loss function: 8.401, Average Loss: 2.322, avg. samples / sec: 2179.32
Iteration: 220, Loss function: 8.327, Average Loss: 2.442, avg. samples / sec: 2183.55
Iteration: 240, Loss function: 8.279, Average Loss: 2.555, avg. samples / sec: 2179.25
:::MLL 1596479603.693 epoch_stop: {"value": null, "metadata": {"epoch_num": 1, "file": "train.py", "lineno": 819}}
:::MLL 1596479603.694 epoch_start: {"value": null, "metadata": {"epoch_num": 2, "file": "train.py", "lineno": 673}}
Iteration: 260, Loss function: 7.940, Average Loss: 2.665, avg. samples / sec: 2172.77
Iteration: 280, Loss function: 7.550, Average Loss: 2.768, avg. samples / sec: 2180.30
Iteration: 300, Loss function: 7.217, Average Loss: 2.864, avg. samples / sec: 2172.69
Iteration: 320, Loss function: 8.221, Average Loss: 2.959, avg. samples / sec: 2176.64
Iteration: 340, Loss function: 7.486, Average Loss: 3.060, avg. samples / sec: 2169.93
Iteration: 360, Loss function: 7.224, Average Loss: 3.147, avg. samples / sec: 2179.19
Iteration: 380, Loss function: 7.490, Average Loss: 3.228, avg. samples / sec: 2170.93
Iteration: 400, Loss function: 7.088, Average Loss: 3.307, avg. samples / sec: 2166.55
Iteration: 420, Loss function: 7.320, Average Loss: 3.387, avg. samples / sec: 2172.39
Iteration: 440, Loss function: 6.961, Average Loss: 3.461, avg. samples / sec: 2168.27
Iteration: 460, Loss function: 6.971, Average Loss: 3.530, avg. samples / sec: 2172.12
Iteration: 480, Loss function: 6.588, Average Loss: 3.596, avg. samples / sec: 2171.24
:::MLL 1596479657.594 epoch_stop: {"value": null, "metadata": {"epoch_num": 2, "file": "train.py", "lineno": 819}}
:::MLL 1596479657.595 epoch_start: {"value": null, "metadata": {"epoch_num": 3, "file": "train.py", "lineno": 673}}
Iteration: 500, Loss function: 7.012, Average Loss: 3.661, avg. samples / sec: 2159.67
Iteration: 520, Loss function: 6.650, Average Loss: 3.722, avg. samples / sec: 2171.54
Iteration: 540, Loss function: 6.946, Average Loss: 3.780, avg. samples / sec: 2163.70
Iteration: 560, Loss function: 6.757, Average Loss: 3.838, avg. samples / sec: 2168.25
Iteration: 580, Loss function: 6.344, Average Loss: 3.893, avg. samples / sec: 2169.27
Iteration: 600, Loss function: 6.131, Average Loss: 3.942, avg. samples / sec: 2158.56
Iteration: 620, Loss function: 6.786, Average Loss: 3.991, avg. samples / sec: 2165.71
Iteration: 640, Loss function: 6.616, Average Loss: 4.039, avg. samples / sec: 2165.73
Iteration: 660, Loss function: 6.276, Average Loss: 4.083, avg. samples / sec: 2166.54
Iteration: 680, Loss function: 6.039, Average Loss: 4.122, avg. samples / sec: 2162.49
Iteration: 700, Loss function: 6.108, Average Loss: 4.161, avg. samples / sec: 2167.98
Iteration: 720, Loss function: 6.383, Average Loss: 4.203, avg. samples / sec: 2168.30
:::MLL 1596479711.679 epoch_stop: {"value": null, "metadata": {"epoch_num": 3, "file": "train.py", "lineno": 819}}
:::MLL 1596479711.679 epoch_start: {"value": null, "metadata": {"epoch_num": 4, "file": "train.py", "lineno": 673}}
Iteration: 740, Loss function: 6.107, Average Loss: 4.240, avg. samples / sec: 2158.10
Iteration: 760, Loss function: 5.782, Average Loss: 4.272, avg. samples / sec: 2157.74
Iteration: 780, Loss function: 5.819, Average Loss: 4.302, avg. samples / sec: 2163.34
Iteration: 800, Loss function: 5.520, Average Loss: 4.331, avg. samples / sec: 2159.15
Iteration: 820, Loss function: 5.654, Average Loss: 4.360, avg. samples / sec: 2160.89
Iteration: 840, Loss function: 6.002, Average Loss: 4.387, avg. samples / sec: 2156.31
Iteration: 860, Loss function: 5.636, Average Loss: 4.412, avg. samples / sec: 2160.81
Iteration: 880, Loss function: 5.528, Average Loss: 4.436, avg. samples / sec: 2154.79
Iteration: 900, Loss function: 5.592, Average Loss: 4.459, avg. samples / sec: 2157.04
Iteration: 920, Loss function: 5.529, Average Loss: 4.480, avg. samples / sec: 2157.44
Iteration: 940, Loss function: 5.364, Average Loss: 4.501, avg. samples / sec: 2162.06
Iteration: 960, Loss function: 5.633, Average Loss: 4.523, avg. samples / sec: 2157.86
:::MLL 1596479766.171 epoch_stop: {"value": null, "metadata": {"epoch_num": 4, "file": "train.py", "lineno": 819}}
:::MLL 1596479766.172 epoch_start: {"value": null, "metadata": {"epoch_num": 5, "file": "train.py", "lineno": 673}}
Iteration: 980, Loss function: 5.336, Average Loss: 4.543, avg. samples / sec: 2149.41
Iteration: 1000, Loss function: 5.518, Average Loss: 4.561, avg. samples / sec: 2161.57
Iteration: 1020, Loss function: 5.197, Average Loss: 4.577, avg. samples / sec: 2153.19
Iteration: 1040, Loss function: 5.422, Average Loss: 4.593, avg. samples / sec: 2156.21
Iteration: 1060, Loss function: 5.351, Average Loss: 4.607, avg. samples / sec: 2157.13
Iteration: 1080, Loss function: 5.352, Average Loss: 4.621, avg. samples / sec: 2157.61
Iteration: 1100, Loss function: 5.505, Average Loss: 4.634, avg. samples / sec: 2159.93
Iteration: 1120, Loss function: 5.268, Average Loss: 4.646, avg. samples / sec: 2155.72
Iteration: 1140, Loss function: 5.115, Average Loss: 4.659, avg. samples / sec: 2159.03
Iteration: 1160, Loss function: 5.466, Average Loss: 4.671, avg. samples / sec: 2161.11
Iteration: 1180, Loss function: 5.333, Average Loss: 4.682, avg. samples / sec: 2157.11
Iteration: 1200, Loss function: 5.325, Average Loss: 4.692, avg. samples / sec: 2163.03
Iteration: 1220, Loss function: 5.086, Average Loss: 4.702, avg. samples / sec: 2166.94
:::MLL 1596479820.429 epoch_stop: {"value": null, "metadata": {"epoch_num": 5, "file": "train.py", "lineno": 819}}
:::MLL 1596479820.430 epoch_start: {"value": null, "metadata": {"epoch_num": 6, "file": "train.py", "lineno": 673}}
Iteration: 1240, Loss function: 4.831, Average Loss: 4.711, avg. samples / sec: 2160.51
Iteration: 1260, Loss function: 4.857, Average Loss: 4.718, avg. samples / sec: 2157.49
Iteration: 1280, Loss function: 5.504, Average Loss: 4.726, avg. samples / sec: 2153.82
Iteration: 1300, Loss function: 5.172, Average Loss: 4.732, avg. samples / sec: 2160.78
Iteration: 1320, Loss function: 4.870, Average Loss: 4.739, avg. samples / sec: 2159.87
Iteration: 1340, Loss function: 5.227, Average Loss: 4.745, avg. samples / sec: 2162.08
Iteration: 1360, Loss function: 5.337, Average Loss: 4.750, avg. samples / sec: 2156.42
Iteration: 1380, Loss function: 5.051, Average Loss: 4.756, avg. samples / sec: 2162.77
Iteration: 1400, Loss function: 4.671, Average Loss: 4.763, avg. samples / sec: 2162.73
Iteration: 1420, Loss function: 5.093, Average Loss: 4.767, avg. samples / sec: 2158.78
Iteration: 1440, Loss function: 5.055, Average Loss: 4.772, avg. samples / sec: 2156.42
Iteration: 1460, Loss function: 4.851, Average Loss: 4.776, avg. samples / sec: 2160.70
:::MLL 1596479874.678 epoch_stop: {"value": null, "metadata": {"epoch_num": 6, "file": "train.py", "lineno": 819}}
:::MLL 1596479874.679 epoch_start: {"value": null, "metadata": {"epoch_num": 7, "file": "train.py", "lineno": 673}}
Iteration: 1480, Loss function: 4.876, Average Loss: 4.780, avg. samples / sec: 2148.47
Iteration: 1500, Loss function: 5.011, Average Loss: 4.783, avg. samples / sec: 2157.60
Iteration: 1520, Loss function: 4.778, Average Loss: 4.784, avg. samples / sec: 2160.48
Iteration: 1540, Loss function: 4.694, Average Loss: 4.785, avg. samples / sec: 2158.56
Iteration: 1560, Loss function: 5.101, Average Loss: 4.788, avg. samples / sec: 2159.53
Iteration: 1580, Loss function: 4.700, Average Loss: 4.790, avg. samples / sec: 2153.68
Iteration: 1600, Loss function: 4.906, Average Loss: 4.790, avg. samples / sec: 2161.36
Iteration: 1620, Loss function: 4.894, Average Loss: 4.792, avg. samples / sec: 2159.54
Iteration: 1640, Loss function: 5.072, Average Loss: 4.794, avg. samples / sec: 2161.06
Iteration: 1660, Loss function: 4.849, Average Loss: 4.795, avg. samples / sec: 2153.77
Iteration: 1680, Loss function: 4.719, Average Loss: 4.795, avg. samples / sec: 2159.25
Iteration: 1700, Loss function: 4.939, Average Loss: 4.797, avg. samples / sec: 2156.35
:::MLL 1596479929.179 epoch_stop: {"value": null, "metadata": {"epoch_num": 7, "file": "train.py", "lineno": 819}}
:::MLL 1596479929.180 epoch_start: {"value": null, "metadata": {"epoch_num": 8, "file": "train.py", "lineno": 673}}
Iteration: 1720, Loss function: 4.625, Average Loss: 4.797, avg. samples / sec: 2153.24
Iteration: 1740, Loss function: 4.727, Average Loss: 4.798, avg. samples / sec: 2163.10
Iteration: 1760, Loss function: 5.001, Average Loss: 4.798, avg. samples / sec: 2159.38
Iteration: 1780, Loss function: 4.592, Average Loss: 4.798, avg. samples / sec: 2158.09
Iteration: 1800, Loss function: 4.534, Average Loss: 4.798, avg. samples / sec: 2152.26
Iteration: 1820, Loss function: 4.808, Average Loss: 4.798, avg. samples / sec: 2156.88
Iteration: 1840, Loss function: 4.612, Average Loss: 4.797, avg. samples / sec: 2156.58
Iteration: 1860, Loss function: 4.575, Average Loss: 4.796, avg. samples / sec: 2152.50
Iteration: 1880, Loss function: 4.678, Average Loss: 4.795, avg. samples / sec: 2162.25
Iteration: 1900, Loss function: 4.733, Average Loss: 4.793, avg. samples / sec: 2156.32
Iteration: 1920, Loss function: 5.201, Average Loss: 4.792, avg. samples / sec: 2158.86
Iteration: 1940, Loss function: 4.881, Average Loss: 4.792, avg. samples / sec: 2163.83
:::MLL 1596479983.451 epoch_stop: {"value": null, "metadata": {"epoch_num": 8, "file": "train.py", "lineno": 819}}
:::MLL 1596479983.451 epoch_start: {"value": null, "metadata": {"epoch_num": 9, "file": "train.py", "lineno": 673}}
Iteration: 1960, Loss function: 4.298, Average Loss: 4.788, avg. samples / sec: 2157.71
Iteration: 1980, Loss function: 4.536, Average Loss: 4.786, avg. samples / sec: 2161.86
Iteration: 2000, Loss function: 4.323, Average Loss: 4.783, avg. samples / sec: 2154.97
Iteration: 2020, Loss function: 4.922, Average Loss: 4.780, avg. samples / sec: 2160.21
Iteration: 2040, Loss function: 4.491, Average Loss: 4.777, avg. samples / sec: 2148.43
Iteration: 2060, Loss function: 4.379, Average Loss: 4.774, avg. samples / sec: 2161.12
Iteration: 2080, Loss function: 4.787, Average Loss: 4.772, avg. samples / sec: 2164.71
Iteration: 2100, Loss function: 4.691, Average Loss: 4.770, avg. samples / sec: 2154.80
Iteration: 2120, Loss function: 4.536, Average Loss: 4.768, avg. samples / sec: 2162.26
Iteration: 2140, Loss function: 4.672, Average Loss: 4.765, avg. samples / sec: 2157.94
Iteration: 2160, Loss function: 4.340, Average Loss: 4.763, avg. samples / sec: 2157.02
Iteration: 2180, Loss function: 4.891, Average Loss: 4.758, avg. samples / sec: 2156.81
:::MLL 1596480037.724 epoch_stop: {"value": null, "metadata": {"epoch_num": 9, "file": "train.py", "lineno": 819}}
:::MLL 1596480037.725 epoch_start: {"value": null, "metadata": {"epoch_num": 10, "file": "train.py", "lineno": 673}}
Iteration: 2200, Loss function: 4.810, Average Loss: 4.756, avg. samples / sec: 2156.05
Iteration: 2220, Loss function: 4.256, Average Loss: 4.751, avg. samples / sec: 2154.81
Iteration: 2240, Loss function: 4.285, Average Loss: 4.746, avg. samples / sec: 2159.92
Iteration: 2260, Loss function: 4.480, Average Loss: 4.743, avg. samples / sec: 2161.12
Iteration: 2280, Loss function: 4.617, Average Loss: 4.739, avg. samples / sec: 2153.14
Iteration: 2300, Loss function: 4.374, Average Loss: 4.734, avg. samples / sec: 2166.04
Iteration: 2320, Loss function: 4.318, Average Loss: 4.732, avg. samples / sec: 2154.29
Iteration: 2340, Loss function: 4.299, Average Loss: 4.729, avg. samples / sec: 2152.77
Iteration: 2360, Loss function: 4.602, Average Loss: 4.726, avg. samples / sec: 2160.52
Iteration: 2380, Loss function: 4.740, Average Loss: 4.725, avg. samples / sec: 2154.42
Iteration: 2400, Loss function: 4.620, Average Loss: 4.721, avg. samples / sec: 2154.57
Iteration: 2420, Loss function: 4.693, Average Loss: 4.718, avg. samples / sec: 2159.82
Iteration: 2440, Loss function: 4.801, Average Loss: 4.714, avg. samples / sec: 2160.95
:::MLL 1596480092.241 epoch_stop: {"value": null, "metadata": {"epoch_num": 10, "file": "train.py", "lineno": 819}}
:::MLL 1596480092.241 epoch_start: {"value": null, "metadata": {"epoch_num": 11, "file": "train.py", "lineno": 673}}
Iteration: 2460, Loss function: 4.196, Average Loss: 4.709, avg. samples / sec: 2153.58
Iteration: 2480, Loss function: 4.854, Average Loss: 4.704, avg. samples / sec: 2162.23
Iteration: 2500, Loss function: 4.304, Average Loss: 4.699, avg. samples / sec: 2160.07
Iteration: 2520, Loss function: 4.555, Average Loss: 4.695, avg. samples / sec: 2162.03
Iteration: 2540, Loss function: 4.763, Average Loss: 4.691, avg. samples / sec: 2157.23
Iteration: 2560, Loss function: 4.508, Average Loss: 4.687, avg. samples / sec: 2159.38
Iteration: 2580, Loss function: 4.671, Average Loss: 4.683, avg. samples / sec: 2160.13
Iteration: 2600, Loss function: 4.279, Average Loss: 4.679, avg. samples / sec: 2154.50
Iteration: 2620, Loss function: 4.454, Average Loss: 4.676, avg. samples / sec: 2151.77
Iteration: 2640, Loss function: 4.323, Average Loss: 4.671, avg. samples / sec: 2162.91
Iteration: 2660, Loss function: 4.361, Average Loss: 4.667, avg. samples / sec: 2158.13
Iteration: 2680, Loss function: 4.550, Average Loss: 4.662, avg. samples / sec: 2160.94
:::MLL 1596480146.490 epoch_stop: {"value": null, "metadata": {"epoch_num": 11, "file": "train.py", "lineno": 819}}
:::MLL 1596480146.491 epoch_start: {"value": null, "metadata": {"epoch_num": 12, "file": "train.py", "lineno": 673}}
Iteration: 2700, Loss function: 4.522, Average Loss: 4.658, avg. samples / sec: 2150.43
Iteration: 2720, Loss function: 4.346, Average Loss: 4.652, avg. samples / sec: 2162.68
Iteration: 2740, Loss function: 4.682, Average Loss: 4.648, avg. samples / sec: 2151.40
Iteration: 2760, Loss function: 4.268, Average Loss: 4.643, avg. samples / sec: 2164.40
Iteration: 2780, Loss function: 4.516, Average Loss: 4.640, avg. samples / sec: 2152.09
Iteration: 2800, Loss function: 4.504, Average Loss: 4.635, avg. samples / sec: 2159.90
Iteration: 2820, Loss function: 4.559, Average Loss: 4.631, avg. samples / sec: 2161.87
Iteration: 2840, Loss function: 4.125, Average Loss: 4.626, avg. samples / sec: 2159.50
Iteration: 2860, Loss function: 5.070, Average Loss: 4.622, avg. samples / sec: 2161.84
Iteration: 2880, Loss function: 4.319, Average Loss: 4.617, avg. samples / sec: 2160.47
Iteration: 2900, Loss function: 4.284, Average Loss: 4.611, avg. samples / sec: 2160.57
Iteration: 2920, Loss function: 4.509, Average Loss: 4.606, avg. samples / sec: 2163.98
:::MLL 1596480200.738 epoch_stop: {"value": null, "metadata": {"epoch_num": 12, "file": "train.py", "lineno": 819}}
:::MLL 1596480200.739 epoch_start: {"value": null, "metadata": {"epoch_num": 13, "file": "train.py", "lineno": 673}}
Iteration: 2940, Loss function: 4.515, Average Loss: 4.601, avg. samples / sec: 2155.98
Iteration: 2960, Loss function: 4.319, Average Loss: 4.597, avg. samples / sec: 2159.77
Iteration: 2980, Loss function: 4.125, Average Loss: 4.591, avg. samples / sec: 2160.71
Iteration: 3000, Loss function: 4.384, Average Loss: 4.586, avg. samples / sec: 2157.09
Iteration: 3020, Loss function: 4.211, Average Loss: 4.581, avg. samples / sec: 2162.21
Iteration: 3040, Loss function: 4.143, Average Loss: 4.577, avg. samples / sec: 2157.44
Iteration: 3060, Loss function: 4.280, Average Loss: 4.572, avg. samples / sec: 2164.27
Iteration: 3080, Loss function: 4.305, Average Loss: 4.567, avg. samples / sec: 2159.48
Iteration: 3100, Loss function: 4.473, Average Loss: 4.564, avg. samples / sec: 2160.43
Iteration: 3120, Loss function: 4.506, Average Loss: 4.559, avg. samples / sec: 2162.44
Iteration: 3140, Loss function: 4.397, Average Loss: 4.556, avg. samples / sec: 2158.02
Iteration: 3160, Loss function: 4.209, Average Loss: 4.551, avg. samples / sec: 2158.04
:::MLL 1596480254.977 epoch_stop: {"value": null, "metadata": {"epoch_num": 13, "file": "train.py", "lineno": 819}}
:::MLL 1596480254.978 epoch_start: {"value": null, "metadata": {"epoch_num": 14, "file": "train.py", "lineno": 673}}
Iteration: 3180, Loss function: 4.485, Average Loss: 4.547, avg. samples / sec: 2151.44
Iteration: 3200, Loss function: 4.214, Average Loss: 4.542, avg. samples / sec: 2158.85
Iteration: 3220, Loss function: 4.349, Average Loss: 4.538, avg. samples / sec: 2159.43
Iteration: 3240, Loss function: 4.151, Average Loss: 4.533, avg. samples / sec: 2157.45
Iteration: 3260, Loss function: 4.180, Average Loss: 4.528, avg. samples / sec: 2160.78
Iteration: 3280, Loss function: 4.311, Average Loss: 4.523, avg. samples / sec: 2157.96
Iteration: 3300, Loss function: 4.241, Average Loss: 4.518, avg. samples / sec: 2159.85
Iteration: 3320, Loss function: 4.148, Average Loss: 4.513, avg. samples / sec: 2160.49
Iteration: 3340, Loss function: 4.148, Average Loss: 4.509, avg. samples / sec: 2163.97
Iteration: 3360, Loss function: 4.567, Average Loss: 4.504, avg. samples / sec: 2160.32
Iteration: 3380, Loss function: 4.336, Average Loss: 4.500, avg. samples / sec: 2157.33
Iteration: 3400, Loss function: 4.112, Average Loss: 4.496, avg. samples / sec: 2154.55
Iteration: 3420, Loss function: 4.218, Average Loss: 4.491, avg. samples / sec: 2159.75
:::MLL 1596480309.456 epoch_stop: {"value": null, "metadata": {"epoch_num": 14, "file": "train.py", "lineno": 819}}
:::MLL 1596480309.457 epoch_start: {"value": null, "metadata": {"epoch_num": 15, "file": "train.py", "lineno": 673}}
Iteration: 3440, Loss function: 4.156, Average Loss: 4.487, avg. samples / sec: 2149.22
Iteration: 3460, Loss function: 4.125, Average Loss: 4.482, avg. samples / sec: 2163.94
Iteration: 3480, Loss function: 4.278, Average Loss: 4.479, avg. samples / sec: 2155.79
Iteration: 3500, Loss function: 4.079, Average Loss: 4.473, avg. samples / sec: 2165.56
Iteration: 3520, Loss function: 4.325, Average Loss: 4.470, avg. samples / sec: 2155.84
Iteration: 3540, Loss function: 4.139, Average Loss: 4.466, avg. samples / sec: 2161.33
Iteration: 3560, Loss function: 4.194, Average Loss: 4.461, avg. samples / sec: 2164.46
Iteration: 3580, Loss function: 4.327, Average Loss: 4.456, avg. samples / sec: 2164.18
Iteration: 3600, Loss function: 4.453, Average Loss: 4.453, avg. samples / sec: 2160.17
Iteration: 3620, Loss function: 4.233, Average Loss: 4.448, avg. samples / sec: 2155.53
Iteration: 3640, Loss function: 4.040, Average Loss: 4.443, avg. samples / sec: 2158.90
Iteration: 3660, Loss function: 4.036, Average Loss: 4.439, avg. samples / sec: 2164.56
:::MLL 1596480363.680 epoch_stop: {"value": null, "metadata": {"epoch_num": 15, "file": "train.py", "lineno": 819}}
:::MLL 1596480363.680 epoch_start: {"value": null, "metadata": {"epoch_num": 16, "file": "train.py", "lineno": 673}}
Iteration: 3680, Loss function: 4.446, Average Loss: 4.436, avg. samples / sec: 2148.28
Iteration: 3700, Loss function: 4.312, Average Loss: 4.431, avg. samples / sec: 2157.80
Iteration: 3720, Loss function: 3.956, Average Loss: 4.427, avg. samples / sec: 2156.19
Iteration: 3740, Loss function: 3.985, Average Loss: 4.422, avg. samples / sec: 2154.17
Iteration: 3760, Loss function: 4.115, Average Loss: 4.418, avg. samples / sec: 2163.16
Iteration: 3780, Loss function: 4.360, Average Loss: 4.412, avg. samples / sec: 2164.31
Iteration: 3800, Loss function: 4.727, Average Loss: 4.408, avg. samples / sec: 2152.51
Iteration: 3820, Loss function: 4.422, Average Loss: 4.403, avg. samples / sec: 2156.92
Iteration: 3840, Loss function: 4.487, Average Loss: 4.400, avg. samples / sec: 2161.71
Iteration: 3860, Loss function: 4.122, Average Loss: 4.396, avg. samples / sec: 2160.30
Iteration: 3880, Loss function: 4.062, Average Loss: 4.392, avg. samples / sec: 2156.46
Iteration: 3900, Loss function: 3.979, Average Loss: 4.388, avg. samples / sec: 2158.01
:::MLL 1596480417.966 epoch_stop: {"value": null, "metadata": {"epoch_num": 16, "file": "train.py", "lineno": 819}}
:::MLL 1596480417.966 epoch_start: {"value": null, "metadata": {"epoch_num": 17, "file": "train.py", "lineno": 673}}
Iteration: 3920, Loss function: 4.211, Average Loss: 4.384, avg. samples / sec: 2154.66
Iteration: 3940, Loss function: 4.120, Average Loss: 4.379, avg. samples / sec: 2160.69
Iteration: 3960, Loss function: 4.477, Average Loss: 4.376, avg. samples / sec: 2154.10
Iteration: 3980, Loss function: 4.019, Average Loss: 4.371, avg. samples / sec: 2157.67
Iteration: 4000, Loss function: 4.471, Average Loss: 4.367, avg. samples / sec: 2155.74
Iteration: 4020, Loss function: 4.173, Average Loss: 4.363, avg. samples / sec: 2162.27
Iteration: 4040, Loss function: 3.977, Average Loss: 4.359, avg. samples / sec: 2156.91
Iteration: 4060, Loss function: 4.361, Average Loss: 4.355, avg. samples / sec: 2166.89
Iteration: 4080, Loss function: 4.232, Average Loss: 4.351, avg. samples / sec: 2159.08
Iteration: 4100, Loss function: 4.065, Average Loss: 4.349, avg. samples / sec: 2157.47
Iteration: 4120, Loss function: 4.260, Average Loss: 4.346, avg. samples / sec: 2163.69
Iteration: 4140, Loss function: 4.014, Average Loss: 4.342, avg. samples / sec: 2159.07
:::MLL 1596480472.438 epoch_stop: {"value": null, "metadata": {"epoch_num": 17, "file": "train.py", "lineno": 819}}
:::MLL 1596480472.439 epoch_start: {"value": null, "metadata": {"epoch_num": 18, "file": "train.py", "lineno": 673}}
Iteration: 4160, Loss function: 4.299, Average Loss: 4.339, avg. samples / sec: 2151.55
Iteration: 4180, Loss function: 4.331, Average Loss: 4.334, avg. samples / sec: 2161.40
Iteration: 4200, Loss function: 4.325, Average Loss: 4.329, avg. samples / sec: 2163.85
Iteration: 4220, Loss function: 4.086, Average Loss: 4.326, avg. samples / sec: 2161.62
Iteration: 4240, Loss function: 4.109, Average Loss: 4.323, avg. samples / sec: 2155.26
Iteration: 4260, Loss function: 4.171, Average Loss: 4.318, avg. samples / sec: 2158.94
Iteration: 4280, Loss function: 4.273, Average Loss: 4.314, avg. samples / sec: 2156.38
Iteration: 4300, Loss function: 3.779, Average Loss: 4.313, avg. samples / sec: 2164.21
Iteration: 4320, Loss function: 4.061, Average Loss: 4.309, avg. samples / sec: 2163.39
Iteration: 4340, Loss function: 4.082, Average Loss: 4.305, avg. samples / sec: 2158.84
Iteration: 4360, Loss function: 4.290, Average Loss: 4.303, avg. samples / sec: 2157.31
Iteration: 4380, Loss function: 3.997, Average Loss: 4.300, avg. samples / sec: 2157.65
:::MLL 1596480526.671 epoch_stop: {"value": null, "metadata": {"epoch_num": 18, "file": "train.py", "lineno": 819}}
:::MLL 1596480526.672 epoch_start: {"value": null, "metadata": {"epoch_num": 19, "file": "train.py", "lineno": 673}}
Iteration: 4400, Loss function: 3.992, Average Loss: 4.296, avg. samples / sec: 2156.02
Iteration: 4420, Loss function: 4.210, Average Loss: 4.293, avg. samples / sec: 2162.95
Iteration: 4440, Loss function: 4.340, Average Loss: 4.288, avg. samples / sec: 2163.16
Iteration: 4460, Loss function: 4.250, Average Loss: 4.285, avg. samples / sec: 2162.34
Iteration: 4480, Loss function: 4.166, Average Loss: 4.281, avg. samples / sec: 2159.10
Iteration: 4500, Loss function: 4.089, Average Loss: 4.279, avg. samples / sec: 2164.97
Iteration: 4520, Loss function: 4.005, Average Loss: 4.275, avg. samples / sec: 2161.36
Iteration: 4540, Loss function: 4.003, Average Loss: 4.272, avg. samples / sec: 2164.87
Iteration: 4560, Loss function: 4.550, Average Loss: 4.269, avg. samples / sec: 2161.34
Iteration: 4580, Loss function: 4.222, Average Loss: 4.265, avg. samples / sec: 2155.01
Iteration: 4600, Loss function: 4.467, Average Loss: 4.262, avg. samples / sec: 2158.27
Iteration: 4620, Loss function: 3.980, Average Loss: 4.259, avg. samples / sec: 2157.51
Iteration: 4640, Loss function: 4.084, Average Loss: 4.257, avg. samples / sec: 2161.24
:::MLL 1596480580.883 epoch_stop: {"value": null, "metadata": {"epoch_num": 19, "file": "train.py", "lineno": 819}}
:::MLL 1596480580.883 epoch_start: {"value": null, "metadata": {"epoch_num": 20, "file": "train.py", "lineno": 673}}
Iteration: 4660, Loss function: 4.123, Average Loss: 4.252, avg. samples / sec: 2154.55
Iteration: 4680, Loss function: 3.903, Average Loss: 4.249, avg. samples / sec: 2151.80
Iteration: 4700, Loss function: 4.265, Average Loss: 4.245, avg. samples / sec: 2158.35
Iteration: 4720, Loss function: 4.026, Average Loss: 4.242, avg. samples / sec: 2159.21
Iteration: 4740, Loss function: 3.766, Average Loss: 4.239, avg. samples / sec: 2159.57
Iteration: 4760, Loss function: 4.027, Average Loss: 4.235, avg. samples / sec: 2159.85
Iteration: 4780, Loss function: 4.393, Average Loss: 4.230, avg. samples / sec: 2156.84
Iteration: 4800, Loss function: 4.010, Average Loss: 4.228, avg. samples / sec: 2160.09
Iteration: 4820, Loss function: 4.136, Average Loss: 4.225, avg. samples / sec: 2160.08
Iteration: 4840, Loss function: 3.980, Average Loss: 4.222, avg. samples / sec: 2156.41
Iteration: 4860, Loss function: 4.360, Average Loss: 4.220, avg. samples / sec: 2161.53
Iteration: 4880, Loss function: 4.039, Average Loss: 4.217, avg. samples / sec: 2154.16
:::MLL 1596480635.383 epoch_stop: {"value": null, "metadata": {"epoch_num": 20, "file": "train.py", "lineno": 819}}
:::MLL 1596480635.383 epoch_start: {"value": null, "metadata": {"epoch_num": 21, "file": "train.py", "lineno": 673}}
Iteration: 4900, Loss function: 3.766, Average Loss: 4.213, avg. samples / sec: 2150.53
Iteration: 4920, Loss function: 3.909, Average Loss: 4.210, avg. samples / sec: 2156.41
Iteration: 4940, Loss function: 4.042, Average Loss: 4.205, avg. samples / sec: 2159.91
Iteration: 4960, Loss function: 4.007, Average Loss: 4.202, avg. samples / sec: 2156.67
Iteration: 4980, Loss function: 4.399, Average Loss: 4.200, avg. samples / sec: 2152.86
Iteration: 5000, Loss function: 4.069, Average Loss: 4.197, avg. samples / sec: 2157.00
Iteration: 5020, Loss function: 4.131, Average Loss: 4.194, avg. samples / sec: 2154.31
Iteration: 5040, Loss function: 3.919, Average Loss: 4.192, avg. samples / sec: 2155.14
Iteration: 5060, Loss function: 4.102, Average Loss: 4.189, avg. samples / sec: 2159.87
Iteration: 5080, Loss function: 4.234, Average Loss: 4.187, avg. samples / sec: 2156.58
Iteration: 5100, Loss function: 4.227, Average Loss: 4.186, avg. samples / sec: 2154.93
Iteration: 5120, Loss function: 3.796, Average Loss: 4.183, avg. samples / sec: 2158.16
:::MLL 1596480689.709 epoch_stop: {"value": null, "metadata": {"epoch_num": 21, "file": "train.py", "lineno": 819}}
:::MLL 1596480689.710 epoch_start: {"value": null, "metadata": {"epoch_num": 22, "file": "train.py", "lineno": 673}}
Iteration: 5140, Loss function: 4.078, Average Loss: 4.180, avg. samples / sec: 2150.09
Iteration: 5160, Loss function: 3.865, Average Loss: 4.176, avg. samples / sec: 2157.26
Iteration: 5180, Loss function: 3.856, Average Loss: 4.172, avg. samples / sec: 2160.17
Iteration: 5200, Loss function: 4.226, Average Loss: 4.168, avg. samples / sec: 2160.80
Iteration: 5220, Loss function: 3.712, Average Loss: 4.165, avg. samples / sec: 2159.35
Iteration: 5240, Loss function: 4.145, Average Loss: 4.163, avg. samples / sec: 2163.44
Iteration: 5260, Loss function: 4.178, Average Loss: 4.161, avg. samples / sec: 2155.93
Iteration: 5280, Loss function: 4.043, Average Loss: 4.159, avg. samples / sec: 2164.58
Iteration: 5300, Loss function: 3.859, Average Loss: 4.156, avg. samples / sec: 2157.55
Iteration: 5320, Loss function: 3.925, Average Loss: 4.153, avg. samples / sec: 2161.92
Iteration: 5340, Loss function: 4.105, Average Loss: 4.152, avg. samples / sec: 2158.87
Iteration: 5360, Loss function: 3.873, Average Loss: 4.150, avg. samples / sec: 2154.39
:::MLL 1596480743.963 epoch_stop: {"value": null, "metadata": {"epoch_num": 22, "file": "train.py", "lineno": 819}}
:::MLL 1596480743.964 epoch_start: {"value": null, "metadata": {"epoch_num": 23, "file": "train.py", "lineno": 673}}
Iteration: 5380, Loss function: 4.001, Average Loss: 4.147, avg. samples / sec: 2152.11
Iteration: 5400, Loss function: 4.002, Average Loss: 4.144, avg. samples / sec: 2159.55
Iteration: 5420, Loss function: 4.236, Average Loss: 4.142, avg. samples / sec: 2160.82
Iteration: 5440, Loss function: 4.035, Average Loss: 4.138, avg. samples / sec: 2161.06
Iteration: 5460, Loss function: 4.059, Average Loss: 4.136, avg. samples / sec: 2162.96
Iteration: 5480, Loss function: 3.971, Average Loss: 4.134, avg. samples / sec: 2161.25
Iteration: 5500, Loss function: 3.972, Average Loss: 4.130, avg. samples / sec: 2159.53
Iteration: 5520, Loss function: 3.861, Average Loss: 4.127, avg. samples / sec: 2152.00
Iteration: 5540, Loss function: 4.353, Average Loss: 4.126, avg. samples / sec: 2153.90
Iteration: 5560, Loss function: 4.084, Average Loss: 4.123, avg. samples / sec: 2159.93
Iteration: 5580, Loss function: 3.999, Average Loss: 4.121, avg. samples / sec: 2156.96
Iteration: 5600, Loss function: 4.050, Average Loss: 4.119, avg. samples / sec: 2157.83
:::MLL 1596480798.230 epoch_stop: {"value": null, "metadata": {"epoch_num": 23, "file": "train.py", "lineno": 819}}
:::MLL 1596480798.230 epoch_start: {"value": null, "metadata": {"epoch_num": 24, "file": "train.py", "lineno": 673}}
Iteration: 5620, Loss function: 3.965, Average Loss: 4.117, avg. samples / sec: 2155.74
Iteration: 5640, Loss function: 3.910, Average Loss: 4.112, avg. samples / sec: 2154.94
Iteration: 5660, Loss function: 3.904, Average Loss: 4.109, avg. samples / sec: 2157.27
Iteration: 5680, Loss function: 3.948, Average Loss: 4.106, avg. samples / sec: 2159.06
Iteration: 5700, Loss function: 4.297, Average Loss: 4.105, avg. samples / sec: 2164.41
Iteration: 5720, Loss function: 4.062, Average Loss: 4.102, avg. samples / sec: 2157.56
Iteration: 5740, Loss function: 3.897, Average Loss: 4.101, avg. samples / sec: 2162.14
Iteration: 5760, Loss function: 3.779, Average Loss: 4.097, avg. samples / sec: 2159.34
Iteration: 5780, Loss function: 3.985, Average Loss: 4.096, avg. samples / sec: 2156.78
Iteration: 5800, Loss function: 4.293, Average Loss: 4.094, avg. samples / sec: 2158.05
Iteration: 5820, Loss function: 3.771, Average Loss: 4.093, avg. samples / sec: 2157.63
Iteration: 5840, Loss function: 3.888, Average Loss: 4.091, avg. samples / sec: 2155.72
Iteration: 5860, Loss function: 3.951, Average Loss: 4.089, avg. samples / sec: 2154.48
:::MLL 1596480852.731 epoch_stop: {"value": null, "metadata": {"epoch_num": 24, "file": "train.py", "lineno": 819}}
:::MLL 1596480852.731 epoch_start: {"value": null, "metadata": {"epoch_num": 25, "file": "train.py", "lineno": 673}}
Iteration: 5880, Loss function: 3.919, Average Loss: 4.086, avg. samples / sec: 2150.65
Iteration: 5900, Loss function: 3.863, Average Loss: 4.082, avg. samples / sec: 2161.49
Iteration: 5920, Loss function: 3.884, Average Loss: 4.077, avg. samples / sec: 2159.71
Iteration: 5940, Loss function: 3.872, Average Loss: 4.075, avg. samples / sec: 2163.01
Iteration: 5960, Loss function: 3.909, Average Loss: 4.073, avg. samples / sec: 2152.70
Iteration: 5980, Loss function: 3.738, Average Loss: 4.071, avg. samples / sec: 2158.93
Iteration: 6000, Loss function: 3.696, Average Loss: 4.069, avg. samples / sec: 2163.44
Iteration: 6020, Loss function: 3.783, Average Loss: 4.066, avg. samples / sec: 2163.25
Iteration: 6040, Loss function: 3.888, Average Loss: 4.063, avg. samples / sec: 2159.56
Iteration: 6060, Loss function: 3.766, Average Loss: 4.061, avg. samples / sec: 2155.66
Iteration: 6080, Loss function: 3.921, Average Loss: 4.060, avg. samples / sec: 2159.57
Iteration: 6100, Loss function: 3.696, Average Loss: 4.059, avg. samples / sec: 2158.10
:::MLL 1596480906.988 epoch_stop: {"value": null, "metadata": {"epoch_num": 25, "file": "train.py", "lineno": 819}}
:::MLL 1596480906.988 epoch_start: {"value": null, "metadata": {"epoch_num": 26, "file": "train.py", "lineno": 673}}
Iteration: 6120, Loss function: 3.954, Average Loss: 4.057, avg. samples / sec: 2148.27
Iteration: 6140, Loss function: 3.765, Average Loss: 4.053, avg. samples / sec: 2161.21
Iteration: 6160, Loss function: 3.696, Average Loss: 4.050, avg. samples / sec: 2154.37
Iteration: 6180, Loss function: 3.989, Average Loss: 4.047, avg. samples / sec: 2160.51
Iteration: 6200, Loss function: 3.662, Average Loss: 4.045, avg. samples / sec: 2160.90
Iteration: 6220, Loss function: 3.626, Average Loss: 4.043, avg. samples / sec: 2160.90
Iteration: 6240, Loss function: 3.757, Average Loss: 4.040, avg. samples / sec: 2159.76
Iteration: 6260, Loss function: 3.954, Average Loss: 4.039, avg. samples / sec: 2162.14
Iteration: 6280, Loss function: 3.677, Average Loss: 4.037, avg. samples / sec: 2158.22
Iteration: 6300, Loss function: 3.757, Average Loss: 4.036, avg. samples / sec: 2158.30
Iteration: 6320, Loss function: 4.010, Average Loss: 4.034, avg. samples / sec: 2150.87
Iteration: 6340, Loss function: 3.905, Average Loss: 4.033, avg. samples / sec: 2158.46
:::MLL 1596480961.264 epoch_stop: {"value": null, "metadata": {"epoch_num": 26, "file": "train.py", "lineno": 819}}
:::MLL 1596480961.264 epoch_start: {"value": null, "metadata": {"epoch_num": 27, "file": "train.py", "lineno": 673}}
Iteration: 6360, Loss function: 4.010, Average Loss: 4.031, avg. samples / sec: 2152.97
Iteration: 6380, Loss function: 3.847, Average Loss: 4.029, avg. samples / sec: 2158.93
Iteration: 6400, Loss function: 3.988, Average Loss: 4.026, avg. samples / sec: 2162.26
Iteration: 6420, Loss function: 3.583, Average Loss: 4.023, avg. samples / sec: 2159.83
Iteration: 6440, Loss function: 3.677, Average Loss: 4.020, avg. samples / sec: 2153.38
Iteration: 6460, Loss function: 4.038, Average Loss: 4.017, avg. samples / sec: 2156.10
Iteration: 6480, Loss function: 3.818, Average Loss: 4.015, avg. samples / sec: 2157.75
Iteration: 6500, Loss function: 3.779, Average Loss: 4.012, avg. samples / sec: 2160.70
Iteration: 6520, Loss function: 4.000, Average Loss: 4.011, avg. samples / sec: 2161.56
Iteration: 6540, Loss function: 3.770, Average Loss: 4.009, avg. samples / sec: 2163.00
Iteration: 6560, Loss function: 3.731, Average Loss: 4.008, avg. samples / sec: 2160.18
Iteration: 6580, Loss function: 4.048, Average Loss: 4.007, avg. samples / sec: 2164.20
:::MLL 1596481015.724 epoch_stop: {"value": null, "metadata": {"epoch_num": 27, "file": "train.py", "lineno": 819}}
:::MLL 1596481015.725 epoch_start: {"value": null, "metadata": {"epoch_num": 28, "file": "train.py", "lineno": 673}}
Iteration: 6600, Loss function: 3.615, Average Loss: 4.005, avg. samples / sec: 2154.10
Iteration: 6620, Loss function: 3.934, Average Loss: 4.004, avg. samples / sec: 2157.89
Iteration: 6640, Loss function: 4.069, Average Loss: 4.002, avg. samples / sec: 2161.21
Iteration: 6660, Loss function: 3.870, Average Loss: 3.999, avg. samples / sec: 2161.08
Iteration: 6680, Loss function: 3.847, Average Loss: 3.997, avg. samples / sec: 2156.68
Iteration: 6700, Loss function: 4.199, Average Loss: 3.994, avg. samples / sec: 2155.52
Iteration: 6720, Loss function: 3.991, Average Loss: 3.993, avg. samples / sec: 2156.30
Iteration: 6740, Loss function: 3.897, Average Loss: 3.990, avg. samples / sec: 2157.64
Iteration: 6760, Loss function: 3.784, Average Loss: 3.988, avg. samples / sec: 2158.16
Iteration: 6780, Loss function: 4.239, Average Loss: 3.987, avg. samples / sec: 2164.47
Iteration: 6800, Loss function: 3.871, Average Loss: 3.987, avg. samples / sec: 2165.53
Iteration: 6820, Loss function: 3.838, Average Loss: 3.986, avg. samples / sec: 2162.49
Iteration: 6840, Loss function: 3.652, Average Loss: 3.985, avg. samples / sec: 2155.52
:::MLL 1596481069.976 epoch_stop: {"value": null, "metadata": {"epoch_num": 28, "file": "train.py", "lineno": 819}}
:::MLL 1596481069.977 epoch_start: {"value": null, "metadata": {"epoch_num": 29, "file": "train.py", "lineno": 673}}
Iteration: 6860, Loss function: 3.879, Average Loss: 3.982, avg. samples / sec: 2153.93
Iteration: 6880, Loss function: 4.029, Average Loss: 3.980, avg. samples / sec: 2162.57
Iteration: 6900, Loss function: 3.860, Average Loss: 3.976, avg. samples / sec: 2158.00
Iteration: 6920, Loss function: 3.828, Average Loss: 3.974, avg. samples / sec: 2157.75
Iteration: 6940, Loss function: 3.702, Average Loss: 3.973, avg. samples / sec: 2158.96
Iteration: 6960, Loss function: 3.953, Average Loss: 3.970, avg. samples / sec: 2156.08
Iteration: 6980, Loss function: 3.951, Average Loss: 3.968, avg. samples / sec: 2156.65
Iteration: 7000, Loss function: 3.761, Average Loss: 3.966, avg. samples / sec: 2156.52
Iteration: 7020, Loss function: 4.069, Average Loss: 3.965, avg. samples / sec: 2153.93
Iteration: 7040, Loss function: 3.687, Average Loss: 3.963, avg. samples / sec: 2158.35
Iteration: 7060, Loss function: 3.927, Average Loss: 3.961, avg. samples / sec: 2162.74
Iteration: 7080, Loss function: 3.998, Average Loss: 3.959, avg. samples / sec: 2158.71
:::MLL 1596481124.249 epoch_stop: {"value": null, "metadata": {"epoch_num": 29, "file": "train.py", "lineno": 819}}
:::MLL 1596481124.250 epoch_start: {"value": null, "metadata": {"epoch_num": 30, "file": "train.py", "lineno": 673}}
Iteration: 7100, Loss function: 4.112, Average Loss: 3.958, avg. samples / sec: 2153.73
Iteration: 7120, Loss function: 3.807, Average Loss: 3.955, avg. samples / sec: 2155.98
Iteration: 7140, Loss function: 3.739, Average Loss: 3.953, avg. samples / sec: 2159.02
Iteration: 7160, Loss function: 3.835, Average Loss: 3.952, avg. samples / sec: 2159.56
Iteration: 7180, Loss function: 4.076, Average Loss: 3.951, avg. samples / sec: 2157.65
Iteration: 7200, Loss function: 3.694, Average Loss: 3.950, avg. samples / sec: 2151.27
Iteration: 7220, Loss function: 4.253, Average Loss: 3.948, avg. samples / sec: 2155.33
Iteration: 7240, Loss function: 3.681, Average Loss: 3.946, avg. samples / sec: 2153.58
Iteration: 7260, Loss function: 3.633, Average Loss: 3.943, avg. samples / sec: 2159.88
Iteration: 7280, Loss function: 4.122, Average Loss: 3.941, avg. samples / sec: 2152.36
Iteration: 7300, Loss function: 3.870, Average Loss: 3.940, avg. samples / sec: 2157.78
Iteration: 7320, Loss function: 3.878, Average Loss: 3.940, avg. samples / sec: 2157.15
:::MLL 1596481178.797 epoch_stop: {"value": null, "metadata": {"epoch_num": 30, "file": "train.py", "lineno": 819}}
:::MLL 1596481178.797 epoch_start: {"value": null, "metadata": {"epoch_num": 31, "file": "train.py", "lineno": 673}}
Iteration: 7340, Loss function: 3.699, Average Loss: 3.938, avg. samples / sec: 2151.15
Iteration: 7360, Loss function: 3.670, Average Loss: 3.936, avg. samples / sec: 2155.38
Iteration: 7380, Loss function: 3.838, Average Loss: 3.934, avg. samples / sec: 2159.04
Iteration: 7400, Loss function: 3.631, Average Loss: 3.932, avg. samples / sec: 2155.24
Iteration: 7420, Loss function: 3.689, Average Loss: 3.931, avg. samples / sec: 2157.41
Iteration: 7440, Loss function: 3.871, Average Loss: 3.928, avg. samples / sec: 2158.12
Iteration: 7460, Loss function: 3.878, Average Loss: 3.927, avg. samples / sec: 2161.10
Iteration: 7480, Loss function: 3.930, Average Loss: 3.925, avg. samples / sec: 2158.26
Iteration: 7500, Loss function: 3.489, Average Loss: 3.925, avg. samples / sec: 2156.36
Iteration: 7520, Loss function: 3.768, Average Loss: 3.925, avg. samples / sec: 2157.30
Iteration: 7540, Loss function: 3.571, Average Loss: 3.923, avg. samples / sec: 2157.04
Iteration: 7560, Loss function: 3.694, Average Loss: 3.922, avg. samples / sec: 2157.14
:::MLL 1596481233.101 epoch_stop: {"value": null, "metadata": {"epoch_num": 31, "file": "train.py", "lineno": 819}}
:::MLL 1596481233.102 epoch_start: {"value": null, "metadata": {"epoch_num": 32, "file": "train.py", "lineno": 673}}
Iteration: 7580, Loss function: 3.924, Average Loss: 3.919, avg. samples / sec: 2147.91
Iteration: 7600, Loss function: 3.780, Average Loss: 3.917, avg. samples / sec: 2152.56
Iteration: 7620, Loss function: 4.150, Average Loss: 3.916, avg. samples / sec: 2155.62
Iteration: 7640, Loss function: 3.677, Average Loss: 3.915, avg. samples / sec: 2160.87
Iteration: 7660, Loss function: 3.770, Average Loss: 3.913, avg. samples / sec: 2161.66
Iteration: 7680, Loss function: 3.702, Average Loss: 3.912, avg. samples / sec: 2153.39
Iteration: 7700, Loss function: 3.846, Average Loss: 3.910, avg. samples / sec: 2163.73
Iteration: 7720, Loss function: 3.724, Average Loss: 3.910, avg. samples / sec: 2157.08
Iteration: 7740, Loss function: 3.712, Average Loss: 3.907, avg. samples / sec: 2159.77
Iteration: 7760, Loss function: 3.650, Average Loss: 3.906, avg. samples / sec: 2159.58
Iteration: 7780, Loss function: 3.723, Average Loss: 3.904, avg. samples / sec: 2163.22
Iteration: 7800, Loss function: 3.934, Average Loss: 3.902, avg. samples / sec: 2156.48
:::MLL 1596481287.372 epoch_stop: {"value": null, "metadata": {"epoch_num": 32, "file": "train.py", "lineno": 819}}
:::MLL 1596481287.373 epoch_start: {"value": null, "metadata": {"epoch_num": 33, "file": "train.py", "lineno": 673}}
Iteration: 7820, Loss function: 3.696, Average Loss: 3.902, avg. samples / sec: 2153.34
Iteration: 7840, Loss function: 3.769, Average Loss: 3.901, avg. samples / sec: 2159.35
Iteration: 7860, Loss function: 3.842, Average Loss: 3.899, avg. samples / sec: 2154.59
Iteration: 7880, Loss function: 4.000, Average Loss: 3.897, avg. samples / sec: 2157.98
Iteration: 7900, Loss function: 3.511, Average Loss: 3.894, avg. samples / sec: 2156.98
Iteration: 7920, Loss function: 3.779, Average Loss: 3.892, avg. samples / sec: 2153.65
Iteration: 7940, Loss function: 3.964, Average Loss: 3.890, avg. samples / sec: 2153.04
Iteration: 7960, Loss function: 3.720, Average Loss: 3.888, avg. samples / sec: 2160.96
Iteration: 7980, Loss function: 3.921, Average Loss: 3.887, avg. samples / sec: 2157.32
Iteration: 8000, Loss function: 3.769, Average Loss: 3.887, avg. samples / sec: 2157.35
:::MLL 1596481328.115 eval_start: {"value": null, "metadata": {"epoch_num": 33, "file": "train.py", "lineno": 276}}
Predicting Ended, total time: 12.05 s
Loading and preparing results...
Loading and preparing results...
Loading and preparing results...
Loading and preparing results...
DONE (t=0.56s)
DONE (t=0.56s)
DONE (t=0.56s)
DONE (t=0.57s)
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=3.08s).
Accumulating evaluation results...
DONE (t=0.00s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.18284
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.33092
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.18368
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.04235
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.18964
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.29474
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.18660
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.27321
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.28883
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.07753
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.30996
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.44755
Current AP: 0.18284 AP goal: 0.23000
:::MLL 1596481343.872 eval_accuracy: {"value": 0.18284130761312606, "metadata": {"epoch_num": 33, "file": "train.py", "lineno": 389}}
:::MLL 1596481343.910 eval_stop: {"value": null, "metadata": {"epoch_num": 33, "file": "train.py", "lineno": 392}}
:::MLL 1596481343.935 block_stop: {"value": null, "metadata": {"first_epoch_num": 1, "file": "train.py", "lineno": 804}}
:::MLL 1596481343.936 block_start: {"value": null, "metadata": {"first_epoch_num": 33, "epoch_count": 10.915354834308324, "file": "train.py", "lineno": 813}}
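The `:::MLL` lines above follow the MLPerf logging convention: a marker, a Unix timestamp, an event key, and a JSON payload with a `value` and `metadata`. A minimal sketch of pulling these records apart (the `parse_mll` helper is hypothetical, not part of the benchmark harness):

```python
import json

def parse_mll(line):
    """Split a ':::MLL <timestamp> <key>: <json>' record into its parts."""
    assert line.startswith(":::MLL ")
    rest = line[len(":::MLL "):]
    ts_str, rest = rest.split(" ", 1)       # timestamp, then "key: payload"
    key, payload = rest.split(": ", 1)      # event key, then JSON body
    return float(ts_str), key, json.loads(payload)

# Example: the eval_accuracy record logged above.
ts, key, data = parse_mll(
    ':::MLL 1596481343.872 eval_accuracy: {"value": 0.18284130761312606, '
    '"metadata": {"epoch_num": 33, "file": "train.py", "lineno": 389}}'
)
```

Filtering a log file for `eval_accuracy` records this way gives the AP trajectory per evaluated epoch, e.g. to check progress against the 0.23 AP goal.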
Iteration: 8020, Loss function: 3.866, Average Loss: 3.886, avg. samples / sec: 471.95
Iteration: 8040, Loss function: 3.875, Average Loss: 3.886, avg. samples / sec: 2150.82
Iteration: 8060, Loss function: 3.961, Average Loss: 3.885, avg. samples / sec: 2153.14
:::MLL 1596481357.825 epoch_stop: {"value": null, "metadata": {"epoch_num": 33, "file": "train.py", "lineno": 819}}
:::MLL 1596481357.825 epoch_start: {"value": null, "metadata": {"epoch_num": 34, "file": "train.py", "lineno": 673}}
Iteration: 8080, Loss function: 3.788, Average Loss: 3.883, avg. samples / sec: 2149.48
Iteration: 8100, Loss function: 3.409, Average Loss: 3.882, avg. samples / sec: 2157.48
Iteration: 8120, Loss function: 3.806, Average Loss: 3.880, avg. samples / sec: 2156.81
Iteration: 8140, Loss function: 3.817, Average Loss: 3.878, avg. samples / sec: 2159.61
Iteration: 8160, Loss function: 4.315, Average Loss: 3.878, avg. samples / sec: 2160.64
Iteration: 8180, Loss function: 3.746, Average Loss: 3.876, avg. samples / sec: 2158.15
Iteration: 8200, Loss function: 3.712, Average Loss: 3.875, avg. samples / sec: 2161.14
Iteration: 8220, Loss function: 3.970, Average Loss: 3.874, avg. samples / sec: 2164.50
Iteration: 8240, Loss function: 3.749, Average Loss: 3.873, avg. samples / sec: 2165.31
Iteration: 8260, Loss function: 3.707, Average Loss: 3.873, avg. samples / sec: 2158.43
Iteration: 8280, Loss function: 3.609, Average Loss: 3.871, avg. samples / sec: 2160.98
Iteration: 8300, Loss function: 3.826, Average Loss: 3.870, avg. samples / sec: 2160.18
:::MLL 1596481412.058 epoch_stop: {"value": null, "metadata": {"epoch_num": 34, "file": "train.py", "lineno": 819}}
:::MLL 1596481412.059 epoch_start: {"value": null, "metadata": {"epoch_num": 35, "file": "train.py", "lineno": 673}}
Iteration: 8320, Loss function: 3.709, Average Loss: 3.869, avg. samples / sec: 2155.99
Iteration: 8340, Loss function: 3.735, Average Loss: 3.866, avg. samples / sec: 2159.77
Iteration: 8360, Loss function: 3.939, Average Loss: 3.865, avg. samples / sec: 2159.31
Iteration: 8380, Loss function: 3.912, Average Loss: 3.863, avg. samples / sec: 2155.06
Iteration: 8400, Loss function: 3.712, Average Loss: 3.862, avg. samples / sec: 2159.35
Iteration: 8420, Loss function: 3.748, Average Loss: 3.861, avg. samples / sec: 2161.74
Iteration: 8440, Loss function: 3.775, Average Loss: 3.860, avg. samples / sec: 2159.50
Iteration: 8460, Loss function: 3.874, Average Loss: 3.859, avg. samples / sec: 2159.03
Iteration: 8480, Loss function: 3.602, Average Loss: 3.859, avg. samples / sec: 2148.66
Iteration: 8500, Loss function: 3.683, Average Loss: 3.858, avg. samples / sec: 2155.91
Iteration: 8520, Loss function: 3.898, Average Loss: 3.857, avg. samples / sec: 2159.81
Iteration: 8540, Loss function: 3.846, Average Loss: 3.856, avg. samples / sec: 2153.22
:::MLL 1596481466.350 epoch_stop: {"value": null, "metadata": {"epoch_num": 35, "file": "train.py", "lineno": 819}}
:::MLL 1596481466.351 epoch_start: {"value": null, "metadata": {"epoch_num": 36, "file": "train.py", "lineno": 673}}
Iteration: 8560, Loss function: 3.751, Average Loss: 3.856, avg. samples / sec: 2149.65
Iteration: 8580, Loss function: 3.548, Average Loss: 3.854, avg. samples / sec: 2158.12
Iteration: 8600, Loss function: 3.782, Average Loss: 3.851, avg. samples / sec: 2155.82
Iteration: 8620, Loss function: 3.761, Average Loss: 3.850, avg. samples / sec: 2157.59
Iteration: 8640, Loss function: 3.676, Average Loss: 3.848, avg. samples / sec: 2156.43
Iteration: 8660, Loss function: 4.101, Average Loss: 3.848, avg. samples / sec: 2151.28
Iteration: 8680, Loss function: 4.023, Average Loss: 3.848, avg. samples / sec: 2152.71
Iteration: 8700, Loss function: 3.718, Average Loss: 3.847, avg. samples / sec: 2156.35
Iteration: 8720, Loss function: 3.531, Average Loss: 3.846, avg. samples / sec: 2152.77
Iteration: 8740, Loss function: 3.640, Average Loss: 3.845, avg. samples / sec: 2148.75
Iteration: 8760, Loss function: 4.122, Average Loss: 3.845, avg. samples / sec: 2151.98
Iteration: 8780, Loss function: 3.630, Average Loss: 3.845, avg. samples / sec: 2150.97
:::MLL 1596481520.734 epoch_stop: {"value": null, "metadata": {"epoch_num": 36, "file": "train.py", "lineno": 819}}
:::MLL 1596481520.735 epoch_start: {"value": null, "metadata": {"epoch_num": 37, "file": "train.py", "lineno": 673}}
Iteration: 8800, Loss function: 3.798, Average Loss: 3.844, avg. samples / sec: 2154.32
Iteration: 8820, Loss function: 3.434, Average Loss: 3.841, avg. samples / sec: 2157.29
Iteration: 8840, Loss function: 3.756, Average Loss: 3.840, avg. samples / sec: 2159.71
Iteration: 8860, Loss function: 4.233, Average Loss: 3.839, avg. samples / sec: 2156.03
Iteration: 8880, Loss function: 3.705, Average Loss: 3.836, avg. samples / sec: 2161.19
Iteration: 8900, Loss function: 3.731, Average Loss: 3.835, avg. samples / sec: 2153.98
Iteration: 8920, Loss function: 3.783, Average Loss: 3.833, avg. samples / sec: 2154.79
Iteration: 8940, Loss function: 3.867, Average Loss: 3.831, avg. samples / sec: 2152.78
Iteration: 8960, Loss function: 3.721, Average Loss: 3.831, avg. samples / sec: 2156.70
Iteration: 8980, Loss function: 3.890, Average Loss: 3.830, avg. samples / sec: 2150.47
Iteration: 9000, Loss function: 3.718, Average Loss: 3.830, avg. samples / sec: 2156.91
Iteration: 9020, Loss function: 3.441, Average Loss: 3.829, avg. samples / sec: 2153.69
:::MLL 1596481575.300 epoch_stop: {"value": null, "metadata": {"epoch_num": 37, "file": "train.py", "lineno": 819}}
:::MLL 1596481575.301 epoch_start: {"value": null, "metadata": {"epoch_num": 38, "file": "train.py", "lineno": 673}}
Iteration: 9040, Loss function: 4.100, Average Loss: 3.829, avg. samples / sec: 2150.30
Iteration: 9060, Loss function: 3.722, Average Loss: 3.827, avg. samples / sec: 2150.78
Iteration: 9080, Loss function: 3.669, Average Loss: 3.826, avg. samples / sec: 2152.95
Iteration: 9100, Loss function: 3.885, Average Loss: 3.824, avg. samples / sec: 2157.10
Iteration: 9120, Loss function: 3.812, Average Loss: 3.822, avg. samples / sec: 2152.47
Iteration: 9140, Loss function: 3.616, Average Loss: 3.821, avg. samples / sec: 2157.30
Iteration: 9160, Loss function: 3.746, Average Loss: 3.820, avg. samples / sec: 2154.10
Iteration: 9180, Loss function: 3.720, Average Loss: 3.819, avg. samples / sec: 2157.79
Iteration: 9200, Loss function: 3.797, Average Loss: 3.819, avg. samples / sec: 2157.84
Iteration: 9220, Loss function: 3.834, Average Loss: 3.818, avg. samples / sec: 2150.29
Iteration: 9240, Loss function: 3.834, Average Loss: 3.818, avg. samples / sec: 2156.46
Iteration: 9260, Loss function: 3.996, Average Loss: 3.817, avg. samples / sec: 2155.91
Iteration: 9280, Loss function: 3.829, Average Loss: 3.817, avg. samples / sec: 2149.78
:::MLL 1596481629.673 epoch_stop: {"value": null, "metadata": {"epoch_num": 38, "file": "train.py", "lineno": 819}}
:::MLL 1596481629.673 epoch_start: {"value": null, "metadata": {"epoch_num": 39, "file": "train.py", "lineno": 673}}
Iteration: 9300, Loss function: 3.551, Average Loss: 3.815, avg. samples / sec: 2144.37
Iteration: 9320, Loss function: 3.763, Average Loss: 3.813, avg. samples / sec: 2155.94
Iteration: 9340, Loss function: 3.608, Average Loss: 3.811, avg. samples / sec: 2154.91
Iteration: 9360, Loss function: 3.903, Average Loss: 3.809, avg. samples / sec: 2153.75
Iteration: 9380, Loss function: 3.678, Average Loss: 3.808, avg. samples / sec: 2158.80
Iteration: 9400, Loss function: 3.842, Average Loss: 3.808, avg. samples / sec: 2158.57
Iteration: 9420, Loss function: 3.952, Average Loss: 3.807, avg. samples / sec: 2157.29
Iteration: 9440, Loss function: 3.691, Average Loss: 3.805, avg. samples / sec: 2149.12
Iteration: 9460, Loss function: 3.584, Average Loss: 3.803, avg. samples / sec: 2151.18
Iteration: 9480, Loss function: 3.856, Average Loss: 3.802, avg. samples / sec: 2154.62
Iteration: 9500, Loss function: 4.076, Average Loss: 3.801, avg. samples / sec: 2154.28
Iteration: 9520, Loss function: 4.047, Average Loss: 3.800, avg. samples / sec: 2153.34
:::MLL 1596481684.051 epoch_stop: {"value": null, "metadata": {"epoch_num": 39, "file": "train.py", "lineno": 819}}
:::MLL 1596481684.052 epoch_start: {"value": null, "metadata": {"epoch_num": 40, "file": "train.py", "lineno": 673}}
Iteration: 9540, Loss function: 3.736, Average Loss: 3.799, avg. samples / sec: 2147.06
Iteration: 9560, Loss function: 3.506, Average Loss: 3.797, avg. samples / sec: 2155.64
Iteration: 9580, Loss function: 3.811, Average Loss: 3.794, avg. samples / sec: 2156.30
Iteration: 9600, Loss function: 3.680, Average Loss: 3.793, avg. samples / sec: 2153.81
Iteration: 9620, Loss function: 3.555, Average Loss: 3.790, avg. samples / sec: 2153.64
Iteration: 9640, Loss function: 3.756, Average Loss: 3.790, avg. samples / sec: 2154.70
Iteration: 9660, Loss function: 3.691, Average Loss: 3.788, avg. samples / sec: 2157.81
Iteration: 9680, Loss function: 3.884, Average Loss: 3.788, avg. samples / sec: 2157.56
Iteration: 9700, Loss function: 3.821, Average Loss: 3.787, avg. samples / sec: 2150.14
Iteration: 9720, Loss function: 3.770, Average Loss: 3.787, avg. samples / sec: 2155.03
Iteration: 9740, Loss function: 3.861, Average Loss: 3.786, avg. samples / sec: 2154.66
Iteration: 9760, Loss function: 3.911, Average Loss: 3.786, avg. samples / sec: 2152.98
:::MLL 1596481738.641 epoch_stop: {"value": null, "metadata": {"epoch_num": 40, "file": "train.py", "lineno": 819}}
:::MLL 1596481738.641 epoch_start: {"value": null, "metadata": {"epoch_num": 41, "file": "train.py", "lineno": 673}}
Iteration: 9780, Loss function: 3.736, Average Loss: 3.785, avg. samples / sec: 2150.18
Iteration: 9800, Loss function: 3.696, Average Loss: 3.784, avg. samples / sec: 2153.31
Iteration: 9820, Loss function: 3.861, Average Loss: 3.782, avg. samples / sec: 2156.74
Iteration: 9840, Loss function: 3.831, Average Loss: 3.780, avg. samples / sec: 2154.23
Iteration: 9860, Loss function: 3.656, Average Loss: 3.779, avg. samples / sec: 2153.83
Iteration: 9880, Loss function: 3.852, Average Loss: 3.778, avg. samples / sec: 2151.30
Iteration: 9900, Loss function: 3.906, Average Loss: 3.777, avg. samples / sec: 2153.82
Iteration: 9920, Loss function: 3.596, Average Loss: 3.775, avg. samples / sec: 2155.76
Iteration: 9940, Loss function: 3.843, Average Loss: 3.775, avg. samples / sec: 2151.31
Iteration: 9960, Loss function: 3.740, Average Loss: 3.774, avg. samples / sec: 2153.33
Iteration: 9980, Loss function: 3.755, Average Loss: 3.774, avg. samples / sec: 2145.72
Iteration: 10000, Loss function: 3.956, Average Loss: 3.773, avg. samples / sec: 2153.82
:::MLL 1596481793.051 epoch_stop: {"value": null, "metadata": {"epoch_num": 41, "file": "train.py", "lineno": 819}}
:::MLL 1596481793.051 epoch_start: {"value": null, "metadata": {"epoch_num": 42, "file": "train.py", "lineno": 673}}
Iteration: 10020, Loss function: 3.772, Average Loss: 3.773, avg. samples / sec: 2147.29
Iteration: 10040, Loss function: 4.243, Average Loss: 3.771, avg. samples / sec: 2153.27
Iteration: 10060, Loss function: 3.819, Average Loss: 3.770, avg. samples / sec: 2153.47
Iteration: 10080, Loss function: 3.803, Average Loss: 3.768, avg. samples / sec: 2149.85
Iteration: 10100, Loss function: 3.892, Average Loss: 3.767, avg. samples / sec: 2152.50
Iteration: 10120, Loss function: 3.556, Average Loss: 3.766, avg. samples / sec: 2156.29
Iteration: 10140, Loss function: 3.435, Average Loss: 3.764, avg. samples / sec: 2154.82
Iteration: 10160, Loss function: 3.944, Average Loss: 3.763, avg. samples / sec: 2152.65
Iteration: 10180, Loss function: 3.895, Average Loss: 3.762, avg. samples / sec: 2154.76
Iteration: 10200, Loss function: 3.675, Average Loss: 3.762, avg. samples / sec: 2158.51
Iteration: 10220, Loss function: 4.045, Average Loss: 3.761, avg. samples / sec: 2156.36
Iteration: 10240, Loss function: 3.511, Average Loss: 3.759, avg. samples / sec: 2147.94
Iteration: 10260, Loss function: 3.611, Average Loss: 3.757, avg. samples / sec: 2156.79
:::MLL 1596481847.437 epoch_stop: {"value": null, "metadata": {"epoch_num": 42, "file": "train.py", "lineno": 819}}
:::MLL 1596481847.438 epoch_start: {"value": null, "metadata": {"epoch_num": 43, "file": "train.py", "lineno": 673}}
Iteration: 10280, Loss function: 4.140, Average Loss: 3.755, avg. samples / sec: 2144.01
Iteration: 10300, Loss function: 3.850, Average Loss: 3.753, avg. samples / sec: 2157.82
Iteration: 10320, Loss function: 3.517, Average Loss: 3.752, avg. samples / sec: 2158.52
Iteration: 10340, Loss function: 3.418, Average Loss: 3.751, avg. samples / sec: 2157.78
Iteration: 10360, Loss function: 3.970, Average Loss: 3.751, avg. samples / sec: 2154.23
Iteration: 10380, Loss function: 3.552, Average Loss: 3.749, avg. samples / sec: 2156.86
Iteration: 10400, Loss function: 3.722, Average Loss: 3.746, avg. samples / sec: 2161.42
Iteration: 10420, Loss function: 3.843, Average Loss: 3.745, avg. samples / sec: 2157.12
Iteration: 10440, Loss function: 3.876, Average Loss: 3.745, avg. samples / sec: 2149.52
Iteration: 10460, Loss function: 3.593, Average Loss: 3.744, avg. samples / sec: 2153.87
Iteration: 10480, Loss function: 3.790, Average Loss: 3.743, avg. samples / sec: 2144.03
Iteration: 10500, Loss function: 3.587, Average Loss: 3.743, avg. samples / sec: 2152.93
:::MLL 1596481902.035 epoch_stop: {"value": null, "metadata": {"epoch_num": 43, "file": "train.py", "lineno": 819}}
:::MLL 1596481902.036 epoch_start: {"value": null, "metadata": {"epoch_num": 44, "file": "train.py", "lineno": 673}}
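The `:::MLL` lines are MLPerf-style structured log records: a unix timestamp, an event key, and a JSON payload. A minimal sketch of parsing them and deriving per-epoch wall time from matched `epoch_start`/`epoch_stop` pairs (the field layout is inferred from the lines above):

```python
import json
import re

# ":::MLL <timestamp> <key>: <json>" -- format inferred from the log lines above.
MLL_RE = re.compile(r"^:::MLL (?P<ts>\d+\.\d+) (?P<key>\w+): (?P<payload>\{.*\})$")

def parse_mll(line):
    """Parse one :::MLL record into (timestamp, key, payload) or None."""
    m = MLL_RE.match(line.strip())
    if not m:
        return None
    return float(m.group("ts")), m.group("key"), json.loads(m.group("payload"))

def epoch_durations(lines):
    """Map epoch_num -> wall seconds between its epoch_start and epoch_stop."""
    starts, durations = {}, {}
    for line in lines:
        rec = parse_mll(line)
        if rec is None:
            continue
        ts, key, payload = rec
        epoch = payload["metadata"].get("epoch_num")
        if key == "epoch_start":
            starts[epoch] = ts
        elif key == "epoch_stop" and epoch in starts:
            durations[epoch] = ts - starts[epoch]
    return durations
```

Fed the epoch 43 pair above, this gives roughly 54.6 s (1596481902.035 − 1596481847.438).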
Iteration: 10520, Loss function: 3.704, Average Loss: 3.741, avg. samples / sec: 2152.02
Iteration: 10540, Loss function: 4.021, Average Loss: 3.739, avg. samples / sec: 2160.91
Iteration: 10560, Loss function: 3.829, Average Loss: 3.738, avg. samples / sec: 2155.40
Iteration: 10580, Loss function: 3.565, Average Loss: 3.736, avg. samples / sec: 2160.39
Iteration: 10600, Loss function: 3.554, Average Loss: 3.736, avg. samples / sec: 2157.86
Iteration: 10620, Loss function: 3.736, Average Loss: 3.735, avg. samples / sec: 2151.95
Iteration: 10640, Loss function: 3.682, Average Loss: 3.736, avg. samples / sec: 2154.60
Iteration: 10660, Loss function: 3.653, Average Loss: 3.735, avg. samples / sec: 2154.48
lr decay step #1
:::MLL 1596481937.879 eval_start: {"value": null, "metadata": {"epoch_num": 44, "file": "train.py", "lineno": 276}}
Predicting Ended, total time: 11.73 s
Loading and preparing results...
Loading and preparing results...
Loading and preparing results...
Loading and preparing results...
DONE (t=0.62s)
DONE (t=0.62s)
DONE (t=0.62s)
DONE (t=0.62s)
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=2.86s).
Accumulating evaluation results...
DONE (t=0.00s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.18698
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.33712
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.18925
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.04901
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.19379
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.29924
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.19244
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.27989
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.29607
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.08418
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.31751
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.45562
Current AP: 0.18698 AP goal: 0.23000
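The twelve metric lines above are the standard pycocotools `COCOeval.summarize()` table, and the run compares the first entry (AP @ IoU=0.50:0.95, all areas, maxDets=100) against the `--threshold` target. A minimal sketch of pulling those values back out of a captured log; the regex is an assumption based on the layout above:

```python
import re

# Matches a pycocotools summary line, e.g.
# "Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.18698"
SUMMARY_RE = re.compile(
    r"Average (Precision|Recall)\s+\((AP|AR)\) @\[ IoU=([\d.:]+)\s*\|"
    r" area=\s*(\w+) \| maxDets=\s*(\d+) \] = ([\d.]+)"
)

def parse_coco_summary(lines):
    """Return {(metric, iou, area, max_dets): value} from summarize() output."""
    out = {}
    for line in lines:
        m = SUMMARY_RE.search(line)
        if m:
            _, metric, iou, area, max_dets, value = m.groups()
            out[(metric, iou, area, int(max_dets))] = float(value)
    return out

def goal_reached(summary, threshold=0.23):
    """MLPerf SSD convergence check: overall AP against the target."""
    return summary[("AP", "0.50:0.95", "all", 100)] >= threshold
```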
:::MLL 1596481953.151 eval_accuracy: {"value": 0.18698260054161284, "metadata": {"epoch_num": 44, "file": "train.py", "lineno": 389}}
:::MLL 1596481953.170 eval_stop: {"value": null, "metadata": {"epoch_num": 44, "file": "train.py", "lineno": 392}}
:::MLL 1596481953.196 block_stop: {"value": null, "metadata": {"first_epoch_num": 33, "file": "train.py", "lineno": 804}}
:::MLL 1596481953.197 block_start: {"value": null, "metadata": {"first_epoch_num": 44, "epoch_count": 5.457677417154162, "file": "train.py", "lineno": 813}}
Iteration: 10680, Loss function: 3.374, Average Loss: 3.733, avg. samples / sec: 485.31
Iteration: 10700, Loss function: 3.837, Average Loss: 3.729, avg. samples / sec: 2148.70
Iteration: 10720, Loss function: 3.472, Average Loss: 3.724, avg. samples / sec: 2156.01
Iteration: 10740, Loss function: 3.465, Average Loss: 3.719, avg. samples / sec: 2150.97
:::MLL 1596481971.706 epoch_stop: {"value": null, "metadata": {"epoch_num": 44, "file": "train.py", "lineno": 819}}
:::MLL 1596481971.706 epoch_start: {"value": null, "metadata": {"epoch_num": 45, "file": "train.py", "lineno": 673}}
Iteration: 10760, Loss function: 3.074, Average Loss: 3.714, avg. samples / sec: 2154.85
Iteration: 10780, Loss function: 3.192, Average Loss: 3.708, avg. samples / sec: 2160.23
Iteration: 10800, Loss function: 3.346, Average Loss: 3.702, avg. samples / sec: 2163.82
Iteration: 10820, Loss function: 3.479, Average Loss: 3.695, avg. samples / sec: 2156.46
Iteration: 10840, Loss function: 3.245, Average Loss: 3.688, avg. samples / sec: 2159.42
Iteration: 10860, Loss function: 3.296, Average Loss: 3.681, avg. samples / sec: 2161.21
Iteration: 10880, Loss function: 3.236, Average Loss: 3.675, avg. samples / sec: 2152.32
Iteration: 10900, Loss function: 3.356, Average Loss: 3.669, avg. samples / sec: 2153.93
Iteration: 10920, Loss function: 3.474, Average Loss: 3.663, avg. samples / sec: 2160.98
Iteration: 10940, Loss function: 3.167, Average Loss: 3.656, avg. samples / sec: 2154.25
Iteration: 10960, Loss function: 3.263, Average Loss: 3.649, avg. samples / sec: 2155.49
Iteration: 10980, Loss function: 3.157, Average Loss: 3.643, avg. samples / sec: 2155.84
:::MLL 1596482026.008 epoch_stop: {"value": null, "metadata": {"epoch_num": 45, "file": "train.py", "lineno": 819}}
:::MLL 1596482026.008 epoch_start: {"value": null, "metadata": {"epoch_num": 46, "file": "train.py", "lineno": 673}}
Iteration: 11000, Loss function: 3.314, Average Loss: 3.636, avg. samples / sec: 2146.86
Iteration: 11020, Loss function: 3.425, Average Loss: 3.630, avg. samples / sec: 2160.22
Iteration: 11040, Loss function: 3.035, Average Loss: 3.624, avg. samples / sec: 2157.23
Iteration: 11060, Loss function: 3.151, Average Loss: 3.617, avg. samples / sec: 2160.22
Iteration: 11080, Loss function: 3.336, Average Loss: 3.612, avg. samples / sec: 2158.81
Iteration: 11100, Loss function: 3.354, Average Loss: 3.606, avg. samples / sec: 2152.26
Iteration: 11120, Loss function: 3.216, Average Loss: 3.601, avg. samples / sec: 2156.93
Iteration: 11140, Loss function: 3.267, Average Loss: 3.595, avg. samples / sec: 2159.12
Iteration: 11160, Loss function: 3.206, Average Loss: 3.588, avg. samples / sec: 2153.06
Iteration: 11180, Loss function: 3.378, Average Loss: 3.583, avg. samples / sec: 2154.33
Iteration: 11200, Loss function: 3.179, Average Loss: 3.579, avg. samples / sec: 2154.16
Iteration: 11220, Loss function: 3.401, Average Loss: 3.572, avg. samples / sec: 2157.33
:::MLL 1596482080.339 epoch_stop: {"value": null, "metadata": {"epoch_num": 46, "file": "train.py", "lineno": 819}}
:::MLL 1596482080.339 epoch_start: {"value": null, "metadata": {"epoch_num": 47, "file": "train.py", "lineno": 673}}
Iteration: 11240, Loss function: 3.034, Average Loss: 3.566, avg. samples / sec: 2145.88
Iteration: 11260, Loss function: 3.199, Average Loss: 3.561, avg. samples / sec: 2154.69
Iteration: 11280, Loss function: 3.202, Average Loss: 3.557, avg. samples / sec: 2154.00
Iteration: 11300, Loss function: 3.447, Average Loss: 3.552, avg. samples / sec: 2151.41
Iteration: 11320, Loss function: 3.014, Average Loss: 3.545, avg. samples / sec: 2154.98
Iteration: 11340, Loss function: 3.353, Average Loss: 3.541, avg. samples / sec: 2150.67
Iteration: 11360, Loss function: 3.385, Average Loss: 3.538, avg. samples / sec: 2158.86
Iteration: 11380, Loss function: 3.330, Average Loss: 3.532, avg. samples / sec: 2155.16
Iteration: 11400, Loss function: 3.344, Average Loss: 3.527, avg. samples / sec: 2151.99
Iteration: 11420, Loss function: 3.172, Average Loss: 3.521, avg. samples / sec: 2154.03
Iteration: 11440, Loss function: 3.556, Average Loss: 3.517, avg. samples / sec: 2145.76
Iteration: 11460, Loss function: 3.269, Average Loss: 3.512, avg. samples / sec: 2157.61
Iteration: 11480, Loss function: 3.199, Average Loss: 3.509, avg. samples / sec: 2150.91
:::MLL 1596482134.962 epoch_stop: {"value": null, "metadata": {"epoch_num": 47, "file": "train.py", "lineno": 819}}
:::MLL 1596482134.963 epoch_start: {"value": null, "metadata": {"epoch_num": 48, "file": "train.py", "lineno": 673}}
Iteration: 11500, Loss function: 3.194, Average Loss: 3.505, avg. samples / sec: 2144.67
Iteration: 11520, Loss function: 3.288, Average Loss: 3.501, avg. samples / sec: 2149.49
Iteration: 11540, Loss function: 3.235, Average Loss: 3.498, avg. samples / sec: 2154.92
Iteration: 11560, Loss function: 3.336, Average Loss: 3.493, avg. samples / sec: 2150.87
Iteration: 11580, Loss function: 3.323, Average Loss: 3.490, avg. samples / sec: 2151.03
Iteration: 11600, Loss function: 3.207, Average Loss: 3.486, avg. samples / sec: 2150.88
Iteration: 11620, Loss function: 3.416, Average Loss: 3.482, avg. samples / sec: 2154.76
Iteration: 11640, Loss function: 3.433, Average Loss: 3.479, avg. samples / sec: 2153.25
Iteration: 11660, Loss function: 3.183, Average Loss: 3.474, avg. samples / sec: 2148.24
Iteration: 11680, Loss function: 3.376, Average Loss: 3.470, avg. samples / sec: 2153.56
Iteration: 11700, Loss function: 3.250, Average Loss: 3.465, avg. samples / sec: 2152.66
Iteration: 11720, Loss function: 3.353, Average Loss: 3.461, avg. samples / sec: 2151.45
:::MLL 1596482189.405 epoch_stop: {"value": null, "metadata": {"epoch_num": 48, "file": "train.py", "lineno": 819}}
:::MLL 1596482189.405 epoch_start: {"value": null, "metadata": {"epoch_num": 49, "file": "train.py", "lineno": 673}}
Iteration: 11740, Loss function: 3.490, Average Loss: 3.456, avg. samples / sec: 2142.28
Iteration: 11760, Loss function: 3.352, Average Loss: 3.453, avg. samples / sec: 2146.70
Iteration: 11780, Loss function: 2.993, Average Loss: 3.448, avg. samples / sec: 2153.76
Iteration: 11800, Loss function: 3.137, Average Loss: 3.445, avg. samples / sec: 2151.94
Iteration: 11820, Loss function: 3.134, Average Loss: 3.442, avg. samples / sec: 2153.78
Iteration: 11840, Loss function: 3.061, Average Loss: 3.437, avg. samples / sec: 2156.19
Iteration: 11860, Loss function: 3.360, Average Loss: 3.433, avg. samples / sec: 2155.59
Iteration: 11880, Loss function: 3.183, Average Loss: 3.428, avg. samples / sec: 2151.52
Iteration: 11900, Loss function: 3.155, Average Loss: 3.425, avg. samples / sec: 2150.51
Iteration: 11920, Loss function: 3.085, Average Loss: 3.422, avg. samples / sec: 2152.15
Iteration: 11940, Loss function: 3.288, Average Loss: 3.418, avg. samples / sec: 2149.24
Iteration: 11960, Loss function: 3.395, Average Loss: 3.416, avg. samples / sec: 2152.41
:::MLL 1596482243.851 epoch_stop: {"value": null, "metadata": {"epoch_num": 49, "file": "train.py", "lineno": 819}}
:::MLL 1596482243.851 epoch_start: {"value": null, "metadata": {"epoch_num": 50, "file": "train.py", "lineno": 673}}
Iteration: 11980, Loss function: 3.206, Average Loss: 3.413, avg. samples / sec: 2139.70
Iteration: 12000, Loss function: 3.097, Average Loss: 3.410, avg. samples / sec: 2149.19
:::MLL 1596482250.571 eval_start: {"value": null, "metadata": {"epoch_num": 50, "file": "train.py", "lineno": 276}}
Predicting Ended, total time: 11.32 s
Loading and preparing results...
Loading and preparing results...
Loading and preparing results...
Loading and preparing results...
DONE (t=0.66s)
DONE (t=0.67s)
DONE (t=0.67s)
DONE (t=0.67s)
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=2.87s).
Accumulating evaluation results...
DONE (t=0.00s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.23580
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.40337
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.24307
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.06381
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.24911
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.37950
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.22684
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.32986
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.34675
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.11024
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.37755
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.52998
Current AP: 0.23580 AP goal: 0.23000
:::MLL 1596482265.493 eval_accuracy: {"value": 0.23579561756133255, "metadata": {"epoch_num": 50, "file": "train.py", "lineno": 389}}
:::MLL 1596482265.589 eval_stop: {"value": null, "metadata": {"epoch_num": 50, "file": "train.py", "lineno": 392}}
:::MLL 1596482265.616 block_stop: {"value": null, "metadata": {"first_epoch_num": 44, "file": "train.py", "lineno": 804}}
:::MLL 1596482266.062 run_stop: {"value": null, "metadata": {"status": "success", "file": "train.py", "lineno": 849}}
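The `RESULT` line a few lines below encodes the run's wall time in seconds, which can be reproduced from the STARTING/ENDING markers (06:32:03 PM to 07:17:50 PM is 2747 s). A minimal sketch, assuming the timestamp format used in this log:

```python
from datetime import datetime

# Timestamp format used by the STARTING/ENDING TIMING RUN lines in this log.
FMT = "%Y-%m-%d %I:%M:%S %p"

def run_seconds(start, end):
    """Elapsed whole seconds between the benchmark's start and end markers."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())

# run_seconds("2020-08-03 06:32:03 PM", "2020-08-03 07:17:50 PM") -> 2747
```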
Binding: ['/usr/bin/numactl', '--physcpubind=0-3,16-19', '--membind=0', '/opt/conda/bin/python3', '-u', 'train.py', '--local_rank=0', '--use-fp16', '--nhwc', '--pad-input', '--jit', '--delay-allreduce', '--opt-loss', '--epochs', '80', '--warmup-factor', '0', '--no-save', '--threshold=0.23', '--data', '/data/coco2017', '--evaluation', '120000', '160000', '180000', '200000', '220000', '240000', '260000', '280000', '--batch-size', '120', '--eval-batch-size', '160', '--warmup', '650', '--lr', '2.92e-3', '--wd', '1.6e-4', '--use-nvjpeg', '--use-roi-decode']
Binding: ['/usr/bin/numactl', '--physcpubind=4-7,20-23', '--membind=0', '/opt/conda/bin/python3', '-u', 'train.py', '--local_rank=1', '--use-fp16', '--nhwc', '--pad-input', '--jit', '--delay-allreduce', '--opt-loss', '--epochs', '80', '--warmup-factor', '0', '--no-save', '--threshold=0.23', '--data', '/data/coco2017', '--evaluation', '120000', '160000', '180000', '200000', '220000', '240000', '260000', '280000', '--batch-size', '120', '--eval-batch-size', '160', '--warmup', '650', '--lr', '2.92e-3', '--wd', '1.6e-4', '--use-nvjpeg', '--use-roi-decode']
Binding: ['/usr/bin/numactl', '--physcpubind=8-11,24-27', '--membind=0', '/opt/conda/bin/python3', '-u', 'train.py', '--local_rank=2', '--use-fp16', '--nhwc', '--pad-input', '--jit', '--delay-allreduce', '--opt-loss', '--epochs', '80', '--warmup-factor', '0', '--no-save', '--threshold=0.23', '--data', '/data/coco2017', '--evaluation', '120000', '160000', '180000', '200000', '220000', '240000', '260000', '280000', '--batch-size', '120', '--eval-batch-size', '160', '--warmup', '650', '--lr', '2.92e-3', '--wd', '1.6e-4', '--use-nvjpeg', '--use-roi-decode']
Binding: ['/usr/bin/numactl', '--physcpubind=12-15,28-31', '--membind=0', '/opt/conda/bin/python3', '-u', 'train.py', '--local_rank=3', '--use-fp16', '--nhwc', '--pad-input', '--jit', '--delay-allreduce', '--opt-loss', '--epochs', '80', '--warmup-factor', '0', '--no-save', '--threshold=0.23', '--data', '/data/coco2017', '--evaluation', '120000', '160000', '180000', '200000', '220000', '240000', '260000', '280000', '--batch-size', '120', '--eval-batch-size', '160', '--warmup', '650', '--lr', '2.92e-3', '--wd', '1.6e-4', '--use-nvjpeg', '--use-roi-decode']
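The four `Binding` lines above pin each rank to one quarter of a 16-core socket plus the matching hyperthread siblings at offset 16. A minimal sketch of how such `--physcpubind` strings could be generated; the core counts and sibling offset are read off the lines above, and the real `bind_launch` logic may differ:

```python
def physcpubind(rank, nproc=4, ncores=16, ht_offset=16):
    """Build a numactl --physcpubind range like '0-3,16-19' for one rank.

    Each rank gets ncores/nproc physical cores plus the corresponding
    hyperthread siblings. Assumption: a single socket whose sibling
    threads are offset by the physical core count.
    """
    per_rank = ncores // nproc
    lo = rank * per_rank
    hi = lo + per_rank - 1
    return f"{lo}-{hi},{lo + ht_offset}-{hi + ht_offset}"

# rank 0 -> '0-3,16-19' ... rank 3 -> '12-15,28-31', matching the lines above
```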
+ ret_code=0
+ set +x
ENDING TIMING RUN AT 2020-08-03 07:17:50 PM
RESULT,SINGLE_STAGE_DETECTOR,,2747,nvidia,2020-08-03 06:32:03 PM
Run vars: id 28869 gpus 4 mparams
STARTING TIMING RUN AT 2020-07-30 11:40:05 PM
+ NUMEPOCHS=80
running benchmark
+ echo 'running benchmark'
+ export DATASET_DIR=/data/coco2017
+ DATASET_DIR=/data/coco2017
+ export TORCH_MODEL_ZOO=/data/torchvision
+ TORCH_MODEL_ZOO=/data/torchvision
+ python3 -m bind_launch --nsockets_per_node 1 --ncores_per_socket 16 --nproc_per_node 4 train.py --use-fp16 --nhwc --pad-input --jit --delay-allreduce --opt-loss --epochs 80 --warmup-factor 0 --no-save --threshold=0.23 --data /data/coco2017 --evaluation 120000 160000 180000 200000 220000 240000 260000 280000 --batch-size 120 --eval-batch-size 160 --warmup 650 --lr 2.92e-3 --wd 1.6e-4 --use-nvjpeg --use-roi-decode
:::MLL 1596152406.833 init_start: {"value": null, "metadata": {"file": "train.py", "lineno": 833}}
:::MLL 1596152406.835 init_start: {"value": null, "metadata": {"file": "train.py", "lineno": 833}}
:::MLL 1596152406.835 init_start: {"value": null, "metadata": {"file": "train.py", "lineno": 833}}
BN group: 1
BN group: 1
BN group: 1
:::MLL 1596152406.839 init_start: {"value": null, "metadata": {"file": "train.py", "lineno": 833}}
BN group: 1
0 Using seed = 4100562049
1 Using seed = 4100562050
2 Using seed = 4100562051
3 Using seed = 4100562052
:::MLL 1596152411.996 max_samples: {"value": 1, "metadata": {"file": "utils.py", "lineno": 465}}
Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /data/torchvision/resnet34-333f7ec4.pth
Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /data/torchvision/resnet34-333f7ec4.pth
Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /data/torchvision/resnet34-333f7ec4.pth
Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /data/torchvision/resnet34-333f7ec4.pth
87306240it [00:01, 49622227.05it/s]
87306240it [00:01, 49923219.48it/s]
/opt/conda/lib/python3.6/site-packages/torch/nn/_reduction.py:46: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
/opt/conda/lib/python3.6/site-packages/torch/nn/_reduction.py:46: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
Delaying allreduces to the end of backward()
:::MLL 1596152414.605 model_bn_span: {"value": 120, "metadata": {"file": "train.py", "lineno": 480}}
:::MLL 1596152414.605 global_batch_size: {"value": 480, "metadata": {"file": "train.py", "lineno": 481}}
:::MLL 1596152414.613 opt_base_learning_rate: {"value": 0.045, "metadata": {"file": "train.py", "lineno": 511}}
:::MLL 1596152414.614 opt_weight_decay: {"value": 0.00016, "metadata": {"file": "train.py", "lineno": 513}}
:::MLL 1596152414.614 opt_learning_rate_warmup_steps: {"value": 650, "metadata": {"file": "train.py", "lineno": 516}}
:::MLL 1596152414.615 opt_learning_rate_warmup_factor: {"value": 0, "metadata": {"file": "train.py", "lineno": 518}}
87306240it [00:00, 97183120.00it/s]
87306240it [00:00, 94301685.86it/s]
/opt/conda/lib/python3.6/site-packages/torch/nn/_reduction.py:46: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
/opt/conda/lib/python3.6/site-packages/torch/nn/_reduction.py:46: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
epoch nbatch loss
:::MLL 1596152423.319 init_stop: {"value": null, "metadata": {"file": "train.py", "lineno": 604}}
:::MLL 1596152423.320 run_start: {"value": null, "metadata": {"file": "train.py", "lineno": 610}}
loading annotations into memory...
loading annotations into memory...
loading annotations into memory...
loading annotations into memory...
Done (t=0.58s)
creating index...
Done (t=0.62s)
creating index...
Done (t=0.65s)
creating index...
Done (t=0.65s)
creating index...
time_check a: 1596152425.581552744
Run vars: id 18702 gpus 4 mparams
+ SEED=15157
+ MAX_TOKENS=10240
+ DATASET_DIR=/data/data/mlperf/data/transformer/wmt14_en_de/utf8
+ MODE=TRAIN
+ NUMEPOCHS=30
+ case "$MODE" in
+ source run_training.sh
+++ date +%s
++ START=1596473651
+++ date '+%Y-%m-%d %r'
++ START_FMT='2020-08-03 04:54:11 PM'
STARTING TIMING RUN AT 2020-08-03 04:54:11 PM
++ echo 'STARTING TIMING RUN AT 2020-08-03 04:54:11 PM'
++ [[ 4 -ne 1 ]]
++ DISTRIBUTED_INIT_METHOD='--distributed-init-method env://'
++ export DGXSYSTEM
++ export SLURM_NTASKS_PER_NODE
++ export SLURM_NNODES
++ export MLPERF_HOST_OS
++ python3 -m bind_launch --nsockets_per_node 1 --ncores_per_socket 16 --nproc_per_node 4 train.py /data/data/mlperf/data/transformer/wmt14_en_de/utf8 --seed 15157 --arch transformer_wmt_en_de_big_t2t --share-all-embeddings --optimizer adam --adam-betas '(0.9, 0.997)' --adam-eps 1e-9 --clip-norm 0.0 --lr-scheduler inverse_sqrt --warmup-init-lr 0.0 --warmup-updates 1000 --lr 1.976e-3 --min-lr 0.0 --dropout 0.1 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --max-tokens 10240 --max-epoch 30 --target-bleu 25.0 --ignore-case --no-save --update-freq 1 --fp16 --seq-len-multiple 2 --softmax-type fast_fill --source_lang en --target_lang de --bucket_growth_factor 1.035 --batching_scheme v0p5_better --batch_multiple_strategy dynamic --fast-xentropy --max-len-a 1 --max-len-b 50 --lenpen 0.6 --distributed-init-method env:// --max-source-positions 64 --max-target-positions 64 --enable-parallel-backward-allred-opt --parallel-backward-allred-opt-threshold 105404416 --parallel-backward-allred-cuda-nstreams 2 --adam-betas '(0.9,0.98)'
| distributed init (rank 0): env://
| distributed env init. MASTER_ADDR: 127.0.0.1, MASTER_PORT: 29500, WORLD_SIZE: 4, RANK: 1
| distributed init (rank 0): env://
| distributed env init. MASTER_ADDR: 127.0.0.1, MASTER_PORT: 29500, WORLD_SIZE: 4, RANK: 3
| distributed init (rank 0): env://
| distributed env init. MASTER_ADDR: 127.0.0.1, MASTER_PORT: 29500, WORLD_SIZE: 4, RANK: 2
| distributed init (rank 0): env://
| distributed env init. MASTER_ADDR: 127.0.0.1, MASTER_PORT: 29500, WORLD_SIZE: 4, RANK: 0
| distributed init done!
| distributed init done!
| distributed init done!
:::MLL 1596473656.039 init_start: {"value": null, "metadata": {"file": "train.py", "lineno": 57}}
| distributed init done!
| initialized host transformer as rank 0 and device id 0
:::MLL 1596473656.048 init_start: {"value": null, "metadata": {"file": "train.py", "lineno": 57}}
:::MLL 1596473656.063 init_start: {"value": null, "metadata": {"file": "train.py", "lineno": 57}}
:::MLL 1596473656.074 init_start: {"value": null, "metadata": {"file": "train.py", "lineno": 57}}
Namespace(adam_betas='(0.9,0.98)', adam_eps=1e-09, adaptive_softmax_cutoff=None, arch='transformer_wmt_en_de_big_t2t', attention_dropout=0.1, batch_multiple_strategy='dynamic', batching_scheme='v0p5_better', beam=4, bucket_growth_factor=1.035, clip_norm=0.0, cpu=False, criterion='label_smoothed_cross_entropy', data='/data/data/mlperf/data/transformer/wmt14_en_de/utf8', dataloader_num_workers=1, decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=True, device_id=0, distributed_backend='nccl', distributed_init_method='env://', distributed_port=-1, distributed_rank=0, distributed_world_size=4, dropout=0.1, enable_dataloader_pin_memory=False, enable_parallel_backward_allred_opt=True, enable_parallel_backward_allred_opt_correctness_check=False, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=True, fast_xentropy=True, fp16=True, fuse_dropout_add=False, fuse_relu_dropout=False, gen_subset='test', ignore_case=True, keep_interval_updates=-1, label_smoothing=0.1, left_pad_source='True', left_pad_target='False', lenpen=0.6, local_rank=0, log_format=None, log_interval=1000, log_translations=False, lr=[0.001976], lr_scheduler='inverse_sqrt', lr_shrink=0.1, max_epoch=30, max_len_a=1.0, max_len_b=50, max_sentences=None, max_sentences_valid=None, max_source_positions=64, max_target_positions=64, max_tokens=10240, max_update=0, min_len=1, min_loss_scale=0.0001, min_lr=0.0, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_epoch_checkpoints=False, no_progress_bar=False, no_save=True, no_token_positional_embeddings=False, num_shards=1, online_eval=False, optimizer='adam', parallel_backward_allred_cuda_nstreams=2, parallel_backward_allred_opt_threshold=105404416, path=None, prefix_size=0, print_alignment=False, profile=None, quiet=False, raw_text=False, relu_dropout=0.1, remove_bpe=None, replace_unk=None, restore_file='checkpoint_last.pt', sampling=False, sampling_temperature=1, sampling_topk=-1, save_dir='checkpoints', save_interval=1, save_interval_updates=0, score_reference=False, seed=15157, sentence_avg=False, seq_len_multiple=2, shard_id=0, share_all_embeddings=True, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, softmax_type='fast_fill', source_lang='en', target_bleu=25.0, target_lang='de', task='translation', train_subset='train', unkpen=0, unnormalized=False, update_freq=[1], valid_subset='valid', validate_interval=1, warmup_init_lr=0.0, warmup_updates=1000, weight_decay=0.0)
:::MLL 1596473660.411 global_batch_size: {"value": 40960, "metadata": {"file": "train.py", "lineno": 74}}
:::MLL 1596473660.411 opt_name: {"value": "adam", "metadata": {"file": "train.py", "lineno": 75}}
:::MLL 1596473660.412 opt_base_learning_rate: {"value": 0.001976, "metadata": {"file": "train.py", "lineno": 77}}
:::MLL 1596473660.412 opt_learning_rate_warmup_steps: {"value": 1000, "metadata": {"file": "train.py", "lineno": 78}}
:::MLL 1596473660.413 max_sequence_length: {"value": 64, "metadata": {"file": "train.py", "lineno": 80}}
:::MLL 1596473660.413 opt_adam_beta_1: {"value": 0.9, "metadata": {"file": "train.py", "lineno": 81}}
:::MLL 1596473660.414 opt_adam_beta_2: {"value": 0.98, "metadata": {"file": "train.py", "lineno": 82}}
:::MLL 1596473660.414 opt_adam_epsilon: {"value": 1e-09, "metadata": {"file": "train.py", "lineno": 83}}
| [en] dictionary: 33712 types
| [de] dictionary: 33712 types
| model transformer_wmt_en_de_big_t2t, criterion LabelSmoothedCrossEntropyCriterion
| num. model params: 210808832
| parallel all-reduce ENABLED. all-reduce threshold: 105404416
| # of parallel all-reduce cuda streams: 2
| training on 4 GPUs
| max tokens per GPU = 10240 and max sentences per GPU = None
:::MLL 1596473666.564 init_stop: {"value": null, "metadata": {"file": "train.py", "lineno": 140}}
:::MLL 1596473666.564 run_start: {"value": null, "metadata": {"file": "train.py", "lineno": 142}}
filename: /data/data/mlperf/data/transformer/wmt14_en_de/utf8/train.en-de.en
raw_text: False
| /data/data/mlperf/data/transformer/wmt14_en_de/utf8 train 4590101 examples
filename: /data/data/mlperf/data/transformer/wmt14_en_de/utf8/train1.en-de.en
raw_text: False
filename: /data/data/mlperf/data/transformer/wmt14_en_de/utf8/train1.de-en.en
raw_text: False
srcline: tensor([16407, 337, 10148, 7, 5, 10701, 7796, 31212, 6, 8991, 10, 10148, 400, 59, 5, 8991, 4, 2])
| Sentences are being padded to multiples of: 2
filename: /data/data/mlperf/data/transformer/wmt14_en_de/utf8/test.en-de.en
raw_text: False
| /data/data/mlperf/data/transformer/wmt14_en_de/utf8 test 3003 examples
srcline: tensor([ 7549, 4344, 64, 32364, 1259, 20, 13504, 8959, 3868, 2])
| Sentences are being padded to multiples of: 2
filename: /data/data/mlperf/data/transformer/wmt14_en_de/utf8/test1.en-de.en
raw_text: False
filename: /data/data/mlperf/data/transformer/wmt14_en_de/utf8/test1.de-en.en
raw_text: False
:::MLL 1596473667.558 block_start: {"value": null, "metadata": {"first_epoch_num": 1, "epoch_count": 1, "file": "train.py", "lineno": 162}}
:::MLL 1596473667.559 epoch_start: {"value": null, "metadata": {"epoch_num": 1, "file": "train.py", "lineno": 163}}
generated 13089 batches in 2.031663s
got epoch iterator 2.032055377960205
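The "generated 13089 batches" line reflects fairseq's token-based batching: with `--max-tokens 10240`, a batch closes when adding another sentence would push the padded token count (longest sentence times batch size) past the budget. A simplified sketch of that greedy policy, ignoring the length-bucketing and growth-factor refinements named in the arguments:

```python
def token_batches(lengths, max_tokens=10240, seq_len_multiple=2):
    """Greedy token-based batching: padded batch cost = max_len * n_sentences.

    Simplified sketch -- the real fairseq batching_scheme also buckets
    lengths and enforces batch-size multiples.
    """
    batches, cur, cur_max = [], [], 0
    for idx, length in enumerate(lengths):
        # Pad each sentence length up to a multiple of seq_len_multiple.
        length = -(-length // seq_len_multiple) * seq_len_multiple
        new_max = max(cur_max, length)
        if cur and new_max * (len(cur) + 1) > max_tokens:
            batches.append(cur)          # close the batch before it overflows
            cur, cur_max = [], 0
            new_max = length
        cur.append(idx)
        cur_max = new_max
    if cur:
        batches.append(cur)
    return batches
```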
| WARNING: overflow detected, setting loss scale to: 64.0
| WARNING: overflow detected, setting loss scale to: 32.0
| WARNING: overflow detected, setting loss scale to: 16.0
| WARNING: overflow detected, setting loss scale to: 8.0
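The overflow warnings above show fp16 dynamic loss scaling at work: each detected gradient overflow halves the scale (64 → 32 → 16 → 8) and the update is skipped, and after a window of clean steps the scale grows back. A minimal sketch of that policy; the window and growth factor here are illustrative assumptions, not necessarily what this fairseq build used:

```python
def update_loss_scale(scale, overflow, steps_since_overflow,
                      scale_window=2000, factor=2.0, min_scale=1e-4):
    """One step of dynamic fp16 loss scaling.

    Returns (new_scale, steps_since_overflow, skip_update). On overflow:
    halve the scale and skip the step. After scale_window clean steps:
    double it. Window/factor values are illustrative.
    """
    if overflow:
        return max(scale / factor, min_scale), 0, True
    steps_since_overflow += 1
    if steps_since_overflow >= scale_window:
        return scale * factor, 0, False
    return scale, steps_since_overflow, False

# Four consecutive overflows: 128 -> 64 -> 32 -> 16 -> 8, as in the log
```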
| epoch 001: 1000 / 3273 loss=8.138, nll_loss=0.000, ppl=1.00, wps=160773, ups=4.2, wpb=37457, bsz=1283, num_updates=997, lr=0.00197007, gnorm=77574.494, clip=100%, oom=0, loss_scale=8.000, wall=238
| epoch 001: 2000 / 3273 loss=6.955, nll_loss=0.000, ppl=1.00, wps=160512, ups=4.2, wpb=37442, bsz=1288, num_updates=1997, lr=0.00139829, gnorm=56396.657, clip=100%, oom=0, loss_scale=8.000, wall=471
| WARNING: overflow detected, setting loss scale to: 4.0
| epoch 001: 3000 / 3273 loss=6.418, nll_loss=0.000, ppl=1.00, wps=160372, ups=4.3, wpb=37412, bsz=1296, num_updates=2996, lr=0.00114161, gnorm=43275.777, clip=100%, oom=0, loss_scale=4.000, wall=704
| epoch 001 | loss 6.309 | nll_loss 0.000 | ppl 1.00 | wps 160330 | ups 4.3 | wpb 37406 | bsz 1299 | num_updates 3268 | lr 0.00109306 | gnorm 40648.718 | clip 100% | oom 0 | loss_scale 4.000 | wall 768
epoch time 762.752370595932
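The per-update learning rates in the epoch summaries follow fairseq's `inverse_sqrt` scheduler with `--warmup-updates 1000 --warmup-init-lr 0.0 --lr 1.976e-3`: linear warmup to the base LR, then decay proportional to 1/sqrt(step). A minimal sketch that reproduces the logged values:

```python
def inverse_sqrt_lr(step, base_lr=1.976e-3, warmup_updates=1000, warmup_init_lr=0.0):
    """fairseq-style inverse_sqrt schedule: linear warmup, then lr ~ 1/sqrt(step)."""
    if step < warmup_updates:
        # Linear warmup from warmup_init_lr to base_lr.
        return warmup_init_lr + (base_lr - warmup_init_lr) * step / warmup_updates
    # Decay chosen so the curve is continuous at step == warmup_updates.
    return base_lr * (warmup_updates / step) ** 0.5

# step 997 -> ~0.00197007, step 1997 -> ~0.00139829, matching the log above
```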
:::MLL 1596474432.344 epoch_stop: {"value": null, "metadata": {"epoch_num": 1, "file": "train.py", "lineno": 201}}
:::MLL 1596474432.345 eval_start: {"value": null, "metadata": {"epoch_num": 1, "file": "train.py", "lineno": 547}}
generated 51 batches in 0.000890s
| Translated 816 sentences (21857 tokens) in 10.6s (76.82 sentences/s, 2057.77 tokens/s)
| Generate test with beam=4: bleu_score=18.5831
| Eval completed in: 15.62s
:::MLL 1596474447.966 eval_stop: {"value": null, "metadata": {"epoch_num": 1, "file": "train.py", "lineno": 658}}
:::MLL 1596474447.970 eval_accuracy: {"value": "18.58307123184204", "metadata": {"epoch_num": 1, "file": "train.py", "lineno": 211}}
validation and scoring 15.626490831375122
:::MLL 1596474448.014 block_stop: {"value": null, "metadata": {"first_epoch_num": 1, "file": "train.py", "lineno": 226}}
:::MLL 1596474448.015 block_start: {"value": null, "metadata": {"first_epoch_num": 2, "epoch_count": 1, "file": "train.py", "lineno": 162}}
:::MLL 1596474448.015 epoch_start: {"value": null, "metadata": {"epoch_num": 2, "file": "train.py", "lineno": 163}}
generated 13089 batches in 2.050562s
got epoch iterator 2.1360983848571777
| epoch 002: 1000 / 3273 loss=4.945, nll_loss=0.000, ppl=1.00, wps=160051, ups=4.0, wpb=37326, bsz=1293, num_updates=4269, lr=0.000956365, gnorm=33791.614, clip=100%, oom=0, loss_scale=8.000, wall=1019
| epoch 002: 2000 / 3273 loss=4.867, nll_loss=0.000, ppl=1.00, wps=160079, ups=4.1, wpb=37365, bsz=1304, num_updates=5269, lr=0.000860841, gnorm=30508.502, clip=100%, oom=0, loss_scale=8.000, wall=1253
| WARNING: overflow detected, setting loss scale to: 4.0
| epoch 002: 3000 / 3273 loss=4.805, nll_loss=0.000, ppl=1.00, wps=160147, ups=4.2, wpb=37406, bsz=1300, num_updates=6268, lr=0.000789264, gnorm=26977.435, clip=100%, oom=0, loss_scale=4.000, wall=1487
| epoch 002 | loss 4.793 | nll_loss 0.000 | ppl 1.00 | wps 160188 | ups 4.2 | wpb 37407 | bsz 1299 | num_updates 6540 | lr 0.000772677 | gnorm 26155.442 | clip 100% | oom 0 | loss_scale 4.000 | wall 1550
epoch time 764.170138835907
:::MLL 1596475214.323 epoch_stop: {"value": null, "metadata": {"epoch_num": 2, "file": "train.py", "lineno": 201}}
:::MLL 1596475214.323 eval_start: {"value": null, "metadata": {"epoch_num": 2, "file": "train.py", "lineno": 547}}
generated 51 batches in 0.000887s
| Translated 816 sentences (23102 tokens) in 10.9s (74.67 sentences/s, 2113.88 tokens/s)
| Generate test with beam=4: bleu_score=21.8538
| Eval completed in: 15.71s
:::MLL 1596475230.037 eval_stop: {"value": null, "metadata": {"epoch_num": 2, "file": "train.py", "lineno": 658}}
:::MLL 1596475230.040 eval_accuracy: {"value": "21.853771805763245", "metadata": {"epoch_num": 2, "file": "train.py", "lineno": 211}}
validation and scoring 15.719095230102539
:::MLL 1596475230.085 block_stop: {"value": null, "metadata": {"first_epoch_num": 2, "file": "train.py", "lineno": 226}}
:::MLL 1596475230.085 block_start: {"value": null, "metadata": {"first_epoch_num": 3, "epoch_count": 1, "file": "train.py", "lineno": 162}}
:::MLL 1596475230.086 epoch_start: {"value": null, "metadata": {"epoch_num": 3, "file": "train.py", "lineno": 163}}
generated 13089 batches in 2.044298s
got epoch iterator 2.129481554031372
| epoch 003: 1000 / 3273 loss=4.535, nll_loss=0.000, ppl=1.00, wps=160515, ups=4.0, wpb=37430, bsz=1292, num_updates=7541, lr=0.000719569, gnorm=23792.844, clip=100%, oom=0, loss_scale=8.000, wall=1801
| epoch 003: 2000 / 3273 loss=4.528, nll_loss=0.000, ppl=1.00, wps=160300, ups=4.1, wpb=37411, bsz=1295, num_updates=8541, lr=0.000676134, gnorm=22589.425, clip=100%, oom=0, loss_scale=8.000, wall=2035
| WARNING: overflow detected, setting loss scale to: 8.0
| epoch 003: 3000 / 3273 loss=4.517, nll_loss=0.000, ppl=1.00, wps=160201, ups=4.2, wpb=37405, bsz=1298, num_updates=9540, lr=0.000639754, gnorm=21896.716, clip=100%, oom=0, loss_scale=16.000, wall=2268
| WARNING: overflow detected, setting loss scale to: 4.0
| epoch 003 | loss 4.513 | nll_loss 0.000 | ppl 1.00 | wps 160132 | ups 4.2 | wpb 37407 | bsz 1299 | num_updates 9811 | lr 0.000630856 | gnorm 21551.439 | clip 100% | oom 0 | loss_scale 4.000 | wall 2332
epoch time 764.1955001354218
:::MLL 1596475996.412 epoch_stop: {"value": null, "metadata": {"epoch_num": 3, "file": "train.py", "lineno": 201}}
:::MLL 1596475996.413 eval_start: {"value": null, "metadata": {"epoch_num": 3, "file": "train.py", "lineno": 547}}
generated 51 batches in 0.000853s
| Translated 816 sentences (22758 tokens) in 9.8s (82.85 sentences/s, 2310.62 tokens/s)
| Generate test with beam=4: bleu_score=23.0341
| Eval completed in: 14.65s
:::MLL 1596476011.068 eval_stop: {"value": null, "metadata": {"epoch_num": 3, "file": "train.py", "lineno": 658}}
:::MLL 1596476011.071 eval_accuracy: {"value": "23.034103214740753", "metadata": {"epoch_num": 3, "file": "train.py", "lineno": 211}}
validation and scoring 14.660557985305786
:::MLL 1596476011.115 block_stop: {"value": null, "metadata": {"first_epoch_num": 3, "file": "train.py", "lineno": 226}}
:::MLL 1596476011.116 block_start: {"value": null, "metadata": {"first_epoch_num": 4, "epoch_count": 1, "file": "train.py", "lineno": 162}}
:::MLL 1596476011.116 epoch_start: {"value": null, "metadata": {"epoch_num": 4, "file": "train.py", "lineno": 163}}
generated 13089 batches in 2.007486s
got epoch iterator 2.0928962230682373
| epoch 004: 1000 / 3273 loss=4.408, nll_loss=0.000, ppl=1.00, wps=161118, ups=4.0, wpb=37559, bsz=1296, num_updates=10812, lr=0.000600944, gnorm=20166.925, clip=100%, oom=0, loss_scale=4.000, wall=2582
| epoch 004: 2000 / 3273 loss=4.400, nll_loss=0.000, ppl=1.00, wps=160688, ups=4.1, wpb=37479, bsz=1291, num_updates=11812, lr=0.000574944, gnorm=19094.401, clip=100%, oom=0, loss_scale=8.000, wall=2816
| epoch 004: 3000 / 3273 loss=4.381, nll_loss=0.000, ppl=1.00, wps=160398, ups=4.2, wpb=37441, bsz=1298, num_updates=12812, lr=0.00055205, gnorm=18578.995, clip=100%, oom=0, loss_scale=8.000, wall=3049
| epoch 004 | loss 4.381 | nll_loss 0.000 | ppl 1.00 | wps 160280 | ups 4.2 | wpb 37408 | bsz 1299 | num_updates 13084 | lr 0.000546282 | gnorm 18450.025 | clip 100% | oom 0 | loss_scale 8.000 | wall 3113
epoch time 764.0049517154694
:::MLL 1596476777.215 epoch_stop: {"value": null, "metadata": {"epoch_num": 4, "file": "train.py", "lineno": 201}}
:::MLL 1596476777.216 eval_start: {"value": null, "metadata": {"epoch_num": 4, "file": "train.py", "lineno": 547}}
generated 51 batches in 0.000880s
| Translated 816 sentences (23266 tokens) in 10.8s (75.50 sentences/s, 2152.76 tokens/s)
| Generate test with beam=4: bleu_score=23.8039
| Eval completed in: 15.64s
:::MLL 1596476792.861 eval_stop: {"value": null, "metadata": {"epoch_num": 4, "file": "train.py", "lineno": 658}}
:::MLL 1596476792.865 eval_accuracy: {"value": "23.80388379096985", "metadata": {"epoch_num": 4, "file": "train.py", "lineno": 211}}
validation and scoring 15.650740385055542
:::MLL 1596476792.909 block_stop: {"value": null, "metadata": {"first_epoch_num": 4, "file": "train.py", "lineno": 226}}
:::MLL 1596476792.910 block_start: {"value": null, "metadata": {"first_epoch_num": 5, "epoch_count": 1, "file": "train.py", "lineno": 162}}
:::MLL 1596476792.910 epoch_start: {"value": null, "metadata": {"epoch_num": 5, "file": "train.py", "lineno": 163}}
generated 13089 batches in 2.049533s
got epoch iterator 2.136399030685425
| epoch 005: 1000 / 3273 loss=4.301, nll_loss=0.000, ppl=1.00, wps=160522, ups=4.0, wpb=37456, bsz=1287, num_updates=14085, lr=0.000526512, gnorm=18367.411, clip=100%, oom=0, loss_scale=16.000, wall=3364
| WARNING: overflow detected, setting loss scale to: 8.0
| WARNING: overflow detected, setting loss scale to: 4.0
| epoch 005: 2000 / 3273 loss=4.299, nll_loss=0.000, ppl=1.00, wps=160286, ups=4.1, wpb=37445, bsz=1294, num_updates=15083, lr=0.000508795, gnorm=18251.750, clip=100%, oom=0, loss_scale=4.000, wall=3598
| epoch 005: 3000 / 3273 loss=4.300, nll_loss=0.000, ppl=1.00, wps=160148, ups=4.2, wpb=37401, bsz=1296, num_updates=16083, lr=0.000492724, gnorm=17491.908, clip=100%, oom=0, loss_scale=4.000, wall=3831
| epoch 005 | loss 4.298 | nll_loss 0.000 | ppl 1.00 | wps 160192 | ups 4.2 | wpb 37409 | bsz 1298 | num_updates 16355 | lr 0.000488609 | gnorm 17302.823 | clip 100% | oom 0 | loss_scale 4.000 | wall 3895
epoch time 763.9332764148712
:::MLL 1596477558.982 epoch_stop: {"value": null, "metadata": {"epoch_num": 5, "file": "train.py", "lineno": 201}}
:::MLL 1596477558.983 eval_start: {"value": null, "metadata": {"epoch_num": 5, "file": "train.py", "lineno": 547}}
generated 51 batches in 0.000868s
| Translated 816 sentences (22549 tokens) in 9.3s (88.15 sentences/s, 2435.95 tokens/s)
| Generate test with beam=4: bleu_score=25.0238
| Eval completed in: 14.02s
:::MLL 1596477573.008 eval_stop: {"value": null, "metadata": {"epoch_num": 5, "file": "train.py", "lineno": 658}}
:::MLL 1596477573.012 eval_accuracy: {"value": "25.023803114891052", "metadata": {"epoch_num": 5, "file": "train.py", "lineno": 211}}
validation and scoring 14.031888246536255
:::MLL 1596477573.055 block_stop: {"value": null, "metadata": {"first_epoch_num": 5, "file": "train.py", "lineno": 226}}
:::MLL 1596477573.056 run_stop: {"value": null, "metadata": {"status": "success", "file": "train.py", "lineno": 231}}
| done training in 3907.1 seconds
++ ret_code=0
++ sleep 3
++ [[ 0 != 0 ]]
+++ date +%s
++ END=1596477577
+++ date '+%Y-%m-%d %r'
++ END_FMT='2020-08-03 05:59:37 PM'
ENDING TIMING RUN AT 2020-08-03 05:59:37 PM
++ echo 'ENDING TIMING RUN AT 2020-08-03 05:59:37 PM'
++ RESULT=3926
++ RESULT_NAME=transformer
++ echo 'RESULT,transformer,15157,3926,,2020-08-03 04:54:11 PM'
RESULT,transformer,15157,3926,,2020-08-03 04:54:11 PM
+ set +x