Created April 29, 2020 09:13
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 131072.0
train-0[Epoch 1][1280768 samples][849.67 sec]: Loss: 7.0388 Top-1: 0.1027 Top-5: 0.4965
test-0[Epoch 1][50176 samples][17.05 sec]: Loss: 6.9965 Top-1: 0.1016 Top-5: 0.4604
/home/jramapuram/.venv3/envs/pytorch1.5-py37/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:114: UserWarning: Seems like `optimizer.step()` has been overridden after learning rate scheduler initialization. Please, make sure to call `optimizer.step()` before `lr_scheduler.step()`. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16384.0
train-0[Epoch 2][1281280 samples][851.96 sec]: Loss: 5.2698 Top-1: 8.3982 Top-5: 20.8343
test-0[Epoch 2][50176 samples][16.72 sec]: Loss: 4.0580 Top-1: 18.9772 Top-5: 41.2129
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
train-0[Epoch 3][1281280 samples][848.86 sec]: Loss: 3.9013 Top-1: 22.7465 Top-5: 44.8709
test-0[Epoch 3][50176 samples][17.22 sec]: Loss: 3.6010 Top-1: 26.4190 Top-5: 50.2671
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
train-0[Epoch 4][1281280 samples][852.70 sec]: Loss: 3.3167 Top-1: 31.4567 Top-5: 55.7103
test-0[Epoch 4][50176 samples][17.07 sec]: Loss: 2.9855 Top-1: 35.9196 Top-5: 61.6071
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 65536.0
train-0[Epoch 5][1281280 samples][850.95 sec]: Loss: 2.9109 Top-1: 38.2023 Top-5: 62.8001
test-0[Epoch 5][50176 samples][17.12 sec]: Loss: 2.4874 Top-1: 44.2821 Top-5: 70.0155
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 131072.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 131072.0
train-0[Epoch 6][1281280 samples][852.87 sec]: Loss: 2.6764 Top-1: 42.3411 Top-5: 66.7361
test-0[Epoch 6][50176 samples][17.10 sec]: Loss: 2.6723 Top-1: 41.9703 Top-5: 67.0819
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 131072.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 131072.0
train-0[Epoch 7][1281280 samples][853.50 sec]: Loss: 2.5180 Top-1: 45.1213 Top-5: 69.3008
test-0[Epoch 7][50176 samples][16.95 sec]: Loss: 2.2402 Top-1: 49.1291 Top-5: 74.2427
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 131072.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 131072.0
Traceback (most recent call last):
  File "supervised_main.py", line 636, in <module>
    run(rank=0, num_replicas=args.num_replicas)
  File "supervised_main.py", line 602, in run
    train(epoch, model, optimizer, loader.train_loader, grapher)
  File "supervised_main.py", line 529, in train
    return execute_graph(epoch, model, train_loader, grapher, optimizer, prefix='train')
  File "supervised_main.py", line 448, in execute_graph
    for minibatch, labels in loader:
  File "/home/jramapuram/sshfs/ml_base/datasets/dali_imagefolder.py", line 230, in __next__
    sample = super(DALIClassificationIteratorLikePytorch, self).__next__()
  File "/home/jramapuram/.venv3/envs/pytorch1.5-py37/lib/python3.7/site-packages/nvidia/dali/plugin/pytorch.py", line 177, in __next__
    outputs.append(p.share_outputs())
  File "/home/jramapuram/.venv3/envs/pytorch1.5-py37/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 410, in share_outputs
    return self._pipe.ShareOutputs()
RuntimeError: Critical error in pipeline: [/opt/dali/dali/util/local_file.cc:105] File mapping failed: /datasets/imagenet/ILSVRC/Data/CLS-LOC/train/n01601694/n01601694_13136.JPG
Stacktrace (9 entries):
[frame 0]: /home/jramapuram/.venv3/envs/pytorch1.5-py37/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0x6ab7e) [0x7f6b188acb7e]
[frame 1]: /home/jramapuram/.venv3/envs/pytorch1.5-py37/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0x1772b4) [0x7f6b189b92b4]
[frame 2]: /home/jramapuram/.venv3/envs/pytorch1.5-py37/lib/python3.7/site-packages/nvidia/dali/libdali.so(dali::FileStream::Open(std::string const&, bool)+0xfb) [0x7f6b189ac0eb]
[frame 3]: /home/jramapuram/.venv3/envs/pytorch1.5-py37/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x12effea) [0x7f6af5599fea]
[frame 4]: /home/jramapuram/.venv3/envs/pytorch1.5-py37/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x133454a) [0x7f6af55de54a]
[frame 5]: /home/jramapuram/.venv3/envs/pytorch1.5-py37/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x1335d25) [0x7f6af55dfd25]
[frame 6]: /home/jramapuram/.venv3/envs/pytorch1.5-py37/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x18d0bb0) [0x7f6af5b7abb0]
[frame 7]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f6b87cf8609]
[frame 8]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f6b87c1f103]
Current pipeline object is no longer valid.