Compiling Model...
/localdata/evaw/workspace/venv/poplar_sdk-ubuntu_18_04-2.6.0+1074-33d3efd05d/2.6.0+1074_poptorch/lib/python3.6/site-packages/transformers/models/vit/modeling_vit.py:186: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if height != self.image_size[0] or width != self.image_size[1]:
Graph compilation: 100%|██████████| 100/100 [00:15<00:00]
Compiled/Loaded model in 32.70255442708731 secs
***** Running training *****
  Num examples = 106514
  Num Epochs = 3
  Instantaneous batch size per device = 1
  Device Iterations = 1
  Replication Factor = 1
  Gradient Accumulation steps = 128
  Total train batch size (w. parallel, distributed & accumulation) = 128
  Total optimization steps = 2496
 40%|████      | 1000/2496 [06:59<10:13,  2.44it/s]
Saving model checkpoint to ./results/checkpoint-1000
---------- Device Allocation -----------
Embedding  --> IPU 0
Encoder 0  --> IPU 0
Encoder 1  --> IPU 0
Encoder 2  --> IPU 0
Encoder 3  --> IPU 1
Encoder 4  --> IPU 1
Encoder 5  --> IPU 1
Encoder 6  --> IPU 2
Encoder 7  --> IPU 2
Encoder 8  --> IPU 2
Encoder 9  --> IPU 3
Encoder 10 --> IPU 3
Encoder 11 --> IPU 3
Head       --> IPU 3
---------------------------------------
Configuration saved in ./results/checkpoint-1000/ipu_config.json
 80%|████████  | 2000/2496 [14:04<03:26,  2.40it/s]
Saving model checkpoint to ./results/checkpoint-2000
---------- Device Allocation -----------
Embedding  --> IPU 0
Encoder 0  --> IPU 0
Encoder 1  --> IPU 0
Encoder 2  --> IPU 0
Encoder 3  --> IPU 1
Encoder 4  --> IPU 1
Encoder 5  --> IPU 1
Encoder 6  --> IPU 2
Encoder 7  --> IPU 2
Encoder 8  --> IPU 2
Encoder 9  --> IPU 3
Encoder 10 --> IPU 3
Encoder 11 --> IPU 3
Head       --> IPU 3
---------------------------------------
Configuration saved in ./results/checkpoint-2000/ipu_config.json
100%|██████████| 2496/2496 [17:37<00:00,  2.47it/s]

Training completed. Do not forget to share your model on huggingface.co/models =)


100%|██████████| 2496/2496 [17:37<00:00,  2.36it/s]
{'loss': 0.6216, 'learning_rate': 1.602564102564103e-05, 'epoch': 0.06}
{'loss': 0.4267, 'learning_rate': 3.205128205128206e-05, 'epoch': 0.12}
{'loss': 0.3673, 'learning_rate': 4.8076923076923084e-05, 'epoch': 0.18}
{'loss': 0.3178, 'learning_rate': 6.410256410256412e-05, 'epoch': 0.24}
{'loss': 0.2707, 'learning_rate': 8.012820512820514e-05, 'epoch': 0.3}
{'loss': 0.2589, 'learning_rate': 9.615384615384617e-05, 'epoch': 0.36}
{'loss': 0.2541, 'learning_rate': 0.00011217948717948718, 'epoch': 0
...
: 0.1613, 'learning_rate': 8.401392014073405e-06, 'epoch': 2.7}
{'loss': 0.1605, 'learning_rate': 5.361064379673464e-06, 'epoch': 2.76}
{'loss': 0.2045, 'learning_rate': 2.9866889774481044e-06, 'epoch': 2.82}
{'loss': 0.1533, 'learning_rate': 1.2949737362087156e-06, 'epoch': 2.88}
{'loss': 0.1611, 'learning_rate': 2.978228636022262e-07, 'epoch': 2.94}
{'train_runtime': 1057.5667, 'train_samples_per_second': 302.148, 'train_steps_per_second': 2.36, 'train_loss': 0.2094740134019118, 'epoch': 3.0}