Compiling Model...
/localdata/evaw/workspace/venv/poplar_sdk-ubuntu_18_04-2.6.0+1074-33d3efd05d/2.6.0+1074_poptorch/lib/python3.6/site-packages/transformers/models/vit/modeling_vit.py:186: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if height != self.image_size[0] or width != self.image_size[1]:
Graph compilation: 100%|██████████| 100/100 [00:15<00:00]
Compiled/Loaded model in 32.70255442708731 secs
***** Running training *****
  Num examples = 106514
  Num Epochs = 3
  Instantaneous batch size per device = 1
  Device Iterations = 1
  Replication Factor = 1
  Gradient Accumulation steps = 128
  Total train batch size (w. parallel, distributed & accumulation) = 128
  Total optimization steps = 2496
 40%|████      | 1000/2496 [06:59<10:13, 2.44it/s]Saving model checkpoint to ./results/checkpoint-1000
---------- Device Allocation -----------
Embedding  --> IPU 0
Encoder 0  --> IPU 0
Encoder 1  --> IPU 0
Encoder 2  --> IPU 0
Encoder 3  --> IPU 1
Encoder 4  --> IPU 1
Encoder 5  --> IPU 1
Encoder 6  --> IPU 2
Encoder 7  --> IPU 2
Encoder 8  --> IPU 2
Encoder 9  --> IPU 3
Encoder 10 --> IPU 3
Encoder 11 --> IPU 3
Head       --> IPU 3
---------------------------------------
Configuration saved in ./results/checkpoint-1000/ipu_config.json
 80%|████████  | 2000/2496 [14:04<03:26, 2.40it/s]Saving model checkpoint to ./results/checkpoint-2000
---------- Device Allocation -----------
Embedding  --> IPU 0
Encoder 0  --> IPU 0
Encoder 1  --> IPU 0
Encoder 2  --> IPU 0
Encoder 3  --> IPU 1
Encoder 4  --> IPU 1
Encoder 5  --> IPU 1
Encoder 6  --> IPU 2
Encoder 7  --> IPU 2
Encoder 8  --> IPU 2
Encoder 9  --> IPU 3
Encoder 10 --> IPU 3
Encoder 11 --> IPU 3
Head       --> IPU 3
---------------------------------------
Configuration saved in ./results/checkpoint-2000/ipu_config.json
100%|██████████| 2496/2496 [17:37<00:00, 2.47it/s]
Training completed. Do not forget to share your model on huggingface.co/models =)
100%|██████████| 2496/2496 [17:37<00:00, 2.36it/s]
{'loss': 0.6216, 'learning_rate': 1.602564102564103e-05, 'epoch': 0.06}
{'loss': 0.4267, 'learning_rate': 3.205128205128206e-05, 'epoch': 0.12}
{'loss': 0.3673, 'learning_rate': 4.8076923076923084e-05, 'epoch': 0.18}
{'loss': 0.3178, 'learning_rate': 6.410256410256412e-05, 'epoch': 0.24}
{'loss': 0.2707, 'learning_rate': 8.012820512820514e-05, 'epoch': 0.3}
{'loss': 0.2589, 'learning_rate': 9.615384615384617e-05, 'epoch': 0.36}
{'loss': 0.2541, 'learning_rate': 0.00011217948717948718, 'epoch': 0
...
: 0.1613, 'learning_rate': 8.401392014073405e-06, 'epoch': 2.7}
{'loss': 0.1605, 'learning_rate': 5.361064379673464e-06, 'epoch': 2.76}
{'loss': 0.2045, 'learning_rate': 2.9866889774481044e-06, 'epoch': 2.82}
{'loss': 0.1533, 'learning_rate': 1.2949737362087156e-06, 'epoch': 2.88}
{'loss': 0.1611, 'learning_rate': 2.978228636022262e-07, 'epoch': 2.94}
{'train_runtime': 1057.5667, 'train_samples_per_second': 302.148, 'train_steps_per_second': 2.36, 'train_loss': 0.2094740134019118, 'epoch': 3.0}
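The Device Allocation table shows the model pipelined over four IPUs: the patch embedding and encoder layers 0-2 on IPU 0, layers 3-5 on IPU 1, layers 6-8 on IPU 2, and layers 9-11 plus the classification head on IPU 3. The hyperparameters printed at the start of the run also determine the effective batch size and the step count on the progress bar. Below is a minimal sketch of that arithmetic using only the numbers reported in the log; the variable names are ours for illustration and are not part of any library API.

```python
# Reproduce the "Total train batch size" and "Total optimization steps"
# lines from the training log above (values taken directly from the log).
num_examples = 106_514          # Num examples
num_epochs = 3                  # Num Epochs
micro_batch_size = 1            # Instantaneous batch size per device
device_iterations = 1           # Device Iterations
replication_factor = 1          # Replication Factor
gradient_accumulation = 128     # Gradient Accumulation steps

# Samples consumed per optimizer step across the IPU pipeline.
total_train_batch_size = (
    micro_batch_size * device_iterations * replication_factor * gradient_accumulation
)
assert total_train_batch_size == 128

# Optimizer steps per epoch (the trailing partial batch is dropped,
# which is consistent with the reported total), times the epoch count.
steps_per_epoch = num_examples // total_train_batch_size   # 832
total_optimization_steps = steps_per_epoch * num_epochs
assert total_optimization_steps == 2496                    # matches 2496/2496 on the progress bar
```

The final summary is consistent with these figures: 2.36 optimizer steps per second × 128 samples per step ≈ 302 samples per second, over a total runtime of about 1,058 s (roughly the 17:37 shown on the progress bar).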