GitHub Gists of Taylan Bilal (taylanbil)
taylanbil / 20190816_GPU_Myle_1gpu-dummy.txt
Created August 19, 2019 16:51
Fairseq Transformer, 1 GPU chip on the small dataset
Fri Aug 16 19:07:26 UTC 2019
#!/bin/bash
taskname=fairseq_e2e_gpu
vol_fairseq=/home/taylanbil/fairseq/
vol_data=/home/taylanbil/data/wmt18_en_de_bpej32k
vol_data=/home/taylanbil/data/dummy  # later assignment wins: this run uses the dummy dataset
python_cli="ipython -i"
python_cli="python"  # later assignment wins: run non-interactively
other_flags="
taylanbil / 20190816_TPU_1core-dummy.txt
Created August 19, 2019 16:53
Fairseq Transformer, 1 TPU on the small dataset.
Fri Aug 16 18:43:11 UTC 2019
#!/bin/bash
batch_size=128
n_words=64
data_path=/home/taylanbil/data/wmt18_en_de_bpej32k
data_path=/home/taylanbil/data/dummy  # later assignment wins: this run uses the dummy dataset
taylanbil / 20190816_GPU_Myle_1gpu-full.txt
Created August 19, 2019 16:54
Fairseq Transformer, 1 GPU, full dataset - measure 100 steps.
Fri Aug 16 18:52:31 UTC 2019
#!/bin/bash
taskname=fairseq_e2e_gpu
vol_fairseq=/home/taylanbil/fairseq/
vol_data=/home/taylanbil/data/wmt18_en_de_bpej32k
#vol_data=/home/taylanbil/data/dummy
python_cli="ipython -i"
python_cli="python"  # later assignment wins: run non-interactively
other_flags="
taylanbil / 20190816_TPU_1core-full.txt
Created August 19, 2019 16:56
Fairseq Transformer on 1 TPU, full dataset - measure 100 steps.
Fri Aug 16 19:11:06 UTC 2019
#!/bin/bash
batch_size=128
n_words=64
data_path=/home/taylanbil/data/wmt18_en_de_bpej32k
#data_path=/home/taylanbil/data/dummy
taylanbil / gist:0f6ef92e8d083996b84552eb6856abb6
Created August 30, 2019 05:47
Fairseq Transformer on GPU
This file has been truncated.
Mon Aug 19 21:56:00 UTC 2019
#!/bin/bash
taskname=fairseq_e2e_gpu
vol_fairseq=/home/taylanbil/fairseq/
vol_data=/home/taylanbil/data/wmt18_en_de_bpej32k
#vol_data=/home/taylanbil/data/dummy
python_cli="ipython -i"
python_cli="python"  # later assignment wins: run non-interactively
other_flags="
This file has been truncated.
Mon Aug 26 16:15:47 UTC 2019
#!/bin/bash
batch_size=512
n_words=64
data_path=/home/taylanbil/data/wmt18_en_de_bpej32k
#data_path=/home/taylanbil/data/dummy
tensors_dir=/home/taylanbil/tensors
taylanbil / gist:35218dca26ab8f084ea6b85fb7f08fd1
Created September 13, 2019 00:39
Resnet50 run logs w/ multiprocess, fake data
(pytorch-nightly) pytorcx-xla-img :: pytorch/xla/test ‹master*› » XLA_USE_BF16=1 XRT_TPU_CONFIG="tpu_worker;0;10.1.3.2:8470" python test_train_mp_imagenet.py --fake_data --model resnet50
2019-09-13 00:35:30.673264: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) CPU:0 -> /job:tpu_worker/replica:0/task:0/device:XLA_CPU:0
2019-09-13 00:35:30.673318: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (LOCAL) TPU:0 -> /job:tpu_worker/replica:0/task:0/device:TPU:0
2019-09-13 00:35:30.673326: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:1 -> /job:tpu_worker/replica:0/task:0/device:TPU:1
2019-09-13 00:35:30.673332: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:2 -> /job:tpu_worker/replica:0/task:0/device:TPU:2
2019-09-13 00:35:30.673338: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:197] XRT device (REMOTE) TPU:3 -> /job:tpu_worker/replica:0
taylanbil / gist:8ff73e6b8cb26e550c8d47be1a844f48
Created September 18, 2019 03:30
error while dumping graphs
| WARNING: 240829 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[1422704, 2718830, 2897878, 3673048, 2016896, 2200333, 3886976, 2097242, 3124502, 2871279]
Epoch 1 begin 00:17:55
training/ 00:19:08, device xla:1, step 1, Rate=132.04, GlobalRate=132.04, loss=15.8125, nll_loss=15.8750
training/ 00:20:21, device xla:1, step 2, Rate=54.94, GlobalRate=6.89, loss=15.8125, nll_loss=15.8125
training/ 00:25:46, device xla:1, step 3, Rate=22.92, GlobalRate=2.56, loss=16.0000, nll_loss=16.0000
training/ 00:40:56, device xla:1, step 4, Rate=9.34, GlobalRate=0.98, loss=15.9375, nll_loss=15.9375
training/ 01:58:50, device xla:1, step 5, Rate=3.77, GlobalRate=0.26, loss=15.7500, nll_loss=15.8125
2019-09-18 03:13:04.411218: E tensorflow/compiler/xla/xla_client/tf_logging.cc:11] Check failed: session_work.first->session()->Run( session_work.second.feed_inputs, session_work.second.outputs_handles, &outputs) == ::tensorflow::Status::OK() (Unavailable: From /job:tpu_worker/replica:0/
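The step rate above collapses from 132.04 at step 1 to 3.77 at step 5 before the session drops, which is consistent with repeated XLA recompilation on changing input shapes. The rates can be pulled out of such log lines with a small parser (a hypothetical helper, not part of the gist):

```python
import re

# Matches lines like:
# "training/ 00:19:08, device xla:1, step 1, Rate=132.04, GlobalRate=132.04, ..."
LOG_RE = re.compile(
    r"device (?P<device>xla:\d+), step (?P<step>\d+), "
    r"Rate=(?P<rate>[\d.]+), GlobalRate=(?P<grate>[\d.]+)"
)

def parse_rates(lines):
    """Extract (device, step, rate, global_rate) from training log lines."""
    rows = []
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            rows.append((m.group("device"), int(m.group("step")),
                         float(m.group("rate")), float(m.group("grate"))))
    return rows

log = [
    "training/ 00:19:08, device xla:1, step 1, Rate=132.04, GlobalRate=132.04, loss=15.8125, nll_loss=15.8750",
    "training/ 01:58:50, device xla:1, step 5, Rate=3.77, GlobalRate=0.26, loss=15.7500, nll_loss=15.8125",
]
rows = parse_rates(log)
# rows[0] -> ("xla:1", 1, 132.04, 132.04)
```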
taylanbil / gist:64aaa74c59745fce84aa217057f421d8
Created September 27, 2019 21:53
[fairseq][transformer] Fresh run w/ 3 large shapes.
Epoch 1 begin 19:52:45
training/ 19:55:52, device xla:5, step 1, Rate=2.19, GlobalRate=2.19
training/ 19:55:52, device xla:4, step 1, Rate=2.19, GlobalRate=2.19
training/ 19:56:05, device xla:2, step 1, Rate=1.97, GlobalRate=1.97
training/ 19:56:05, device xla:1, step 1, Rate=1.97, GlobalRate=1.97
training/ 19:59:14, device xla:8, step 1, Rate=1.60, GlobalRate=1.60
training/ 19:59:14, device xla:3, step 1, Rate=1.60, GlobalRate=1.60
training/ 19:59:16, device xla:6, step 1, Rate=1.60, GlobalRate=1.60
training/ 20:12:01, device xla:7, step 1, Rate=0.94, GlobalRate=0.94
training/ 20:17:46, device xla:5, step 2, Rate=1.11, GlobalRate=0.54
taylanbil / gist:bbfec9307a2f4c35833d70976fd96bf8
Created September 27, 2019 21:55
[fseq][transformer] warmed up run
Epoch 1 begin 21:38:10
training/ 21:39:31, device xla:1, step 1, Rate=19.64, GlobalRate=19.64
training/ 21:39:31, device xla:2, step 1, Rate=19.52, GlobalRate=19.52
training/ 21:39:31, device xla:5, step 1, Rate=19.37, GlobalRate=19.37
training/ 21:39:31, device xla:8, step 1, Rate=38.76, GlobalRate=38.76
training/ 21:39:31, device xla:4, step 1, Rate=19.31, GlobalRate=19.31
training/ 21:39:31, device xla:6, step 1, Rate=38.53, GlobalRate=38.53
training/ 21:39:31, device xla:7, step 1, Rate=76.98, GlobalRate=76.98
training/ 21:39:31, device xla:3, step 1, Rate=38.31, GlobalRate=38.31
training/ 21:39:51, device xla:8, step 2, Rate=45.96, GlobalRate=46.01
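Step-1 rates in this warmed-up run are an order of magnitude higher than in the fresh run above, presumably because the compiled XLA graphs are reused rather than rebuilt. A quick comparison of the per-device step-1 rates copied from the two logs:

```python
# Step-1 Rate values per device, copied from the fresh and warmed-up logs above.
fresh  = [2.19, 2.19, 1.97, 1.97, 1.60, 1.60, 1.60, 0.94]
warmed = [19.64, 19.52, 19.37, 38.76, 19.31, 38.53, 76.98, 38.31]

mean_fresh  = sum(fresh) / len(fresh)    # 1.7575
mean_warmed = sum(warmed) / len(warmed)  # 33.8025
speedup = mean_warmed / mean_fresh       # roughly 19x
```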