
@yukunlin
Created April 18, 2022 23:44
/usr/local/lib/python3.8/dist-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
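The deprecation warning above tells worker scripts to stop parsing a `--local_rank` argument and read the rank from the environment instead, since `torchrun` sets `--use_env` behavior by default. A minimal sketch of the suggested change (the fallback default of `0` is an assumption for running the script outside a launcher):

```python
import os

# torchrun exports LOCAL_RANK for every worker process; fall back to 0
# so the script still runs standalone (assumption, not from the log).
local_rank = int(os.environ.get("LOCAL_RANK", 0))
```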
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : fairseq_train_wrapped
min_nodes : 2
max_nodes : 2
nproc_per_node : 8
run_id : foobar
rdzv_backend : c10d
rdzv_endpoint : 10.0.0.213:29500
rdzv_configs : {'timeout': 900}
max_restarts : 0
monitor_interval : 5
log_dir : None
metrics_cfg : {}
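The launch config printed above corresponds roughly to the following `torchrun` invocation. The values (nodes, procs, rendezvous backend/endpoint/id) are taken from the log; treat the exact command as a reconstruction, not the command actually used to produce this run:

```shell
torchrun \
  --nnodes=2:2 \
  --nproc_per_node=8 \
  --rdzv_backend=c10d \
  --rdzv_endpoint=10.0.0.213:29500 \
  --rdzv_id=foobar \
  --max_restarts=0 \
  fairseq_train_wrapped
```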
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_deupnvg9/foobar_yi7_bw6q
INFO:torch.distributed.elastic.agent.server.api:[] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[] Rendezvous complete for workers. Result:
restart_count=0
master_addr=ip-10-0-0-175.us-west-2.compute.internal
master_port=47405
group_rank=0
group_world_size=2
local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
role_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
global_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
role_world_sizes=[16, 16, 16, 16, 16, 16, 16, 16]
global_world_sizes=[16, 16, 16, 16, 16, 16, 16, 16]
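The rendezvous result above is internally consistent: with 2 nodes of 8 workers each, a worker's global rank is its group (node) rank times `nproc_per_node` plus its local rank, giving this node ranks 0-7 and the other node ranks 8-15 in a 16-rank world. A small sanity check of that arithmetic (the formula is the standard elastic-launch mapping, not something printed by the log itself):

```python
GROUP_WORLD_SIZE = 2  # group_world_size from the log
NPROC_PER_NODE = 8    # nproc_per_node from the launch config

# Reconstruct each node's global ranks from (group_rank, local_rank).
global_ranks = {
    group_rank: [group_rank * NPROC_PER_NODE + lr for lr in range(NPROC_PER_NODE)]
    for group_rank in range(GROUP_WORLD_SIZE)
}
world_size = GROUP_WORLD_SIZE * NPROC_PER_NODE  # matches global_world_sizes=[16, ...]
```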
INFO:torch.distributed.elastic.agent.server.api:[] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_deupnvg9/foobar_yi7_bw6q/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_deupnvg9/foobar_yi7_bw6q/attempt_0/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_deupnvg9/foobar_yi7_bw6q/attempt_0/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_deupnvg9/foobar_yi7_bw6q/attempt_0/3/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_deupnvg9/foobar_yi7_bw6q/attempt_0/4/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_deupnvg9/foobar_yi7_bw6q/attempt_0/5/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_deupnvg9/foobar_yi7_bw6q/attempt_0/6/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_deupnvg9/foobar_yi7_bw6q/attempt_0/7/error.json
[0]:2022-04-18 23:07:20 | WARNING | root | Pytorch pre-release version 1.10.0a0+git36449ea - assuming intent to test it
[1]:2022-04-18 23:07:20 | WARNING | root | Pytorch pre-release version 1.10.0a0+git36449ea - assuming intent to test it
[3]:2022-04-18 23:07:20 | WARNING | root | Pytorch pre-release version 1.10.0a0+git36449ea - assuming intent to test it
[4]:2022-04-18 23:07:20 | WARNING | root | Pytorch pre-release version 1.10.0a0+git36449ea - assuming intent to test it
[7]:2022-04-18 23:07:20 | WARNING | root | Pytorch pre-release version 1.10.0a0+git36449ea - assuming intent to test it
[2]:2022-04-18 23:07:20 | WARNING | root | Pytorch pre-release version 1.10.0a0+git36449ea - assuming intent to test it
[5]:2022-04-18 23:07:20 | WARNING | root | Pytorch pre-release version 1.10.0a0+git36449ea - assuming intent to test it
[6]:2022-04-18 23:07:20 | WARNING | root | Pytorch pre-release version 1.10.0a0+git36449ea - assuming intent to test it
[1]:2022-04-18 23:07:23 | INFO | fairseq.distributed.utils | distributed init (rank 1): env://
[3]:2022-04-18 23:07:23 | INFO | fairseq.distributed.utils | distributed init (rank 3): env://
[2]:2022-04-18 23:07:23 | INFO | fairseq.distributed.utils | distributed init (rank 2): env://
[5]:2022-04-18 23:07:23 | INFO | fairseq.distributed.utils | distributed init (rank 5): env://
[6]:2022-04-18 23:07:23 | INFO | fairseq.distributed.utils | distributed init (rank 6): env://
[7]:2022-04-18 23:07:23 | INFO | fairseq.distributed.utils | distributed init (rank 7): env://
[0]:2022-04-18 23:07:23 | INFO | fairseq.distributed.utils | distributed init (rank 0): env://
[4]:2022-04-18 23:07:23 | INFO | fairseq.distributed.utils | distributed init (rank 4): env://
[3]:2022-04-18 23:07:24 | INFO | torch.distributed.distributed_c10d | Added key: store_based_barrier_key:1 to store for rank: 3
[1]:2022-04-18 23:07:24 | INFO | torch.distributed.distributed_c10d | Added key: store_based_barrier_key:1 to store for rank: 1
[2]:2022-04-18 23:07:24 | INFO | torch.distributed.distributed_c10d | Added key: store_based_barrier_key:1 to store for rank: 2
[6]:2022-04-18 23:07:24 | INFO | torch.distributed.distributed_c10d | Added key: store_based_barrier_key:1 to store for rank: 6
[7]:2022-04-18 23:07:24 | INFO | torch.distributed.distributed_c10d | Added key: store_based_barrier_key:1 to store for rank: 7
[5]:2022-04-18 23:07:24 | INFO | torch.distributed.distributed_c10d | Added key: store_based_barrier_key:1 to store for rank: 5
[4]:2022-04-18 23:07:24 | INFO | torch.distributed.distributed_c10d | Added key: store_based_barrier_key:1 to store for rank: 4
[0]:2022-04-18 23:07:27 | INFO | torch.distributed.distributed_c10d | Added key: store_based_barrier_key:1 to store for rank: 0
[0]:2022-04-18 23:07:27 | INFO | torch.distributed.distributed_c10d | Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.
[0]:2022-04-18 23:07:27 | INFO | fairseq.distributed.utils | initialized host ip-10-0-0-175.us-west-2.compute.internal as rank 0
[1]:2022-04-18 23:07:27 | INFO | torch.distributed.distributed_c10d | Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.
[1]:2022-04-18 23:07:27 | INFO | fairseq.distributed.utils | initialized host ip-10-0-0-175.us-west-2.compute.internal as rank 1
[2]:2022-04-18 23:07:27 | INFO | torch.distributed.distributed_c10d | Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.
[2]:2022-04-18 23:07:27 | INFO | fairseq.distributed.utils | initialized host ip-10-0-0-175.us-west-2.compute.internal as rank 2
[6]:2022-04-18 23:07:27 | INFO | torch.distributed.distributed_c10d | Rank 6: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.
[6]:2022-04-18 23:07:27 | INFO | fairseq.distributed.utils | initialized host ip-10-0-0-175.us-west-2.compute.internal as rank 6
[5]:2022-04-18 23:07:27 | INFO | torch.distributed.distributed_c10d | Rank 5: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.
[5]:2022-04-18 23:07:27 | INFO | fairseq.distributed.utils | initialized host ip-10-0-0-175.us-west-2.compute.internal as rank 5
[3]:2022-04-18 23:07:27 | INFO | torch.distributed.distributed_c10d | Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.
[3]:2022-04-18 23:07:27 | INFO | fairseq.distributed.utils | initialized host ip-10-0-0-175.us-west-2.compute.internal as rank 3
[7]:2022-04-18 23:07:27 | INFO | torch.distributed.distributed_c10d | Rank 7: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.
[7]:2022-04-18 23:07:27 | INFO | fairseq.distributed.utils | initialized host ip-10-0-0-175.us-west-2.compute.internal as rank 7
[4]:2022-04-18 23:07:27 | INFO | torch.distributed.distributed_c10d | Rank 4: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.
[4]:2022-04-18 23:07:27 | INFO | fairseq.distributed.utils | initialized host ip-10-0-0-175.us-west-2.compute.internal as rank 4
[3]:ip-10-0-0-175:20:20 [3] NCCL INFO Bootstrap : Using eth0:10.0.0.175<0>
[3]:ip-10-0-0-175:20:20 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v4 symbol.
[3]:ip-10-0-0-175:20:20 [3] NCCL INFO NET/OFI Using aws-ofi-nccl 1.2.0aws
[3]:ip-10-0-0-175:20:20 [3] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
[3]:
[3]:ip-10-0-0-175:20:20 [3] ofi_init:1157 NCCL WARN NET/OFI Only EFA provider is supported
[3]:
[3]:ip-10-0-0-175:20:20 [3] ofi_init:1208 NCCL WARN NET/OFI aws-ofi-nccl initialization failed
[3]:ip-10-0-0-175:20:20 [3] NCCL INFO NET/IB : No device found.
[3]:ip-10-0-0-175:20:20 [3] NCCL INFO NET/Socket : Using [0]eth0:10.0.0.175<0>
[3]:ip-10-0-0-175:20:20 [3] NCCL INFO Using network Socket
[1]:ip-10-0-0-175:18:18 [1] NCCL INFO Bootstrap : Using eth0:10.0.0.175<0>
[1]:ip-10-0-0-175:18:18 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v4 symbol.
[1]:ip-10-0-0-175:18:18 [1] NCCL INFO NET/OFI Using aws-ofi-nccl 1.2.0aws
[1]:ip-10-0-0-175:18:18 [1] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
[1]:
[1]:ip-10-0-0-175:18:18 [1] ofi_init:1157 NCCL WARN NET/OFI Only EFA provider is supported
[1]:
[1]:ip-10-0-0-175:18:18 [1] ofi_init:1208 NCCL WARN NET/OFI aws-ofi-nccl initialization failed
[1]:ip-10-0-0-175:18:18 [1] NCCL INFO NET/IB : No device found.
[1]:ip-10-0-0-175:18:18 [1] NCCL INFO NET/Socket : Using [0]eth0:10.0.0.175<0>
[1]:ip-10-0-0-175:18:18 [1] NCCL INFO Using network Socket
[0]:ip-10-0-0-175:17:17 [0] NCCL INFO Bootstrap : Using eth0:10.0.0.175<0>
[0]:ip-10-0-0-175:17:17 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v4 symbol.
[0]:ip-10-0-0-175:17:17 [0] NCCL INFO NET/OFI Using aws-ofi-nccl 1.2.0aws
[0]:ip-10-0-0-175:17:17 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
[0]:
[0]:ip-10-0-0-175:17:17 [0] ofi_init:1157 NCCL WARN NET/OFI Only EFA provider is supported
[0]:
[0]:ip-10-0-0-175:17:17 [0] ofi_init:1208 NCCL WARN NET/OFI aws-ofi-nccl initialization failed
[0]:ip-10-0-0-175:17:17 [0] NCCL INFO NET/IB : No device found.
[0]:ip-10-0-0-175:17:17 [0] NCCL INFO NET/Socket : Using [0]eth0:10.0.0.175<0>
[0]:ip-10-0-0-175:17:17 [0] NCCL INFO Using network Socket
[0]:NCCL version 2.10.3+cuda11.3
[2]:ip-10-0-0-175:19:19 [2] NCCL INFO Bootstrap : Using eth0:10.0.0.175<0>
[2]:ip-10-0-0-175:19:19 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v4 symbol.
[2]:ip-10-0-0-175:19:19 [2] NCCL INFO NET/OFI Using aws-ofi-nccl 1.2.0aws
[2]:ip-10-0-0-175:19:19 [2] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
[2]:
[2]:ip-10-0-0-175:19:19 [2] ofi_init:1157 NCCL WARN NET/OFI Only EFA provider is supported
[2]:
[2]:ip-10-0-0-175:19:19 [2] ofi_init:1208 NCCL WARN NET/OFI aws-ofi-nccl initialization failed
[2]:ip-10-0-0-175:19:19 [2] NCCL INFO NET/IB : No device found.
[2]:ip-10-0-0-175:19:19 [2] NCCL INFO NET/Socket : Using [0]eth0:10.0.0.175<0>
[2]:ip-10-0-0-175:19:19 [2] NCCL INFO Using network Socket
[6]:ip-10-0-0-175:23:23 [6] NCCL INFO Bootstrap : Using eth0:10.0.0.175<0>
[6]:ip-10-0-0-175:23:23 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v4 symbol.
[6]:ip-10-0-0-175:23:23 [6] NCCL INFO NET/OFI Using aws-ofi-nccl 1.2.0aws
[6]:ip-10-0-0-175:23:23 [6] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
[6]:
[6]:ip-10-0-0-175:23:23 [6] ofi_init:1157 NCCL WARN NET/OFI Only EFA provider is supported
[6]:
[6]:ip-10-0-0-175:23:23 [6] ofi_init:1208 NCCL WARN NET/OFI aws-ofi-nccl initialization failed
[6]:ip-10-0-0-175:23:23 [6] NCCL INFO NET/IB : No device found.
[6]:ip-10-0-0-175:23:23 [6] NCCL INFO NET/Socket : Using [0]eth0:10.0.0.175<0>
[6]:ip-10-0-0-175:23:23 [6] NCCL INFO Using network Socket
[5]:ip-10-0-0-175:22:22 [5] NCCL INFO Bootstrap : Using eth0:10.0.0.175<0>
[5]:ip-10-0-0-175:22:22 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v4 symbol.
[5]:ip-10-0-0-175:22:22 [5] NCCL INFO NET/OFI Using aws-ofi-nccl 1.2.0aws
[5]:ip-10-0-0-175:22:22 [5] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
[5]:
[5]:ip-10-0-0-175:22:22 [5] ofi_init:1157 NCCL WARN NET/OFI Only EFA provider is supported
[5]:
[5]:ip-10-0-0-175:22:22 [5] ofi_init:1208 NCCL WARN NET/OFI aws-ofi-nccl initialization failed
[5]:ip-10-0-0-175:22:22 [5] NCCL INFO NET/IB : No device found.
[5]:ip-10-0-0-175:22:22 [5] NCCL INFO NET/Socket : Using [0]eth0:10.0.0.175<0>
[5]:ip-10-0-0-175:22:22 [5] NCCL INFO Using network Socket
[4]:ip-10-0-0-175:21:21 [4] NCCL INFO Bootstrap : Using eth0:10.0.0.175<0>
[4]:ip-10-0-0-175:21:21 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v4 symbol.
[4]:ip-10-0-0-175:21:21 [4] NCCL INFO NET/OFI Using aws-ofi-nccl 1.2.0aws
[4]:ip-10-0-0-175:21:21 [4] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
[4]:
[4]:ip-10-0-0-175:21:21 [4] ofi_init:1157 NCCL WARN NET/OFI Only EFA provider is supported
[4]:
[4]:ip-10-0-0-175:21:21 [4] ofi_init:1208 NCCL WARN NET/OFI aws-ofi-nccl initialization failed
[4]:ip-10-0-0-175:21:21 [4] NCCL INFO NET/IB : No device found.
[4]:ip-10-0-0-175:21:21 [4] NCCL INFO NET/Socket : Using [0]eth0:10.0.0.175<0>
[4]:ip-10-0-0-175:21:21 [4] NCCL INFO Using network Socket
[7]:ip-10-0-0-175:24:24 [7] NCCL INFO Bootstrap : Using eth0:10.0.0.175<0>
[7]:ip-10-0-0-175:24:24 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v4 symbol.
[7]:ip-10-0-0-175:24:24 [7] NCCL INFO NET/OFI Using aws-ofi-nccl 1.2.0aws
[7]:ip-10-0-0-175:24:24 [7] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
[7]:
[7]:ip-10-0-0-175:24:24 [7] ofi_init:1157 NCCL WARN NET/OFI Only EFA provider is supported
[7]:
[7]:ip-10-0-0-175:24:24 [7] ofi_init:1208 NCCL WARN NET/OFI aws-ofi-nccl initialization failed
[7]:ip-10-0-0-175:24:24 [7] NCCL INFO NET/IB : No device found.
[7]:ip-10-0-0-175:24:24 [7] NCCL INFO NET/Socket : Using [0]eth0:10.0.0.175<0>
[7]:ip-10-0-0-175:24:24 [7] NCCL INFO Using network Socket
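The per-rank warnings above show aws-ofi-nccl failing to find an EFA provider, so NCCL falls back through IB (no device) to plain TCP sockets over `eth0`, which is typically much slower than EFA for the inter-node legs of the collectives. When debugging this kind of fallback it can help to raise NCCL's log verbosity and pin the interface explicitly before process-group init; these are standard NCCL environment variables, set here purely as an illustration:

```python
import os

# Standard NCCL debugging knobs; must be set before the first NCCL call.
os.environ["NCCL_DEBUG"] = "INFO"          # verbose init/transport logging
os.environ["NCCL_DEBUG_SUBSYS"] = "NET"    # focus the logging on network selection
os.environ["NCCL_SOCKET_IFNAME"] = "eth0"  # pin the socket-transport interface
```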
[1]:ip-10-0-0-175:18:93 [1] NCCL INFO Trees [0] 5/-1/-1->1->2 [1] 5/-1/-1->1->2
[1]:ip-10-0-0-175:18:93 [1] NCCL INFO Channel 00 : 1[170] -> 5[1b0] via P2P/IPC
[1]:ip-10-0-0-175:18:93 [1] NCCL INFO Channel 01 : 1[170] -> 5[1b0] via P2P/IPC
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO Channel 00/02 : 0 3 2 1 5 6 7 4 8 11 10 9 13 14 15 12
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO Channel 01/02 : 0 3 2 1 5 6 7 4 8 11 10 9 13 14 15 12
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO Trees [0] 3/8/-1->0->-1 [1] 3/-1/-1->0->8
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO Channel 00 : 0[160] -> 3[190] via P2P/IPC
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO Channel 01 : 0[160] -> 3[190] via P2P/IPC
[2]:ip-10-0-0-175:19:96 [2] NCCL INFO Trees [0] 1/-1/-1->2->3 [1] 1/-1/-1->2->3
[2]:ip-10-0-0-175:19:96 [2] NCCL INFO Channel 00 : 2[180] -> 1[170] via P2P/IPC
[2]:ip-10-0-0-175:19:96 [2] NCCL INFO Channel 01 : 2[180] -> 1[170] via P2P/IPC
[6]:ip-10-0-0-175:23:98 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5
[5]:ip-10-0-0-175:22:97 [5] NCCL INFO Trees [0] 6/-1/-1->5->1 [1] 6/-1/-1->5->1
[5]:ip-10-0-0-175:22:97 [5] NCCL INFO Channel 00 : 5[1b0] -> 6[1c0] via P2P/IPC
[5]:ip-10-0-0-175:22:97 [5] NCCL INFO Channel 01 : 5[1b0] -> 6[1c0] via P2P/IPC
[3]:ip-10-0-0-175:20:92 [3] NCCL INFO Trees [0] 2/-1/-1->3->0 [1] 2/-1/-1->3->0
[3]:ip-10-0-0-175:20:92 [3] NCCL INFO Channel 00 : 3[190] -> 2[180] via P2P/IPC
[3]:ip-10-0-0-175:20:92 [3] NCCL INFO Channel 01 : 3[190] -> 2[180] via P2P/IPC
[3]:ip-10-0-0-175:20:92 [3] NCCL INFO Connected all rings
[4]:ip-10-0-0-175:21:95 [4] NCCL INFO Trees [0] -1/-1/-1->4->7 [1] -1/-1/-1->4->7
[4]:ip-10-0-0-175:21:95 [4] NCCL INFO Channel 00 : 4[1a0] -> 8[160] [send] via NET/Socket/0
[7]:ip-10-0-0-175:24:94 [7] NCCL INFO Trees [0] 4/-1/-1->7->6 [1] 4/-1/-1->7->6
[1]:ip-10-0-0-175:18:93 [1] NCCL INFO Connected all rings
[1]:ip-10-0-0-175:18:93 [1] NCCL INFO Channel 00 : 1[170] -> 2[180] via P2P/IPC
[1]:ip-10-0-0-175:18:93 [1] NCCL INFO Channel 01 : 1[170] -> 2[180] via P2P/IPC
[1]:ip-10-0-0-175:18:93 [1] NCCL INFO Connected all trees
[1]:ip-10-0-0-175:18:93 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
[1]:ip-10-0-0-175:18:93 [1] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
[1]:ip-10-0-0-175:18:93 [1] NCCL INFO Channel 01 : 1[170] -> 4[1a0] via P2P/indirect/0[160]
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO Channel 00 : 12[1a0] -> 0[160] [receive] via NET/Socket/0
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO Channel 01 : 12[1a0] -> 0[160] [receive] via NET/Socket/0
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO Connected all rings
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO Channel 00 : 8[160] -> 0[160] [receive] via NET/Socket/0
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO Channel 01 : 8[160] -> 0[160] [receive] via NET/Socket/0
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO Channel 00 : 0[160] -> 8[160] [send] via NET/Socket/0
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO Channel 01 : 0[160] -> 8[160] [send] via NET/Socket/0
[2]:ip-10-0-0-175:19:96 [2] NCCL INFO Connected all rings
[2]:ip-10-0-0-175:19:96 [2] NCCL INFO Channel 00 : 2[180] -> 3[190] via P2P/IPC
[2]:ip-10-0-0-175:19:96 [2] NCCL INFO Channel 01 : 2[180] -> 3[190] via P2P/IPC
[2]:ip-10-0-0-175:19:96 [2] NCCL INFO Connected all trees
[2]:ip-10-0-0-175:19:96 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
[2]:ip-10-0-0-175:19:96 [2] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
[2]:ip-10-0-0-175:19:96 [2] NCCL INFO Channel 00 : 2[180] -> 4[1a0] via P2P/indirect/0[160]
[6]:ip-10-0-0-175:23:98 [6] NCCL INFO Channel 00 : 6[1c0] -> 7[1d0] via P2P/IPC
[6]:ip-10-0-0-175:23:98 [6] NCCL INFO Channel 01 : 6[1c0] -> 7[1d0] via P2P/IPC
[6]:ip-10-0-0-175:23:98 [6] NCCL INFO Connected all rings
[6]:ip-10-0-0-175:23:98 [6] NCCL INFO Channel 00 : 6[1c0] -> 5[1b0] via P2P/IPC
[6]:ip-10-0-0-175:23:98 [6] NCCL INFO Channel 01 : 6[1c0] -> 5[1b0] via P2P/IPC
[6]:ip-10-0-0-175:23:98 [6] NCCL INFO Connected all trees
[6]:ip-10-0-0-175:23:98 [6] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
[6]:ip-10-0-0-175:23:98 [6] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
[5]:ip-10-0-0-175:22:97 [5] NCCL INFO Connected all rings
[5]:ip-10-0-0-175:22:97 [5] NCCL INFO Channel 00 : 5[1b0] -> 1[170] via P2P/IPC
[5]:ip-10-0-0-175:22:97 [5] NCCL INFO Channel 01 : 5[1b0] -> 1[170] via P2P/IPC
[5]:ip-10-0-0-175:22:97 [5] NCCL INFO Connected all trees
[5]:ip-10-0-0-175:22:97 [5] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
[5]:ip-10-0-0-175:22:97 [5] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
[3]:ip-10-0-0-175:20:92 [3] NCCL INFO Channel 00 : 3[190] -> 0[160] via P2P/IPC
[3]:ip-10-0-0-175:20:92 [3] NCCL INFO Channel 01 : 3[190] -> 0[160] via P2P/IPC
[4]:ip-10-0-0-175:21:95 [4] NCCL INFO Channel 01 : 4[1a0] -> 8[160] [send] via NET/Socket/0
[4]:ip-10-0-0-175:21:95 [4] NCCL INFO Connected all rings
[4]:ip-10-0-0-175:21:95 [4] NCCL INFO Channel 00 : 4[1a0] -> 7[1d0] via P2P/IPC
[4]:ip-10-0-0-175:21:95 [4] NCCL INFO Channel 01 : 4[1a0] -> 7[1d0] via P2P/IPC
[4]:ip-10-0-0-175:21:95 [4] NCCL INFO Connected all trees
[4]:ip-10-0-0-175:21:95 [4] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
[4]:ip-10-0-0-175:21:95 [4] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
[7]:ip-10-0-0-175:24:94 [7] NCCL INFO Channel 00 : 7[1d0] -> 4[1a0] via P2P/IPC
[7]:ip-10-0-0-175:24:94 [7] NCCL INFO Channel 01 : 7[1d0] -> 4[1a0] via P2P/IPC
[7]:ip-10-0-0-175:24:94 [7] NCCL INFO Connected all rings
[7]:ip-10-0-0-175:24:94 [7] NCCL INFO Channel 00 : 7[1d0] -> 6[1c0] via P2P/IPC
[7]:ip-10-0-0-175:24:94 [7] NCCL INFO Channel 01 : 7[1d0] -> 6[1c0] via P2P/IPC
[7]:ip-10-0-0-175:24:94 [7] NCCL INFO Connected all trees
[7]:ip-10-0-0-175:24:94 [7] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
[7]:ip-10-0-0-175:24:94 [7] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
[1]:ip-10-0-0-175:18:93 [1] NCCL INFO Channel 01 : 1[170] -> 6[1c0] via P2P/indirect/5[1b0]
[1]:ip-10-0-0-175:18:93 [1] NCCL INFO Channel 00 : 1[170] -> 7[1d0] via P2P/indirect/3[190]
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO Connected all trees
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO Channel 01 : 0[160] -> 5[1b0] via P2P/indirect/1[170]
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO Channel 00 : 0[160] -> 6[1c0] via P2P/indirect/4[1a0]
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO Channel 01 : 0[160] -> 7[1d0] via P2P/indirect/4[1a0]
[2]:ip-10-0-0-175:19:96 [2] NCCL INFO Channel 01 : 2[180] -> 5[1b0] via P2P/indirect/1[170]
[2]:ip-10-0-0-175:19:96 [2] NCCL INFO Channel 01 : 2[180] -> 7[1d0] via P2P/indirect/6[1c0]
[6]:ip-10-0-0-175:23:98 [6] NCCL INFO Channel 00 : 6[1c0] -> 0[160] via P2P/indirect/4[1a0]
[6]:ip-10-0-0-175:23:98 [6] NCCL INFO Channel 01 : 6[1c0] -> 1[170] via P2P/indirect/5[1b0]
[6]:ip-10-0-0-175:23:98 [6] NCCL INFO Channel 01 : 6[1c0] -> 3[190] via P2P/indirect/2[180]
[6]:ip-10-0-0-175:23:98 [6] NCCL INFO comm 0x7f8f58002fb0 rank 6 nranks 16 cudaDev 6 busId 1c0 - Init COMPLETE
[5]:ip-10-0-0-175:22:97 [5] NCCL INFO Channel 01 : 5[1b0] -> 0[160] via P2P/indirect/4[1a0]
[5]:ip-10-0-0-175:22:97 [5] NCCL INFO Channel 01 : 5[1b0] -> 2[180] via P2P/indirect/1[170]
[5]:ip-10-0-0-175:22:97 [5] NCCL INFO Channel 00 : 5[1b0] -> 3[190] via P2P/indirect/1[170]
[5]:ip-10-0-0-175:22:97 [5] NCCL INFO comm 0x7f4170002fb0 rank 5 nranks 16 cudaDev 5 busId 1b0 - Init COMPLETE
[3]:ip-10-0-0-175:20:92 [3] NCCL INFO Connected all trees
[3]:ip-10-0-0-175:20:92 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
[3]:ip-10-0-0-175:20:92 [3] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
[3]:ip-10-0-0-175:20:92 [3] NCCL INFO Channel 01 : 3[190] -> 4[1a0] via P2P/indirect/0[160]
[3]:ip-10-0-0-175:20:92 [3] NCCL INFO Channel 00 : 3[190] -> 5[1b0] via P2P/indirect/1[170]
[3]:ip-10-0-0-175:20:92 [3] NCCL INFO Channel 01 : 3[190] -> 6[1c0] via P2P/indirect/7[1d0]
[3]:ip-10-0-0-175:20:92 [3] NCCL INFO comm 0x7f1560002fb0 rank 3 nranks 16 cudaDev 3 busId 190 - Init COMPLETE
[4]:ip-10-0-0-175:21:95 [4] NCCL INFO Channel 01 : 4[1a0] -> 1[170] via P2P/indirect/5[1b0]
[4]:ip-10-0-0-175:21:95 [4] NCCL INFO Channel 00 : 4[1a0] -> 2[180] via P2P/indirect/6[1c0]
[4]:ip-10-0-0-175:21:95 [4] NCCL INFO Channel 01 : 4[1a0] -> 3[190] via P2P/indirect/0[160]
[4]:ip-10-0-0-175:21:95 [4] NCCL INFO comm 0x7f9b64002fb0 rank 4 nranks 16 cudaDev 4 busId 1a0 - Init COMPLETE
[7]:ip-10-0-0-175:24:94 [7] NCCL INFO Channel 01 : 7[1d0] -> 0[160] via P2P/indirect/4[1a0]
[7]:ip-10-0-0-175:24:94 [7] NCCL INFO Channel 00 : 7[1d0] -> 1[170] via P2P/indirect/5[1b0]
[7]:ip-10-0-0-175:24:94 [7] NCCL INFO Channel 01 : 7[1d0] -> 2[180] via P2P/indirect/3[190]
[7]:ip-10-0-0-175:24:94 [7] NCCL INFO comm 0x7f1718002fb0 rank 7 nranks 16 cudaDev 7 busId 1d0 - Init COMPLETE
[1]:ip-10-0-0-175:18:93 [1] NCCL INFO comm 0x7f7ae4002fb0 rank 1 nranks 16 cudaDev 1 busId 170 - Init COMPLETE
[0]:ip-10-0-0-175:17:91 [0] NCCL INFO comm 0x7fb154002fb0 rank 0 nranks 16 cudaDev 0 busId 160 - Init COMPLETE
[0]:ip-10-0-0-175:17:17 [0] NCCL INFO Launch mode Parallel
[2]:ip-10-0-0-175:19:96 [2] NCCL INFO comm 0x7f72a4002fb0 rank 2 nranks 16 cudaDev 2 busId 180 - Init COMPLETE
[0]:2022-04-18 23:07:31 | INFO | fairseq_cli.train | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 100, 'log_format': None, 'log_file': None, 'tensorboard_logdir': None, 'wandb_project': None, 'azureml_logging': False, 'seed': 1, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': False, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'on_cpu_convert_precision': False, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'amp': False, 'amp_batch_retries': 2, 'amp_init_scale': 128, 'amp_scale_window': None, 'user_dir': None, 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma'}, 'common_eval': {'_name': None, 'path': None, 'post_process': None, 'quiet': False, 'model_overrides': '{}', 'results_path': None}, 'distributed_training': {'_name': None, 'distributed_world_size': 16, 'distributed_num_procs': 8, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': 'env://', 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': True, 'ddp_backend': 'pytorch_ddp', 'ddp_comm_hook': 'none', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': False, 'gradient_as_bucket_view': False, 'fast_stat_sync': False, 'heartbeat_timeout': -1, 'broadcast_buffers': False, 'slowmo_momentum': None, 'slowmo_base_algorithm': 'localsgd', 'localsgd_frequency': 3, 'nprocs_per_node': 8, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'fp16': 
False, 'memory_efficient_fp16': False, 'tpu': False, 'no_reshard_after_forward': False, 'fp32_reduce_scatter': False, 'cpu_offload': False, 'use_sharded_state': False, 'not_fsdp_flatten_parameters': False}, 'dataset': {'_name': None, 'num_workers': 1, 'skip_invalid_size_inputs_valid_test': False, 'max_tokens': 2048, 'batch_size': None, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train', 'valid_subset': 'valid', 'combine_valid_subsets': None, 'ignore_unused_valid_subsets': False, 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None, 'disable_validation': False, 'max_tokens_valid': 2048, 'batch_size_valid': None, 'max_valid_steps': None, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0, 'grouped_shuffling': False, 'update_epoch_batch_itr': False, 'update_ordered_indices_seed': False}, 'optimization': {'_name': None, 'max_epoch': 0, 'max_update': 50000, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': False, 'update_freq': [1], 'lr': [0.0005], 'stop_min_lr': -1.0, 'use_bmuf': False, 'skip_remainder_batch': False}, 'checkpoint': {'_name': None, 'save_dir': '/job/fairseq/checkpoints/transformer_wikitext-103_manual_docker_al2', 'restore_file': 'checkpoint_last.pt', 'continue_once': None, 'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': -1, 'keep_interval_updates_pattern': -1, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': False, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'loss', 'maximize_best_checkpoint_metric': False, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 
'load_checkpoint_on_all_dp_ranks': False, 'write_checkpoints_asynchronously': False, 'model_parallel_size': 1}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 8}, 'generation': {'_name': None, 'beam': 5, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 200, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': {'_name': 'transformer_lm', 'activation_fn': relu, 'dropout': 0.1, 'attention_dropout': 0.0, 'activation_dropout': 0.0, 'relu_dropout': 0.0, 'decoder_embed_dim': 512, 'decoder_output_dim': 512, 'decoder_input_dim': 512, 'decoder_ffn_embed_dim': 2048, 'decoder_layers': 6, 'decoder_attention_heads': 8, 'decoder_normalize_before': False, 'no_decoder_final_norm': False, 'adaptive_softmax_cutoff': None, 'adaptive_softmax_dropout': 0.0, 'adaptive_softmax_factor': 4.0, 'no_token_positional_embeddings': False, 'share_decoder_input_output_embed': True, 
'character_embeddings': False, 'character_filters': '[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]', 'character_embedding_dim': 4, 'char_embedder_highway_layers': 2, 'adaptive_input': False, 'adaptive_input_factor': 4.0, 'adaptive_input_cutoff': None, 'tie_adaptive_weights': False, 'tie_adaptive_proj': False, 'decoder_learned_pos': False, 'layernorm_embedding': False, 'no_scale_embedding': False, 'checkpoint_activations': False, 'offload_activations': False, 'decoder_layerdrop': 0.0, 'decoder_layers_to_keep': None, 'quant_noise_pq': 0.0, 'quant_noise_pq_block_size': 8, 'quant_noise_scalar': 0.0, 'min_params_to_wrap': 100000000, 'base_layers': 0, 'base_sublayers': 1, 'base_shuffle': 1, 'scale_fc': False, 'scale_attn': False, 'scale_heads': False, 'scale_resids': False, 'add_bos_token': False, 'tokens_per_sample': 512, 'max_target_positions': None, 'tpu': False}, 'task': {'_name': 'language_modeling', 'data': '/job/fairseq/data-bin/wikitext-103', 'sample_break_mode': none, 'tokens_per_sample': 512, 'output_dictionary_size': -1, 'self_target': False, 'future_target': False, 'past_target': False, 'add_bos_token': False, 'max_target_positions': None, 'shorten_method': none, 'shorten_data_split_list': '', 'pad_to_fixed_length': False, 'pad_to_fixed_bsz': False, 'seed': 1, 'batch_size': None, 'batch_size_valid': None, 'dataset_impl': None, 'data_buffer_size': 10, 'tpu': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma'}, 'criterion': {'_name': 'cross_entropy', 'sentence_avg': False}, 'optimizer': {'_name': 'adam', 'adam_betas': '(0.9, 0.98)', 'adam_eps': 1e-08, 'weight_decay': 0.01, 'use_old_adam': False, 'fp16_adam_stats': False, 'tpu': False, 'lr': [0.0005]}, 'lr_scheduler': {'_name': 'inverse_sqrt', 'warmup_updates': 4000, 'warmup_init_lr': 1e-07, 'lr': [0.0005]}, 'scoring': {'_name': 'bleu', 'pad': 1, 'eos': 2, 'unk': 3}, 'bpe': None, 'tokenizer': None, 'ema': {'_name': None, 'store_ema': False, 'ema_decay': 0.9999, 
'ema_start_update': 0, 'ema_seed_model': None, 'ema_update_freq': 1, 'ema_fp32': False}}
[0]:2022-04-18 23:07:32 | INFO | fairseq.tasks.language_modeling | dictionary: 267744 types
[0]:2022-04-18 23:07:37 | INFO | fairseq_cli.train | TransformerLanguageModel(
[0]: (decoder): TransformerDecoder(
[0]: (dropout_module): FairseqDropout()
[0]: (embed_tokens): Embedding(267744, 512, padding_idx=1)
[0]: (embed_positions): SinusoidalPositionalEmbedding()
[0]: (layers): ModuleList(
[0]: (0): TransformerDecoderLayerBase(
[0]: (dropout_module): FairseqDropout()
[0]: (self_attn): MultiheadAttention(
[0]: (dropout_module): FairseqDropout()
[0]: (k_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (v_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (q_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (out_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: )
[0]: (activation_dropout_module): FairseqDropout()
[0]: (self_attn_layer_norm): FusedLayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
[0]: (fc1): Linear(in_features=512, out_features=2048, bias=True)
[0]: (fc2): Linear(in_features=2048, out_features=512, bias=True)
[0]: (final_layer_norm): FusedLayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
[0]: )
[0]: (1): TransformerDecoderLayerBase(
[0]: (dropout_module): FairseqDropout()
[0]: (self_attn): MultiheadAttention(
[0]: (dropout_module): FairseqDropout()
[0]: (k_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (v_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (q_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (out_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: )
[0]: (activation_dropout_module): FairseqDropout()
[0]: (self_attn_layer_norm): FusedLayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
[0]: (fc1): Linear(in_features=512, out_features=2048, bias=True)
[0]: (fc2): Linear(in_features=2048, out_features=512, bias=True)
[0]: (final_layer_norm): FusedLayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
[0]: )
[0]: (2): TransformerDecoderLayerBase(
[0]: (dropout_module): FairseqDropout()
[0]: (self_attn): MultiheadAttention(
[0]: (dropout_module): FairseqDropout()
[0]: (k_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (v_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (q_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (out_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: )
[0]: (activation_dropout_module): FairseqDropout()
[0]: (self_attn_layer_norm): FusedLayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
[0]: (fc1): Linear(in_features=512, out_features=2048, bias=True)
[0]: (fc2): Linear(in_features=2048, out_features=512, bias=True)
[0]: (final_layer_norm): FusedLayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
[0]: )
[0]: (3): TransformerDecoderLayerBase(
[0]: (dropout_module): FairseqDropout()
[0]: (self_attn): MultiheadAttention(
[0]: (dropout_module): FairseqDropout()
[0]: (k_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (v_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (q_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (out_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: )
[0]: (activation_dropout_module): FairseqDropout()
[0]: (self_attn_layer_norm): FusedLayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
[0]: (fc1): Linear(in_features=512, out_features=2048, bias=True)
[0]: (fc2): Linear(in_features=2048, out_features=512, bias=True)
[0]: (final_layer_norm): FusedLayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
[0]: )
[0]: (4): TransformerDecoderLayerBase(
[0]: (dropout_module): FairseqDropout()
[0]: (self_attn): MultiheadAttention(
[0]: (dropout_module): FairseqDropout()
[0]: (k_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (v_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (q_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (out_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: )
[0]: (activation_dropout_module): FairseqDropout()
[0]: (self_attn_layer_norm): FusedLayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
[0]: (fc1): Linear(in_features=512, out_features=2048, bias=True)
[0]: (fc2): Linear(in_features=2048, out_features=512, bias=True)
[0]: (final_layer_norm): FusedLayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
[0]: )
[0]: (5): TransformerDecoderLayerBase(
[0]: (dropout_module): FairseqDropout()
[0]: (self_attn): MultiheadAttention(
[0]: (dropout_module): FairseqDropout()
[0]: (k_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (v_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (q_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: (out_proj): Linear(in_features=512, out_features=512, bias=True)
[0]: )
[0]: (activation_dropout_module): FairseqDropout()
[0]: (self_attn_layer_norm): FusedLayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
[0]: (fc1): Linear(in_features=512, out_features=2048, bias=True)
[0]: (fc2): Linear(in_features=2048, out_features=512, bias=True)
[0]: (final_layer_norm): FusedLayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
[0]: )
[0]: )
[0]: (output_projection): Linear(in_features=512, out_features=267744, bias=False)
[0]: )
[0]:)
[0]:2022-04-18 23:07:37 | INFO | fairseq_cli.train | task: LanguageModelingTask
[0]:2022-04-18 23:07:37 | INFO | fairseq_cli.train | model: TransformerLanguageModel
[0]:2022-04-18 23:07:37 | INFO | fairseq_cli.train | criterion: CrossEntropyCriterion
[0]:2022-04-18 23:07:37 | INFO | fairseq_cli.train | num. shared model params: 155,999,232 (num. trained: 155,999,232)
[0]:2022-04-18 23:07:37 | INFO | fairseq_cli.train | num. expert model params: 0 (num. trained: 0)
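The reported `num. shared model params: 155,999,232` can be reproduced from the architecture dump above: a 267,744-word embedding tied to the output projection (fairseq logs the tie below as a "detected shared parameter"), 6 decoder layers with d_model=512 and FFN width 2048, and a parameter-free sinusoidal positional embedding. A minimal sketch:

```python
# Reproduce fairseq's reported parameter count from the printed module tree.
# The embedding is tied to decoder.output_projection, so it is counted once.
V, D, FFN, LAYERS = 267_744, 512, 2048, 6

embed = V * D                          # tied embed_tokens / output_projection
attn = 4 * (D * D + D)                 # k/q/v/out projections, each with bias
layer_norms = 2 * (2 * D)              # two LayerNorms (weight + bias each)
ffn = (D * FFN + FFN) + (FFN * D + D)  # fc1 + fc2
per_layer = attn + layer_norms + ffn

total = embed + LAYERS * per_layer
print(f"{total:,}")  # 155,999,232 -- matches the log line above
```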
[0]:2022-04-18 23:07:37 | INFO | fairseq.data.data_utils | loaded 3,760 examples from: /job/fairseq/data-bin/wikitext-103/valid
[0]:2022-04-18 23:07:37 | INFO | torch.distributed.distributed_c10d | Added key: store_based_barrier_key:2 to store for rank: 0
[0]:2022-04-18 23:07:37 | INFO | torch.distributed.distributed_c10d | Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 16 nodes.
[0]:2022-04-18 23:07:37 | INFO | fairseq.trainer | detected shared parameter: decoder.embed_tokens.weight <- decoder.output_projection.weight
[1]:ip-10-0-0-175:18:139 [1] NCCL INFO Trees [0] 5/-1/-1->1->2 [1] 5/-1/-1->1->2
[1]:ip-10-0-0-175:18:139 [1] NCCL INFO Channel 00 : 1[170] -> 5[1b0] via P2P/IPC
[1]:ip-10-0-0-175:18:139 [1] NCCL INFO Channel 01 : 1[170] -> 5[1b0] via P2P/IPC
[1]:ip-10-0-0-175:18:139 [1] NCCL INFO Connected all rings
[1]:ip-10-0-0-175:18:139 [1] NCCL INFO Channel 00 : 1[170] -> 2[180] via P2P/IPC
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO Channel 00/02 : 0 3 2 1 5 6 7 4 8 11 10 9 13 14 15 12
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO Channel 01/02 : 0 3 2 1 5 6 7 4 8 11 10 9 13 14 15 12
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO Trees [0] 3/8/-1->0->-1 [1] 3/-1/-1->0->8
[2]:ip-10-0-0-175:19:136 [2] NCCL INFO Trees [0] 1/-1/-1->2->3 [1] 1/-1/-1->2->3
[2]:ip-10-0-0-175:19:136 [2] NCCL INFO Channel 00 : 2[180] -> 1[170] via P2P/IPC
[2]:ip-10-0-0-175:19:136 [2] NCCL INFO Channel 01 : 2[180] -> 1[170] via P2P/IPC
[3]:ip-10-0-0-175:20:142 [3] NCCL INFO Trees [0] 2/-1/-1->3->0 [1] 2/-1/-1->3->0
[3]:ip-10-0-0-175:20:142 [3] NCCL INFO Channel 00 : 3[190] -> 2[180] via P2P/IPC
[3]:ip-10-0-0-175:20:142 [3] NCCL INFO Channel 01 : 3[190] -> 2[180] via P2P/IPC
[3]:ip-10-0-0-175:20:142 [3] NCCL INFO Connected all rings
[3]:ip-10-0-0-175:20:142 [3] NCCL INFO Channel 00 : 3[190] -> 0[160] via P2P/IPC
[3]:ip-10-0-0-175:20:142 [3] NCCL INFO Channel 01 : 3[190] -> 0[160] via P2P/IPC
[6]:ip-10-0-0-175:23:138 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5
[6]:ip-10-0-0-175:23:138 [6] NCCL INFO Channel 00 : 6[1c0] -> 7[1d0] via P2P/IPC
[6]:ip-10-0-0-175:23:138 [6] NCCL INFO Channel 01 : 6[1c0] -> 7[1d0] via P2P/IPC
[6]:ip-10-0-0-175:23:138 [6] NCCL INFO Connected all rings
[6]:ip-10-0-0-175:23:138 [6] NCCL INFO Channel 00 : 6[1c0] -> 5[1b0] via P2P/IPC
[6]:ip-10-0-0-175:23:138 [6] NCCL INFO Channel 01 : 6[1c0] -> 5[1b0] via P2P/IPC
[5]:ip-10-0-0-175:22:141 [5] NCCL INFO Trees [0] 6/-1/-1->5->1 [1] 6/-1/-1->5->1
[5]:ip-10-0-0-175:22:141 [5] NCCL INFO Channel 00 : 5[1b0] -> 6[1c0] via P2P/IPC
[5]:ip-10-0-0-175:22:141 [5] NCCL INFO Channel 01 : 5[1b0] -> 6[1c0] via P2P/IPC
[5]:ip-10-0-0-175:22:141 [5] NCCL INFO Connected all rings
[5]:ip-10-0-0-175:22:141 [5] NCCL INFO Channel 00 : 5[1b0] -> 1[170] via P2P/IPC
[5]:ip-10-0-0-175:22:141 [5] NCCL INFO Channel 01 : 5[1b0] -> 1[170] via P2P/IPC
[4]:ip-10-0-0-175:21:137 [4] NCCL INFO Trees [0] -1/-1/-1->4->7 [1] -1/-1/-1->4->7
[4]:ip-10-0-0-175:21:137 [4] NCCL INFO Channel 00 : 4[1a0] -> 8[160] [send] via NET/Socket/0
[4]:ip-10-0-0-175:21:137 [4] NCCL INFO Channel 01 : 4[1a0] -> 8[160] [send] via NET/Socket/0
[7]:ip-10-0-0-175:24:140 [7] NCCL INFO Trees [0] 4/-1/-1->7->6 [1] 4/-1/-1->7->6
[7]:ip-10-0-0-175:24:140 [7] NCCL INFO Channel 00 : 7[1d0] -> 4[1a0] via P2P/IPC
[7]:ip-10-0-0-175:24:140 [7] NCCL INFO Channel 01 : 7[1d0] -> 4[1a0] via P2P/IPC
[1]:ip-10-0-0-175:18:139 [1] NCCL INFO Channel 01 : 1[170] -> 2[180] via P2P/IPC
[1]:ip-10-0-0-175:18:139 [1] NCCL INFO Connected all trees
[1]:ip-10-0-0-175:18:139 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
[1]:ip-10-0-0-175:18:139 [1] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
[1]:ip-10-0-0-175:18:139 [1] NCCL INFO Channel 01 : 1[170] -> 4[1a0] via P2P/indirect/0[160]
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO Channel 00 : 0[160] -> 3[190] via P2P/IPC
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO Channel 01 : 0[160] -> 3[190] via P2P/IPC
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO Channel 00 : 12[1a0] -> 0[160] [receive] via NET/Socket/0
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO Channel 01 : 12[1a0] -> 0[160] [receive] via NET/Socket/0
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO Connected all rings
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO Channel 00 : 8[160] -> 0[160] [receive] via NET/Socket/0
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO Channel 01 : 8[160] -> 0[160] [receive] via NET/Socket/0
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO Channel 00 : 0[160] -> 8[160] [send] via NET/Socket/0
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO Channel 01 : 0[160] -> 8[160] [send] via NET/Socket/0
[2]:ip-10-0-0-175:19:136 [2] NCCL INFO Connected all rings
[2]:ip-10-0-0-175:19:136 [2] NCCL INFO Channel 00 : 2[180] -> 3[190] via P2P/IPC
[2]:ip-10-0-0-175:19:136 [2] NCCL INFO Channel 01 : 2[180] -> 3[190] via P2P/IPC
[2]:ip-10-0-0-175:19:136 [2] NCCL INFO Connected all trees
[2]:ip-10-0-0-175:19:136 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
[2]:ip-10-0-0-175:19:136 [2] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
[2]:ip-10-0-0-175:19:136 [2] NCCL INFO Channel 00 : 2[180] -> 4[1a0] via P2P/indirect/0[160]
[6]:ip-10-0-0-175:23:138 [6] NCCL INFO Connected all trees
[6]:ip-10-0-0-175:23:138 [6] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
[6]:ip-10-0-0-175:23:138 [6] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
[5]:ip-10-0-0-175:22:141 [5] NCCL INFO Connected all trees
[5]:ip-10-0-0-175:22:141 [5] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
[5]:ip-10-0-0-175:22:141 [5] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
[4]:ip-10-0-0-175:21:137 [4] NCCL INFO Connected all rings
[4]:ip-10-0-0-175:21:137 [4] NCCL INFO Channel 00 : 4[1a0] -> 7[1d0] via P2P/IPC
[4]:ip-10-0-0-175:21:137 [4] NCCL INFO Channel 01 : 4[1a0] -> 7[1d0] via P2P/IPC
[4]:ip-10-0-0-175:21:137 [4] NCCL INFO Connected all trees
[4]:ip-10-0-0-175:21:137 [4] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
[4]:ip-10-0-0-175:21:137 [4] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
[7]:ip-10-0-0-175:24:140 [7] NCCL INFO Connected all rings
[7]:ip-10-0-0-175:24:140 [7] NCCL INFO Channel 00 : 7[1d0] -> 6[1c0] via P2P/IPC
[7]:ip-10-0-0-175:24:140 [7] NCCL INFO Channel 01 : 7[1d0] -> 6[1c0] via P2P/IPC
[7]:ip-10-0-0-175:24:140 [7] NCCL INFO Connected all trees
[7]:ip-10-0-0-175:24:140 [7] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
[7]:ip-10-0-0-175:24:140 [7] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
[1]:ip-10-0-0-175:18:139 [1] NCCL INFO Channel 01 : 1[170] -> 6[1c0] via P2P/indirect/5[1b0]
[1]:ip-10-0-0-175:18:139 [1] NCCL INFO Channel 00 : 1[170] -> 7[1d0] via P2P/indirect/3[190]
[1]:ip-10-0-0-175:18:139 [1] NCCL INFO comm 0x7f7aa4002fb0 rank 1 nranks 16 cudaDev 1 busId 170 - Init COMPLETE
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO Connected all trees
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO Channel 01 : 0[160] -> 5[1b0] via P2P/indirect/1[170]
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO Channel 00 : 0[160] -> 6[1c0] via P2P/indirect/4[1a0]
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO Channel 01 : 0[160] -> 7[1d0] via P2P/indirect/4[1a0]
[0]:ip-10-0-0-175:17:135 [0] NCCL INFO comm 0x7fb118002fb0 rank 0 nranks 16 cudaDev 0 busId 160 - Init COMPLETE
[0]:ip-10-0-0-175:17:17 [0] NCCL INFO Launch mode Parallel
[2]:ip-10-0-0-175:19:136 [2] NCCL INFO Channel 01 : 2[180] -> 5[1b0] via P2P/indirect/1[170]
[2]:ip-10-0-0-175:19:136 [2] NCCL INFO Channel 01 : 2[180] -> 7[1d0] via P2P/indirect/6[1c0]
[2]:ip-10-0-0-175:19:136 [2] NCCL INFO comm 0x7f726c002fb0 rank 2 nranks 16 cudaDev 2 busId 180 - Init COMPLETE
[3]:ip-10-0-0-175:20:142 [3] NCCL INFO Connected all trees
[3]:ip-10-0-0-175:20:142 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
[3]:ip-10-0-0-175:20:142 [3] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
[3]:ip-10-0-0-175:20:142 [3] NCCL INFO Channel 01 : 3[190] -> 4[1a0] via P2P/indirect/0[160]
[3]:ip-10-0-0-175:20:142 [3] NCCL INFO Channel 00 : 3[190] -> 5[1b0] via P2P/indirect/1[170]
[3]:ip-10-0-0-175:20:142 [3] NCCL INFO Channel 01 : 3[190] -> 6[1c0] via P2P/indirect/7[1d0]
[3]:ip-10-0-0-175:20:142 [3] NCCL INFO comm 0x7f1528002fb0 rank 3 nranks 16 cudaDev 3 busId 190 - Init COMPLETE
[6]:ip-10-0-0-175:23:138 [6] NCCL INFO Channel 00 : 6[1c0] -> 0[160] via P2P/indirect/4[1a0]
[6]:ip-10-0-0-175:23:138 [6] NCCL INFO Channel 01 : 6[1c0] -> 1[170] via P2P/indirect/5[1b0]
[6]:ip-10-0-0-175:23:138 [6] NCCL INFO Channel 01 : 6[1c0] -> 3[190] via P2P/indirect/2[180]
[6]:ip-10-0-0-175:23:138 [6] NCCL INFO comm 0x7f8f18002fb0 rank 6 nranks 16 cudaDev 6 busId 1c0 - Init COMPLETE
[5]:ip-10-0-0-175:22:141 [5] NCCL INFO Channel 01 : 5[1b0] -> 0[160] via P2P/indirect/4[1a0]
[5]:ip-10-0-0-175:22:141 [5] NCCL INFO Channel 01 : 5[1b0] -> 2[180] via P2P/indirect/1[170]
[5]:ip-10-0-0-175:22:141 [5] NCCL INFO Channel 00 : 5[1b0] -> 3[190] via P2P/indirect/1[170]
[5]:ip-10-0-0-175:22:141 [5] NCCL INFO comm 0x7f412c002fb0 rank 5 nranks 16 cudaDev 5 busId 1b0 - Init COMPLETE
[4]:ip-10-0-0-175:21:137 [4] NCCL INFO Channel 01 : 4[1a0] -> 1[170] via P2P/indirect/5[1b0]
[4]:ip-10-0-0-175:21:137 [4] NCCL INFO Channel 00 : 4[1a0] -> 2[180] via P2P/indirect/6[1c0]
[4]:ip-10-0-0-175:21:137 [4] NCCL INFO Channel 01 : 4[1a0] -> 3[190] via P2P/indirect/0[160]
[4]:ip-10-0-0-175:21:137 [4] NCCL INFO comm 0x7f9b1c002fb0 rank 4 nranks 16 cudaDev 4 busId 1a0 - Init COMPLETE
[7]:ip-10-0-0-175:24:140 [7] NCCL INFO Channel 01 : 7[1d0] -> 0[160] via P2P/indirect/4[1a0]
[7]:ip-10-0-0-175:24:140 [7] NCCL INFO Channel 00 : 7[1d0] -> 1[170] via P2P/indirect/5[1b0]
[7]:ip-10-0-0-175:24:140 [7] NCCL INFO Channel 01 : 7[1d0] -> 2[180] via P2P/indirect/3[190]
[7]:ip-10-0-0-175:24:140 [7] NCCL INFO comm 0x7f16d8002fb0 rank 7 nranks 16 cudaDev 7 busId 1d0 - Init COMPLETE
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | ***********************CUDA environments for all 16 workers***********************
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | rank 0: capabilities = 7.0 ; total memory = 31.749 GB ; name = Tesla V100-SXM2-32GB
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | rank 1: capabilities = 7.0 ; total memory = 31.749 GB ; name = Tesla V100-SXM2-32GB
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | rank 2: capabilities = 7.0 ; total memory = 31.749 GB ; name = Tesla V100-SXM2-32GB
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | rank 3: capabilities = 7.0 ; total memory = 31.749 GB ; name = Tesla V100-SXM2-32GB
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | rank 4: capabilities = 7.0 ; total memory = 31.749 GB ; name = Tesla V100-SXM2-32GB
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | rank 5: capabilities = 7.0 ; total memory = 31.749 GB ; name = Tesla V100-SXM2-32GB
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | rank 6: capabilities = 7.0 ; total memory = 31.749 GB ; name = Tesla V100-SXM2-32GB
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | rank 7: capabilities = 7.0 ; total memory = 31.749 GB ; name = Tesla V100-SXM2-32GB
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | rank 8: capabilities = 7.0 ; total memory = 31.749 GB ; name = Tesla V100-SXM2-32GB
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | rank 9: capabilities = 7.0 ; total memory = 31.749 GB ; name = Tesla V100-SXM2-32GB
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | rank 10: capabilities = 7.0 ; total memory = 31.749 GB ; name = Tesla V100-SXM2-32GB
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | rank 11: capabilities = 7.0 ; total memory = 31.749 GB ; name = Tesla V100-SXM2-32GB
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | rank 12: capabilities = 7.0 ; total memory = 31.749 GB ; name = Tesla V100-SXM2-32GB
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | rank 13: capabilities = 7.0 ; total memory = 31.749 GB ; name = Tesla V100-SXM2-32GB
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | rank 14: capabilities = 7.0 ; total memory = 31.749 GB ; name = Tesla V100-SXM2-32GB
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | rank 15: capabilities = 7.0 ; total memory = 31.749 GB ; name = Tesla V100-SXM2-32GB
[0]:2022-04-18 23:07:37 | INFO | fairseq.utils | ***********************CUDA environments for all 16 workers***********************
[0]:2022-04-18 23:07:37 | INFO | fairseq_cli.train | training on 16 devices (GPUs/TPUs)
[0]:2022-04-18 23:07:37 | INFO | fairseq_cli.train | max tokens per device = 2048 and max sentences per device = None
[0]:2022-04-18 23:07:37 | INFO | fairseq.trainer | Preparing to load checkpoint /job/fairseq/checkpoints/transformer_wikitext-103_manual_docker_al2/checkpoint_last.pt
[0]:2022-04-18 23:07:44 | INFO | fairseq.trainer | NOTE: your device may support faster training with --fp16 or --amp
[0]:2022-04-18 23:07:44 | INFO | fairseq.optim.adam | using FusedAdam
[0]:2022-04-18 23:07:46 | INFO | fairseq.trainer | Loaded checkpoint /job/fairseq/checkpoints/transformer_wikitext-103_manual_docker_al2/checkpoint_last.pt (epoch 8 @ 22057 updates)
[0]:2022-04-18 23:07:46 | INFO | fairseq.trainer | loading train data for epoch 8
[0]:2022-04-18 23:07:46 | INFO | fairseq.data.data_utils | loaded 1,801,350 examples from: /job/fairseq/data-bin/wikitext-103/train
[0]:2022-04-18 23:07:47 | INFO | fairseq.data.iterators | grouped total_num_itrs = 3151
[0]:2022-04-18 23:07:47 | INFO | fairseq.trainer | begin training epoch 8
[0]:2022-04-18 23:07:47 | INFO | fairseq_cli.train | Start iterating over samples
[0]:2022-04-18 23:07:49 | INFO | root | Reducer buckets have been rebuilt in this iteration.
[0]:2022-04-18 23:08:24 | INFO | train_inner | epoch 008: 43 / 3151 loss=5.215, ppl=37.15, wps=39610.6, ups=1.21, wpb=32768, bsz=64, num_updates=22100, lr=0.000212718, gnorm=0.682, train_wall=37, gb_free=20.6, wall=46
[0]:2022-04-18 23:09:34 | INFO | train_inner | epoch 008: 143 / 3151 loss=5.236, ppl=37.69, wps=46427.2, ups=1.42, wpb=32768, bsz=64, num_updates=22200, lr=0.000212238, gnorm=0.679, train_wall=70, gb_free=20.6, wall=117
[0]:2022-04-18 23:10:46 | INFO | train_inner | epoch 008: 243 / 3151 loss=5.24, ppl=37.8, wps=45564.2, ups=1.39, wpb=32768, bsz=64, num_updates=22300, lr=0.000211762, gnorm=0.645, train_wall=72, gb_free=20.6, wall=189
[0]:2022-04-18 23:11:56 | INFO | train_inner | epoch 008: 343 / 3151 loss=5.248, ppl=37.99, wps=47210.9, ups=1.44, wpb=32768, bsz=64, num_updates=22400, lr=0.000211289, gnorm=0.691, train_wall=69, gb_free=20.6, wall=258
[0]:2022-04-18 23:13:05 | INFO | train_inner | epoch 008: 443 / 3151 loss=5.253, ppl=38.13, wps=47045.7, ups=1.44, wpb=32768, bsz=64, num_updates=22500, lr=0.000210819, gnorm=0.662, train_wall=69, gb_free=20.6, wall=328
[0]:2022-04-18 23:14:16 | INFO | train_inner | epoch 008: 543 / 3151 loss=5.267, ppl=38.51, wps=46143.4, ups=1.41, wpb=32768, bsz=64, num_updates=22600, lr=0.000210352, gnorm=0.672, train_wall=71, gb_free=20.6, wall=399
[0]:2022-04-18 23:15:29 | INFO | train_inner | epoch 008: 643 / 3151 loss=5.257, ppl=38.23, wps=45269.2, ups=1.38, wpb=32768, bsz=64, num_updates=22700, lr=0.000209888, gnorm=0.668, train_wall=72, gb_free=20.6, wall=471
[0]:2022-04-18 23:16:40 | INFO | train_inner | epoch 008: 743 / 3151 loss=5.263, ppl=38.39, wps=46129.8, ups=1.41, wpb=32768, bsz=64, num_updates=22800, lr=0.000209427, gnorm=0.709, train_wall=71, gb_free=20.6, wall=542
[0]:2022-04-18 23:17:50 | INFO | train_inner | epoch 008: 843 / 3151 loss=5.258, ppl=38.27, wps=46394, ups=1.42, wpb=32768, bsz=64, num_updates=22900, lr=0.000208969, gnorm=0.664, train_wall=70, gb_free=20.6, wall=613
[0]:2022-04-18 23:19:01 | INFO | train_inner | epoch 008: 943 / 3151 loss=5.279, ppl=38.82, wps=46158.8, ups=1.41, wpb=32768, bsz=64, num_updates=23000, lr=0.000208514, gnorm=0.716, train_wall=71, gb_free=20.6, wall=684
[0]:2022-04-18 23:20:11 | INFO | train_inner | epoch 008: 1043 / 3151 loss=5.283, ppl=38.94, wps=47114.8, ups=1.44, wpb=32768, bsz=64, num_updates=23100, lr=0.000208063, gnorm=0.661, train_wall=69, gb_free=20.6, wall=753
[0]:2022-04-18 23:21:24 | INFO | train_inner | epoch 008: 1143 / 3151 loss=5.281, ppl=38.89, wps=45072.8, ups=1.38, wpb=32768, bsz=64, num_updates=23200, lr=0.000207614, gnorm=0.701, train_wall=72, gb_free=20.6, wall=826
[0]:2022-04-18 23:22:36 | INFO | train_inner | epoch 008: 1243 / 3151 loss=5.279, ppl=38.83, wps=45500.7, ups=1.39, wpb=32768, bsz=64, num_updates=23300, lr=0.000207168, gnorm=0.682, train_wall=72, gb_free=20.6, wall=898
[0]:2022-04-18 23:23:46 | INFO | train_inner | epoch 008: 1343 / 3151 loss=5.285, ppl=38.99, wps=46510, ups=1.42, wpb=32768, bsz=64, num_updates=23400, lr=0.000206725, gnorm=0.683, train_wall=70, gb_free=20.6, wall=969
[0]:2022-04-18 23:24:59 | INFO | train_inner | epoch 008: 1443 / 3151 loss=5.289, ppl=39.1, wps=44618.2, ups=1.36, wpb=32768, bsz=64, num_updates=23500, lr=0.000206284, gnorm=0.664, train_wall=73, gb_free=20.6, wall=1042
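A note on reading the `train_inner` lines above: fairseq reports cross-entropy `loss` in base 2 (bits per token), so the printed `ppl` is `2**loss`, not `exp(loss)`. The small residual mismatch comes from `loss` being rounded to three decimals in the log:

```python
# fairseq logs loss in bits per token, so perplexity is 2**loss.
loss = 5.215              # from the first train_inner line above
ppl = 2 ** loss
print(round(ppl, 2))      # ~37.14, matching ppl=37.15 up to log rounding
```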