DeepSpeed Debug logs
21: M9 P[5, 6] avail 3.1e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 5.1e+03, inflight [9]
-gather param for module 3: {'id': 0, 'status': 'AVAILABLE', 'numel': 78151680, 'persist': False, 'active_sub_modules': {3}}
[2021-07-07 21:16:52,635] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 9
[2021-07-07 21:16:52,635] [INFO] [stage3.py:42:print_rank_0] module id 9 handle is None
22: M23 P[] avail 3.1e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [0, 23, 2, 1, 3]
[2021-07-07 21:16:52,636] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 23
[2021-07-07 21:16:52,636] [INFO] [stage3.py:42:print_rank_0] module id 23 handle is None
-gather param for module 24: {'id': 151, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {24}}
-gather param for module 24: {'id': 152, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {24}}
[2021-07-07 21:16:52,636] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 151, 'status': 'INFLIGHT', 'numel': 6553600, 'persist': False, 'active_sub_modules': {24}}
23: M24 P[151, 152] avail 3.2e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 8.5e+07, inflight [24, 0, 2, 1, 3]
[2021-07-07 21:16:52,636] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 24
[2021-07-07 21:16:52,637] [INFO] [stage3.py:42:print_rank_0] module id 24 handle is <deepspeed.runtime.zero.partition_parameters.AllGatherCoalescedHandle object at 0x7f7bc632dd90>
24: M25 P[] avail 3.2e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [0, 2, 1, 25, 3]
[2021-07-07 21:16:52,642] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 25
[2021-07-07 21:16:52,642] [INFO] [stage3.py:42:print_rank_0] module id 25 handle is None
-gather param for module 26: {'id': 153, 'status': 'AVAILABLE', 'numel': 30528, 'persist': True, 'active_sub_modules': {26}}
25: M26 P[153] avail 3.2e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [0, 2, 1, 26, 3]
[2021-07-07 21:16:52,642] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 26
[2021-07-07 21:16:52,642] [INFO] [stage3.py:42:print_rank_0] module id 26 handle is None
26: M27 P[] avail 3.2e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [0, 2, 1, 27, 3]
[2021-07-07 21:16:52,642] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 27
[2021-07-07 21:16:52,642] [INFO] [stage3.py:42:print_rank_0] module id 27 handle is None
-gather param for module 28: {'id': 154, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {28}}
-gather param for module 28: {'id': 155, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {28}}
[2021-07-07 21:16:52,643] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 154, 'status': 'INFLIGHT', 'numel': 6553600, 'persist': False, 'active_sub_modules': {28}}
27: M28 P[154, 155] avail 3.3e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 8.5e+07, inflight [0, 2, 28, 1, 3]
[2021-07-07 21:16:52,643] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 28
[2021-07-07 21:16:52,643] [INFO] [stage3.py:42:print_rank_0] module id 28 handle is <deepspeed.runtime.zero.partition_parameters.AllGatherCoalescedHandle object at 0x7f7bc632dd90>
-gather param for module 29: {'id': 156, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {29}}
-gather param for module 29: {'id': 157, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {29}}
28: M29 P[156, 157] avail 3.3e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [0, 29, 2, 1, 3]
[2021-07-07 21:16:52,644] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 29
[2021-07-07 21:16:52,644] [INFO] [stage3.py:42:print_rank_0] module id 29 handle is None
-gather param for module 30: {'id': 0, 'status': 'AVAILABLE', 'numel': 78151680, 'persist': False, 'active_sub_modules': {3, 30}}
29: M30 P[0] avail 3.3e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [0, 2, 30, 1, 3]
[2021-07-07 21:16:52,644] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 30
[2021-07-07 21:16:52,644] [INFO] [stage3.py:42:print_rank_0] module id 30 handle is None
-gather param for module 31: {'id': 158, 'status': 'AVAILABLE', 'numel': 5120, 'persist': True, 'active_sub_modules': {31}}
-gather param for module 31: {'id': 159, 'status': 'AVAILABLE', 'numel': 2, 'persist': True, 'active_sub_modules': {31}}
30: M31 P[158, 159] avail 3.3e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [0, 2, 1, 31, 3]
[2021-07-07 21:16:52,645] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 31
[2021-07-07 21:16:52,645] [INFO] [stage3.py:42:print_rank_0] module id 31 handle is None
[2021-07-07 21:16:52,647] [INFO] [stage3.py:42:print_rank_0] current submodule in __inflight_module_manager, module id 0
31: M0 P[] avail 3.3e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [1, 3, 0, 2]
[2021-07-07 21:16:52,647] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 0
[2021-07-07 21:16:52,647] [INFO] [stage3.py:42:print_rank_0] module id 0 handle is None
32: M25 P[] avail 3.3e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [1, 25, 3, 2]
[2021-07-07 21:16:52,648] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 25
[2021-07-07 21:16:52,648] [INFO] [stage3.py:42:print_rank_0] module id 25 handle is None
-gather param for module 31: {'id': 158, 'status': 'AVAILABLE', 'numel': 5120, 'persist': True, 'active_sub_modules': {31}}
-gather param for module 31: {'id': 159, 'status': 'AVAILABLE', 'numel': 2, 'persist': True, 'active_sub_modules': {31}}
33: M31 P[158, 159] avail 3.3e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [31, 1, 3, 2]
[2021-07-07 21:16:52,648] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 31
[2021-07-07 21:16:52,648] [INFO] [stage3.py:42:print_rank_0] module id 31 handle is None
[2021-07-07 21:16:52,648] [INFO] [stage3.py:42:print_rank_0] sub_module Linear, sub_module id 31, param {'id': 158, 'status': 'AVAILABLE', 'numel': 5120, 'persist': True, 'active_sub_modules': {31}}
[2021-07-07 21:16:52,648] [INFO] [stage3.py:42:print_rank_0] sub_module Linear, sub_module id 31, param {'id': 159, 'status': 'AVAILABLE', 'numel': 2, 'persist': True, 'active_sub_modules': {31}}
-gather param for module 26: {'id': 153, 'status': 'AVAILABLE', 'numel': 30528, 'persist': True, 'active_sub_modules': {26}}
34: M26 P[153] avail 3.3e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [1, 26, 3, 2]
[2021-07-07 21:16:52,649] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 26
[2021-07-07 21:16:52,649] [INFO] [stage3.py:42:print_rank_0] module id 26 handle is None
[2021-07-07 21:16:52,649] [INFO] [stage3.py:42:print_rank_0] sub_module BertLMPredictionHead, sub_module id 26, param {'id': 153, 'status': 'AVAILABLE', 'numel': 30528, 'persist': True, 'active_sub_modules': {26}}
-gather param for module 30: {'id': 0, 'status': 'AVAILABLE', 'numel': 78151680, 'persist': False, 'active_sub_modules': {3, 30}}
35: M30 P[0] avail 3.3e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [1, 30, 3, 2]
[2021-07-07 21:16:52,649] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 30
[2021-07-07 21:16:52,649] [INFO] [stage3.py:42:print_rank_0] module id 30 handle is None
[2021-07-07 21:16:52,649] [INFO] [stage3.py:42:print_rank_0] sub_module Linear, sub_module id 30, param {'id': 0, 'status': 'AVAILABLE', 'numel': 78151680, 'persist': False, 'active_sub_modules': {3, 30}}
-release param: {'id': 0, 'status': 'AVAILABLE', 'numel': 78151680, 'persist': False, 'active_sub_modules': {3}}
36: M27 P[] avail 2.5e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [1, 3, 27, 2]
[2021-07-07 21:16:52,650] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 27
[2021-07-07 21:16:52,650] [INFO] [stage3.py:42:print_rank_0] module id 27 handle is None
-gather param for module 29: {'id': 156, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {29}}
-gather param for module 29: {'id': 157, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {29}}
37: M29 P[156, 157] avail 2.5e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [1, 3, 29, 2]
[2021-07-07 21:16:52,650] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 29
[2021-07-07 21:16:52,650] [INFO] [stage3.py:42:print_rank_0] module id 29 handle is None
[2021-07-07 21:16:52,650] [INFO] [stage3.py:42:print_rank_0] sub_module FusedLayerNorm, sub_module id 29, param {'id': 156, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {29}}
[2021-07-07 21:16:52,650] [INFO] [stage3.py:42:print_rank_0] sub_module FusedLayerNorm, sub_module id 29, param {'id': 157, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {29}}
-gather param for module 28: {'id': 154, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {28}}
-gather param for module 28: {'id': 155, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {28}}
38: M28 P[154, 155] avail 2.5e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 8.5e+07, inflight [1, 28, 3, 2]
[2021-07-07 21:16:52,650] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 28
[2021-07-07 21:16:52,651] [INFO] [stage3.py:42:print_rank_0] module id 28 handle is None
[2021-07-07 21:16:52,651] [INFO] [stage3.py:42:print_rank_0] sub_module LinearActivation, sub_module id 28, param {'id': 154, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {28}}
[2021-07-07 21:16:52,651] [INFO] [stage3.py:42:print_rank_0] sub_module LinearActivation, sub_module id 28, param {'id': 155, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {28}}
-release param: {'id': 154, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
[2021-07-07 21:16:52,651] [INFO] [stage3.py:42:print_rank_0] current submodule in __inflight_module_manager, module id 1
39: M1 P[] avail 2.4e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [1, 3, 2]
[2021-07-07 21:16:52,651] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 1
[2021-07-07 21:16:52,651] [INFO] [stage3.py:42:print_rank_0] module id 1 handle is None
40: M23 P[] avail 2.4e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [2, 3, 23]
[2021-07-07 21:16:52,652] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 23
[2021-07-07 21:16:52,652] [INFO] [stage3.py:42:print_rank_0] module id 23 handle is None
-gather param for module 24: {'id': 151, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {24}}
-gather param for module 24: {'id': 152, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {24}}
41: M24 P[151, 152] avail 2.4e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 8.5e+07, inflight [2, 24, 3]
[2021-07-07 21:16:52,652] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 24
[2021-07-07 21:16:52,652] [INFO] [stage3.py:42:print_rank_0] module id 24 handle is None
[2021-07-07 21:16:52,652] [INFO] [stage3.py:42:print_rank_0] sub_module LinearActivation, sub_module id 24, param {'id': 151, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {24}}
[2021-07-07 21:16:52,652] [INFO] [stage3.py:42:print_rank_0] sub_module LinearActivation, sub_module id 24, param {'id': 152, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {24}}
-release param: {'id': 151, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-gather param for module 9: {'id': 5, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {9}}
-gather param for module 9: {'id': 6, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {9}}
42: M9 P[5, 6] avail 2.4e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [2, 9, 3]
[2021-07-07 21:16:52,653] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 9
[2021-07-07 21:16:52,653] [INFO] [stage3.py:42:print_rank_0] module id 9 handle is None
[2021-07-07 21:16:52,653] [INFO] [stage3.py:42:print_rank_0] sub_module FusedLayerNorm, sub_module id 9, param {'id': 5, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {9}}
[2021-07-07 21:16:52,653] [INFO] [stage3.py:42:print_rank_0] sub_module FusedLayerNorm, sub_module id 9, param {'id': 6, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {9}}
-gather param for module 19: {'id': 103, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 104, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 105, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 106, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 107, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 108, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 109, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 110, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 111, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 112, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 113, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 114, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
43: M19 P[103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114] avail 2.4e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 3, 19]
[2021-07-07 21:16:52,654] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 19
[2021-07-07 21:16:52,654] [INFO] [stage3.py:42:print_rank_0] module id 19 handle is None
-gather param for module 20: {'id': 115, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 116, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 117, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 118, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 119, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 120, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 121, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 122, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 123, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 124, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 125, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 126, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
44: M20 P[115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126] avail 2.4e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 3, 20]
[2021-07-07 21:16:52,655] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 20
[2021-07-07 21:16:52,655] [INFO] [stage3.py:42:print_rank_0] module id 20 handle is None
-gather param for module 21: {'id': 127, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 128, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 129, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 130, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 131, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 132, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 133, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 134, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 135, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 136, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 137, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 138, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
45: M21 P[127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138] avail 2.4e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 21, 3]
[2021-07-07 21:16:52,660] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 21
[2021-07-07 21:16:52,660] [INFO] [stage3.py:42:print_rank_0] module id 21 handle is None
-gather param for module 22: {'id': 139, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 140, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 141, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 142, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 143, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 144, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 145, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 146, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 147, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 148, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 149, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 150, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
46: M22 P[139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150] avail 2.4e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 22, 3]
[2021-07-07 21:16:52,666] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 22
[2021-07-07 21:16:52,666] [INFO] [stage3.py:42:print_rank_0] module id 22 handle is None
-gather param for module 22: {'id': 139, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 140, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 141, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 142, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 143, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 144, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 145, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 146, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 147, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 148, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 149, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 150, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
47: M22 P[139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150] avail 2.4e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 22, 3]
[2021-07-07 21:16:52,672] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 22
[2021-07-07 21:16:52,672] [INFO] [stage3.py:42:print_rank_0] module id 22 handle is None
[2021-07-07 21:16:52,672] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 22, param {'id': 139, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {22}}
[2021-07-07 21:16:52,672] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 22, param {'id': 140, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {22}}
[2021-07-07 21:16:52,672] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 22, param {'id': 141, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {22}}
[2021-07-07 21:16:52,672] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 22, param {'id': 142, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
[2021-07-07 21:16:52,672] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 22, param {'id': 143, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
[2021-07-07 21:16:52,672] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 22, param {'id': 144, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
[2021-07-07 21:16:52,672] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 22, param {'id': 145, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {22}}
[2021-07-07 21:16:52,673] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 22, param {'id': 146, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {22}}
[2021-07-07 21:16:52,673] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 22, param {'id': 147, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {22}}
[2021-07-07 21:16:52,673] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 22, param {'id': 148, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
[2021-07-07 21:16:52,673] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 22, param {'id': 149, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
[2021-07-07 21:16:52,673] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 22, param {'id': 150, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-release param: {'id': 139, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 141, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 145, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 147, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-gather param for module 21: {'id': 127, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 128, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 129, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 130, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 131, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 132, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 133, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 134, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 135, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 136, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 137, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 138, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
48: M21 P[127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138] avail 1.6e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 21, 3]
[2021-07-07 21:16:52,679] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 21
[2021-07-07 21:16:52,679] [INFO] [stage3.py:42:print_rank_0] module id 21 handle is None
[2021-07-07 21:16:52,679] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 21, param {'id': 127, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {21}}
[2021-07-07 21:16:52,679] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 21, param {'id': 128, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {21}}
[2021-07-07 21:16:52,679] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 21, param {'id': 129, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {21}}
[2021-07-07 21:16:52,679] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 21, param {'id': 130, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
[2021-07-07 21:16:52,679] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 21, param {'id': 131, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
[2021-07-07 21:16:52,679] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 21, param {'id': 132, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
[2021-07-07 21:16:52,679] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 21, param {'id': 133, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {21}}
[2021-07-07 21:16:52,679] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 21, param {'id': 134, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {21}}
[2021-07-07 21:16:52,679] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 21, param {'id': 135, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {21}}
[2021-07-07 21:16:52,679] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 21, param {'id': 136, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
[2021-07-07 21:16:52,679] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 21, param {'id': 137, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
[2021-07-07 21:16:52,679] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 21, param {'id': 138, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-release param: {'id': 127, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 129, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 133, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 135, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-gather param for module 20: {'id': 115, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 116, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 117, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 118, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 119, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 120, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 121, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 122, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 123, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 124, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 125, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 126, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
49: M20 P[115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126] avail 7.9e+07, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 3, 20]
[2021-07-07 21:16:52,691] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 20
[2021-07-07 21:16:52,691] [INFO] [stage3.py:42:print_rank_0] module id 20 handle is None
[2021-07-07 21:16:52,691] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 20, param {'id': 115, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {20}}
[2021-07-07 21:16:52,691] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 20, param {'id': 116, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {20}}
[2021-07-07 21:16:52,691] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 20, param {'id': 117, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {20}}
[2021-07-07 21:16:52,691] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 20, param {'id': 118, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
[2021-07-07 21:16:52,691] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 20, param {'id': 119, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
[2021-07-07 21:16:52,691] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 20, param {'id': 120, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
[2021-07-07 21:16:52,691] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 20, param {'id': 121, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {20}}
[2021-07-07 21:16:52,691] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 20, param {'id': 122, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {20}}
[2021-07-07 21:16:52,691] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 20, param {'id': 123, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {20}}
[2021-07-07 21:16:52,692] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 20, param {'id': 124, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
[2021-07-07 21:16:52,692] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 20, param {'id': 125, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
[2021-07-07 21:16:52,692] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 20, param {'id': 126, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-release param: {'id': 115, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 117, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 121, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 123, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-gather param for module 19: {'id': 103, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 104, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 105, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 106, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 107, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 108, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 109, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 110, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 111, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 112, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 113, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 114, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
50: M19 P[103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114] avail 4.6e+05, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 3, 19]
[2021-07-07 21:16:52,704] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 19
[2021-07-07 21:16:52,704] [INFO] [stage3.py:42:print_rank_0] module id 19 handle is None
[2021-07-07 21:16:52,704] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 19, param {'id': 103, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {19}}
[2021-07-07 21:16:52,704] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 19, param {'id': 104, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {19}}
[2021-07-07 21:16:52,704] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 19, param {'id': 105, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {19}}
[2021-07-07 21:16:52,704] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 19, param {'id': 106, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
[2021-07-07 21:16:52,704] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 19, param {'id': 107, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
[2021-07-07 21:16:52,704] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 19, param {'id': 108, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
[2021-07-07 21:16:52,704] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 19, param {'id': 109, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {19}}
[2021-07-07 21:16:52,704] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 19, param {'id': 110, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {19}}
[2021-07-07 21:16:52,705] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 19, param {'id': 111, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {19}}
[2021-07-07 21:16:52,705] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 19, param {'id': 112, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
[2021-07-07 21:16:52,705] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 19, param {'id': 113, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
[2021-07-07 21:16:52,705] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 19, param {'id': 114, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-release param: {'id': 103, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 105, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 109, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 111, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-gather param for module 15: {'id': 55, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 56, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 57, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 58, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 59, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 60, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 61, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 62, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 63, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 64, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 65, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 66, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
[2021-07-07 21:16:52,718] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 55, 'status': 'INFLIGHT', 'numel': 19660800, 'persist': False, 'active_sub_modules': {15}}
[2021-07-07 21:16:52,718] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 57, 'status': 'INFLIGHT', 'numel': 6553600, 'persist': False, 'active_sub_modules': {15}}
[2021-07-07 21:16:52,718] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 61, 'status': 'INFLIGHT', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
[2021-07-07 21:16:52,718] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 63, 'status': 'INFLIGHT', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
51: M15 P[55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66] avail 4.6e+05, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 15, 3]
[2021-07-07 21:16:52,719] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 15
[2021-07-07 21:16:52,719] [INFO] [stage3.py:42:print_rank_0] module id 15 handle is <deepspeed.runtime.zero.partition_parameters.AllGatherCoalescedHandle object at 0x7f7bc632d810>
-gather param for module 16: {'id': 67, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 68, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 69, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 70, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 71, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 72, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 73, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 74, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 75, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 76, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 77, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 78, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
[2021-07-07 21:16:52,752] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 67, 'status': 'INFLIGHT', 'numel': 19660800, 'persist': False, 'active_sub_modules': {16}}
[2021-07-07 21:16:52,752] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 69, 'status': 'INFLIGHT', 'numel': 6553600, 'persist': False, 'active_sub_modules': {16}}
[2021-07-07 21:16:52,752] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 73, 'status': 'INFLIGHT', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
[2021-07-07 21:16:52,752] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 75, 'status': 'INFLIGHT', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
52: M16 P[67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78] avail 7.9e+07, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [16, 2, 3]
[2021-07-07 21:16:52,753] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 16
[2021-07-07 21:16:52,753] [INFO] [stage3.py:42:print_rank_0] module id 16 handle is <deepspeed.runtime.zero.partition_parameters.AllGatherCoalescedHandle object at 0x7f7bc632d690>
-gather param for module 17: {'id': 79, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 80, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 81, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 82, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 83, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 84, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 85, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 86, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 87, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 88, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 89, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 90, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
[2021-07-07 21:16:52,786] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 79, 'status': 'INFLIGHT', 'numel': 19660800, 'persist': False, 'active_sub_modules': {17}}
[2021-07-07 21:16:52,786] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 81, 'status': 'INFLIGHT', 'numel': 6553600, 'persist': False, 'active_sub_modules': {17}}
[2021-07-07 21:16:52,786] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 85, 'status': 'INFLIGHT', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
[2021-07-07 21:16:52,786] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 87, 'status': 'INFLIGHT', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
53: M17 P[79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90] avail 1.6e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [17, 2, 3]
[2021-07-07 21:16:52,786] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 17
[2021-07-07 21:16:52,786] [INFO] [stage3.py:42:print_rank_0] module id 17 handle is <deepspeed.runtime.zero.partition_parameters.AllGatherCoalescedHandle object at 0x7f7bc632df90>
-gather param for module 18: {'id': 91, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 92, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 93, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 94, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 95, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 96, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 97, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 98, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 99, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 100, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 101, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 102, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
[2021-07-07 21:16:52,819] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 91, 'status': 'INFLIGHT', 'numel': 19660800, 'persist': False, 'active_sub_modules': {18}}
[2021-07-07 21:16:52,819] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 93, 'status': 'INFLIGHT', 'numel': 6553600, 'persist': False, 'active_sub_modules': {18}}
[2021-07-07 21:16:52,819] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 97, 'status': 'INFLIGHT', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
[2021-07-07 21:16:52,819] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 99, 'status': 'INFLIGHT', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
54: M18 P[91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102] avail 2.4e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 3, 18]
[2021-07-07 21:16:52,819] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 18
[2021-07-07 21:16:52,820] [INFO] [stage3.py:42:print_rank_0] module id 18 handle is <deepspeed.runtime.zero.partition_parameters.AllGatherCoalescedHandle object at 0x7f7bc632d810>
-gather param for module 18: {'id': 91, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 92, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 93, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 94, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 95, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 96, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 97, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 98, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 99, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 100, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 101, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 102, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
55: M18 P[91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102] avail 2.4e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 3, 18]
[2021-07-07 21:16:52,852] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 18
[2021-07-07 21:16:52,852] [INFO] [stage3.py:42:print_rank_0] module id 18 handle is None
[2021-07-07 21:16:52,852] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 18, param {'id': 91, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {18}}
[2021-07-07 21:16:52,852] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 18, param {'id': 92, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {18}}
[2021-07-07 21:16:52,852] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 18, param {'id': 93, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {18}}
[2021-07-07 21:16:52,852] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 18, param {'id': 94, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
[2021-07-07 21:16:52,852] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 18, param {'id': 95, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
[2021-07-07 21:16:52,852] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 18, param {'id': 96, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
[2021-07-07 21:16:52,852] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 18, param {'id': 97, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
[2021-07-07 21:16:52,852] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 18, param {'id': 98, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {18}}
[2021-07-07 21:16:52,852] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 18, param {'id': 99, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
[2021-07-07 21:16:52,852] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 18, param {'id': 100, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
[2021-07-07 21:16:52,852] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 18, param {'id': 101, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
[2021-07-07 21:16:52,852] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 18, param {'id': 102, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-release param: {'id': 91, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 93, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 97, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 99, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-gather param for module 17: {'id': 79, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 80, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 81, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 82, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 83, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 84, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 85, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 86, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 87, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 88, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 89, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 90, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
56: M17 P[79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90] avail 1.6e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [17, 2, 3]
[2021-07-07 21:16:52,858] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 17
[2021-07-07 21:16:52,858] [INFO] [stage3.py:42:print_rank_0] module id 17 handle is None
[2021-07-07 21:16:52,858] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 17, param {'id': 79, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {17}}
[2021-07-07 21:16:52,858] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 17, param {'id': 80, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {17}}
[2021-07-07 21:16:52,858] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 17, param {'id': 81, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {17}}
[2021-07-07 21:16:52,858] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 17, param {'id': 82, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
[2021-07-07 21:16:52,858] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 17, param {'id': 83, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
[2021-07-07 21:16:52,858] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 17, param {'id': 84, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
[2021-07-07 21:16:52,858] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 17, param {'id': 85, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
[2021-07-07 21:16:52,858] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 17, param {'id': 86, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {17}}
[2021-07-07 21:16:52,859] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 17, param {'id': 87, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
[2021-07-07 21:16:52,859] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 17, param {'id': 88, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
[2021-07-07 21:16:52,859] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 17, param {'id': 89, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
[2021-07-07 21:16:52,859] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 17, param {'id': 90, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-release param: {'id': 79, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 81, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 85, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 87, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-gather param for module 16: {'id': 67, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 68, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 69, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 70, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 71, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 72, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 73, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 74, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 75, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 76, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 77, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 78, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
57: M16 P[67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78] avail 7.9e+07, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [16, 2, 3]
[2021-07-07 21:16:52,869] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 16
[2021-07-07 21:16:52,869] [INFO] [stage3.py:42:print_rank_0] module id 16 handle is None
[2021-07-07 21:16:52,869] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 16, param {'id': 67, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {16}}
[2021-07-07 21:16:52,869] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 16, param {'id': 68, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {16}}
[2021-07-07 21:16:52,869] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 16, param {'id': 69, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {16}}
[2021-07-07 21:16:52,869] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 16, param {'id': 70, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
[2021-07-07 21:16:52,869] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 16, param {'id': 71, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
[2021-07-07 21:16:52,869] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 16, param {'id': 72, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
[2021-07-07 21:16:52,869] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 16, param {'id': 73, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
[2021-07-07 21:16:52,869] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 16, param {'id': 74, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {16}}
[2021-07-07 21:16:52,869] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 16, param {'id': 75, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
[2021-07-07 21:16:52,869] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 16, param {'id': 76, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
[2021-07-07 21:16:52,869] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 16, param {'id': 77, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
[2021-07-07 21:16:52,869] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 16, param {'id': 78, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-release param: {'id': 67, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 69, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 73, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 75, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-gather param for module 15: {'id': 55, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 56, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 57, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 58, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 59, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 60, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 61, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 62, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 63, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 64, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 65, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 66, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
58: M15 P[55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66] avail 4.6e+05, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 15, 3]
[2021-07-07 21:16:52,881] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 15
[2021-07-07 21:16:52,881] [INFO] [stage3.py:42:print_rank_0] module id 15 handle is None
[2021-07-07 21:16:52,881] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 15, param {'id': 55, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {15}}
[2021-07-07 21:16:52,881] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 15, param {'id': 56, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {15}}
[2021-07-07 21:16:52,881] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 15, param {'id': 57, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {15}}
[2021-07-07 21:16:52,881] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 15, param {'id': 58, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
[2021-07-07 21:16:52,882] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 15, param {'id': 59, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
[2021-07-07 21:16:52,882] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 15, param {'id': 60, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
[2021-07-07 21:16:52,882] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 15, param {'id': 61, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
[2021-07-07 21:16:52,882] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 15, param {'id': 62, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {15}}
[2021-07-07 21:16:52,882] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 15, param {'id': 63, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
[2021-07-07 21:16:52,882] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 15, param {'id': 64, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
[2021-07-07 21:16:52,882] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 15, param {'id': 65, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
[2021-07-07 21:16:52,882] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 15, param {'id': 66, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-release param: {'id': 55, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 57, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 61, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 63, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-gather param for module 11: {'id': 7, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 8, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 9, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 10, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 11, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 12, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 13, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 14, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 15, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 16, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 17, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 18, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
[2021-07-07 21:16:52,918] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 7, 'status': 'INFLIGHT', 'numel': 19660800, 'persist': False, 'active_sub_modules': {11}}
[2021-07-07 21:16:52,918] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 9, 'status': 'INFLIGHT', 'numel': 6553600, 'persist': False, 'active_sub_modules': {11}}
[2021-07-07 21:16:52,918] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 13, 'status': 'INFLIGHT', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
[2021-07-07 21:16:52,918] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 15, 'status': 'INFLIGHT', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
59: M11 P[7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] avail 4.6e+05, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 11, 3]
[2021-07-07 21:16:52,919] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 11
[2021-07-07 21:16:52,919] [INFO] [stage3.py:42:print_rank_0] module id 11 handle is <deepspeed.runtime.zero.partition_parameters.AllGatherCoalescedHandle object at 0x7f7bc632d690>
-gather param for module 12: {'id': 19, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 20, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 21, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 22, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 23, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 24, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 25, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 26, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 27, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 28, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 29, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 30, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
[2021-07-07 21:16:52,971] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 19, 'status': 'INFLIGHT', 'numel': 19660800, 'persist': False, 'active_sub_modules': {12}}
[2021-07-07 21:16:52,972] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 21, 'status': 'INFLIGHT', 'numel': 6553600, 'persist': False, 'active_sub_modules': {12}}
[2021-07-07 21:16:52,972] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 25, 'status': 'INFLIGHT', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
[2021-07-07 21:16:52,972] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 27, 'status': 'INFLIGHT', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
60: M12 P[19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30] avail 7.9e+07, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 3, 12]
[2021-07-07 21:16:52,972] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 12
[2021-07-07 21:16:52,972] [INFO] [stage3.py:42:print_rank_0] module id 12 handle is <deepspeed.runtime.zero.partition_parameters.AllGatherCoalescedHandle object at 0x7f7bc632df90>
-gather param for module 13: {'id': 31, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 32, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 33, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 34, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 35, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 36, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 37, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 38, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 39, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 40, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 41, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 42, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 31, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 33, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 37, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 39, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,022] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 31, 'status': 'INFLIGHT', 'numel': 19660800, 'persist': False, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,023] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 33, 'status': 'INFLIGHT', 'numel': 6553600, 'persist': False, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,023] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 37, 'status': 'INFLIGHT', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,023] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 39, 'status': 'INFLIGHT', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
61: M13 P[31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 31, 33, 37, 39] avail 2.4e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [13, 2, 3]
[2021-07-07 21:16:53,023] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 13
[2021-07-07 21:16:53,023] [INFO] [stage3.py:42:print_rank_0] module id 13 handle is <deepspeed.runtime.zero.partition_parameters.AllGatherCoalescedHandle object at 0x7f7bc632dd50>
-gather param for module 14: {'id': 43, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 44, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 45, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 46, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 47, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 48, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 49, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 50, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 51, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 52, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 53, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 54, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
[2021-07-07 21:16:53,072] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 43, 'status': 'INFLIGHT', 'numel': 19660800, 'persist': False, 'active_sub_modules': {14}}
[2021-07-07 21:16:53,073] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 45, 'status': 'INFLIGHT', 'numel': 6553600, 'persist': False, 'active_sub_modules': {14}}
[2021-07-07 21:16:53,073] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 49, 'status': 'INFLIGHT', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
[2021-07-07 21:16:53,073] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 51, 'status': 'INFLIGHT', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
62: M14 P[43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54] avail 3.2e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 3, 14]
[2021-07-07 21:16:53,073] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 14
[2021-07-07 21:16:53,073] [INFO] [stage3.py:42:print_rank_0] module id 14 handle is <deepspeed.runtime.zero.partition_parameters.AllGatherCoalescedHandle object at 0x7f7bc632d690>
-gather param for module 14: {'id': 43, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 44, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 45, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 46, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 47, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 48, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 49, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 50, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 51, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 52, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 53, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 54, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
63: M14 P[43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54] avail 3.2e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 3, 14]
[2021-07-07 21:16:53,124] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 14
[2021-07-07 21:16:53,124] [INFO] [stage3.py:42:print_rank_0] module id 14 handle is None
[2021-07-07 21:16:53,124] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 14, param {'id': 43, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {14}}
[2021-07-07 21:16:53,124] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 14, param {'id': 44, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {14}}
[2021-07-07 21:16:53,124] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 14, param {'id': 45, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {14}}
[2021-07-07 21:16:53,124] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 14, param {'id': 46, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
[2021-07-07 21:16:53,124] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 14, param {'id': 47, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
[2021-07-07 21:16:53,124] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 14, param {'id': 48, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
[2021-07-07 21:16:53,124] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 14, param {'id': 49, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
[2021-07-07 21:16:53,125] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 14, param {'id': 50, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {14}}
[2021-07-07 21:16:53,125] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 14, param {'id': 51, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
[2021-07-07 21:16:53,125] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 14, param {'id': 52, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
[2021-07-07 21:16:53,125] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 14, param {'id': 53, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
[2021-07-07 21:16:53,125] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 14, param {'id': 54, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 13: {'id': 31, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 32, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 33, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 34, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 35, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 36, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 37, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 38, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 39, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 40, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 41, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 42, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 31, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 33, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 37, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 39, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
64: M13 P[31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 31, 33, 37, 39] avail 3.2e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [13, 2, 3]
[2021-07-07 21:16:53,131] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 13
[2021-07-07 21:16:53,132] [INFO] [stage3.py:42:print_rank_0] module id 13 handle is None
[2021-07-07 21:16:53,132] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 13, param {'id': 31, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,132] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 13, param {'id': 32, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,132] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 13, param {'id': 33, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,132] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 13, param {'id': 34, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,132] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 13, param {'id': 35, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,132] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 13, param {'id': 36, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,132] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 13, param {'id': 37, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,132] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 13, param {'id': 38, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,132] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 13, param {'id': 39, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,132] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 13, param {'id': 40, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,132] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 13, param {'id': 41, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,132] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 13, param {'id': 42, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,132] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 13, param {'id': 31, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,132] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 13, param {'id': 33, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,132] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 13, param {'id': 37, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
[2021-07-07 21:16:53,132] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 13, param {'id': 39, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 12: {'id': 19, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 20, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 21, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 22, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 23, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 24, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 25, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 26, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 27, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 28, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 29, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 30, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
65: M12 P[19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30] avail 3.2e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 3, 12]
[2021-07-07 21:16:53,145] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 12
[2021-07-07 21:16:53,145] [INFO] [stage3.py:42:print_rank_0] module id 12 handle is None
[2021-07-07 21:16:53,145] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 12, param {'id': 19, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {12}}
[2021-07-07 21:16:53,145] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 12, param {'id': 20, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {12}}
[2021-07-07 21:16:53,146] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 12, param {'id': 21, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {12}}
[2021-07-07 21:16:53,146] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 12, param {'id': 22, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
[2021-07-07 21:16:53,146] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 12, param {'id': 23, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
[2021-07-07 21:16:53,146] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 12, param {'id': 24, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
[2021-07-07 21:16:53,146] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 12, param {'id': 25, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
[2021-07-07 21:16:53,146] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 12, param {'id': 26, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {12}}
[2021-07-07 21:16:53,146] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 12, param {'id': 27, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
[2021-07-07 21:16:53,146] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 12, param {'id': 28, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
[2021-07-07 21:16:53,146] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 12, param {'id': 29, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
[2021-07-07 21:16:53,146] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 12, param {'id': 30, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 11: {'id': 7, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 8, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 9, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 10, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 11, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 12, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 13, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 14, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 15, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 16, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 17, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 18, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
66: M11 P[7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] avail 3.2e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 1.6e+08, inflight [2, 11, 3]
[2021-07-07 21:16:53,160] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 11
[2021-07-07 21:16:53,160] [INFO] [stage3.py:42:print_rank_0] module id 11 handle is None
[2021-07-07 21:16:53,160] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 11, param {'id': 7, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {11}}
[2021-07-07 21:16:53,160] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 11, param {'id': 8, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {11}}
[2021-07-07 21:16:53,160] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 11, param {'id': 9, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {11}}
[2021-07-07 21:16:53,160] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 11, param {'id': 10, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
[2021-07-07 21:16:53,160] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 11, param {'id': 11, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
[2021-07-07 21:16:53,160] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 11, param {'id': 12, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
[2021-07-07 21:16:53,160] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 11, param {'id': 13, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
[2021-07-07 21:16:53,160] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 11, param {'id': 14, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {11}}
[2021-07-07 21:16:53,160] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 11, param {'id': 15, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
[2021-07-07 21:16:53,160] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 11, param {'id': 16, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
[2021-07-07 21:16:53,160] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 11, param {'id': 17, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
[2021-07-07 21:16:53,160] [INFO] [stage3.py:42:print_rank_0] sub_module DeepSpeedTransformerLayer, sub_module id 11, param {'id': 18, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
[2021-07-07 21:16:53,174] [INFO] [stage3.py:42:print_rank_0] current submodule in __inflight_module_manager, module id 2
67: M2 P[] avail 3.2e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [2, 3]
[2021-07-07 21:16:53,174] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 2
[2021-07-07 21:16:53,174] [INFO] [stage3.py:42:print_rank_0] module id 2 handle is None
68: M7 P[] avail 3.2e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [7, 3]
[2021-07-07 21:16:53,174] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 7
[2021-07-07 21:16:53,174] [INFO] [stage3.py:42:print_rank_0] module id 7 handle is None
-gather param for module 6: {'id': 3, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {6}}
-gather param for module 6: {'id': 4, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {6}}
69: M6 P[3, 4] avail 3.2e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [6, 3]
[2021-07-07 21:16:53,174] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 6
[2021-07-07 21:16:53,174] [INFO] [stage3.py:42:print_rank_0] module id 6 handle is None
[2021-07-07 21:16:53,174] [INFO] [stage3.py:42:print_rank_0] sub_module FusedLayerNorm, sub_module id 6, param {'id': 3, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {6}}
[2021-07-07 21:16:53,174] [INFO] [stage3.py:42:print_rank_0] sub_module FusedLayerNorm, sub_module id 6, param {'id': 4, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {6}}
-gather param for module 5: {'id': 2, 'status': 'AVAILABLE', 'numel': 5120, 'persist': True, 'active_sub_modules': {5}}
70: M5 P[2] avail 3.2e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [5, 3]
[2021-07-07 21:16:53,175] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 5
[2021-07-07 21:16:53,175] [INFO] [stage3.py:42:print_rank_0] module id 5 handle is None
[2021-07-07 21:16:53,175] [INFO] [stage3.py:42:print_rank_0] sub_module Embedding, sub_module id 5, param {'id': 2, 'status': 'AVAILABLE', 'numel': 5120, 'persist': True, 'active_sub_modules': {5}}
-gather param for module 4: {'id': 1, 'status': 'NOT_AVAILABLE', 'numel': 1310720, 'persist': False, 'active_sub_modules': {4}}
[2021-07-07 21:16:53,189] [INFO] [utils.py:629:info_rank_0] all_gather_coalesced {'id': 1, 'status': 'INFLIGHT', 'numel': 1310720, 'persist': False, 'active_sub_modules': {4}}
71: M4 P[1] avail 3.2e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.9e+07, inflight [4, 3]
[2021-07-07 21:16:53,189] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 4
[2021-07-07 21:16:53,190] [INFO] [stage3.py:42:print_rank_0] module id 4 handle is <deepspeed.runtime.zero.partition_parameters.AllGatherCoalescedHandle object at 0x7f7bc632d390>
[2021-07-07 21:16:53,190] [INFO] [stage3.py:42:print_rank_0] sub_module Embedding, sub_module id 4, param {'id': 1, 'status': 'AVAILABLE', 'numel': 1310720, 'persist': False, 'active_sub_modules': {4}}
[2021-07-07 21:16:53,191] [INFO] [stage3.py:42:print_rank_0] current submodule in __inflight_module_manager, module id 3
72: M3 P[0] avail 3.2e+08, max_avail 5.0e+07, queue_sz 5.8e+02, n_inflight 7.8e+07, inflight [3]
[2021-07-07 21:16:53,191] [INFO] [stage3.py:42:print_rank_0] wait_for_fetch current submodule id 3
[2021-07-07 21:16:53,191] [INFO] [stage3.py:42:print_rank_0] module id 3 handle is None
[2021-07-07 21:16:53,191] [INFO] [stage3.py:42:print_rank_0] sub_module Embedding, sub_module id 3, param {'id': 0, 'status': 'NOT_AVAILABLE', 'numel': 78151680, 'persist': False, 'active_sub_modules': {3}}
0%| | 9/152966 [00:17<81:54:53, 1.93s/it]
0%| | 9/152942 [00:17<82:10:34, 1.93s/it]
Traceback (most recent call last):
File "/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py", line 607, in <module>
Traceback (most recent call last):
0%| | 9/152613 [00:17<81:51:11, 1.93s/it]
File "/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py", line 607, in <module>
main()
File "/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py", line 600, in main
main()
File "/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py", line 600, in main
run(args, model, optimizer, start_epoch)
File "/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py", line 566, in run
Traceback (most recent call last):
run(args, model, optimizer, start_epoch)
File "/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py", line 566, in run
File "/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py", line 607, in <module>
train(args, index, model, optimizer, pretrain_dataset_provider)
File "/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py", line 179, in train
model.network.backward(loss)
File "/home/ec2-user/DeepSpeed/deepspeed/utils/nvtx.py", line 9, in wrapped_fn
0%| | 9/152794 [00:17<81:55:18, 1.93s/it]
return func(*args, **kwargs)
File "/home/ec2-user/DeepSpeed/deepspeed/runtime/engine.py", line 1191, in backward
train(args, index, model, optimizer, pretrain_dataset_provider)
File "/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py", line 179, in train
model.network.backward(loss)
File "/home/ec2-user/DeepSpeed/deepspeed/utils/nvtx.py", line 9, in wrapped_fn
return func(*args, **kwargs)
File "/home/ec2-user/DeepSpeed/deepspeed/runtime/engine.py", line 1191, in backward
main()
File "/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py", line 600, in main
self.optimizer.backward(loss)
File "/home/ec2-user/DeepSpeed/deepspeed/utils/nvtx.py", line 9, in wrapped_fn
return func(*args, **kwargs)
File "/home/ec2-user/DeepSpeed/deepspeed/runtime/zero/stage3.py", line 2823, in backward
run(args, model, optimizer, start_epoch)
File "/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py", line 566, in run
self.optimizer.backward(loss)
File "/home/ec2-user/DeepSpeed/deepspeed/utils/nvtx.py", line 9, in wrapped_fn
Traceback (most recent call last):
File "/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py", line 607, in <module>
return func(*args, **kwargs)
File "/home/ec2-user/DeepSpeed/deepspeed/runtime/zero/stage3.py", line 2823, in backward
train(args, index, model, optimizer, pretrain_dataset_provider)
File "/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py", line 179, in train
model.network.backward(loss)
File "/home/ec2-user/DeepSpeed/deepspeed/utils/nvtx.py", line 9, in wrapped_fn
return func(*args, **kwargs)
File "/home/ec2-user/DeepSpeed/deepspeed/runtime/engine.py", line 1191, in backward
main()
File "/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py", line 600, in main
self.optimizer.backward(loss)
run(args, model, optimizer, start_epoch)
File "/home/ec2-user/DeepSpeed/deepspeed/utils/nvtx.py", line 9, in wrapped_fn
File "/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py", line 566, in run
return func(*args, **kwargs)
File "/home/ec2-user/DeepSpeed/deepspeed/runtime/zero/stage3.py", line 2823, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/home/ec2-user/DeepSpeed/deepspeed/runtime/fp16/loss_scaler.py", line 53, in backward
train(args, index, model, optimizer, pretrain_dataset_provider)
scaled_loss.backward(retain_graph=retain_graph)
File "/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py", line 179, in train
File "/usr/local/lib64/python3.7/site-packages/torch/tensor.py", line 233, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
model.network.backward(loss)
File "/home/ec2-user/DeepSpeed/deepspeed/runtime/fp16/loss_scaler.py", line 53, in backward
File "/home/ec2-user/DeepSpeed/deepspeed/utils/nvtx.py", line 9, in wrapped_fn
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/usr/local/lib64/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
return func(*args, **kwargs)
File "/home/ec2-user/DeepSpeed/deepspeed/runtime/engine.py", line 1191, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/usr/local/lib64/python3.7/site-packages/torch/tensor.py", line 233, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: The size of tensor a (0) must match the size of tensor b (2560) at non-singleton dimension 1
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/usr/local/lib64/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: The size of tensor a (0) must match the size of tensor b (2560) at non-singleton dimension 1
self.optimizer.backward(loss)
File "/home/ec2-user/DeepSpeed/deepspeed/utils/nvtx.py", line 9, in wrapped_fn
return func(*args, **kwargs)
File "/home/ec2-user/DeepSpeed/deepspeed/runtime/zero/stage3.py", line 2823, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/home/ec2-user/DeepSpeed/deepspeed/runtime/fp16/loss_scaler.py", line 53, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/usr/local/lib64/python3.7/site-packages/torch/tensor.py", line 233, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/usr/local/lib64/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: The size of tensor a (0) must match the size of tensor b (2560) at non-singleton dimension 1
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/home/ec2-user/DeepSpeed/deepspeed/runtime/fp16/loss_scaler.py", line 53, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/usr/local/lib64/python3.7/site-packages/torch/tensor.py", line 233, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/usr/local/lib64/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: The size of tensor a (0) must match the size of tensor b (2560) at non-singleton dimension 1
Killing subprocess 43838
Killing subprocess 43839
Killing subprocess 43840
Killing subprocess 43841
Traceback (most recent call last):
File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/ec2-user/DeepSpeed/deepspeed/launcher/launch.py", line 171, in <module>
main()
File "/home/ec2-user/DeepSpeed/deepspeed/launcher/launch.py", line 161, in main
sigkill_handler(signal.SIGTERM, None) # not coming back
File "/home/ec2-user/DeepSpeed/deepspeed/launcher/launch.py", line 139, in sigkill_handler
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/bin/python3', '-u', '/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py', '--local_rank=3', '--max_seq_length', '512', '--print_steps', '10', '--deepspeed', '--data_path_prefix', '/home/ec2-user/bert-data-nv', '--use_nvidia_dataset', '--rewarmup', '--lr_schedule', 'EE', '--attention_dropout_checkpoint', '--lr_offset', '0.0', '--gelu_checkpoint', '--deepspeed_transformer_kernel', '--max_steps', '20', '--ckpt_to_save', '200', '--output_dir', '/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../outputs/zero3_1node_profile_2021-07-07_21:16:05/', '--cf', '/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../configs/zero3_1node_profile.json', '--deepspeed_config', '/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../configs/zero3_1node_profile.json', '--job_name', 'zero3_1node_profile_2021-07-07_21:16:05']' returned non-zero exit status 1.
[2021-07-06 19:29:23,081] [WARNING] [runner.py:122:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2021-07-06 19:29:23,190] [INFO] [runner.py:360:main] cmd = /bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119 --master_addr=127.0.0.1 --master_port=29500 /home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../../deepspeed_train.py --max_seq_length 512 --print_steps 10 --deepspeed --data_path_prefix /home/ec2-user/bert-data-nv --use_nvidia_dataset --rewarmup --lr_schedule EE --attention_dropout_checkpoint --lr_offset 0.0 --gelu_checkpoint --deepspeed_transformer_kernel --max_steps 5 --ckpt_to_save 200 --output_dir /home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../outputs/zero3_1node_profile_2021-07-06_19:29:22/ --cf /home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../configs/zero3_1node_profile.json --deepspeed_config /home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../configs/zero3_1node_profile.json --job_name zero3_1node_profile_2021-07-06_19:29:22
[2021-07-06 19:29:23,968] [INFO] [launch.py:80:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3]}
[2021-07-06 19:29:23,968] [INFO] [launch.py:89:main] nnodes=1, num_local_procs=4, node_rank=0
[2021-07-06 19:29:23,968] [INFO] [launch.py:101:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
[2021-07-06 19:29:23,968] [INFO] [launch.py:102:main] dist_world_size=4
[2021-07-06 19:29:23,968] [INFO] [launch.py:105:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3
Running Config File: zero3_1node_profile_2021-07-06_19:29:22
Args = Namespace(attention_dropout_checkpoint=True, ckpt_to_save=[200], config={'train_batch_size': 64, 'train_micro_batch_size_per_gpu': 2, 'steps_per_print': 100, 'prescale_gradients': False, 'bert_token_file': 'bert-large-uncased', 'bert_model_config': {'vocab_size_or_config_json_file': 32003, 'hidden_size': 2560, 'num_hidden_layers': 12, 'num_attention_heads': 40, 'intermediate_size': 10240, 'hidden_act': 'gelu', 'hidden_dropout_prob': 0.1, 'attention_probs_dropout_prob': 0.1, 'max_position_embeddings': 512, 'initializer_range': 0.02}, 'zero_optimization': {'stage': 3, 'overlap_comm': True, 'reduce_scatter': False, 'contiguous_gradients': False}, 'zero_allow_untested_optimizer': True, 'optimizer': {'type': 'Adam', 'params': {'lr': 0.0001, 'weight_decay': 0.01, 'bias_correction': True, 'eps': 1e-06}}, 'gradient_clipping': 1.0, 'wall_clock_breakdown': True, 'fp16': {'enabled': True, 'loss_scale': 0, 'initial_scale_power': 20, 'loss_scale_window': 1000}, 'data': {'flags': {'pretrain_dataset': True, 'pretrain_type': 'wiki_bc'}, 'datasets': {'pretrain_dataset': '512/wikicorpus_en'}}, 'training': {'num_epochs': 20, 'warmup_proportion': 0.02, 'learning_rate': 0.002, 'num_workers': 4, 'async_worker': True, 'decay_rate': 0.9, 'decay_step': 150, 'total_training_steps': 15000}}, config_file='/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../configs/zero3_1node_profile.json', data_path_prefix='/home/ec2-user/bert-data-nv', deepscale=False, deepscale_config=None, deepspeed=True, deepspeed_config='/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../configs/zero3_1node_profile.json', deepspeed_mpi=False, deepspeed_sparse_attention=False, deepspeed_transformer_kernel=True, do_lower_case=True, finetune=False, gelu_checkpoint=True, job_name='zero3_1node_profile_2021-07-06_19:29:22', load_checkpoint_id=None, load_training_checkpoint=None, local_rank=3, logger=<turing.logger.Logger object at 0x7fbc33b3cfd0>, lr_offset=0.0, 
lr_schedule='EE', max_predictions_per_seq=80, max_seq_length=512, max_steps=5, max_steps_per_epoch=9223372036854775807, no_cuda=False, normalize_invertible=False, output_dir='/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../outputs/zero3_1node_profile_2021-07-06_19:29:22/', print_steps=10, progressive_layer_drop=False, refresh_bucket_size=1, rewarmup=True, seed=42, stochastic_mode=False, use_nvidia_dataset=True, use_pretrain=False, validation_data_path_prefix=None)
Running Config File: zero3_1node_profile_2021-07-06_19:29:22
Args = Namespace(attention_dropout_checkpoint=True, ckpt_to_save=[200], config={'train_batch_size': 64, 'train_micro_batch_size_per_gpu': 2, 'steps_per_print': 100, 'prescale_gradients': False, 'bert_token_file': 'bert-large-uncased', 'bert_model_config': {'vocab_size_or_config_json_file': 32003, 'hidden_size': 2560, 'num_hidden_layers': 12, 'num_attention_heads': 40, 'intermediate_size': 10240, 'hidden_act': 'gelu', 'hidden_dropout_prob': 0.1, 'attention_probs_dropout_prob': 0.1, 'max_position_embeddings': 512, 'initializer_range': 0.02}, 'zero_optimization': {'stage': 3, 'overlap_comm': True, 'reduce_scatter': False, 'contiguous_gradients': False}, 'zero_allow_untested_optimizer': True, 'optimizer': {'type': 'Adam', 'params': {'lr': 0.0001, 'weight_decay': 0.01, 'bias_correction': True, 'eps': 1e-06}}, 'gradient_clipping': 1.0, 'wall_clock_breakdown': True, 'fp16': {'enabled': True, 'loss_scale': 0, 'initial_scale_power': 20, 'loss_scale_window': 1000}, 'data': {'flags': {'pretrain_dataset': True, 'pretrain_type': 'wiki_bc'}, 'datasets': {'pretrain_dataset': '512/wikicorpus_en'}}, 'training': {'num_epochs': 20, 'warmup_proportion': 0.02, 'learning_rate': 0.002, 'num_workers': 4, 'async_worker': True, 'decay_rate': 0.9, 'decay_step': 150, 'total_training_steps': 15000}}, config_file='/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../configs/zero3_1node_profile.json', data_path_prefix='/home/ec2-user/bert-data-nv', deepscale=False, deepscale_config=None, deepspeed=True, deepspeed_config='/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../configs/zero3_1node_profile.json', deepspeed_mpi=False, deepspeed_sparse_attention=False, deepspeed_transformer_kernel=True, do_lower_case=True, finetune=False, gelu_checkpoint=True, job_name='zero3_1node_profile_2021-07-06_19:29:22', load_checkpoint_id=None, load_training_checkpoint=None, local_rank=0, logger=<turing.logger.Logger object at 0x7f990884dfd0>, lr_offset=0.0, 
lr_schedule='EE', max_predictions_per_seq=80, max_seq_length=512, max_steps=5, max_steps_per_epoch=9223372036854775807, no_cuda=False, normalize_invertible=False, output_dir='/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../outputs/zero3_1node_profile_2021-07-06_19:29:22/', print_steps=10, progressive_layer_drop=False, refresh_bucket_size=1, rewarmup=True, seed=42, stochastic_mode=False, use_nvidia_dataset=True, use_pretrain=False, validation_data_path_prefix=None)
Running Config File: zero3_1node_profile_2021-07-06_19:29:22
Args = Namespace(attention_dropout_checkpoint=True, ckpt_to_save=[200], config={'train_batch_size': 64, 'train_micro_batch_size_per_gpu': 2, 'steps_per_print': 100, 'prescale_gradients': False, 'bert_token_file': 'bert-large-uncased', 'bert_model_config': {'vocab_size_or_config_json_file': 32003, 'hidden_size': 2560, 'num_hidden_layers': 12, 'num_attention_heads': 40, 'intermediate_size': 10240, 'hidden_act': 'gelu', 'hidden_dropout_prob': 0.1, 'attention_probs_dropout_prob': 0.1, 'max_position_embeddings': 512, 'initializer_range': 0.02}, 'zero_optimization': {'stage': 3, 'overlap_comm': True, 'reduce_scatter': False, 'contiguous_gradients': False}, 'zero_allow_untested_optimizer': True, 'optimizer': {'type': 'Adam', 'params': {'lr': 0.0001, 'weight_decay': 0.01, 'bias_correction': True, 'eps': 1e-06}}, 'gradient_clipping': 1.0, 'wall_clock_breakdown': True, 'fp16': {'enabled': True, 'loss_scale': 0, 'initial_scale_power': 20, 'loss_scale_window': 1000}, 'data': {'flags': {'pretrain_dataset': True, 'pretrain_type': 'wiki_bc'}, 'datasets': {'pretrain_dataset': '512/wikicorpus_en'}}, 'training': {'num_epochs': 20, 'warmup_proportion': 0.02, 'learning_rate': 0.002, 'num_workers': 4, 'async_worker': True, 'decay_rate': 0.9, 'decay_step': 150, 'total_training_steps': 15000}}, config_file='/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../configs/zero3_1node_profile.json', data_path_prefix='/home/ec2-user/bert-data-nv', deepscale=False, deepscale_config=None, deepspeed=True, deepspeed_config='/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../configs/zero3_1node_profile.json', deepspeed_mpi=False, deepspeed_sparse_attention=False, deepspeed_transformer_kernel=True, do_lower_case=True, finetune=False, gelu_checkpoint=True, job_name='zero3_1node_profile_2021-07-06_19:29:22', load_checkpoint_id=None, load_training_checkpoint=None, local_rank=2, logger=<turing.logger.Logger object at 0x7faf4fa37fd0>, lr_offset=0.0, 
lr_schedule='EE', max_predictions_per_seq=80, max_seq_length=512, max_steps=5, max_steps_per_epoch=9223372036854775807, no_cuda=False, normalize_invertible=False, output_dir='/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../outputs/zero3_1node_profile_2021-07-06_19:29:22/', print_steps=10, progressive_layer_drop=False, refresh_bucket_size=1, rewarmup=True, seed=42, stochastic_mode=False, use_nvidia_dataset=True, use_pretrain=False, validation_data_path_prefix=None)
Running Config File: zero3_1node_profile_2021-07-06_19:29:22
Args = Namespace(attention_dropout_checkpoint=True, ckpt_to_save=[200], config={'train_batch_size': 64, 'train_micro_batch_size_per_gpu': 2, 'steps_per_print': 100, 'prescale_gradients': False, 'bert_token_file': 'bert-large-uncased', 'bert_model_config': {'vocab_size_or_config_json_file': 32003, 'hidden_size': 2560, 'num_hidden_layers': 12, 'num_attention_heads': 40, 'intermediate_size': 10240, 'hidden_act': 'gelu', 'hidden_dropout_prob': 0.1, 'attention_probs_dropout_prob': 0.1, 'max_position_embeddings': 512, 'initializer_range': 0.02}, 'zero_optimization': {'stage': 3, 'overlap_comm': True, 'reduce_scatter': False, 'contiguous_gradients': False}, 'zero_allow_untested_optimizer': True, 'optimizer': {'type': 'Adam', 'params': {'lr': 0.0001, 'weight_decay': 0.01, 'bias_correction': True, 'eps': 1e-06}}, 'gradient_clipping': 1.0, 'wall_clock_breakdown': True, 'fp16': {'enabled': True, 'loss_scale': 0, 'initial_scale_power': 20, 'loss_scale_window': 1000}, 'data': {'flags': {'pretrain_dataset': True, 'pretrain_type': 'wiki_bc'}, 'datasets': {'pretrain_dataset': '512/wikicorpus_en'}}, 'training': {'num_epochs': 20, 'warmup_proportion': 0.02, 'learning_rate': 0.002, 'num_workers': 4, 'async_worker': True, 'decay_rate': 0.9, 'decay_step': 150, 'total_training_steps': 15000}}, config_file='/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../configs/zero3_1node_profile.json', data_path_prefix='/home/ec2-user/bert-data-nv', deepscale=False, deepscale_config=None, deepspeed=True, deepspeed_config='/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../configs/zero3_1node_profile.json', deepspeed_mpi=False, deepspeed_sparse_attention=False, deepspeed_transformer_kernel=True, do_lower_case=True, finetune=False, gelu_checkpoint=True, job_name='zero3_1node_profile_2021-07-06_19:29:22', load_checkpoint_id=None, load_training_checkpoint=None, local_rank=1, logger=<turing.logger.Logger object at 0x7fe9d9750fd0>, lr_offset=0.0, 
lr_schedule='EE', max_predictions_per_seq=80, max_seq_length=512, max_steps=5, max_steps_per_epoch=9223372036854775807, no_cuda=False, normalize_invertible=False, output_dir='/home/ec2-user/DeepSpeedExamples/bing_bert/zero_opt_experiments/scripts/../outputs/zero3_1node_profile_2021-07-06_19:29:22/', print_steps=10, progressive_layer_drop=False, refresh_bucket_size=1, rewarmup=True, seed=42, stochastic_mode=False, use_nvidia_dataset=True, use_pretrain=False, validation_data_path_prefix=None)
07/06/2021 19:29:25 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/ec2-user/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
07/06/2021 19:29:25 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/ec2-user/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
07/06/2021 19:29:25 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/ec2-user/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
07/06/2021 19:29:25 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/ec2-user/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
07/06/2021 19:29:25 - WARNING - root - Skipping validation because validation_data_path_prefix is unspecified
07/06/2021 19:29:25 - WARNING - root - Early training exit is set after 5 global steps
[2021-07-06 19:29:25,241] [INFO] [distributed.py:47:init_distributed] Initializing torch distributed with backend: nccl
07/06/2021 19:29:25 - WARNING - root - Skipping validation because validation_data_path_prefix is unspecified
07/06/2021 19:29:25 - WARNING - root - Early training exit is set after 5 global steps
[2021-07-06 19:29:25,246] [INFO] [distributed.py:47:init_distributed] Initializing torch distributed with backend: nccl
07/06/2021 19:29:25 - WARNING - root - Skipping validation because validation_data_path_prefix is unspecified
07/06/2021 19:29:25 - WARNING - root - Early training exit is set after 5 global steps
[2021-07-06 19:29:25,248] [INFO] [distributed.py:47:init_distributed] Initializing torch distributed with backend: nccl
07/06/2021 19:29:25 - WARNING - root - Skipping validation because validation_data_path_prefix is unspecified
07/06/2021 19:29:25 - WARNING - root - Early training exit is set after 5 global steps
[2021-07-06 19:29:25,254] [INFO] [distributed.py:47:init_distributed] Initializing torch distributed with backend: nccl
VOCAB SIZE: 30528
VOCAB SIZE: 30528
VOCAB SIZE: 30528
VOCAB SIZE: 30528
DeepSpeed Transformer config is {'layer_id': 0, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 2, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
DeepSpeed Transformer config is {'layer_id': 0, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 1, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
DeepSpeed Transformer config is {'layer_id': 0, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 0, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
Accounting for accumulation on the residual path
DeepSpeed Transformer config is {'layer_id': 0, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 3, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ec2-user/.cache/torch_extensions/transformer/build.ninja...
Building extension module transformer...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module transformer...
Time to load transformer op: 0.16208815574645996 seconds
Loading extension module transformer...
Time to load transformer op: 0.2033708095550537 seconds
Loading extension module transformer...
Time to load transformer op: 0.2032008171081543 seconds
Loading extension module transformer...
Time to load transformer op: 0.20306730270385742 seconds
layer #0 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 1, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 2, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #0 is created with date type [half].
layer #0 is created with date type [half].
layer #0 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 1, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 1, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
DeepSpeed Transformer config is {'layer_id': 1, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 3, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
DeepSpeed Transformer config is {'layer_id': 1, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 0, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
Accounting for accumulation on the residual path
layer #1 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 2, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 2, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #1 is created with date type [half].
layer #1 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 2, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 1, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #1 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 2, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 3, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
DeepSpeed Transformer config is {'layer_id': 2, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 0, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
Accounting for accumulation on the residual path
layer #2 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 3, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 2, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #2 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 3, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 1, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #2 is created with date type [half].
layer #2 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 3, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 3, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
DeepSpeed Transformer config is {'layer_id': 3, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 0, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
Accounting for accumulation on the residual path
layer #3 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 4, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 2, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #3 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 4, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 1, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #3 is created with date type [half].
layer #3 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 4, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 3, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
DeepSpeed Transformer config is {'layer_id': 4, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 0, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
Accounting for accumulation on the residual path
layer #4 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 5, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 2, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #4 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 5, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 1, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #4 is created with date type [half].
layer #4 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 5, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 3, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
DeepSpeed Transformer config is {'layer_id': 5, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 0, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
Accounting for accumulation on the residual path
layer #5 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 6, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 2, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #5 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 6, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 1, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #5 is created with date type [half].
layer #5 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 6, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 3, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
DeepSpeed Transformer config is {'layer_id': 6, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 0, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
Accounting for accumulation on the residual path
layer #6 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 7, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 2, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #6 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 7, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 1, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #6 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 7, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 3, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #6 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 7, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 0, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
Accounting for accumulation on the residual path
layer #7 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 8, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 2, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #7 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 8, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 1, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #7 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 8, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 3, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #7 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 8, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 0, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
Accounting for accumulation on the residual path
layer #8 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 9, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 2, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #8 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 9, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 1, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #8 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 9, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 3, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #8 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 9, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 0, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
Accounting for accumulation on the residual path
layer #9 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 10, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 2, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #9 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 10, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 1, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #9 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 10, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 3, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #9 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 10, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 0, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
Accounting for accumulation on the residual path
layer #10 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 11, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 2, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #10 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 11, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 1, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #10 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 11, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 3, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
layer #10 is created with date type [half].
DeepSpeed Transformer config is {'layer_id': 11, 'batch_size': 2, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 40, 'attn_dropout_ratio': 0.1, 'hidden_dropout_ratio': 0.1, 'num_hidden_layers': 12, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': 0, 'seed': 42, 'normalize_invertible': False, 'gelu_checkpoint': True, 'adjust_init_range': True, 'test_gemm': False, 'layer_norm_eps': 1e-12, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': True, 'stochastic_mode': False, 'huggingface': False}
Accounting for accumulation on the residual path
layer #11 is created with date type [half].
layer #11 is created with date type [half].
layer #11 is created with date type [half].
layer #11 is created with date type [half].
07/06/2021 19:29:40 - INFO - nvidia.modelingpreln - Init BERT pretrain model
07/06/2021 19:29:40 - INFO - nvidia.modelingpreln - Init BERT pretrain model
07/06/2021 19:29:40 - INFO - nvidia.modelingpreln - Init BERT pretrain model
07/06/2021 19:29:40 - INFO - nvidia.modelingpreln - Init BERT pretrain model
[2021-07-06 19:29:42,470] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.3+edb04db, git-hash=edb04db, git-branch=zhen/pipeline_impr
[2021-07-06 19:29:43,983] [INFO] [utils.py:13:_initialize_parameter_parallel_groups] data_parallel_size: 4, parameter_parallel_size: 4
[2021-07-06 19:29:44,164] [INFO] [utils.py:13:_initialize_parameter_parallel_groups] data_parallel_size: 4, parameter_parallel_size: 4
[2021-07-06 19:29:44,331] [INFO] [utils.py:13:_initialize_parameter_parallel_groups] data_parallel_size: 4, parameter_parallel_size: 4
[2021-07-06 19:29:44,411] [INFO] [utils.py:13:_initialize_parameter_parallel_groups] data_parallel_size: 4, parameter_parallel_size: 4
[2021-07-06 19:29:44,471] [INFO] [engine.py:177:__init__] DeepSpeed Flops Profiler Enabled: False
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ec2-user/.cache/torch_extensions/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.19509577751159668 seconds
Loading extension module fused_adam...
Loading extension module fused_adam...
Time to load fused_adam op: 0.20230555534362793 seconds
Time to load fused_adam op: 0.20225286483764648 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 0.2023007869720459 seconds
[2021-07-06 19:29:44,841] [INFO] [engine.py:707:_configure_optimizer] Using DeepSpeed Optimizer param name adam as basic optimizer
[2021-07-06 19:29:44,841] [INFO] [engine.py:711:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-07-06 19:29:44,841] [INFO] [utils.py:44:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=<class 'deepspeed.ops.adam.fused_adam.FusedAdam'>
[2021-07-06 19:29:44,841] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer
[2021-07-06 19:29:44,841] [INFO] [engine.py:935:_configure_zero_optimizer] Initializing ZeRO Stage 3
[2021-07-06 19:29:44,845] [INFO] [stage3.py:495:__init__] Reduce bucket size 500000000
[2021-07-06 19:29:44,845] [INFO] [stage3.py:496:__init__] Allgather bucket size 50000000
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Emitting ninja build file /home/ec2-user/.cache/torch_extensions/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.18239712715148926 seconds
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Time to load utils op: 0.20194101333618164 seconds
Time to load utils op: 0.20187997817993164 seconds
Time to load utils op: 0.2019813060760498 seconds
[2021-07-06 19:29:45,733] [INFO] [logging.py:68:log_dist] [Rank 0] rank=0 time (ms) | init_optimizer_state: 54.60
[2021-07-06 19:29:45,733] [INFO] [stage3.py:686:__init__] optimizer state initialized
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Emitting ninja build file /home/ec2-user/.cache/torch_extensions/communication/build.ninja...
Building extension module communication...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module communication...
Time to load communication op: 0.18649578094482422 seconds
Loading extension module communication...
Loading extension module communication...
Loading extension module communication...
Time to load communication op: 0.2023775577545166 seconds
Time to load communication op: 0.2023792266845703 seconds
Time to load communication op: 0.20232820510864258 seconds
[2021-07-06 19:29:46,378] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = adam
[2021-07-06 19:29:46,378] [INFO] [engine.py:513:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-07-06 19:29:46,378] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
[2021-07-06 19:29:46,378] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0001, 0.0001], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-07-06 19:29:46,378] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
No modifications detected for re-loaded extension module utils, skipping build step...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Loading extension module utils...
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Time to load utils op: 0.0003921985626220703 seconds
Time to load utils op: 0.00039267539978027344 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.00048732757568359375 seconds
[2021-07-06 19:29:46,378] [INFO] [config.py:904:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] amp_params ................... False
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] dump_state ................... False
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 1048576, 'scale_window': 1000, 'delayed_shift': 2, 'min_scale': 1}
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] flops_profiler_config ........ {
"enabled": false,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] global_rank .................. 0
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] gradient_accumulation_steps .. 8
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] initial_dynamic_scale ........ 1048576
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] optimizer_name ............... adam
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] optimizer_params ............. {'lr': 0.0001, 'weight_decay': 0.01, 'bias_correction': True, 'eps': 1e-06}
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-07-06 19:29:46,379] [INFO] [config.py:904:print] pld_params ................... False
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] steps_per_print .............. 100
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] train_batch_size ............. 64
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 2
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] wall_clock_breakdown ......... True
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] world_size ................... 4
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] zero_allow_untested_optimizer True
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] zero_config .................. {
"stage": 3,
"contiguous_gradients": false,
"reduce_scatter": false,
"reduce_bucket_size": 5.000000e+08,
"allgather_partitions": true,
"allgather_bucket_size": 5.000000e+08,
"overlap_comm": true,
"load_from_fp32_weights": true,
"elastic_checkpoint": true,
"offload_param": null,
"offload_optimizer": null,
"sub_group_size": 1.000000e+09,
"prefetch_bucket_size": 5.000000e+07,
"param_persistence_threshold": 1.000000e+05,
"max_live_parameters": 1.000000e+09,
"max_reuse_distance": 1.000000e+09,
"gather_fp16_weights_on_model_save": false,
"ignore_unused_parameters": true,
"legacy_stage1": false
}
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-07-06 19:29:46,380] [INFO] [config.py:904:print] zero_optimization_stage ...... 3
[2021-07-06 19:29:46,381] [INFO] [config.py:911:print] json = {
"train_batch_size": 64,
"train_micro_batch_size_per_gpu": 2,
"steps_per_print": 100,
"prescale_gradients": false,
"bert_token_file": "bert-large-uncased",
"bert_model_config": {
"vocab_size_or_config_json_file": 3.200300e+04,
"hidden_size": 2.560000e+03,
"num_hidden_layers": 12,
"num_attention_heads": 40,
"intermediate_size": 1.024000e+04,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"attention_probs_dropout_prob": 0.1,
"max_position_embeddings": 512,
"initializer_range": 0.02
},
"zero_optimization": {
"stage": 3,
"overlap_comm": true,
"reduce_scatter": false,
"contiguous_gradients": false
},
"zero_allow_untested_optimizer": true,
"optimizer": {
"type": "Adam",
"params": {
"lr": 0.0001,
"weight_decay": 0.01,
"bias_correction": true,
"eps": 1e-06
}
},
"gradient_clipping": 1.0,
"wall_clock_breakdown": true,
"fp16": {
"enabled": true,
"loss_scale": 0,
"initial_scale_power": 20,
"loss_scale_window": 1000
},
"data": {
"flags": {
"pretrain_dataset": true,
"pretrain_type": "wiki_bc"
},
"mixed_seq_datasets": {
"128": {
"pretrain_dataset": "128/wikicorpus_en"
},
"512": {
"pretrain_dataset": "512/wikicorpus_en"
}
}
},
"mixed_seq_training": {
"128": {
"num_epochs": 16,
"warmup_proportion": 0.06,
"learning_rate": 0.011,
"num_workers": 4,
"async_worker": true,
"decay_rate": 0.9,
"decay_step": 250,
"total_training_steps": 0
},
"512": {
"num_epochs": 20,
"warmup_proportion": 0.02,
"learning_rate": 0.002,
"num_workers": 4,
"async_worker": true,
"decay_rate": 0.9,
"decay_step": 150,
"total_training_steps": 1.500000e+04
}
}
}
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0003666877746582031 seconds
07/06/2021 19:29:46 - INFO - turing.logger - NvidiaBertDatasetProvider - Initialization: num_files = 256
07/06/2021 19:29:46 - INFO - turing.logger - Training Epoch: 1
07/06/2021 19:29:51 - INFO - turing.logger - worker-0: begin epoch 1 current_sample_count 0 shard_length 305883 global_data_samples 0
0%| | 0/152942 [00:00<?, ?it/s]
0%| | 0/152613 [00:00<?, ?it/s]
0%| | 0/152966 [00:00<?, ?it/s]
0%| | 0/152794 [00:00<?, ?it/s]
/home/ec2-user/DeepSpeedExamples/bing_bert/nvidia_bert_dataset_provider.py:74: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at ../torch/csrc/utils/python_arg_parser.cpp:962.)
padded_mask_indices = (masked_lm_positions == 0).nonzero()
/home/ec2-user/DeepSpeedExamples/bing_bert/nvidia_bert_dataset_provider.py:74: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at ../torch/csrc/utils/python_arg_parser.cpp:962.)
padded_mask_indices = (masked_lm_positions == 0).nonzero()
/home/ec2-user/DeepSpeedExamples/bing_bert/nvidia_bert_dataset_provider.py:74: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at ../torch/csrc/utils/python_arg_parser.cpp:962.)
padded_mask_indices = (masked_lm_positions == 0).nonzero()
/home/ec2-user/DeepSpeedExamples/bing_bert/nvidia_bert_dataset_provider.py:74: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at ../torch/csrc/utils/python_arg_parser.cpp:962.)
padded_mask_indices = (masked_lm_positions == 0).nonzero()
0: M0 P[] avail 0.0e+00, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 0.0e+00, inflight [0]
1: M1 P[] avail 0.0e+00, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 0.0e+00, inflight [1]
2: M2 P[] avail 0.0e+00, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 0.0e+00, inflight [2]
-gather param for module 3: {'id': 0, 'status': 'NOT_AVAILABLE', 'numel': 78151680, 'persist': False, 'active_sub_modules': {3}}
3: M3 P[0] avail 7.8e+07, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 7.8e+07, inflight [3]
-gather param for module 4: {'id': 1, 'status': 'NOT_AVAILABLE', 'numel': 1310720, 'persist': False, 'active_sub_modules': {4}}
4: M4 P[1] avail 7.9e+07, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 1.3e+06, inflight [4]
-gather param for module 5: {'id': 2, 'status': 'NOT_AVAILABLE', 'numel': 5120, 'persist': True, 'active_sub_modules': {5}}
5: M5 P[2] avail 7.9e+07, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 5.1e+03, inflight [5]
-gather param for module 6: {'id': 3, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {6}}
-gather param for module 6: {'id': 4, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {6}}
6: M6 P[3, 4] avail 7.9e+07, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 5.1e+03, inflight [6]
7: M7 P[] avail 7.9e+07, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 0.0e+00, inflight [7]
8: M8 P[] avail 7.9e+07, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 0.0e+00, inflight [8]
-gather param for module 11: {'id': 7, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 8, 'status': 'NOT_AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 9, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 10, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 11, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 12, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 13, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 14, 'status': 'NOT_AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 15, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 16, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 17, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 18, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
9: M11 P[7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] avail 1.6e+08, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [11]
-gather param for module 12: {'id': 19, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 20, 'status': 'NOT_AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 21, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 22, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 23, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 24, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 25, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 26, 'status': 'NOT_AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 27, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 28, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 29, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 30, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
10: M12 P[19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30] avail 2.4e+08, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [12]
-gather param for module 13: {'id': 31, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 32, 'status': 'NOT_AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 33, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 34, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 35, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 36, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 37, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 38, 'status': 'NOT_AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 39, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 40, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 41, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 42, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
11: M13 P[31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42] avail 3.2e+08, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [13]
-gather param for module 14: {'id': 43, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 44, 'status': 'NOT_AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 45, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 46, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 47, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 48, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 49, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 50, 'status': 'NOT_AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 51, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 52, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 53, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 54, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
12: M14 P[43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54] avail 3.9e+08, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [14]
-gather param for module 15: {'id': 55, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 56, 'status': 'NOT_AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 57, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 58, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 59, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 60, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 61, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 62, 'status': 'NOT_AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 63, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 64, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 65, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 66, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
13: M15 P[55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66] avail 4.7e+08, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [15]
-gather param for module 16: {'id': 67, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 68, 'status': 'NOT_AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 69, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 70, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 71, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 72, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 73, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 74, 'status': 'NOT_AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 75, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 76, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 77, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 78, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
14: M16 P[67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78] avail 5.5e+08, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [16]
-gather param for module 17: {'id': 79, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 80, 'status': 'NOT_AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 81, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 82, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 83, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 84, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 85, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 86, 'status': 'NOT_AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 87, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 88, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 89, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 90, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
15: M17 P[79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90] avail 6.3e+08, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [17]
-gather param for module 18: {'id': 91, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 92, 'status': 'NOT_AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 93, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 94, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 95, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 96, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 97, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 98, 'status': 'NOT_AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 99, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 100, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 101, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 102, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
16: M18 P[91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102] avail 7.1e+08, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [18]
-gather param for module 19: {'id': 103, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 104, 'status': 'NOT_AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 105, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 106, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 107, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 108, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 109, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 110, 'status': 'NOT_AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 111, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 112, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 113, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 114, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
17: M19 P[103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114] avail 7.9e+08, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [19]
-gather param for module 20: {'id': 115, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 116, 'status': 'NOT_AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 117, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 118, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 119, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 120, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 121, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 122, 'status': 'NOT_AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 123, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 124, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 125, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 126, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
18: M20 P[115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126] avail 8.7e+08, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [20]
-gather param for module 21: {'id': 127, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 128, 'status': 'NOT_AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 129, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 130, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 131, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 132, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 133, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 134, 'status': 'NOT_AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 135, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 136, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 137, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 138, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
19: M21 P[127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138] avail 9.4e+08, max_avail 5.0e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [21]
-gather param for module 22: {'id': 139, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 140, 'status': 'NOT_AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 141, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 142, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 143, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 144, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 145, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 146, 'status': 'NOT_AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 147, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 148, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 149, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 150, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
20: M22 P[139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150] avail 1.0e+09, max_avail -2.4e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [22]
-gather param for module 9: {'id': 5, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {9}}
-gather param for module 9: {'id': 6, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {9}}
21: M9 P[5, 6] avail 1.0e+09, max_avail -2.4e+07, queue_sz 0.0e+00, n_inflight 5.1e+03, inflight [9]
22: M23 P[] avail 1.0e+09, max_avail -2.4e+07, queue_sz 0.0e+00, n_inflight 0.0e+00, inflight [23]
-gather param for module 24: {'id': 151, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {24}}
-gather param for module 24: {'id': 152, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {24}}
23: M24 P[151, 152] avail 1.0e+09, max_avail -3.0e+07, queue_sz 0.0e+00, n_inflight 6.6e+06, inflight [24]
24: M25 P[] avail 1.0e+09, max_avail -3.0e+07, queue_sz 0.0e+00, n_inflight 0.0e+00, inflight [25]
-gather param for module 26: {'id': 153, 'status': 'NOT_AVAILABLE', 'numel': 30528, 'persist': True, 'active_sub_modules': {26}}
25: M26 P[153] avail 1.0e+09, max_avail -3.0e+07, queue_sz 0.0e+00, n_inflight 3.1e+04, inflight [26]
26: M27 P[] avail 1.0e+09, max_avail -3.0e+07, queue_sz 0.0e+00, n_inflight 0.0e+00, inflight [27]
-gather param for module 28: {'id': 154, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {28}}
-gather param for module 28: {'id': 155, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {28}}
27: M28 P[154, 155] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 6.6e+06, inflight [28]
-gather param for module 29: {'id': 156, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {29}}
-gather param for module 29: {'id': 157, 'status': 'NOT_AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {29}}
28: M29 P[156, 157] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 5.1e+03, inflight [29]
-gather param for module 30: {'id': 0, 'status': 'AVAILABLE', 'numel': 78151680, 'persist': False, 'active_sub_modules': {3, 30}}
29: M30 P[0] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 7.8e+07, inflight [30]
-gather param for module 31: {'id': 158, 'status': 'NOT_AVAILABLE', 'numel': 5120, 'persist': True, 'active_sub_modules': {31}}
-gather param for module 31: {'id': 159, 'status': 'NOT_AVAILABLE', 'numel': 2, 'persist': True, 'active_sub_modules': {31}}
30: M31 P[158, 159] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 5.1e+03, inflight [31]
31: M0 P[] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 0.0e+00, inflight [0]
32: M25 P[] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 0.0e+00, inflight [25]
-gather param for module 31: {'id': 158, 'status': 'AVAILABLE', 'numel': 5120, 'persist': True, 'active_sub_modules': {31}}
-gather param for module 31: {'id': 159, 'status': 'AVAILABLE', 'numel': 2, 'persist': True, 'active_sub_modules': {31}}
33: M31 P[158, 159] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 5.1e+03, inflight [31]
-gather param for module 26: {'id': 153, 'status': 'AVAILABLE', 'numel': 30528, 'persist': True, 'active_sub_modules': {26}}
34: M26 P[153] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 3.1e+04, inflight [26]
-gather param for module 30: {'id': 0, 'status': 'AVAILABLE', 'numel': 78151680, 'persist': False, 'active_sub_modules': {3, 30}}
35: M30 P[0] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 7.8e+07, inflight [30]
36: M27 P[] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 0.0e+00, inflight [27]
-gather param for module 29: {'id': 156, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {29}}
-gather param for module 29: {'id': 157, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {29}}
37: M29 P[156, 157] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 5.1e+03, inflight [29]
-gather param for module 28: {'id': 154, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {28}}
-gather param for module 28: {'id': 155, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {28}}
38: M28 P[154, 155] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 6.6e+06, inflight [28]
39: M1 P[] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 0.0e+00, inflight [1]
40: M23 P[] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 0.0e+00, inflight [23]
-gather param for module 24: {'id': 151, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {24}}
-gather param for module 24: {'id': 152, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {24}}
41: M24 P[151, 152] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 6.6e+06, inflight [24]
-gather param for module 9: {'id': 5, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {9}}
-gather param for module 9: {'id': 6, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {9}}
42: M9 P[5, 6] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 5.1e+03, inflight [9]
-gather param for module 22: {'id': 139, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 140, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 141, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 142, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 143, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 144, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 145, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 146, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 147, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 148, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 149, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 150, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
43: M22 P[139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [22]
-gather param for module 21: {'id': 127, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 128, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 129, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 130, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 131, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 132, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 133, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 134, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 135, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 136, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 137, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 138, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
44: M21 P[127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [21]
-gather param for module 20: {'id': 115, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 116, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 117, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 118, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 119, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 120, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 121, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 122, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 123, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 124, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 125, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 126, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
45: M20 P[115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [20]
-gather param for module 19: {'id': 103, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 104, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 105, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 106, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 107, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 108, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 109, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 110, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 111, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 112, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 113, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 114, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
46: M19 P[103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [19]
-gather param for module 18: {'id': 91, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 92, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 93, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 94, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 95, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 96, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 97, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 98, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 99, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 100, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 101, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 102, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
47: M18 P[91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [18]
-gather param for module 17: {'id': 79, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 80, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 81, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 82, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 83, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 84, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 85, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 86, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 87, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 88, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 89, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 90, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
48: M17 P[79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [17]
-gather param for module 16: {'id': 67, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 68, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 69, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 70, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 71, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 72, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 73, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 74, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 75, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 76, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 77, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 78, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
49: M16 P[67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [16]
-gather param for module 15: {'id': 55, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 56, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 57, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 58, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 59, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 60, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 61, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 62, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 63, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 64, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 65, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 66, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
50: M15 P[55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [15]
-gather param for module 14: {'id': 43, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 44, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 45, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 46, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 47, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 48, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 49, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 50, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 51, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 52, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 53, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 54, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
51: M14 P[43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [14]
-gather param for module 13: {'id': 31, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 32, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 33, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 34, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 35, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 36, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 37, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 38, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 39, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 40, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 41, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 42, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
52: M13 P[31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [13]
-gather param for module 12: {'id': 19, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 20, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 21, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 22, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 23, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 24, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 25, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 26, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 27, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 28, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 29, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 30, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
53: M12 P[19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [12]
-gather param for module 11: {'id': 7, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 8, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 9, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 10, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 11, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 12, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 13, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 14, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 15, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 16, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 17, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 18, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
54: M11 P[7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 7.9e+07, inflight [11]
55: M2 P[] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 0.0e+00, inflight [2]
56: M7 P[] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 0.0e+00, inflight [7]
-gather param for module 6: {'id': 3, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {6}}
-gather param for module 6: {'id': 4, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {6}}
57: M6 P[3, 4] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 5.1e+03, inflight [6]
-gather param for module 5: {'id': 2, 'status': 'AVAILABLE', 'numel': 5120, 'persist': True, 'active_sub_modules': {5}}
58: M5 P[2] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 5.1e+03, inflight [5]
-gather param for module 4: {'id': 1, 'status': 'AVAILABLE', 'numel': 1310720, 'persist': False, 'active_sub_modules': {4}}
59: M4 P[1] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 1.3e+06, inflight [4]
-gather param for module 3: {'id': 0, 'status': 'AVAILABLE', 'numel': 78151680, 'persist': False, 'active_sub_modules': {3, 30}}
60: M3 P[0] avail 1.0e+09, max_avail -3.7e+07, queue_sz 0.0e+00, n_inflight 7.8e+07, inflight [3]
-release param: {'id': 0, 'status': 'AVAILABLE', 'numel': 78151680, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 1, 'status': 'AVAILABLE', 'numel': 1310720, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 7, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 9, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 13, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 15, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 19, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 21, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 25, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 27, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 31, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 33, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 37, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 39, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 43, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 45, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 49, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 51, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 55, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 57, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 61, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 63, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 67, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 69, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 73, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 75, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 79, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 81, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 85, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 87, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 91, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 93, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 97, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 99, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 103, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 105, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 109, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 111, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 115, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 117, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 121, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 123, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 127, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 129, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 133, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 135, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 139, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 141, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 145, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 147, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 151, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 154, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
[2021-07-06 19:29:52,749] [INFO] [logging.py:68:log_dist] [Rank 0] rank=0 time (ms) | forward_microstep: 624.36 | backward_microstep: 827.56 | backward_inner_microstep: 768.84 | backward_allreduce_microstep: 58.66 | step_microstep: 0.04
0%| | 1/152794 [00:01<64:17:22, 1.51s/it]
0%| | 1/152966 [00:01<66:51:16, 1.57s/it]
0%| | 1/152613 [00:01<67:10:00, 1.58s/it]
[2021-07-06 19:29:52,749] [INFO] [logging.py:68:log_dist] [Rank 0] rank=0 time (ms) | forward: 624.34 | backward: 827.54 | backward_inner: 768.77 | backward_allreduce: 58.64 | step: 0.02
0%| | 1/152942 [00:01<68:33:34, 1.61s/it]
0: M0 P[] avail 4.6e+05, max_avail 5.0e+07, queue_sz 6.1e+01, n_inflight 0.0e+00, inflight [0]
-gather param for module 3: {'id': 0, 'status': 'NOT_AVAILABLE', 'numel': 78151680, 'persist': False, 'active_sub_modules': {3}}
1: M1 P[] avail 7.9e+07, max_avail 5.0e+07, queue_sz 5.7e+01, n_inflight 7.8e+07, inflight [1, 3, 2]
2: M2 P[] avail 7.9e+07, max_avail 5.0e+07, queue_sz 5.7e+01, n_inflight 7.8e+07, inflight [2, 3]
3: M3 P[0] avail 7.9e+07, max_avail 5.0e+07, queue_sz 5.7e+01, n_inflight 7.8e+07, inflight [3]
-gather param for module 4: {'id': 1, 'status': 'NOT_AVAILABLE', 'numel': 1310720, 'persist': False, 'active_sub_modules': {4}}
4: M4 P[1] avail 8.0e+07, max_avail 5.0e+07, queue_sz 5.7e+01, n_inflight 1.3e+06, inflight [4]
-gather param for module 5: {'id': 2, 'status': 'AVAILABLE', 'numel': 5120, 'persist': True, 'active_sub_modules': {5}}
-gather param for module 6: {'id': 3, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {6}}
-gather param for module 6: {'id': 4, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {6}}
-gather param for module 11: {'id': 7, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 8, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 9, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 10, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 11, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 12, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 13, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 14, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 15, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 16, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 17, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 18, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
5: M5 P[2] avail 1.6e+08, max_avail 5.0e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [8, 5, 6, 7, 11]
6: M6 P[3, 4] avail 1.6e+08, max_avail 5.0e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [8, 7, 11, 6]
7: M7 P[] avail 1.6e+08, max_avail 5.0e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [8, 11, 7]
8: M8 P[] avail 1.6e+08, max_avail 5.0e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [8, 11]
9: M11 P[7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] avail 1.6e+08, max_avail 5.0e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [11]
-gather param for module 12: {'id': 19, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 20, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 21, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 22, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 23, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 24, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 25, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 26, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 27, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 28, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 29, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 30, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
10: M12 P[19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30] avail 2.4e+08, max_avail 5.0e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [12]
-gather param for module 13: {'id': 31, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 32, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 33, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 34, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 35, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 36, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 37, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 38, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 39, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 40, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 41, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 42, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
11: M13 P[31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42] avail 3.2e+08, max_avail 5.0e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [13]
-gather param for module 14: {'id': 43, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 44, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 45, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 46, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 47, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 48, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 49, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 50, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 51, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 52, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 53, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 54, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
12: M14 P[43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54] avail 3.9e+08, max_avail 5.0e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [14]
-gather param for module 15: {'id': 55, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 56, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 57, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 58, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 59, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 60, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 61, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 62, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 63, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 64, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 65, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 66, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
13: M15 P[55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66] avail 4.7e+08, max_avail 5.0e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [15]
-gather param for module 16: {'id': 67, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 68, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 69, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 70, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 71, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 72, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 73, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 74, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 75, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 76, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 77, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 78, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
14: M16 P[67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78] avail 5.5e+08, max_avail 5.0e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [16]
-gather param for module 17: {'id': 79, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 80, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 81, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 82, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 83, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 84, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 85, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 86, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 87, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 88, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 89, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 90, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
15: M17 P[79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90] avail 6.3e+08, max_avail 5.0e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [17]
-gather param for module 18: {'id': 91, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 92, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 93, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 94, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 95, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 96, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 97, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 98, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 99, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 100, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 101, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 102, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
16: M18 P[91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102] avail 7.1e+08, max_avail 5.0e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [18]
-gather param for module 19: {'id': 103, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 104, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 105, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 106, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 107, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 108, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 109, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 110, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 111, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 112, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 113, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 114, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
17: M19 P[103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114] avail 7.9e+08, max_avail 5.0e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [19]
-gather param for module 20: {'id': 115, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 116, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 117, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 118, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 119, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 120, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 121, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 122, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 123, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 124, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 125, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 126, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
18: M20 P[115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126] avail 8.7e+08, max_avail 5.0e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [20]
-gather param for module 21: {'id': 127, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 128, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 129, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 130, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 131, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 132, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 133, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 134, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 135, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 136, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 137, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 138, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
19: M21 P[127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138] avail 9.4e+08, max_avail 5.0e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [21]
-gather param for module 22: {'id': 139, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 140, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 141, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 142, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 143, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 144, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 145, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 146, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 147, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 148, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 149, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 150, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
20: M22 P[139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150] avail 1.0e+09, max_avail -2.4e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [22]
-gather param for module 9: {'id': 5, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {9}}
-gather param for module 9: {'id': 6, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {9}}
21: M9 P[5, 6] avail 1.0e+09, max_avail -2.4e+07, queue_sz 5.1e+01, n_inflight 5.1e+03, inflight [9]
22: M23 P[] avail 1.0e+09, max_avail -2.4e+07, queue_sz 5.1e+01, n_inflight 0.0e+00, inflight [23]
-gather param for module 24: {'id': 151, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {24}}
-gather param for module 24: {'id': 152, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {24}}
23: M24 P[151, 152] avail 1.0e+09, max_avail -3.0e+07, queue_sz 5.1e+01, n_inflight 6.6e+06, inflight [24]
24: M25 P[] avail 1.0e+09, max_avail -3.0e+07, queue_sz 5.1e+01, n_inflight 0.0e+00, inflight [25]
-gather param for module 26: {'id': 153, 'status': 'AVAILABLE', 'numel': 30528, 'persist': True, 'active_sub_modules': {26}}
25: M26 P[153] avail 1.0e+09, max_avail -3.0e+07, queue_sz 5.1e+01, n_inflight 3.1e+04, inflight [26]
26: M27 P[] avail 1.0e+09, max_avail -3.0e+07, queue_sz 5.1e+01, n_inflight 0.0e+00, inflight [27]
-gather param for module 28: {'id': 154, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {28}}
-gather param for module 28: {'id': 155, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {28}}
27: M28 P[154, 155] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 6.6e+06, inflight [28]
-gather param for module 29: {'id': 156, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {29}}
-gather param for module 29: {'id': 157, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {29}}
28: M29 P[156, 157] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 5.1e+03, inflight [29]
-gather param for module 30: {'id': 0, 'status': 'AVAILABLE', 'numel': 78151680, 'persist': False, 'active_sub_modules': {3, 30}}
29: M30 P[0] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 7.8e+07, inflight [30]
-gather param for module 31: {'id': 158, 'status': 'AVAILABLE', 'numel': 5120, 'persist': True, 'active_sub_modules': {31}}
-gather param for module 31: {'id': 159, 'status': 'AVAILABLE', 'numel': 2, 'persist': True, 'active_sub_modules': {31}}
30: M31 P[158, 159] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 5.1e+03, inflight [31]
31: M0 P[] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 0.0e+00, inflight [0]
32: M25 P[] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 0.0e+00, inflight [25]
-gather param for module 31: {'id': 158, 'status': 'AVAILABLE', 'numel': 5120, 'persist': True, 'active_sub_modules': {31}}
-gather param for module 31: {'id': 159, 'status': 'AVAILABLE', 'numel': 2, 'persist': True, 'active_sub_modules': {31}}
33: M31 P[158, 159] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 5.1e+03, inflight [31]
-gather param for module 26: {'id': 153, 'status': 'AVAILABLE', 'numel': 30528, 'persist': True, 'active_sub_modules': {26}}
34: M26 P[153] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 3.1e+04, inflight [26]
-gather param for module 30: {'id': 0, 'status': 'AVAILABLE', 'numel': 78151680, 'persist': False, 'active_sub_modules': {3, 30}}
35: M30 P[0] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 7.8e+07, inflight [30]
36: M27 P[] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 0.0e+00, inflight [27]
-gather param for module 29: {'id': 156, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {29}}
-gather param for module 29: {'id': 157, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {29}}
37: M29 P[156, 157] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 5.1e+03, inflight [29]
-gather param for module 28: {'id': 154, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {28}}
-gather param for module 28: {'id': 155, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {28}}
38: M28 P[154, 155] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 6.6e+06, inflight [28]
39: M1 P[] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 0.0e+00, inflight [1]
40: M23 P[] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 0.0e+00, inflight [23]
-gather param for module 24: {'id': 151, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {24}}
-gather param for module 24: {'id': 152, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {24}}
41: M24 P[151, 152] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 6.6e+06, inflight [24]
-gather param for module 9: {'id': 5, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {9}}
-gather param for module 9: {'id': 6, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {9}}
42: M9 P[5, 6] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 5.1e+03, inflight [9]
-gather param for module 22: {'id': 139, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 140, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 141, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 142, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 143, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 144, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 145, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 146, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 147, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 148, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 149, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 150, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
43: M22 P[139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [22]
-gather param for module 21: {'id': 127, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 128, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 129, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 130, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 131, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 132, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 133, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 134, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 135, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 136, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 137, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 138, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
44: M21 P[127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [21]
-gather param for module 20: {'id': 115, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 116, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 117, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 118, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 119, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 120, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 121, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 122, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 123, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 124, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 125, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 126, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
45: M20 P[115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [20]
-gather param for module 19: {'id': 103, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 104, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 105, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 106, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 107, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 108, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 109, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 110, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 111, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 112, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 113, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 114, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
46: M19 P[103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [19]
-gather param for module 18: {'id': 91, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 92, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 93, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 94, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 95, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 96, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 97, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 98, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 99, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 100, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 101, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 102, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
47: M18 P[91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [18]
-gather param for module 17: {'id': 79, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 80, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 81, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 82, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 83, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 84, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 85, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 86, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 87, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 88, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 89, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 90, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
48: M17 P[79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [17]
-gather param for module 16: {'id': 67, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 68, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 69, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 70, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 71, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 72, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 73, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 74, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 75, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 76, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 77, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 78, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
49: M16 P[67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [16]
-gather param for module 15: {'id': 55, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 56, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 57, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 58, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 59, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 60, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 61, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 62, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 63, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 64, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 65, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 66, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
50: M15 P[55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [15]
-gather param for module 14: {'id': 43, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 44, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 45, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 46, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 47, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 48, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 49, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 50, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 51, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 52, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 53, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 54, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
51: M14 P[43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [14]
-gather param for module 13: {'id': 31, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 32, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 33, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 34, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 35, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 36, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 37, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 38, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 39, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 40, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 41, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 42, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
52: M13 P[31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [13]
-gather param for module 12: {'id': 19, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 20, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 21, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 22, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 23, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 24, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 25, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 26, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 27, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 28, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 29, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 30, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
53: M12 P[19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [12]
-gather param for module 11: {'id': 7, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 8, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 9, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 10, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 11, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 12, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 13, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 14, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 15, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 16, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 17, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 18, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
54: M11 P[7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 7.9e+07, inflight [11]
55: M2 P[] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 0.0e+00, inflight [2]
56: M7 P[] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 0.0e+00, inflight [7]
-gather param for module 6: {'id': 3, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {6}}
-gather param for module 6: {'id': 4, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {6}}
57: M6 P[3, 4] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 5.1e+03, inflight [6]
-gather param for module 5: {'id': 2, 'status': 'AVAILABLE', 'numel': 5120, 'persist': True, 'active_sub_modules': {5}}
58: M5 P[2] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 5.1e+03, inflight [5]
-gather param for module 4: {'id': 1, 'status': 'AVAILABLE', 'numel': 1310720, 'persist': False, 'active_sub_modules': {4}}
59: M4 P[1] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 1.3e+06, inflight [4]
-gather param for module 3: {'id': 0, 'status': 'AVAILABLE', 'numel': 78151680, 'persist': False, 'active_sub_modules': {3, 30}}
60: M3 P[0] avail 1.0e+09, max_avail -3.7e+07, queue_sz 5.1e+01, n_inflight 7.8e+07, inflight [3]
-release param: {'id': 0, 'status': 'AVAILABLE', 'numel': 78151680, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 1, 'status': 'AVAILABLE', 'numel': 1310720, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 7, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 9, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 13, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 15, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 19, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 21, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 25, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 27, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 31, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 33, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 37, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 39, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 43, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 45, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 49, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 51, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 55, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 57, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 61, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 63, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 67, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 69, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 73, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 75, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 79, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 81, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 85, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 87, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 91, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 93, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 97, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 99, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 103, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 105, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 109, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 111, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 115, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 117, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 121, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 123, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 127, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 129, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 133, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 135, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 139, 'status': 'AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 141, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 145, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 147, 'status': 'AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 151, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
-release param: {'id': 154, 'status': 'AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': set()}
[2021-07-06 19:29:54,295] [INFO] [logging.py:68:log_dist] [Rank 0] rank=0 time (ms) | forward_microstep: 730.98 | backward_microstep: 813.25 | backward_inner_microstep: 754.98 | backward_allreduce_microstep: 58.22 | step_microstep: 0.03
0%| | 2/152966 [00:03<65:41:13, 1.55s/it]
[2021-07-06 19:29:54,295] [INFO] [logging.py:68:log_dist] [Rank 0] rank=0 time (ms) | forward: 730.96 | backward: 813.24 | backward_inner: 754.94 | backward_allreduce: 58.20 | step: 0.02
0%| | 2/152794 [00:03<65:36:47, 1.55s/it]
0%| | 2/152613 [00:03<65:32:14, 1.55s/it]
0%| | 2/152942 [00:03<65:40:07, 1.55s/it]
0: M0 P[] avail 4.6e+05, max_avail 5.0e+07, queue_sz 1.2e+02, n_inflight 0.0e+00, inflight [0]
-gather param for module 3: {'id': 0, 'status': 'NOT_AVAILABLE', 'numel': 78151680, 'persist': False, 'active_sub_modules': {3}}
1: M1 P[] avail 7.9e+07, max_avail 5.0e+07, queue_sz 1.2e+02, n_inflight 7.8e+07, inflight [1, 3, 2]
2: M2 P[] avail 7.9e+07, max_avail 5.0e+07, queue_sz 1.2e+02, n_inflight 7.8e+07, inflight [2, 3]
3: M3 P[0] avail 7.9e+07, max_avail 5.0e+07, queue_sz 1.2e+02, n_inflight 7.8e+07, inflight [3]
-gather param for module 4: {'id': 1, 'status': 'NOT_AVAILABLE', 'numel': 1310720, 'persist': False, 'active_sub_modules': {4}}
4: M4 P[1] avail 8.0e+07, max_avail 5.0e+07, queue_sz 1.2e+02, n_inflight 1.3e+06, inflight [4]
-gather param for module 5: {'id': 2, 'status': 'AVAILABLE', 'numel': 5120, 'persist': True, 'active_sub_modules': {5}}
-gather param for module 6: {'id': 3, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {6}}
-gather param for module 6: {'id': 4, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {6}}
-gather param for module 11: {'id': 7, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 8, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 9, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 10, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 11, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 12, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 13, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 14, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 15, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 16, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 17, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
-gather param for module 11: {'id': 18, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {11}}
5: M5 P[2] avail 1.6e+08, max_avail 5.0e+07, queue_sz 1.1e+02, n_inflight 7.9e+07, inflight [8, 5, 6, 7, 11]
6: M6 P[3, 4] avail 1.6e+08, max_avail 5.0e+07, queue_sz 1.1e+02, n_inflight 7.9e+07, inflight [8, 7, 11, 6]
7: M7 P[] avail 1.6e+08, max_avail 5.0e+07, queue_sz 1.1e+02, n_inflight 7.9e+07, inflight [8, 11, 7]
8: M8 P[] avail 1.6e+08, max_avail 5.0e+07, queue_sz 1.1e+02, n_inflight 7.9e+07, inflight [8, 11]
9: M11 P[7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] avail 1.6e+08, max_avail 5.0e+07, queue_sz 1.1e+02, n_inflight 7.9e+07, inflight [11]
-gather param for module 12: {'id': 19, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 20, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 21, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 22, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 23, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 24, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 25, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 26, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 27, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 28, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 29, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
-gather param for module 12: {'id': 30, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {12}}
10: M12 P[19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30] avail 2.4e+08, max_avail 5.0e+07, queue_sz 1.1e+02, n_inflight 7.9e+07, inflight [12]
-gather param for module 13: {'id': 31, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 32, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 33, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 34, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 35, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 36, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 37, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 38, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 39, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 40, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 41, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
-gather param for module 13: {'id': 42, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {13}}
11: M13 P[31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42] avail 3.2e+08, max_avail 5.0e+07, queue_sz 1.1e+02, n_inflight 7.9e+07, inflight [13]
-gather param for module 14: {'id': 43, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 44, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 45, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 46, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 47, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 48, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 49, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 50, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 51, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 52, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 53, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
-gather param for module 14: {'id': 54, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {14}}
12: M14 P[43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54] avail 3.9e+08, max_avail 5.0e+07, queue_sz 1.1e+02, n_inflight 7.9e+07, inflight [14]
-gather param for module 15: {'id': 55, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 56, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 57, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 58, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 59, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 60, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 61, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 62, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 63, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 64, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 65, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
-gather param for module 15: {'id': 66, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {15}}
13: M15 P[55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66] avail 4.7e+08, max_avail 5.0e+07, queue_sz 1.1e+02, n_inflight 7.9e+07, inflight [15]
-gather param for module 16: {'id': 67, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 68, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 69, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 70, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 71, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 72, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 73, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 74, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 75, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 76, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 77, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
-gather param for module 16: {'id': 78, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {16}}
14: M16 P[67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78] avail 5.5e+08, max_avail 5.0e+07, queue_sz 1.1e+02, n_inflight 7.9e+07, inflight [16]
-gather param for module 17: {'id': 79, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 80, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 81, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 82, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 83, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 84, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 85, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 86, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 87, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 88, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 89, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
-gather param for module 17: {'id': 90, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {17}}
15: M17 P[79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90] avail 6.3e+08, max_avail 5.0e+07, queue_sz 1.1e+02, n_inflight 7.9e+07, inflight [17]
-gather param for module 18: {'id': 91, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 92, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 93, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 94, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 95, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 96, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 97, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 98, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 99, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 100, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 101, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
-gather param for module 18: {'id': 102, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {18}}
16: M18 P[91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102] avail 7.1e+08, max_avail 5.0e+07, queue_sz 1.1e+02, n_inflight 7.9e+07, inflight [18]
-gather param for module 19: {'id': 103, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 104, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 105, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 106, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 107, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 108, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 109, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 110, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 111, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 112, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 113, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
-gather param for module 19: {'id': 114, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {19}}
17: M19 P[103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114] avail 7.9e+08, max_avail 5.0e+07, queue_sz 1.1e+02, n_inflight 7.9e+07, inflight [19]
-gather param for module 20: {'id': 115, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 116, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 117, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 118, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 119, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 120, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 121, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 122, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 123, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 124, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 125, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
-gather param for module 20: {'id': 126, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {20}}
18: M20 P[115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126] avail 8.7e+08, max_avail 5.0e+07, queue_sz 1.1e+02, n_inflight 7.9e+07, inflight [20]
-gather param for module 21: {'id': 127, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 128, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 129, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 130, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 131, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 132, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 133, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 134, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 135, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 136, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 137, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
-gather param for module 21: {'id': 138, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {21}}
19: M21 P[127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138] avail 9.4e+08, max_avail 5.0e+07, queue_sz 1.1e+02, n_inflight 7.9e+07, inflight [21]
-gather param for module 22: {'id': 139, 'status': 'NOT_AVAILABLE', 'numel': 19660800, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 140, 'status': 'AVAILABLE', 'numel': 7680, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 141, 'status': 'NOT_AVAILABLE', 'numel': 6553600, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 142, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 143, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 144, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 145, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 146, 'status': 'AVAILABLE', 'numel': 10240, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 147, 'status': 'NOT_AVAILABLE', 'numel': 26214400, 'persist': False, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 148, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 149, 'status': 'AVAILABLE', 'numel': 2560, 'persist': True, 'active_sub_modules': {22}}
-gather param for module 22: {'id': 150, 'status': 'AVAILABLE', 'numel': 2560, 'pe