@trevor-m
Created June 2, 2023 22:36
PAXML + PJRT Segfault with LOGGING_ENABLED
2023-06-02 22:27:23.476734: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu
2023-06-02 22:27:30.192610: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu
2023-06-02 22:27:30.192664: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu
2023-06-02 22:27:30.192699: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu
2023-06-02 22:27:30.192733: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu
2023-06-02 22:27:30.218344: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu
2023-06-02 22:27:30.218504: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[IREE-PJRT] DEBUG: Using IREE compiler binary: /usr/local/lib/python3.10/dist-packages/iree/compiler/_mlir_libs/libIREECompiler.so
[IREE-PJRT] DEBUG: Compiler Version: 20230601.538 @ 77ac727c7f6cbe32ca8972b8c69bd32cba17b690 (API version 1.2)
[IREE-PJRT] DEBUG: Using partitioner binary: /workspace/openxla-pjrt-plugin/bazel-bin/partitioner/libOpenXLAPartitioner.so
[IREE-PJRT] DEBUG: Partitioner version: <unknown> (API version 1.1)
[IREE-PJRT] DEBUG: CUDA driver created
I0602 22:27:30.226667 140467571585024 setup_jax.py:74] JAX process: 0 / 1
I0602 22:27:30.226812 140467571585024 setup_jax.py:75] JAX devices: [GPU-b0fbccec-7593-9c0c-35de-cbfc04b9d09a]
I0602 22:27:30.227033 140467571585024 setup_jax.py:76] jax.device_count(): 1
I0602 22:27:30.227144 140467571585024 setup_jax.py:77] jax.local_device_count(): 1
I0602 22:27:30.227181 140467571585024 setup_jax.py:78] jax.process_count(): 1
Registered experiment `paxml.tasks.lm.params.lm_cloud.LargeMlp`
Registered experiment `paxml.tasks.lm.params.lm_cloud.SmallMlp`
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudTransformerAdam`
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudTransformerAdamTest`
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudTransformerAdamLimitSteps`
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmdTest`
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmd2B`
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmd2BLimitSteps`
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmd32B`
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmd64B`
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmd128B`
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmd256B`
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmd512B`
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmd1024B`
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmdPipeline9B`
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmdPipeline175B`
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmdMultislice2B`
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmdPipelineMultislice2B`
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmdPipelineMultislice2BCircular`
Registered experiment `paxml.tasks.lm.params.c4.LmCloudSpmdAdam`
Registered experiment `paxml.tasks.lm.params.c4.LmCloudSpmdAdamLimitSteps`
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdAdam`
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdGpt3AdamOrgHPBS1p5k1536Replicas`
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdPipelineAdam`
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdPipelineGpt3AdamOrgHPBS1p5k768Replicas`
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdPipelineGpt3AdamMLPerfHPBS1p5k768Replicas`
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdPipelineGpt3AdamMLPerfHPBS2k512Replicas`
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdPipelineGpt3AdamMLPerfHPBS3k768Replicas`
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdPipelineGpt3AdamMLPerfHPBS4k1024Replicas`
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdPipelineGpt3AdamMLPerfHPBS8k1024Replicas`
Registered experiment `paxml.tasks.lm.params.c4.C4Spmd1BAdam4Replicas`
Registered experiment `paxml.tasks.lm.params.c4.C4Spmd1BAdam4ReplicasLimitSteps`
Registered experiment `paxml.tasks.lm.params.c4.C4Spmd2BAdam4Replicas`
Registered experiment `paxml.tasks.lm.params.c4.C4Spmd16BAdam32Replicas`
Registered experiment `paxml.tasks.lm.params.c4.C4Spmd32BAdam64Replicas`
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdGpt3L16AdamOrgHP`
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdPipelineGpt3SmallAdam8Replicas`
W0602 22:27:31.231858 140467571585024 gpu_fast_attention.py:41] jax_triton not found, please `pip install jax-triton`
Registered experiment `tasks.lm.params.nvidia.NVIDIA1_3B`
Registered experiment `tasks.lm.params.nvidia.NVIDIA5B`
Registered experiment `tasks.lm.params.nvidia.NVIDIA8_3B`
Registered experiment `tasks.lm.params.nvidia.NVIDIA10B`
Registered experiment `tasks.lm.params.nvidia.NVIDIA40BProxy`
Registered experiment `tasks.lm.params.nvidia.NVIDIA70BProxy`
Registered experiment `tasks.lm.params.nvidia.NVIDIA116BProxy`
Registered experiment `tasks.lm.params.nvidia.NVIDIA175BProxy`
Registered experiment `tasks.lm.params.nvidia.TestSmallConfig`
Registered experiment `tasks.lm.params.nvidia.NVIDIA1_3BPmap`
I0602 22:27:31.236617 140467571585024 local.py:45] Setting task status: process_index: 0, process_count: 1
I0602 22:27:31.236836 140467571585024 local.py:50] Created artifact job_log_dir of type ArtifactType.DIRECTORY and value log_NVIDIA1_3BPmap.
I0602 22:27:31.590877 140467571585024 local.py:45] Setting task status: Train experiment tasks.lm.params.nvidia.NVIDIA1_3BPmap at log_NVIDIA1_3BPmap
I0602 22:27:31.590993 140467571585024 train.py:139] [PAX STATUS] Starting `train_and_evaluate`
I0602 22:27:31.698075 140467571585024 train.py:146] [PAX STATUS] Obtaining and initializing datasets.
I0602 22:27:31.699886 140467571585024 train.py:162] [PAX STATUS]: Done initializing dataset objects
I0602 22:27:31.699933 140467571585024 train.py:164] train_input_p:
I0602 22:27:31.700371 140467571585024 train.py:168] allow_fixed_file_random_seed : False
I0602 22:27:31.700420 140467571585024 train.py:168] batch_padding_size : 0
I0602 22:27:31.700451 140467571585024 train.py:168] batch_size : NoneType
I0602 22:27:31.700481 140467571585024 train.py:168] cls : type/praxis.base_input/LingvoInputAdaptor
I0602 22:27:31.700511 140467571585024 train.py:168] cluster_do_eval : False
I0602 22:27:31.700541 140467571585024 train.py:168] custom_device_order : NoneType
I0602 22:27:31.700570 140467571585024 train.py:168] eval_loop_num_batches : 1
I0602 22:27:31.700599 140467571585024 train.py:168] experimental_remote_input : False
I0602 22:27:31.700628 140467571585024 train.py:168] infeed_host_index : 0
I0602 22:27:31.700657 140467571585024 train.py:168] input.activation_split_dims_mapping : NoneType
I0602 22:27:31.700687 140467571585024 train.py:168] input.add_name_to_theta : False
I0602 22:27:31.700716 140467571585024 train.py:168] input.allow_implicit_capture : NoneType
I0602 22:27:31.700745 140467571585024 train.py:168] input.batch_size : 1
I0602 22:27:31.700774 140467571585024 train.py:168] input.cls : type/paxml.tasks.lm.input_generator/SyntheticLmData
I0602 22:27:31.700803 140467571585024 train.py:168] input.decoder_samples_per_summary : NoneType
I0602 22:27:31.700833 140467571585024 train.py:168] input.device_mesh : NoneType
I0602 22:27:31.700862 140467571585024 train.py:168] input.dtype : float32
I0602 22:27:31.700891 140467571585024 train.py:168] input.eval_samples_per_summary : NoneType
I0602 22:27:31.700920 140467571585024 train.py:168] input.file_datasource : NoneType
I0602 22:27:31.700949 140467571585024 train.py:168] input.filter_sparse_tensors : False
I0602 22:27:31.700978 140467571585024 train.py:168] input.fprop_dtype : NoneType
I0602 22:27:31.701008 140467571585024 train.py:168] input.inference_driver_name : NoneType
I0602 22:27:31.701036 140467571585024 train.py:168] input.input_stats_summary_interval_steps : 10
I0602 22:27:31.701066 140467571585024 train.py:168] input.is_inference : NoneType
I0602 22:27:31.701095 140467571585024 train.py:168] input.name : 'input'
I0602 22:27:31.701132 140467571585024 train.py:168] input.num_partitions : NoneType
I0602 22:27:31.701162 140467571585024 train.py:168] input.num_samples : 0
I0602 22:27:31.701192 140467571585024 train.py:168] input.outfeed_in_logical_order : False
I0602 22:27:31.701222 140467571585024 train.py:168] input.params_init.custom_v_init : NoneType
I0602 22:27:31.701251 140467571585024 train.py:168] input.params_init.method : 'xavier'
I0602 22:27:31.701285 140467571585024 train.py:168] input.params_init.scale : 1.000001
I0602 22:27:31.701314 140467571585024 train.py:168] input.params_init.seed : NoneType
I0602 22:27:31.701343 140467571585024 train.py:168] input.random_seed : NoneType
I0602 22:27:31.701372 140467571585024 train.py:168] input.remote.max_inflights_per_target : 32
I0602 22:27:31.701400 140467571585024 train.py:168] input.resettable : False
I0602 22:27:31.701430 140467571585024 train.py:168] input.seq_len : 2048
I0602 22:27:31.701459 140467571585024 train.py:168] input.skip_lp_regularization : NoneType
I0602 22:27:31.701488 140467571585024 train.py:168] input.tpu_embedding_mode : 'train'
I0602 22:27:31.701517 140467571585024 train.py:168] input.tpu_infeed_parallelism : 1
I0602 22:27:31.701546 140467571585024 train.py:168] input.use_partitioned_infeed_queue : False
I0602 22:27:31.701575 140467571585024 train.py:168] input.use_per_core_infeed : False
I0602 22:27:31.701603 140467571585024 train.py:168] input.use_per_host_infeed : False
I0602 22:27:31.701632 140467571585024 train.py:168] input.vn.deterministic : NoneType
I0602 22:27:31.701662 140467571585024 train.py:168] input.vn.global_vn : False
I0602 22:27:31.701690 140467571585024 train.py:168] input.vn.per_step_vn : False
I0602 22:27:31.701719 140467571585024 train.py:168] input.vn.scale : NoneType
I0602 22:27:31.701749 140467571585024 train.py:168] input.vn.seed : NoneType
I0602 22:27:31.701778 140467571585024 train.py:168] input.vn.start_step : 0
I0602 22:27:31.701807 140467571585024 train.py:168] input.weight_split_dims_mapping : NoneType
I0602 22:27:31.701836 140467571585024 train.py:168] input_checkpointing_enabled : False
I0602 22:27:31.701865 140467571585024 train.py:168] input_random_seed : NoneType
I0602 22:27:31.701894 140467571585024 train.py:168] is_training : True
I0602 22:27:31.701923 140467571585024 train.py:168] name : ''
I0602 22:27:31.701952 140467571585024 train.py:168] num_batches : NoneType
I0602 22:27:31.701982 140467571585024 train.py:168] num_infeed_hosts : 0
I0602 22:27:31.702010 140467571585024 train.py:168] reset_for_eval : False
I0602 22:27:31.702039 140467571585024 train.py:168] tf_data_service_address : NoneType
I0602 22:27:31.702069 140467571585024 train.py:169] task_p:
I0602 22:27:31.710068 140467571585024 train.py:171] cls : type/paxml.tasks_lib/SingleTask
I0602 22:27:31.710117 140467571585024 train.py:171] decode.cls : type/paxml.tasks_lib/SingleTask.Decode
I0602 22:27:31.710151 140467571585024 train.py:171] decode.prng_key_fold_with_batch_index : False
I0602 22:27:31.710183 140467571585024 train.py:171] decode.prng_key_fold_with_global_step : True
I0602 22:27:31.710213 140467571585024 train.py:171] decode.random_seed : 1234
I0602 22:27:31.710242 140467571585024 train.py:171] early_stopping_fn : NoneType
I0602 22:27:31.710272 140467571585024 train.py:171] evaluate.apply_mutable_list : ['aux_loss', 'summaries', 'non_trainable']
I0602 22:27:31.710302 140467571585024 train.py:171] evaluate.cls : type/paxml.tasks_lib/SingleTask.Evaluate
I0602 22:27:31.710331 140467571585024 train.py:171] evaluate.random_seed : 1234
I0602 22:27:31.710360 140467571585024 train.py:171] infer.cls : type/paxml.tasks_lib/SingleTask.Infer
I0602 22:27:31.710390 140467571585024 train.py:171] infer.random_seed : 1234
I0602 22:27:31.710419 140467571585024 train.py:171] infer_writer : NoneType
I0602 22:27:31.710449 140467571585024 train.py:171] loss_aggregator : NoneType
I0602 22:27:31.710482 140467571585024 train.py:171] metrics : NoneType
I0602 22:27:31.710511 140467571585024 train.py:171] model.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.710540 140467571585024 train.py:171] model.apply_eval_sample_weights : False
I0602 22:27:31.710570 140467571585024 train.py:171] model.cls : type/praxis.layers.models/LanguageModel
I0602 22:27:31.710598 140467571585024 train.py:171] model.contiguous_submeshes : NoneType
I0602 22:27:31.710628 140467571585024 train.py:171] model.count_tokens : False
I0602 22:27:31.710657 140467571585024 train.py:171] model.dcn_mesh_shape : NoneType
I0602 22:27:31.710686 140467571585024 train.py:171] model.decoder_tpl.cls : type/praxis.decoder_hparams/GreedyDecoderHParams
I0602 22:27:31.710716 140467571585024 train.py:171] model.decoder_tpl.decode_loop_mesh_axes_transpose : NoneType
I0602 22:27:31.710744 140467571585024 train.py:171] model.decoder_tpl.emb_lookup_style : 'matmul'
I0602 22:27:31.710774 140467571585024 train.py:171] model.decoder_tpl.eos_id : 2
I0602 22:27:31.710803 140467571585024 train.py:171] model.decoder_tpl.fprop_for_prefix : False
I0602 22:27:31.710831 140467571585024 train.py:171] model.decoder_tpl.lazy_prefix_broadcast : False
I0602 22:27:31.710860 140467571585024 train.py:171] model.decoder_tpl.max_decode_steps : NoneType
I0602 22:27:31.710889 140467571585024 train.py:171] model.decoder_tpl.min_prefix_len : 5
I0602 22:27:31.710919 140467571585024 train.py:171] model.decoder_tpl.process_result_fn : NoneType
I0602 22:27:31.710948 140467571585024 train.py:171] model.decoder_tpl.seqlen : 0
I0602 22:27:31.710977 140467571585024 train.py:171] model.dtype : type/jax.numpy/float32
I0602 22:27:31.711006 140467571585024 train.py:171] model.fprop_dtype : dtype[float32]
I0602 22:27:31.711036 140467571585024 train.py:171] model.ici_mesh_shape : NoneType
I0602 22:27:31.711064 140467571585024 train.py:171] model.lm_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.711093 140467571585024 train.py:171] model.lm_tpl.cls : type/praxis.layers.transformer_models/TransformerLm
I0602 22:27:31.711122 140467571585024 train.py:171] model.lm_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.711152 140467571585024 train.py:171] model.lm_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.711181 140467571585024 train.py:171] model.lm_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.711210 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.711239 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.cls : type/praxis.layers.normalizations/LayerNorm
I0602 22:27:31.711268 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.711297 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.711326 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.dim : 0
I0602 22:27:31.711355 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.711384 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.epsilon : 1e-06
I0602 22:27:31.711413 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.fprop_dtype : NoneType
I0602 22:27:31.711442 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.711472 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.mesh_axis_names : NoneType
I0602 22:27:31.711501 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.name : NoneType
I0602 22:27:31.711530 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.711559 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.params_init.method : 'xavier'
I0602 22:27:31.711588 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.params_init.scale : 1.000001
I0602 22:27:31.711617 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.reductions_in_fp32 : False
I0602 22:27:31.711647 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.711676 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.711705 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.use_bias : True
I0602 22:27:31.711734 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.use_scale : True
I0602 22:27:31.711763 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.711792 140467571585024 train.py:171] model.lm_tpl.fprop_dtype : NoneType
I0602 22:27:31.711822 140467571585024 train.py:171] model.lm_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.711851 140467571585024 train.py:171] model.lm_tpl.mesh_axis_names : NoneType
I0602 22:27:31.711880 140467571585024 train.py:171] model.lm_tpl.model_dims : 2048
I0602 22:27:31.711910 140467571585024 train.py:171] model.lm_tpl.model_type : 'causal'
I0602 22:27:31.711939 140467571585024 train.py:171] model.lm_tpl.name : NoneType
I0602 22:27:31.711968 140467571585024 train.py:171] model.lm_tpl.ngrammer_tpl : NoneType
I0602 22:27:31.711998 140467571585024 train.py:171] model.lm_tpl.packed_input : True
I0602 22:27:31.712027 140467571585024 train.py:171] model.lm_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.712056 140467571585024 train.py:171] model.lm_tpl.params_init.method : 'xavier'
I0602 22:27:31.712085 140467571585024 train.py:171] model.lm_tpl.params_init.scale : 1.000001
I0602 22:27:31.712115 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.activation_split_dims_mapping.emb_out_split_dims_mapping : NoneType
I0602 22:27:31.712143 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.712172 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.712201 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.cls : type/praxis.layers.base_ops/ArrayLookup
I0602 22:27:31.712230 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.712260 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.712288 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.712317 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.fprop_dtype : NoneType
I0602 22:27:31.712346 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.712375 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.mesh_axis_names : NoneType
I0602 22:27:31.712404 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.name : NoneType
I0602 22:27:31.712433 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.712462 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.params_init.method : 'xavier'
I0602 22:27:31.712491 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.params_init.scale : 1.000001
I0602 22:27:31.712520 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.712549 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.712578 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.712606 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.cls : type/praxis.layers.embedding_softmax/TrainablePositionalEmbedding
I0602 22:27:31.712636 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.712666 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.712696 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.712725 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.712754 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.cls : type/praxis.layers.base_ops/Einsum
I0602 22:27:31.712783 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.712812 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.712841 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.712870 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.fprop_dtype : NoneType
I0602 22:27:31.712898 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.712927 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.mesh_axis_names : NoneType
I0602 22:27:31.712956 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.name : NoneType
I0602 22:27:31.712984 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.713014 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.params_init.method : 'xavier'
I0602 22:27:31.713042 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.params_init.scale : 1.000001
I0602 22:27:31.713071 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.713106 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.713136 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.713165 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.embedding_dims : 0
I0602 22:27:31.713195 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.fprop_dtype : NoneType
I0602 22:27:31.713224 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.713253 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.lookup_style : 'matmul'
I0602 22:27:31.713282 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.max_seq_length : 2048
I0602 22:27:31.713311 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.max_timescale : 10000
I0602 22:27:31.713340 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.mesh_axis_names : NoneType
I0602 22:27:31.713370 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.min_timescale : 1
I0602 22:27:31.713398 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.name : NoneType
I0602 22:27:31.713428 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.713457 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.params_init.method : 'xavier'
I0602 22:27:31.713486 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.params_init.scale : 1.000001
I0602 22:27:31.713515 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.713544 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.713574 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.713603 140467571585024 train.py:171] model.lm_tpl.post_attention_ngrammer_tpls : NoneType
I0602 22:27:31.713632 140467571585024 train.py:171] model.lm_tpl.record_activations_in_xent_output : False
I0602 22:27:31.713661 140467571585024 train.py:171] model.lm_tpl.separate_embedding_tpl : NoneType
I0602 22:27:31.713690 140467571585024 train.py:171] model.lm_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.713720 140467571585024 train.py:171] model.lm_tpl.skip_aux_loss : False
I0602 22:27:31.713748 140467571585024 train.py:171] model.lm_tpl.skip_compute_loss : False
I0602 22:27:31.713778 140467571585024 train.py:171] model.lm_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.713807 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.activation_split_dims_mapping.emb_out_split_dims_mapping : NoneType
I0602 22:27:31.713836 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.713865 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.713894 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.cls : type/praxis.layers.base_ops/ArrayLookup
I0602 22:27:31.713923 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.713952 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.713980 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.714009 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.fprop_dtype : NoneType
I0602 22:27:31.714038 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.714067 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.mesh_axis_names : NoneType
I0602 22:27:31.714096 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.name : NoneType
I0602 22:27:31.714125 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.714154 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.params_init.method : 'xavier'
I0602 22:27:31.714183 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.params_init.scale : 1.000001
I0602 22:27:31.714211 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.714240 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.714269 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.714297 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.bi_tempered_loss_tpl : NoneType
I0602 22:27:31.714326 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.bias_init : 0.0
I0602 22:27:31.714355 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.cls : type/praxis.layers.embedding_softmax/SharedEmbeddingSoftmax
I0602 22:27:31.714384 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.714413 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.714442 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.714471 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.714500 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.cls : type/praxis.layers.base_ops/Einsum
I0602 22:27:31.714529 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.714558 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.714587 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.714616 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.fprop_dtype : NoneType
I0602 22:27:31.714645 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.714674 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.mesh_axis_names : NoneType
I0602 22:27:31.714703 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.name : NoneType
I0602 22:27:31.714732 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.714761 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.params_init.method : 'xavier'
I0602 22:27:31.714790 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.params_init.scale : 1.000001
I0602 22:27:31.714819 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.714848 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.714877 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.714906 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.714936 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.714965 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.cls : type/praxis.layers.activations/ReLU
I0602 22:27:31.714994 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.715023 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.715052 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.715080 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.fprop_dtype : NoneType
I0602 22:27:31.715109 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.715137 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.mesh_axis_names : NoneType
I0602 22:27:31.715166 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.name : NoneType
I0602 22:27:31.715195 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.715224 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.params_init.method : 'xavier'
I0602 22:27:31.715252 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.params_init.scale : 1.000001
I0602 22:27:31.715281 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.715310 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.715339 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.715367 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.bias_init : 0.0
I0602 22:27:31.715396 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.cls : type/praxis.layers.linears/FeedForward
I0602 22:27:31.715424 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.715453 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.715481 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.715510 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.fprop_dtype : NoneType
I0602 22:27:31.715539 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.has_bias : True
I0602 22:27:31.715568 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.715597 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.input_dims : 0
I0602 22:27:31.715626 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.715655 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.cls : type/praxis.layers.linears/Linear
I0602 22:27:31.715683 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.715712 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.715740 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.715769 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.715798 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.cls : type/praxis.layers.base_ops/Einsum
I0602 22:27:31.715827 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.715856 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.715885 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.715914 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.fprop_dtype : NoneType
I0602 22:27:31.715942 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.715970 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.mesh_axis_names : NoneType
I0602 22:27:31.715999 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.name : NoneType
I0602 22:27:31.716028 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.716056 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.params_init.method : 'xavier'
I0602 22:27:31.716085 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.params_init.scale : 1.000001
I0602 22:27:31.716114 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.716143 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.716171 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.716200 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.fprop_dtype : NoneType
I0602 22:27:31.716228 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.716256 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.input_dims : 0
I0602 22:27:31.716284 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.mesh_axis_names : NoneType
I0602 22:27:31.716312 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.name : NoneType
I0602 22:27:31.716341 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.output_dims : 0
I0602 22:27:31.716369 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.716398 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.params_init.method : 'xavier'
I0602 22:27:31.716427 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.params_init.scale : 1.000001
I0602 22:27:31.716455 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.716484 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.716512 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.weight_init : NoneType
I0602 22:27:31.716540 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.716568 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.mesh_axis_names : NoneType
I0602 22:27:31.716597 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.name : NoneType
I0602 22:27:31.716625 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.output_dims : 0
I0602 22:27:31.716654 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.716682 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.params_init.method : 'xavier'
I0602 22:27:31.716711 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.params_init.scale : 1.000001
I0602 22:27:31.716740 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.716768 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.716797 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.weight_init : NoneType
I0602 22:27:31.716825 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.716854 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.fprop_dtype : NoneType
I0602 22:27:31.716882 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.716911 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.input_dims : 0
I0602 22:27:31.716940 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.label_smoothing_apply_for_eval : True
I0602 22:27:31.716969 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.label_smoothing_prob : 0.0
I0602 22:27:31.716998 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.lookup_style : 'index'
I0602 22:27:31.717026 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.mesh_axis_names : NoneType
I0602 22:27:31.717056 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.name : NoneType
I0602 22:27:31.717084 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.num_classes : 0
I0602 22:27:31.717129 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.params_init.method : 'gaussian'
I0602 22:27:31.717163 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.params_init.scale : 0.022097086912079608
I0602 22:27:31.717195 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.scale_sqrt_depth : True
I0602 22:27:31.717224 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.717253 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.717282 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.soft_cap_logits : 30.0
I0602 22:27:31.717311 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.717340 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.z_loss_weight : 0.0
I0602 22:27:31.717369 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.717397 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.717426 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.atten_dropout_prob : NoneType
I0602 22:27:31.717455 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.cls : type/praxis.layers.transformers/StackedTransformer
I0602 22:27:31.717484 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.contiguous_submeshes : NoneType
I0602 22:27:31.717512 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.dcn_mesh_shape : NoneType
I0602 22:27:31.717541 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.dim_per_head : 64
I0602 22:27:31.717570 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.dropout_prob : 0.0
I0602 22:27:31.717600 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.dtype : type/jax.numpy/float32
I0602 22:27:31.717628 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.fold_padding_with_segment_mask : False
I0602 22:27:31.717657 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.fprop_dtype : NoneType
I0602 22:27:31.717686 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.gating_func : 'top2'
I0602 22:27:31.717715 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.hidden_dims : 8192
I0602 22:27:31.717744 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.ici_mesh_shape : NoneType
I0602 22:27:31.717772 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.input_dropout_prob : 0.0
I0602 22:27:31.717801 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.mask_self_attention : False
I0602 22:27:31.717830 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.mesh_axis_names : NoneType
I0602 22:27:31.717860 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.min_group_size : NoneType
I0602 22:27:31.717889 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.model_dims : 2048
I0602 22:27:31.717918 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.egch : NoneType
I0602 22:27:31.717947 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.egcm : NoneType
I0602 22:27:31.717976 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.gec : NoneType
I0602 22:27:31.718006 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.gecm : NoneType
I0602 22:27:31.718035 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.gecs : NoneType
I0602 22:27:31.718063 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.gs : NoneType
I0602 22:27:31.718092 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.gsec : NoneType
I0602 22:27:31.718121 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.gsm : NoneType
I0602 22:27:31.718151 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.718180 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.718209 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.cls : type/praxis.layers.activations/ReLU
I0602 22:27:31.718238 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.718267 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.718296 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.718325 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.fprop_dtype : NoneType
I0602 22:27:31.718354 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.718383 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.mesh_axis_names : NoneType
I0602 22:27:31.718412 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.name : NoneType
I0602 22:27:31.718441 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.718470 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.params_init.method : 'xavier'
I0602 22:27:31.718499 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.params_init.scale : 1.000001
I0602 22:27:31.718529 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.718558 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.718587 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.718616 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.add_skip_connection : True
I0602 22:27:31.718645 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.apply_padding_first : False
I0602 22:27:31.718673 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.cls : type/praxis.layers.transformers/TransformerFeedForwardMoe
I0602 22:27:31.718702 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.718730 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.718759 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.718787 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.expert_capacity_dim : 0
I0602 22:27:31.718815 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.expert_weight_shards : 1
I0602 22:27:31.718844 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.explicit_fan_in_fan_out_axes : False
I0602 22:27:31.718872 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.fprop_dtype : NoneType
I0602 22:27:31.718901 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.gating_func : 'top2'
I0602 22:27:31.718930 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.gating_logit_cap : 0.0
I0602 22:27:31.718958 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.hidden_dims : 0
I0602 22:27:31.718987 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.719016 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.input_dims : 0
I0602 22:27:31.719044 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.internal_gshard_variance_scaling_fan_in_init : True
I0602 22:27:31.719073 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.719101 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.cls : type/praxis.layers.normalizations/LayerNorm
I0602 22:27:31.719131 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.719160 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.719189 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.dim : 0
I0602 22:27:31.719217 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.719246 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.epsilon : 1e-06
I0602 22:27:31.719275 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.fprop_dtype : NoneType
I0602 22:27:31.719304 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.719333 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.mesh_axis_names : NoneType
I0602 22:27:31.719362 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.name : NoneType
I0602 22:27:31.719390 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.719419 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.params_init.method : 'xavier'
I0602 22:27:31.719448 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.params_init.scale : 1.000001
I0602 22:27:31.719476 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.reductions_in_fp32 : False
I0602 22:27:31.719505 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.719533 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.719562 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.use_bias : True
I0602 22:27:31.719591 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.use_scale : True
I0602 22:27:31.719619 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.719648 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.mesh_axis_names : NoneType
I0602 22:27:31.719677 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.min_group_size : NoneType
I0602 22:27:31.719705 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.moe_gating_embedding_level : 'token'
I0602 22:27:31.719734 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.moe_load_balance_loss_weight : 1.0
I0602 22:27:31.719763 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.name : NoneType
I0602 22:27:31.719791 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.norm_policy : 'pre'
I0602 22:27:31.719819 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.num_experts : 0
I0602 22:27:31.719848 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.num_groups : 0
I0602 22:27:31.719876 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.719905 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.params_init.method : 'xavier'
I0602 22:27:31.719933 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.params_init.scale : 1.000001
I0602 22:27:31.719962 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_prob : 0.0
I0602 22:27:31.719990 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.720020 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.cls : type/praxis.layers.stochastics/Dropout
I0602 22:27:31.720048 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.720077 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.720107 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.dropout_at_eval : False
I0602 22:27:31.720135 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.720164 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.fprop_dtype : NoneType
I0602 22:27:31.720193 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.720222 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.keep_prob : 1.0
I0602 22:27:31.720250 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.mesh_axis_names : NoneType
I0602 22:27:31.720278 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.name : NoneType
I0602 22:27:31.720307 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.noise_shape : NoneType
I0602 22:27:31.720336 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.noise_shape_broadcast_dims : NoneType
I0602 22:27:31.720365 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.720393 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.params_init.method : 'xavier'
I0602 22:27:31.720422 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.params_init.scale : 1.000001
I0602 22:27:31.720451 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.720480 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.720509 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.transpose_qk : False
I0602 22:27:31.720537 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.720566 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_prob : 0.0
I0602 22:27:31.720595 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.720623 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.cls : type/praxis.layers.stochastics/Dropout
I0602 22:27:31.720652 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.720681 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.720710 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.dropout_at_eval : False
I0602 22:27:31.720739 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.720767 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.fprop_dtype : NoneType
I0602 22:27:31.720796 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.720825 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.keep_prob : 1.0
I0602 22:27:31.720854 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.mesh_axis_names : NoneType
I0602 22:27:31.720882 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.name : NoneType
I0602 22:27:31.720911 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.noise_shape : NoneType
I0602 22:27:31.720940 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.noise_shape_broadcast_dims : NoneType
I0602 22:27:31.720968 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.720997 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.params_init.method : 'xavier'
I0602 22:27:31.721026 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.params_init.scale : 1.000001
I0602 22:27:31.721055 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.721084 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.721122 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.transpose_qk : False
I0602 22:27:31.721153 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.721183 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_droppath_prob : 0.0
I0602 22:27:31.721212 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_weight : 1.0
I0602 22:27:31.721240 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.second_expert_policy : 'all'
I0602 22:27:31.721269 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.721297 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.721326 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.unadjusted_expert_capacity_factor : 2.0
I0602 22:27:31.721354 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.use_gated_activation : False
I0602 22:27:31.721383 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.weight_split_dims_mapping.ehm : NoneType
I0602 22:27:31.721411 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.weight_split_dims_mapping.emh : NoneType
I0602 22:27:31.721440 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.weight_split_dims_mapping.me : NoneType
I0602 22:27:31.721469 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.721497 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.name : NoneType
I0602 22:27:31.721526 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.ngrammer_tpls : NoneType
I0602 22:27:31.721555 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.num_experts : 0
I0602 22:27:31.721583 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.num_groups : 1
I0602 22:27:31.721612 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.num_heads : 32
I0602 22:27:31.721641 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.num_layers : 1
I0602 22:27:31.721670 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.packed_input : False
I0602 22:27:31.721699 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.721728 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.params_init.method : 'xavier'
I0602 22:27:31.721757 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.params_init.scale : 1.000001
I0602 22:27:31.721785 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.relu_dropout_prob : NoneType
I0602 22:27:31.721814 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.residual_dropout_prob : NoneType
I0602 22:27:31.721843 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.residual_droppath_prob : 0.0
I0602 22:27:31.721871 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.shared_weight_layer_id : NoneType
I0602 22:27:31.721899 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.skip_lp_regularization : NoneType
I0602 22:27:31.721928 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.721957 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.allow_skip_cross_attention : False
I0602 22:27:31.721986 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.atten_dropout_prob : 0.0
I0602 22:27:31.722014 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.cls : type/praxis.layers.transformers/Transformer
I0602 22:27:31.722043 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.722072 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.cross_atten_tpl : NoneType
I0602 22:27:31.722101 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.722130 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dim_per_head : NoneType
I0602 22:27:31.722159 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.722187 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.cls : type/praxis.layers.stochastics/Dropout
I0602 22:27:31.722218 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.722252 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.722280 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.dropout_at_eval : False
I0602 22:27:31.722309 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.722339 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.fprop_dtype : NoneType
I0602 22:27:31.722368 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.722396 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.keep_prob : 1.0
I0602 22:27:31.722425 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.mesh_axis_names : NoneType
I0602 22:27:31.722454 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.name : NoneType
I0602 22:27:31.722483 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.noise_shape : NoneType
I0602 22:27:31.722512 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.noise_shape_broadcast_dims : NoneType
I0602 22:27:31.722542 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.722571 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.params_init.method : 'xavier'
I0602 22:27:31.722600 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.params_init.scale : 1.000001
I0602 22:27:31.722629 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.722658 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.722687 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.transpose_qk : False
I0602 22:27:31.722716 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.722745 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.722774 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.fprop_dtype : NoneType
I0602 22:27:31.722803 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.hidden_dims : 0
I0602 22:27:31.722832 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.722865 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.input_dims : 0
I0602 22:27:31.722895 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.722926 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.cls : type/praxis.layers.normalizations/LayerNorm
I0602 22:27:31.722954 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.722983 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.723013 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.dim : 0
I0602 22:27:31.723042 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.723070 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.epsilon : 1e-06
I0602 22:27:31.723100 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.fprop_dtype : NoneType
I0602 22:27:31.723128 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.723157 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.mesh_axis_names : NoneType
I0602 22:27:31.723186 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.name : NoneType
I0602 22:27:31.723214 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.723243 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.params_init.method : 'xavier'
I0602 22:27:31.723273 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.params_init.scale : 1.000001
I0602 22:27:31.723302 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.reductions_in_fp32 : False
I0602 22:27:31.723331 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.723360 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.723389 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.use_bias : True
I0602 22:27:31.723417 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.use_scale : True
I0602 22:27:31.723446 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.723474 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.mesh_axis_names : NoneType
I0602 22:27:31.723503 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.name : NoneType
I0602 22:27:31.723531 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ngrammer_tpl : NoneType
I0602 22:27:31.723560 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.norm_policy : 'pre'
I0602 22:27:31.723588 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.num_heads : NoneType
I0602 22:27:31.723617 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.packed_input : False
I0602 22:27:31.723646 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.723675 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.params_init.method : 'xavier'
I0602 22:27:31.723705 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.params_init.scale : 1.000001
I0602 22:27:31.723733 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.relu_dropout_prob : 0.0
I0602 22:27:31.723762 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.residual_dropout_prob : 0.0
I0602 22:27:31.723791 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.residual_droppath_prob : 0.0
I0602 22:27:31.723820 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.723849 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.723878 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.activation_split_dims_mapping.bld : NoneType
I0602 22:27:31.723906 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.activation_split_dims_mapping.blnh : NoneType
I0602 22:27:31.723936 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.723965 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.atten_dropout_prob : 0.0
I0602 22:27:31.723994 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.atten_logit_cap : 50.0
I0602 22:27:31.724023 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.attention_extra_logit : NoneType
I0602 22:27:31.724051 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.attention_mask_summary : False
I0602 22:27:31.724080 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.cast_rotary_position_emb : True
I0602 22:27:31.724108 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.cls : type/praxis.layers.attentions/DotProductAttention
I0602 22:27:31.724137 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combine_qkv : True
I0602 22:27:31.724166 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.724195 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.attention_combine_dims : False
I0602 22:27:31.724224 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.cls : type/praxis.layers.attentions/CombinedQKVProjectionLayer
I0602 22:27:31.724254 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.724282 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.724311 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.dim_per_head : 0
I0602 22:27:31.724340 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.724368 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.724398 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.cls : type/praxis.layers.base_ops/Einsum
I0602 22:27:31.724426 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.724455 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.724484 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.724513 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.fprop_dtype : NoneType
I0602 22:27:31.724541 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.724570 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.mesh_axis_names : NoneType
I0602 22:27:31.724600 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.name : NoneType
I0602 22:27:31.724628 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.724657 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.params_init.method : 'xavier'
I0602 22:27:31.724686 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.params_init.scale : 1.000001
I0602 22:27:31.724714 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.724743 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.724772 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.724801 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.explicit_fan_in_fan_out_axes : False
I0602 22:27:31.724830 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.fprop_dtype : NoneType
I0602 22:27:31.724858 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.724887 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.input_dim : 0
I0602 22:27:31.724916 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.mesh_axis_names : NoneType
I0602 22:27:31.724945 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.name : NoneType
I0602 22:27:31.724974 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.num_heads : 0
I0602 22:27:31.725003 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.725032 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.params_init.method : 'xavier'
I0602 22:27:31.725060 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.params_init.scale : 1.000001
I0602 22:27:31.725089 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.725125 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.725154 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.use_bias : True
I0602 22:27:31.725186 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.725216 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.725245 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.725274 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dconv_kernel_size : 3
I0602 22:27:31.725303 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dconv_qkv : False
I0602 22:27:31.725332 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.decode_cache : True
I0602 22:27:31.725361 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dim_per_head : NoneType
I0602 22:27:31.725390 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.725418 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.cls : type/praxis.layers.stochastics/Dropout
I0602 22:27:31.725447 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.725476 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.725505 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.dropout_at_eval : False
I0602 22:27:31.725535 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.725564 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.fprop_dtype : NoneType
I0602 22:27:31.725593 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.725622 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.keep_prob : 1.0
I0602 22:27:31.725651 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.mesh_axis_names : NoneType
I0602 22:27:31.725679 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.name : NoneType
I0602 22:27:31.725708 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.noise_shape : NoneType
I0602 22:27:31.725736 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.noise_shape_broadcast_dims : NoneType
I0602 22:27:31.725765 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.725794 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.params_init.method : 'xavier'
I0602 22:27:31.725823 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.params_init.scale : 1.000001
I0602 22:27:31.725851 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.725880 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.725909 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.transpose_qk : False
I0602 22:27:31.725938 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.725967 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.725996 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.fprop_dtype : NoneType
I0602 22:27:31.726024 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.hidden_dim : 0
I0602 22:27:31.726053 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.726082 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.input_dim : 0
I0602 22:27:31.726110 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.internal_enable_per_dim_scale : True
I0602 22:27:31.726139 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.internal_enable_query_scale : True
I0602 22:27:31.726168 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.internal_gshard_gaussian_init : False
I0602 22:27:31.726196 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.mesh_axis_names : NoneType
I0602 22:27:31.726225 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.name : NoneType
I0602 22:27:31.726253 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.ngrammer_tpl : NoneType
I0602 22:27:31.726282 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.num_heads : 1
I0602 22:27:31.726311 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.output_proj_use_nhd_shape : False
I0602 22:27:31.726339 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.726368 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.params_init.method : 'xavier'
I0602 22:27:31.726397 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.params_init.scale : 1.000001
I0602 22:27:31.726425 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.726454 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.attention_combine_dims : False
I0602 22:27:31.726484 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.cls : type/praxis.layers.attentions/AttentionProjection
I0602 22:27:31.726512 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.726541 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.726570 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.dim_per_head : 0
I0602 22:27:31.726599 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.726628 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.726658 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.cls : type/praxis.layers.base_ops/Einsum
I0602 22:27:31.726686 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.726715 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.726748 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.726781 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.fprop_dtype : NoneType
I0602 22:27:31.726811 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.726840 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.mesh_axis_names : NoneType
I0602 22:27:31.726871 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.name : NoneType
I0602 22:27:31.726902 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.726932 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.params_init.method : 'xavier'
I0602 22:27:31.726960 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.params_init.scale : 1.000001
I0602 22:27:31.726989 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.727019 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.727047 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.727076 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.explicit_fan_in_fan_out_axes : False
I0602 22:27:31.727105 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.fprop_dtype : NoneType
I0602 22:27:31.727134 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.727163 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.input_dim : 0
I0602 22:27:31.727192 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.is_output_projection : False
I0602 22:27:31.727221 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.mesh_axis_names : NoneType
I0602 22:27:31.727249 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.name : NoneType
I0602 22:27:31.727277 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.num_heads : 0
I0602 22:27:31.727306 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.727335 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.params_init.method : 'xavier'
I0602 22:27:31.727364 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.params_init.scale : 1.000001
I0602 22:27:31.727393 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.727422 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.727451 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.use_bias : True
I0602 22:27:31.727480 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.use_nhd_shape : False
I0602 22:27:31.727509 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.727538 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.727567 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.cls : type/praxis.layers.base_ops/Einsum
I0602 22:27:31.727596 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.727624 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.727653 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.727681 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.fprop_dtype : NoneType
I0602 22:27:31.727710 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.727739 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.mesh_axis_names : NoneType
I0602 22:27:31.727768 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.name : NoneType
I0602 22:27:31.727797 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.727825 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.params_init.method : 'xavier'
I0602 22:27:31.727853 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.params_init.scale : 1.000001
I0602 22:27:31.727881 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.727910 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.727939 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.727968 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.727996 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.cls : type/praxis.layers.base_ops/Einsum
I0602 22:27:31.728025 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.728054 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.728082 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.728111 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.fprop_dtype : NoneType
I0602 22:27:31.728141 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.728170 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.mesh_axis_names : NoneType
I0602 22:27:31.728199 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.name : NoneType
I0602 22:27:31.728228 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.728257 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.params_init.method : 'xavier'
I0602 22:27:31.728286 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.params_init.scale : 1.000001
I0602 22:27:31.728314 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.728344 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.728373 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.728401 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.relative_bias_tpl : NoneType
I0602 22:27:31.728430 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.728459 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.cast_as_fprop_dtype : True
I0602 22:27:31.728488 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.cls : type/praxis.layers.embedding_softmax/RotaryPositionalEmbedding
I0602 22:27:31.728517 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.728546 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.728575 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.728603 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.embedding_dims : 0
I0602 22:27:31.728632 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.fprop_dtype : NoneType
I0602 22:27:31.728661 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.728690 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.max_timescale : 10000
I0602 22:27:31.728719 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.mesh_axis_names : NoneType
I0602 22:27:31.728748 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.min_timescale : 1
I0602 22:27:31.728777 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.name : NoneType
I0602 22:27:31.728806 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.728835 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.params_init.method : 'xavier'
I0602 22:27:31.728864 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.params_init.scale : 1.000001
I0602 22:27:31.728893 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.728921 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.728950 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.728979 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.scale_logits_by_head_dims : False
I0602 22:27:31.729009 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.729038 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.729066 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.use_bias : False
I0602 22:27:31.729095 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.use_rotary_position_emb : False
I0602 22:27:31.729138 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.weight_split_dims_mapping.dconv : NoneType
I0602 22:27:31.729169 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.weight_split_dims_mapping.proj : NoneType
I0602 22:27:31.729198 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.729227 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.zero_fully_masked : False
I0602 22:27:31.729256 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_split_dims_mapping.ffn0 : NoneType
I0602 22:27:31.729285 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_split_dims_mapping.ffn1 : NoneType
I0602 22:27:31.729314 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.729343 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.729372 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.approximate : True
I0602 22:27:31.729401 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.cls : type/praxis.layers.activations/GELU
I0602 22:27:31.729430 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.729460 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.729488 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.729518 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.fprop_dtype : NoneType
I0602 22:27:31.729548 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.729576 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.mesh_axis_names : NoneType
I0602 22:27:31.729605 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.name : NoneType
I0602 22:27:31.729633 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.729662 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.params_init.method : 'xavier'
I0602 22:27:31.729691 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.params_init.scale : 1.000001
I0602 22:27:31.729720 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.729749 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.729778 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.729807 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.add_skip_connection : True
I0602 22:27:31.729836 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.apply_padding_first : False
I0602 22:27:31.729865 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.cls : type/praxis.layers.transformers/TransformerFeedForward
I0602 22:27:31.729897 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.729926 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.729955 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.729984 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.730012 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.730040 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.cls : type/praxis.layers.activations/ReLU
I0602 22:27:31.730069 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.730098 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.730126 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.730155 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.fprop_dtype : NoneType
I0602 22:27:31.730184 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.730213 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.mesh_axis_names : NoneType
I0602 22:27:31.730242 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.name : NoneType
I0602 22:27:31.730271 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.730300 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.params_init.method : 'xavier'
I0602 22:27:31.730328 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.params_init.scale : 1.000001
I0602 22:27:31.730357 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.730386 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.730414 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.730443 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.bias_init : 0.0
I0602 22:27:31.730472 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.cls : type/praxis.layers.linears/FeedForward
I0602 22:27:31.730500 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.730529 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.730558 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.730588 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.fprop_dtype : NoneType
I0602 22:27:31.730617 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.has_bias : True
I0602 22:27:31.730646 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.730674 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.input_dims : 0
I0602 22:27:31.730703 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.730732 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.cls : type/praxis.layers.linears/Linear
I0602 22:27:31.730761 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.730790 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.730818 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.730847 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.730876 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.cls : type/praxis.layers.base_ops/Einsum
I0602 22:27:31.730905 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.730934 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.730963 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.730992 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.fprop_dtype : NoneType
I0602 22:27:31.731020 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.731049 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.mesh_axis_names : NoneType
I0602 22:27:31.731077 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.name : NoneType
I0602 22:27:31.731106 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.731135 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.params_init.method : 'xavier'
I0602 22:27:31.731164 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.params_init.scale : 1.000001
I0602 22:27:31.731193 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.731221 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.731251 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.731279 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.fprop_dtype : NoneType
I0602 22:27:31.731308 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.731337 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.input_dims : 0
I0602 22:27:31.731366 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.mesh_axis_names : NoneType
I0602 22:27:31.731395 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.name : NoneType
I0602 22:27:31.731424 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.output_dims : 0
I0602 22:27:31.731452 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.731481 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.params_init.method : 'xavier'
I0602 22:27:31.731510 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.params_init.scale : 1.000001
I0602 22:27:31.731539 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.731568 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.731596 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.weight_init : NoneType
I0602 22:27:31.731625 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.731653 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.mesh_axis_names : NoneType
I0602 22:27:31.731682 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.name : NoneType
I0602 22:27:31.731712 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.output_dims : 0
I0602 22:27:31.731740 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.731769 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.method : 'xavier'
I0602 22:27:31.731798 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.scale : 1.000001
I0602 22:27:31.731827 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.731856 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.731884 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.weight_init : NoneType
I0602 22:27:31.731913 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.731942 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fprop_dtype : NoneType
I0602 22:27:31.731971 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.has_bias : True
I0602 22:27:31.732000 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.hidden_dims : 0
I0602 22:27:31.732029 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.732057 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.input_dims : 0
I0602 22:27:31.732086 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.internal_gshard_variance_scaling_fan_in_init : False
I0602 22:27:31.732115 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.732144 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.cls : type/praxis.layers.normalizations/LayerNorm
I0602 22:27:31.732172 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.732202 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.732230 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.dim : 0
I0602 22:27:31.732259 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.732288 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.epsilon : 1e-06
I0602 22:27:31.732317 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.fprop_dtype : NoneType
I0602 22:27:31.732346 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.732375 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.mesh_axis_names : NoneType
I0602 22:27:31.732403 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.name : NoneType
I0602 22:27:31.732432 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.732461 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.params_init.method : 'xavier'
I0602 22:27:31.732490 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.params_init.scale : 1.000001
I0602 22:27:31.732519 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.reductions_in_fp32 : False
I0602 22:27:31.732547 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.732576 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.732605 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.use_bias : True
I0602 22:27:31.732634 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.use_scale : True
I0602 22:27:31.732663 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.732692 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.mesh_axis_names : NoneType
I0602 22:27:31.732721 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.name : NoneType
I0602 22:27:31.732751 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.norm_policy : 'pre'
I0602 22:27:31.732779 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.output_dims : 0
I0602 22:27:31.732809 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.732837 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.params_init.method : 'xavier'
I0602 22:27:31.732866 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.params_init.scale : 1.000001
I0602 22:27:31.732895 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_prob : 0.0
I0602 22:27:31.732924 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.732953 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.cls : type/praxis.layers.stochastics/Dropout
I0602 22:27:31.732981 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.733010 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.733039 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.dropout_at_eval : False
I0602 22:27:31.733068 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.733098 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.fprop_dtype : NoneType
I0602 22:27:31.733132 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.733162 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.keep_prob : 1.0
I0602 22:27:31.733191 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.mesh_axis_names : NoneType
I0602 22:27:31.733219 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.name : NoneType
I0602 22:27:31.733248 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.noise_shape : NoneType
I0602 22:27:31.733276 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.noise_shape_broadcast_dims : NoneType
I0602 22:27:31.733305 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.733334 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.params_init.method : 'xavier'
I0602 22:27:31.733362 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.params_init.scale : 1.000001
I0602 22:27:31.733391 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.733420 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.733449 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.transpose_qk : False
I0602 22:27:31.733477 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.733506 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_prob : 0.0
I0602 22:27:31.733535 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.activation_split_dims_mapping.out : NoneType
I0602 22:27:31.733563 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.cls : type/praxis.layers.stochastics/Dropout
I0602 22:27:31.733592 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.733620 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.733649 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.dropout_at_eval : False
I0602 22:27:31.733679 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.733708 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.fprop_dtype : NoneType
I0602 22:27:31.733737 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.733766 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.keep_prob : 1.0
I0602 22:27:31.733795 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.mesh_axis_names : NoneType
I0602 22:27:31.733824 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.name : NoneType
I0602 22:27:31.733852 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.noise_shape : NoneType
I0602 22:27:31.733881 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.noise_shape_broadcast_dims : NoneType
I0602 22:27:31.733911 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.733939 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.method : 'xavier'
I0602 22:27:31.733968 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.scale : 1.000001
I0602 22:27:31.733997 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.734025 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.734054 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.transpose_qk : False
I0602 22:27:31.734083 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.734111 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_droppath_prob : 0.0
I0602 22:27:31.734140 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_weight : 1.0
I0602 22:27:31.734169 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.734198 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.734226 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.use_gated_activation : False
I0602 22:27:31.734255 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.weight_split_dims_mapping.ffn0 : NoneType
I0602 22:27:31.734284 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.weight_split_dims_mapping.ffn1 : NoneType
I0602 22:27:31.734313 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.734342 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.use_cross_attention : False
I0602 22:27:31.734370 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.734399 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.unadjusted_expert_capacity_factor : 2.0
I0602 22:27:31.734427 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.use_cross_attention : False
I0602 22:27:31.734457 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.734486 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.checkpoint_policy : 'save_nothing'
I0602 22:27:31.734514 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.cls : type/praxis.layers.transformers/StackedTransformerRepeated
I0602 22:27:31.734543 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.contiguous_submeshes : NoneType
I0602 22:27:31.734571 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.dcn_mesh_shape : NoneType
I0602 22:27:31.734600 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.dtype : type/jax.numpy/float32
I0602 22:27:31.734629 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.fprop_dtype : NoneType
I0602 22:27:31.734658 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.ici_mesh_shape : NoneType
I0602 22:27:31.734687 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.mesh_axis_names : NoneType
I0602 22:27:31.734716 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.name : NoneType
I0602 22:27:31.734745 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.nd_prefix_shape : NoneType
I0602 22:27:31.734774 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.params_init.cls : type/praxis.base_layer/WeightInit
I0602 22:27:31.734802 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.params_init.method : 'xavier'
I0602 22:27:31.734831 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.params_init.scale : 1.000001
I0602 22:27:31.734859 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.repeat_layer_name : 'repeat'
I0602 22:27:31.734888 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.repeat_optimizer_dims_mapping : NoneType
I0602 22:27:31.734916 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.shared_weight_layer_id : NoneType
I0602 22:27:31.734945 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.skip_lp_regularization : NoneType
I0602 22:27:31.734973 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.sublayer_name : 'sub'
I0602 22:27:31.735002 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.unroll_in_decode : True
I0602 22:27:31.735031 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.weight_split_dims_mapping.block : NoneType
I0602 22:27:31.735060 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.735089 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.x_times : 24
I0602 22:27:31.735118 140467571585024 train.py:171] model.lm_tpl.vocab_size : 51200
I0602 22:27:31.735147 140467571585024 train.py:171] model.lm_tpl.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.735176 140467571585024 train.py:171] model.mesh_axis_names : NoneType
I0602 22:27:31.735205 140467571585024 train.py:171] model.model_type : 'causal'
I0602 22:27:31.735234 140467571585024 train.py:171] model.name : 'xformer_lm'
I0602 22:27:31.735263 140467571585024 train.py:171] model.params_init.method : 'gaussian'
I0602 22:27:31.735292 140467571585024 train.py:171] model.params_init.scale : 0.023
I0602 22:27:31.735321 140467571585024 train.py:171] model.report_strict_acc : False
I0602 22:27:31.735351 140467571585024 train.py:171] model.return_predictions : False
I0602 22:27:31.735379 140467571585024 train.py:171] model.shared_weight_layer_id : NoneType
I0602 22:27:31.735408 140467571585024 train.py:171] model.skip_lp_regularization : NoneType
I0602 22:27:31.735437 140467571585024 train.py:171] model.weight_split_dims_mapping.wt : NoneType
I0602 22:27:31.735466 140467571585024 train.py:171] name : 'xformer_task'
I0602 22:27:31.735495 140467571585024 train.py:171] summary_verbosity : 3
I0602 22:27:31.735524 140467571585024 train.py:171] train.always_use_train_for_model_init : True
I0602 22:27:31.735553 140467571585024 train.py:171] train.apply_mutable_list : ['aux_loss', 'summaries', 'non_trainable', 'batch_stats', 'params_axes']
I0602 22:27:31.735582 140467571585024 train.py:171] train.async_summary_writing : True
I0602 22:27:31.735610 140467571585024 train.py:171] train.cls : type/paxml.tasks_lib/SingleTask.Train
I0602 22:27:31.735640 140467571585024 train.py:171] train.decode_interval_steps : NoneType
I0602 22:27:31.735669 140467571585024 train.py:171] train.decode_start_after_n_steps : 0
I0602 22:27:31.735697 140467571585024 train.py:171] train.decode_use_ema_states : False
I0602 22:27:31.735726 140467571585024 train.py:171] train.device_sync_interval_steps : NoneType
I0602 22:27:31.735755 140467571585024 train.py:171] train.enable_input_checkpointing : False
I0602 22:27:31.735785 140467571585024 train.py:171] train.enforce_input_specs : False
I0602 22:27:31.735813 140467571585024 train.py:171] train.eval_interval_steps : 100
I0602 22:27:31.735843 140467571585024 train.py:171] train.eval_skip_train : False
I0602 22:27:31.735871 140467571585024 train.py:171] train.eval_use_ema_states : False
I0602 22:27:31.735900 140467571585024 train.py:171] train.external_checkpoint_handler : NoneType
I0602 22:27:31.735929 140467571585024 train.py:171] train.external_checkpoint_path : NoneType
I0602 22:27:31.735958 140467571585024 train.py:171] train.inputs_split_mapping : NoneType
I0602 22:27:31.735987 140467571585024 train.py:171] train.learner.check_valid_step : True
I0602 22:27:31.736016 140467571585024 train.py:171] train.learner.cls : type/paxml.learners/Learner
I0602 22:27:31.736045 140467571585024 train.py:171] train.learner.enable_skip_step_on_gradient_anomalies : True
I0602 22:27:31.736074 140467571585024 train.py:171] train.learner.force_repeat_prefix_structure : False
I0602 22:27:31.736103 140467571585024 train.py:171] train.learner.grad_norm_individual_vars : False
I0602 22:27:31.736132 140467571585024 train.py:171] train.learner.grad_norm_summary : True
I0602 22:27:31.736161 140467571585024 train.py:171] train.learner.keep_optimizer_state_for_excluded_vars : False
I0602 22:27:31.736190 140467571585024 train.py:171] train.learner.loss_name : 'total_loss'
I0602 22:27:31.736219 140467571585024 train.py:171] train.learner.name : ''
I0602 22:27:31.736248 140467571585024 train.py:171] train.learner.optimizer.beta1 : 0.9
I0602 22:27:31.736277 140467571585024 train.py:171] train.learner.optimizer.beta2 : 0.95
I0602 22:27:31.736306 140467571585024 train.py:171] train.learner.optimizer.clip_gradient_norm_to_value : 1.0
I0602 22:27:31.736335 140467571585024 train.py:171] train.learner.optimizer.clip_gradient_single_norm_to_value : 0.0
I0602 22:27:31.736364 140467571585024 train.py:171] train.learner.optimizer.clip_threshold : 1.0
I0602 22:27:31.736393 140467571585024 train.py:171] train.learner.optimizer.cls : type/praxis.optimizers/Adam
I0602 22:27:31.736422 140467571585024 train.py:171] train.learner.optimizer.decoupled_weight_decay : NoneType
I0602 22:27:31.736451 140467571585024 train.py:171] train.learner.optimizer.ema_decay : 0.0
I0602 22:27:31.736480 140467571585024 train.py:171] train.learner.optimizer.epsilon : 1e-08
I0602 22:27:31.736509 140467571585024 train.py:171] train.learner.optimizer.epsilon_root : 0.0
I0602 22:27:31.736538 140467571585024 train.py:171] train.learner.optimizer.ewc_regularizer_weight : 0.0
I0602 22:27:31.736567 140467571585024 train.py:171] train.learner.optimizer.ewc_weight_per_var : NoneType
I0602 22:27:31.736596 140467571585024 train.py:171] train.learner.optimizer.l1_regularizer_weight : NoneType
I0602 22:27:31.736625 140467571585024 train.py:171] train.learner.optimizer.l2_regularizer_weight : NoneType
I0602 22:27:31.736653 140467571585024 train.py:171] train.learner.optimizer.learning_rate : 0.0006
I0602 22:27:31.736682 140467571585024 train.py:171] train.learner.optimizer.lr_schedule.cls : type/praxis.schedules/LinearRampupCosineDecay
I0602 22:27:31.736711 140467571585024 train.py:171] train.learner.optimizer.lr_schedule.decay_end : 500000
I0602 22:27:31.736740 140467571585024 train.py:171] train.learner.optimizer.lr_schedule.decay_start : 1
I0602 22:27:31.736769 140467571585024 train.py:171] train.learner.optimizer.lr_schedule.max : 1.0
I0602 22:27:31.736798 140467571585024 train.py:171] train.learner.optimizer.lr_schedule.min_ratio : 0.1
I0602 22:27:31.736827 140467571585024 train.py:171] train.learner.optimizer.lr_schedule.name : ''
I0602 22:27:31.736856 140467571585024 train.py:171] train.learner.optimizer.lr_schedule.warmup_steps : 0
I0602 22:27:31.736885 140467571585024 train.py:171] train.learner.optimizer.maybe_inf_to_nan : True
I0602 22:27:31.736913 140467571585024 train.py:171] train.learner.optimizer.name : ''
I0602 22:27:31.736943 140467571585024 train.py:171] train.learner.optimizer.sharded_adam : True
I0602 22:27:31.736972 140467571585024 train.py:171] train.learner.optimizer.skip_lp_1d_vectors : False
I0602 22:27:31.737001 140467571585024 train.py:171] train.learner.optimizer.weight_decay : 0.001
I0602 22:27:31.737030 140467571585024 train.py:171] train.learner.repeat_prefix_sep : '#'
I0602 22:27:31.737059 140467571585024 train.py:171] train.learner.skip_step_gradient_norm_value : 0.0
I0602 22:27:31.737088 140467571585024 train.py:171] train.learner.skip_zero_gradients : NoneType
I0602 22:27:31.737123 140467571585024 train.py:171] train.learner.stochastic_gradient : NoneType
I0602 22:27:31.737153 140467571585024 train.py:171] train.learner.var_norm_summary : True
I0602 22:27:31.737183 140467571585024 train.py:171] train.learner.vectorize_on_repeat_prefix : True
I0602 22:27:31.737212 140467571585024 train.py:171] train.log_train_output_interval_steps : NoneType
I0602 22:27:31.737240 140467571585024 train.py:171] train.max_inflight_steps : 2
I0602 22:27:31.737269 140467571585024 train.py:171] train.num_train_steps : 10000000.0
I0602 22:27:31.737298 140467571585024 train.py:171] train.profiler_capture_step : NoneType
I0602 22:27:31.737327 140467571585024 train.py:171] train.profiler_max_num_hosts : NoneType
I0602 22:27:31.737356 140467571585024 train.py:171] train.profiler_min_duration_sec : 1
I0602 22:27:31.737385 140467571585024 train.py:171] train.profiler_num_steps : 2
I0602 22:27:31.737413 140467571585024 train.py:171] train.random_seed : 1234
I0602 22:27:31.737442 140467571585024 train.py:171] train.restore_transformations : NoneType
I0602 22:27:31.737471 140467571585024 train.py:171] train.save_interval_steps : 100000
I0602 22:27:31.737500 140467571585024 train.py:171] train.save_keep_interval_duration : '12h'
I0602 22:27:31.737528 140467571585024 train.py:171] train.save_max_to_keep : 10
I0602 22:27:31.737558 140467571585024 train.py:171] train.summary_accumulate_interval_steps : NoneType
I0602 22:27:31.737586 140467571585024 train.py:171] train.summary_interval_steps : 100
I0602 22:27:31.737615 140467571585024 train.py:171] train.tensorstore_metadata_key : NoneType
I0602 22:27:31.737645 140467571585024 train.py:171] train.variable_norm_summary : True
I0602 22:27:31.737679 140467571585024 train.py:171] vn.cls : type/paxml.tasks_lib/SingleTask.VariationalNoise
I0602 22:27:31.737711 140467571585024 train.py:171] vn.vn_regex : ''
I0602 22:27:31.737752 140467571585024 train.py:171] vn.vn_scale : 0.0
I0602 22:27:31.737785 140467571585024 train.py:171] vn.vn_start_step : 0
I0602 22:27:31.737824 140467571585024 train.py:173] [PAX STATUS]: Initializing decoder
I0602 22:27:31.737904 140467571585024 checkpoint_creators.py:564] [PAX STATUS]: Creating checkpointer.
I0602 22:27:31.737980 140467571585024 py_utils.py:338] Starting sync_global_devices checkpointer:makedirs:log_NVIDIA1_3BPmap/checkpoints across 1 devices globally
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e18934820 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:0}, signal={0x556e18933900:1} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:1}, signal={0x556e18933900:2} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1894d0d0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1894d120 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1894d0d0, semaphore=0x556e18933900, value=2 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1894d120, semaphore=0x556e18933900, value=2 (OK)
W0602 22:27:31.808284 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0005655288696289062 sec
W0602 22:27:31.808686 140467571585024 dispatch.py:272] Finished tracing + transforming _psum for pjit in 0.0013377666473388672 sec
W0602 22:27:31.809392 140467571585024 pxla.py:1882] Compiling _psum for with global shapes and types [ShapedArray(uint32[1])]. Argument mapping: (GSPMDSharding({maximal device=0}),).
W0602 22:27:31.811983 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_psum) in 0.0024673938751220703 sec
W0602 22:27:31.880504 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_psum) in 0.06763148307800293 sec
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1988d550 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1988d550, semaphore=0x556e189338c0, value=0 (OK)
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=1, fence=0x556e191b40c0 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1988d550, from_fence=0x556e1894d0d0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1894d120, semaphore=0x556e189338c0, value=1 (OK)
:: IREE INVOKE (vm_invoke[async]): context=0x556e196fdca0, f=0, wait_fence=0x556e1988d550 {0x556e189338c0:0, 0x556e18933900:2}, signal_fence=0x556e191b40c0 {0x556e189338c0:1} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18aebc50 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e16eed850 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18aebc50, semaphore=0x556e189338c0, value=1 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e16eed850, semaphore=0x556e189338c0, value=1 (OK)
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ebfa38 (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e189338c0:1}, signal={} (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e1894cf50, wait={0x556e18933900:2, 0x556e189338c0:1}, signal={} (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e1894cf50, wait={0x556e189338c0:1}, signal={} (OK)
I0602 22:27:31.882042 140467571585024 py_utils.py:341] Finished sync_global_devices checkpointer:makedirs:log_NVIDIA1_3BPmap/checkpoints across 1 devices globally
I0602 22:27:31.882626 140467571585024 checkpointer.py:96] Restoring item from log_NVIDIA1_3BPmap/checkpoints/checkpoint_0/metadata.
I0602 22:27:31.882973 140467571585024 checkpointer.py:98] Finished restoring checkpoint from log_NVIDIA1_3BPmap/checkpoints/checkpoint_0/metadata.
I0602 22:27:31.883031 140467571585024 checkpoint_managers.py:197] Found existing checkpoint with version: 1.1, step: 0
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e19401cb0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:2}, signal={0x556e18933900:3} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:3}, signal={0x556e18933900:4} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18aede80 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e112a1af0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18aede80, semaphore=0x556e18933900, value=4 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e112a1af0, semaphore=0x556e18933900, value=4 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18a43f70 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18a43f70, semaphore=0x556e189338c0, value=1 (OK)
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=2, fence=0x556e18aebf60 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e18a43f70, from_fence=0x556e18aede80 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e112a1af0, semaphore=0x556e189338c0, value=2 (OK)
:: IREE INVOKE (vm_invoke[async]): context=0x556e196fdca0, f=0, wait_fence=0x556e18a43f70 {0x556e189338c0:1, 0x556e18933900:4}, signal_fence=0x556e18aebf60 {0x556e189338c0:2} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e16eed850 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e170685a0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e16eed850, semaphore=0x556e189338c0, value=2 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e170685a0, semaphore=0x556e189338c0, value=2 (OK)
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ebf318 (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e189338c0:2}, signal={} (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e18a48190, wait={0x556e18933900:4, 0x556e189338c0:2}, signal={} (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e18a48190, wait={0x556e189338c0:2}, signal={} (OK)
I0602 22:27:31.885030 140467571585024 utils.py:366] Cleaning up existing temporary directories at log_NVIDIA1_3BPmap/checkpoints.
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e18a65f90 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:4}, signal={0x556e18933900:5} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:5}, signal={0x556e18933900:6} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e170685a0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e16ef1c50 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e170685a0, semaphore=0x556e18933900, value=6 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e16ef1c50, semaphore=0x556e18933900, value=6 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18a43f70 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18a43f70, semaphore=0x556e189338c0, value=2 (OK)
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=3, fence=0x556e18a48a90 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e18a43f70, from_fence=0x556e170685a0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e16ef1c50, semaphore=0x556e189338c0, value=3 (OK)
:: IREE INVOKE (vm_invoke[async]): context=0x556e196fdca0, f=0, wait_fence=0x556e18a43f70 {0x556e189338c0:2, 0x556e18933900:6}, signal_fence=0x556e18a48a90 {0x556e189338c0:3} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e112a3a60 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e110f47f0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e112a3a60, semaphore=0x556e189338c0, value=3 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e110f47f0, semaphore=0x556e189338c0, value=3 (OK)
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ebef78 (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e189338c0:3}, signal={} (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e18a45050, wait={0x556e18933900:6, 0x556e189338c0:3}, signal={} (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e18a45050, wait={0x556e189338c0:3}, signal={} (OK)
I0602 22:27:31.887058 140467571585024 train.py:206] [PAX STATUS]: Creating task
I0602 22:27:32.067810 140467571585024 train.py:217] [PAX STATUS]: Initializing partitioner
I0602 22:27:32.067943 140467571585024 partitioning.py:576] Using pmap for data parallelism.
I0602 22:27:32.067994 140467571585024 train.py:245] [PAX STATUS]: Creating executor.
I0602 22:27:32.068036 140467571585024 train.py:249] [PAX STATUS]: Setting up executor.
W0602 22:27:32.071034 140467571585024 dispatch.py:272] Finished tracing + transforming jit(convert_element_type) in 0.00023603439331054688 sec
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e16f726e0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:6}, signal={0x556e18933900:7} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:7}, signal={0x556e18933900:8} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e16eed850 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e170685a0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e16eed850, semaphore=0x556e18933900, value=8 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e170685a0, semaphore=0x556e18933900, value=8 (OK)
W0602 22:27:32.072897 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003733634948730469 sec
W0602 22:27:32.073698 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_seed for pjit in 0.0016872882843017578 sec
W0602 22:27:32.074185 140467571585024 pxla.py:1882] Compiling _threefry_seed for with global shapes and types [ShapedArray(int32[])]. Argument mapping: (GSPMDSharding({replicated}),).
W0602 22:27:32.076742 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_threefry_seed) in 0.0024423599243164062 sec
W0602 22:27:32.169981 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_threefry_seed) in 0.09298300743103027 sec
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1860c810 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1860c810, semaphore=0x556e189338c0, value=3 (OK)
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=4, fence=0x556e199dd790 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1860c810, from_fence=0x556e16eed850 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e170685a0, semaphore=0x556e189338c0, value=4 (OK)
:: IREE INVOKE (vm_invoke[async]): context=0x556e19946ad0, f=0, wait_fence=0x556e1860c810 {0x556e189338c0:3, 0x556e18933900:8}, signal_fence=0x556e199dd790 {0x556e189338c0:4} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19869950 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e190fd2e0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19869950, semaphore=0x556e189338c0, value=4 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e190fd2e0, semaphore=0x556e189338c0, value=4 (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e189dabc0, wait={0x556e18933900:8, 0x556e189338c0:4}, signal={} (OK)
I0602 22:27:32.171376 140467571585024 partitioning.py:420] input_p.tf_data_service_address: None
I0602 22:27:32.171604 140467571585024 executors.py:163] [PAX STATUS]: Instantiating train input pipeline.
I0602 22:27:32.174570 140467571585024 executors.py:222] [PAX STATUS]: Setting up partitioner
I0602 22:27:32.174625 140467571585024 partitioning.py:353] [PAX STATUS]: Getting input shapes from first batch.
I0602 22:27:32.778908 140467571585024 local.py:50] Created artifact Input specs of type ArtifactType.FILE and value log_NVIDIA1_3BPmap/input_specs.json.
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e1907d0e0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:8}, signal={0x556e18933900:9} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:9}, signal={0x556e18933900:10} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e198cfb80 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b0cc70 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e198cfb80, semaphore=0x556e18933900, value=10 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b0cc70, semaphore=0x556e18933900, value=10 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19103da0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19103da0, semaphore=0x556e189338c0, value=4 (OK)
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=5, fence=0x556e18b62000 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e19103da0, from_fence=0x556e198cfb80 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b0cc70, semaphore=0x556e189338c0, value=5 (OK)
:: IREE INVOKE (vm_invoke[async]): context=0x556e19946ad0, f=0, wait_fence=0x556e19103da0 {0x556e189338c0:4, 0x556e18933900:10}, signal_fence=0x556e18b62000 {0x556e189338c0:5} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19882980 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18eaa130 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19882980, semaphore=0x556e189338c0, value=5 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18eaa130, semaphore=0x556e189338c0, value=5 (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e18b691b0, wait={0x556e18933900:10, 0x556e189338c0:5}, signal={} (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e1907d0e0, wait={0x556e189338c0:5}, signal={} (OK)
W0602 22:27:33.229183 140467571585024 optimizers.py:1170] DEPRECATION WARNING: p.weight_decay will be deprecated. In future, we will do a migration to remove p.weight_decay and after that, setting it will throw an exception. In future, we will use p.l2_regularizer_weight for coupled weight decay (i.e., weight decays that affect optimizer slots), and use p.decoupled_weight_decay for decoupled weight decay (i.e., weight decays that are added only to the final update).
I0602 22:27:33.229286 140467571585024 optimizers.py:1173] Using sharded_adam.
W0602 22:27:33.229323 140467571585024 optimizers.py:580] DEPRECATION WARNING: p.weight_decay will be deprecated. In future, we will do a migration to remove p.weight_decay and after that, setting it will throw an exception. In future, we will use p.l2_regularizer_weight for coupled weight decay (i.e., weight decays that affect optimizer slots), and use p.decoupled_weight_decay for decoupled weight decay (i.e., weight decays that are added only to the final update).
W0602 22:27:33.241345 140467571585024 optimizers.py:1170] DEPRECATION WARNING: p.weight_decay will be deprecated. In future, we will do a migration to remove p.weight_decay and after that, setting it will throw an exception. In future, we will use p.l2_regularizer_weight for coupled weight decay (i.e., weight decays that affect optimizer slots), and use p.decoupled_weight_decay for decoupled weight decay (i.e., weight decays that are added only to the final update).
I0602 22:27:33.241400 140467571585024 optimizers.py:1173] Using sharded_adam.
W0602 22:27:33.241433 140467571585024 optimizers.py:580] DEPRECATION WARNING: p.weight_decay will be deprecated. In future, we will do a migration to remove p.weight_decay and after that, setting it will throw an exception. In future, we will use p.l2_regularizer_weight for coupled weight decay (i.e., weight decays that affect optimizer slots), and use p.decoupled_weight_decay for decoupled weight decay (i.e., weight decays that are added only to the final update).
I0602 22:27:33.253473 140467571585024 trainer_lib.py:197] post_init_model_params: log_NVIDIA1_3BPmap/post_init_model_params.txt
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e19213e30 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:10}, signal={0x556e18933900:11} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:11}, signal={0x556e18933900:12} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1982ded0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18a814b0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1982ded0, semaphore=0x556e18933900, value=12 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18a814b0, semaphore=0x556e18933900, value=12 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b8c7a0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b8c7a0, semaphore=0x556e189338c0, value=5 (OK)
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=6, fence=0x556e19156ee0 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e18b8c7a0, from_fence=0x556e1982ded0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18a814b0, semaphore=0x556e189338c0, value=6 (OK)
:: IREE INVOKE (vm_invoke[async]): context=0x556e19946ad0, f=0, wait_fence=0x556e18b8c7a0 {0x556e189338c0:5, 0x556e18933900:12}, signal_fence=0x556e19156ee0 {0x556e189338c0:6} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e199a9390 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b769a0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e199a9390, semaphore=0x556e189338c0, value=6 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b769a0, semaphore=0x556e189338c0, value=6 (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e18b83130, wait={0x556e18933900:12, 0x556e189338c0:6}, signal={} (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e199b25b0, wait={0x556e189338c0:6}, signal={} (OK)
I0602 22:27:33.415624 140467571585024 checkpointer.py:96] Restoring item from log_NVIDIA1_3BPmap/checkpoints/checkpoint_0/state.
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e197c8fd0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:12}, signal={0x556e18933900:13} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:13}, signal={0x556e18933900:14} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b69fa0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1914ed00 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b69fa0, semaphore=0x556e18933900, value=14 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1914ed00, semaphore=0x556e18933900, value=14 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b24760 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b24760, semaphore=0x556e189338c0, value=6 (OK)
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=7, fence=0x556e19976620 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e18b24760, from_fence=0x556e18b69fa0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1914ed00, semaphore=0x556e189338c0, value=7 (OK)
:: IREE INVOKE (vm_invoke[async]): context=0x556e196fdca0, f=0, wait_fence=0x556e18b24760 {0x556e189338c0:6, 0x556e18933900:14}, signal_fence=0x556e19976620 {0x556e189338c0:7} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e199b21a0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b817a0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e199b21a0, semaphore=0x556e189338c0, value=7 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b817a0, semaphore=0x556e189338c0, value=7 (OK)
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ebe7d8 (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e189338c0:7}, signal={} (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e195ce1e0, wait={0x556e18933900:14, 0x556e189338c0:7}, signal={} (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e195ce1e0, wait={0x556e189338c0:7}, signal={} (OK)
I0602 22:27:49.403822 140467571585024 checkpointer.py:98] Finished restoring checkpoint from log_NVIDIA1_3BPmap/checkpoints/checkpoint_0/state.
I0602 22:27:49.404018 140467571585024 checkpointer.py:96] Restoring item from log_NVIDIA1_3BPmap/checkpoints/checkpoint_0/metadata.
I0602 22:27:49.404202 140467571585024 checkpointer.py:98] Finished restoring checkpoint from log_NVIDIA1_3BPmap/checkpoints/checkpoint_0/metadata.
W0602 22:27:49.405730 140467571585024 dispatch.py:272] Finished tracing + transforming ravel for pjit in 0.0001819133758544922 sec
W0602 22:27:49.406473 140467571585024 dispatch.py:272] Finished tracing + transforming threefry_2x32 for pjit in 0.0014467239379882812 sec
W0602 22:27:49.407426 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_split_original for pjit in 0.002722501754760742 sec
W0602 22:27:49.407973 140467571585024 pxla.py:1882] Compiling _threefry_split_original for with global shapes and types [ShapedArray(uint32[2])]. Argument mapping: (GSPMDSharding({replicated}),).
W0602 22:27:49.410572 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002830028533935547 sec
W0602 22:27:49.411499 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002779960632324219 sec
W0602 22:27:49.412356 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002703666687011719 sec
W0602 22:27:49.413204 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002875328063964844 sec
W0602 22:27:49.413828 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00026679039001464844 sec
W0602 22:27:49.450353 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_threefry_split_original) in 0.04226422309875488 sec
W0602 22:27:49.958152 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_threefry_split_original) in 0.5075216293334961 sec
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e191d5a50 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e191d5a50, semaphore=0x556e189338c0, value=7 (OK)
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=8, fence=0x556e18a78d00 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e191d5a50, from_fence=0x556e19869950 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e190fd2e0, semaphore=0x556e189338c0, value=8 (OK)
:: IREE INVOKE (vm_invoke[async]): context=0x556e1a3de240, f=0, wait_fence=0x556e191d5a50 {0x556e189338c0:7}, signal_fence=0x556e18a78d00 {0x556e189338c0:8} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18a9c820 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18e42a10 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18a9c820, semaphore=0x556e189338c0, value=8 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18e42a10, semaphore=0x556e189338c0, value=8 (OK)
W0602 22:27:49.960502 140467571585024 dispatch.py:272] Finished tracing + transforming _unstack for pjit in 0.0006079673767089844 sec
W0602 22:27:49.961032 140467571585024 pxla.py:1882] Compiling _unstack for with global shapes and types [ShapedArray(uint32[2,2])]. Argument mapping: (GSPMDSharding({replicated}),).
W0602 22:27:49.962872 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_unstack) in 0.001695394515991211 sec
W0602 22:27:50.023515 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_unstack) in 0.06038188934326172 sec
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a286b60 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a286b60, semaphore=0x556e189338c0, value=8 (OK)
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=9, fence=0x556e1994b6a0 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a286b60, from_fence=0x556e18a9c820 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18e42a10, semaphore=0x556e189338c0, value=9 (OK)
:: IREE INVOKE (vm_invoke[async]): context=0x556e19ade6f0, f=0, wait_fence=0x556e1a286b60 {0x556e189338c0:8}, signal_fence=0x556e1994b6a0 {0x556e189338c0:9} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e195d2630 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a368cc0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e195d2630, semaphore=0x556e189338c0, value=9 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a368cc0, semaphore=0x556e189338c0, value=9 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a2517e0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19ad8820 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2517e0, semaphore=0x556e189338c0, value=9 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19ad8820, semaphore=0x556e189338c0, value=9 (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e1a440ec0, wait={0x556e189338c0:9}, signal={} (OK)
I0602 22:27:50.024492 140467571585024 partitioning.py:631] train state shapes: TrainState(step=(), mdl_vars={'params': {'lm': {'final_ln': {'bias': (2048,), 'scale': (2048,)}, 'position_emb': {'emb_var': (2048, 2048)}, 'softmax': {'logits_ffn': {'bias': {'b': (51200,)}, 'linear': {'w': (2048, 51200)}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': (24, 8192)}, 'linear': {'w': (24, 2048, 8192)}}, 'ffn_layer2': {'bias': {'b': (24, 2048)}, 'linear': {'w': (24, 8192, 2048)}}, 'layer_norm': {'bias': (24, 2048), 'scale': (24, 2048)}}, 'layer_norm': {'bias': (24, 2048), 'scale': (24, 2048)}, 'self_attention': {'combined_qkv': {'w': (24, 3, 2048, 32, 64)}, 'per_dim_scale': {'per_dim_scale': (24, 64)}, 'post': {'w': (24, 2048, 32, 64)}}}}}}}}}, opt_states=[{'no_prefix': ({'count': ()}, {'count': ()}, {'count': (), 'm': {'params': {'lm': {'final_ln': {'bias': (2048,), 'scale': (2048,)}, 'position_emb': {'emb_var': (2048, 2048)}, 'softmax': {'logits_ffn': {'bias': {'b': (51200,)}, 'linear': {'w': (2048, 51200)}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}, 'ffn_layer2': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}, 'layer_norm': {'bias': MaskedNode(), 'scale': MaskedNode()}}, 'layer_norm': {'bias': MaskedNode(), 'scale': MaskedNode()}, 'self_attention': {'combined_qkv': {'w': MaskedNode()}, 'per_dim_scale': {'per_dim_scale': MaskedNode()}, 'post': {'w': MaskedNode()}}}}}}}}}, 'v': {'params': {'lm': {'final_ln': {'bias': (2048,), 'scale': (2048,)}, 'position_emb': {'emb_var': (2048, 2048)}, 'softmax': {'logits_ffn': {'bias': {'b': (51200,)}, 'linear': {'w': (2048, 51200)}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}, 'ffn_layer2': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}, 'layer_norm': {'bias': MaskedNode(), 'scale': MaskedNode()}}, 'layer_norm': {'bias': MaskedNode(), 'scale': MaskedNode()}, 'self_attention': {'combined_qkv': {'w': MaskedNode()}, 'per_dim_scale': {'per_dim_scale': MaskedNode()}, 'post': {'w': MaskedNode()}}}}}}}}}}, {'count': ()}), 'p#24#i-1': ({'count': (24,)}, {'count': (24,)}, {'count': (24,), 'm': {'params': {'lm': {'final_ln': {'bias': MaskedNode(), 'scale': MaskedNode()}, 'position_emb': {'emb_var': MaskedNode()}, 'softmax': {'logits_ffn': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': (24, 8192)}, 'linear': {'w': (24, 2048, 8192)}}, 'ffn_layer2': {'bias': {'b': (24, 2048)}, 'linear': {'w': (24, 8192, 2048)}}, 'layer_norm': {'bias': (24, 2048), 'scale': (24, 2048)}}, 'layer_norm': {'bias': (24, 2048), 'scale': (24, 2048)}, 'self_attention': {'combined_qkv': {'w': (24, 3, 2048, 32, 64)}, 'per_dim_scale': {'per_dim_scale': (24, 64)}, 'post': {'w': (24, 2048, 32, 64)}}}}}}}}}, 'v': {'params': {'lm': {'final_ln': {'bias': MaskedNode(), 'scale': MaskedNode()}, 'position_emb': {'emb_var': MaskedNode()}, 'softmax': {'logits_ffn': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': (24, 8192)}, 'linear': {'w': (24, 2048, 8192)}}, 'ffn_layer2': {'bias': {'b': (24, 2048)}, 'linear': {'w': (24, 8192, 2048)}}, 'layer_norm': {'bias': (24, 2048), 'scale': (24, 2048)}}, 'layer_norm': {'bias': (24, 2048), 'scale': (24, 2048)}, 'self_attention': {'combined_qkv': {'w': (24, 3, 2048, 32, 64)}, 'per_dim_scale': {'per_dim_scale': (24, 64)}, 'post': {'w': (24, 2048, 32, 64)}}}}}}}}}}, {'count': (24,)})}])
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e18b2b390 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:14}, signal={0x556e18933900:15} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:15}, signal={0x556e18933900:16} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e191a6280 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18a470c0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e191a6280, semaphore=0x556e18933900, value=16 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18a470c0, semaphore=0x556e18933900, value=16 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e18b9adf0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:16}, signal={0x556e18933900:17} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:17}, signal={0x556e18933900:18} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1962ee50 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1962ea60 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1962ee50, semaphore=0x556e18933900, value=18 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1962ea60, semaphore=0x556e18933900, value=18 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e1996e330 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:18}, signal={0x556e18933900:19} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:19}, signal={0x556e18933900:20} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3f5e20 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a422390 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3f5e20, semaphore=0x556e18933900, value=20 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a422390, semaphore=0x556e18933900, value=20 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=16777216, buffer=0x556e1a3d6580 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=16777216, wait={0x556e18933900:20}, signal={0x556e18933900:21} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:21}, signal={0x556e18933900:22} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e197797a0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3d6610 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e197797a0, semaphore=0x556e18933900, value=22 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3d6610, semaphore=0x556e18933900, value=22 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=204800, buffer=0x556e1a369cf0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=204800, wait={0x556e18933900:22}, signal={0x556e18933900:23} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:23}, signal={0x556e18933900:24} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a2a73d0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19ffd400 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2a73d0, semaphore=0x556e18933900, value=24 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19ffd400, semaphore=0x556e18933900, value=24 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=419430400, buffer=0x556e19d12af0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=419430400, wait={0x556e18933900:24}, signal={0x556e18933900:25} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:25}, signal={0x556e18933900:26} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a420280 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a2afb00 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a420280, semaphore=0x556e18933900, value=26 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2afb00, semaphore=0x556e18933900, value=26 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=786432, buffer=0x556e1a2aef40 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=786432, wait={0x556e18933900:26}, signal={0x556e18933900:27} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:27}, signal={0x556e18933900:28} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a14ab90 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a14abe0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a14ab90, semaphore=0x556e18933900, value=28 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a14abe0, semaphore=0x556e18933900, value=28 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1610612736, buffer=0x556e1a3c77b0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1610612736, wait={0x556e18933900:28}, signal={0x556e18933900:29} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:29}, signal={0x556e18933900:30} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c07d10 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c07d60 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c07d10, semaphore=0x556e18933900, value=30 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c07d60, semaphore=0x556e18933900, value=30 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e1a3c77b0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:30}, signal={0x556e18933900:31} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:31}, signal={0x556e18933900:32} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e193ef470 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19140df0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e193ef470, semaphore=0x556e18933900, value=32 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19140df0, semaphore=0x556e18933900, value=32 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1610612736, buffer=0x556e1a41f5c0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1610612736, wait={0x556e18933900:32}, signal={0x556e18933900:33} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:33}, signal={0x556e18933900:34} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c04ab0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c04b00 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c04ab0, semaphore=0x556e18933900, value=34 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c04b00, semaphore=0x556e18933900, value=34 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e1a3c77b0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:34}, signal={0x556e18933900:35} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:35}, signal={0x556e18933900:36} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18baf800 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19aded80 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18baf800, semaphore=0x556e18933900, value=36 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19aded80, semaphore=0x556e18933900, value=36 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e1a3c77b0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:36}, signal={0x556e18933900:37} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:37}, signal={0x556e18933900:38} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1922bd90 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a41c090 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1922bd90, semaphore=0x556e18933900, value=38 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a41c090, semaphore=0x556e18933900, value=38 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e1a3c77b0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:38}, signal={0x556e18933900:39} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:39}, signal={0x556e18933900:40} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3a2b90 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3a2be0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3a2b90, semaphore=0x556e18933900, value=40 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3a2be0, semaphore=0x556e18933900, value=40 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e1a3c77b0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:40}, signal={0x556e18933900:41} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:41}, signal={0x556e18933900:42} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a419dd0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e191e42f0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a419dd0, semaphore=0x556e18933900, value=42 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e191e42f0, semaphore=0x556e18933900, value=42 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1207959552, buffer=0x556e19607b10 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1207959552, wait={0x556e18933900:42}, signal={0x556e18933900:43} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:43}, signal={0x556e18933900:44} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19829890 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e198298e0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19829890, semaphore=0x556e18933900, value=44 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e198298e0, semaphore=0x556e18933900, value=44 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=6144, buffer=0x556e1a3c77b0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=6144, wait={0x556e18933900:44}, signal={0x556e18933900:45} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:45}, signal={0x556e18933900:46} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19f4deb0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19f4df00 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19f4deb0, semaphore=0x556e18933900, value=46 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19f4df00, semaphore=0x556e18933900, value=46 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=402653184, buffer=0x556e1a390de0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=402653184, wait={0x556e18933900:46}, signal={0x556e18933900:47} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:47}, signal={0x556e18933900:48} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19d0a1e0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19d0a230 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19d0a1e0, semaphore=0x556e18933900, value=48 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19d0a230, semaphore=0x556e18933900, value=48 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e19f4df50 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:48}, signal={0x556e18933900:49} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:49}, signal={0x556e18933900:50} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e199c4af0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19535260 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e199c4af0, semaphore=0x556e18933900, value=50 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19535260, semaphore=0x556e18933900, value=50 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e1a419690 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:50}, signal={0x556e18933900:51} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:51}, signal={0x556e18933900:52} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b2be70 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19533540 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b2be70, semaphore=0x556e18933900, value=52 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19533540, semaphore=0x556e18933900, value=52 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e1a419690 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:52}, signal={0x556e18933900:53} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:53}, signal={0x556e18933900:54} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c1a1d0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c1a220 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c1a1d0, semaphore=0x556e18933900, value=54 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c1a220, semaphore=0x556e18933900, value=54 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e19962980 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:54}, signal={0x556e18933900:55} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:55}, signal={0x556e18933900:56} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a151ab0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c23f90 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a151ab0, semaphore=0x556e18933900, value=56 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c23f90, semaphore=0x556e18933900, value=56 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e19c24390 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:56}, signal={0x556e18933900:57} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:57}, signal={0x556e18933900:58} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19b5a2f0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19b5a340 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19b5a2f0, semaphore=0x556e18933900, value=58 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19b5a340, semaphore=0x556e18933900, value=58 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=16777216, buffer=0x556e1a397060 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=16777216, wait={0x556e18933900:58}, signal={0x556e18933900:59} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:59}, signal={0x556e18933900:60} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19533ed0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19533f20 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19533ed0, semaphore=0x556e18933900, value=60 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19533f20, semaphore=0x556e18933900, value=60 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=204800, buffer=0x556e18b470a0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=204800, wait={0x556e18933900:60}, signal={0x556e18933900:61} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:61}, signal={0x556e18933900:62} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c24420 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b54260 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c24420, semaphore=0x556e18933900, value=62 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b54260, semaphore=0x556e18933900, value=62 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=419430400, buffer=0x556e19c09c60 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=419430400, wait={0x556e18933900:62}, signal={0x556e18933900:63} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:63}, signal={0x556e18933900:64} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19535d90 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19535de0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19535d90, semaphore=0x556e18933900, value=64 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19535de0, semaphore=0x556e18933900, value=64 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e1a396fc0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:64}, signal={0x556e18933900:65} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:65}, signal={0x556e18933900:66} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c09d90 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e191ecc60 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c09d90, semaphore=0x556e18933900, value=66 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e191ecc60, semaphore=0x556e18933900, value=66 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e19b76cd0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:66}, signal={0x556e18933900:67} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:67}, signal={0x556e18933900:68} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c05760 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c057b0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c05760, semaphore=0x556e18933900, value=68 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c057b0, semaphore=0x556e18933900, value=68 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=16777216, buffer=0x556e19c05800 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=16777216, wait={0x556e18933900:68}, signal={0x556e18933900:69} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:69}, signal={0x556e18933900:70} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a24d3f0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a24d440 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a24d3f0, semaphore=0x556e18933900, value=70 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a24d440, semaphore=0x556e18933900, value=70 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=204800, buffer=0x556e19c05800 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=204800, wait={0x556e18933900:70}, signal={0x556e18933900:71} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:71}, signal={0x556e18933900:72} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a250bf0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a250c40 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a250bf0, semaphore=0x556e18933900, value=72 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a250c40, semaphore=0x556e18933900, value=72 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=419430400, buffer=0x556e1a2507b0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=419430400, wait={0x556e18933900:72}, signal={0x556e18933900:73} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:73}, signal={0x556e18933900:74} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e199c3e30 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e199c3e80 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e199c3e30, semaphore=0x556e18933900, value=74 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e199c3e80, semaphore=0x556e18933900, value=74 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e1a2508f0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:74}, signal={0x556e18933900:75} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:75}, signal={0x556e18933900:76} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b4a060 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19aeb4a0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b4a060, semaphore=0x556e18933900, value=76 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19aeb4a0, semaphore=0x556e18933900, value=76 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=96, buffer=0x556e1a2508f0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=96, wait={0x556e18933900:76}, signal={0x556e18933900:77} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:77}, signal={0x556e18933900:78} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b9fe70 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b9fec0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b9fe70, semaphore=0x556e18933900, value=78 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b9fec0, semaphore=0x556e18933900, value=78 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=96, buffer=0x556e18ba0130 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=96, wait={0x556e18933900:78}, signal={0x556e18933900:79} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:79}, signal={0x556e18933900:80} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19bbfce0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19bbfd30 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19bbfce0, semaphore=0x556e18933900, value=80 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19bbfd30, semaphore=0x556e18933900, value=80 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=96, buffer=0x556e19b76cd0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=96, wait={0x556e18933900:80}, signal={0x556e18933900:81} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:81}, signal={0x556e18933900:82} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a253340 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a253390 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a253340, semaphore=0x556e18933900, value=82 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a253390, semaphore=0x556e18933900, value=82 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=786432, buffer=0x556e1a439960 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=786432, wait={0x556e18933900:82}, signal={0x556e18933900:83} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:83}, signal={0x556e18933900:84} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a24d930 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b473e0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a24d930, semaphore=0x556e18933900, value=84 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b473e0, semaphore=0x556e18933900, value=84 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1610612736, buffer=0x556e1a288940 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1610612736, wait={0x556e18933900:84}, signal={0x556e18933900:85} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:85}, signal={0x556e18933900:86} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19b67f70 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19b67fc0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19b67f70, semaphore=0x556e18933900, value=86 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19b67fc0, semaphore=0x556e18933900, value=86 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e1a289780 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:86}, signal={0x556e18933900:87} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:87}, signal={0x556e18933900:88} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a2ae8a0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a289880 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2ae8a0, semaphore=0x556e18933900, value=88 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a289880, semaphore=0x556e18933900, value=88 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1610612736, buffer=0x556e19bc1fa0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1610612736, wait={0x556e18933900:88}, signal={0x556e18933900:89} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:89}, signal={0x556e18933900:90} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e190f07e0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e190f0830 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e190f07e0, semaphore=0x556e18933900, value=90 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e190f0830, semaphore=0x556e18933900, value=90 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e1a3615d0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:90}, signal={0x556e18933900:91} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:91}, signal={0x556e18933900:92} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a361100 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a361150 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a361100, semaphore=0x556e18933900, value=92 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a361150, semaphore=0x556e18933900, value=92 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e19bc1dc0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:92}, signal={0x556e18933900:93} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:93}, signal={0x556e18933900:94} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19bc0660 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19bc06b0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19bc0660, semaphore=0x556e18933900, value=94 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19bc06b0, semaphore=0x556e18933900, value=94 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e19bc0700 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:94}, signal={0x556e18933900:95} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:95}, signal={0x556e18933900:96} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19740d80 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19740dd0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19740d80, semaphore=0x556e18933900, value=96 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19740dd0, semaphore=0x556e18933900, value=96 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e19740e20 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:96}, signal={0x556e18933900:97} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:97}, signal={0x556e18933900:98} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19bc0a10 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18bb3b10 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19bc0a10, semaphore=0x556e18933900, value=98 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18bb3b10, semaphore=0x556e18933900, value=98 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1207959552, buffer=0x556e18bb4000 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1207959552, wait={0x556e18933900:98}, signal={0x556e18933900:99} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:99}, signal={0x556e18933900:100} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e195d0b40 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e195d0510 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e195d0b40, semaphore=0x556e18933900, value=100 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e195d0510, semaphore=0x556e18933900, value=100 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=6144, buffer=0x556e19740e20 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=6144, wait={0x556e18933900:100}, signal={0x556e18933900:101} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:101}, signal={0x556e18933900:102} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19955b30 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19955b80 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19955b30, semaphore=0x556e18933900, value=102 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19955b80, semaphore=0x556e18933900, value=102 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=402653184, buffer=0x556e19629880 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=402653184, wait={0x556e18933900:102}, signal={0x556e18933900:103} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:103}, signal={0x556e18933900:104} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19809610 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19809660 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19809610, semaphore=0x556e18933900, value=104 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19809660, semaphore=0x556e18933900, value=104 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=786432, buffer=0x556e19629490 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=786432, wait={0x556e18933900:104}, signal={0x556e18933900:105} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:105}, signal={0x556e18933900:106} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19809860 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e198098b0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19809860, semaphore=0x556e18933900, value=106 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e198098b0, semaphore=0x556e18933900, value=106 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1610612736, buffer=0x556e1a3c0c90 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1610612736, wait={0x556e18933900:106}, signal={0x556e18933900:107} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:107}, signal={0x556e18933900:108} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3c1090 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18f796d0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3c1090, semaphore=0x556e18933900, value=108 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18f796d0, semaphore=0x556e18933900, value=108 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e19629490 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:108}, signal={0x556e18933900:109} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:109}, signal={0x556e18933900:110} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3c06d0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3c0920 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3c06d0, semaphore=0x556e18933900, value=110 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3c0920, semaphore=0x556e18933900, value=110 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1610612736, buffer=0x556e18b29560 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1610612736, wait={0x556e18933900:110}, signal={0x556e18933900:111} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:111}, signal={0x556e18933900:112} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a10ca60 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a10cab0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a10ca60, semaphore=0x556e18933900, value=112 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a10cab0, semaphore=0x556e18933900, value=112 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e19629490 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:112}, signal={0x556e18933900:113} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:113}, signal={0x556e18933900:114} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19623470 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e196234c0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19623470, semaphore=0x556e18933900, value=114 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e196234c0, semaphore=0x556e18933900, value=114 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e19623510 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:114}, signal={0x556e18933900:115} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:115}, signal={0x556e18933900:116} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19623760 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a10d670 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19623760, semaphore=0x556e18933900, value=116 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a10d670, semaphore=0x556e18933900, value=116 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e19623510 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:116}, signal={0x556e18933900:117} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:117}, signal={0x556e18933900:118} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c1af20 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c1af70 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c1af20, semaphore=0x556e18933900, value=118 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c1af70, semaphore=0x556e18933900, value=118 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e19623510 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:118}, signal={0x556e18933900:119} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:119}, signal={0x556e18933900:120} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19acf2d0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19acf320 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19acf2d0, semaphore=0x556e18933900, value=120 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19acf320, semaphore=0x556e18933900, value=120 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1207959552, buffer=0x556e199b3320 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1207959552, wait={0x556e18933900:120}, signal={0x556e18933900:121} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:121}, signal={0x556e18933900:122} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3621f0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a362240 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3621f0, semaphore=0x556e18933900, value=122 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a362240, semaphore=0x556e18933900, value=122 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=6144, buffer=0x556e1a10cc40 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=6144, wait={0x556e18933900:122}, signal={0x556e18933900:123} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:123}, signal={0x556e18933900:124} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3626f0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a362740 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3626f0, semaphore=0x556e18933900, value=124 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a362740, semaphore=0x556e18933900, value=124 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=402653184, buffer=0x556e18a85800 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=402653184, wait={0x556e18933900:124}, signal={0x556e18933900:125} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:125}, signal={0x556e18933900:126} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b41e10 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b41e60 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b41e10, semaphore=0x556e18933900, value=126 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b41e60, semaphore=0x556e18933900, value=126 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=96, buffer=0x556e1a363330 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=96, wait={0x556e18933900:126}, signal={0x556e18933900:127} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:127}, signal={0x556e18933900:128} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b42150 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b421a0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b42150, semaphore=0x556e18933900, value=128 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b421a0, semaphore=0x556e18933900, value=128 (OK)
I0602 22:27:57.174826 140467571585024 partitioning.py:637] replicated train state shapes: TrainState(step=(1,), mdl_vars={'params': {'lm': {'final_ln': {'bias': (1, 2048), 'scale': (1, 2048)}, 'position_emb': {'emb_var': (1, 2048, 2048)}, 'softmax': {'logits_ffn': {'bias': {'b': (1, 51200)}, 'linear': {'w': (1, 2048, 51200)}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': (1, 24, 8192)}, 'linear': {'w': (1, 24, 2048, 8192)}}, 'ffn_layer2': {'bias': {'b': (1, 24, 2048)}, 'linear': {'w': (1, 24, 8192, 2048)}}, 'layer_norm': {'bias': (1, 24, 2048), 'scale': (1, 24, 2048)}}, 'layer_norm': {'bias': (1, 24, 2048), 'scale': (1, 24, 2048)}, 'self_attention': {'combined_qkv': {'w': (1, 24, 3, 2048, 32, 64)}, 'per_dim_scale': {'per_dim_scale': (1, 24, 64)}, 'post': {'w': (1, 24, 2048, 32, 64)}}}}}}}}}, opt_states=[{'no_prefix': ({'count': (1,)}, {'count': (1,)}, {'count': (1,), 'm': {'params': {'lm': {'final_ln': {'bias': (1, 2048), 'scale': (1, 2048)}, 'position_emb': {'emb_var': (1, 2048, 2048)}, 'softmax': {'logits_ffn': {'bias': {'b': (1, 51200)}, 'linear': {'w': (1, 2048, 51200)}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}, 'ffn_layer2': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}, 'layer_norm': {'bias': MaskedNode(), 'scale': MaskedNode()}}, 'layer_norm': {'bias': MaskedNode(), 'scale': MaskedNode()}, 'self_attention': {'combined_qkv': {'w': MaskedNode()}, 'per_dim_scale': {'per_dim_scale': MaskedNode()}, 'post': {'w': MaskedNode()}}}}}}}}}, 'v': {'params': {'lm': {'final_ln': {'bias': (1, 2048), 'scale': (1, 2048)}, 'position_emb': {'emb_var': (1, 2048, 2048)}, 'softmax': {'logits_ffn': {'bias': {'b': (1, 51200)}, 'linear': {'w': (1, 2048, 51200)}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}, 'ffn_layer2': {'bias': 
{'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}, 'layer_norm': {'bias': MaskedNode(), 'scale': MaskedNode()}}, 'layer_norm': {'bias': MaskedNode(), 'scale': MaskedNode()}, 'self_attention': {'combined_qkv': {'w': MaskedNode()}, 'per_dim_scale': {'per_dim_scale': MaskedNode()}, 'post': {'w': MaskedNode()}}}}}}}}}}, {'count': (1,)}), 'p#24#i-1': ({'count': (1, 24)}, {'count': (1, 24)}, {'count': (1, 24), 'm': {'params': {'lm': {'final_ln': {'bias': MaskedNode(), 'scale': MaskedNode()}, 'position_emb': {'emb_var': MaskedNode()}, 'softmax': {'logits_ffn': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': (1, 24, 8192)}, 'linear': {'w': (1, 24, 2048, 8192)}}, 'ffn_layer2': {'bias': {'b': (1, 24, 2048)}, 'linear': {'w': (1, 24, 8192, 2048)}}, 'layer_norm': {'bias': (1, 24, 2048), 'scale': (1, 24, 2048)}}, 'layer_norm': {'bias': (1, 24, 2048), 'scale': (1, 24, 2048)}, 'self_attention': {'combined_qkv': {'w': (1, 24, 3, 2048, 32, 64)}, 'per_dim_scale': {'per_dim_scale': (1, 24, 64)}, 'post': {'w': (1, 24, 2048, 32, 64)}}}}}}}}}, 'v': {'params': {'lm': {'final_ln': {'bias': MaskedNode(), 'scale': MaskedNode()}, 'position_emb': {'emb_var': MaskedNode()}, 'softmax': {'logits_ffn': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': (1, 24, 8192)}, 'linear': {'w': (1, 24, 2048, 8192)}}, 'ffn_layer2': {'bias': {'b': (1, 24, 2048)}, 'linear': {'w': (1, 24, 8192, 2048)}}, 'layer_norm': {'bias': (1, 24, 2048), 'scale': (1, 24, 2048)}}, 'layer_norm': {'bias': (1, 24, 2048), 'scale': (1, 24, 2048)}, 'self_attention': {'combined_qkv': {'w': (1, 24, 3, 2048, 32, 64)}, 'per_dim_scale': {'per_dim_scale': (1, 24, 64)}, 'post': {'w': (1, 24, 2048, 32, 64)}}}}}}}}}}, {'count': (1, 24)})}])
W0602 22:27:57.175599 140467571585024 dispatch.py:272] Finished tracing + transforming jit(convert_element_type) in 0.00022101402282714844 sec
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e199b2fc0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:128}, signal={0x556e18933900:129} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:129}, signal={0x556e18933900:130} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c04f70 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e199a91b0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c04f70, semaphore=0x556e18933900, value=130 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e199a91b0, semaphore=0x556e18933900, value=130 (OK)
W0602 22:27:57.177212 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.000293731689453125 sec
W0602 22:27:57.177804 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_seed for pjit in 0.001306295394897461 sec
W0602 22:27:57.179432 140467571585024 dispatch.py:272] Finished tracing + transforming ravel for pjit in 0.00016808509826660156 sec
W0602 22:27:57.180124 140467571585024 dispatch.py:272] Finished tracing + transforming threefry_2x32 for pjit in 0.0016508102416992188 sec
W0602 22:27:57.181025 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_fold_in for pjit in 0.0047168731689453125 sec
W0602 22:27:57.181611 140467571585024 pxla.py:1882] Compiling _threefry_fold_in for with global shapes and types [ShapedArray(uint32[2]), ShapedArray(uint32[])]. Argument mapping: (GSPMDSharding({replicated}), GSPMDSharding({replicated})).
W0602 22:27:57.185242 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002880096435546875 sec
W0602 22:27:57.186213 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003724098205566406 sec
W0602 22:27:57.187031 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.000263214111328125 sec
W0602 22:27:57.187665 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002658367156982422 sec
W0602 22:27:57.224001 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_threefry_fold_in) in 0.042261362075805664 sec
W0602 22:27:57.672591 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_threefry_fold_in) in 0.4483067989349365 sec
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a5f7860 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a5f7860, semaphore=0x556e189338c0, value=9 (OK)
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=10, fence=0x556e1a3387d0 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5f7860, from_fence=0x556e195d2630 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a368cc0, semaphore=0x556e189338c0, value=10 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5f7860, from_fence=0x556e19c04f70 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e199a91b0, semaphore=0x556e189338c0, value=10 (OK)
:: IREE INVOKE (vm_invoke[async]): context=0x556e1a5731a0, f=0, wait_fence=0x556e1a5f7860 {0x556e189338c0:9, 0x556e18933900:130}, signal_fence=0x556e1a3387d0 {0x556e189338c0:10} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19813510 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a05a820 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19813510, semaphore=0x556e189338c0, value=10 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a05a820, semaphore=0x556e189338c0, value=10 (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e19ffb880, wait={0x556e18933900:130, 0x556e189338c0:10}, signal={} (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e19ad8460, wait={0x556e189338c0:10}, signal={} (OK)
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ebf198 (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e189338c0:10}, signal={} (OK)
I0602 22:27:57.673998 140467571585024 partitioning.py:647] root prng key: [3199903509 2250625448]
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e19ad8640, wait={0x556e189338c0:9}, signal={} (OK)
W0602 22:27:58.050272 140467571585024 dispatch.py:272] Finished tracing + transforming ravel for pjit in 0.00019407272338867188 sec
W0602 22:27:58.051110 140467571585024 dispatch.py:272] Finished tracing + transforming threefry_2x32 for pjit in 0.0016100406646728516 sec
W0602 22:27:58.052080 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_split_original for pjit in 0.0029850006103515625 sec
W0602 22:27:58.052662 140467571585024 pxla.py:1882] Compiling _threefry_split_original for with global shapes and types [ShapedArray(uint32[2])]. Argument mapping: (GSPMDSharding({replicated}),).
W0602 22:27:58.055990 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00040984153747558594 sec
W0602 22:27:58.056857 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00027370452880859375 sec
W0602 22:27:58.057676 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00027060508728027344 sec
W0602 22:27:58.058297 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002770423889160156 sec
W0602 22:27:58.095024 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_threefry_split_original) in 0.04224109649658203 sec
W0602 22:27:58.572319 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_threefry_split_original) in 0.477022647857666 sec
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a5b4770 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a5b4770, semaphore=0x556e189338c0, value=10 (OK)
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=11, fence=0x556e195265a0 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5b4770, from_fence=0x556e19813510 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a05a820, semaphore=0x556e189338c0, value=11 (OK)
:: IREE INVOKE (vm_invoke[async]): context=0x556e19266b10, f=0, wait_fence=0x556e1a5b4770 {0x556e189338c0:10}, signal_fence=0x556e195265a0 {0x556e189338c0:11} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a2b64a0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19ef0180 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2b64a0, semaphore=0x556e189338c0, value=11 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19ef0180, semaphore=0x556e189338c0, value=11 (OK)
W0602 22:27:58.574334 140467571585024 dispatch.py:272] Finished tracing + transforming _unstack for pjit in 0.0007600784301757812 sec
W0602 22:27:58.574897 140467571585024 pxla.py:1882] Compiling _unstack for with global shapes and types [ShapedArray(uint32[3,2])]. Argument mapping: (GSPMDSharding({replicated}),).
W0602 22:27:58.576957 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_unstack) in 0.0019330978393554688 sec
W0602 22:27:58.638722 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_unstack) in 0.061475276947021484 sec
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18a1cb40 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18a1cb40, semaphore=0x556e189338c0, value=11 (OK)
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=12, fence=0x556e1a2a2ec0 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e18a1cb40, from_fence=0x556e1a2b64a0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19ef0180, semaphore=0x556e189338c0, value=12 (OK)
:: IREE INVOKE (vm_invoke[async]): context=0x556e196c1cf0, f=0, wait_fence=0x556e18a1cb40 {0x556e189338c0:11}, signal_fence=0x556e1a2a2ec0 {0x556e189338c0:12} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19fa7550 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e190c26b0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19fa7550, semaphore=0x556e189338c0, value=12 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e190c26b0, semaphore=0x556e189338c0, value=12 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18c40140 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a450b40 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18c40140, semaphore=0x556e189338c0, value=12 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a450b40, semaphore=0x556e189338c0, value=12 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19047700 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18bec470 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19047700, semaphore=0x556e189338c0, value=12 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18bec470, semaphore=0x556e189338c0, value=12 (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e1962c9e0, wait={0x556e189338c0:12}, signal={} (OK)
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ebf538 (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e189338c0:12}, signal={} (OK)
I0602 22:27:58.639493 140467571585024 executors.py:260] train prng seed: [3373580220 3771856083]
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ebf538 (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e189338c0:12}, signal={} (OK)
I0602 22:27:58.640165 140467571585024 executors.py:261] eval prng seed: [3893388808 331134876]
W0602 22:27:58.642186 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_split_original for pjit in 0.001253366470336914 sec
W0602 22:27:58.642764 140467571585024 pxla.py:1882] Compiling _threefry_split_original for with global shapes and types [ShapedArray(uint32[2])]. Argument mapping: (GSPMDSharding({replicated}),).
W0602 22:27:58.683536 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_threefry_split_original) in 0.04065346717834473 sec
W0602 22:27:59.126293 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_threefry_split_original) in 0.4424901008605957 sec
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18f19d00 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18f19d00, semaphore=0x556e189338c0, value=12 (OK)
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=13, fence=0x556e19d8df10 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e18f19d00, from_fence=0x556e18c40140 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a450b40, semaphore=0x556e189338c0, value=13 (OK)
:: IREE INVOKE (vm_invoke[async]): context=0x556e19dcb0c0, f=0, wait_fence=0x556e18f19d00 {0x556e189338c0:12}, signal_fence=0x556e19d8df10 {0x556e189338c0:13} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a280840 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a2825c0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a280840, semaphore=0x556e189338c0, value=13 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2825c0, semaphore=0x556e189338c0, value=13 (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e1a210440, wait={0x556e189338c0:13}, signal={} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19e98090 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19e98090, semaphore=0x556e189338c0, value=13 (OK)
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=14, fence=0x556e18c3b9a0 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e19e98090, from_fence=0x556e19047700 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18bec470, semaphore=0x556e189338c0, value=14 (OK)
:: IREE INVOKE (vm_invoke[async]): context=0x556e19dcb0c0, f=0, wait_fence=0x556e19e98090 {0x556e189338c0:13}, signal_fence=0x556e18c3b9a0 {0x556e189338c0:14} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18ca6110 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a598320 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18ca6110, semaphore=0x556e189338c0, value=14 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a598320, semaphore=0x556e189338c0, value=14 (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e1a210620, wait={0x556e189338c0:14}, signal={} (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e18b937f0, wait={0x556e189338c0:11}, signal={} (OK)
I0602 22:27:59.127763 140467571585024 executors.py:295] Starting executor.
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ec10d8 (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:16}, signal={} (OK)
I0602 22:27:59.128399 140467571585024 executors.py:454] Model initial global_step=0
I0602 22:27:59.128465 140467571585024 executors.py:461] [PAX STATUS]: Starting training loop.
I0602 22:27:59.128512 140467571585024 programs.py:210] [PAX STATUS]: Setting up BaseTrainProgram.
I0602 22:27:59.128588 140467571585024 summary_utils.py:281] Opening SummaryWriter `log_NVIDIA1_3BPmap/summaries/train`...
I0602 22:27:59.129504 140467571585024 summary_utils.py:281] Opening SummaryWriter `log_NVIDIA1_3BPmap/summaries/eval_train`...
I0602 22:27:59.132162 140467571585024 py_utils.py:338] Starting sync_global_devices Start training loop from step: 0 across 1 devices globally
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e19f52f60 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:130}, signal={0x556e18933900:131} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:131}, signal={0x556e18933900:132} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18f601c0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18f60020 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18f601c0, semaphore=0x556e18933900, value=132 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18f60020, semaphore=0x556e18933900, value=132 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19f57630 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19f57630, semaphore=0x556e189338c0, value=14 (OK)
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=15, fence=0x556e19f52ec0 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e19f57630, from_fence=0x556e18f601c0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18f60020, semaphore=0x556e189338c0, value=15 (OK)
:: IREE INVOKE (vm_invoke[async]): context=0x556e196fdca0, f=0, wait_fence=0x556e19f57630 {0x556e189338c0:14, 0x556e18933900:132}, signal_fence=0x556e19f52ec0 {0x556e189338c0:15} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19db8490 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a1d0f80 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19db8490, semaphore=0x556e189338c0, value=15 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a1d0f80, semaphore=0x556e189338c0, value=15 (OK)
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ebf9f8 (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e189338c0:15}, signal={} (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e19ffb880, wait={0x556e18933900:132, 0x556e189338c0:15}, signal={} (OK)
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e19ffb880, wait={0x556e189338c0:15}, signal={} (OK)
I0602 22:27:59.133880 140467571585024 py_utils.py:341] Finished sync_global_devices Start training loop from step: 0 across 1 devices globally
W0602 22:27:59.344522 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003952980041503906 sec
W0602 22:27:59.345551 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0004057884216308594 sec
W0602 22:27:59.346375 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.00029778480529785156 sec
I0602 22:27:59.357974 140467571585024 base_layer.py:632] Creating var /lm/softmax/logits_ffn/linear/w with shape=[2048, 51200], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.022097086912079608
W0602 22:27:59.359756 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00039649009704589844 sec
W0602 22:27:59.360463 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00029850006103515625 sec
W0602 22:27:59.361151 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003151893615722656 sec
W0602 22:27:59.361839 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003058910369873047 sec
W0602 22:27:59.362359 140467571585024 dispatch.py:272] Finished tracing + transforming _uniform for pjit in 0.003681659698486328 sec
W0602 22:27:59.362930 140467571585024 dispatch.py:272] Finished tracing + transforming _normal_real for pjit in 0.004486560821533203 sec
W0602 22:27:59.363209 140467571585024 dispatch.py:272] Finished tracing + transforming _normal for pjit in 0.004973649978637695 sec
W0602 22:27:59.363873 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003452301025390625 sec
W0602 22:27:59.366754 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003452301025390625 sec
W0602 22:27:59.367571 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00014781951904296875 sec
W0602 22:27:59.368822 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003464221954345703 sec
I0602 22:27:59.370981 140467571585024 base_layer.py:632] Creating var /lm/position_emb/emb_var with shape=[2048, 2048], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023
W0602 22:27:59.372668 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003902912139892578 sec
W0602 22:27:59.373650 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00030803680419921875 sec
W0602 22:27:59.374335 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00030493736267089844 sec
W0602 22:27:59.374852 140467571585024 dispatch.py:272] Finished tracing + transforming _uniform for pjit in 0.003212451934814453 sec
W0602 22:27:59.375406 140467571585024 dispatch.py:272] Finished tracing + transforming _normal_real for pjit in 0.0039823055267333984 sec
W0602 22:27:59.375681 140467571585024 dispatch.py:272] Finished tracing + transforming _normal for pjit in 0.0044536590576171875 sec
W0602 22:27:59.376334 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003409385681152344 sec
W0602 22:27:59.380460 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003094673156738281 sec
W0602 22:27:59.380913 140467571585024 dispatch.py:272] Finished tracing + transforming _one_hot for pjit in 0.0011699199676513672 sec
W0602 22:27:59.381712 140467571585024 dispatch.py:272] Finished tracing + transforming matmul for pjit in 0.0004634857177734375 sec
W0602 22:27:59.384545 140467571585024 dispatch.py:272] Finished tracing + transforming _einsum for pjit in 0.0005903244018554688 sec
W0602 22:27:59.386854 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00031876564025878906 sec
W0602 22:27:59.388913 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003006458282470703 sec
W0602 22:27:59.389868 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00029277801513671875 sec
W0602 22:27:59.392195 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003139972686767578 sec
W0602 22:27:59.392928 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00029277801513671875 sec
W0602 22:27:59.393807 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00031948089599609375 sec
W0602 22:27:59.460369 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003571510314941406 sec
W0602 22:27:59.461174 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003254413604736328 sec
W0602 22:27:59.461993 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00037980079650878906 sec
W0602 22:27:59.463433 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00034689903259277344 sec
W0602 22:27:59.464406 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00030922889709472656 sec
W0602 22:27:59.465304 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.00026702880859375 sec
W0602 22:27:59.466746 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003063678741455078 sec
W0602 22:27:59.467547 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.0002110004425048828 sec
W0602 22:27:59.468261 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.00034117698669433594 sec
W0602 22:27:59.469110 140467571585024 dispatch.py:272] Finished tracing + transforming _power for pjit in 0.00035643577575683594 sec
W0602 22:27:59.470227 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003745555877685547 sec
W0602 22:27:59.470706 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0011000633239746094 sec
W0602 22:27:59.471545 140467571585024 dispatch.py:272] Finished tracing + transforming _power for pjit in 0.0003387928009033203 sec
I0602 22:27:59.472298 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/layer_norm/scale with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0
W0602 22:27:59.473121 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0004374980926513672 sec
I0602 22:27:59.473878 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/layer_norm/bias with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0
W0602 22:27:59.475531 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0004634857177734375 sec
W0602 22:27:59.476112 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0012922286987304688 sec
W0602 22:27:59.476956 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.000301361083984375 sec
W0602 22:27:59.478550 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00033736228942871094 sec
W0602 22:27:59.479698 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00035262107849121094 sec
W0602 22:27:59.480726 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0005667209625244141 sec
W0602 22:27:59.481723 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0004661083221435547 sec
I0602 22:27:59.492099 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/self_attention/combined_qkv/w with shape=[3, 2048, 32, 64], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023
W0602 22:27:59.493881 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003955364227294922 sec
W0602 22:27:59.494585 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00029778480529785156 sec
W0602 22:27:59.495265 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003154277801513672 sec
W0602 22:27:59.495957 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003161430358886719 sec
W0602 22:27:59.496483 140467571585024 dispatch.py:272] Finished tracing + transforming _uniform for pjit in 0.0036971569061279297 sec
W0602 22:27:59.497040 140467571585024 dispatch.py:272] Finished tracing + transforming _normal_real for pjit in 0.004477024078369141 sec
W0602 22:27:59.497337 140467571585024 dispatch.py:272] Finished tracing + transforming _normal for pjit in 0.004982948303222656 sec
W0602 22:27:59.497991 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00034499168395996094 sec
W0602 22:27:59.500785 140467571585024 dispatch.py:272] Finished tracing + transforming _einsum for pjit in 0.0005729198455810547 sec
I0602 22:27:59.502290 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/self_attention/per_dim_scale/per_dim_scale with shape=[64], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0
W0602 22:27:59.503006 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003504753112792969 sec
W0602 22:27:59.505191 140467571585024 dispatch.py:272] Finished tracing + transforming logaddexp for pjit in 0.0010294914245605469 sec
W0602 22:27:59.505631 140467571585024 dispatch.py:272] Finished tracing + transforming softplus for pjit in 0.0018105506896972656 sec
W0602 22:27:59.506253 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00029921531677246094 sec
W0602 22:27:59.507096 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00048279762268066406 sec
W0602 22:27:59.508324 140467571585024 dispatch.py:272] Finished tracing + transforming _einsum for pjit in 0.0004889965057373047 sec
W0602 22:27:59.509315 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003619194030761719 sec
W0602 22:27:59.510055 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00029921531677246094 sec
W0602 22:27:59.511349 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0006844997406005859 sec
W0602 22:27:59.511945 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0015289783477783203 sec
W0602 22:27:59.512629 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.0003342628479003906 sec
W0602 22:27:59.513368 140467571585024 dispatch.py:272] Finished tracing + transforming _power for pjit in 0.00036644935607910156 sec
W0602 22:27:59.514435 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003521442413330078 sec
W0602 22:27:59.514896 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0010592937469482422 sec
W0602 22:27:59.516114 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002925395965576172 sec
W0602 22:27:59.516677 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002224445343017578 sec
W0602 22:27:59.517291 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00030422210693359375 sec
W0602 22:27:59.519034 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.0004436969757080078 sec
W0602 22:27:59.519812 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003006458282470703 sec
W0602 22:27:59.520391 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002224445343017578 sec
W0602 22:27:59.521248 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0005037784576416016 sec
W0602 22:27:59.521986 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.00030612945556640625 sec
W0602 22:27:59.523357 140467571585024 dispatch.py:272] Finished tracing + transforming _einsum for pjit in 0.0005538463592529297 sec
I0602 22:27:59.524224 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/self_attention/post/w with shape=[2048, 32, 64], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023
W0602 22:27:59.525957 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00030350685119628906 sec
W0602 22:27:59.526643 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00029659271240234375 sec
W0602 22:27:59.527315 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00031280517578125 sec
W0602 22:27:59.528625 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0009379386901855469 sec
W0602 22:27:59.529157 140467571585024 dispatch.py:272] Finished tracing + transforming _uniform for pjit in 0.004189968109130859 sec
W0602 22:27:59.529712 140467571585024 dispatch.py:272] Finished tracing + transforming _normal_real for pjit in 0.0049741268157958984 sec
W0602 22:27:59.529983 140467571585024 dispatch.py:272] Finished tracing + transforming _normal for pjit in 0.0055196285247802734 sec
W0602 22:27:59.530637 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003428459167480469 sec
W0602 22:27:59.533398 140467571585024 dispatch.py:272] Finished tracing + transforming _einsum for pjit in 0.0005085468292236328 sec
W0602 22:27:59.536686 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003006458282470703 sec
W0602 22:27:59.537475 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00039386749267578125 sec
W0602 22:27:59.539196 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003559589385986328 sec
W0602 22:27:59.540110 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00038814544677734375 sec
W0602 22:27:59.541549 140467571585024 dispatch.py:272] Finished tracing + transforming _power for pjit in 0.0003504753112792969 sec
W0602 22:27:59.542280 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00029349327087402344 sec
W0602 22:27:59.543831 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00035500526428222656 sec
W0602 22:27:59.544297 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0010623931884765625 sec
I0602 22:27:59.555739 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/layer_norm/scale with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0
I0602 22:27:59.556720 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/layer_norm/bias with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0
I0602 22:27:59.567184 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/ffn_layer1/linear/w with shape=[2048, 8192], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023
W0602 22:27:59.568845 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00030517578125 sec
W0602 22:27:59.569918 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0004093647003173828 sec
W0602 22:27:59.570603 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003085136413574219 sec
W0602 22:27:59.571115 140467571585024 dispatch.py:272] Finished tracing + transforming _uniform for pjit in 0.003255605697631836 sec
W0602 22:27:59.571674 140467571585024 dispatch.py:272] Finished tracing + transforming _normal_real for pjit in 0.004033803939819336 sec
W0602 22:27:59.571937 140467571585024 dispatch.py:272] Finished tracing + transforming _normal for pjit in 0.0045070648193359375 sec
W0602 22:27:59.572589 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003426074981689453 sec
W0602 22:27:59.575275 140467571585024 dispatch.py:272] Finished tracing + transforming _einsum for pjit in 0.0004978179931640625 sec
I0602 22:27:59.575988 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/ffn_layer1/bias/b with shape=[8192], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0
W0602 22:27:59.576705 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003559589385986328 sec
W0602 22:27:59.577903 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0004894733428955078 sec
W0602 22:27:59.579039 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003445148468017578 sec
W0602 22:27:59.579769 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00029587745666503906 sec
W0602 22:27:59.580426 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00029397010803222656 sec
W0602 22:27:59.581023 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00022649765014648438 sec
W0602 22:27:59.581819 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0004436969757080078 sec
W0602 22:27:59.582886 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003018379211425781 sec
W0602 22:27:59.583923 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00031375885009765625 sec
W0602 22:27:59.584674 140467571585024 dispatch.py:272] Finished tracing + transforming _power for pjit in 0.0003490447998046875 sec
W0602 22:27:59.585723 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.000362396240234375 sec
W0602 22:27:59.586187 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.001088857650756836 sec
I0602 22:27:59.592514 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/ffn_layer2/linear/w with shape=[8192, 2048], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023
W0602 22:27:59.594151 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002982616424560547 sec
W0602 22:27:59.595182 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0004038810729980469 sec
W0602 22:27:59.595859 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003058910369873047 sec
W0602 22:27:59.596365 140467571585024 dispatch.py:272] Finished tracing + transforming _uniform for pjit in 0.0031800270080566406 sec
W0602 22:27:59.596910 140467571585024 dispatch.py:272] Finished tracing + transforming _normal_real for pjit in 0.003953456878662109 sec
W0602 22:27:59.597192 140467571585024 dispatch.py:272] Finished tracing + transforming _normal for pjit in 0.0044362545013427734 sec
W0602 22:27:59.597837 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00034165382385253906 sec
W0602 22:27:59.600503 140467571585024 dispatch.py:272] Finished tracing + transforming _einsum for pjit in 0.0004875659942626953 sec
I0602 22:27:59.601223 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/ffn_layer2/bias/b with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0
I0602 22:27:59.647464 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/layer_norm/scale with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0
I0602 22:27:59.648468 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/layer_norm/bias with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0
I0602 22:27:59.662634 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/self_attention/combined_qkv/w with shape=[3, 2048, 32, 64], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023
I0602 22:27:59.666589 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/self_attention/per_dim_scale/per_dim_scale with shape=[64], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0
I0602 22:27:59.675507 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/self_attention/post/w with shape=[2048, 32, 64], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023
I0602 22:27:59.695976 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/layer_norm/scale with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0
I0602 22:27:59.696956 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/layer_norm/bias with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0
I0602 22:27:59.707431 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/ffn_layer1/linear/w with shape=[2048, 8192], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023
I0602 22:27:59.710544 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/ffn_layer1/bias/b with shape=[8192], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0
I0602 22:27:59.721235 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/ffn_layer2/linear/w with shape=[8192, 2048], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023
I0602 22:27:59.724257 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/ffn_layer2/bias/b with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0
W0602 22:27:59.748244 140467571585024 dispatch.py:272] Finished tracing + transforming logaddexp for pjit in 0.0008337497711181641 sec
W0602 22:27:59.748823 140467571585024 dispatch.py:272] Finished tracing + transforming real for pjit in 0.00013375282287597656 sec
W0602 22:27:59.750124 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00021195411682128906 sec
W0602 22:27:59.750622 140467571585024 dispatch.py:272] Finished tracing + transforming real for pjit in 0.000125885009765625 sec
I0602 22:27:59.836073 140467571585024 base_layer.py:632] Creating var /lm/final_ln/scale with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0
I0602 22:27:59.837118 140467571585024 base_layer.py:632] Creating var /lm/final_ln/bias with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0
W0602 22:27:59.854961 140467571585024 dispatch.py:272] Finished tracing + transforming _einsum for pjit in 0.0004992485046386719 sec
I0602 22:27:59.857668 140467571585024 base_layer.py:632] Creating var /lm/softmax/logits_ffn/bias/b with shape=[51200], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0
W0602 22:27:59.858443 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00034928321838378906 sec
W0602 22:27:59.859645 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0004897117614746094 sec
W0602 22:27:59.862132 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0003387928009033203 sec
W0602 22:27:59.863983 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002231597900390625 sec
W0602 22:27:59.866493 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00034046173095703125 sec
W0602 22:27:59.868813 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.0004448890686035156 sec
W0602 22:27:59.869667 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00037789344787597656 sec
W0602 22:27:59.870247 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00021958351135253906 sec
W0602 22:27:59.871076 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00047278404235839844 sec
W0602 22:27:59.871719 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00023937225341796875 sec
W0602 22:27:59.872289 140467571585024 dispatch.py:272] Finished tracing + transforming log_softmax for pjit in 0.004140615463256836 sec
W0602 22:27:59.877013 140467571585024 dispatch.py:272] Finished tracing + transforming _squeeze for pjit in 0.00019550323486328125 sec
W0602 22:27:59.878072 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00030040740966796875 sec
W0602 22:27:59.878505 140467571585024 dispatch.py:272] Finished tracing + transforming _one_hot for pjit in 0.0011343955993652344 sec
W0602 22:27:59.879173 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003135204315185547 sec
W0602 22:27:59.881138 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00036787986755371094 sec
W0602 22:27:59.882631 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00019931793212890625 sec
W0602 22:27:59.884168 140467571585024 dispatch.py:272] Finished tracing + transforming _argmax for pjit in 0.00023031234741210938 sec
W0602 22:27:59.885844 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00030541419982910156 sec
W0602 22:27:59.887813 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003502368927001953 sec
W0602 22:27:59.889712 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003333091735839844 sec
W0602 22:27:59.893325 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00037479400634765625 sec
W0602 22:27:59.895086 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002925395965576172 sec
W0602 22:27:59.896665 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002980232238769531 sec
W0602 22:27:59.897483 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00045609474182128906 sec
W0602 22:27:59.898527 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003457069396972656 sec
W0602 22:27:59.899624 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00019860267639160156 sec
W0602 22:27:59.903453 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002846717834472656 sec
W0602 22:27:59.915879 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.00040531158447265625 sec
W0602 22:27:59.919377 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0003483295440673828 sec
W0602 22:28:00.023173 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00034737586975097656 sec
W0602 22:28:00.024725 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003299713134765625 sec
W0602 22:28:00.025552 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003440380096435547 sec
W0602 22:28:00.026346 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003674030303955078 sec
W0602 22:28:00.027400 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00030732154846191406 sec
W0602 22:28:00.027714 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009222030639648438 sec
W0602 22:28:00.028578 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003437995910644531 sec
W0602 22:28:00.029335 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003364086151123047 sec
W0602 22:28:00.031265 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00033664703369140625 sec
W0602 22:28:00.033033 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0005025863647460938 sec
W0602 22:28:00.033688 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00028896331787109375 sec
W0602 22:28:00.034654 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00032401084899902344 sec
W0602 22:28:00.035815 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.00042939186096191406 sec
W0602 22:28:00.036924 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028324127197265625 sec
W0602 22:28:00.037694 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00035881996154785156 sec
W0602 22:28:00.038773 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028324127197265625 sec
W0602 22:28:00.039510 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00035309791564941406 sec
W0602 22:28:00.040193 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003631114959716797 sec
W0602 22:28:00.040920 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00034737586975097656 sec
W0602 22:28:00.041531 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002837181091308594 sec
W0602 22:28:00.042264 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003497600555419922 sec
W0602 22:28:00.042867 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002853870391845703 sec
W0602 22:28:00.043591 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003466606140136719 sec
W0602 22:28:00.044194 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002808570861816406 sec
W0602 22:28:00.044924 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003497600555419922 sec
W0602 22:28:00.045540 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002884864807128906 sec
W0602 22:28:00.046352 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0004208087921142578 sec
W0602 22:28:00.046952 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028133392333984375 sec
W0602 22:28:00.047699 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00035762786865234375 sec
W0602 22:28:00.050174 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028586387634277344 sec
W0602 22:28:00.050910 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003540515899658203 sec
W0602 22:28:00.051504 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002827644348144531 sec
W0602 22:28:00.052228 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003445148468017578 sec
W0602 22:28:00.052894 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003573894500732422 sec
W0602 22:28:00.053647 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003571510314941406 sec
W0602 22:28:00.055830 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003528594970703125 sec
W0602 22:28:00.056593 140467571585024 dispatch.py:272] Finished tracing + transforming isfinite for pjit in 0.00018739700317382812 sec
W0602 22:28:00.057262 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_all for pjit in 0.00037097930908203125 sec
W0602 22:28:00.058438 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00034618377685546875 sec
W0602 22:28:00.059085 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002803802490234375 sec
W0602 22:28:00.060156 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002760887145996094 sec
W0602 22:28:00.060764 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002777576446533203 sec
W0602 22:28:00.061400 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003063678741455078 sec
W0602 22:28:00.062021 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028133392333984375 sec
W0602 22:28:00.062642 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002751350402832031 sec
W0602 22:28:00.063256 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00027942657470703125 sec
W0602 22:28:00.064822 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00034427642822265625 sec
W0602 22:28:00.065452 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002810955047607422 sec
W0602 22:28:00.066066 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002791881561279297 sec
W0602 22:28:00.075807 140467571585024 optimizers.py:1170] DEPRECATION WARNING: p.weight_decay will be deprecated. In future, we will do a migration to remove p.weight_decay and after that, setting it will throw an exception. In future, we will use p.l2_regularizer_weight for coupled weight decay (i.e., weight decays that affect optimizer slots), and use p.decoupled_weight_decay for decoupled weight decay (i.e., weight decays that are added only to the final update).
I0602 22:28:00.075856 140467571585024 optimizers.py:1173] Using sharded_adam.
W0602 22:28:00.075891 140467571585024 optimizers.py:580] DEPRECATION WARNING: p.weight_decay will be deprecated. In future, we will do a migration to remove p.weight_decay and after that, setting it will throw an exception. In future, we will use p.l2_regularizer_weight for coupled weight decay (i.e., weight decays that affect optimizer slots), and use p.decoupled_weight_decay for decoupled weight decay (i.e., weight decays that are added only to the final update).
W0602 22:28:00.076794 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003323554992675781 sec
W0602 22:28:00.078552 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.0002219676971435547 sec
W0602 22:28:00.079495 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003407001495361328 sec
W0602 22:28:00.079883 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009777545928955078 sec
W0602 22:28:00.080916 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.003301382064819336 sec
W0602 22:28:00.082036 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.00021529197692871094 sec
W0602 22:28:00.083037 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0004119873046875 sec
W0602 22:28:00.083425 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0010447502136230469 sec
W0602 22:28:00.084460 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.002839803695678711 sec
W0602 22:28:00.085329 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.00022673606872558594 sec
W0602 22:28:00.086240 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003211498260498047 sec
W0602 22:28:00.086613 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009417533874511719 sec
W0602 22:28:00.087617 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.0027151107788085938 sec
W0602 22:28:00.088449 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.00020623207092285156 sec
W0602 22:28:00.089448 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00032806396484375 sec
W0602 22:28:00.089833 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0010499954223632812 sec
W0602 22:28:00.090841 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.0027959346771240234 sec
W0602 22:28:00.091972 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00030994415283203125 sec
W0602 22:28:00.092691 140467571585024 dispatch.py:272] Finished tracing + transforming _power for pjit in 0.00031757354736328125 sec
W0602 22:28:00.093427 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00032520294189453125 sec
W0602 22:28:00.097660 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002770423889160156 sec
W0602 22:28:00.098491 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028824806213378906 sec
W0602 22:28:00.111270 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028967857360839844 sec
W0602 22:28:00.112138 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003113746643066406 sec
W0602 22:28:00.118681 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002868175506591797 sec
W0602 22:28:00.119528 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002980232238769531 sec
W0602 22:28:00.126013 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028777122497558594 sec
W0602 22:28:00.126874 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003070831298828125 sec
W0602 22:28:00.128772 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00033593177795410156 sec
W0602 22:28:00.129413 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002238750457763672 sec
W0602 22:28:00.130366 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002868175506591797 sec
W0602 22:28:00.132015 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003223419189453125 sec
W0602 22:28:00.132609 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002219676971435547 sec
W0602 22:28:00.133493 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002834796905517578 sec
W0602 22:28:00.134187 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003380775451660156 sec
W0602 22:28:00.134777 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00022077560424804688 sec
W0602 22:28:00.136005 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0003387928009033203 sec
W0602 22:28:00.136672 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00031280517578125 sec
W0602 22:28:00.137268 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002319812774658203 sec
W0602 22:28:00.138124 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002760887145996094 sec
W0602 22:28:00.138686 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.00020956993103027344 sec
W0602 22:28:00.139617 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00034236907958984375 sec
W0602 22:28:00.140043 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0010128021240234375 sec
W0602 22:28:00.141171 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.00021147727966308594 sec
W0602 22:28:00.142399 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.0017132759094238281 sec
W0602 22:28:00.143631 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.00027441978454589844 sec
W0602 22:28:00.145866 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.00021409988403320312 sec
W0602 22:28:00.146854 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00041794776916503906 sec
W0602 22:28:00.147282 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0010738372802734375 sec
W0602 22:28:00.148948 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002741813659667969 sec
W0602 22:28:00.149488 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.00020432472229003906 sec
W0602 22:28:00.150406 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00034236907958984375 sec
W0602 22:28:00.150839 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0010149478912353516 sec
W0602 22:28:00.152539 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002808570861816406 sec
W0602 22:28:00.153557 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.0006973743438720703 sec
W0602 22:28:00.154273 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003600120544433594 sec
W0602 22:28:00.156442 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.00029158592224121094 sec
W0602 22:28:00.157175 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003440380096435547 sec
W0602 22:28:00.159234 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00033092498779296875 sec
W0602 22:28:00.163224 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00032711029052734375 sec
W0602 22:28:00.163944 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00031685829162597656 sec
W0602 22:28:00.180238 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.00021576881408691406 sec
W0602 22:28:00.181189 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003566741943359375 sec
W0602 22:28:00.181575 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009889602661132812 sec
W0602 22:28:00.182617 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.0028810501098632812 sec
W0602 22:28:00.185919 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.0002105236053466797 sec
W0602 22:28:00.186836 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003268718719482422 sec
W0602 22:28:00.187237 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009739398956298828 sec
W0602 22:28:00.188269 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.0027616024017333984 sec
W0602 22:28:00.194182 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.0002090930938720703 sec
W0602 22:28:00.195191 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0004177093505859375 sec
W0602 22:28:00.195587 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.001054525375366211 sec
W0602 22:28:00.196628 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.002859830856323242 sec
W0602 22:28:00.201195 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.00022411346435546875 sec
W0602 22:28:00.202216 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0004239082336425781 sec
W0602 22:28:00.202624 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0010750293731689453 sec
W0602 22:28:00.203661 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.002897977828979492 sec
W0602 22:28:00.206920 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.0002090930938720703 sec
W0602 22:28:00.207919 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0004134178161621094 sec
W0602 22:28:00.208302 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0010347366333007812 sec
W0602 22:28:00.209338 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.0028257369995117188 sec
W0602 22:28:00.212531 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.0002090930938720703 sec
W0602 22:28:00.213589 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0004582405090332031 sec
W0602 22:28:00.213998 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.00112152099609375 sec
W0602 22:28:00.215037 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.002917766571044922 sec
W0602 22:28:00.226901 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028133392333984375 sec
W0602 22:28:00.229168 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003066062927246094 sec
W0602 22:28:00.238538 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028777122497558594 sec
W0602 22:28:00.240322 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00030875205993652344 sec
W0602 22:28:00.260538 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002918243408203125 sec
W0602 22:28:00.262394 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003674030303955078 sec
W0602 22:28:00.310421 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002956390380859375 sec
W0602 22:28:00.312259 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003032684326171875 sec
W0602 22:28:00.323301 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.000286102294921875 sec
W0602 22:28:00.332812 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00030231475830078125 sec
W0602 22:28:00.334606 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002968311309814453 sec
W0602 22:28:00.337694 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003323554992675781 sec
W0602 22:28:00.339194 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002148151397705078 sec
W0602 22:28:00.340575 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002872943878173828 sec
W0602 22:28:00.341731 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00032591819763183594 sec
W0602 22:28:00.342816 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00021719932556152344 sec
W0602 22:28:00.344238 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0003428459167480469 sec
W0602 22:28:00.347850 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00033020973205566406 sec
W0602 22:28:00.348942 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00021958351135253906 sec
W0602 22:28:00.350367 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0003342628479003906 sec
W0602 22:28:00.357018 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00033855438232421875 sec
W0602 22:28:00.358186 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002727508544921875 sec
W0602 22:28:00.359584 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002987384796142578 sec
W0602 22:28:00.360738 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003199577331542969 sec
W0602 22:28:00.361827 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00021886825561523438 sec
W0602 22:28:00.363179 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002841949462890625 sec
W0602 22:28:00.364387 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00032711029052734375 sec
W0602 22:28:00.365492 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002205371856689453 sec
W0602 22:28:00.366867 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002865791320800781 sec
W0602 22:28:00.367908 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.000213623046875 sec
W0602 22:28:00.369345 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00044918060302734375 sec
W0602 22:28:00.369788 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0011372566223144531 sec
W0602 22:28:00.375707 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.000278472900390625 sec
W0602 22:28:00.376975 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.00020933151245117188 sec
W0602 22:28:00.378338 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003464221954345703 sec
W0602 22:28:00.378786 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.001043081283569336 sec
W0602 22:28:00.381543 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.00028705596923828125 sec
W0602 22:28:00.386663 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.0002148151397705078 sec
W0602 22:28:00.388015 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003528594970703125 sec
W0602 22:28:00.388454 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.001035451889038086 sec
W0602 22:28:00.391259 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002779960632324219 sec
W0602 22:28:00.402782 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.00022363662719726562 sec
W0602 22:28:00.404162 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00036597251892089844 sec
W0602 22:28:00.404610 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.001064300537109375 sec
W0602 22:28:00.407434 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.000347137451171875 sec
W0602 22:28:00.408698 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.00021696090698242188 sec
W0602 22:28:00.410056 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003528594970703125 sec
W0602 22:28:00.410495 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0010399818420410156 sec
W0602 22:28:00.413275 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002903938293457031 sec
W0602 22:28:00.414558 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.00021409988403320312 sec
W0602 22:28:00.415879 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003445148468017578 sec
W0602 22:28:00.416320 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0010304450988769531 sec
W0602 22:28:00.419043 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.00028204917907714844 sec
W0602 22:28:00.420496 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003352165222167969 sec
W0602 22:28:00.430241 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003299713134765625 sec
W0602 22:28:00.466179 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003275871276855469 sec
W0602 22:28:00.466578 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009796619415283203 sec
W0602 22:28:00.467983 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003292560577392578 sec
W0602 22:28:00.468378 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009663105010986328 sec
W0602 22:28:00.469430 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00033593177795410156 sec
W0602 22:28:00.469815 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009608268737792969 sec
W0602 22:28:00.470983 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003285408020019531 sec
W0602 22:28:00.471376 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0010793209075927734 sec
W0602 22:28:00.472426 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003216266632080078 sec
W0602 22:28:00.472812 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009558200836181641 sec
W0602 22:28:00.473871 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00032591819763183594 sec
W0602 22:28:00.474267 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009617805480957031 sec
W0602 22:28:00.475304 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003190040588378906 sec
W0602 22:28:00.475695 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009546279907226562 sec
W0602 22:28:00.476808 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003247261047363281 sec
W0602 22:28:00.477206 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.001033782958984375 sec
W0602 22:28:00.479650 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00032830238342285156 sec
W0602 22:28:00.480049 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009765625 sec
W0602 22:28:00.481175 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00040793418884277344 sec
W0602 22:28:00.481562 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0010426044464111328 sec
W0602 22:28:00.482620 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003299713134765625 sec
W0602 22:28:00.483013 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009722709655761719 sec
W0602 22:28:00.483891 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00017547607421875 sec
W0602 22:28:00.484178 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0007045269012451172 sec
W0602 22:28:00.487202 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00031948089599609375 sec
W0602 22:28:00.487582 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009427070617675781 sec
W0602 22:28:00.513556 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002880096435546875 sec
W0602 22:28:00.514215 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00031638145446777344 sec
W0602 22:28:00.514865 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00029468536376953125 sec
W0602 22:28:00.515513 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003058910369873047 sec
W0602 22:28:00.517177 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003752708435058594 sec
W0602 22:28:00.517815 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028586387634277344 sec
W0602 22:28:00.518432 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028514862060546875 sec
W0602 22:28:00.519467 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003421306610107422 sec
W0602 22:28:00.520108 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.00020766258239746094 sec
W0602 22:28:00.520784 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.0003330707550048828 sec
W0602 22:28:00.521442 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003376007080078125 sec
W0602 22:28:00.526046 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.00021767616271972656 sec
W0602 22:28:00.526714 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.00032639503479003906 sec
W0602 22:28:00.528863 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.0002040863037109375 sec
W0602 22:28:00.529608 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.00038170814514160156 sec
W0602 22:28:00.531756 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.0002048015594482422 sec
W0602 22:28:00.532421 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.00032019615173339844 sec
W0602 22:28:00.534582 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.0001976490020751953 sec
W0602 22:28:00.535311 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.00038623809814453125 sec
W0602 22:28:00.536197 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002865791320800781 sec
W0602 22:28:00.537872 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.00020456314086914062 sec
W0602 22:28:00.538537 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.00032448768615722656 sec
W0602 22:28:00.539419 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002818107604980469 sec
W0602 22:28:00.541140 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.0002739429473876953 sec
W0602 22:28:00.541811 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.00032782554626464844 sec
W0602 22:28:00.542690 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.00028395652770996094 sec
W0602 22:28:00.544338 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.00019788742065429688 sec
W0602 22:28:00.544997 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.0003192424774169922 sec
W0602 22:28:00.545884 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002818107604980469 sec
W0602 22:28:00.555858 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.0002009868621826172 sec
W0602 22:28:00.556550 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.0003445148468017578 sec
W0602 22:28:00.557439 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002849102020263672 sec
W0602 22:28:00.559089 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.0001995563507080078 sec
W0602 22:28:00.559751 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.00031757354736328125 sec
W0602 22:28:00.560696 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0003495216369628906 sec
W0602 22:28:00.562363 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.0002009868621826172 sec
W0602 22:28:00.563033 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.0003275871276855469 sec
W0602 22:28:00.563914 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002868175506591797 sec
W0602 22:28:00.601922 140467571585024 dispatch.py:272] Finished tracing + transforming _wrapped_step_fn for pmap in 1.2830684185028076 sec
W0602 22:28:00.602606 140467571585024 pxla.py:859] Compiling _wrapped_step_fn (140457517638464) for 1 devices with args (ShapedArray(uint32[1]), ShapedArray(float32[1,2048]), ShapedArray(float32[1,2048]), ShapedArray(float32[1,2048,2048]), ShapedArray(float32[1,51200]), ShapedArray(float32[1,2048,51200]), ShapedArray(float32[1,24,8192]), ShapedArray(float32[1,24,2048,8192]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,8192,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,3,2048,32,64]), ShapedArray(float32[1,24,64]), ShapedArray(float32[1,24,2048,32,64]), ShapedArray(int32[1]), ShapedArray(int32[1]), ShapedArray(int32[1]), ShapedArray(float32[1,2048]), ShapedArray(float32[1,2048]), ShapedArray(float32[1,2048,2048]), ShapedArray(float32[1,51200]), ShapedArray(float32[1,2048,51200]), ShapedArray(float32[1,2048]), ShapedArray(float32[1,2048]), ShapedArray(float32[1,2048,2048]), ShapedArray(float32[1,51200]), ShapedArray(float32[1,2048,51200]), ShapedArray(int32[1]), ShapedArray(int32[1,24]), ShapedArray(int32[1,24]), ShapedArray(int32[1,24]), ShapedArray(float32[1,24,8192]), ShapedArray(float32[1,24,2048,8192]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,8192,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,3,2048,32,64]), ShapedArray(float32[1,24,64]), ShapedArray(float32[1,24,2048,32,64]), ShapedArray(float32[1,24,8192]), ShapedArray(float32[1,24,2048,8192]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,8192,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,3,2048,32,64]), ShapedArray(float32[1,24,64]), ShapedArray(float32[1,24,2048,32,64]), ShapedArray(int32[1,24]), 
ShapedArray(uint32[1,2]), ShapedArray(float32[1,1]), ShapedArray(int32[1,1,2048]), ShapedArray(int32[1,1,2048]), ShapedArray(float32[1,1,2048]), ShapedArray(int32[1,1,2048]), ShapedArray(int32[1,1,2048]), ShapedArray(float32[1,1,2048])). (num_replicas=1)
/workspace/jax/jax/_src/interpreters/mlir.py:618: UserWarning: Some donated buffers were not usable: ShapedArray(uint32[]), ShapedArray(float32[2048]), ShapedArray(float32[2048]), ShapedArray(float32[2048,2048]), ShapedArray(float32[51200]), ShapedArray(float32[2048,51200]), ShapedArray(float32[24,8192]), ShapedArray(float32[24,2048,8192]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,8192,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,3,2048,32,64]), ShapedArray(float32[24,64]), ShapedArray(float32[24,2048,32,64]), ShapedArray(int32[]), ShapedArray(int32[]), ShapedArray(int32[]), ShapedArray(float32[2048]), ShapedArray(float32[2048]), ShapedArray(float32[2048,2048]), ShapedArray(float32[51200]), ShapedArray(float32[2048,51200]), ShapedArray(float32[2048]), ShapedArray(float32[2048]), ShapedArray(float32[2048,2048]), ShapedArray(float32[51200]), ShapedArray(float32[2048,51200]), ShapedArray(int32[]), ShapedArray(int32[24]), ShapedArray(int32[24]), ShapedArray(int32[24]), ShapedArray(float32[24,8192]), ShapedArray(float32[24,2048,8192]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,8192,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,3,2048,32,64]), ShapedArray(float32[24,64]), ShapedArray(float32[24,2048,32,64]), ShapedArray(float32[24,8192]), ShapedArray(float32[24,2048,8192]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,8192,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,3,2048,32,64]), ShapedArray(float32[24,64]), ShapedArray(float32[24,2048,32,64]), ShapedArray(int32[24]).
Donation is not implemented for iree_cuda.
See an explanation at https://jax.readthedocs.io/en/latest/faq.html#buffer-donation.
warnings.warn(f"Some donated buffers were not usable: {', '.join(unused_donations)}.\n{msg}")
W0602 22:28:00.607805 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002810955047607422 sec
W0602 22:28:00.608350 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_seed for pjit in 0.00139617919921875 sec
W0602 22:28:00.609600 140467571585024 dispatch.py:272] Finished tracing + transforming ravel for pjit in 0.00016379356384277344 sec
W0602 22:28:00.610242 140467571585024 dispatch.py:272] Finished tracing + transforming threefry_2x32 for pjit in 0.0012764930725097656 sec
W0602 22:28:00.611082 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_fold_in for pjit in 0.004296541213989258 sec
W0602 22:28:00.614205 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00035190582275390625 sec
W0602 22:28:00.615066 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00026988983154296875 sec
W0602 22:28:00.615887 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002651214599609375 sec
W0602 22:28:00.616645 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00026297569274902344 sec
W0602 22:28:00.617237 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00026726722717285156 sec
W0602 22:28:00.654618 140467571585024 dispatch.py:272] Finished tracing + transforming ravel for pjit in 0.00016427040100097656 sec
W0602 22:28:00.655276 140467571585024 dispatch.py:272] Finished tracing + transforming threefry_2x32 for pjit in 0.0012590885162353516 sec
W0602 22:28:00.656152 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_split_original for pjit in 0.0023670196533203125 sec
W0602 22:28:00.658758 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00026988983154296875 sec
W0602 22:28:00.659666 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00034999847412109375 sec
W0602 22:28:00.660421 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00025963783264160156 sec
W0602 22:28:00.661008 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002598762512207031 sec
W0602 22:28:00.698722 140467571585024 dispatch.py:272] Finished tracing + transforming ravel for pjit in 0.0001583099365234375 sec
W0602 22:28:00.699368 140467571585024 dispatch.py:272] Finished tracing + transforming threefry_2x32 for pjit in 0.0012388229370117188 sec
W0602 22:28:00.700251 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_split_original for pjit in 0.002351522445678711 sec
W0602 22:28:00.703298 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0007131099700927734 sec
W0602 22:28:00.704129 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002689361572265625 sec
W0602 22:28:00.704889 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002589225769042969 sec
W0602 22:28:00.705490 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002582073211669922 sec
W0602 22:28:00.750689 140467571585024 dispatch.py:272] Finished tracing + transforming ravel for pjit in 0.0001697540283203125 sec
W0602 22:28:00.751362 140467571585024 dispatch.py:272] Finished tracing + transforming threefry_2x32 for pjit in 0.0012905597686767578 sec
W0602 22:28:00.752371 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_split_original for pjit in 0.002538919448852539 sec
W0602 22:28:00.755024 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002722740173339844 sec
W0602 22:28:00.755850 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002663135528564453 sec
W0602 22:28:00.756608 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002589225769042969 sec
W0602 22:28:00.757198 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00026679039001464844 sec
W0602 22:28:00.800924 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002942085266113281 sec
W0602 22:28:00.802230 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.000308990478515625 sec
W0602 22:28:00.823777 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002770423889160156 sec
W0602 22:28:00.938540 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003228187561035156 sec
W0602 22:28:00.939468 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00030684471130371094 sec
W0602 22:28:00.940159 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002620220184326172 sec
W0602 22:28:00.940891 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00025773048400878906 sec
W0602 22:28:00.959371 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00027489662170410156 sec
W0602 22:28:01.484800 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion pmap(_wrapped_step_fn) in 0.8817405700683594 sec
W0602 22:28:10.412033 140467571585024 dispatch.py:272] Finished XLA compilation of _wrapped_step_fn in 8.916320085525513 sec
W0602 22:28:10.421547 140467571585024 dispatch.py:272] Finished tracing + transforming _multi_slice for pjit in 0.0004761219024658203 sec
W0602 22:28:10.422143 140467571585024 pxla.py:1882] Compiling _multi_slice for with global shapes and types [ShapedArray(uint32[1,2])]. Argument mapping: (GSPMDSharding({replicated}),).
W0602 22:28:10.423828 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_multi_slice) in 0.0015578269958496094 sec
W0602 22:28:10.486968 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_multi_slice) in 0.0628662109375 sec
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1c697e70 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1c697e70, semaphore=0x556e189338c0, value=15 (OK)
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=16, fence=0x556e19f6c630 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1c697e70, from_fence=0x556e1a280840 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2825c0, semaphore=0x556e189338c0, value=16 (OK)
:: IREE INVOKE (vm_invoke[async]): context=0x556e1b0149a0, f=0, wait_fence=0x556e1c697e70 {0x556e189338c0:15}, signal_fence=0x556e19f6c630 {0x556e189338c0:16} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19d4ba50 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18df85d0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19d4ba50, semaphore=0x556e189338c0, value=16 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18df85d0, semaphore=0x556e189338c0, value=16 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e1a0226e0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:132}, signal={0x556e18933900:133} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:133}, signal={0x556e18933900:134} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1c697e70 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a2586a0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1c697e70, semaphore=0x556e18933900, value=134 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2586a0, semaphore=0x556e18933900, value=134 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e18df7630 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:134}, signal={0x556e18933900:135} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:135}, signal={0x556e18933900:136} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1ad9d840 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18fbca50 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1ad9d840, semaphore=0x556e18933900, value=136 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18fbca50, semaphore=0x556e18933900, value=136 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e193b88a0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:136}, signal={0x556e18933900:137} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:137}, signal={0x556e18933900:138} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1b87c050 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1b87c230 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1b87c050, semaphore=0x556e18933900, value=138 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1b87c230, semaphore=0x556e18933900, value=138 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e1ca2f600 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:138}, signal={0x556e18933900:139} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:139}, signal={0x556e18933900:140} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3672d0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1c9dd550 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3672d0, semaphore=0x556e18933900, value=140 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1c9dd550, semaphore=0x556e18933900, value=140 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e1a0226e0 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:140}, signal={0x556e18933900:141} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:141}, signal={0x556e18933900:142} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a5be9c0 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1af12c90 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a5be9c0, semaphore=0x556e18933900, value=142 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1af12c90, semaphore=0x556e18933900, value=142 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e1ca2f600 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:142}, signal={0x556e18933900:143} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:143}, signal={0x556e18933900:144} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a4d6d90 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a5e3ae0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a4d6d90, semaphore=0x556e18933900, value=144 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a5e3ae0, semaphore=0x556e18933900, value=144 (OK)
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e1ca2f600 (OK)
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:144}, signal={0x556e18933900:145} (OK)
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:145}, signal={0x556e18933900:146} (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19fd4610 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a0b4920 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19fd4610, semaphore=0x556e18933900, value=146 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a0b4920, semaphore=0x556e18933900, value=146 (OK)
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a5e31f0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a5e31f0, semaphore=0x556e189338c0, value=16 (OK)
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=17, fence=0x556e19669370 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e191a6280 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18a470c0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1962ee50 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1962ea60, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a3f5e20 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a422390, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e197797a0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3d6610, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a2a73d0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19ffd400, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a420280 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2afb00, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a14ab90 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a14abe0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19c07d10 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c07d60, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e193ef470 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19140df0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19c04ab0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c04b00, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e18baf800 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19aded80, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1922bd90 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a41c090, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a3a2b90 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3a2be0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a419dd0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e191e42f0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19829890 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e198298e0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19f4deb0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19f4df00, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19d0a1e0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19d0a230, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e199c4af0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19535260, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e18b2be70 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19533540, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19c1a1d0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c1a220, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a151ab0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c23f90, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19b5a2f0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19b5a340, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19533ed0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19533f20, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19c24420 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b54260, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19535d90 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19535de0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19c09d90 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e191ecc60, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19c05760 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c057b0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a24d3f0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a24d440, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a250bf0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a250c40, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e199c3e30 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e199c3e80, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e18b4a060 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19aeb4a0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e18b9fe70 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b9fec0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19bbfce0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19bbfd30, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a253340 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a253390, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a24d930 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b473e0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19b67f70 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19b67fc0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a2ae8a0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a289880, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e190f07e0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e190f0830, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a361100 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a361150, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19bc0660 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19bc06b0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19740d80 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19740dd0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19bc0a10 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18bb3b10, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e195d0b40 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e195d0510, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19955b30 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19955b80, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19809610 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19809660, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19809860 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e198098b0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a3c1090 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18f796d0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a3c06d0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3c0920, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a10ca60 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a10cab0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19623470 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e196234c0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19623760 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a10d670, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19c1af20 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c1af70, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19acf2d0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e19acf320, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a3621f0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a362240, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a3626f0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a362740, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e18b41e10 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b41e60, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e18b42150 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b421a0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19d4ba50 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18df85d0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1c697e70 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2586a0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1ad9d840 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e18fbca50, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1b87c050 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1b87c230, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a3672d0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1c9dd550, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a5be9c0 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1af12c90, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a4d6d90 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a5e3ae0, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19fd4610 (OK)
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a0b4920, semaphore=0x556e189338c0, value=17 (OK)
:: IREE INVOKE (vm_invoke[async]): context=0x556e1add98f0, f=0, wait_fence=0x556e1a5e31f0 {0x556e189338c0:16, 0x556e18933900:146}, signal_fence=0x556e19669370 {0x556e189338c0:17}
Fatal Python error: Segmentation fault
Current thread 0x00007fc127b4f000 (most recent call first):
File "/workspace/jax/jax/_src/interpreters/pxla.py", line 1349 in __call__
File "/workspace/jax/jax/_src/profiler.py", line 314 in wrapper
File "/workspace/jax/jax/_src/api.py", line 1774 in cache_miss
File "/workspace/jax/jax/_src/traceback_util.py", line 166 in reraise_with_filtered_traceback
File "/opt/paxml/paxml/partitioning.py", line 712 in _wrapped_partitioned_step
File "/opt/paxml/paxml/programs.py", line 559 in train_step
File "/opt/paxml/paxml/programs.py", line 294 in run
File "/opt/paxml/paxml/executors.py", line 529 in _train_and_evaluate_common
File "/opt/paxml/paxml/executors.py", line 297 in start
File "/opt/paxml/paxml/train.py", line 264 in train_and_evaluate
File "/opt/paxml/paxml/main.py", line 277 in run_experiment
File "/opt/paxml/paxml/main.py", line 400 in run
File "/opt/paxml/paxml/main.py", line 456 in main
File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 254 in _run_main
File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 308 in run
File "/opt/paxml/paxml/main.py", line 486 in <module>
Extension modules: jaxlib.cpu_feature_guard, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, zstandard.backend_c, msgpack._cmsgpack, yaml._yaml, google.protobuf.pyext._message, tensorflow.python.framework.fast_tensor_util, charset_normalizer.md, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_lapack, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, PIL._imaging, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, 
pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.hashing, pandas._libs.tslib, pandas._libs.ops, pandas._libs.arrays, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.index, pandas._libs.internals, pandas._libs.join, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, scipy.ndimage._nd_image, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, _ni_label, scipy.ndimage._ni_label, numpy.linalg.lapack_lite, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, 
scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._statlib, scipy.stats._mvn, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._rcont.rcont, sentencepiece._sentencepiece, psutil._psutil_linux, psutil._psutil_posix, editdistance.bycython, matplotlib._c_internal_utils, matplotlib._path, kiwisolver._cext, matplotlib._image, scipy.cluster._vq, scipy.cluster._hierarchy, scipy.cluster._optimal_leaf_ordering, lxml._elementpath, lxml.etree, sklearn.__check_build._check_build, sklearn.utils.murmurhash, sklearn.utils._isfinite, sklearn.utils._openmp_helpers, sklearn.utils._logistic_sigmoid, sklearn.utils.sparsefuncs_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.utils._typedefs, sklearn.utils._readonly_array_wrapper, sklearn.metrics._dist_metrics, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_fast, regex._regex, sklearn.feature_extraction._hashing_fast, sklearn.svm._libsvm, sklearn.svm._liblinear, sklearn.svm._libsvm_sparse, sklearn.utils._random, 
sklearn.utils._seq_dataset, sklearn.utils.arrayfuncs, sklearn.linear_model._cd_fast, sklearn._loss._loss, sklearn.utils._weight_vector, sklearn.linear_model._sgd_fast, sklearn.linear_model._sag_fast, sklearn.datasets._svmlight_format_fast, scipy.io.matlab._mio_utils, scipy.io.matlab._streams, scipy.io.matlab._mio5_utils (total: 238)
@stellaraccident

Note the ASAN stack trace: https://gist.github.com/trevor-m/d1d8912f8ab0d96da315a0f6e2f4aff5

AddressSanitizer:DEADLYSIGNAL
=================================================================
==54208==ERROR: AddressSanitizer: SEGV on unknown address 0x00000000d3c0 (pc 0x7f8464007a7c bp 0x00000000d3c0 sp 0x6310013ea1f0 T0)
==54208==The signal is caused by a READ memory access.
    #0 0x7f8464007a7c in pthread_kill (/usr/lib/x86_64-linux-gnu/libc.so.6+0x96a7c) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)
    #1 0x7f8463fb3475 in gsignal (/usr/lib/x86_64-linux-gnu/libc.so.6+0x42475) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)
    #2 0x7f8463fb351f  (/usr/lib/x86_64-linux-gnu/libc.so.6+0x4251f) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)
    #3 0x7f8282aff8f6 in iree_hal_resource_set_free resource_set.c
    #4 0x7f8282afb705 in iree_hal_deferred_command_buffer_destroy deferred_command_buffer.c
    #5 0x7f828231da39 in iree_vm_ref_move ref.c
    #6 0x7f82822f2699 in iree_vm_bytecode_issue_import_call dispatch.c
    #7 0x7f82822f05c1 in iree_vm_bytecode_call_import dispatch.c
    #8 0x7f82822e41ad in iree_vm_bytecode_dispatch dispatch.c
    #9 0x7f82822db8c8 in iree_vm_bytecode_dispatch_begin dispatch.c
    #10 0x7f82822d9abc in iree_vm_bytecode_module_begin_call module.c
    #11 0x7f8282315ff7 in iree_vm_begin_invoke invocation.c
    #12 0x7f8282314f2b in iree_vm_invoke invocation.c
    #13 0x7f828229cd78 in iree::pjrt::LoadedExecutableInstance::BatchExecute(PJRT_LoadedExecutable_Execute_Args*) api_impl.cc
    #14 0x7f82822a8549 in iree::pjrt::LoadedExecutableInstance::BindApi(PJRT_Api*)::$_54::__invoke(PJRT_LoadedExecutable_Execute_Args*) api_impl.cc
    #15 0x7f845a2c48fa in xla::PjRtCApiLoadedExecutable::Execute(absl::lts_20230125::Span<std::vector<xla::PjRtBuffer*, std::allocator<xla::PjRtBuffer*> > const>, xla::ExecuteOptions const&, std::optional<std::vector<xla::PjRtFuture<absl::lts_20230125::Status>, std::allocator<xla::PjRtFuture<absl::lts_20230125::Status> > > >&) (/usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so+0xc838fa) (BuildId: 87a9c5e3db2565f8631e59fe5e690269)
    #16 0x7f845c885dd6 in xla::ifrt::PjRtLoadedExecutable::Execute(absl::lts_20230125::Span<tsl::RCReference<xla::ifrt::Array> >, xla::ExecuteOptions const&, std::optional<xla::ifrt::DeviceList>) (/usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so+0x3244dd6) (BuildId: 87a9c5e3db2565f8631e59fe5e690269)
    #17 0x7f845a273126 in absl::lts_20230125::StatusOr<xla::PyExecuteResults> xla::(anonymous namespace)::ExecuteShardedOnLocalDevicesInternal<std::variant<xla::PyArray, std::vector<xla::PyArray, std::allocator<xla::PyArray> > >, xla::(anonymous namespace)::ShardedBufferAdapter<std::variant<xla::PyArray, std::vector<xla::PyArray, std::allocator<xla::PyArray> > > > >(xla::ExecuteOptions const&, std::shared_ptr<xla::PyClient> const&, xla::ifrt::LoadedExecutable*, absl::lts_20230125::Span<std::variant<xla::PyArray, std::vector<xla::PyArray, std::allocator<xla::PyArray> > > const>, std::optional<std::vector<xla::PjRtFuture<absl::lts_20230125::Status>, std::allocator<xla::PjRtFuture<absl::lts_20230125::Status> > > >&) py_executable.cc
    #18 0x7f845a27447d in xla::PyLoadedExecutable::ExecuteSharded(std::vector<std::variant<xla::PyArray, std::vector<xla::PyArray, std::allocator<xla::PyArray> > >, std::allocator<std::variant<xla::PyArray, std::vector<xla::PyArray, std::allocator<xla::PyArray> > > > >, bool) (/usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so+0xc3347d) (BuildId: 87a9c5e3db2565f8631e59fe5e690269)
    #19 0x7f8459f68be3 in void pybind11::cpp_function::initialize<xla::ValueOrThrowWrapper<absl::lts_20230125::StatusOr<xla::PyExecuteResults> (std::vector<std::variant<xla::PyArray, std::vector<xla::PyArray, std::allocator<xla::PyArray> > >, std::allocator<std::variant<xla::PyArray, std::vector<xla::PyArray, std::allocator<xla::PyArray> > > > >, bool), xla::PyLoadedExecutable>, xla::PyExecuteResults, xla::PyLoadedExecutable&, std::vector<std::variant<xla::PyArray, std::vector<xla::PyArray, std::allocator<xla::PyArray> > >, std::allocator<std::variant<xla::PyArray, std::vector<xla::PyArray, std::allocator<xla::PyArray> > > > >, bool, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg_v>(xla::ValueOrThrowWrapper<absl::lts_20230125::StatusOr<xla::PyExecuteResults> (std::vector<std::variant<xla::PyArray, std::vector<xla::PyArray, std::allocator<xla::PyArray> > >, std::allocator<std::variant<xla::PyArray, std::vector<xla::PyArray, std::allocator<xla::PyArray> > > > >, bool), xla::PyLoadedExecutable>&&, xla::PyExecuteResults (*)(xla::PyLoadedExecutable&, std::vector<std::variant<xla::PyArray, std::vector<xla::PyArray, std::allocator<xla::PyArray> > >, std::allocator<std::variant<xla::PyArray, std::vector<xla::PyArray, std::allocator<xla::PyArray> > > > >, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg_v const&)::'lambda1'(pybind11::detail::function_call&)::operator()(pybind11::detail::function_call&) const (/usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so+0x927be3) (BuildId: 87a9c5e3db2565f8631e59fe5e690269)
    #20 0x7f8459f3d8e0 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) (/usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so+0x8fc8e0) (BuildId: 87a9c5e3db2565f8631e59fe5e690269)
    #21 0x55e76416799d  (/usr/bin/python3.10+0x15c99d) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #22 0x55e76415e4aa in _PyObject_MakeTpCall (/usr/bin/python3.10+0x1534aa) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #23 0x55e764175f0a  (/usr/bin/python3.10+0x16af0a) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #24 0x55e764156461 in _PyEval_EvalFrameDefault (/usr/bin/python3.10+0x14b461) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #25 0x55e7641681eb in _PyFunction_Vectorcall (/usr/bin/python3.10+0x15d1eb) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #26 0x55e764152aef in _PyEval_EvalFrameDefault (/usr/bin/python3.10+0x147aef) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #27 0x55e76415d633 in _PyObject_FastCallDictTstate (/usr/bin/python3.10+0x152633) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #28 0x55e764172d10 in _PyObject_Call_Prepend (/usr/bin/python3.10+0x167d10) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #29 0x55e76429060f  (/usr/bin/python3.10+0x28560f) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #30 0x55e76417687a in PyObject_Call (/usr/bin/python3.10+0x16b87a) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #31 0x55e764152aef in _PyEval_EvalFrameDefault (/usr/bin/python3.10+0x147aef) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #32 0x55e7641681eb in _PyFunction_Vectorcall (/usr/bin/python3.10+0x15d1eb) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #33 0x55e764152aef in _PyEval_EvalFrameDefault (/usr/bin/python3.10+0x147aef) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #34 0x55e7641681eb in _PyFunction_Vectorcall (/usr/bin/python3.10+0x15d1eb) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #35 0x7f845a16bb9c in jax::PmapFunction::Call(pybind11::handle, _object* const*, unsigned long, _object*) (/usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so+0xb2ab9c) (BuildId: 87a9c5e3db2565f8631e59fe5e690269)
    #36 0x7f845a16c39a in JaxPmapFunction_tp_vectorcall (/usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so+0xb2b39a) (BuildId: 87a9c5e3db2565f8631e59fe5e690269)
    #37 0x55e764150784 in _PyEval_EvalFrameDefault (/usr/bin/python3.10+0x145784) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #38 0x55e7641681eb in _PyFunction_Vectorcall (/usr/bin/python3.10+0x15d1eb) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #39 0x55e764150784 in _PyEval_EvalFrameDefault (/usr/bin/python3.10+0x145784) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #40 0x55e7641681eb in _PyFunction_Vectorcall (/usr/bin/python3.10+0x15d1eb) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #41 0x55e7641508ca in _PyEval_EvalFrameDefault (/usr/bin/python3.10+0x1458ca) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #42 0x55e7641681eb in _PyFunction_Vectorcall (/usr/bin/python3.10+0x15d1eb) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #43 0x55e7641508ca in _PyEval_EvalFrameDefault (/usr/bin/python3.10+0x1458ca) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #44 0x55e7641681eb in _PyFunction_Vectorcall (/usr/bin/python3.10+0x15d1eb) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #45 0x55e7641768e1 in PyObject_Call (/usr/bin/python3.10+0x16b8e1) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #46 0x55e764152aef in _PyEval_EvalFrameDefault (/usr/bin/python3.10+0x147aef) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #47 0x55e7641681eb in _PyFunction_Vectorcall (/usr/bin/python3.10+0x15d1eb) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #48 0x55e7641508ca in _PyEval_EvalFrameDefault (/usr/bin/python3.10+0x1458ca) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #49 0x55e7641681eb in _PyFunction_Vectorcall (/usr/bin/python3.10+0x15d1eb) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #50 0x55e764151ade in _PyEval_EvalFrameDefault (/usr/bin/python3.10+0x146ade) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #51 0x55e7641681eb in _PyFunction_Vectorcall (/usr/bin/python3.10+0x15d1eb) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #52 0x55e764151ade in _PyEval_EvalFrameDefault (/usr/bin/python3.10+0x146ade) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #53 0x55e7641681eb in _PyFunction_Vectorcall (/usr/bin/python3.10+0x15d1eb) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #54 0x55e764151ade in _PyEval_EvalFrameDefault (/usr/bin/python3.10+0x146ade) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #55 0x55e7641681eb in _PyFunction_Vectorcall (/usr/bin/python3.10+0x15d1eb) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #56 0x55e764150784 in _PyEval_EvalFrameDefault (/usr/bin/python3.10+0x145784) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #57 0x55e7641681eb in _PyFunction_Vectorcall (/usr/bin/python3.10+0x15d1eb) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #58 0x55e764150784 in _PyEval_EvalFrameDefault (/usr/bin/python3.10+0x145784) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #59 0x55e7641681eb in _PyFunction_Vectorcall (/usr/bin/python3.10+0x15d1eb) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #60 0x55e764151ade in _PyEval_EvalFrameDefault (/usr/bin/python3.10+0x146ade) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #61 0x55e76414ced5  (/usr/bin/python3.10+0x141ed5) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #62 0x55e764243365 in PyEval_EvalCode (/usr/bin/python3.10+0x238365) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #63 0x55e764270107  (/usr/bin/python3.10+0x265107) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #64 0x55e764268f5a  (/usr/bin/python3.10+0x25df5a) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #65 0x55e76426fe54  (/usr/bin/python3.10+0x264e54) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #66 0x55e76426f337 in _PyRun_SimpleFileObject (/usr/bin/python3.10+0x264337) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #67 0x55e76426f032 in _PyRun_AnyFileObject (/usr/bin/python3.10+0x264032) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #68 0x55e7642602dd in Py_RunMain (/usr/bin/python3.10+0x2552dd) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #69 0x55e76423632c in Py_BytesMain (/usr/bin/python3.10+0x22b32c) (BuildId: 148e086667839ef13939196984d6f717c331bd76)
    #70 0x7f8463f9ad8f  (/usr/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)
    #71 0x7f8463f9ae3f in __libc_start_main (/usr/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)
    #72 0x55e764236224 in _start (/usr/bin/python3.10+0x22b224) (BuildId: 148e086667839ef13939196984d6f717c331bd76)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/usr/lib/x86_64-linux-gnu/libc.so.6+0x96a7c) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d) in pthread_kill
==54208==ABORTING
