Created June 2, 2023 22:36
PAXML + PJRT Segfault with LOGGING_ENABLED
2023-06-02 22:27:23.476734: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu | |
2023-06-02 22:27:30.192610: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu | |
2023-06-02 22:27:30.192664: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu | |
2023-06-02 22:27:30.192699: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu | |
2023-06-02 22:27:30.192733: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu | |
2023-06-02 22:27:30.218344: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu | |
2023-06-02 22:27:30.218504: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. | |
Skipping registering GPU devices... | |
[IREE-PJRT] DEBUG: Using IREE compiler binary: /usr/local/lib/python3.10/dist-packages/iree/compiler/_mlir_libs/libIREECompiler.so | |
[IREE-PJRT] DEBUG: Compiler Version: 20230601.538 @ 77ac727c7f6cbe32ca8972b8c69bd32cba17b690 (API version 1.2) | |
[IREE-PJRT] DEBUG: Using partitioner binary: /workspace/openxla-pjrt-plugin/bazel-bin/partitioner/libOpenXLAPartitioner.so | |
[IREE-PJRT] DEBUG: Partitioner version: <unknown> (API version 1.1) | |
[IREE-PJRT] DEBUG: CUDA driver created | |
I0602 22:27:30.226667 140467571585024 setup_jax.py:74] JAX process: 0 / 1 | |
I0602 22:27:30.226812 140467571585024 setup_jax.py:75] JAX devices: [GPU-b0fbccec-7593-9c0c-35de-cbfc04b9d09a] | |
I0602 22:27:30.227033 140467571585024 setup_jax.py:76] jax.device_count(): 1 | |
I0602 22:27:30.227144 140467571585024 setup_jax.py:77] jax.local_device_count(): 1 | |
I0602 22:27:30.227181 140467571585024 setup_jax.py:78] jax.process_count(): 1 | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LargeMlp` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.SmallMlp` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudTransformerAdam` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudTransformerAdamTest` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudTransformerAdamLimitSteps` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmdTest` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmd2B` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmd2BLimitSteps` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmd32B` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmd64B` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmd128B` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmd256B` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmd512B` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmd1024B` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmdPipeline9B` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmdPipeline175B` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmdMultislice2B` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmdPipelineMultislice2B` | |
Registered experiment `paxml.tasks.lm.params.lm_cloud.LmCloudSpmdPipelineMultislice2BCircular` | |
Registered experiment `paxml.tasks.lm.params.c4.LmCloudSpmdAdam` | |
Registered experiment `paxml.tasks.lm.params.c4.LmCloudSpmdAdamLimitSteps` | |
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdAdam` | |
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdGpt3AdamOrgHPBS1p5k1536Replicas` | |
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdPipelineAdam` | |
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdPipelineGpt3AdamOrgHPBS1p5k768Replicas` | |
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdPipelineGpt3AdamMLPerfHPBS1p5k768Replicas` | |
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdPipelineGpt3AdamMLPerfHPBS2k512Replicas` | |
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdPipelineGpt3AdamMLPerfHPBS3k768Replicas` | |
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdPipelineGpt3AdamMLPerfHPBS4k1024Replicas` | |
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdPipelineGpt3AdamMLPerfHPBS8k1024Replicas` | |
Registered experiment `paxml.tasks.lm.params.c4.C4Spmd1BAdam4Replicas` | |
Registered experiment `paxml.tasks.lm.params.c4.C4Spmd1BAdam4ReplicasLimitSteps` | |
Registered experiment `paxml.tasks.lm.params.c4.C4Spmd2BAdam4Replicas` | |
Registered experiment `paxml.tasks.lm.params.c4.C4Spmd16BAdam32Replicas` | |
Registered experiment `paxml.tasks.lm.params.c4.C4Spmd32BAdam64Replicas` | |
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdGpt3L16AdamOrgHP` | |
Registered experiment `paxml.tasks.lm.params.c4.C4SpmdPipelineGpt3SmallAdam8Replicas` | |
W0602 22:27:31.231858 140467571585024 gpu_fast_attention.py:41] jax_triton not found, please `pip install jax-triton` | |
Registered experiment `tasks.lm.params.nvidia.NVIDIA1_3B` | |
Registered experiment `tasks.lm.params.nvidia.NVIDIA5B` | |
Registered experiment `tasks.lm.params.nvidia.NVIDIA8_3B` | |
Registered experiment `tasks.lm.params.nvidia.NVIDIA10B` | |
Registered experiment `tasks.lm.params.nvidia.NVIDIA40BProxy` | |
Registered experiment `tasks.lm.params.nvidia.NVIDIA70BProxy` | |
Registered experiment `tasks.lm.params.nvidia.NVIDIA116BProxy` | |
Registered experiment `tasks.lm.params.nvidia.NVIDIA175BProxy` | |
Registered experiment `tasks.lm.params.nvidia.TestSmallConfig` | |
Registered experiment `tasks.lm.params.nvidia.NVIDIA1_3BPmap` | |
I0602 22:27:31.236617 140467571585024 local.py:45] Setting task status: process_index: 0, process_count: 1 | |
I0602 22:27:31.236836 140467571585024 local.py:50] Created artifact job_log_dir of type ArtifactType.DIRECTORY and value log_NVIDIA1_3BPmap. | |
I0602 22:27:31.590877 140467571585024 local.py:45] Setting task status: Train experiment tasks.lm.params.nvidia.NVIDIA1_3BPmap at log_NVIDIA1_3BPmap | |
I0602 22:27:31.590993 140467571585024 train.py:139] [PAX STATUS] Starting `train_and_evaluate` | |
I0602 22:27:31.698075 140467571585024 train.py:146] [PAX STATUS] Obtaining and initializing datasets. | |
I0602 22:27:31.699886 140467571585024 train.py:162] [PAX STATUS]: Done initializing dataset objects | |
I0602 22:27:31.699933 140467571585024 train.py:164] train_input_p: | |
I0602 22:27:31.700371 140467571585024 train.py:168] allow_fixed_file_random_seed : False | |
I0602 22:27:31.700420 140467571585024 train.py:168] batch_padding_size : 0 | |
I0602 22:27:31.700451 140467571585024 train.py:168] batch_size : NoneType | |
I0602 22:27:31.700481 140467571585024 train.py:168] cls : type/praxis.base_input/LingvoInputAdaptor | |
I0602 22:27:31.700511 140467571585024 train.py:168] cluster_do_eval : False | |
I0602 22:27:31.700541 140467571585024 train.py:168] custom_device_order : NoneType | |
I0602 22:27:31.700570 140467571585024 train.py:168] eval_loop_num_batches : 1 | |
I0602 22:27:31.700599 140467571585024 train.py:168] experimental_remote_input : False | |
I0602 22:27:31.700628 140467571585024 train.py:168] infeed_host_index : 0 | |
I0602 22:27:31.700657 140467571585024 train.py:168] input.activation_split_dims_mapping : NoneType | |
I0602 22:27:31.700687 140467571585024 train.py:168] input.add_name_to_theta : False | |
I0602 22:27:31.700716 140467571585024 train.py:168] input.allow_implicit_capture : NoneType | |
I0602 22:27:31.700745 140467571585024 train.py:168] input.batch_size : 1 | |
I0602 22:27:31.700774 140467571585024 train.py:168] input.cls : type/paxml.tasks.lm.input_generator/SyntheticLmData | |
I0602 22:27:31.700803 140467571585024 train.py:168] input.decoder_samples_per_summary : NoneType | |
I0602 22:27:31.700833 140467571585024 train.py:168] input.device_mesh : NoneType | |
I0602 22:27:31.700862 140467571585024 train.py:168] input.dtype : float32 | |
I0602 22:27:31.700891 140467571585024 train.py:168] input.eval_samples_per_summary : NoneType | |
I0602 22:27:31.700920 140467571585024 train.py:168] input.file_datasource : NoneType | |
I0602 22:27:31.700949 140467571585024 train.py:168] input.filter_sparse_tensors : False | |
I0602 22:27:31.700978 140467571585024 train.py:168] input.fprop_dtype : NoneType | |
I0602 22:27:31.701008 140467571585024 train.py:168] input.inference_driver_name : NoneType | |
I0602 22:27:31.701036 140467571585024 train.py:168] input.input_stats_summary_interval_steps : 10 | |
I0602 22:27:31.701066 140467571585024 train.py:168] input.is_inference : NoneType | |
I0602 22:27:31.701095 140467571585024 train.py:168] input.name : 'input' | |
I0602 22:27:31.701132 140467571585024 train.py:168] input.num_partitions : NoneType | |
I0602 22:27:31.701162 140467571585024 train.py:168] input.num_samples : 0 | |
I0602 22:27:31.701192 140467571585024 train.py:168] input.outfeed_in_logical_order : False | |
I0602 22:27:31.701222 140467571585024 train.py:168] input.params_init.custom_v_init : NoneType | |
I0602 22:27:31.701251 140467571585024 train.py:168] input.params_init.method : 'xavier' | |
I0602 22:27:31.701285 140467571585024 train.py:168] input.params_init.scale : 1.000001 | |
I0602 22:27:31.701314 140467571585024 train.py:168] input.params_init.seed : NoneType | |
I0602 22:27:31.701343 140467571585024 train.py:168] input.random_seed : NoneType | |
I0602 22:27:31.701372 140467571585024 train.py:168] input.remote.max_inflights_per_target : 32 | |
I0602 22:27:31.701400 140467571585024 train.py:168] input.resettable : False | |
I0602 22:27:31.701430 140467571585024 train.py:168] input.seq_len : 2048 | |
I0602 22:27:31.701459 140467571585024 train.py:168] input.skip_lp_regularization : NoneType | |
I0602 22:27:31.701488 140467571585024 train.py:168] input.tpu_embedding_mode : 'train' | |
I0602 22:27:31.701517 140467571585024 train.py:168] input.tpu_infeed_parallelism : 1 | |
I0602 22:27:31.701546 140467571585024 train.py:168] input.use_partitioned_infeed_queue : False | |
I0602 22:27:31.701575 140467571585024 train.py:168] input.use_per_core_infeed : False | |
I0602 22:27:31.701603 140467571585024 train.py:168] input.use_per_host_infeed : False | |
I0602 22:27:31.701632 140467571585024 train.py:168] input.vn.deterministic : NoneType | |
I0602 22:27:31.701662 140467571585024 train.py:168] input.vn.global_vn : False | |
I0602 22:27:31.701690 140467571585024 train.py:168] input.vn.per_step_vn : False | |
I0602 22:27:31.701719 140467571585024 train.py:168] input.vn.scale : NoneType | |
I0602 22:27:31.701749 140467571585024 train.py:168] input.vn.seed : NoneType | |
I0602 22:27:31.701778 140467571585024 train.py:168] input.vn.start_step : 0 | |
I0602 22:27:31.701807 140467571585024 train.py:168] input.weight_split_dims_mapping : NoneType | |
I0602 22:27:31.701836 140467571585024 train.py:168] input_checkpointing_enabled : False | |
I0602 22:27:31.701865 140467571585024 train.py:168] input_random_seed : NoneType | |
I0602 22:27:31.701894 140467571585024 train.py:168] is_training : True | |
I0602 22:27:31.701923 140467571585024 train.py:168] name : '' | |
I0602 22:27:31.701952 140467571585024 train.py:168] num_batches : NoneType | |
I0602 22:27:31.701982 140467571585024 train.py:168] num_infeed_hosts : 0 | |
I0602 22:27:31.702010 140467571585024 train.py:168] reset_for_eval : False | |
I0602 22:27:31.702039 140467571585024 train.py:168] tf_data_service_address : NoneType | |
I0602 22:27:31.702069 140467571585024 train.py:169] task_p: | |
I0602 22:27:31.710068 140467571585024 train.py:171] cls : type/paxml.tasks_lib/SingleTask | |
I0602 22:27:31.710117 140467571585024 train.py:171] decode.cls : type/paxml.tasks_lib/SingleTask.Decode | |
I0602 22:27:31.710151 140467571585024 train.py:171] decode.prng_key_fold_with_batch_index : False | |
I0602 22:27:31.710183 140467571585024 train.py:171] decode.prng_key_fold_with_global_step : True | |
I0602 22:27:31.710213 140467571585024 train.py:171] decode.random_seed : 1234 | |
I0602 22:27:31.710242 140467571585024 train.py:171] early_stopping_fn : NoneType | |
I0602 22:27:31.710272 140467571585024 train.py:171] evaluate.apply_mutable_list : ['aux_loss', 'summaries', 'non_trainable'] | |
I0602 22:27:31.710302 140467571585024 train.py:171] evaluate.cls : type/paxml.tasks_lib/SingleTask.Evaluate | |
I0602 22:27:31.710331 140467571585024 train.py:171] evaluate.random_seed : 1234 | |
I0602 22:27:31.710360 140467571585024 train.py:171] infer.cls : type/paxml.tasks_lib/SingleTask.Infer | |
I0602 22:27:31.710390 140467571585024 train.py:171] infer.random_seed : 1234 | |
I0602 22:27:31.710419 140467571585024 train.py:171] infer_writer : NoneType | |
I0602 22:27:31.710449 140467571585024 train.py:171] loss_aggregator : NoneType | |
I0602 22:27:31.710482 140467571585024 train.py:171] metrics : NoneType | |
I0602 22:27:31.710511 140467571585024 train.py:171] model.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.710540 140467571585024 train.py:171] model.apply_eval_sample_weights : False | |
I0602 22:27:31.710570 140467571585024 train.py:171] model.cls : type/praxis.layers.models/LanguageModel | |
I0602 22:27:31.710598 140467571585024 train.py:171] model.contiguous_submeshes : NoneType | |
I0602 22:27:31.710628 140467571585024 train.py:171] model.count_tokens : False | |
I0602 22:27:31.710657 140467571585024 train.py:171] model.dcn_mesh_shape : NoneType | |
I0602 22:27:31.710686 140467571585024 train.py:171] model.decoder_tpl.cls : type/praxis.decoder_hparams/GreedyDecoderHParams | |
I0602 22:27:31.710716 140467571585024 train.py:171] model.decoder_tpl.decode_loop_mesh_axes_transpose : NoneType | |
I0602 22:27:31.710744 140467571585024 train.py:171] model.decoder_tpl.emb_lookup_style : 'matmul' | |
I0602 22:27:31.710774 140467571585024 train.py:171] model.decoder_tpl.eos_id : 2 | |
I0602 22:27:31.710803 140467571585024 train.py:171] model.decoder_tpl.fprop_for_prefix : False | |
I0602 22:27:31.710831 140467571585024 train.py:171] model.decoder_tpl.lazy_prefix_broadcast : False | |
I0602 22:27:31.710860 140467571585024 train.py:171] model.decoder_tpl.max_decode_steps : NoneType | |
I0602 22:27:31.710889 140467571585024 train.py:171] model.decoder_tpl.min_prefix_len : 5 | |
I0602 22:27:31.710919 140467571585024 train.py:171] model.decoder_tpl.process_result_fn : NoneType | |
I0602 22:27:31.710948 140467571585024 train.py:171] model.decoder_tpl.seqlen : 0 | |
I0602 22:27:31.710977 140467571585024 train.py:171] model.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.711006 140467571585024 train.py:171] model.fprop_dtype : dtype[float32] | |
I0602 22:27:31.711036 140467571585024 train.py:171] model.ici_mesh_shape : NoneType | |
I0602 22:27:31.711064 140467571585024 train.py:171] model.lm_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.711093 140467571585024 train.py:171] model.lm_tpl.cls : type/praxis.layers.transformer_models/TransformerLm | |
I0602 22:27:31.711122 140467571585024 train.py:171] model.lm_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.711152 140467571585024 train.py:171] model.lm_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.711181 140467571585024 train.py:171] model.lm_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.711210 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.711239 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.cls : type/praxis.layers.normalizations/LayerNorm | |
I0602 22:27:31.711268 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.711297 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.711326 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.dim : 0 | |
I0602 22:27:31.711355 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.711384 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.epsilon : 1e-06 | |
I0602 22:27:31.711413 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.711442 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.711472 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.711501 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.name : NoneType | |
I0602 22:27:31.711530 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.711559 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.711588 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.711617 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.reductions_in_fp32 : False | |
I0602 22:27:31.711647 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.711676 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.711705 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.use_bias : True | |
I0602 22:27:31.711734 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.use_scale : True | |
I0602 22:27:31.711763 140467571585024 train.py:171] model.lm_tpl.final_ln_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.711792 140467571585024 train.py:171] model.lm_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.711822 140467571585024 train.py:171] model.lm_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.711851 140467571585024 train.py:171] model.lm_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.711880 140467571585024 train.py:171] model.lm_tpl.model_dims : 2048 | |
I0602 22:27:31.711910 140467571585024 train.py:171] model.lm_tpl.model_type : 'causal' | |
I0602 22:27:31.711939 140467571585024 train.py:171] model.lm_tpl.name : NoneType | |
I0602 22:27:31.711968 140467571585024 train.py:171] model.lm_tpl.ngrammer_tpl : NoneType | |
I0602 22:27:31.711998 140467571585024 train.py:171] model.lm_tpl.packed_input : True | |
I0602 22:27:31.712027 140467571585024 train.py:171] model.lm_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.712056 140467571585024 train.py:171] model.lm_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.712085 140467571585024 train.py:171] model.lm_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.712115 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.activation_split_dims_mapping.emb_out_split_dims_mapping : NoneType | |
I0602 22:27:31.712143 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.712172 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.712201 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.cls : type/praxis.layers.base_ops/ArrayLookup | |
I0602 22:27:31.712230 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.712260 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.712288 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.712317 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.712346 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.712375 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.712404 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.name : NoneType | |
I0602 22:27:31.712433 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.712462 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.712491 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.712520 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.712549 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.712578 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.array_lookup_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.712606 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.cls : type/praxis.layers.embedding_softmax/TrainablePositionalEmbedding | |
I0602 22:27:31.712636 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.712666 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.712696 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.712725 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.712754 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.cls : type/praxis.layers.base_ops/Einsum | |
I0602 22:27:31.712783 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.712812 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.712841 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.712870 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.712898 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.712927 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.712956 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.name : NoneType | |
I0602 22:27:31.712984 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.713014 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.713042 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.713071 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.713106 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.713136 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.einsum_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.713165 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.embedding_dims : 0 | |
I0602 22:27:31.713195 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.713224 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.713253 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.lookup_style : 'matmul' | |
I0602 22:27:31.713282 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.max_seq_length : 2048 | |
I0602 22:27:31.713311 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.max_timescale : 10000 | |
I0602 22:27:31.713340 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.713370 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.min_timescale : 1 | |
I0602 22:27:31.713398 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.name : NoneType | |
I0602 22:27:31.713428 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.713457 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.713486 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.713515 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.713544 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.713574 140467571585024 train.py:171] model.lm_tpl.position_emb_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.713603 140467571585024 train.py:171] model.lm_tpl.post_attention_ngrammer_tpls : NoneType | |
I0602 22:27:31.713632 140467571585024 train.py:171] model.lm_tpl.record_activations_in_xent_output : False | |
I0602 22:27:31.713661 140467571585024 train.py:171] model.lm_tpl.separate_embedding_tpl : NoneType | |
I0602 22:27:31.713690 140467571585024 train.py:171] model.lm_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.713720 140467571585024 train.py:171] model.lm_tpl.skip_aux_loss : False | |
I0602 22:27:31.713748 140467571585024 train.py:171] model.lm_tpl.skip_compute_loss : False | |
I0602 22:27:31.713778 140467571585024 train.py:171] model.lm_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.713807 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.activation_split_dims_mapping.emb_out_split_dims_mapping : NoneType | |
I0602 22:27:31.713836 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.713865 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.713894 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.cls : type/praxis.layers.base_ops/ArrayLookup | |
I0602 22:27:31.713923 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.713952 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.713980 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.714009 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.714038 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.714067 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.714096 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.name : NoneType | |
I0602 22:27:31.714125 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.714154 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.714183 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.714211 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.714240 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.714269 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.array_lookup_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.714297 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.bi_tempered_loss_tpl : NoneType | |
I0602 22:27:31.714326 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.bias_init : 0.0 | |
I0602 22:27:31.714355 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.cls : type/praxis.layers.embedding_softmax/SharedEmbeddingSoftmax | |
I0602 22:27:31.714384 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.714413 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.714442 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.714471 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.714500 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.cls : type/praxis.layers.base_ops/Einsum | |
I0602 22:27:31.714529 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.714558 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.714587 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.714616 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.714645 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.714674 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.714703 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.name : NoneType | |
I0602 22:27:31.714732 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.714761 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.714790 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.714819 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.714848 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.714877 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.einsum_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.714906 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.714936 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.714965 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.cls : type/praxis.layers.activations/ReLU | |
I0602 22:27:31.714994 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.715023 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.715052 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.715080 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.715109 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.715137 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.715166 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.name : NoneType | |
I0602 22:27:31.715195 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.715224 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.715252 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.715281 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.715310 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.715339 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.activation_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.715367 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.bias_init : 0.0 | |
I0602 22:27:31.715396 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.cls : type/praxis.layers.linears/FeedForward | |
I0602 22:27:31.715424 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.715453 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.715481 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.715510 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.715539 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.has_bias : True | |
I0602 22:27:31.715568 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.715597 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.input_dims : 0 | |
I0602 22:27:31.715626 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.715655 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.cls : type/praxis.layers.linears/Linear | |
I0602 22:27:31.715683 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.715712 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.715740 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.715769 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.715798 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.cls : type/praxis.layers.base_ops/Einsum | |
I0602 22:27:31.715827 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.715856 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.715885 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.715914 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.715942 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.715970 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.715999 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.name : NoneType | |
I0602 22:27:31.716028 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.716056 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.716085 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.716114 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.716143 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.716171 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.einsum_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.716200 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.716228 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.716256 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.input_dims : 0 | |
I0602 22:27:31.716284 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.716312 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.name : NoneType | |
I0602 22:27:31.716341 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.output_dims : 0 | |
I0602 22:27:31.716369 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.716398 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.716427 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.716455 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.716484 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.716512 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.weight_init : NoneType | |
I0602 22:27:31.716540 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.linear_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.716568 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.716597 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.name : NoneType | |
I0602 22:27:31.716625 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.output_dims : 0 | |
I0602 22:27:31.716654 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.716682 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.716711 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.716740 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.716768 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.716797 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.weight_init : NoneType | |
I0602 22:27:31.716825 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.feed_forward_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.716854 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.716882 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.716911 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.input_dims : 0 | |
I0602 22:27:31.716940 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.label_smoothing_apply_for_eval : True | |
I0602 22:27:31.716969 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.label_smoothing_prob : 0.0 | |
I0602 22:27:31.716998 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.lookup_style : 'index' | |
I0602 22:27:31.717026 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.717056 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.name : NoneType | |
I0602 22:27:31.717084 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.num_classes : 0 | |
I0602 22:27:31.717129 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.params_init.method : 'gaussian' | |
I0602 22:27:31.717163 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.params_init.scale : 0.022097086912079608 | |
I0602 22:27:31.717195 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.scale_sqrt_depth : True | |
I0602 22:27:31.717224 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.717253 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.717282 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.soft_cap_logits : 30.0 | |
I0602 22:27:31.717311 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.717340 140467571585024 train.py:171] model.lm_tpl.softmax_tpl.z_loss_weight : 0.0 | |
I0602 22:27:31.717369 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.717397 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.717426 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.atten_dropout_prob : NoneType | |
I0602 22:27:31.717455 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.cls : type/praxis.layers.transformers/StackedTransformer | |
I0602 22:27:31.717484 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.contiguous_submeshes : NoneType | |
I0602 22:27:31.717512 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.dcn_mesh_shape : NoneType | |
I0602 22:27:31.717541 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.dim_per_head : 64 | |
I0602 22:27:31.717570 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.dropout_prob : 0.0 | |
I0602 22:27:31.717600 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.717628 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.fold_padding_with_segment_mask : False | |
I0602 22:27:31.717657 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.fprop_dtype : NoneType | |
I0602 22:27:31.717686 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.gating_func : 'top2' | |
I0602 22:27:31.717715 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.hidden_dims : 8192 | |
I0602 22:27:31.717744 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.ici_mesh_shape : NoneType | |
I0602 22:27:31.717772 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.input_dropout_prob : 0.0 | |
I0602 22:27:31.717801 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.mask_self_attention : False | |
I0602 22:27:31.717830 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.mesh_axis_names : NoneType | |
I0602 22:27:31.717860 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.min_group_size : NoneType | |
I0602 22:27:31.717889 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.model_dims : 2048 | |
I0602 22:27:31.717918 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.egch : NoneType | |
I0602 22:27:31.717947 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.egcm : NoneType | |
I0602 22:27:31.717976 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.gec : NoneType | |
I0602 22:27:31.718006 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.gecm : NoneType | |
I0602 22:27:31.718035 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.gecs : NoneType | |
I0602 22:27:31.718063 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.gs : NoneType | |
I0602 22:27:31.718092 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.gsec : NoneType | |
I0602 22:27:31.718121 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.gsm : NoneType | |
I0602 22:27:31.718151 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.718180 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.718209 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.cls : type/praxis.layers.activations/ReLU | |
I0602 22:27:31.718238 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.718267 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.718296 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.718325 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.718354 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.718383 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.718412 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.name : NoneType | |
I0602 22:27:31.718441 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.718470 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.718499 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.718529 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.718558 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.718587 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.activation_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.718616 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.add_skip_connection : True | |
I0602 22:27:31.718645 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.apply_padding_first : False | |
I0602 22:27:31.718673 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.cls : type/praxis.layers.transformers/TransformerFeedForwardMoe | |
I0602 22:27:31.718702 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.718730 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.718759 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.718787 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.expert_capacity_dim : 0 | |
I0602 22:27:31.718815 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.expert_weight_shards : 1 | |
I0602 22:27:31.718844 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.explicit_fan_in_fan_out_axes : False | |
I0602 22:27:31.718872 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.718901 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.gating_func : 'top2' | |
I0602 22:27:31.718930 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.gating_logit_cap : 0.0 | |
I0602 22:27:31.718958 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.hidden_dims : 0 | |
I0602 22:27:31.718987 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.719016 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.input_dims : 0 | |
I0602 22:27:31.719044 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.internal_gshard_variance_scaling_fan_in_init : True | |
I0602 22:27:31.719073 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.719101 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.cls : type/praxis.layers.normalizations/LayerNorm | |
I0602 22:27:31.719131 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.719160 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.719189 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.dim : 0 | |
I0602 22:27:31.719217 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.719246 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.epsilon : 1e-06 | |
I0602 22:27:31.719275 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.719304 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.719333 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.719362 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.name : NoneType | |
I0602 22:27:31.719390 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.719419 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.719448 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.719476 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.reductions_in_fp32 : False | |
I0602 22:27:31.719505 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.719533 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.719562 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.use_bias : True | |
I0602 22:27:31.719591 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.use_scale : True | |
I0602 22:27:31.719619 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.ln_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.719648 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.719677 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.min_group_size : NoneType | |
I0602 22:27:31.719705 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.moe_gating_embedding_level : 'token' | |
I0602 22:27:31.719734 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.moe_load_balance_loss_weight : 1.0 | |
I0602 22:27:31.719763 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.name : NoneType | |
I0602 22:27:31.719791 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.norm_policy : 'pre' | |
I0602 22:27:31.719819 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.num_experts : 0 | |
I0602 22:27:31.719848 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.num_groups : 0 | |
I0602 22:27:31.719876 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.719905 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.719933 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.719962 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_prob : 0.0 | |
I0602 22:27:31.719990 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.720020 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.cls : type/praxis.layers.stochastics/Dropout | |
I0602 22:27:31.720048 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.720077 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.720107 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.dropout_at_eval : False | |
I0602 22:27:31.720135 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.720164 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.720193 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.720222 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.keep_prob : 1.0 | |
I0602 22:27:31.720250 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.720278 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.name : NoneType | |
I0602 22:27:31.720307 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.noise_shape : NoneType | |
I0602 22:27:31.720336 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.noise_shape_broadcast_dims : NoneType | |
I0602 22:27:31.720365 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.720393 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.720422 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.720451 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.720480 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.720509 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.transpose_qk : False | |
I0602 22:27:31.720537 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.relu_dropout_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.720566 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_prob : 0.0 | |
I0602 22:27:31.720595 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.720623 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.cls : type/praxis.layers.stochastics/Dropout | |
I0602 22:27:31.720652 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.720681 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.720710 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.dropout_at_eval : False | |
I0602 22:27:31.720739 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.720767 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.720796 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.720825 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.keep_prob : 1.0 | |
I0602 22:27:31.720854 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.720882 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.name : NoneType | |
I0602 22:27:31.720911 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.noise_shape : NoneType | |
I0602 22:27:31.720940 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.noise_shape_broadcast_dims : NoneType | |
I0602 22:27:31.720968 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.720997 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.721026 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.721055 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.721084 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.721122 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.transpose_qk : False | |
I0602 22:27:31.721153 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_dropout_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.721183 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_droppath_prob : 0.0 | |
I0602 22:27:31.721212 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.residual_weight : 1.0 | |
I0602 22:27:31.721240 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.second_expert_policy : 'all' | |
I0602 22:27:31.721269 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.721297 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.721326 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.unadjusted_expert_capacity_factor : 2.0 | |
I0602 22:27:31.721354 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.use_gated_activation : False | |
I0602 22:27:31.721383 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.weight_split_dims_mapping.ehm : NoneType | |
I0602 22:27:31.721411 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.weight_split_dims_mapping.emh : NoneType | |
I0602 22:27:31.721440 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.weight_split_dims_mapping.me : NoneType | |
I0602 22:27:31.721469 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.moe_layer_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.721497 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.name : NoneType | |
I0602 22:27:31.721526 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.ngrammer_tpls : NoneType | |
I0602 22:27:31.721555 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.num_experts : 0 | |
I0602 22:27:31.721583 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.num_groups : 1 | |
I0602 22:27:31.721612 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.num_heads : 32 | |
I0602 22:27:31.721641 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.num_layers : 1 | |
I0602 22:27:31.721670 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.packed_input : False | |
I0602 22:27:31.721699 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.721728 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.params_init.method : 'xavier' | |
I0602 22:27:31.721757 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.params_init.scale : 1.000001 | |
I0602 22:27:31.721785 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.relu_dropout_prob : NoneType | |
I0602 22:27:31.721814 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.residual_dropout_prob : NoneType | |
I0602 22:27:31.721843 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.residual_droppath_prob : 0.0 | |
I0602 22:27:31.721871 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.shared_weight_layer_id : NoneType | |
I0602 22:27:31.721899 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.skip_lp_regularization : NoneType | |
I0602 22:27:31.721928 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.721957 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.allow_skip_cross_attention : False | |
I0602 22:27:31.721986 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.atten_dropout_prob : 0.0 | |
I0602 22:27:31.722014 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.cls : type/praxis.layers.transformers/Transformer | |
I0602 22:27:31.722043 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.722072 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.cross_atten_tpl : NoneType | |
I0602 22:27:31.722101 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.722130 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dim_per_head : NoneType | |
I0602 22:27:31.722159 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.722187 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.cls : type/praxis.layers.stochastics/Dropout | |
I0602 22:27:31.722218 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.722252 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.722280 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.dropout_at_eval : False | |
I0602 22:27:31.722309 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.722339 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.722368 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.722396 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.keep_prob : 1.0 | |
I0602 22:27:31.722425 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.722454 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.name : NoneType | |
I0602 22:27:31.722483 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.noise_shape : NoneType | |
I0602 22:27:31.722512 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.noise_shape_broadcast_dims : NoneType | |
I0602 22:27:31.722542 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.722571 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.722600 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.722629 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.722658 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.722687 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.transpose_qk : False | |
I0602 22:27:31.722716 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dropout_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.722745 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.722774 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.722803 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.hidden_dims : 0 | |
I0602 22:27:31.722832 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.722865 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.input_dims : 0 | |
I0602 22:27:31.722895 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.722926 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.cls : type/praxis.layers.normalizations/LayerNorm | |
I0602 22:27:31.722954 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.722983 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.723013 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.dim : 0 | |
I0602 22:27:31.723042 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.723070 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.epsilon : 1e-06 | |
I0602 22:27:31.723100 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.723128 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.723157 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.723186 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.name : NoneType | |
I0602 22:27:31.723214 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.723243 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.723273 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.723302 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.reductions_in_fp32 : False | |
I0602 22:27:31.723331 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.723360 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.723389 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.use_bias : True | |
I0602 22:27:31.723417 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.use_scale : True | |
I0602 22:27:31.723446 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ln_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.723474 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.723503 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.name : NoneType | |
I0602 22:27:31.723531 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.ngrammer_tpl : NoneType | |
I0602 22:27:31.723560 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.norm_policy : 'pre' | |
I0602 22:27:31.723588 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.num_heads : NoneType | |
I0602 22:27:31.723617 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.packed_input : False | |
I0602 22:27:31.723646 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.723675 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.723705 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.723733 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.relu_dropout_prob : 0.0 | |
I0602 22:27:31.723762 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.residual_dropout_prob : 0.0 | |
I0602 22:27:31.723791 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.residual_droppath_prob : 0.0 | |
I0602 22:27:31.723820 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.723849 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.723878 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.activation_split_dims_mapping.bld : NoneType | |
I0602 22:27:31.723906 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.activation_split_dims_mapping.blnh : NoneType | |
I0602 22:27:31.723936 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.723965 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.atten_dropout_prob : 0.0 | |
I0602 22:27:31.723994 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.atten_logit_cap : 50.0 | |
I0602 22:27:31.724023 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.attention_extra_logit : NoneType | |
I0602 22:27:31.724051 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.attention_mask_summary : False | |
I0602 22:27:31.724080 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.cast_rotary_position_emb : True | |
I0602 22:27:31.724108 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.cls : type/praxis.layers.attentions/DotProductAttention | |
I0602 22:27:31.724137 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combine_qkv : True | |
I0602 22:27:31.724166 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.724195 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.attention_combine_dims : False | |
I0602 22:27:31.724224 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.cls : type/praxis.layers.attentions/CombinedQKVProjectionLayer | |
I0602 22:27:31.724254 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.724282 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.724311 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.dim_per_head : 0 | |
I0602 22:27:31.724340 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.724368 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.724398 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.cls : type/praxis.layers.base_ops/Einsum | |
I0602 22:27:31.724426 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.724455 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.724484 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.724513 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.724541 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.724570 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.724600 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.name : NoneType | |
I0602 22:27:31.724628 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.724657 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.724686 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.724714 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.724743 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.724772 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.einsum_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.724801 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.explicit_fan_in_fan_out_axes : False | |
I0602 22:27:31.724830 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.724858 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.724887 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.input_dim : 0 | |
I0602 22:27:31.724916 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.724945 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.name : NoneType | |
I0602 22:27:31.724974 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.num_heads : 0 | |
I0602 22:27:31.725003 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.725032 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.725060 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.725089 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.725125 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.725154 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.use_bias : True | |
I0602 22:27:31.725186 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.combined_qkv_proj_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.725216 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.725245 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.725274 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dconv_kernel_size : 3 | |
I0602 22:27:31.725303 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dconv_qkv : False | |
I0602 22:27:31.725332 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.decode_cache : True | |
I0602 22:27:31.725361 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dim_per_head : NoneType | |
I0602 22:27:31.725390 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.725418 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.cls : type/praxis.layers.stochastics/Dropout | |
I0602 22:27:31.725447 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.725476 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.725505 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.dropout_at_eval : False | |
I0602 22:27:31.725535 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.725564 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.725593 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.725622 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.keep_prob : 1.0 | |
I0602 22:27:31.725651 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.725679 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.name : NoneType | |
I0602 22:27:31.725708 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.noise_shape : NoneType | |
I0602 22:27:31.725736 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.noise_shape_broadcast_dims : NoneType | |
I0602 22:27:31.725765 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.725794 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.725823 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.725851 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.725880 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.725909 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.transpose_qk : False | |
I0602 22:27:31.725938 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dropout_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.725967 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.725996 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.726024 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.hidden_dim : 0 | |
I0602 22:27:31.726053 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.726082 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.input_dim : 0 | |
I0602 22:27:31.726110 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.internal_enable_per_dim_scale : True | |
I0602 22:27:31.726139 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.internal_enable_query_scale : True | |
I0602 22:27:31.726168 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.internal_gshard_gaussian_init : False | |
I0602 22:27:31.726196 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.726225 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.name : NoneType | |
I0602 22:27:31.726253 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.ngrammer_tpl : NoneType | |
I0602 22:27:31.726282 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.num_heads : 1 | |
I0602 22:27:31.726311 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.output_proj_use_nhd_shape : False | |
I0602 22:27:31.726339 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.726368 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.726397 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.726425 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.726454 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.attention_combine_dims : False | |
I0602 22:27:31.726484 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.cls : type/praxis.layers.attentions/AttentionProjection | |
I0602 22:27:31.726512 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.726541 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.726570 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.dim_per_head : 0 | |
I0602 22:27:31.726599 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.726628 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.726658 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.cls : type/praxis.layers.base_ops/Einsum | |
I0602 22:27:31.726686 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.726715 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.726748 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.726781 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.726811 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.726840 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.726871 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.name : NoneType | |
I0602 22:27:31.726902 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.726932 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.726960 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.726989 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.727019 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.727047 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.einsum_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.727076 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.explicit_fan_in_fan_out_axes : False | |
I0602 22:27:31.727105 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.727134 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.727163 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.input_dim : 0 | |
I0602 22:27:31.727192 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.is_output_projection : False | |
I0602 22:27:31.727221 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.727249 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.name : NoneType | |
I0602 22:27:31.727277 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.num_heads : 0 | |
I0602 22:27:31.727306 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.727335 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.727364 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.727393 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.727422 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.727451 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.use_bias : True | |
I0602 22:27:31.727480 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.use_nhd_shape : False | |
I0602 22:27:31.727509 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.proj_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.727538 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.727567 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.cls : type/praxis.layers.base_ops/Einsum | |
I0602 22:27:31.727596 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.727624 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.727653 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.727681 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.727710 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.727739 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.727768 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.name : NoneType | |
I0602 22:27:31.727797 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.727825 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.727853 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.727881 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.727910 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.727939 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.pv_einsum_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.727968 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.727996 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.cls : type/praxis.layers.base_ops/Einsum | |
I0602 22:27:31.728025 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.728054 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.728082 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.728111 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.728141 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.728170 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.728199 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.name : NoneType | |
I0602 22:27:31.728228 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.728257 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.728286 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.728314 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.728344 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.728373 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.qk_einsum_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.728401 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.relative_bias_tpl : NoneType | |
I0602 22:27:31.728430 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.728459 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.cast_as_fprop_dtype : True | |
I0602 22:27:31.728488 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.cls : type/praxis.layers.embedding_softmax/RotaryPositionalEmbedding | |
I0602 22:27:31.728517 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.728546 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.728575 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.728603 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.embedding_dims : 0 | |
I0602 22:27:31.728632 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.728661 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.728690 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.max_timescale : 10000 | |
I0602 22:27:31.728719 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.728748 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.min_timescale : 1 | |
I0602 22:27:31.728777 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.name : NoneType | |
I0602 22:27:31.728806 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.728835 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.728864 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.728893 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.728921 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.728950 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.rotary_position_emb_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.728979 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.scale_logits_by_head_dims : False | |
I0602 22:27:31.729009 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.729038 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.729066 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.use_bias : False | |
I0602 22:27:31.729095 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.use_rotary_position_emb : False | |
I0602 22:27:31.729138 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.weight_split_dims_mapping.dconv : NoneType | |
I0602 22:27:31.729169 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.weight_split_dims_mapping.proj : NoneType | |
I0602 22:27:31.729198 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.729227 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_atten_tpl.zero_fully_masked : False | |
I0602 22:27:31.729256 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_split_dims_mapping.ffn0 : NoneType | |
I0602 22:27:31.729285 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_split_dims_mapping.ffn1 : NoneType | |
I0602 22:27:31.729314 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.729343 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.729372 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.approximate : True | |
I0602 22:27:31.729401 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.cls : type/praxis.layers.activations/GELU | |
I0602 22:27:31.729430 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.729460 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.729488 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.729518 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.729548 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.729576 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.729605 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.name : NoneType | |
I0602 22:27:31.729633 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.729662 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.729691 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.729720 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.729749 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.729778 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.activation_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.729807 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.add_skip_connection : True | |
I0602 22:27:31.729836 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.apply_padding_first : False | |
I0602 22:27:31.729865 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.cls : type/praxis.layers.transformers/TransformerFeedForward | |
I0602 22:27:31.729897 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.729926 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.729955 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.729984 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.730012 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.730040 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.cls : type/praxis.layers.activations/ReLU | |
I0602 22:27:31.730069 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.730098 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.730126 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.730155 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.730184 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.730213 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.730242 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.name : NoneType | |
I0602 22:27:31.730271 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.730300 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.730328 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.730357 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.730386 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.730414 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.activation_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.730443 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.bias_init : 0.0 | |
I0602 22:27:31.730472 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.cls : type/praxis.layers.linears/FeedForward | |
I0602 22:27:31.730500 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.730529 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.730558 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.730588 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.730617 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.has_bias : True | |
I0602 22:27:31.730646 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.730674 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.input_dims : 0 | |
I0602 22:27:31.730703 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.730732 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.cls : type/praxis.layers.linears/Linear | |
I0602 22:27:31.730761 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.730790 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.730818 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.730847 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.730876 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.cls : type/praxis.layers.base_ops/Einsum | |
I0602 22:27:31.730905 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.730934 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.730963 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.730992 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.731020 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.731049 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.731077 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.name : NoneType | |
I0602 22:27:31.731106 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.731135 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.731164 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.731193 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.731221 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.731251 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.einsum_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.731279 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.731308 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.731337 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.input_dims : 0 | |
I0602 22:27:31.731366 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.731395 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.name : NoneType | |
I0602 22:27:31.731424 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.output_dims : 0 | |
I0602 22:27:31.731452 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.731481 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.731510 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.731539 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.731568 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.731596 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.weight_init : NoneType | |
I0602 22:27:31.731625 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.linear_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.731653 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.731682 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.name : NoneType | |
I0602 22:27:31.731712 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.output_dims : 0 | |
I0602 22:27:31.731740 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.731769 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.731798 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.731827 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.731856 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.731884 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.weight_init : NoneType | |
I0602 22:27:31.731913 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fflayer_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.731942 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.731971 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.has_bias : True | |
I0602 22:27:31.732000 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.hidden_dims : 0 | |
I0602 22:27:31.732029 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.732057 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.input_dims : 0 | |
I0602 22:27:31.732086 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.internal_gshard_variance_scaling_fan_in_init : False | |
I0602 22:27:31.732115 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.732144 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.cls : type/praxis.layers.normalizations/LayerNorm | |
I0602 22:27:31.732172 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.732202 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.732230 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.dim : 0 | |
I0602 22:27:31.732259 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.732288 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.epsilon : 1e-06 | |
I0602 22:27:31.732317 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.732346 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.732375 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.732403 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.name : NoneType | |
I0602 22:27:31.732432 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.732461 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.732490 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.732519 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.reductions_in_fp32 : False | |
I0602 22:27:31.732547 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.732576 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.732605 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.use_bias : True | |
I0602 22:27:31.732634 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.use_scale : True | |
I0602 22:27:31.732663 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.ln_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.732692 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.732721 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.name : NoneType | |
I0602 22:27:31.732751 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.norm_policy : 'pre' | |
I0602 22:27:31.732779 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.output_dims : 0 | |
I0602 22:27:31.732809 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.732837 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.732866 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.732895 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_prob : 0.0 | |
I0602 22:27:31.732924 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.732953 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.cls : type/praxis.layers.stochastics/Dropout | |
I0602 22:27:31.732981 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.733010 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.733039 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.dropout_at_eval : False | |
I0602 22:27:31.733068 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.733098 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.733132 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.733162 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.keep_prob : 1.0 | |
I0602 22:27:31.733191 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.733219 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.name : NoneType | |
I0602 22:27:31.733248 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.noise_shape : NoneType | |
I0602 22:27:31.733276 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.noise_shape_broadcast_dims : NoneType | |
I0602 22:27:31.733305 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.733334 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.733362 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.733391 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.733420 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.733449 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.transpose_qk : False | |
I0602 22:27:31.733477 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.relu_dropout_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.733506 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_prob : 0.0 | |
I0602 22:27:31.733535 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.activation_split_dims_mapping.out : NoneType | |
I0602 22:27:31.733563 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.cls : type/praxis.layers.stochastics/Dropout | |
I0602 22:27:31.733592 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.733620 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.733649 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.dropout_at_eval : False | |
I0602 22:27:31.733679 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.733708 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.733737 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.733766 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.keep_prob : 1.0 | |
I0602 22:27:31.733795 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.733824 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.name : NoneType | |
I0602 22:27:31.733852 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.noise_shape : NoneType | |
I0602 22:27:31.733881 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.noise_shape_broadcast_dims : NoneType | |
I0602 22:27:31.733911 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.733939 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.733968 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.733997 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.734025 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.734054 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.transpose_qk : False | |
I0602 22:27:31.734083 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_dropout_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.734111 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_droppath_prob : 0.0 | |
I0602 22:27:31.734140 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.residual_weight : 1.0 | |
I0602 22:27:31.734169 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.734198 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.734226 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.use_gated_activation : False | |
I0602 22:27:31.734255 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.weight_split_dims_mapping.ffn0 : NoneType | |
I0602 22:27:31.734284 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.weight_split_dims_mapping.ffn1 : NoneType | |
I0602 22:27:31.734313 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.tr_fflayer_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.734342 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.use_cross_attention : False | |
I0602 22:27:31.734370 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.transformer_layer_params_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.734399 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.unadjusted_expert_capacity_factor : 2.0 | |
I0602 22:27:31.734427 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.use_cross_attention : False | |
I0602 22:27:31.734457 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.block.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.734486 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.checkpoint_policy : 'save_nothing' | |
I0602 22:27:31.734514 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.cls : type/praxis.layers.transformers/StackedTransformerRepeated | |
I0602 22:27:31.734543 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.contiguous_submeshes : NoneType | |
I0602 22:27:31.734571 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.dcn_mesh_shape : NoneType | |
I0602 22:27:31.734600 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.dtype : type/jax.numpy/float32 | |
I0602 22:27:31.734629 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.fprop_dtype : NoneType | |
I0602 22:27:31.734658 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.ici_mesh_shape : NoneType | |
I0602 22:27:31.734687 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.mesh_axis_names : NoneType | |
I0602 22:27:31.734716 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.name : NoneType | |
I0602 22:27:31.734745 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.nd_prefix_shape : NoneType | |
I0602 22:27:31.734774 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.params_init.cls : type/praxis.base_layer/WeightInit | |
I0602 22:27:31.734802 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.params_init.method : 'xavier' | |
I0602 22:27:31.734831 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.params_init.scale : 1.000001 | |
I0602 22:27:31.734859 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.repeat_layer_name : 'repeat' | |
I0602 22:27:31.734888 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.repeat_optimizer_dims_mapping : NoneType | |
I0602 22:27:31.734916 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.shared_weight_layer_id : NoneType | |
I0602 22:27:31.734945 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.skip_lp_regularization : NoneType | |
I0602 22:27:31.734973 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.sublayer_name : 'sub' | |
I0602 22:27:31.735002 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.unroll_in_decode : True | |
I0602 22:27:31.735031 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.weight_split_dims_mapping.block : NoneType | |
I0602 22:27:31.735060 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.735089 140467571585024 train.py:171] model.lm_tpl.stacked_transformer_tpl.x_times : 24 | |
I0602 22:27:31.735118 140467571585024 train.py:171] model.lm_tpl.vocab_size : 51200 | |
I0602 22:27:31.735147 140467571585024 train.py:171] model.lm_tpl.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.735176 140467571585024 train.py:171] model.mesh_axis_names : NoneType | |
I0602 22:27:31.735205 140467571585024 train.py:171] model.model_type : 'causal' | |
I0602 22:27:31.735234 140467571585024 train.py:171] model.name : 'xformer_lm' | |
I0602 22:27:31.735263 140467571585024 train.py:171] model.params_init.method : 'gaussian' | |
I0602 22:27:31.735292 140467571585024 train.py:171] model.params_init.scale : 0.023 | |
I0602 22:27:31.735321 140467571585024 train.py:171] model.report_strict_acc : False | |
I0602 22:27:31.735351 140467571585024 train.py:171] model.return_predictions : False | |
I0602 22:27:31.735379 140467571585024 train.py:171] model.shared_weight_layer_id : NoneType | |
I0602 22:27:31.735408 140467571585024 train.py:171] model.skip_lp_regularization : NoneType | |
I0602 22:27:31.735437 140467571585024 train.py:171] model.weight_split_dims_mapping.wt : NoneType | |
I0602 22:27:31.735466 140467571585024 train.py:171] name : 'xformer_task' | |
I0602 22:27:31.735495 140467571585024 train.py:171] summary_verbosity : 3 | |
I0602 22:27:31.735524 140467571585024 train.py:171] train.always_use_train_for_model_init : True | |
I0602 22:27:31.735553 140467571585024 train.py:171] train.apply_mutable_list : ['aux_loss', 'summaries', 'non_trainable', 'batch_stats', 'params_axes'] | |
I0602 22:27:31.735582 140467571585024 train.py:171] train.async_summary_writing : True | |
I0602 22:27:31.735610 140467571585024 train.py:171] train.cls : type/paxml.tasks_lib/SingleTask.Train | |
I0602 22:27:31.735640 140467571585024 train.py:171] train.decode_interval_steps : NoneType | |
I0602 22:27:31.735669 140467571585024 train.py:171] train.decode_start_after_n_steps : 0 | |
I0602 22:27:31.735697 140467571585024 train.py:171] train.decode_use_ema_states : False | |
I0602 22:27:31.735726 140467571585024 train.py:171] train.device_sync_interval_steps : NoneType | |
I0602 22:27:31.735755 140467571585024 train.py:171] train.enable_input_checkpointing : False | |
I0602 22:27:31.735785 140467571585024 train.py:171] train.enforce_input_specs : False | |
I0602 22:27:31.735813 140467571585024 train.py:171] train.eval_interval_steps : 100 | |
I0602 22:27:31.735843 140467571585024 train.py:171] train.eval_skip_train : False | |
I0602 22:27:31.735871 140467571585024 train.py:171] train.eval_use_ema_states : False | |
I0602 22:27:31.735900 140467571585024 train.py:171] train.external_checkpoint_handler : NoneType | |
I0602 22:27:31.735929 140467571585024 train.py:171] train.external_checkpoint_path : NoneType | |
I0602 22:27:31.735958 140467571585024 train.py:171] train.inputs_split_mapping : NoneType | |
I0602 22:27:31.735987 140467571585024 train.py:171] train.learner.check_valid_step : True | |
I0602 22:27:31.736016 140467571585024 train.py:171] train.learner.cls : type/paxml.learners/Learner | |
I0602 22:27:31.736045 140467571585024 train.py:171] train.learner.enable_skip_step_on_gradient_anomalies : True | |
I0602 22:27:31.736074 140467571585024 train.py:171] train.learner.force_repeat_prefix_structure : False | |
I0602 22:27:31.736103 140467571585024 train.py:171] train.learner.grad_norm_individual_vars : False | |
I0602 22:27:31.736132 140467571585024 train.py:171] train.learner.grad_norm_summary : True | |
I0602 22:27:31.736161 140467571585024 train.py:171] train.learner.keep_optimizer_state_for_excluded_vars : False | |
I0602 22:27:31.736190 140467571585024 train.py:171] train.learner.loss_name : 'total_loss' | |
I0602 22:27:31.736219 140467571585024 train.py:171] train.learner.name : '' | |
I0602 22:27:31.736248 140467571585024 train.py:171] train.learner.optimizer.beta1 : 0.9 | |
I0602 22:27:31.736277 140467571585024 train.py:171] train.learner.optimizer.beta2 : 0.95 | |
I0602 22:27:31.736306 140467571585024 train.py:171] train.learner.optimizer.clip_gradient_norm_to_value : 1.0 | |
I0602 22:27:31.736335 140467571585024 train.py:171] train.learner.optimizer.clip_gradient_single_norm_to_value : 0.0 | |
I0602 22:27:31.736364 140467571585024 train.py:171] train.learner.optimizer.clip_threshold : 1.0 | |
I0602 22:27:31.736393 140467571585024 train.py:171] train.learner.optimizer.cls : type/praxis.optimizers/Adam | |
I0602 22:27:31.736422 140467571585024 train.py:171] train.learner.optimizer.decoupled_weight_decay : NoneType | |
I0602 22:27:31.736451 140467571585024 train.py:171] train.learner.optimizer.ema_decay : 0.0 | |
I0602 22:27:31.736480 140467571585024 train.py:171] train.learner.optimizer.epsilon : 1e-08 | |
I0602 22:27:31.736509 140467571585024 train.py:171] train.learner.optimizer.epsilon_root : 0.0 | |
I0602 22:27:31.736538 140467571585024 train.py:171] train.learner.optimizer.ewc_regularizer_weight : 0.0 | |
I0602 22:27:31.736567 140467571585024 train.py:171] train.learner.optimizer.ewc_weight_per_var : NoneType | |
I0602 22:27:31.736596 140467571585024 train.py:171] train.learner.optimizer.l1_regularizer_weight : NoneType | |
I0602 22:27:31.736625 140467571585024 train.py:171] train.learner.optimizer.l2_regularizer_weight : NoneType | |
I0602 22:27:31.736653 140467571585024 train.py:171] train.learner.optimizer.learning_rate : 0.0006 | |
I0602 22:27:31.736682 140467571585024 train.py:171] train.learner.optimizer.lr_schedule.cls : type/praxis.schedules/LinearRampupCosineDecay | |
I0602 22:27:31.736711 140467571585024 train.py:171] train.learner.optimizer.lr_schedule.decay_end : 500000 | |
I0602 22:27:31.736740 140467571585024 train.py:171] train.learner.optimizer.lr_schedule.decay_start : 1 | |
I0602 22:27:31.736769 140467571585024 train.py:171] train.learner.optimizer.lr_schedule.max : 1.0 | |
I0602 22:27:31.736798 140467571585024 train.py:171] train.learner.optimizer.lr_schedule.min_ratio : 0.1 | |
I0602 22:27:31.736827 140467571585024 train.py:171] train.learner.optimizer.lr_schedule.name : '' | |
I0602 22:27:31.736856 140467571585024 train.py:171] train.learner.optimizer.lr_schedule.warmup_steps : 0 | |
I0602 22:27:31.736885 140467571585024 train.py:171] train.learner.optimizer.maybe_inf_to_nan : True | |
I0602 22:27:31.736913 140467571585024 train.py:171] train.learner.optimizer.name : '' | |
I0602 22:27:31.736943 140467571585024 train.py:171] train.learner.optimizer.sharded_adam : True | |
I0602 22:27:31.736972 140467571585024 train.py:171] train.learner.optimizer.skip_lp_1d_vectors : False | |
I0602 22:27:31.737001 140467571585024 train.py:171] train.learner.optimizer.weight_decay : 0.001 | |
I0602 22:27:31.737030 140467571585024 train.py:171] train.learner.repeat_prefix_sep : '#' | |
I0602 22:27:31.737059 140467571585024 train.py:171] train.learner.skip_step_gradient_norm_value : 0.0 | |
I0602 22:27:31.737088 140467571585024 train.py:171] train.learner.skip_zero_gradients : NoneType | |
I0602 22:27:31.737123 140467571585024 train.py:171] train.learner.stochastic_gradient : NoneType | |
I0602 22:27:31.737153 140467571585024 train.py:171] train.learner.var_norm_summary : True | |
I0602 22:27:31.737183 140467571585024 train.py:171] train.learner.vectorize_on_repeat_prefix : True | |
I0602 22:27:31.737212 140467571585024 train.py:171] train.log_train_output_interval_steps : NoneType | |
I0602 22:27:31.737240 140467571585024 train.py:171] train.max_inflight_steps : 2 | |
I0602 22:27:31.737269 140467571585024 train.py:171] train.num_train_steps : 10000000.0 | |
I0602 22:27:31.737298 140467571585024 train.py:171] train.profiler_capture_step : NoneType | |
I0602 22:27:31.737327 140467571585024 train.py:171] train.profiler_max_num_hosts : NoneType | |
I0602 22:27:31.737356 140467571585024 train.py:171] train.profiler_min_duration_sec : 1 | |
I0602 22:27:31.737385 140467571585024 train.py:171] train.profiler_num_steps : 2 | |
I0602 22:27:31.737413 140467571585024 train.py:171] train.random_seed : 1234 | |
I0602 22:27:31.737442 140467571585024 train.py:171] train.restore_transformations : NoneType | |
I0602 22:27:31.737471 140467571585024 train.py:171] train.save_interval_steps : 100000 | |
I0602 22:27:31.737500 140467571585024 train.py:171] train.save_keep_interval_duration : '12h' | |
I0602 22:27:31.737528 140467571585024 train.py:171] train.save_max_to_keep : 10 | |
I0602 22:27:31.737558 140467571585024 train.py:171] train.summary_accumulate_interval_steps : NoneType | |
I0602 22:27:31.737586 140467571585024 train.py:171] train.summary_interval_steps : 100 | |
I0602 22:27:31.737615 140467571585024 train.py:171] train.tensorstore_metadata_key : NoneType | |
I0602 22:27:31.737645 140467571585024 train.py:171] train.variable_norm_summary : True | |
I0602 22:27:31.737679 140467571585024 train.py:171] vn.cls : type/paxml.tasks_lib/SingleTask.VariationalNoise | |
I0602 22:27:31.737711 140467571585024 train.py:171] vn.vn_regex : '' | |
I0602 22:27:31.737752 140467571585024 train.py:171] vn.vn_scale : 0.0 | |
I0602 22:27:31.737785 140467571585024 train.py:171] vn.vn_start_step : 0 | |
I0602 22:27:31.737824 140467571585024 train.py:173] [PAX STATUS]: Initializing decoder | |
I0602 22:27:31.737904 140467571585024 checkpoint_creators.py:564] [PAX STATUS]: Creating checkpointer. | |
I0602 22:27:31.737980 140467571585024 py_utils.py:338] Starting sync_global_devices checkpointer:makedirs:log_NVIDIA1_3BPmap/checkpoints across 1 devices globally | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e18934820 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:0}, signal={0x556e18933900:1} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:1}, signal={0x556e18933900:2} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1894d0d0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1894d120 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1894d0d0, semaphore=0x556e18933900, value=2 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1894d120, semaphore=0x556e18933900, value=2 (OK) | |
W0602 22:27:31.808284 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0005655288696289062 sec | |
W0602 22:27:31.808686 140467571585024 dispatch.py:272] Finished tracing + transforming _psum for pjit in 0.0013377666473388672 sec | |
W0602 22:27:31.809392 140467571585024 pxla.py:1882] Compiling _psum for with global shapes and types [ShapedArray(uint32[1])]. Argument mapping: (GSPMDSharding({maximal device=0}),). | |
W0602 22:27:31.811983 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_psum) in 0.0024673938751220703 sec | |
W0602 22:27:31.880504 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_psum) in 0.06763148307800293 sec | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1988d550 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1988d550, semaphore=0x556e189338c0, value=0 (OK) | |
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=1, fence=0x556e191b40c0 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1988d550, from_fence=0x556e1894d0d0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1894d120, semaphore=0x556e189338c0, value=1 (OK) | |
:: IREE INVOKE (vm_invoke[async]): context=0x556e196fdca0, f=0, wait_fence=0x556e1988d550 {0x556e189338c0:0, 0x556e18933900:2}, signal_fence=0x556e191b40c0 {0x556e189338c0:1} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18aebc50 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e16eed850 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18aebc50, semaphore=0x556e189338c0, value=1 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e16eed850, semaphore=0x556e189338c0, value=1 (OK) | |
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ebfa38 (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e189338c0:1}, signal={} (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e1894cf50, wait={0x556e18933900:2, 0x556e189338c0:1}, signal={} (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e1894cf50, wait={0x556e189338c0:1}, signal={} (OK) | |
I0602 22:27:31.882042 140467571585024 py_utils.py:341] Finished sync_global_devices checkpointer:makedirs:log_NVIDIA1_3BPmap/checkpoints across 1 devices globally | |
I0602 22:27:31.882626 140467571585024 checkpointer.py:96] Restoring item from log_NVIDIA1_3BPmap/checkpoints/checkpoint_0/metadata. | |
I0602 22:27:31.882973 140467571585024 checkpointer.py:98] Finished restoring checkpoint from log_NVIDIA1_3BPmap/checkpoints/checkpoint_0/metadata. | |
I0602 22:27:31.883031 140467571585024 checkpoint_managers.py:197] Found existing checkpoint with version: 1.1, step: 0 | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e19401cb0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:2}, signal={0x556e18933900:3} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:3}, signal={0x556e18933900:4} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18aede80 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e112a1af0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18aede80, semaphore=0x556e18933900, value=4 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e112a1af0, semaphore=0x556e18933900, value=4 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18a43f70 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18a43f70, semaphore=0x556e189338c0, value=1 (OK) | |
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=2, fence=0x556e18aebf60 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e18a43f70, from_fence=0x556e18aede80 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e112a1af0, semaphore=0x556e189338c0, value=2 (OK) | |
:: IREE INVOKE (vm_invoke[async]): context=0x556e196fdca0, f=0, wait_fence=0x556e18a43f70 {0x556e189338c0:1, 0x556e18933900:4}, signal_fence=0x556e18aebf60 {0x556e189338c0:2} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e16eed850 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e170685a0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e16eed850, semaphore=0x556e189338c0, value=2 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e170685a0, semaphore=0x556e189338c0, value=2 (OK) | |
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ebf318 (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e189338c0:2}, signal={} (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e18a48190, wait={0x556e18933900:4, 0x556e189338c0:2}, signal={} (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e18a48190, wait={0x556e189338c0:2}, signal={} (OK) | |
I0602 22:27:31.885030 140467571585024 utils.py:366] Cleaning up existing temporary directories at log_NVIDIA1_3BPmap/checkpoints. | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e18a65f90 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:4}, signal={0x556e18933900:5} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:5}, signal={0x556e18933900:6} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e170685a0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e16ef1c50 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e170685a0, semaphore=0x556e18933900, value=6 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e16ef1c50, semaphore=0x556e18933900, value=6 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18a43f70 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18a43f70, semaphore=0x556e189338c0, value=2 (OK) | |
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=3, fence=0x556e18a48a90 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e18a43f70, from_fence=0x556e170685a0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e16ef1c50, semaphore=0x556e189338c0, value=3 (OK) | |
:: IREE INVOKE (vm_invoke[async]): context=0x556e196fdca0, f=0, wait_fence=0x556e18a43f70 {0x556e189338c0:2, 0x556e18933900:6}, signal_fence=0x556e18a48a90 {0x556e189338c0:3} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e112a3a60 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e110f47f0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e112a3a60, semaphore=0x556e189338c0, value=3 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e110f47f0, semaphore=0x556e189338c0, value=3 (OK) | |
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ebef78 (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e189338c0:3}, signal={} (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e18a45050, wait={0x556e18933900:6, 0x556e189338c0:3}, signal={} (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e18a45050, wait={0x556e189338c0:3}, signal={} (OK) | |
I0602 22:27:31.887058 140467571585024 train.py:206] [PAX STATUS]: Creating task | |
I0602 22:27:32.067810 140467571585024 train.py:217] [PAX STATUS]: Initializing partitioner | |
I0602 22:27:32.067943 140467571585024 partitioning.py:576] Using pmap for data parallelism. | |
I0602 22:27:32.067994 140467571585024 train.py:245] [PAX STATUS]: Creating executor. | |
I0602 22:27:32.068036 140467571585024 train.py:249] [PAX STATUS]: Setting up executor. | |
W0602 22:27:32.071034 140467571585024 dispatch.py:272] Finished tracing + transforming jit(convert_element_type) in 0.00023603439331054688 sec | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e16f726e0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:6}, signal={0x556e18933900:7} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:7}, signal={0x556e18933900:8} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e16eed850 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e170685a0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e16eed850, semaphore=0x556e18933900, value=8 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e170685a0, semaphore=0x556e18933900, value=8 (OK) | |
W0602 22:27:32.072897 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003733634948730469 sec | |
W0602 22:27:32.073698 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_seed for pjit in 0.0016872882843017578 sec | |
W0602 22:27:32.074185 140467571585024 pxla.py:1882] Compiling _threefry_seed for with global shapes and types [ShapedArray(int32[])]. Argument mapping: (GSPMDSharding({replicated}),). | |
W0602 22:27:32.076742 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_threefry_seed) in 0.0024423599243164062 sec | |
W0602 22:27:32.169981 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_threefry_seed) in 0.09298300743103027 sec | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1860c810 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1860c810, semaphore=0x556e189338c0, value=3 (OK) | |
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=4, fence=0x556e199dd790 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1860c810, from_fence=0x556e16eed850 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e170685a0, semaphore=0x556e189338c0, value=4 (OK) | |
:: IREE INVOKE (vm_invoke[async]): context=0x556e19946ad0, f=0, wait_fence=0x556e1860c810 {0x556e189338c0:3, 0x556e18933900:8}, signal_fence=0x556e199dd790 {0x556e189338c0:4} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19869950 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e190fd2e0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19869950, semaphore=0x556e189338c0, value=4 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e190fd2e0, semaphore=0x556e189338c0, value=4 (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e189dabc0, wait={0x556e18933900:8, 0x556e189338c0:4}, signal={} (OK) | |
I0602 22:27:32.171376 140467571585024 partitioning.py:420] input_p.tf_data_service_address: None | |
I0602 22:27:32.171604 140467571585024 executors.py:163] [PAX STATUS]: Instantiating train input pipeline. | |
I0602 22:27:32.174570 140467571585024 executors.py:222] [PAX STATUS]: Setting up partitioner | |
I0602 22:27:32.174625 140467571585024 partitioning.py:353] [PAX STATUS]: Getting input shapes from first batch. | |
I0602 22:27:32.778908 140467571585024 local.py:50] Created artifact Input specs of type ArtifactType.FILE and value log_NVIDIA1_3BPmap/input_specs.json. | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e1907d0e0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:8}, signal={0x556e18933900:9} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:9}, signal={0x556e18933900:10} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e198cfb80 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b0cc70 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e198cfb80, semaphore=0x556e18933900, value=10 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b0cc70, semaphore=0x556e18933900, value=10 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19103da0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19103da0, semaphore=0x556e189338c0, value=4 (OK) | |
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=5, fence=0x556e18b62000 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e19103da0, from_fence=0x556e198cfb80 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b0cc70, semaphore=0x556e189338c0, value=5 (OK) | |
:: IREE INVOKE (vm_invoke[async]): context=0x556e19946ad0, f=0, wait_fence=0x556e19103da0 {0x556e189338c0:4, 0x556e18933900:10}, signal_fence=0x556e18b62000 {0x556e189338c0:5} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19882980 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18eaa130 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19882980, semaphore=0x556e189338c0, value=5 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18eaa130, semaphore=0x556e189338c0, value=5 (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e18b691b0, wait={0x556e18933900:10, 0x556e189338c0:5}, signal={} (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e1907d0e0, wait={0x556e189338c0:5}, signal={} (OK) | |
W0602 22:27:33.229183 140467571585024 optimizers.py:1170] DEPRECATION WARNING: p.weight_decay will be deprecated. In future, we will do a migration to remove p.weight_decay and after that, setting it will throw an exception. In future, we will use p.l2_regularizer_weight for coupled weight decay (i.e., weight decays that affect optimizer slots), and use p.decoupled_weight_decay for decoupled weight decay (i.e., weight decays that are added only to the final update). | |
I0602 22:27:33.229286 140467571585024 optimizers.py:1173] Using sharded_adam. | |
W0602 22:27:33.229323 140467571585024 optimizers.py:580] DEPRECATION WARNING: p.weight_decay will be deprecated. In future, we will do a migration to remove p.weight_decay and after that, setting it will throw an exception. In future, we will use p.l2_regularizer_weight for coupled weight decay (i.e., weight decays that affect optimizer slots), and use p.decoupled_weight_decay for decoupled weight decay (i.e., weight decays that are added only to the final update). | |
W0602 22:27:33.241345 140467571585024 optimizers.py:1170] DEPRECATION WARNING: p.weight_decay will be deprecated. In future, we will do a migration to remove p.weight_decay and after that, setting it will throw an exception. In future, we will use p.l2_regularizer_weight for coupled weight decay (i.e., weight decays that affect optimizer slots), and use p.decoupled_weight_decay for decoupled weight decay (i.e., weight decays that are added only to the final update). | |
I0602 22:27:33.241400 140467571585024 optimizers.py:1173] Using sharded_adam. | |
W0602 22:27:33.241433 140467571585024 optimizers.py:580] DEPRECATION WARNING: p.weight_decay will be deprecated. In future, we will do a migration to remove p.weight_decay and after that, setting it will throw an exception. In future, we will use p.l2_regularizer_weight for coupled weight decay (i.e., weight decays that affect optimizer slots), and use p.decoupled_weight_decay for decoupled weight decay (i.e., weight decays that are added only to the final update). | |
I0602 22:27:33.253473 140467571585024 trainer_lib.py:197] post_init_model_params: log_NVIDIA1_3BPmap/post_init_model_params.txt | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e19213e30 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:10}, signal={0x556e18933900:11} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:11}, signal={0x556e18933900:12} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1982ded0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18a814b0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1982ded0, semaphore=0x556e18933900, value=12 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18a814b0, semaphore=0x556e18933900, value=12 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b8c7a0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b8c7a0, semaphore=0x556e189338c0, value=5 (OK) | |
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=6, fence=0x556e19156ee0 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e18b8c7a0, from_fence=0x556e1982ded0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18a814b0, semaphore=0x556e189338c0, value=6 (OK) | |
:: IREE INVOKE (vm_invoke[async]): context=0x556e19946ad0, f=0, wait_fence=0x556e18b8c7a0 {0x556e189338c0:5, 0x556e18933900:12}, signal_fence=0x556e19156ee0 {0x556e189338c0:6} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e199a9390 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b769a0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e199a9390, semaphore=0x556e189338c0, value=6 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b769a0, semaphore=0x556e189338c0, value=6 (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e18b83130, wait={0x556e18933900:12, 0x556e189338c0:6}, signal={} (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e199b25b0, wait={0x556e189338c0:6}, signal={} (OK) | |
I0602 22:27:33.415624 140467571585024 checkpointer.py:96] Restoring item from log_NVIDIA1_3BPmap/checkpoints/checkpoint_0/state. | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e197c8fd0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:12}, signal={0x556e18933900:13} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:13}, signal={0x556e18933900:14} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b69fa0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1914ed00 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b69fa0, semaphore=0x556e18933900, value=14 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1914ed00, semaphore=0x556e18933900, value=14 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b24760 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b24760, semaphore=0x556e189338c0, value=6 (OK) | |
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=7, fence=0x556e19976620 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e18b24760, from_fence=0x556e18b69fa0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1914ed00, semaphore=0x556e189338c0, value=7 (OK) | |
:: IREE INVOKE (vm_invoke[async]): context=0x556e196fdca0, f=0, wait_fence=0x556e18b24760 {0x556e189338c0:6, 0x556e18933900:14}, signal_fence=0x556e19976620 {0x556e189338c0:7} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e199b21a0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b817a0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e199b21a0, semaphore=0x556e189338c0, value=7 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b817a0, semaphore=0x556e189338c0, value=7 (OK) | |
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ebe7d8 (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e189338c0:7}, signal={} (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e195ce1e0, wait={0x556e18933900:14, 0x556e189338c0:7}, signal={} (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e195ce1e0, wait={0x556e189338c0:7}, signal={} (OK) | |
I0602 22:27:49.403822 140467571585024 checkpointer.py:98] Finished restoring checkpoint from log_NVIDIA1_3BPmap/checkpoints/checkpoint_0/state. | |
I0602 22:27:49.404018 140467571585024 checkpointer.py:96] Restoring item from log_NVIDIA1_3BPmap/checkpoints/checkpoint_0/metadata. | |
I0602 22:27:49.404202 140467571585024 checkpointer.py:98] Finished restoring checkpoint from log_NVIDIA1_3BPmap/checkpoints/checkpoint_0/metadata. | |
W0602 22:27:49.405730 140467571585024 dispatch.py:272] Finished tracing + transforming ravel for pjit in 0.0001819133758544922 sec | |
W0602 22:27:49.406473 140467571585024 dispatch.py:272] Finished tracing + transforming threefry_2x32 for pjit in 0.0014467239379882812 sec | |
W0602 22:27:49.407426 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_split_original for pjit in 0.002722501754760742 sec | |
W0602 22:27:49.407973 140467571585024 pxla.py:1882] Compiling _threefry_split_original for with global shapes and types [ShapedArray(uint32[2])]. Argument mapping: (GSPMDSharding({replicated}),). | |
W0602 22:27:49.410572 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002830028533935547 sec | |
W0602 22:27:49.411499 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002779960632324219 sec | |
W0602 22:27:49.412356 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002703666687011719 sec | |
W0602 22:27:49.413204 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002875328063964844 sec | |
W0602 22:27:49.413828 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00026679039001464844 sec | |
W0602 22:27:49.450353 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_threefry_split_original) in 0.04226422309875488 sec | |
W0602 22:27:49.958152 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_threefry_split_original) in 0.5075216293334961 sec | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e191d5a50 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e191d5a50, semaphore=0x556e189338c0, value=7 (OK) | |
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=8, fence=0x556e18a78d00 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e191d5a50, from_fence=0x556e19869950 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e190fd2e0, semaphore=0x556e189338c0, value=8 (OK) | |
:: IREE INVOKE (vm_invoke[async]): context=0x556e1a3de240, f=0, wait_fence=0x556e191d5a50 {0x556e189338c0:7}, signal_fence=0x556e18a78d00 {0x556e189338c0:8} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18a9c820 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18e42a10 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18a9c820, semaphore=0x556e189338c0, value=8 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18e42a10, semaphore=0x556e189338c0, value=8 (OK) | |
W0602 22:27:49.960502 140467571585024 dispatch.py:272] Finished tracing + transforming _unstack for pjit in 0.0006079673767089844 sec | |
W0602 22:27:49.961032 140467571585024 pxla.py:1882] Compiling _unstack for with global shapes and types [ShapedArray(uint32[2,2])]. Argument mapping: (GSPMDSharding({replicated}),). | |
W0602 22:27:49.962872 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_unstack) in 0.001695394515991211 sec | |
W0602 22:27:50.023515 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_unstack) in 0.06038188934326172 sec | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a286b60 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a286b60, semaphore=0x556e189338c0, value=8 (OK) | |
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=9, fence=0x556e1994b6a0 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a286b60, from_fence=0x556e18a9c820 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18e42a10, semaphore=0x556e189338c0, value=9 (OK) | |
:: IREE INVOKE (vm_invoke[async]): context=0x556e19ade6f0, f=0, wait_fence=0x556e1a286b60 {0x556e189338c0:8}, signal_fence=0x556e1994b6a0 {0x556e189338c0:9} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e195d2630 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a368cc0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e195d2630, semaphore=0x556e189338c0, value=9 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a368cc0, semaphore=0x556e189338c0, value=9 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a2517e0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19ad8820 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2517e0, semaphore=0x556e189338c0, value=9 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19ad8820, semaphore=0x556e189338c0, value=9 (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e1a440ec0, wait={0x556e189338c0:9}, signal={} (OK) | |
I0602 22:27:50.024492 140467571585024 partitioning.py:631] train state shapes: TrainState(step=(), mdl_vars={'params': {'lm': {'final_ln': {'bias': (2048,), 'scale': (2048,)}, 'position_emb': {'emb_var': (2048, 2048)}, 'softmax': {'logits_ffn': {'bias': {'b': (51200,)}, 'linear': {'w': (2048, 51200)}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': (24, 8192)}, 'linear': {'w': (24, 2048, 8192)}}, 'ffn_layer2': {'bias': {'b': (24, 2048)}, 'linear': {'w': (24, 8192, 2048)}}, 'layer_norm': {'bias': (24, 2048), 'scale': (24, 2048)}}, 'layer_norm': {'bias': (24, 2048), 'scale': (24, 2048)}, 'self_attention': {'combined_qkv': {'w': (24, 3, 2048, 32, 64)}, 'per_dim_scale': {'per_dim_scale': (24, 64)}, 'post': {'w': (24, 2048, 32, 64)}}}}}}}}}, opt_states=[{'no_prefix': ({'count': ()}, {'count': ()}, {'count': (), 'm': {'params': {'lm': {'final_ln': {'bias': (2048,), 'scale': (2048,)}, 'position_emb': {'emb_var': (2048, 2048)}, 'softmax': {'logits_ffn': {'bias': {'b': (51200,)}, 'linear': {'w': (2048, 51200)}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}, 'ffn_layer2': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}, 'layer_norm': {'bias': MaskedNode(), 'scale': MaskedNode()}}, 'layer_norm': {'bias': MaskedNode(), 'scale': MaskedNode()}, 'self_attention': {'combined_qkv': {'w': MaskedNode()}, 'per_dim_scale': {'per_dim_scale': MaskedNode()}, 'post': {'w': MaskedNode()}}}}}}}}}, 'v': {'params': {'lm': {'final_ln': {'bias': (2048,), 'scale': (2048,)}, 'position_emb': {'emb_var': (2048, 2048)}, 'softmax': {'logits_ffn': {'bias': {'b': (51200,)}, 'linear': {'w': (2048, 51200)}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}, 'ffn_layer2': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}, 'layer_norm': {'bias': MaskedNode(), 'scale': MaskedNode()}}, 'layer_norm': {'bias': MaskedNode(), 'scale': MaskedNode()}, 'self_attention': {'combined_qkv': {'w': MaskedNode()}, 'per_dim_scale': {'per_dim_scale': MaskedNode()}, 'post': {'w': MaskedNode()}}}}}}}}}}, {'count': ()}), 'p#24#i-1': ({'count': (24,)}, {'count': (24,)}, {'count': (24,), 'm': {'params': {'lm': {'final_ln': {'bias': MaskedNode(), 'scale': MaskedNode()}, 'position_emb': {'emb_var': MaskedNode()}, 'softmax': {'logits_ffn': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': (24, 8192)}, 'linear': {'w': (24, 2048, 8192)}}, 'ffn_layer2': {'bias': {'b': (24, 2048)}, 'linear': {'w': (24, 8192, 2048)}}, 'layer_norm': {'bias': (24, 2048), 'scale': (24, 2048)}}, 'layer_norm': {'bias': (24, 2048), 'scale': (24, 2048)}, 'self_attention': {'combined_qkv': {'w': (24, 3, 2048, 32, 64)}, 'per_dim_scale': {'per_dim_scale': (24, 64)}, 'post': {'w': (24, 2048, 32, 64)}}}}}}}}}, 'v': {'params': {'lm': {'final_ln': {'bias': MaskedNode(), 'scale': MaskedNode()}, 'position_emb': {'emb_var': MaskedNode()}, 'softmax': {'logits_ffn': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': (24, 8192)}, 'linear': {'w': (24, 2048, 8192)}}, 'ffn_layer2': {'bias': {'b': (24, 2048)}, 'linear': {'w': (24, 8192, 2048)}}, 'layer_norm': {'bias': (24, 2048), 'scale': (24, 2048)}}, 'layer_norm': {'bias': (24, 2048), 'scale': (24, 2048)}, 'self_attention': {'combined_qkv': {'w': (24, 3, 2048, 32, 64)}, 'per_dim_scale': {'per_dim_scale': (24, 64)}, 'post': {'w': (24, 2048, 32, 64)}}}}}}}}}}, {'count': (24,)})}]) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e18b2b390 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:14}, signal={0x556e18933900:15} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:15}, signal={0x556e18933900:16} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e191a6280 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18a470c0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e191a6280, semaphore=0x556e18933900, value=16 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18a470c0, semaphore=0x556e18933900, value=16 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e18b9adf0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:16}, signal={0x556e18933900:17} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:17}, signal={0x556e18933900:18} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1962ee50 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1962ea60 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1962ee50, semaphore=0x556e18933900, value=18 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1962ea60, semaphore=0x556e18933900, value=18 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e1996e330 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:18}, signal={0x556e18933900:19} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:19}, signal={0x556e18933900:20} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3f5e20 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a422390 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3f5e20, semaphore=0x556e18933900, value=20 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a422390, semaphore=0x556e18933900, value=20 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=16777216, buffer=0x556e1a3d6580 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=16777216, wait={0x556e18933900:20}, signal={0x556e18933900:21} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:21}, signal={0x556e18933900:22} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e197797a0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3d6610 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e197797a0, semaphore=0x556e18933900, value=22 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3d6610, semaphore=0x556e18933900, value=22 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=204800, buffer=0x556e1a369cf0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=204800, wait={0x556e18933900:22}, signal={0x556e18933900:23} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:23}, signal={0x556e18933900:24} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a2a73d0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19ffd400 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2a73d0, semaphore=0x556e18933900, value=24 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19ffd400, semaphore=0x556e18933900, value=24 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=419430400, buffer=0x556e19d12af0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=419430400, wait={0x556e18933900:24}, signal={0x556e18933900:25} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:25}, signal={0x556e18933900:26} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a420280 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a2afb00 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a420280, semaphore=0x556e18933900, value=26 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2afb00, semaphore=0x556e18933900, value=26 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=786432, buffer=0x556e1a2aef40 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=786432, wait={0x556e18933900:26}, signal={0x556e18933900:27} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:27}, signal={0x556e18933900:28} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a14ab90 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a14abe0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a14ab90, semaphore=0x556e18933900, value=28 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a14abe0, semaphore=0x556e18933900, value=28 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1610612736, buffer=0x556e1a3c77b0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1610612736, wait={0x556e18933900:28}, signal={0x556e18933900:29} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:29}, signal={0x556e18933900:30} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c07d10 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c07d60 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c07d10, semaphore=0x556e18933900, value=30 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c07d60, semaphore=0x556e18933900, value=30 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e1a3c77b0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:30}, signal={0x556e18933900:31} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:31}, signal={0x556e18933900:32} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e193ef470 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19140df0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e193ef470, semaphore=0x556e18933900, value=32 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19140df0, semaphore=0x556e18933900, value=32 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1610612736, buffer=0x556e1a41f5c0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1610612736, wait={0x556e18933900:32}, signal={0x556e18933900:33} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:33}, signal={0x556e18933900:34} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c04ab0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c04b00 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c04ab0, semaphore=0x556e18933900, value=34 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c04b00, semaphore=0x556e18933900, value=34 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e1a3c77b0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:34}, signal={0x556e18933900:35} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:35}, signal={0x556e18933900:36} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18baf800 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19aded80 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18baf800, semaphore=0x556e18933900, value=36 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19aded80, semaphore=0x556e18933900, value=36 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e1a3c77b0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:36}, signal={0x556e18933900:37} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:37}, signal={0x556e18933900:38} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1922bd90 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a41c090 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1922bd90, semaphore=0x556e18933900, value=38 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a41c090, semaphore=0x556e18933900, value=38 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e1a3c77b0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:38}, signal={0x556e18933900:39} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:39}, signal={0x556e18933900:40} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3a2b90 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3a2be0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3a2b90, semaphore=0x556e18933900, value=40 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3a2be0, semaphore=0x556e18933900, value=40 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e1a3c77b0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:40}, signal={0x556e18933900:41} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:41}, signal={0x556e18933900:42} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a419dd0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e191e42f0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a419dd0, semaphore=0x556e18933900, value=42 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e191e42f0, semaphore=0x556e18933900, value=42 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1207959552, buffer=0x556e19607b10 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1207959552, wait={0x556e18933900:42}, signal={0x556e18933900:43} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:43}, signal={0x556e18933900:44} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19829890 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e198298e0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19829890, semaphore=0x556e18933900, value=44 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e198298e0, semaphore=0x556e18933900, value=44 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=6144, buffer=0x556e1a3c77b0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=6144, wait={0x556e18933900:44}, signal={0x556e18933900:45} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:45}, signal={0x556e18933900:46} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19f4deb0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19f4df00 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19f4deb0, semaphore=0x556e18933900, value=46 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19f4df00, semaphore=0x556e18933900, value=46 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=402653184, buffer=0x556e1a390de0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=402653184, wait={0x556e18933900:46}, signal={0x556e18933900:47} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:47}, signal={0x556e18933900:48} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19d0a1e0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19d0a230 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19d0a1e0, semaphore=0x556e18933900, value=48 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19d0a230, semaphore=0x556e18933900, value=48 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e19f4df50 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:48}, signal={0x556e18933900:49} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:49}, signal={0x556e18933900:50} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e199c4af0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19535260 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e199c4af0, semaphore=0x556e18933900, value=50 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19535260, semaphore=0x556e18933900, value=50 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e1a419690 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:50}, signal={0x556e18933900:51} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:51}, signal={0x556e18933900:52} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b2be70 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19533540 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b2be70, semaphore=0x556e18933900, value=52 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19533540, semaphore=0x556e18933900, value=52 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e1a419690 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:52}, signal={0x556e18933900:53} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:53}, signal={0x556e18933900:54} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c1a1d0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c1a220 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c1a1d0, semaphore=0x556e18933900, value=54 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c1a220, semaphore=0x556e18933900, value=54 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e19962980 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:54}, signal={0x556e18933900:55} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:55}, signal={0x556e18933900:56} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a151ab0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c23f90 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a151ab0, semaphore=0x556e18933900, value=56 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c23f90, semaphore=0x556e18933900, value=56 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e19c24390 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:56}, signal={0x556e18933900:57} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:57}, signal={0x556e18933900:58} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19b5a2f0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19b5a340 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19b5a2f0, semaphore=0x556e18933900, value=58 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19b5a340, semaphore=0x556e18933900, value=58 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=16777216, buffer=0x556e1a397060 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=16777216, wait={0x556e18933900:58}, signal={0x556e18933900:59} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:59}, signal={0x556e18933900:60} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19533ed0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19533f20 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19533ed0, semaphore=0x556e18933900, value=60 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19533f20, semaphore=0x556e18933900, value=60 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=204800, buffer=0x556e18b470a0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=204800, wait={0x556e18933900:60}, signal={0x556e18933900:61} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:61}, signal={0x556e18933900:62} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c24420 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b54260 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c24420, semaphore=0x556e18933900, value=62 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b54260, semaphore=0x556e18933900, value=62 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=419430400, buffer=0x556e19c09c60 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=419430400, wait={0x556e18933900:62}, signal={0x556e18933900:63} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:63}, signal={0x556e18933900:64} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19535d90 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19535de0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19535d90, semaphore=0x556e18933900, value=64 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19535de0, semaphore=0x556e18933900, value=64 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e1a396fc0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:64}, signal={0x556e18933900:65} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:65}, signal={0x556e18933900:66} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c09d90 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e191ecc60 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c09d90, semaphore=0x556e18933900, value=66 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e191ecc60, semaphore=0x556e18933900, value=66 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e19b76cd0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:66}, signal={0x556e18933900:67} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:67}, signal={0x556e18933900:68} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c05760 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c057b0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c05760, semaphore=0x556e18933900, value=68 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c057b0, semaphore=0x556e18933900, value=68 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=16777216, buffer=0x556e19c05800 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=16777216, wait={0x556e18933900:68}, signal={0x556e18933900:69} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:69}, signal={0x556e18933900:70} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a24d3f0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a24d440 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a24d3f0, semaphore=0x556e18933900, value=70 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a24d440, semaphore=0x556e18933900, value=70 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=204800, buffer=0x556e19c05800 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=204800, wait={0x556e18933900:70}, signal={0x556e18933900:71} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:71}, signal={0x556e18933900:72} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a250bf0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a250c40 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a250bf0, semaphore=0x556e18933900, value=72 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a250c40, semaphore=0x556e18933900, value=72 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=419430400, buffer=0x556e1a2507b0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=419430400, wait={0x556e18933900:72}, signal={0x556e18933900:73} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:73}, signal={0x556e18933900:74} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e199c3e30 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e199c3e80 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e199c3e30, semaphore=0x556e18933900, value=74 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e199c3e80, semaphore=0x556e18933900, value=74 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e1a2508f0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:74}, signal={0x556e18933900:75} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:75}, signal={0x556e18933900:76} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b4a060 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19aeb4a0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b4a060, semaphore=0x556e18933900, value=76 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19aeb4a0, semaphore=0x556e18933900, value=76 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=96, buffer=0x556e1a2508f0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=96, wait={0x556e18933900:76}, signal={0x556e18933900:77} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:77}, signal={0x556e18933900:78} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b9fe70 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b9fec0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b9fe70, semaphore=0x556e18933900, value=78 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b9fec0, semaphore=0x556e18933900, value=78 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=96, buffer=0x556e18ba0130 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=96, wait={0x556e18933900:78}, signal={0x556e18933900:79} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:79}, signal={0x556e18933900:80} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19bbfce0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19bbfd30 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19bbfce0, semaphore=0x556e18933900, value=80 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19bbfd30, semaphore=0x556e18933900, value=80 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=96, buffer=0x556e19b76cd0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=96, wait={0x556e18933900:80}, signal={0x556e18933900:81} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:81}, signal={0x556e18933900:82} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a253340 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a253390 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a253340, semaphore=0x556e18933900, value=82 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a253390, semaphore=0x556e18933900, value=82 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=786432, buffer=0x556e1a439960 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=786432, wait={0x556e18933900:82}, signal={0x556e18933900:83} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:83}, signal={0x556e18933900:84} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a24d930 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b473e0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a24d930, semaphore=0x556e18933900, value=84 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b473e0, semaphore=0x556e18933900, value=84 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1610612736, buffer=0x556e1a288940 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1610612736, wait={0x556e18933900:84}, signal={0x556e18933900:85} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:85}, signal={0x556e18933900:86} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19b67f70 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19b67fc0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19b67f70, semaphore=0x556e18933900, value=86 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19b67fc0, semaphore=0x556e18933900, value=86 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e1a289780 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:86}, signal={0x556e18933900:87} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:87}, signal={0x556e18933900:88} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a2ae8a0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a289880 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2ae8a0, semaphore=0x556e18933900, value=88 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a289880, semaphore=0x556e18933900, value=88 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1610612736, buffer=0x556e19bc1fa0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1610612736, wait={0x556e18933900:88}, signal={0x556e18933900:89} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:89}, signal={0x556e18933900:90} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e190f07e0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e190f0830 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e190f07e0, semaphore=0x556e18933900, value=90 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e190f0830, semaphore=0x556e18933900, value=90 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e1a3615d0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:90}, signal={0x556e18933900:91} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:91}, signal={0x556e18933900:92} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a361100 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a361150 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a361100, semaphore=0x556e18933900, value=92 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a361150, semaphore=0x556e18933900, value=92 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e19bc1dc0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:92}, signal={0x556e18933900:93} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:93}, signal={0x556e18933900:94} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19bc0660 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19bc06b0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19bc0660, semaphore=0x556e18933900, value=94 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19bc06b0, semaphore=0x556e18933900, value=94 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e19bc0700 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:94}, signal={0x556e18933900:95} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:95}, signal={0x556e18933900:96} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19740d80 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19740dd0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19740d80, semaphore=0x556e18933900, value=96 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19740dd0, semaphore=0x556e18933900, value=96 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e19740e20 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:96}, signal={0x556e18933900:97} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:97}, signal={0x556e18933900:98} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19bc0a10 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18bb3b10 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19bc0a10, semaphore=0x556e18933900, value=98 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18bb3b10, semaphore=0x556e18933900, value=98 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1207959552, buffer=0x556e18bb4000 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1207959552, wait={0x556e18933900:98}, signal={0x556e18933900:99} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:99}, signal={0x556e18933900:100} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e195d0b40 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e195d0510 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e195d0b40, semaphore=0x556e18933900, value=100 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e195d0510, semaphore=0x556e18933900, value=100 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=6144, buffer=0x556e19740e20 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=6144, wait={0x556e18933900:100}, signal={0x556e18933900:101} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:101}, signal={0x556e18933900:102} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19955b30 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19955b80 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19955b30, semaphore=0x556e18933900, value=102 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19955b80, semaphore=0x556e18933900, value=102 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=402653184, buffer=0x556e19629880 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=402653184, wait={0x556e18933900:102}, signal={0x556e18933900:103} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:103}, signal={0x556e18933900:104} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19809610 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19809660 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19809610, semaphore=0x556e18933900, value=104 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19809660, semaphore=0x556e18933900, value=104 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=786432, buffer=0x556e19629490 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=786432, wait={0x556e18933900:104}, signal={0x556e18933900:105} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:105}, signal={0x556e18933900:106} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19809860 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e198098b0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19809860, semaphore=0x556e18933900, value=106 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e198098b0, semaphore=0x556e18933900, value=106 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1610612736, buffer=0x556e1a3c0c90 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1610612736, wait={0x556e18933900:106}, signal={0x556e18933900:107} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:107}, signal={0x556e18933900:108} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3c1090 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18f796d0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3c1090, semaphore=0x556e18933900, value=108 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18f796d0, semaphore=0x556e18933900, value=108 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e19629490 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:108}, signal={0x556e18933900:109} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:109}, signal={0x556e18933900:110} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3c06d0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3c0920 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3c06d0, semaphore=0x556e18933900, value=110 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3c0920, semaphore=0x556e18933900, value=110 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1610612736, buffer=0x556e18b29560 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1610612736, wait={0x556e18933900:110}, signal={0x556e18933900:111} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:111}, signal={0x556e18933900:112} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a10ca60 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a10cab0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a10ca60, semaphore=0x556e18933900, value=112 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a10cab0, semaphore=0x556e18933900, value=112 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e19629490 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:112}, signal={0x556e18933900:113} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:113}, signal={0x556e18933900:114} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19623470 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e196234c0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19623470, semaphore=0x556e18933900, value=114 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e196234c0, semaphore=0x556e18933900, value=114 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e19623510 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:114}, signal={0x556e18933900:115} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:115}, signal={0x556e18933900:116} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19623760 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a10d670 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19623760, semaphore=0x556e18933900, value=116 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a10d670, semaphore=0x556e18933900, value=116 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e19623510 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:116}, signal={0x556e18933900:117} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:117}, signal={0x556e18933900:118} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c1af20 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c1af70 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c1af20, semaphore=0x556e18933900, value=118 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c1af70, semaphore=0x556e18933900, value=118 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=196608, buffer=0x556e19623510 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=196608, wait={0x556e18933900:118}, signal={0x556e18933900:119} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:119}, signal={0x556e18933900:120} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19acf2d0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19acf320 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19acf2d0, semaphore=0x556e18933900, value=120 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19acf320, semaphore=0x556e18933900, value=120 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=1207959552, buffer=0x556e199b3320 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=1207959552, wait={0x556e18933900:120}, signal={0x556e18933900:121} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:121}, signal={0x556e18933900:122} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3621f0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a362240 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3621f0, semaphore=0x556e18933900, value=122 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a362240, semaphore=0x556e18933900, value=122 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=6144, buffer=0x556e1a10cc40 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=6144, wait={0x556e18933900:122}, signal={0x556e18933900:123} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:123}, signal={0x556e18933900:124} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3626f0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a362740 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3626f0, semaphore=0x556e18933900, value=124 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a362740, semaphore=0x556e18933900, value=124 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=402653184, buffer=0x556e18a85800 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=402653184, wait={0x556e18933900:124}, signal={0x556e18933900:125} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:125}, signal={0x556e18933900:126} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b41e10 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b41e60 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b41e10, semaphore=0x556e18933900, value=126 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b41e60, semaphore=0x556e18933900, value=126 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=96, buffer=0x556e1a363330 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=96, wait={0x556e18933900:126}, signal={0x556e18933900:127} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:127}, signal={0x556e18933900:128} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b42150 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18b421a0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b42150, semaphore=0x556e18933900, value=128 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b421a0, semaphore=0x556e18933900, value=128 (OK) | |
I0602 22:27:57.174826 140467571585024 partitioning.py:637] replicated train state shapes: TrainState(step=(1,), mdl_vars={'params': {'lm': {'final_ln': {'bias': (1, 2048), 'scale': (1, 2048)}, 'position_emb': {'emb_var': (1, 2048, 2048)}, 'softmax': {'logits_ffn': {'bias': {'b': (1, 51200)}, 'linear': {'w': (1, 2048, 51200)}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': (1, 24, 8192)}, 'linear': {'w': (1, 24, 2048, 8192)}}, 'ffn_layer2': {'bias': {'b': (1, 24, 2048)}, 'linear': {'w': (1, 24, 8192, 2048)}}, 'layer_norm': {'bias': (1, 24, 2048), 'scale': (1, 24, 2048)}}, 'layer_norm': {'bias': (1, 24, 2048), 'scale': (1, 24, 2048)}, 'self_attention': {'combined_qkv': {'w': (1, 24, 3, 2048, 32, 64)}, 'per_dim_scale': {'per_dim_scale': (1, 24, 64)}, 'post': {'w': (1, 24, 2048, 32, 64)}}}}}}}}}, opt_states=[{'no_prefix': ({'count': (1,)}, {'count': (1,)}, {'count': (1,), 'm': {'params': {'lm': {'final_ln': {'bias': (1, 2048), 'scale': (1, 2048)}, 'position_emb': {'emb_var': (1, 2048, 2048)}, 'softmax': {'logits_ffn': {'bias': {'b': (1, 51200)}, 'linear': {'w': (1, 2048, 51200)}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}, 'ffn_layer2': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}, 'layer_norm': {'bias': MaskedNode(), 'scale': MaskedNode()}}, 'layer_norm': {'bias': MaskedNode(), 'scale': MaskedNode()}, 'self_attention': {'combined_qkv': {'w': MaskedNode()}, 'per_dim_scale': {'per_dim_scale': MaskedNode()}, 'post': {'w': MaskedNode()}}}}}}}}}, 'v': {'params': {'lm': {'final_ln': {'bias': (1, 2048), 'scale': (1, 2048)}, 'position_emb': {'emb_var': (1, 2048, 2048)}, 'softmax': {'logits_ffn': {'bias': {'b': (1, 51200)}, 'linear': {'w': (1, 2048, 51200)}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}, 'ffn_layer2': {'bias': 
{'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}, 'layer_norm': {'bias': MaskedNode(), 'scale': MaskedNode()}}, 'layer_norm': {'bias': MaskedNode(), 'scale': MaskedNode()}, 'self_attention': {'combined_qkv': {'w': MaskedNode()}, 'per_dim_scale': {'per_dim_scale': MaskedNode()}, 'post': {'w': MaskedNode()}}}}}}}}}}, {'count': (1,)}), 'p#24#i-1': ({'count': (1, 24)}, {'count': (1, 24)}, {'count': (1, 24), 'm': {'params': {'lm': {'final_ln': {'bias': MaskedNode(), 'scale': MaskedNode()}, 'position_emb': {'emb_var': MaskedNode()}, 'softmax': {'logits_ffn': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': (1, 24, 8192)}, 'linear': {'w': (1, 24, 2048, 8192)}}, 'ffn_layer2': {'bias': {'b': (1, 24, 2048)}, 'linear': {'w': (1, 24, 8192, 2048)}}, 'layer_norm': {'bias': (1, 24, 2048), 'scale': (1, 24, 2048)}}, 'layer_norm': {'bias': (1, 24, 2048), 'scale': (1, 24, 2048)}, 'self_attention': {'combined_qkv': {'w': (1, 24, 3, 2048, 32, 64)}, 'per_dim_scale': {'per_dim_scale': (1, 24, 64)}, 'post': {'w': (1, 24, 2048, 32, 64)}}}}}}}}}, 'v': {'params': {'lm': {'final_ln': {'bias': MaskedNode(), 'scale': MaskedNode()}, 'position_emb': {'emb_var': MaskedNode()}, 'softmax': {'logits_ffn': {'bias': {'b': MaskedNode()}, 'linear': {'w': MaskedNode()}}}, 'transformer': {'repeat': {'sub': {'x_layers_0': {'ff_layer': {'ffn_layer1': {'bias': {'b': (1, 24, 8192)}, 'linear': {'w': (1, 24, 2048, 8192)}}, 'ffn_layer2': {'bias': {'b': (1, 24, 2048)}, 'linear': {'w': (1, 24, 8192, 2048)}}, 'layer_norm': {'bias': (1, 24, 2048), 'scale': (1, 24, 2048)}}, 'layer_norm': {'bias': (1, 24, 2048), 'scale': (1, 24, 2048)}, 'self_attention': {'combined_qkv': {'w': (1, 24, 3, 2048, 32, 64)}, 'per_dim_scale': {'per_dim_scale': (1, 24, 64)}, 'post': {'w': (1, 24, 2048, 32, 64)}}}}}}}}}}, {'count': (1, 24)})}]) | |
W0602 22:27:57.175599 140467571585024 dispatch.py:272] Finished tracing + transforming jit(convert_element_type) in 0.00022101402282714844 sec | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e199b2fc0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:128}, signal={0x556e18933900:129} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:129}, signal={0x556e18933900:130} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19c04f70 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e199a91b0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c04f70, semaphore=0x556e18933900, value=130 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e199a91b0, semaphore=0x556e18933900, value=130 (OK) | |
W0602 22:27:57.177212 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.000293731689453125 sec | |
W0602 22:27:57.177804 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_seed for pjit in 0.001306295394897461 sec | |
W0602 22:27:57.179432 140467571585024 dispatch.py:272] Finished tracing + transforming ravel for pjit in 0.00016808509826660156 sec | |
W0602 22:27:57.180124 140467571585024 dispatch.py:272] Finished tracing + transforming threefry_2x32 for pjit in 0.0016508102416992188 sec | |
W0602 22:27:57.181025 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_fold_in for pjit in 0.0047168731689453125 sec | |
W0602 22:27:57.181611 140467571585024 pxla.py:1882] Compiling _threefry_fold_in for with global shapes and types [ShapedArray(uint32[2]), ShapedArray(uint32[])]. Argument mapping: (GSPMDSharding({replicated}), GSPMDSharding({replicated})). | |
W0602 22:27:57.185242 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002880096435546875 sec | |
W0602 22:27:57.186213 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003724098205566406 sec | |
W0602 22:27:57.187031 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.000263214111328125 sec | |
W0602 22:27:57.187665 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002658367156982422 sec | |
W0602 22:27:57.224001 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_threefry_fold_in) in 0.042261362075805664 sec | |
W0602 22:27:57.672591 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_threefry_fold_in) in 0.4483067989349365 sec | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a5f7860 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a5f7860, semaphore=0x556e189338c0, value=9 (OK) | |
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=10, fence=0x556e1a3387d0 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5f7860, from_fence=0x556e195d2630 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a368cc0, semaphore=0x556e189338c0, value=10 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5f7860, from_fence=0x556e19c04f70 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e199a91b0, semaphore=0x556e189338c0, value=10 (OK) | |
:: IREE INVOKE (vm_invoke[async]): context=0x556e1a5731a0, f=0, wait_fence=0x556e1a5f7860 {0x556e189338c0:9, 0x556e18933900:130}, signal_fence=0x556e1a3387d0 {0x556e189338c0:10} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19813510 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a05a820 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19813510, semaphore=0x556e189338c0, value=10 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a05a820, semaphore=0x556e189338c0, value=10 (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e19ffb880, wait={0x556e18933900:130, 0x556e189338c0:10}, signal={} (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e19ad8460, wait={0x556e189338c0:10}, signal={} (OK) | |
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ebf198 (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e189338c0:10}, signal={} (OK) | |
I0602 22:27:57.673998 140467571585024 partitioning.py:647] root prng key: [3199903509 2250625448] | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e19ad8640, wait={0x556e189338c0:9}, signal={} (OK) | |
W0602 22:27:58.050272 140467571585024 dispatch.py:272] Finished tracing + transforming ravel for pjit in 0.00019407272338867188 sec | |
W0602 22:27:58.051110 140467571585024 dispatch.py:272] Finished tracing + transforming threefry_2x32 for pjit in 0.0016100406646728516 sec | |
W0602 22:27:58.052080 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_split_original for pjit in 0.0029850006103515625 sec | |
W0602 22:27:58.052662 140467571585024 pxla.py:1882] Compiling _threefry_split_original for with global shapes and types [ShapedArray(uint32[2])]. Argument mapping: (GSPMDSharding({replicated}),). | |
W0602 22:27:58.055990 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00040984153747558594 sec | |
W0602 22:27:58.056857 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00027370452880859375 sec | |
W0602 22:27:58.057676 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00027060508728027344 sec | |
W0602 22:27:58.058297 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002770423889160156 sec | |
W0602 22:27:58.095024 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_threefry_split_original) in 0.04224109649658203 sec | |
W0602 22:27:58.572319 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_threefry_split_original) in 0.477022647857666 sec | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a5b4770 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a5b4770, semaphore=0x556e189338c0, value=10 (OK) | |
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=11, fence=0x556e195265a0 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5b4770, from_fence=0x556e19813510 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a05a820, semaphore=0x556e189338c0, value=11 (OK) | |
:: IREE INVOKE (vm_invoke[async]): context=0x556e19266b10, f=0, wait_fence=0x556e1a5b4770 {0x556e189338c0:10}, signal_fence=0x556e195265a0 {0x556e189338c0:11} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a2b64a0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19ef0180 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2b64a0, semaphore=0x556e189338c0, value=11 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19ef0180, semaphore=0x556e189338c0, value=11 (OK) | |
W0602 22:27:58.574334 140467571585024 dispatch.py:272] Finished tracing + transforming _unstack for pjit in 0.0007600784301757812 sec | |
W0602 22:27:58.574897 140467571585024 pxla.py:1882] Compiling _unstack for with global shapes and types [ShapedArray(uint32[3,2])]. Argument mapping: (GSPMDSharding({replicated}),). | |
W0602 22:27:58.576957 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_unstack) in 0.0019330978393554688 sec | |
W0602 22:27:58.638722 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_unstack) in 0.061475276947021484 sec | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18a1cb40 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18a1cb40, semaphore=0x556e189338c0, value=11 (OK) | |
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=12, fence=0x556e1a2a2ec0 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e18a1cb40, from_fence=0x556e1a2b64a0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19ef0180, semaphore=0x556e189338c0, value=12 (OK) | |
:: IREE INVOKE (vm_invoke[async]): context=0x556e196c1cf0, f=0, wait_fence=0x556e18a1cb40 {0x556e189338c0:11}, signal_fence=0x556e1a2a2ec0 {0x556e189338c0:12} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19fa7550 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e190c26b0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19fa7550, semaphore=0x556e189338c0, value=12 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e190c26b0, semaphore=0x556e189338c0, value=12 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18c40140 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a450b40 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18c40140, semaphore=0x556e189338c0, value=12 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a450b40, semaphore=0x556e189338c0, value=12 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19047700 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18bec470 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19047700, semaphore=0x556e189338c0, value=12 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18bec470, semaphore=0x556e189338c0, value=12 (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e1962c9e0, wait={0x556e189338c0:12}, signal={} (OK) | |
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ebf538 (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e189338c0:12}, signal={} (OK) | |
I0602 22:27:58.639493 140467571585024 executors.py:260] train prng seed: [3373580220 3771856083] | |
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ebf538 (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e189338c0:12}, signal={} (OK) | |
I0602 22:27:58.640165 140467571585024 executors.py:261] eval prng seed: [3893388808 331134876] | |
W0602 22:27:58.642186 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_split_original for pjit in 0.001253366470336914 sec | |
W0602 22:27:58.642764 140467571585024 pxla.py:1882] Compiling _threefry_split_original for with global shapes and types [ShapedArray(uint32[2])]. Argument mapping: (GSPMDSharding({replicated}),). | |
W0602 22:27:58.683536 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_threefry_split_original) in 0.04065346717834473 sec | |
W0602 22:27:59.126293 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_threefry_split_original) in 0.4424901008605957 sec | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18f19d00 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18f19d00, semaphore=0x556e189338c0, value=12 (OK) | |
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=13, fence=0x556e19d8df10 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e18f19d00, from_fence=0x556e18c40140 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a450b40, semaphore=0x556e189338c0, value=13 (OK) | |
:: IREE INVOKE (vm_invoke[async]): context=0x556e19dcb0c0, f=0, wait_fence=0x556e18f19d00 {0x556e189338c0:12}, signal_fence=0x556e19d8df10 {0x556e189338c0:13} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a280840 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a2825c0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a280840, semaphore=0x556e189338c0, value=13 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2825c0, semaphore=0x556e189338c0, value=13 (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e1a210440, wait={0x556e189338c0:13}, signal={} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19e98090 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19e98090, semaphore=0x556e189338c0, value=13 (OK) | |
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=14, fence=0x556e18c3b9a0 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e19e98090, from_fence=0x556e19047700 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18bec470, semaphore=0x556e189338c0, value=14 (OK) | |
:: IREE INVOKE (vm_invoke[async]): context=0x556e19dcb0c0, f=0, wait_fence=0x556e19e98090 {0x556e189338c0:13}, signal_fence=0x556e18c3b9a0 {0x556e189338c0:14} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18ca6110 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a598320 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18ca6110, semaphore=0x556e189338c0, value=14 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a598320, semaphore=0x556e189338c0, value=14 (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e1a210620, wait={0x556e189338c0:14}, signal={} (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e18b937f0, wait={0x556e189338c0:11}, signal={} (OK) | |
I0602 22:27:59.127763 140467571585024 executors.py:295] Starting executor. | |
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ec10d8 (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:16}, signal={} (OK) | |
I0602 22:27:59.128399 140467571585024 executors.py:454] Model initial global_step=0 | |
I0602 22:27:59.128465 140467571585024 executors.py:461] [PAX STATUS]: Starting training loop. | |
I0602 22:27:59.128512 140467571585024 programs.py:210] [PAX STATUS]: Setting up BaseTrainProgram. | |
I0602 22:27:59.128588 140467571585024 summary_utils.py:281] Opening SummaryWriter `log_NVIDIA1_3BPmap/summaries/train`... | |
I0602 22:27:59.129504 140467571585024 summary_utils.py:281] Opening SummaryWriter `log_NVIDIA1_3BPmap/summaries/eval_train`... | |
I0602 22:27:59.132162 140467571585024 py_utils.py:338] Starting sync_global_devices Start training loop from step: 0 across 1 devices globally | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e19f52f60 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:130}, signal={0x556e18933900:131} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:131}, signal={0x556e18933900:132} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18f601c0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18f60020 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18f601c0, semaphore=0x556e18933900, value=132 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18f60020, semaphore=0x556e18933900, value=132 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19f57630 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19f57630, semaphore=0x556e189338c0, value=14 (OK) | |
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=15, fence=0x556e19f52ec0 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e19f57630, from_fence=0x556e18f601c0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18f60020, semaphore=0x556e189338c0, value=15 (OK) | |
:: IREE INVOKE (vm_invoke[async]): context=0x556e196fdca0, f=0, wait_fence=0x556e19f57630 {0x556e189338c0:14, 0x556e18933900:132}, signal_fence=0x556e19f52ec0 {0x556e189338c0:15} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19db8490 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a1d0f80 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19db8490, semaphore=0x556e189338c0, value=15 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a1d0f80, semaphore=0x556e189338c0, value=15 (OK) | |
:: IREE INVOKE (hal_allocator_import_buffer): external_buffer=0x7ffef8ebf9f8 (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e189338c0:15}, signal={} (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e19ffb880, wait={0x556e18933900:132, 0x556e189338c0:15}, signal={} (OK) | |
:: IREE INVOKE (hal_device_queue_dealloca): device=0x556e18904d50, buffer=0x556e19ffb880, wait={0x556e189338c0:15}, signal={} (OK) | |
I0602 22:27:59.133880 140467571585024 py_utils.py:341] Finished sync_global_devices Start training loop from step: 0 across 1 devices globally | |
W0602 22:27:59.344522 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003952980041503906 sec | |
W0602 22:27:59.345551 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0004057884216308594 sec | |
W0602 22:27:59.346375 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.00029778480529785156 sec | |
I0602 22:27:59.357974 140467571585024 base_layer.py:632] Creating var /lm/softmax/logits_ffn/linear/w with shape=[2048, 51200], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.022097086912079608 | |
W0602 22:27:59.359756 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00039649009704589844 sec | |
W0602 22:27:59.360463 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00029850006103515625 sec | |
W0602 22:27:59.361151 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003151893615722656 sec | |
W0602 22:27:59.361839 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003058910369873047 sec | |
W0602 22:27:59.362359 140467571585024 dispatch.py:272] Finished tracing + transforming _uniform for pjit in 0.003681659698486328 sec | |
W0602 22:27:59.362930 140467571585024 dispatch.py:272] Finished tracing + transforming _normal_real for pjit in 0.004486560821533203 sec | |
W0602 22:27:59.363209 140467571585024 dispatch.py:272] Finished tracing + transforming _normal for pjit in 0.004973649978637695 sec | |
W0602 22:27:59.363873 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003452301025390625 sec | |
W0602 22:27:59.366754 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003452301025390625 sec | |
W0602 22:27:59.367571 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00014781951904296875 sec | |
W0602 22:27:59.368822 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003464221954345703 sec | |
I0602 22:27:59.370981 140467571585024 base_layer.py:632] Creating var /lm/position_emb/emb_var with shape=[2048, 2048], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023 | |
W0602 22:27:59.372668 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003902912139892578 sec | |
W0602 22:27:59.373650 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00030803680419921875 sec | |
W0602 22:27:59.374335 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00030493736267089844 sec | |
W0602 22:27:59.374852 140467571585024 dispatch.py:272] Finished tracing + transforming _uniform for pjit in 0.003212451934814453 sec | |
W0602 22:27:59.375406 140467571585024 dispatch.py:272] Finished tracing + transforming _normal_real for pjit in 0.0039823055267333984 sec | |
W0602 22:27:59.375681 140467571585024 dispatch.py:272] Finished tracing + transforming _normal for pjit in 0.0044536590576171875 sec | |
W0602 22:27:59.376334 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003409385681152344 sec | |
W0602 22:27:59.380460 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003094673156738281 sec | |
W0602 22:27:59.380913 140467571585024 dispatch.py:272] Finished tracing + transforming _one_hot for pjit in 0.0011699199676513672 sec | |
W0602 22:27:59.381712 140467571585024 dispatch.py:272] Finished tracing + transforming matmul for pjit in 0.0004634857177734375 sec | |
W0602 22:27:59.384545 140467571585024 dispatch.py:272] Finished tracing + transforming _einsum for pjit in 0.0005903244018554688 sec | |
W0602 22:27:59.386854 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00031876564025878906 sec | |
W0602 22:27:59.388913 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003006458282470703 sec | |
W0602 22:27:59.389868 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00029277801513671875 sec | |
W0602 22:27:59.392195 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003139972686767578 sec | |
W0602 22:27:59.392928 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00029277801513671875 sec | |
W0602 22:27:59.393807 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00031948089599609375 sec | |
W0602 22:27:59.460369 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003571510314941406 sec | |
W0602 22:27:59.461174 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003254413604736328 sec | |
W0602 22:27:59.461993 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00037980079650878906 sec | |
W0602 22:27:59.463433 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00034689903259277344 sec | |
W0602 22:27:59.464406 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00030922889709472656 sec | |
W0602 22:27:59.465304 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.00026702880859375 sec | |
W0602 22:27:59.466746 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003063678741455078 sec | |
W0602 22:27:59.467547 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.0002110004425048828 sec | |
W0602 22:27:59.468261 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.00034117698669433594 sec | |
W0602 22:27:59.469110 140467571585024 dispatch.py:272] Finished tracing + transforming _power for pjit in 0.00035643577575683594 sec | |
W0602 22:27:59.470227 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003745555877685547 sec | |
W0602 22:27:59.470706 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0011000633239746094 sec | |
W0602 22:27:59.471545 140467571585024 dispatch.py:272] Finished tracing + transforming _power for pjit in 0.0003387928009033203 sec | |
I0602 22:27:59.472298 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/layer_norm/scale with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0 | |
W0602 22:27:59.473121 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0004374980926513672 sec | |
I0602 22:27:59.473878 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/layer_norm/bias with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0 | |
W0602 22:27:59.475531 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0004634857177734375 sec | |
W0602 22:27:59.476112 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0012922286987304688 sec | |
W0602 22:27:59.476956 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.000301361083984375 sec | |
W0602 22:27:59.478550 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00033736228942871094 sec | |
W0602 22:27:59.479698 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00035262107849121094 sec | |
W0602 22:27:59.480726 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0005667209625244141 sec | |
W0602 22:27:59.481723 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0004661083221435547 sec | |
I0602 22:27:59.492099 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/self_attention/combined_qkv/w with shape=[3, 2048, 32, 64], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023 | |
W0602 22:27:59.493881 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003955364227294922 sec | |
W0602 22:27:59.494585 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00029778480529785156 sec | |
W0602 22:27:59.495265 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003154277801513672 sec | |
W0602 22:27:59.495957 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003161430358886719 sec | |
W0602 22:27:59.496483 140467571585024 dispatch.py:272] Finished tracing + transforming _uniform for pjit in 0.0036971569061279297 sec | |
W0602 22:27:59.497040 140467571585024 dispatch.py:272] Finished tracing + transforming _normal_real for pjit in 0.004477024078369141 sec | |
W0602 22:27:59.497337 140467571585024 dispatch.py:272] Finished tracing + transforming _normal for pjit in 0.004982948303222656 sec | |
W0602 22:27:59.497991 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00034499168395996094 sec | |
W0602 22:27:59.500785 140467571585024 dispatch.py:272] Finished tracing + transforming _einsum for pjit in 0.0005729198455810547 sec | |
I0602 22:27:59.502290 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/self_attention/per_dim_scale/per_dim_scale with shape=[64], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0 | |
W0602 22:27:59.503006 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003504753112792969 sec | |
W0602 22:27:59.505191 140467571585024 dispatch.py:272] Finished tracing + transforming logaddexp for pjit in 0.0010294914245605469 sec | |
W0602 22:27:59.505631 140467571585024 dispatch.py:272] Finished tracing + transforming softplus for pjit in 0.0018105506896972656 sec | |
W0602 22:27:59.506253 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00029921531677246094 sec | |
W0602 22:27:59.507096 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00048279762268066406 sec | |
W0602 22:27:59.508324 140467571585024 dispatch.py:272] Finished tracing + transforming _einsum for pjit in 0.0004889965057373047 sec | |
W0602 22:27:59.509315 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003619194030761719 sec | |
W0602 22:27:59.510055 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00029921531677246094 sec | |
W0602 22:27:59.511349 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0006844997406005859 sec | |
W0602 22:27:59.511945 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0015289783477783203 sec | |
W0602 22:27:59.512629 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.0003342628479003906 sec | |
W0602 22:27:59.513368 140467571585024 dispatch.py:272] Finished tracing + transforming _power for pjit in 0.00036644935607910156 sec | |
W0602 22:27:59.514435 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003521442413330078 sec | |
W0602 22:27:59.514896 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0010592937469482422 sec | |
W0602 22:27:59.516114 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002925395965576172 sec | |
W0602 22:27:59.516677 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002224445343017578 sec | |
W0602 22:27:59.517291 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00030422210693359375 sec | |
W0602 22:27:59.519034 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.0004436969757080078 sec | |
W0602 22:27:59.519812 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003006458282470703 sec | |
W0602 22:27:59.520391 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002224445343017578 sec | |
W0602 22:27:59.521248 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0005037784576416016 sec | |
W0602 22:27:59.521986 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.00030612945556640625 sec | |
W0602 22:27:59.523357 140467571585024 dispatch.py:272] Finished tracing + transforming _einsum for pjit in 0.0005538463592529297 sec | |
I0602 22:27:59.524224 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/self_attention/post/w with shape=[2048, 32, 64], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023 | |
W0602 22:27:59.525957 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00030350685119628906 sec | |
W0602 22:27:59.526643 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00029659271240234375 sec | |
W0602 22:27:59.527315 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00031280517578125 sec | |
W0602 22:27:59.528625 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0009379386901855469 sec | |
W0602 22:27:59.529157 140467571585024 dispatch.py:272] Finished tracing + transforming _uniform for pjit in 0.004189968109130859 sec | |
W0602 22:27:59.529712 140467571585024 dispatch.py:272] Finished tracing + transforming _normal_real for pjit in 0.0049741268157958984 sec | |
W0602 22:27:59.529983 140467571585024 dispatch.py:272] Finished tracing + transforming _normal for pjit in 0.0055196285247802734 sec | |
W0602 22:27:59.530637 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003428459167480469 sec | |
W0602 22:27:59.533398 140467571585024 dispatch.py:272] Finished tracing + transforming _einsum for pjit in 0.0005085468292236328 sec | |
W0602 22:27:59.536686 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003006458282470703 sec | |
W0602 22:27:59.537475 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00039386749267578125 sec | |
W0602 22:27:59.539196 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003559589385986328 sec | |
W0602 22:27:59.540110 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00038814544677734375 sec | |
W0602 22:27:59.541549 140467571585024 dispatch.py:272] Finished tracing + transforming _power for pjit in 0.0003504753112792969 sec | |
W0602 22:27:59.542280 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00029349327087402344 sec | |
W0602 22:27:59.543831 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00035500526428222656 sec | |
W0602 22:27:59.544297 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0010623931884765625 sec | |
I0602 22:27:59.555739 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/layer_norm/scale with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0 | |
I0602 22:27:59.556720 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/layer_norm/bias with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0 | |
I0602 22:27:59.567184 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/ffn_layer1/linear/w with shape=[2048, 8192], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023 | |
W0602 22:27:59.568845 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00030517578125 sec | |
W0602 22:27:59.569918 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0004093647003173828 sec | |
W0602 22:27:59.570603 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003085136413574219 sec | |
W0602 22:27:59.571115 140467571585024 dispatch.py:272] Finished tracing + transforming _uniform for pjit in 0.003255605697631836 sec | |
W0602 22:27:59.571674 140467571585024 dispatch.py:272] Finished tracing + transforming _normal_real for pjit in 0.004033803939819336 sec | |
W0602 22:27:59.571937 140467571585024 dispatch.py:272] Finished tracing + transforming _normal for pjit in 0.0045070648193359375 sec | |
W0602 22:27:59.572589 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003426074981689453 sec | |
W0602 22:27:59.575275 140467571585024 dispatch.py:272] Finished tracing + transforming _einsum for pjit in 0.0004978179931640625 sec | |
I0602 22:27:59.575988 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/ffn_layer1/bias/b with shape=[8192], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0 | |
W0602 22:27:59.576705 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003559589385986328 sec | |
W0602 22:27:59.577903 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0004894733428955078 sec | |
W0602 22:27:59.579039 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003445148468017578 sec | |
W0602 22:27:59.579769 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00029587745666503906 sec | |
W0602 22:27:59.580426 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00029397010803222656 sec | |
W0602 22:27:59.581023 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00022649765014648438 sec | |
W0602 22:27:59.581819 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0004436969757080078 sec | |
W0602 22:27:59.582886 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003018379211425781 sec | |
W0602 22:27:59.583923 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00031375885009765625 sec | |
W0602 22:27:59.584674 140467571585024 dispatch.py:272] Finished tracing + transforming _power for pjit in 0.0003490447998046875 sec | |
W0602 22:27:59.585723 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.000362396240234375 sec | |
W0602 22:27:59.586187 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.001088857650756836 sec | |
I0602 22:27:59.592514 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/ffn_layer2/linear/w with shape=[8192, 2048], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023 | |
W0602 22:27:59.594151 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002982616424560547 sec | |
W0602 22:27:59.595182 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0004038810729980469 sec | |
W0602 22:27:59.595859 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003058910369873047 sec | |
W0602 22:27:59.596365 140467571585024 dispatch.py:272] Finished tracing + transforming _uniform for pjit in 0.0031800270080566406 sec | |
W0602 22:27:59.596910 140467571585024 dispatch.py:272] Finished tracing + transforming _normal_real for pjit in 0.003953456878662109 sec | |
W0602 22:27:59.597192 140467571585024 dispatch.py:272] Finished tracing + transforming _normal for pjit in 0.0044362545013427734 sec | |
W0602 22:27:59.597837 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00034165382385253906 sec | |
W0602 22:27:59.600503 140467571585024 dispatch.py:272] Finished tracing + transforming _einsum for pjit in 0.0004875659942626953 sec | |
I0602 22:27:59.601223 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/ffn_layer2/bias/b with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0 | |
I0602 22:27:59.647464 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/layer_norm/scale with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0 | |
I0602 22:27:59.648468 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/layer_norm/bias with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0 | |
I0602 22:27:59.662634 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/self_attention/combined_qkv/w with shape=[3, 2048, 32, 64], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023 | |
I0602 22:27:59.666589 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/self_attention/per_dim_scale/per_dim_scale with shape=[64], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0 | |
I0602 22:27:59.675507 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/self_attention/post/w with shape=[2048, 32, 64], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023 | |
I0602 22:27:59.695976 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/layer_norm/scale with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0 | |
I0602 22:27:59.696956 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/layer_norm/bias with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0 | |
I0602 22:27:59.707431 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/ffn_layer1/linear/w with shape=[2048, 8192], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023 | |
I0602 22:27:59.710544 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/ffn_layer1/bias/b with shape=[8192], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0 | |
I0602 22:27:59.721235 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/ffn_layer2/linear/w with shape=[8192, 2048], dtype=<class 'jax.numpy.float32'>, init method=gaussian and scale=0.023 | |
I0602 22:27:59.724257 140467571585024 base_layer.py:632] Creating var /lm/transformer/repeat/remat(scan(map_variables(map_variables(map_variables(sub)))))/x_layers_0/ff_layer/ffn_layer2/bias/b with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0 | |
W0602 22:27:59.748244 140467571585024 dispatch.py:272] Finished tracing + transforming logaddexp for pjit in 0.0008337497711181641 sec | |
W0602 22:27:59.748823 140467571585024 dispatch.py:272] Finished tracing + transforming real for pjit in 0.00013375282287597656 sec | |
W0602 22:27:59.750124 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00021195411682128906 sec | |
W0602 22:27:59.750622 140467571585024 dispatch.py:272] Finished tracing + transforming real for pjit in 0.000125885009765625 sec | |
I0602 22:27:59.836073 140467571585024 base_layer.py:632] Creating var /lm/final_ln/scale with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0 | |
I0602 22:27:59.837118 140467571585024 base_layer.py:632] Creating var /lm/final_ln/bias with shape=[2048], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0 | |
W0602 22:27:59.854961 140467571585024 dispatch.py:272] Finished tracing + transforming _einsum for pjit in 0.0004992485046386719 sec | |
I0602 22:27:59.857668 140467571585024 base_layer.py:632] Creating var /lm/softmax/logits_ffn/bias/b with shape=[51200], dtype=<class 'jax.numpy.float32'>, init method=constant and scale=0.0 | |
W0602 22:27:59.858443 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00034928321838378906 sec | |
W0602 22:27:59.859645 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0004897117614746094 sec | |
W0602 22:27:59.862132 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0003387928009033203 sec | |
W0602 22:27:59.863983 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002231597900390625 sec | |
W0602 22:27:59.866493 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00034046173095703125 sec | |
W0602 22:27:59.868813 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.0004448890686035156 sec | |
W0602 22:27:59.869667 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00037789344787597656 sec | |
W0602 22:27:59.870247 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00021958351135253906 sec | |
W0602 22:27:59.871076 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00047278404235839844 sec | |
W0602 22:27:59.871719 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00023937225341796875 sec | |
W0602 22:27:59.872289 140467571585024 dispatch.py:272] Finished tracing + transforming log_softmax for pjit in 0.004140615463256836 sec | |
W0602 22:27:59.877013 140467571585024 dispatch.py:272] Finished tracing + transforming _squeeze for pjit in 0.00019550323486328125 sec | |
W0602 22:27:59.878072 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00030040740966796875 sec | |
W0602 22:27:59.878505 140467571585024 dispatch.py:272] Finished tracing + transforming _one_hot for pjit in 0.0011343955993652344 sec | |
W0602 22:27:59.879173 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003135204315185547 sec | |
W0602 22:27:59.881138 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00036787986755371094 sec | |
W0602 22:27:59.882631 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00019931793212890625 sec | |
W0602 22:27:59.884168 140467571585024 dispatch.py:272] Finished tracing + transforming _argmax for pjit in 0.00023031234741210938 sec | |
W0602 22:27:59.885844 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00030541419982910156 sec | |
W0602 22:27:59.887813 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003502368927001953 sec | |
W0602 22:27:59.889712 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003333091735839844 sec | |
W0602 22:27:59.893325 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00037479400634765625 sec | |
W0602 22:27:59.895086 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002925395965576172 sec | |
W0602 22:27:59.896665 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002980232238769531 sec | |
W0602 22:27:59.897483 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00045609474182128906 sec | |
W0602 22:27:59.898527 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003457069396972656 sec | |
W0602 22:27:59.899624 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00019860267639160156 sec | |
W0602 22:27:59.903453 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002846717834472656 sec | |
W0602 22:27:59.915879 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.00040531158447265625 sec | |
W0602 22:27:59.919377 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0003483295440673828 sec | |
W0602 22:28:00.023173 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00034737586975097656 sec | |
W0602 22:28:00.024725 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003299713134765625 sec | |
W0602 22:28:00.025552 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003440380096435547 sec | |
W0602 22:28:00.026346 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003674030303955078 sec | |
W0602 22:28:00.027400 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00030732154846191406 sec | |
W0602 22:28:00.027714 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009222030639648438 sec | |
W0602 22:28:00.028578 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003437995910644531 sec | |
W0602 22:28:00.029335 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003364086151123047 sec | |
W0602 22:28:00.031265 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00033664703369140625 sec | |
W0602 22:28:00.033033 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0005025863647460938 sec | |
W0602 22:28:00.033688 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00028896331787109375 sec | |
W0602 22:28:00.034654 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00032401084899902344 sec | |
W0602 22:28:00.035815 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.00042939186096191406 sec | |
W0602 22:28:00.036924 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028324127197265625 sec | |
W0602 22:28:00.037694 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00035881996154785156 sec | |
W0602 22:28:00.038773 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028324127197265625 sec | |
W0602 22:28:00.039510 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00035309791564941406 sec | |
W0602 22:28:00.040193 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003631114959716797 sec | |
W0602 22:28:00.040920 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00034737586975097656 sec | |
W0602 22:28:00.041531 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002837181091308594 sec | |
W0602 22:28:00.042264 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003497600555419922 sec | |
W0602 22:28:00.042867 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002853870391845703 sec | |
W0602 22:28:00.043591 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003466606140136719 sec | |
W0602 22:28:00.044194 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002808570861816406 sec | |
W0602 22:28:00.044924 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003497600555419922 sec | |
W0602 22:28:00.045540 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002884864807128906 sec | |
W0602 22:28:00.046352 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0004208087921142578 sec | |
W0602 22:28:00.046952 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028133392333984375 sec | |
W0602 22:28:00.047699 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00035762786865234375 sec | |
W0602 22:28:00.050174 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028586387634277344 sec | |
W0602 22:28:00.050910 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003540515899658203 sec | |
W0602 22:28:00.051504 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002827644348144531 sec | |
W0602 22:28:00.052228 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003445148468017578 sec | |
W0602 22:28:00.052894 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003573894500732422 sec | |
W0602 22:28:00.053647 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003571510314941406 sec | |
W0602 22:28:00.055830 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003528594970703125 sec | |
W0602 22:28:00.056593 140467571585024 dispatch.py:272] Finished tracing + transforming isfinite for pjit in 0.00018739700317382812 sec | |
W0602 22:28:00.057262 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_all for pjit in 0.00037097930908203125 sec | |
W0602 22:28:00.058438 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00034618377685546875 sec | |
W0602 22:28:00.059085 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002803802490234375 sec | |
W0602 22:28:00.060156 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002760887145996094 sec | |
W0602 22:28:00.060764 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002777576446533203 sec | |
W0602 22:28:00.061400 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003063678741455078 sec | |
W0602 22:28:00.062021 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028133392333984375 sec | |
W0602 22:28:00.062642 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002751350402832031 sec | |
W0602 22:28:00.063256 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00027942657470703125 sec | |
W0602 22:28:00.064822 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00034427642822265625 sec | |
W0602 22:28:00.065452 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002810955047607422 sec | |
W0602 22:28:00.066066 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002791881561279297 sec | |
W0602 22:28:00.075807 140467571585024 optimizers.py:1170] DEPRECATION WARNING: p.weight_decay will be deprecated. In future, we will do a migration to remove p.weight_decay and after that, setting it will throw an exception. In future, we will use p.l2_regularizer_weight for coupled weight decay (i.e., weight decays that affect optimizer slots), and use p.decoupled_weight_decay for decoupled weight decay (i.e., weight decays that are added only to the final update). | |
I0602 22:28:00.075856 140467571585024 optimizers.py:1173] Using sharded_adam. | |
W0602 22:28:00.075891 140467571585024 optimizers.py:580] DEPRECATION WARNING: p.weight_decay will be deprecated. In future, we will do a migration to remove p.weight_decay and after that, setting it will throw an exception. In future, we will use p.l2_regularizer_weight for coupled weight decay (i.e., weight decays that affect optimizer slots), and use p.decoupled_weight_decay for decoupled weight decay (i.e., weight decays that are added only to the final update). | |
W0602 22:28:00.076794 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003323554992675781 sec | |
W0602 22:28:00.078552 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.0002219676971435547 sec | |
W0602 22:28:00.079495 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003407001495361328 sec | |
W0602 22:28:00.079883 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009777545928955078 sec | |
W0602 22:28:00.080916 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.003301382064819336 sec | |
W0602 22:28:00.082036 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.00021529197692871094 sec | |
W0602 22:28:00.083037 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0004119873046875 sec | |
W0602 22:28:00.083425 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0010447502136230469 sec | |
W0602 22:28:00.084460 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.002839803695678711 sec | |
W0602 22:28:00.085329 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.00022673606872558594 sec | |
W0602 22:28:00.086240 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003211498260498047 sec | |
W0602 22:28:00.086613 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009417533874511719 sec | |
W0602 22:28:00.087617 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.0027151107788085938 sec | |
W0602 22:28:00.088449 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.00020623207092285156 sec | |
W0602 22:28:00.089448 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00032806396484375 sec | |
W0602 22:28:00.089833 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0010499954223632812 sec | |
W0602 22:28:00.090841 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.0027959346771240234 sec | |
W0602 22:28:00.091972 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00030994415283203125 sec | |
W0602 22:28:00.092691 140467571585024 dispatch.py:272] Finished tracing + transforming _power for pjit in 0.00031757354736328125 sec | |
W0602 22:28:00.093427 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00032520294189453125 sec | |
W0602 22:28:00.097660 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002770423889160156 sec | |
W0602 22:28:00.098491 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028824806213378906 sec | |
W0602 22:28:00.111270 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028967857360839844 sec | |
W0602 22:28:00.112138 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003113746643066406 sec | |
W0602 22:28:00.118681 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002868175506591797 sec | |
W0602 22:28:00.119528 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002980232238769531 sec | |
W0602 22:28:00.126013 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028777122497558594 sec | |
W0602 22:28:00.126874 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003070831298828125 sec | |
W0602 22:28:00.128772 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00033593177795410156 sec | |
W0602 22:28:00.129413 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002238750457763672 sec | |
W0602 22:28:00.130366 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002868175506591797 sec | |
W0602 22:28:00.132015 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003223419189453125 sec | |
W0602 22:28:00.132609 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002219676971435547 sec | |
W0602 22:28:00.133493 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002834796905517578 sec | |
W0602 22:28:00.134187 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003380775451660156 sec | |
W0602 22:28:00.134777 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00022077560424804688 sec | |
W0602 22:28:00.136005 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0003387928009033203 sec | |
W0602 22:28:00.136672 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00031280517578125 sec | |
W0602 22:28:00.137268 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002319812774658203 sec | |
W0602 22:28:00.138124 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002760887145996094 sec | |
W0602 22:28:00.138686 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.00020956993103027344 sec | |
W0602 22:28:00.139617 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00034236907958984375 sec | |
W0602 22:28:00.140043 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0010128021240234375 sec | |
W0602 22:28:00.141171 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.00021147727966308594 sec | |
W0602 22:28:00.142399 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.0017132759094238281 sec | |
W0602 22:28:00.143631 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.00027441978454589844 sec | |
W0602 22:28:00.145866 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.00021409988403320312 sec | |
W0602 22:28:00.146854 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00041794776916503906 sec | |
W0602 22:28:00.147282 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0010738372802734375 sec | |
W0602 22:28:00.148948 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002741813659667969 sec | |
W0602 22:28:00.149488 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.00020432472229003906 sec | |
W0602 22:28:00.150406 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00034236907958984375 sec | |
W0602 22:28:00.150839 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0010149478912353516 sec | |
W0602 22:28:00.152539 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002808570861816406 sec | |
W0602 22:28:00.153557 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.0006973743438720703 sec | |
W0602 22:28:00.154273 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003600120544433594 sec | |
W0602 22:28:00.156442 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.00029158592224121094 sec | |
W0602 22:28:00.157175 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003440380096435547 sec | |
W0602 22:28:00.159234 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00033092498779296875 sec | |
W0602 22:28:00.163224 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00032711029052734375 sec | |
W0602 22:28:00.163944 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00031685829162597656 sec | |
W0602 22:28:00.180238 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.00021576881408691406 sec | |
W0602 22:28:00.181189 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003566741943359375 sec | |
W0602 22:28:00.181575 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009889602661132812 sec | |
W0602 22:28:00.182617 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.0028810501098632812 sec | |
W0602 22:28:00.185919 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.0002105236053466797 sec | |
W0602 22:28:00.186836 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003268718719482422 sec | |
W0602 22:28:00.187237 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009739398956298828 sec | |
W0602 22:28:00.188269 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.0027616024017333984 sec | |
W0602 22:28:00.194182 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.0002090930938720703 sec | |
W0602 22:28:00.195191 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0004177093505859375 sec | |
W0602 22:28:00.195587 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.001054525375366211 sec | |
W0602 22:28:00.196628 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.002859830856323242 sec | |
W0602 22:28:00.201195 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.00022411346435546875 sec | |
W0602 22:28:00.202216 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0004239082336425781 sec | |
W0602 22:28:00.202624 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0010750293731689453 sec | |
W0602 22:28:00.203661 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.002897977828979492 sec | |
W0602 22:28:00.206920 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.0002090930938720703 sec | |
W0602 22:28:00.207919 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0004134178161621094 sec | |
W0602 22:28:00.208302 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0010347366333007812 sec | |
W0602 22:28:00.209338 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.0028257369995117188 sec | |
W0602 22:28:00.212531 140467571585024 dispatch.py:272] Finished tracing + transforming isnan for pjit in 0.0002090930938720703 sec | |
W0602 22:28:00.213589 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0004582405090332031 sec | |
W0602 22:28:00.213998 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.00112152099609375 sec | |
W0602 22:28:00.215037 140467571585024 dispatch.py:272] Finished tracing + transforming nan_to_num for pjit in 0.002917766571044922 sec | |
W0602 22:28:00.226901 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028133392333984375 sec | |
W0602 22:28:00.229168 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003066062927246094 sec | |
W0602 22:28:00.238538 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028777122497558594 sec | |
W0602 22:28:00.240322 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00030875205993652344 sec | |
W0602 22:28:00.260538 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002918243408203125 sec | |
W0602 22:28:00.262394 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003674030303955078 sec | |
W0602 22:28:00.310421 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002956390380859375 sec | |
W0602 22:28:00.312259 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003032684326171875 sec | |
W0602 22:28:00.323301 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.000286102294921875 sec | |
W0602 22:28:00.332812 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00030231475830078125 sec | |
W0602 22:28:00.334606 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002968311309814453 sec | |
W0602 22:28:00.337694 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003323554992675781 sec | |
W0602 22:28:00.339194 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002148151397705078 sec | |
W0602 22:28:00.340575 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002872943878173828 sec | |
W0602 22:28:00.341731 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00032591819763183594 sec | |
W0602 22:28:00.342816 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00021719932556152344 sec | |
W0602 22:28:00.344238 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0003428459167480469 sec | |
W0602 22:28:00.347850 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00033020973205566406 sec | |
W0602 22:28:00.348942 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00021958351135253906 sec | |
W0602 22:28:00.350367 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0003342628479003906 sec | |
W0602 22:28:00.357018 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00033855438232421875 sec | |
W0602 22:28:00.358186 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002727508544921875 sec | |
W0602 22:28:00.359584 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002987384796142578 sec | |
W0602 22:28:00.360738 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003199577331542969 sec | |
W0602 22:28:00.361827 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00021886825561523438 sec | |
W0602 22:28:00.363179 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002841949462890625 sec | |
W0602 22:28:00.364387 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00032711029052734375 sec | |
W0602 22:28:00.365492 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002205371856689453 sec | |
W0602 22:28:00.366867 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002865791320800781 sec | |
W0602 22:28:00.367908 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.000213623046875 sec | |
W0602 22:28:00.369345 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00044918060302734375 sec | |
W0602 22:28:00.369788 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0011372566223144531 sec | |
W0602 22:28:00.375707 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.000278472900390625 sec | |
W0602 22:28:00.376975 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.00020933151245117188 sec | |
W0602 22:28:00.378338 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003464221954345703 sec | |
W0602 22:28:00.378786 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.001043081283569336 sec | |
W0602 22:28:00.381543 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.00028705596923828125 sec | |
W0602 22:28:00.386663 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.0002148151397705078 sec | |
W0602 22:28:00.388015 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003528594970703125 sec | |
W0602 22:28:00.388454 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.001035451889038086 sec | |
W0602 22:28:00.391259 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002779960632324219 sec | |
W0602 22:28:00.402782 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.00022363662719726562 sec | |
W0602 22:28:00.404162 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.00036597251892089844 sec | |
W0602 22:28:00.404610 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.001064300537109375 sec | |
W0602 22:28:00.407434 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.000347137451171875 sec | |
W0602 22:28:00.408698 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.00021696090698242188 sec | |
W0602 22:28:00.410056 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003528594970703125 sec | |
W0602 22:28:00.410495 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0010399818420410156 sec | |
W0602 22:28:00.413275 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002903938293457031 sec | |
W0602 22:28:00.414558 140467571585024 dispatch.py:272] Finished tracing + transforming square for pjit in 0.00021409988403320312 sec | |
W0602 22:28:00.415879 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_sum for pjit in 0.0003445148468017578 sec | |
W0602 22:28:00.416320 140467571585024 dispatch.py:272] Finished tracing + transforming _mean for pjit in 0.0010304450988769531 sec | |
W0602 22:28:00.419043 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.00028204917907714844 sec | |
W0602 22:28:00.420496 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003352165222167969 sec | |
W0602 22:28:00.430241 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003299713134765625 sec | |
W0602 22:28:00.466179 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003275871276855469 sec | |
W0602 22:28:00.466578 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009796619415283203 sec | |
W0602 22:28:00.467983 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003292560577392578 sec | |
W0602 22:28:00.468378 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009663105010986328 sec | |
W0602 22:28:00.469430 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00033593177795410156 sec | |
W0602 22:28:00.469815 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009608268737792969 sec | |
W0602 22:28:00.470983 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003285408020019531 sec | |
W0602 22:28:00.471376 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0010793209075927734 sec | |
W0602 22:28:00.472426 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003216266632080078 sec | |
W0602 22:28:00.472812 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009558200836181641 sec | |
W0602 22:28:00.473871 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00032591819763183594 sec | |
W0602 22:28:00.474267 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009617805480957031 sec | |
W0602 22:28:00.475304 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003190040588378906 sec | |
W0602 22:28:00.475695 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009546279907226562 sec | |
W0602 22:28:00.476808 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003247261047363281 sec | |
W0602 22:28:00.477206 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.001033782958984375 sec | |
W0602 22:28:00.479650 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00032830238342285156 sec | |
W0602 22:28:00.480049 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009765625 sec | |
W0602 22:28:00.481175 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00040793418884277344 sec | |
W0602 22:28:00.481562 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0010426044464111328 sec | |
W0602 22:28:00.482620 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.0003299713134765625 sec | |
W0602 22:28:00.483013 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009722709655761719 sec | |
W0602 22:28:00.483891 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00017547607421875 sec | |
W0602 22:28:00.484178 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0007045269012451172 sec | |
W0602 22:28:00.487202 140467571585024 dispatch.py:272] Finished tracing + transforming _broadcast_arrays for pjit in 0.00031948089599609375 sec | |
W0602 22:28:00.487582 140467571585024 dispatch.py:272] Finished tracing + transforming _where for pjit in 0.0009427070617675781 sec | |
W0602 22:28:00.513556 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002880096435546875 sec | |
W0602 22:28:00.514215 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00031638145446777344 sec | |
W0602 22:28:00.514865 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00029468536376953125 sec | |
W0602 22:28:00.515513 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003058910369873047 sec | |
W0602 22:28:00.517177 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003752708435058594 sec | |
W0602 22:28:00.517815 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028586387634277344 sec | |
W0602 22:28:00.518432 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00028514862060546875 sec | |
W0602 22:28:00.519467 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003421306610107422 sec | |
W0602 22:28:00.520108 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.00020766258239746094 sec | |
W0602 22:28:00.520784 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.0003330707550048828 sec | |
W0602 22:28:00.521442 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0003376007080078125 sec | |
W0602 22:28:00.526046 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.00021767616271972656 sec | |
W0602 22:28:00.526714 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.00032639503479003906 sec | |
W0602 22:28:00.528863 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.0002040863037109375 sec | |
W0602 22:28:00.529608 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.00038170814514160156 sec | |
W0602 22:28:00.531756 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.0002048015594482422 sec | |
W0602 22:28:00.532421 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.00032019615173339844 sec | |
W0602 22:28:00.534582 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.0001976490020751953 sec | |
W0602 22:28:00.535311 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.00038623809814453125 sec | |
W0602 22:28:00.536197 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002865791320800781 sec | |
W0602 22:28:00.537872 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.00020456314086914062 sec | |
W0602 22:28:00.538537 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.00032448768615722656 sec | |
W0602 22:28:00.539419 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002818107604980469 sec | |
W0602 22:28:00.541140 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.0002739429473876953 sec | |
W0602 22:28:00.541811 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.00032782554626464844 sec | |
W0602 22:28:00.542690 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.00028395652770996094 sec | |
W0602 22:28:00.544338 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.00019788742065429688 sec | |
W0602 22:28:00.544997 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.0003192424774169922 sec | |
W0602 22:28:00.545884 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002818107604980469 sec | |
W0602 22:28:00.555858 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.0002009868621826172 sec | |
W0602 22:28:00.556550 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.0003445148468017578 sec | |
W0602 22:28:00.557439 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002849102020263672 sec | |
W0602 22:28:00.559089 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.0001995563507080078 sec | |
W0602 22:28:00.559751 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.00031757354736328125 sec | |
W0602 22:28:00.560696 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0003495216369628906 sec | |
W0602 22:28:00.562363 140467571585024 dispatch.py:272] Finished tracing + transforming absolute for pjit in 0.0002009868621826172 sec | |
W0602 22:28:00.563033 140467571585024 dispatch.py:272] Finished tracing + transforming _reduce_max for pjit in 0.0003275871276855469 sec | |
W0602 22:28:00.563914 140467571585024 dispatch.py:272] Finished tracing + transforming true_divide for pjit in 0.0002868175506591797 sec | |
W0602 22:28:00.601922 140467571585024 dispatch.py:272] Finished tracing + transforming _wrapped_step_fn for pmap in 1.2830684185028076 sec | |
W0602 22:28:00.602606 140467571585024 pxla.py:859] Compiling _wrapped_step_fn (140457517638464) for 1 devices with args (ShapedArray(uint32[1]), ShapedArray(float32[1,2048]), ShapedArray(float32[1,2048]), ShapedArray(float32[1,2048,2048]), ShapedArray(float32[1,51200]), ShapedArray(float32[1,2048,51200]), ShapedArray(float32[1,24,8192]), ShapedArray(float32[1,24,2048,8192]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,8192,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,3,2048,32,64]), ShapedArray(float32[1,24,64]), ShapedArray(float32[1,24,2048,32,64]), ShapedArray(int32[1]), ShapedArray(int32[1]), ShapedArray(int32[1]), ShapedArray(float32[1,2048]), ShapedArray(float32[1,2048]), ShapedArray(float32[1,2048,2048]), ShapedArray(float32[1,51200]), ShapedArray(float32[1,2048,51200]), ShapedArray(float32[1,2048]), ShapedArray(float32[1,2048]), ShapedArray(float32[1,2048,2048]), ShapedArray(float32[1,51200]), ShapedArray(float32[1,2048,51200]), ShapedArray(int32[1]), ShapedArray(int32[1,24]), ShapedArray(int32[1,24]), ShapedArray(int32[1,24]), ShapedArray(float32[1,24,8192]), ShapedArray(float32[1,24,2048,8192]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,8192,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,3,2048,32,64]), ShapedArray(float32[1,24,64]), ShapedArray(float32[1,24,2048,32,64]), ShapedArray(float32[1,24,8192]), ShapedArray(float32[1,24,2048,8192]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,8192,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,2048]), ShapedArray(float32[1,24,3,2048,32,64]), ShapedArray(float32[1,24,64]), ShapedArray(float32[1,24,2048,32,64]), ShapedArray(int32[1,24]), 
ShapedArray(uint32[1,2]), ShapedArray(float32[1,1]), ShapedArray(int32[1,1,2048]), ShapedArray(int32[1,1,2048]), ShapedArray(float32[1,1,2048]), ShapedArray(int32[1,1,2048]), ShapedArray(int32[1,1,2048]), ShapedArray(float32[1,1,2048])). (num_replicas=1) | |
/workspace/jax/jax/_src/interpreters/mlir.py:618: UserWarning: Some donated buffers were not usable: ShapedArray(uint32[]), ShapedArray(float32[2048]), ShapedArray(float32[2048]), ShapedArray(float32[2048,2048]), ShapedArray(float32[51200]), ShapedArray(float32[2048,51200]), ShapedArray(float32[24,8192]), ShapedArray(float32[24,2048,8192]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,8192,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,3,2048,32,64]), ShapedArray(float32[24,64]), ShapedArray(float32[24,2048,32,64]), ShapedArray(int32[]), ShapedArray(int32[]), ShapedArray(int32[]), ShapedArray(float32[2048]), ShapedArray(float32[2048]), ShapedArray(float32[2048,2048]), ShapedArray(float32[51200]), ShapedArray(float32[2048,51200]), ShapedArray(float32[2048]), ShapedArray(float32[2048]), ShapedArray(float32[2048,2048]), ShapedArray(float32[51200]), ShapedArray(float32[2048,51200]), ShapedArray(int32[]), ShapedArray(int32[24]), ShapedArray(int32[24]), ShapedArray(int32[24]), ShapedArray(float32[24,8192]), ShapedArray(float32[24,2048,8192]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,8192,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,3,2048,32,64]), ShapedArray(float32[24,64]), ShapedArray(float32[24,2048,32,64]), ShapedArray(float32[24,8192]), ShapedArray(float32[24,2048,8192]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,8192,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,2048]), ShapedArray(float32[24,3,2048,32,64]), ShapedArray(float32[24,64]), ShapedArray(float32[24,2048,32,64]), ShapedArray(int32[24]). | |
Donation is not implemented for iree_cuda. | |
See an explanation at https://jax.readthedocs.io/en/latest/faq.html#buffer-donation. | |
warnings.warn(f"Some donated buffers were not usable: {', '.join(unused_donations)}.\n{msg}") | |
W0602 22:28:00.607805 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002810955047607422 sec | |
W0602 22:28:00.608350 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_seed for pjit in 0.00139617919921875 sec | |
W0602 22:28:00.609600 140467571585024 dispatch.py:272] Finished tracing + transforming ravel for pjit in 0.00016379356384277344 sec | |
W0602 22:28:00.610242 140467571585024 dispatch.py:272] Finished tracing + transforming threefry_2x32 for pjit in 0.0012764930725097656 sec | |
W0602 22:28:00.611082 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_fold_in for pjit in 0.004296541213989258 sec | |
W0602 22:28:00.614205 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00035190582275390625 sec | |
W0602 22:28:00.615066 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00026988983154296875 sec | |
W0602 22:28:00.615887 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002651214599609375 sec | |
W0602 22:28:00.616645 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00026297569274902344 sec | |
W0602 22:28:00.617237 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00026726722717285156 sec | |
W0602 22:28:00.654618 140467571585024 dispatch.py:272] Finished tracing + transforming ravel for pjit in 0.00016427040100097656 sec | |
W0602 22:28:00.655276 140467571585024 dispatch.py:272] Finished tracing + transforming threefry_2x32 for pjit in 0.0012590885162353516 sec | |
W0602 22:28:00.656152 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_split_original for pjit in 0.0023670196533203125 sec | |
W0602 22:28:00.658758 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00026988983154296875 sec | |
W0602 22:28:00.659666 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00034999847412109375 sec | |
W0602 22:28:00.660421 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00025963783264160156 sec | |
W0602 22:28:00.661008 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002598762512207031 sec | |
W0602 22:28:00.698722 140467571585024 dispatch.py:272] Finished tracing + transforming ravel for pjit in 0.0001583099365234375 sec | |
W0602 22:28:00.699368 140467571585024 dispatch.py:272] Finished tracing + transforming threefry_2x32 for pjit in 0.0012388229370117188 sec | |
W0602 22:28:00.700251 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_split_original for pjit in 0.002351522445678711 sec | |
W0602 22:28:00.703298 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0007131099700927734 sec | |
W0602 22:28:00.704129 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002689361572265625 sec | |
W0602 22:28:00.704889 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002589225769042969 sec | |
W0602 22:28:00.705490 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002582073211669922 sec | |
W0602 22:28:00.750689 140467571585024 dispatch.py:272] Finished tracing + transforming ravel for pjit in 0.0001697540283203125 sec | |
W0602 22:28:00.751362 140467571585024 dispatch.py:272] Finished tracing + transforming threefry_2x32 for pjit in 0.0012905597686767578 sec | |
W0602 22:28:00.752371 140467571585024 dispatch.py:272] Finished tracing + transforming _threefry_split_original for pjit in 0.002538919448852539 sec | |
W0602 22:28:00.755024 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002722740173339844 sec | |
W0602 22:28:00.755850 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002663135528564453 sec | |
W0602 22:28:00.756608 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.0002589225769042969 sec | |
W0602 22:28:00.757198 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00026679039001464844 sec | |
W0602 22:28:00.800924 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002942085266113281 sec | |
W0602 22:28:00.802230 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.000308990478515625 sec | |
W0602 22:28:00.823777 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002770423889160156 sec | |
W0602 22:28:00.938540 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0003228187561035156 sec | |
W0602 22:28:00.939468 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00030684471130371094 sec | |
W0602 22:28:00.940159 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.0002620220184326172 sec | |
W0602 22:28:00.940891 140467571585024 dispatch.py:272] Finished tracing + transforming fn for pjit in 0.00025773048400878906 sec | |
W0602 22:28:00.959371 140467571585024 dispatch.py:272] Finished tracing + transforming <lambda> for pjit in 0.00027489662170410156 sec | |
W0602 22:28:01.484800 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion pmap(_wrapped_step_fn) in 0.8817405700683594 sec | |
W0602 22:28:10.412033 140467571585024 dispatch.py:272] Finished XLA compilation of _wrapped_step_fn in 8.916320085525513 sec | |
W0602 22:28:10.421547 140467571585024 dispatch.py:272] Finished tracing + transforming _multi_slice for pjit in 0.0004761219024658203 sec | |
W0602 22:28:10.422143 140467571585024 pxla.py:1882] Compiling _multi_slice for with global shapes and types [ShapedArray(uint32[1,2])]. Argument mapping: (GSPMDSharding({replicated}),). | |
W0602 22:28:10.423828 140467571585024 dispatch.py:272] Finished jaxpr to MLIR module conversion jit(_multi_slice) in 0.0015578269958496094 sec | |
W0602 22:28:10.486968 140467571585024 dispatch.py:272] Finished XLA compilation of jit(_multi_slice) in 0.0628662109375 sec | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1c697e70 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1c697e70, semaphore=0x556e189338c0, value=15 (OK) | |
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=16, fence=0x556e19f6c630 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1c697e70, from_fence=0x556e1a280840 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2825c0, semaphore=0x556e189338c0, value=16 (OK) | |
:: IREE INVOKE (vm_invoke[async]): context=0x556e1b0149a0, f=0, wait_fence=0x556e1c697e70 {0x556e189338c0:15}, signal_fence=0x556e19f6c630 {0x556e189338c0:16} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19d4ba50 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18df85d0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19d4ba50, semaphore=0x556e189338c0, value=16 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18df85d0, semaphore=0x556e189338c0, value=16 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=4, buffer=0x556e1a0226e0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=4, wait={0x556e18933900:132}, signal={0x556e18933900:133} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:133}, signal={0x556e18933900:134} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1c697e70 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a2586a0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1c697e70, semaphore=0x556e18933900, value=134 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2586a0, semaphore=0x556e18933900, value=134 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e18df7630 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:134}, signal={0x556e18933900:135} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:135}, signal={0x556e18933900:136} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1ad9d840 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e18fbca50 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1ad9d840, semaphore=0x556e18933900, value=136 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18fbca50, semaphore=0x556e18933900, value=136 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e193b88a0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:136}, signal={0x556e18933900:137} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:137}, signal={0x556e18933900:138} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1b87c050 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1b87c230 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1b87c050, semaphore=0x556e18933900, value=138 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1b87c230, semaphore=0x556e18933900, value=138 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e1ca2f600 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:138}, signal={0x556e18933900:139} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:139}, signal={0x556e18933900:140} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a3672d0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1c9dd550 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3672d0, semaphore=0x556e18933900, value=140 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1c9dd550, semaphore=0x556e18933900, value=140 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e1a0226e0 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:140}, signal={0x556e18933900:141} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:141}, signal={0x556e18933900:142} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a5be9c0 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1af12c90 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a5be9c0, semaphore=0x556e18933900, value=142 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1af12c90, semaphore=0x556e18933900, value=142 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e1ca2f600 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:142}, signal={0x556e18933900:143} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:143}, signal={0x556e18933900:144} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a4d6d90 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a5e3ae0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a4d6d90, semaphore=0x556e18933900, value=144 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a5e3ae0, semaphore=0x556e18933900, value=144 (OK) | |
:: IREE INVOKE (hal_allocator_allocate_buffer): allocator=0x556e18904e30, size=8192, buffer=0x556e1ca2f600 (OK) | |
:: IREE INVOKE (hal_device_queue_alloca): device=0x556e18904d50, size=8192, wait={0x556e18933900:144}, signal={0x556e18933900:145} (OK) | |
:: IREE INVOKE (hal_device_queue_execute): device=0x556e18904d50, wait={0x556e18933900:145}, signal={0x556e18933900:146} (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e19fd4610 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a0b4920 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19fd4610, semaphore=0x556e18933900, value=146 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a0b4920, semaphore=0x556e18933900, value=146 (OK) | |
:: IREE INVOKE (hal_fence_create): capacity=2, fence=0x556e1a5e31f0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a5e31f0, semaphore=0x556e189338c0, value=16 (OK) | |
:: IREE INVOKE (hal_fence_create_at): semaphore=0x556e189338c0, value=17, fence=0x556e19669370 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e191a6280 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18a470c0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1962ee50 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1962ea60, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a3f5e20 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a422390, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e197797a0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3d6610, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a2a73d0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19ffd400, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a420280 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2afb00, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a14ab90 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a14abe0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19c07d10 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c07d60, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e193ef470 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19140df0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19c04ab0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c04b00, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e18baf800 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19aded80, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1922bd90 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a41c090, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a3a2b90 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3a2be0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a419dd0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e191e42f0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19829890 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e198298e0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19f4deb0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19f4df00, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19d0a1e0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19d0a230, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e199c4af0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19535260, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e18b2be70 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19533540, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19c1a1d0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c1a220, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a151ab0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c23f90, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19b5a2f0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19b5a340, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19533ed0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19533f20, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19c24420 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b54260, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19535d90 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19535de0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19c09d90 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e191ecc60, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19c05760 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c057b0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a24d3f0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a24d440, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a250bf0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a250c40, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e199c3e30 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e199c3e80, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e18b4a060 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19aeb4a0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e18b9fe70 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b9fec0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19bbfce0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19bbfd30, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a253340 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a253390, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a24d930 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b473e0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19b67f70 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19b67fc0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a2ae8a0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a289880, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e190f07e0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e190f0830, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a361100 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a361150, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19bc0660 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19bc06b0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19740d80 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19740dd0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19bc0a10 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18bb3b10, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e195d0b40 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e195d0510, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19955b30 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19955b80, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19809610 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19809660, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19809860 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e198098b0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a3c1090 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18f796d0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a3c06d0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a3c0920, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a10ca60 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a10cab0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19623470 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e196234c0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19623760 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a10d670, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19c1af20 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19c1af70, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19acf2d0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e19acf320, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a3621f0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a362240, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a3626f0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a362740, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e18b41e10 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b41e60, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e18b42150 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18b421a0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19d4ba50 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18df85d0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1c697e70 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a2586a0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1ad9d840 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e18fbca50, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1b87c050 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1b87c230, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a3672d0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1c9dd550, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a5be9c0 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1af12c90, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e1a4d6d90 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a5e3ae0, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (hal_fence_extend): into_fence=0x556e1a5e31f0, from_fence=0x556e19fd4610 (OK) | |
:: IREE INVOKE (hal_fence_insert): fence=0x556e1a0b4920, semaphore=0x556e189338c0, value=17 (OK) | |
:: IREE INVOKE (vm_invoke[async]): context=0x556e1add98f0, f=0, wait_fence=0x556e1a5e31f0 {0x556e189338c0:16, 0x556e18933900:146}, signal_fence=0x556e19669370 {0x556e189338c0:17}Fatal Python error: Segmentation fault | |
Current thread 0x00007fc127b4f000 (most recent call first): | |
File "/workspace/jax/jax/_src/interpreters/pxla.py", line 1349 in __call__ | |
File "/workspace/jax/jax/_src/profiler.py", line 314 in wrapper | |
File "/workspace/jax/jax/_src/api.py", line 1774 in cache_miss | |
File "/workspace/jax/jax/_src/traceback_util.py", line 166 in reraise_with_filtered_traceback | |
File "/opt/paxml/paxml/partitioning.py", line 712 in _wrapped_partitioned_step | |
File "/opt/paxml/paxml/programs.py", line 559 in train_step | |
File "/opt/paxml/paxml/programs.py", line 294 in run | |
File "/opt/paxml/paxml/executors.py", line 529 in _train_and_evaluate_common | |
File "/opt/paxml/paxml/executors.py", line 297 in start | |
File "/opt/paxml/paxml/train.py", line 264 in train_and_evaluate | |
File "/opt/paxml/paxml/main.py", line 277 in run_experiment | |
File "/opt/paxml/paxml/main.py", line 400 in run | |
File "/opt/paxml/paxml/main.py", line 456 in main | |
File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 254 in _run_main | |
File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 308 in run | |
File "/opt/paxml/paxml/main.py", line 486 in <module> | |
Extension modules: jaxlib.cpu_feature_guard, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, zstandard.backend_c, msgpack._cmsgpack, yaml._yaml, google.protobuf.pyext._message, tensorflow.python.framework.fast_tensor_util, charset_normalizer.md, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_lapack, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, PIL._imaging, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, 
pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.hashing, pandas._libs.tslib, pandas._libs.ops, pandas._libs.arrays, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.index, pandas._libs.internals, pandas._libs.join, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, scipy.ndimage._nd_image, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, _ni_label, scipy.ndimage._ni_label, numpy.linalg.lapack_lite, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, 
scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._statlib, scipy.stats._mvn, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._rcont.rcont, sentencepiece._sentencepiece, psutil._psutil_linux, psutil._psutil_posix, editdistance.bycython, matplotlib._c_internal_utils, matplotlib._path, kiwisolver._cext, matplotlib._image, scipy.cluster._vq, scipy.cluster._hierarchy, scipy.cluster._optimal_leaf_ordering, lxml._elementpath, lxml.etree, sklearn.__check_build._check_build, sklearn.utils.murmurhash, sklearn.utils._isfinite, sklearn.utils._openmp_helpers, sklearn.utils._logistic_sigmoid, sklearn.utils.sparsefuncs_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.utils._typedefs, sklearn.utils._readonly_array_wrapper, sklearn.metrics._dist_metrics, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_fast, regex._regex, sklearn.feature_extraction._hashing_fast, sklearn.svm._libsvm, sklearn.svm._liblinear, sklearn.svm._libsvm_sparse, sklearn.utils._random, 
sklearn.utils._seq_dataset, sklearn.utils.arrayfuncs, sklearn.linear_model._cd_fast, sklearn._loss._loss, sklearn.utils._weight_vector, sklearn.linear_model._sgd_fast, sklearn.linear_model._sag_fast, sklearn.datasets._svmlight_format_fast, scipy.io.matlab._mio_utils, scipy.io.matlab._streams, scipy.io.matlab._mio5_utils (total: 238) |
Note: ASAN stack trace: https://gist.github.com/trevor-m/d1d8912f8ab0d96da315a0f6e2f4aff5