Last active
August 23, 2022 01:31
Error logs for L2P issue creation.
I0805 09:46:50.326969 140593525006912 xla_bridge.py:328] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I0805 09:46:50.384492 140593525006912 xla_bridge.py:328] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: CUDA Host Interpreter
I0805 09:46:50.385140 140593525006912 xla_bridge.py:328] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I0805 09:46:50.385412 140593525006912 main.py:68] JAX host: 0 / 1
I0805 09:46:50.385540 140593525006912 main.py:69] JAX devices: [GpuDevice(id=0, process_index=0), GpuDevice(id=1, process_index=0), GpuDevice(id=2, process_index=0), GpuDevice(id=3, process_index=0)]
I0805 09:46:50.918455 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:50.923666 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:51.525545 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:51.961479 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:52.294708 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:52.295063 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:52.673859 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:52.675483 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:52.676127 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:52.678056 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:52.765921 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:52.766144 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:52.830416 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:52.832532 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:52.833179 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:52.835006 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:52.888651 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:52.888859 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:52.950469 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:52.952332 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:52.953294 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:52.956229 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:53.017944 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:53.018159 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.082077 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.083549 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:53.084056 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:53.085788 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:53.140528 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:53.140734 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.203151 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.204576 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:53.205087 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:53.206822 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:53.262300 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:53.262514 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.325390 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.327229 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:53.327750 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:53.329538 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:53.384052 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:53.384286 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.447248 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.448775 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:53.449291 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:53.451020 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:53.509752 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:53.510023 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.573357 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.574981 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:53.575496 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:53.577274 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:53.631870 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:53.632078 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.693382 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.694814 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:53.695322 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:53.697066 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:53.751270 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:53.751478 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.812805 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.814285 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:53.814789 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:53.816536 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:53.871267 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:53.871470 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:47:09.165291 140593525006912 parameter_overview.py:257]
+-------------------------------------------------------------------------+---------------+-----------+-----------+----------+
| Name                                                                    | Shape         | Size      | Mean      | Std      |
+-------------------------------------------------------------------------+---------------+-----------+-----------+----------+
| Transformer/encoder_norm/bias                                           | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoder_norm/scale                                          | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_0/LayerNorm_0/bias                             | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_0/LayerNorm_0/scale                            | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_0/LayerNorm_1/bias                             | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_0/LayerNorm_1/scale                            | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_0/MlpBlock_0/Dense_0/bias                      | (3072,)       | 3,072     | -1.9e-08  | 9.93e-07 |
| Transformer/encoderblock_0/MlpBlock_0/Dense_0/kernel                    | (768, 3072)   | 2,359,296 | -8.83e-06 | 0.0228   |
| Transformer/encoderblock_0/MlpBlock_0/Dense_1/bias                      | (768,)        | 768       | -2.93e-08 | 9.97e-07 |
| Transformer/encoderblock_0/MlpBlock_0/Dense_1/kernel                    | (3072, 768)   | 2,359,296 | -1.54e-05 | 0.0228   |
| Transformer/encoderblock_0/MultiHeadDotProductAttention_0/key/bias      | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_0/MultiHeadDotProductAttention_0/key/kernel    | (768, 12, 64) | 589,824   | 6e-06     | 0.036    |
| Transformer/encoderblock_0/MultiHeadDotProductAttention_0/out/bias      | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_0/MultiHeadDotProductAttention_0/out/kernel    | (12, 64, 768) | 589,824   | -1.54e-05 | 0.0361   |
| Transformer/encoderblock_0/MultiHeadDotProductAttention_0/query/bias    | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_0/MultiHeadDotProductAttention_0/query/kernel  | (768, 12, 64) | 589,824   | -5.84e-05 | 0.0361   |
| Transformer/encoderblock_0/MultiHeadDotProductAttention_0/value/bias    | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_0/MultiHeadDotProductAttention_0/value/kernel  | (768, 12, 64) | 589,824   | 5.33e-06  | 0.0361   |
| Transformer/encoderblock_1/LayerNorm_0/bias                             | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_1/LayerNorm_0/scale                            | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_1/LayerNorm_1/bias                             | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_1/LayerNorm_1/scale                            | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_1/MlpBlock_0/Dense_0/bias                      | (3072,)       | 3,072     | -6.25e-09 | 9.97e-07 |
| Transformer/encoderblock_1/MlpBlock_0/Dense_0/kernel                    | (768, 3072)   | 2,359,296 | -1.67e-05 | 0.0228   |
| Transformer/encoderblock_1/MlpBlock_0/Dense_1/bias                      | (768,)        | 768       | 2.54e-08  | 9.87e-07 |
| Transformer/encoderblock_1/MlpBlock_0/Dense_1/kernel                    | (3072, 768)   | 2,359,296 | 8.9e-06   | 0.0228   |
| Transformer/encoderblock_1/MultiHeadDotProductAttention_0/key/bias      | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_1/MultiHeadDotProductAttention_0/key/kernel    | (768, 12, 64) | 589,824   | 1.68e-05  | 0.0361   |
| Transformer/encoderblock_1/MultiHeadDotProductAttention_0/out/bias      | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_1/MultiHeadDotProductAttention_0/out/kernel    | (12, 64, 768) | 589,824   | -1.34e-05 | 0.0361   |
| Transformer/encoderblock_1/MultiHeadDotProductAttention_0/query/bias    | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_1/MultiHeadDotProductAttention_0/query/kernel  | (768, 12, 64) | 589,824   | -2.13e-05 | 0.0361   |
| Transformer/encoderblock_1/MultiHeadDotProductAttention_0/value/bias    | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_1/MultiHeadDotProductAttention_0/value/kernel  | (768, 12, 64) | 589,824   | -2.95e-05 | 0.0361   |
| Transformer/encoderblock_10/LayerNorm_0/bias                            | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_10/LayerNorm_0/scale                           | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_10/LayerNorm_1/bias                            | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_10/LayerNorm_1/scale                           | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_10/MlpBlock_0/Dense_0/bias                     | (3072,)       | 3,072     | 1.84e-08  | 9.86e-07 |
| Transformer/encoderblock_10/MlpBlock_0/Dense_0/kernel                   | (768, 3072)   | 2,359,296 | 4.75e-06  | 0.0228   |
| Transformer/encoderblock_10/MlpBlock_0/Dense_1/bias                     | (768,)        | 768       | 1.87e-08  | 9.97e-07 |
| Transformer/encoderblock_10/MlpBlock_0/Dense_1/kernel                   | (3072, 768)   | 2,359,296 | -7.17e-06 | 0.0228   |
| Transformer/encoderblock_10/MultiHeadDotProductAttention_0/key/bias     | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_10/MultiHeadDotProductAttention_0/key/kernel   | (768, 12, 64) | 589,824   | -7.58e-05 | 0.0361   |
| Transformer/encoderblock_10/MultiHeadDotProductAttention_0/out/bias     | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_10/MultiHeadDotProductAttention_0/out/kernel   | (12, 64, 768) | 589,824   | 2.26e-05  | 0.0361   |
| Transformer/encoderblock_10/MultiHeadDotProductAttention_0/query/bias   | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_10/MultiHeadDotProductAttention_0/query/kernel | (768, 12, 64) | 589,824   | 1.63e-05  | 0.0361   |
| Transformer/encoderblock_10/MultiHeadDotProductAttention_0/value/bias   | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_10/MultiHeadDotProductAttention_0/value/kernel | (768, 12, 64) | 589,824   | -4.45e-05 | 0.0361   |
| Transformer/encoderblock_11/LayerNorm_0/bias                            | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_11/LayerNorm_0/scale                           | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_11/LayerNorm_1/bias                            | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_11/LayerNorm_1/scale                           | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_11/MlpBlock_0/Dense_0/bias                     | (3072,)       | 3,072     | 2.04e-08  | 9.99e-07 |
| Transformer/encoderblock_11/MlpBlock_0/Dense_0/kernel                   | (768, 3072)   | 2,359,296 | 6.49e-06  | 0.0228   |
| Transformer/encoderblock_11/MlpBlock_0/Dense_1/bias                     | (768,)        | 768       | -5.32e-08 | 1e-06    |
| Transformer/encoderblock_11/MlpBlock_0/Dense_1/kernel                   | (3072, 768)   | 2,359,296 | 3.03e-05  | 0.0228   |
| Transformer/encoderblock_11/MultiHeadDotProductAttention_0/key/bias     | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_11/MultiHeadDotProductAttention_0/key/kernel   | (768, 12, 64) | 589,824   | -0.000117 | 0.0361   |
| Transformer/encoderblock_11/MultiHeadDotProductAttention_0/out/bias     | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_11/MultiHeadDotProductAttention_0/out/kernel   | (12, 64, 768) | 589,824   | 5.31e-05  | 0.0361   |
| Transformer/encoderblock_11/MultiHeadDotProductAttention_0/query/bias   | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_11/MultiHeadDotProductAttention_0/query/kernel | (768, 12, 64) | 589,824   | 3.91e-05  | 0.036    |
| Transformer/encoderblock_11/MultiHeadDotProductAttention_0/value/bias   | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_11/MultiHeadDotProductAttention_0/value/kernel | (768, 12, 64) | 589,824   | 0.000127  | 0.0361   |
| Transformer/encoderblock_2/LayerNorm_0/bias                             | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_2/LayerNorm_0/scale                            | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_2/LayerNorm_1/bias                             | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_2/LayerNorm_1/scale                            | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_2/MlpBlock_0/Dense_0/bias                      | (3072,)       | 3,072     | 1.28e-08  | 9.91e-07 |
| Transformer/encoderblock_2/MlpBlock_0/Dense_0/kernel                    | (768, 3072)   | 2,359,296 | -1.18e-05 | 0.0228   |
| Transformer/encoderblock_2/MlpBlock_0/Dense_1/bias                      | (768,)        | 768       | 3.42e-08  | 9.99e-07 |
| Transformer/encoderblock_2/MlpBlock_0/Dense_1/kernel                    | (3072, 768)   | 2,359,296 | 1.34e-05  | 0.0228   |
| Transformer/encoderblock_2/MultiHeadDotProductAttention_0/key/bias      | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_2/MultiHeadDotProductAttention_0/key/kernel    | (768, 12, 64) | 589,824   | -1.46e-05 | 0.0361   |
| Transformer/encoderblock_2/MultiHeadDotProductAttention_0/out/bias      | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_2/MultiHeadDotProductAttention_0/out/kernel    | (12, 64, 768) | 589,824   | -1.77e-05 | 0.0361   |
| Transformer/encoderblock_2/MultiHeadDotProductAttention_0/query/bias    | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_2/MultiHeadDotProductAttention_0/query/kernel  | (768, 12, 64) | 589,824   | 4.85e-05  | 0.0361   |
| Transformer/encoderblock_2/MultiHeadDotProductAttention_0/value/bias    | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_2/MultiHeadDotProductAttention_0/value/kernel  | (768, 12, 64) | 589,824   | 8.71e-06  | 0.036    |
| Transformer/encoderblock_3/LayerNorm_0/bias                             | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_3/LayerNorm_0/scale                            | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_3/LayerNorm_1/bias                             | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_3/LayerNorm_1/scale                            | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_3/MlpBlock_0/Dense_0/bias                      | (3072,)       | 3,072     | 6.5e-09   | 9.95e-07 |
| Transformer/encoderblock_3/MlpBlock_0/Dense_0/kernel                    | (768, 3072)   | 2,359,296 | -1.89e-06 | 0.0228   |
| Transformer/encoderblock_3/MlpBlock_0/Dense_1/bias                      | (768,)        | 768       | -3.05e-09 | 1.03e-06 |
| Transformer/encoderblock_3/MlpBlock_0/Dense_1/kernel                    | (3072, 768)   | 2,359,296 | 2.28e-05  | 0.0228   |
| Transformer/encoderblock_3/MultiHeadDotProductAttention_0/key/bias      | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_3/MultiHeadDotProductAttention_0/key/kernel    | (768, 12, 64) | 589,824   | -9.72e-05 | 0.0361   |
| Transformer/encoderblock_3/MultiHeadDotProductAttention_0/out/bias      | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_3/MultiHeadDotProductAttention_0/out/kernel    | (12, 64, 768) | 589,824   | -8.53e-05 | 0.0361   |
| Transformer/encoderblock_3/MultiHeadDotProductAttention_0/query/bias    | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_3/MultiHeadDotProductAttention_0/query/kernel  | (768, 12, 64) | 589,824   | 4.07e-06  | 0.0361   |
| Transformer/encoderblock_3/MultiHeadDotProductAttention_0/value/bias    | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_3/MultiHeadDotProductAttention_0/value/kernel  | (768, 12, 64) | 589,824   | -1.37e-05 | 0.0361   |
| Transformer/encoderblock_4/LayerNorm_0/bias                             | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_4/LayerNorm_0/scale                            | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_4/LayerNorm_1/bias                             | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_4/LayerNorm_1/scale                            | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_4/MlpBlock_0/Dense_0/bias                      | (3072,)       | 3,072     | 1.06e-08  | 1.01e-06 |
| Transformer/encoderblock_4/MlpBlock_0/Dense_0/kernel                    | (768, 3072)   | 2,359,296 | -8.63e-06 | 0.0228   |
| Transformer/encoderblock_4/MlpBlock_0/Dense_1/bias                      | (768,)        | 768       | 3.19e-08  | 9.7e-07  |
| Transformer/encoderblock_4/MlpBlock_0/Dense_1/kernel                    | (3072, 768)   | 2,359,296 | -2.09e-06 | 0.0228   |
| Transformer/encoderblock_4/MultiHeadDotProductAttention_0/key/bias      | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_4/MultiHeadDotProductAttention_0/key/kernel    | (768, 12, 64) | 589,824   | 3.22e-05  | 0.0361   |
| Transformer/encoderblock_4/MultiHeadDotProductAttention_0/out/bias      | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_4/MultiHeadDotProductAttention_0/out/kernel    | (12, 64, 768) | 589,824   | 4.29e-05  | 0.0361   |
| Transformer/encoderblock_4/MultiHeadDotProductAttention_0/query/bias    | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_4/MultiHeadDotProductAttention_0/query/kernel  | (768, 12, 64) | 589,824   | 4.54e-05  | 0.0361   |
| Transformer/encoderblock_4/MultiHeadDotProductAttention_0/value/bias    | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_4/MultiHeadDotProductAttention_0/value/kernel  | (768, 12, 64) | 589,824   | 1.3e-05   | 0.0361   |
| Transformer/encoderblock_5/LayerNorm_0/bias                             | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_5/LayerNorm_0/scale                            | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_5/LayerNorm_1/bias                             | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_5/LayerNorm_1/scale                            | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_5/MlpBlock_0/Dense_0/bias                      | (3072,)       | 3,072     | 1.63e-08  | 9.93e-07 |
| Transformer/encoderblock_5/MlpBlock_0/Dense_0/kernel                    | (768, 3072)   | 2,359,296 | 3.8e-05   | 0.0228   |
| Transformer/encoderblock_5/MlpBlock_0/Dense_1/bias                      | (768,)        | 768       | 1.4e-08   | 1.02e-06 |
| Transformer/encoderblock_5/MlpBlock_0/Dense_1/kernel                    | (3072, 768)   | 2,359,296 | 1.97e-05  | 0.0228   |
| Transformer/encoderblock_5/MultiHeadDotProductAttention_0/key/bias      | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_5/MultiHeadDotProductAttention_0/key/kernel    | (768, 12, 64) | 589,824   | 2.94e-05  | 0.0361   |
| Transformer/encoderblock_5/MultiHeadDotProductAttention_0/out/bias      | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_5/MultiHeadDotProductAttention_0/out/kernel    | (12, 64, 768) | 589,824   | 4.3e-05   | 0.0361   |
| Transformer/encoderblock_5/MultiHeadDotProductAttention_0/query/bias    | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_5/MultiHeadDotProductAttention_0/query/kernel  | (768, 12, 64) | 589,824   | -6.6e-06  | 0.0361   |
| Transformer/encoderblock_5/MultiHeadDotProductAttention_0/value/bias    | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_5/MultiHeadDotProductAttention_0/value/kernel  | (768, 12, 64) | 589,824   | 3e-05     | 0.0361   |
| Transformer/encoderblock_6/LayerNorm_0/bias                             | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_6/LayerNorm_0/scale                            | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_6/LayerNorm_1/bias                             | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_6/LayerNorm_1/scale                            | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_6/MlpBlock_0/Dense_0/bias                      | (3072,)       | 3,072     | 7.18e-09  | 9.81e-07 |
| Transformer/encoderblock_6/MlpBlock_0/Dense_0/kernel                    | (768, 3072)   | 2,359,296 | 7.92e-07  | 0.0228   |
| Transformer/encoderblock_6/MlpBlock_0/Dense_1/bias                      | (768,)        | 768       | -5.73e-08 | 1.01e-06 |
| Transformer/encoderblock_6/MlpBlock_0/Dense_1/kernel                    | (3072, 768)   | 2,359,296 | 2.52e-06  | 0.0228   |
| Transformer/encoderblock_6/MultiHeadDotProductAttention_0/key/bias      | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_6/MultiHeadDotProductAttention_0/key/kernel    | (768, 12, 64) | 589,824   | -0.000101 | 0.0361   |
| Transformer/encoderblock_6/MultiHeadDotProductAttention_0/out/bias      | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_6/MultiHeadDotProductAttention_0/out/kernel    | (12, 64, 768) | 589,824   | -2.27e-05 | 0.0361   |
| Transformer/encoderblock_6/MultiHeadDotProductAttention_0/query/bias    | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_6/MultiHeadDotProductAttention_0/query/kernel  | (768, 12, 64) | 589,824   | -5.86e-05 | 0.0361   |
| Transformer/encoderblock_6/MultiHeadDotProductAttention_0/value/bias    | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_6/MultiHeadDotProductAttention_0/value/kernel  | (768, 12, 64) | 589,824   | 2.08e-05  | 0.0361   |
| Transformer/encoderblock_7/LayerNorm_0/bias                             | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_7/LayerNorm_0/scale                            | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_7/LayerNorm_1/bias                             | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_7/LayerNorm_1/scale                            | (768,)        | 768       | 1.0       | 0.0      |
| Transformer/encoderblock_7/MlpBlock_0/Dense_0/bias                      | (3072,)       | 3,072     | -1.85e-08 | 1e-06    |
| Transformer/encoderblock_7/MlpBlock_0/Dense_0/kernel                    | (768, 3072)   | 2,359,296 | -3.97e-06 | 0.0228   |
| Transformer/encoderblock_7/MlpBlock_0/Dense_1/bias                      | (768,)        | 768       | -4.66e-08 | 1.01e-06 |
| Transformer/encoderblock_7/MlpBlock_0/Dense_1/kernel                    | (3072, 768)   | 2,359,296 | -3.69e-06 | 0.0228   |
| Transformer/encoderblock_7/MultiHeadDotProductAttention_0/key/bias      | (12, 64)      | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_7/MultiHeadDotProductAttention_0/key/kernel    | (768, 12, 64) | 589,824   | 4.47e-05  | 0.0361   |
| Transformer/encoderblock_7/MultiHeadDotProductAttention_0/out/bias      | (768,)        | 768       | 0.0       | 0.0      |
| Transformer/encoderblock_7/MultiHeadDotProductAttention_0/out/kernel | (12, 64, 768) | 589,824 | 7.01e-06 | 0.0361 |
| Transformer/encoderblock_7/MultiHeadDotProductAttention_0/query/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_7/MultiHeadDotProductAttention_0/query/kernel | (768, 12, 64) | 589,824 | 5.92e-05 | 0.0361 |
| Transformer/encoderblock_7/MultiHeadDotProductAttention_0/value/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_7/MultiHeadDotProductAttention_0/value/kernel | (768, 12, 64) | 589,824 | -7.09e-06 | 0.0361 |
| Transformer/encoderblock_8/LayerNorm_0/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_8/LayerNorm_0/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_8/LayerNorm_1/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_8/LayerNorm_1/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_8/MlpBlock_0/Dense_0/bias | (3072,) | 3,072 | 8.71e-09 | 9.77e-07 |
| Transformer/encoderblock_8/MlpBlock_0/Dense_0/kernel | (768, 3072) | 2,359,296 | -3.12e-06 | 0.0228 |
| Transformer/encoderblock_8/MlpBlock_0/Dense_1/bias | (768,) | 768 | -2.48e-08 | 9.98e-07 |
| Transformer/encoderblock_8/MlpBlock_0/Dense_1/kernel | (3072, 768) | 2,359,296 | 1.77e-05 | 0.0228 |
| Transformer/encoderblock_8/MultiHeadDotProductAttention_0/key/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_8/MultiHeadDotProductAttention_0/key/kernel | (768, 12, 64) | 589,824 | 1.8e-05 | 0.0361 |
| Transformer/encoderblock_8/MultiHeadDotProductAttention_0/out/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_8/MultiHeadDotProductAttention_0/out/kernel | (12, 64, 768) | 589,824 | 8.83e-06 | 0.036 |
| Transformer/encoderblock_8/MultiHeadDotProductAttention_0/query/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_8/MultiHeadDotProductAttention_0/query/kernel | (768, 12, 64) | 589,824 | -4.51e-05 | 0.0361 |
| Transformer/encoderblock_8/MultiHeadDotProductAttention_0/value/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_8/MultiHeadDotProductAttention_0/value/kernel | (768, 12, 64) | 589,824 | -7.51e-05 | 0.036 |
| Transformer/encoderblock_9/LayerNorm_0/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_9/LayerNorm_0/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_9/LayerNorm_1/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_9/LayerNorm_1/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_9/MlpBlock_0/Dense_0/bias | (3072,) | 3,072 | 3.17e-08 | 9.95e-07 |
| Transformer/encoderblock_9/MlpBlock_0/Dense_0/kernel | (768, 3072) | 2,359,296 | -2.8e-05 | 0.0228 |
| Transformer/encoderblock_9/MlpBlock_0/Dense_1/bias | (768,) | 768 | -1.3e-08 | 9.88e-07 |
| Transformer/encoderblock_9/MlpBlock_0/Dense_1/kernel | (3072, 768) | 2,359,296 | -2.09e-05 | 0.0228 |
| Transformer/encoderblock_9/MultiHeadDotProductAttention_0/key/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_9/MultiHeadDotProductAttention_0/key/kernel | (768, 12, 64) | 589,824 | 6.81e-06 | 0.0361 |
| Transformer/encoderblock_9/MultiHeadDotProductAttention_0/out/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_9/MultiHeadDotProductAttention_0/out/kernel | (12, 64, 768) | 589,824 | -5.62e-05 | 0.0361 |
| Transformer/encoderblock_9/MultiHeadDotProductAttention_0/query/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_9/MultiHeadDotProductAttention_0/query/kernel | (768, 12, 64) | 589,824 | 7.07e-05 | 0.0361 |
| Transformer/encoderblock_9/MultiHeadDotProductAttention_0/value/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_9/MultiHeadDotProductAttention_0/value/kernel | (768, 12, 64) | 589,824 | -7.58e-05 | 0.0361 |
| Transformer/posembed_input/pos_embedding | (1, 197, 768) | 151,296 | -0.000113 | 0.02 |
| cls | (1, 1, 768) | 768 | 0.0 | 0.0 |
| embedding/bias | (768,) | 768 | 0.0 | 0.0 |
| embedding/kernel | (16, 16, 3, 768) | 589,824 | 5.29e-05 | 0.0361 |
| head/bias | (100,) | 100 | 0.0 | 0.0 |
| head/kernel | (768, 100) | 76,800 | 0.0 | 0.0 |
| prompt_pool/key | (10, 768) | 7,680 | 0.00504 | 0.00288 |
| prompt_pool/prompt | (1, 10, 10, 768) | 76,800 | 0.00501 | 0.0029 |
+-------------------------------------------------------------------------+------------------+-----------+-----------+----------+
Total: 85,960,036 | |
2022-08-05 09:47:19.440468: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 9.00MiB (rounded to 9437184) requested by op
2022-08-05 09:47:19.442356: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:491] *********************************************************************************x**************x***
2022-08-05 09:47:19.442485: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2130] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 9437184 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
parameter allocation: 4B
constant allocation: 0B
maybe_live_out allocation: 9.00MiB
preallocated temp allocation: 0B
total allocation: 9.00MiB
total fragmentation: 0B (0.00%)
Peak buffers:
Buffer 1:
Size: 9.00MiB
Operator: op_name="jit(broadcast_in_dim)/jit(main)/broadcast_in_dim[shape=(768, 3072) broadcast_dimensions=()]" source_file="/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/optim/adam.py" source_line=89
XLA Label: broadcast
Shape: f32[768,3072]
==========================
Buffer 2:
Size: 4B
Entry Parameter Subshape: f32[]
==========================
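The 9.00MiB peak buffer above is exactly one float32 tensor of shape (768, 3072), i.e. a single Adam moment buffer for one MLP kernel from the parameter table. A quick sanity check of the arithmetic (the 85,960,036 total is from the table above; the byte counts below are back-of-the-envelope, not measured from this run):

```python
# One Adam moment buffer for a (768, 3072) float32 Dense kernel.
buffer_bytes = 768 * 3072 * 4        # 4 bytes per float32
print(buffer_bytes)                  # 9437184 -- matches the failed allocation
print(buffer_bytes / 2**20)          # 9.0 MiB -- matches "Size: 9.00MiB"

# Adam keeps two moment buffers per parameter, so optimizer state
# roughly triples the memory needed for the model's parameters.
total_params = 85_960_036            # "Total" from the parameter overview
params_mib = total_params * 4 / 2**20
print(round(params_mib, 1))          # ~327.9 MiB for the parameters alone
print(round(3 * params_mib / 1024, 2))  # ~0.96 GiB including both Adam moments
```

The takeaway is that the failing allocation is tiny relative to the GPU, which points at the memory pool being exhausted rather than the model being too large.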
/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/optim/base.py:488: FutureWarning: jax.tree_map is deprecated, and will be removed in a future release. Use jax.tree_util.tree_map instead.
  param_states = jax.tree_map(_ShapeDtype.create, params)
/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/core/frozen_dict.py:196: FutureWarning: jax.tree_map is deprecated, and will be removed in a future release. Use jax.tree_util.tree_map instead.
  return jax.tree_map(lambda y: y, x._dict)
Visible devices cannot be modified after being initialized
Traceback (most recent call last):
  File "main.py", line 78, in <module>
    app.run(main)
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "main.py", line 72, in main
    train_continual.train_and_evaluate(FLAGS.my_config, FLAGS.workdir)
  File "/nfs/users/ext_vishal.thengane/workspace/l2p/train_continual.py", line 942, in train_and_evaluate
    model, state = create_train_state(
  File "/nfs/users/ext_vishal.thengane/workspace/l2p/train_continual.py", line 145, in create_train_state
    optimizer = create_optimizer(config, params)
  File "/nfs/users/ext_vishal.thengane/workspace/l2p/train_continual.py", line 103, in create_optimizer
    optimizer = opt_def.create(params)
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/optim/base.py", line 142, in create
    state = opt_def.init_state(target)
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/optim/base.py", line 502, in init_state
    param_states = traversal.update(lambda x: opt.init_param_state(x._value), param_states)
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/traverse_util.py", line 431, in update
    value = fn(value)
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/optim/base.py", line 502, in <lambda>
    param_states = traversal.update(lambda x: opt.init_param_state(x._value), param_states)
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/optim/adam.py", line 89, in init_param_state
    return _AdamParamState(jnp.zeros_like(param), jnp.zeros_like(param))
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 1926, in zeros_like
    return lax.full_like(a, 0, dtype, shape)
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 1284, in full_like
    return full(fill_shape, _convert_element_type(fill_value, dtype, weak_type))
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 1142, in full
    return broadcast(fill_value, shape)
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 760, in broadcast
    return broadcast_in_dim(operand, tuple(sizes) + np.shape(operand), dims)
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 789, in broadcast_in_dim
    return broadcast_in_dim_p.bind(
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/core.py", line 324, in bind
    return self.bind_with_trace(find_top_trace(args), args, params)
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/core.py", line 327, in bind_with_trace
    out = trace.process_primitive(self, map(trace.full_raise, args), params)
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/core.py", line 684, in process_primitive
    return primitive.impl(*tracers, **params)
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/_src/dispatch.py", line 101, in apply_primitive
    return compiled_fun(*args)
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/_src/dispatch.py", line 167, in <lambda>
    return lambda *args, **kw: compiled(*args, **kw)[0]
  File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/_src/dispatch.py", line 733, in _execute_compiled
    out_flat = compiled.execute(in_flat)
jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 9437184 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
parameter allocation: 4B
constant allocation: 0B
maybe_live_out allocation: 9.00MiB
preallocated temp allocation: 0B
total allocation: 9.00MiB
total fragmentation: 0B (0.00%)
Peak buffers:
Buffer 1:
Size: 9.00MiB
Operator: op_name="jit(broadcast_in_dim)/jit(main)/broadcast_in_dim[shape=(768, 3072) broadcast_dimensions=()]" source_file="/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/optim/adam.py" source_line=89
XLA Label: broadcast
Shape: f32[768,3072]
==========================
Buffer 2:
Size: 4B
Entry Parameter Subshape: f32[]
==========================
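Since JAX preallocates most of each GPU's memory by default, a 9 MiB allocation failing usually means the pool is already claimed, e.g. by another process or by TensorFlow (tf.data runs in this same process). A common workaround is to relax XLA's preallocation via environment variables before relaunching; the `.70` fraction below is an illustrative value, not one taken from this run:

```shell
# Don't preallocate ~90% of each GPU up front; allocate on demand instead.
export XLA_PYTHON_CLIENT_PREALLOCATE=false
# Alternatively, cap the preallocated pool at a fraction of total GPU memory.
export XLA_PYTHON_CLIENT_MEM_FRACTION=.70

echo "preallocate=$XLA_PYTHON_CLIENT_PREALLOCATE mem_fraction=$XLA_PYTHON_CLIENT_MEM_FRACTION"
# Then relaunch the failing run, e.g.:
# python main.py --my_config <config> --workdir <dir>
```

Note that these only change how XLA claims memory; if another job is already holding the GPUs, the run will still need them freed first.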
Hi Vishal, this is Shashank. I am facing the same problem you had while implementing a GNN for MTS anomaly detection. Did you overcome it? It would be great if you could explain how. Thanks in advance.