@vgthengane
Last active August 23, 2022 01:31
Error logs for L2P issue creation.
I0805 09:46:50.326969 140593525006912 xla_bridge.py:328] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I0805 09:46:50.384492 140593525006912 xla_bridge.py:328] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: CUDA Host Interpreter
I0805 09:46:50.385140 140593525006912 xla_bridge.py:328] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I0805 09:46:50.385412 140593525006912 main.py:68] JAX host: 0 / 1
I0805 09:46:50.385540 140593525006912 main.py:69] JAX devices: [GpuDevice(id=0, process_index=0), GpuDevice(id=1, process_index=0), GpuDevice(id=2, process_index=0), GpuDevice(id=3, process_index=0)]
I0805 09:46:50.918455 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:50.923666 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:51.525545 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:51.961479 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:52.294708 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:52.295063 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:52.673859 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:52.675483 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:52.676127 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:52.678056 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:52.765921 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:52.766144 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:52.830416 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:52.832532 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:52.833179 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:52.835006 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:52.888651 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:52.888859 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:52.950469 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:52.952332 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:52.953294 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:52.956229 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:53.017944 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:53.018159 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.082077 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.083549 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:53.084056 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:53.085788 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:53.140528 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:53.140734 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.203151 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.204576 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:53.205087 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:53.206822 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:53.262300 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:53.262514 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.325390 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.327229 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:53.327750 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:53.329538 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:53.384052 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:53.384286 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.447248 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.448775 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:53.449291 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:53.451020 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:53.509752 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:53.510023 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.573357 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.574981 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:53.575496 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:53.577274 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:53.631870 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:53.632078 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.693382 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.694814 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:53.695322 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:53.697066 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:53.751270 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:53.751478 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.812805 140593525006912 dataset_info.py:365] Load dataset info from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
I0805 09:46:53.814285 140593525006912 dataset_builder.py:351] Reusing dataset cifar100 (/nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2)
I0805 09:46:53.814789 140593525006912 input_pipeline.py:267] Use 224 input size for cifar.
I0805 09:46:53.816536 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='train', from_=0, to=50000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
W0805 09:46:53.871267 140593525006912 deterministic_data.py:202] `drop_remainder` is deprecated. Please pass `remainder_options` instead. `remainder_options` is reset with RemainderOptions.BALANCE_ON_PROCESSES.
I0805 09:46:53.871470 140593525006912 logging_logger.py:33] Constructing tf.data.Dataset cifar100 for split ReadInstruction([_RelativeInstruction(splitname='test', from_=0, to=10000, unit='abs', rounding='closest')]), from /nfs/users/ext_vishal.thengane/tensorflow_datasets/cifar100/3.0.2
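The ten near-identical construct-dataset blocks above (each building the full CIFAR-100 train split of 50,000 and test split of 10,000) are consistent with a continual-learning setup that creates one loader per task, as L2P does when splitting CIFAR-100 into 10 tasks of 10 classes. A minimal sketch of that task bookkeeping, with all names hypothetical (this is not the code that produced the log):

```python
# Hypothetical sketch: partitioning CIFAR-100's 100 classes into 10 tasks,
# mirroring the ten repeated dataset constructions in the log above.
NUM_CLASSES = 100
NUM_TASKS = 10

def task_class_ranges(num_classes=NUM_CLASSES, num_tasks=NUM_TASKS):
    """Return one list of class ids per task, in contiguous blocks."""
    per_task = num_classes // num_tasks
    return [list(range(i * per_task, (i + 1) * per_task))
            for i in range(num_tasks)]

ranges = task_class_ranges()
print(len(ranges), ranges[0])  # 10 tasks, the first covering classes 0-9
```

Under this reading, the repeated `drop_remainder` deprecation warning simply fires once per task-specific pipeline and is harmless, though noisy.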
I0805 09:47:09.165291 140593525006912 parameter_overview.py:257]
+-------------------------------------------------------------------------+------------------+-----------+-----------+----------+
| Name | Shape | Size | Mean | Std |
+-------------------------------------------------------------------------+------------------+-----------+-----------+----------+
| Transformer/encoder_norm/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoder_norm/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_0/LayerNorm_0/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_0/LayerNorm_0/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_0/LayerNorm_1/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_0/LayerNorm_1/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_0/MlpBlock_0/Dense_0/bias | (3072,) | 3,072 | -1.9e-08 | 9.93e-07 |
| Transformer/encoderblock_0/MlpBlock_0/Dense_0/kernel | (768, 3072) | 2,359,296 | -8.83e-06 | 0.0228 |
| Transformer/encoderblock_0/MlpBlock_0/Dense_1/bias | (768,) | 768 | -2.93e-08 | 9.97e-07 |
| Transformer/encoderblock_0/MlpBlock_0/Dense_1/kernel | (3072, 768) | 2,359,296 | -1.54e-05 | 0.0228 |
| Transformer/encoderblock_0/MultiHeadDotProductAttention_0/key/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_0/MultiHeadDotProductAttention_0/key/kernel | (768, 12, 64) | 589,824 | 6e-06 | 0.036 |
| Transformer/encoderblock_0/MultiHeadDotProductAttention_0/out/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_0/MultiHeadDotProductAttention_0/out/kernel | (12, 64, 768) | 589,824 | -1.54e-05 | 0.0361 |
| Transformer/encoderblock_0/MultiHeadDotProductAttention_0/query/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_0/MultiHeadDotProductAttention_0/query/kernel | (768, 12, 64) | 589,824 | -5.84e-05 | 0.0361 |
| Transformer/encoderblock_0/MultiHeadDotProductAttention_0/value/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_0/MultiHeadDotProductAttention_0/value/kernel | (768, 12, 64) | 589,824 | 5.33e-06 | 0.0361 |
| Transformer/encoderblock_1/LayerNorm_0/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_1/LayerNorm_0/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_1/LayerNorm_1/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_1/LayerNorm_1/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_1/MlpBlock_0/Dense_0/bias | (3072,) | 3,072 | -6.25e-09 | 9.97e-07 |
| Transformer/encoderblock_1/MlpBlock_0/Dense_0/kernel | (768, 3072) | 2,359,296 | -1.67e-05 | 0.0228 |
| Transformer/encoderblock_1/MlpBlock_0/Dense_1/bias | (768,) | 768 | 2.54e-08 | 9.87e-07 |
| Transformer/encoderblock_1/MlpBlock_0/Dense_1/kernel | (3072, 768) | 2,359,296 | 8.9e-06 | 0.0228 |
| Transformer/encoderblock_1/MultiHeadDotProductAttention_0/key/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_1/MultiHeadDotProductAttention_0/key/kernel | (768, 12, 64) | 589,824 | 1.68e-05 | 0.0361 |
| Transformer/encoderblock_1/MultiHeadDotProductAttention_0/out/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_1/MultiHeadDotProductAttention_0/out/kernel | (12, 64, 768) | 589,824 | -1.34e-05 | 0.0361 |
| Transformer/encoderblock_1/MultiHeadDotProductAttention_0/query/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_1/MultiHeadDotProductAttention_0/query/kernel | (768, 12, 64) | 589,824 | -2.13e-05 | 0.0361 |
| Transformer/encoderblock_1/MultiHeadDotProductAttention_0/value/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_1/MultiHeadDotProductAttention_0/value/kernel | (768, 12, 64) | 589,824 | -2.95e-05 | 0.0361 |
| Transformer/encoderblock_10/LayerNorm_0/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_10/LayerNorm_0/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_10/LayerNorm_1/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_10/LayerNorm_1/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_10/MlpBlock_0/Dense_0/bias | (3072,) | 3,072 | 1.84e-08 | 9.86e-07 |
| Transformer/encoderblock_10/MlpBlock_0/Dense_0/kernel | (768, 3072) | 2,359,296 | 4.75e-06 | 0.0228 |
| Transformer/encoderblock_10/MlpBlock_0/Dense_1/bias | (768,) | 768 | 1.87e-08 | 9.97e-07 |
| Transformer/encoderblock_10/MlpBlock_0/Dense_1/kernel | (3072, 768) | 2,359,296 | -7.17e-06 | 0.0228 |
| Transformer/encoderblock_10/MultiHeadDotProductAttention_0/key/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_10/MultiHeadDotProductAttention_0/key/kernel | (768, 12, 64) | 589,824 | -7.58e-05 | 0.0361 |
| Transformer/encoderblock_10/MultiHeadDotProductAttention_0/out/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_10/MultiHeadDotProductAttention_0/out/kernel | (12, 64, 768) | 589,824 | 2.26e-05 | 0.0361 |
| Transformer/encoderblock_10/MultiHeadDotProductAttention_0/query/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_10/MultiHeadDotProductAttention_0/query/kernel | (768, 12, 64) | 589,824 | 1.63e-05 | 0.0361 |
| Transformer/encoderblock_10/MultiHeadDotProductAttention_0/value/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_10/MultiHeadDotProductAttention_0/value/kernel | (768, 12, 64) | 589,824 | -4.45e-05 | 0.0361 |
| Transformer/encoderblock_11/LayerNorm_0/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_11/LayerNorm_0/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_11/LayerNorm_1/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_11/LayerNorm_1/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_11/MlpBlock_0/Dense_0/bias | (3072,) | 3,072 | 2.04e-08 | 9.99e-07 |
| Transformer/encoderblock_11/MlpBlock_0/Dense_0/kernel | (768, 3072) | 2,359,296 | 6.49e-06 | 0.0228 |
| Transformer/encoderblock_11/MlpBlock_0/Dense_1/bias | (768,) | 768 | -5.32e-08 | 1e-06 |
| Transformer/encoderblock_11/MlpBlock_0/Dense_1/kernel | (3072, 768) | 2,359,296 | 3.03e-05 | 0.0228 |
| Transformer/encoderblock_11/MultiHeadDotProductAttention_0/key/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_11/MultiHeadDotProductAttention_0/key/kernel | (768, 12, 64) | 589,824 | -0.000117 | 0.0361 |
| Transformer/encoderblock_11/MultiHeadDotProductAttention_0/out/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_11/MultiHeadDotProductAttention_0/out/kernel | (12, 64, 768) | 589,824 | 5.31e-05 | 0.0361 |
| Transformer/encoderblock_11/MultiHeadDotProductAttention_0/query/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_11/MultiHeadDotProductAttention_0/query/kernel | (768, 12, 64) | 589,824 | 3.91e-05 | 0.036 |
| Transformer/encoderblock_11/MultiHeadDotProductAttention_0/value/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_11/MultiHeadDotProductAttention_0/value/kernel | (768, 12, 64) | 589,824 | 0.000127 | 0.0361 |
| Transformer/encoderblock_2/LayerNorm_0/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_2/LayerNorm_0/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_2/LayerNorm_1/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_2/LayerNorm_1/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_2/MlpBlock_0/Dense_0/bias | (3072,) | 3,072 | 1.28e-08 | 9.91e-07 |
| Transformer/encoderblock_2/MlpBlock_0/Dense_0/kernel | (768, 3072) | 2,359,296 | -1.18e-05 | 0.0228 |
| Transformer/encoderblock_2/MlpBlock_0/Dense_1/bias | (768,) | 768 | 3.42e-08 | 9.99e-07 |
| Transformer/encoderblock_2/MlpBlock_0/Dense_1/kernel | (3072, 768) | 2,359,296 | 1.34e-05 | 0.0228 |
| Transformer/encoderblock_2/MultiHeadDotProductAttention_0/key/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_2/MultiHeadDotProductAttention_0/key/kernel | (768, 12, 64) | 589,824 | -1.46e-05 | 0.0361 |
| Transformer/encoderblock_2/MultiHeadDotProductAttention_0/out/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_2/MultiHeadDotProductAttention_0/out/kernel | (12, 64, 768) | 589,824 | -1.77e-05 | 0.0361 |
| Transformer/encoderblock_2/MultiHeadDotProductAttention_0/query/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_2/MultiHeadDotProductAttention_0/query/kernel | (768, 12, 64) | 589,824 | 4.85e-05 | 0.0361 |
| Transformer/encoderblock_2/MultiHeadDotProductAttention_0/value/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_2/MultiHeadDotProductAttention_0/value/kernel | (768, 12, 64) | 589,824 | 8.71e-06 | 0.036 |
| Transformer/encoderblock_3/LayerNorm_0/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_3/LayerNorm_0/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_3/LayerNorm_1/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_3/LayerNorm_1/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_3/MlpBlock_0/Dense_0/bias | (3072,) | 3,072 | 6.5e-09 | 9.95e-07 |
| Transformer/encoderblock_3/MlpBlock_0/Dense_0/kernel | (768, 3072) | 2,359,296 | -1.89e-06 | 0.0228 |
| Transformer/encoderblock_3/MlpBlock_0/Dense_1/bias | (768,) | 768 | -3.05e-09 | 1.03e-06 |
| Transformer/encoderblock_3/MlpBlock_0/Dense_1/kernel | (3072, 768) | 2,359,296 | 2.28e-05 | 0.0228 |
| Transformer/encoderblock_3/MultiHeadDotProductAttention_0/key/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_3/MultiHeadDotProductAttention_0/key/kernel | (768, 12, 64) | 589,824 | -9.72e-05 | 0.0361 |
| Transformer/encoderblock_3/MultiHeadDotProductAttention_0/out/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_3/MultiHeadDotProductAttention_0/out/kernel | (12, 64, 768) | 589,824 | -8.53e-05 | 0.0361 |
| Transformer/encoderblock_3/MultiHeadDotProductAttention_0/query/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_3/MultiHeadDotProductAttention_0/query/kernel | (768, 12, 64) | 589,824 | 4.07e-06 | 0.0361 |
| Transformer/encoderblock_3/MultiHeadDotProductAttention_0/value/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_3/MultiHeadDotProductAttention_0/value/kernel | (768, 12, 64) | 589,824 | -1.37e-05 | 0.0361 |
| Transformer/encoderblock_4/LayerNorm_0/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_4/LayerNorm_0/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_4/LayerNorm_1/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_4/LayerNorm_1/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_4/MlpBlock_0/Dense_0/bias | (3072,) | 3,072 | 1.06e-08 | 1.01e-06 |
| Transformer/encoderblock_4/MlpBlock_0/Dense_0/kernel | (768, 3072) | 2,359,296 | -8.63e-06 | 0.0228 |
| Transformer/encoderblock_4/MlpBlock_0/Dense_1/bias | (768,) | 768 | 3.19e-08 | 9.7e-07 |
| Transformer/encoderblock_4/MlpBlock_0/Dense_1/kernel | (3072, 768) | 2,359,296 | -2.09e-06 | 0.0228 |
| Transformer/encoderblock_4/MultiHeadDotProductAttention_0/key/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_4/MultiHeadDotProductAttention_0/key/kernel | (768, 12, 64) | 589,824 | 3.22e-05 | 0.0361 |
| Transformer/encoderblock_4/MultiHeadDotProductAttention_0/out/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_4/MultiHeadDotProductAttention_0/out/kernel | (12, 64, 768) | 589,824 | 4.29e-05 | 0.0361 |
| Transformer/encoderblock_4/MultiHeadDotProductAttention_0/query/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_4/MultiHeadDotProductAttention_0/query/kernel | (768, 12, 64) | 589,824 | 4.54e-05 | 0.0361 |
| Transformer/encoderblock_4/MultiHeadDotProductAttention_0/value/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_4/MultiHeadDotProductAttention_0/value/kernel | (768, 12, 64) | 589,824 | 1.3e-05 | 0.0361 |
| Transformer/encoderblock_5/LayerNorm_0/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_5/LayerNorm_0/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_5/LayerNorm_1/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_5/LayerNorm_1/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_5/MlpBlock_0/Dense_0/bias | (3072,) | 3,072 | 1.63e-08 | 9.93e-07 |
| Transformer/encoderblock_5/MlpBlock_0/Dense_0/kernel | (768, 3072) | 2,359,296 | 3.8e-05 | 0.0228 |
| Transformer/encoderblock_5/MlpBlock_0/Dense_1/bias | (768,) | 768 | 1.4e-08 | 1.02e-06 |
| Transformer/encoderblock_5/MlpBlock_0/Dense_1/kernel | (3072, 768) | 2,359,296 | 1.97e-05 | 0.0228 |
| Transformer/encoderblock_5/MultiHeadDotProductAttention_0/key/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_5/MultiHeadDotProductAttention_0/key/kernel | (768, 12, 64) | 589,824 | 2.94e-05 | 0.0361 |
| Transformer/encoderblock_5/MultiHeadDotProductAttention_0/out/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_5/MultiHeadDotProductAttention_0/out/kernel | (12, 64, 768) | 589,824 | 4.3e-05 | 0.0361 |
| Transformer/encoderblock_5/MultiHeadDotProductAttention_0/query/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_5/MultiHeadDotProductAttention_0/query/kernel | (768, 12, 64) | 589,824 | -6.6e-06 | 0.0361 |
| Transformer/encoderblock_5/MultiHeadDotProductAttention_0/value/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_5/MultiHeadDotProductAttention_0/value/kernel | (768, 12, 64) | 589,824 | 3e-05 | 0.0361 |
| Transformer/encoderblock_6/LayerNorm_0/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_6/LayerNorm_0/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_6/LayerNorm_1/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_6/LayerNorm_1/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_6/MlpBlock_0/Dense_0/bias | (3072,) | 3,072 | 7.18e-09 | 9.81e-07 |
| Transformer/encoderblock_6/MlpBlock_0/Dense_0/kernel | (768, 3072) | 2,359,296 | 7.92e-07 | 0.0228 |
| Transformer/encoderblock_6/MlpBlock_0/Dense_1/bias | (768,) | 768 | -5.73e-08 | 1.01e-06 |
| Transformer/encoderblock_6/MlpBlock_0/Dense_1/kernel | (3072, 768) | 2,359,296 | 2.52e-06 | 0.0228 |
| Transformer/encoderblock_6/MultiHeadDotProductAttention_0/key/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_6/MultiHeadDotProductAttention_0/key/kernel | (768, 12, 64) | 589,824 | -0.000101 | 0.0361 |
| Transformer/encoderblock_6/MultiHeadDotProductAttention_0/out/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_6/MultiHeadDotProductAttention_0/out/kernel | (12, 64, 768) | 589,824 | -2.27e-05 | 0.0361 |
| Transformer/encoderblock_6/MultiHeadDotProductAttention_0/query/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_6/MultiHeadDotProductAttention_0/query/kernel | (768, 12, 64) | 589,824 | -5.86e-05 | 0.0361 |
| Transformer/encoderblock_6/MultiHeadDotProductAttention_0/value/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_6/MultiHeadDotProductAttention_0/value/kernel | (768, 12, 64) | 589,824 | 2.08e-05 | 0.0361 |
| Transformer/encoderblock_7/LayerNorm_0/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_7/LayerNorm_0/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_7/LayerNorm_1/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_7/LayerNorm_1/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_7/MlpBlock_0/Dense_0/bias | (3072,) | 3,072 | -1.85e-08 | 1e-06 |
| Transformer/encoderblock_7/MlpBlock_0/Dense_0/kernel | (768, 3072) | 2,359,296 | -3.97e-06 | 0.0228 |
| Transformer/encoderblock_7/MlpBlock_0/Dense_1/bias | (768,) | 768 | -4.66e-08 | 1.01e-06 |
| Transformer/encoderblock_7/MlpBlock_0/Dense_1/kernel | (3072, 768) | 2,359,296 | -3.69e-06 | 0.0228 |
| Transformer/encoderblock_7/MultiHeadDotProductAttention_0/key/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_7/MultiHeadDotProductAttention_0/key/kernel | (768, 12, 64) | 589,824 | 4.47e-05 | 0.0361 |
| Transformer/encoderblock_7/MultiHeadDotProductAttention_0/out/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_7/MultiHeadDotProductAttention_0/out/kernel | (12, 64, 768) | 589,824 | 7.01e-06 | 0.0361 |
| Transformer/encoderblock_7/MultiHeadDotProductAttention_0/query/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_7/MultiHeadDotProductAttention_0/query/kernel | (768, 12, 64) | 589,824 | 5.92e-05 | 0.0361 |
| Transformer/encoderblock_7/MultiHeadDotProductAttention_0/value/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_7/MultiHeadDotProductAttention_0/value/kernel | (768, 12, 64) | 589,824 | -7.09e-06 | 0.0361 |
| Transformer/encoderblock_8/LayerNorm_0/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_8/LayerNorm_0/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_8/LayerNorm_1/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_8/LayerNorm_1/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_8/MlpBlock_0/Dense_0/bias | (3072,) | 3,072 | 8.71e-09 | 9.77e-07 |
| Transformer/encoderblock_8/MlpBlock_0/Dense_0/kernel | (768, 3072) | 2,359,296 | -3.12e-06 | 0.0228 |
| Transformer/encoderblock_8/MlpBlock_0/Dense_1/bias | (768,) | 768 | -2.48e-08 | 9.98e-07 |
| Transformer/encoderblock_8/MlpBlock_0/Dense_1/kernel | (3072, 768) | 2,359,296 | 1.77e-05 | 0.0228 |
| Transformer/encoderblock_8/MultiHeadDotProductAttention_0/key/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_8/MultiHeadDotProductAttention_0/key/kernel | (768, 12, 64) | 589,824 | 1.8e-05 | 0.0361 |
| Transformer/encoderblock_8/MultiHeadDotProductAttention_0/out/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_8/MultiHeadDotProductAttention_0/out/kernel | (12, 64, 768) | 589,824 | 8.83e-06 | 0.036 |
| Transformer/encoderblock_8/MultiHeadDotProductAttention_0/query/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_8/MultiHeadDotProductAttention_0/query/kernel | (768, 12, 64) | 589,824 | -4.51e-05 | 0.0361 |
| Transformer/encoderblock_8/MultiHeadDotProductAttention_0/value/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_8/MultiHeadDotProductAttention_0/value/kernel | (768, 12, 64) | 589,824 | -7.51e-05 | 0.036 |
| Transformer/encoderblock_9/LayerNorm_0/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_9/LayerNorm_0/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_9/LayerNorm_1/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_9/LayerNorm_1/scale | (768,) | 768 | 1.0 | 0.0 |
| Transformer/encoderblock_9/MlpBlock_0/Dense_0/bias | (3072,) | 3,072 | 3.17e-08 | 9.95e-07 |
| Transformer/encoderblock_9/MlpBlock_0/Dense_0/kernel | (768, 3072) | 2,359,296 | -2.8e-05 | 0.0228 |
| Transformer/encoderblock_9/MlpBlock_0/Dense_1/bias | (768,) | 768 | -1.3e-08 | 9.88e-07 |
| Transformer/encoderblock_9/MlpBlock_0/Dense_1/kernel | (3072, 768) | 2,359,296 | -2.09e-05 | 0.0228 |
| Transformer/encoderblock_9/MultiHeadDotProductAttention_0/key/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_9/MultiHeadDotProductAttention_0/key/kernel | (768, 12, 64) | 589,824 | 6.81e-06 | 0.0361 |
| Transformer/encoderblock_9/MultiHeadDotProductAttention_0/out/bias | (768,) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_9/MultiHeadDotProductAttention_0/out/kernel | (12, 64, 768) | 589,824 | -5.62e-05 | 0.0361 |
| Transformer/encoderblock_9/MultiHeadDotProductAttention_0/query/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_9/MultiHeadDotProductAttention_0/query/kernel | (768, 12, 64) | 589,824 | 7.07e-05 | 0.0361 |
| Transformer/encoderblock_9/MultiHeadDotProductAttention_0/value/bias | (12, 64) | 768 | 0.0 | 0.0 |
| Transformer/encoderblock_9/MultiHeadDotProductAttention_0/value/kernel | (768, 12, 64) | 589,824 | -7.58e-05 | 0.0361 |
| Transformer/posembed_input/pos_embedding | (1, 197, 768) | 151,296 | -0.000113 | 0.02 |
| cls | (1, 1, 768) | 768 | 0.0 | 0.0 |
| embedding/bias | (768,) | 768 | 0.0 | 0.0 |
| embedding/kernel | (16, 16, 3, 768) | 589,824 | 5.29e-05 | 0.0361 |
| head/bias | (100,) | 100 | 0.0 | 0.0 |
| head/kernel | (768, 100) | 76,800 | 0.0 | 0.0 |
| prompt_pool/key | (10, 768) | 7,680 | 0.00504 | 0.00288 |
| prompt_pool/prompt | (1, 10, 10, 768) | 76,800 | 0.00501 | 0.0029 |
+-------------------------------------------------------------------------+------------------+-----------+-----------+----------+
Total: 85,960,036
2022-08-05 09:47:19.440468: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 9.00MiB (rounded to 9437184)requested by op
2022-08-05 09:47:19.442356: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:491] *********************************************************************************x**************x***
2022-08-05 09:47:19.442485: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2130] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 9437184 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
parameter allocation: 4B
constant allocation: 0B
maybe_live_out allocation: 9.00MiB
preallocated temp allocation: 0B
total allocation: 9.00MiB
total fragmentation: 0B (0.00%)
Peak buffers:
Buffer 1:
Size: 9.00MiB
Operator: op_name="jit(broadcast_in_dim)/jit(main)/broadcast_in_dim[shape=(768, 3072) broadcast_dimensions=()]" source_file="/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/optim/adam.py" source_line=89
XLA Label: broadcast
Shape: f32[768,3072]
==========================
Buffer 2:
Size: 4B
Entry Parameter Subshape: f32[]
==========================
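A quick sanity check on the failing allocation (a minimal sketch; the shape comes from the `broadcast_in_dim` op in the "Peak buffers" report above): Flax's Adam `init_param_state` calls `jnp.zeros_like` on each parameter, so the 9 MiB buffer is simply a float32 zeros array with the shape of one MLP `Dense_0` kernel:

```python
# Size of the buffer Adam tries to allocate for one MLP Dense_0 kernel:
# an f32[768, 3072] zeros array (see "Shape: f32[768,3072]" above).
shape = (768, 3072)
bytes_per_f32 = 4

n_bytes = shape[0] * shape[1] * bytes_per_f32
print(n_bytes)           # 9437184 — matches "Out of memory ... 9437184 bytes"
print(n_bytes / 2**20)   # 9.0 — matches the reported 9.00MiB
```

The allocation itself is tiny; failing on 9 MiB means the GPU was already essentially full when optimizer-state initialization began.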
/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/optim/base.py:488: FutureWarning: jax.tree_map is deprecated, and will be removed in a future release. Use jax.tree_util.tree_map instead.
param_states = jax.tree_map(_ShapeDtype.create, params)
/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/core/frozen_dict.py:196: FutureWarning: jax.tree_map is deprecated, and will be removed in a future release. Use jax.tree_util.tree_map instead.
return jax.tree_map(lambda y: y, x._dict)
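The two `FutureWarning`s above are harmless here: they come from the pinned `flax.optim` and `flax.core.frozen_dict` code, not from user code. In code you control, the fix is a mechanical rename from `jax.tree_map` to `jax.tree_util.tree_map`. A minimal sketch:

```python
import jax

params = {"kernel": [1.0, 2.0], "bias": 3.0}

# Deprecated spelling (triggers the FutureWarning above):
#   jax.tree_map(lambda x: x * 0, params)
# Supported spelling — same behavior, maps over every leaf:
zeroed = jax.tree_util.tree_map(lambda x: x * 0, params)
# All leaves are zeroed; the pytree structure is preserved.
```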
Visible devices cannot be modified after being initialized
Traceback (most recent call last):
File "main.py", line 78, in <module>
app.run(main)
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "main.py", line 72, in main
train_continual.train_and_evaluate(FLAGS.my_config, FLAGS.workdir)
File "/nfs/users/ext_vishal.thengane/workspace/l2p/train_continual.py", line 942, in train_and_evaluate
model, state = create_train_state(
File "/nfs/users/ext_vishal.thengane/workspace/l2p/train_continual.py", line 145, in create_train_state
optimizer = create_optimizer(config, params)
File "/nfs/users/ext_vishal.thengane/workspace/l2p/train_continual.py", line 103, in create_optimizer
optimizer = opt_def.create(params)
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/optim/base.py", line 142, in create
state = opt_def.init_state(target)
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/optim/base.py", line 502, in init_state
param_states = traversal.update(lambda x: opt.init_param_state(x._value), param_states)
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/traverse_util.py", line 431, in update
value = fn(value)
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/optim/base.py", line 502, in <lambda>
param_states = traversal.update(lambda x: opt.init_param_state(x._value), param_states)
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/optim/adam.py", line 89, in init_param_state
return _AdamParamState(jnp.zeros_like(param), jnp.zeros_like(param))
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 1926, in zeros_like
return lax.full_like(a, 0, dtype, shape)
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 1284, in full_like
return full(fill_shape, _convert_element_type(fill_value, dtype, weak_type))
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 1142, in full
return broadcast(fill_value, shape)
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 760, in broadcast
return broadcast_in_dim(operand, tuple(sizes) + np.shape(operand), dims)
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 789, in broadcast_in_dim
return broadcast_in_dim_p.bind(
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/core.py", line 324, in bind
return self.bind_with_trace(find_top_trace(args), args, params)
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/core.py", line 327, in bind_with_trace
out = trace.process_primitive(self, map(trace.full_raise, args), params)
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/core.py", line 684, in process_primitive
return primitive.impl(*tracers, **params)
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/_src/dispatch.py", line 101, in apply_primitive
return compiled_fun(*args)
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/_src/dispatch.py", line 167, in <lambda>
return lambda *args, **kw: compiled(*args, **kw)[0]
File "/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/jax/_src/dispatch.py", line 733, in _execute_compiled
out_flat = compiled.execute(in_flat)
jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 9437184 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
parameter allocation: 4B
constant allocation: 0B
maybe_live_out allocation: 9.00MiB
preallocated temp allocation: 0B
total allocation: 9.00MiB
total fragmentation: 0B (0.00%)
Peak buffers:
Buffer 1:
Size: 9.00MiB
Operator: op_name="jit(broadcast_in_dim)/jit(main)/broadcast_in_dim[shape=(768, 3072) broadcast_dimensions=()]" source_file="/nfs/users/ext_vishal.thengane/miniconda3/envs/l2p/lib/python3.8/site-packages/flax/optim/adam.py" source_line=89
XLA Label: broadcast
Shape: f32[768,3072]
==========================
Buffer 2:
Size: 4B
Entry Parameter Subshape: f32[]
==========================
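One likely contributor, given that both tf.data and JAX run in this process: by default JAX preallocates a large fraction of GPU memory at startup, and anything TensorFlow grabs on the same GPUs competes with it. A minimal sketch of the standard mitigation (the flag values are illustrative, and the flags must be set before the first `import jax` in the process):

```python
import os

# Must be set before the first `import jax` anywhere in the process;
# afterwards they have no effect. Values here are illustrative.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"  # allocate on demand
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = ".50"   # or cap the upfront grab
```

Relatedly, hiding the GPUs from TensorFlow (e.g. `tf.config.set_visible_devices([], "GPU")`) only works before TensorFlow first touches a GPU; the log line "Visible devices cannot be modified after being initialized" suggests that call ran too late here.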
@shasha2408

Hi Vishal, this is Shashank. I am facing the same problem you had while implementing a GNN for MTS anomaly detection. Did you overcome it? It would be great if you could explain how. Thanks in advance.
