@stevenkolawole · Created February 9, 2023 03:41
2023-02-09 11:38:54,186 INFO:Run name: EUR-Lex_bert_20230209113854
2023-02-09 11:38:54,800 INFO:Created a temporary directory at /tmp/tmp5_bwr35i
2023-02-09 11:38:54,800 INFO:Writing /tmp/tmp5_bwr35i/_remote_module_non_scriptable.py
2023-02-09 11:38:55,241 INFO:Global seed set to 1337
2023-02-09 11:38:55,246 INFO:Using device: cuda
2023-02-09 11:38:56,089 INFO:Load data from data/EUR-Lex/train.txt.
2023-02-09 11:38:56,930 INFO:Load data from data/EUR-Lex/test.txt.
2023-02-09 11:38:57,126 INFO:Finish loading dataset (train: 12359 / val: 3090 / test: 3865)
2023-02-09 11:38:57,128 INFO:Initialize model from scratch.
2023-02-09 11:38:57,136 INFO:Read 3956 labels.
/home/tmp/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/parsing.py:268: UserWarning: Attribute 'network' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['network'])`.
    rank_zero_warn(
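(The warning above can be silenced by excluding the nn.Module attribute from hyperparameter checkpointing, as the message itself suggests. A minimal sketch, assuming a LightningModule that receives its network at construction; the attribute name 'network' comes from the warning, but the class name `Model` and the `learning_rate` parameter are illustrative, not LibMultiLabel's actual code:)

import pytorch_lightning as pl
import torch.nn as nn

class Model(pl.LightningModule):
    def __init__(self, network: nn.Module, learning_rate: float = 1e-3):
        super().__init__()
        # Skip the nn.Module when saving hyperparameters: its weights are
        # already written to the checkpoint's state_dict, so storing it again
        # as a "hyperparameter" is redundant (hence the warning).
        self.save_hyperparameters(ignore=['network'])
        self.network = network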
/home/tmp/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:447: LightningDeprecationWarning: Setting `Trainer(gpus=1)` is deprecated in v1.7 and will be removed in v2.0. Please use `Trainer(accelerator='gpu', devices=1)` instead.
    rank_zero_deprecation(
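(The deprecated `gpus` flag maps directly onto the `accelerator`/`devices` pair. A one-line sketch of the replacement the warning recommends, applied wherever the Trainer is constructed, e.g. in torch_trainer.py:)

from pytorch_lightning import Trainer

# Deprecated since PL v1.7, removed in v2.0:
#   trainer = Trainer(gpus=1)
# Equivalent replacement per the deprecation warning:
trainer = Trainer(accelerator='gpu', devices=1)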
2023-02-09 11:38:58,756 INFO:GPU available: True (cuda), used: True
2023-02-09 11:38:58,757 INFO:TPU available: False, using: 0 TPU cores
2023-02-09 11:38:58,757 INFO:IPU available: False, using: 0 IPUs
2023-02-09 11:38:58,757 INFO:HPU available: False, using: 0 HPUs
2023-02-09 11:38:58,800 INFO:`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used.
2023-02-09 11:38:58,800 INFO:`Trainer(limit_val_batches=1.0)` was configured so 100% of the batches will be used.
2023-02-09 11:38:58,800 INFO:`Trainer(limit_test_batches=1.0)` was configured so 100% of the batches will be used.
2023-02-09 11:38:58,800 INFO:Finish writing log to ./runs/EUR-Lex_bert_20230209113854/logs.json.
/home/tmp/.local/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:616: UserWarning: Checkpoint directory /home/tmp/LibMultiLabel/runs/EUR-Lex_bert_20230209113854 exists and is not empty.
rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
Traceback (most recent call last):
  File "main.py", line 222, in <module>
    main()
  File "main.py", line 209, in main
    trainer.train()
  File "/home/tmp/LibMultiLabel/torch_trainer.py", line 210, in train
    self.trainer.fit(self.model, train_loader, val_loader)
  File "/home/tmp/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/home/tmp/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/tmp/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/tmp/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1147, in _run
    self.strategy.setup(self)
  File "/home/tmp/.local/lib/python3.8/site-packages/pytorch_lightning/strategies/single_device.py", line 73, in setup
    self.model_to_device()
  File "/home/tmp/.local/lib/python3.8/site-packages/pytorch_lightning/strategies/single_device.py", line 70, in model_to_device
    self.model.to(self.root_device)
  File "/home/tmp/.local/lib/python3.8/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 113, in to
    return super().to(*args, **kwargs)
  File "/home/tmp/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 989, in to
    return self._apply(convert)
  File "/home/tmp/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/home/tmp/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/home/tmp/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/tmp/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 664, in _apply
    param_applied = fn(param)
  File "/home/tmp/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 987, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 90.00 MiB (GPU 0; 23.70 GiB total capacity; 8.00 KiB already allocated; 704.00 KiB free; 2.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
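(Note the numbers in the error: only 704.00 KiB of the card's 23.70 GiB is free while PyTorch itself has allocated just 8.00 KiB, so the memory is almost certainly held by other processes on GPU 0 (check with nvidia-smi) rather than lost to fragmentation within this run. If fragmentation were the culprit, the error's own suggestion applies; a hedged sketch follows, where the 128 MiB split size is an illustrative value and the variable must be set before CUDA is initialized:)

import os

# Cap the allocator's split size to reduce fragmentation; this must be set
# before the first CUDA allocation (e.g., at the very top of main.py).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

# Confirm how much device memory is actually free before training starts.
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print(f"GPU 0: {free_bytes / 2**20:.0f} MiB free of {total_bytes / 2**20:.0f} MiB total")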