Created February 6, 2025 16:04
"RuntimeError: The weights trying to be saved contained shared tensors" when saving a checkpoint
~/weak-to-strong$ python train_simple.py
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
2025-02-06 15:50:40.128067: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-02-06 15:50:40.139675: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738857040.153991 3821 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738857040.157866 3821 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-06 15:50:40.170523: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Warning sciq has less than 20000 docs, using all: Index 19999 out of range for dataset of size 11679.
Warning sciq has less than 10000 docs, using all: Index 9999 out of range for dataset of size 1000.
len(train1): 5839 len(train2): 5840
Training model model, size gpt2
LR 5e-05 batch_size 32 minibatch_size 1
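The hyperparameters above can be sanity-checked against the step count in the log. This is an inference from the logged numbers, not from reading `train_simple.py`: a `minibatch_size` of 1 with `batch_size` 32 implies 32 gradient-accumulation micro-batches per optimizer step, and 364 total steps is consistent with one pass over the combined 11,679 examples (and equally consistent with two epochs of `5839 // 32 = 182` steps over train1 alone).

```python
# Hypothetical reconstruction of the logged step count; the script's
# actual arithmetic may differ.
len_train1, len_train2 = 5839, 5840  # from the log
batch_size, minibatch_size = 32, 1   # from the log

# Micro-batches accumulated per optimizer step.
accum_steps = batch_size // minibatch_size

# One pass over train1 + train2, dropping the incomplete last batch.
total_steps = (len_train1 + len_train2) // batch_size

print(accum_steps)   # 32
print(total_steps)   # 364, matching "Step: 360/364"
```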
Step: 0/364 Recent losses: 0.6931471824645996 0.53125 1
Step: 10/364 Recent losses: 0.6948256373405457 0.528125 10
Step: 20/364 Recent losses: 0.6936358749866486 0.465625 10
Step: 30/364 Recent losses: 0.6933627307415009 0.490625 10
Step: 40/364 Recent losses: 0.6931856632232666 0.4875 10
Step: 50/364 Recent losses: 0.6935216307640075 0.475 10
Step: 60/364 Recent losses: 0.6913963377475738 0.50625 10
Step: 70/364 Recent losses: 0.6890555322170258 0.55625 10
Step: 80/364 Recent losses: 0.6870683670043946 0.56875 10
Step: 90/364 Recent losses: 0.6914982795715332 0.54375 10
Step: 100/364 Recent losses: 0.6720430195331574 0.61875 10
Step: 110/364 Recent losses: 0.7044513106346131 0.5125 10
Step: 120/364 Recent losses: 0.674769926071167 0.553125 10
Step: 130/364 Recent losses: 0.6654023945331573 0.6125 10
Step: 140/364 Recent losses: 0.6679559290409088 0.615625 10
Step: 150/364 Recent losses: 0.6934084057807922 0.553125 10
Step: 160/364 Recent losses: 0.6785114765167236 0.59375 10
Step: 170/364 Recent losses: 0.6639248609542847 0.621875 10
Step: 180/364 Recent losses: 0.6477442264556885 0.640625 10
Step: 190/364 Recent losses: 0.6538057565689087 0.625 10
Step: 200/364 Recent losses: 0.6566005527973175 0.603125 10
Step: 210/364 Recent losses: 0.6251852810382843 0.665625 10
Step: 220/364 Recent losses: 0.5724308907985687 0.728125 10
Step: 230/364 Recent losses: 0.5788760840892792 0.728125 10
Step: 240/364 Recent losses: 0.5940948903560639 0.7 10
Step: 250/364 Recent losses: 0.5363911271095276 0.759375 10
Step: 260/364 Recent losses: 0.5096236258745194 0.78125 10
Step: 270/364 Recent losses: 0.48319518864154815 0.784375 10
Step: 280/364 Recent losses: 0.5106918245553971 0.7625 10
Step: 290/364 Recent losses: 0.5980292201042176 0.6875 10
Step: 300/364 Recent losses: 0.5381115406751633 0.740625 10
Step: 310/364 Recent losses: 0.5191604405641556 0.759375 10
Step: 320/364 Recent losses: 0.5419570624828338 0.73125 10
Step: 330/364 Recent losses: 0.5074462443590164 0.7625 10
Step: 340/364 Recent losses: 0.5323177635669708 0.74375 10
Step: 350/364 Recent losses: 0.5150761365890503 0.759375 10
Step: 360/364 Recent losses: 0.5014643102884293 0.7875 10
Final evaluation:
Accuracy: 0.656 +/- 0.015022117027902557
Model training took 141.28682947158813 seconds
Traceback (most recent call last):
  File "/home/ubuntu/weak-to-strong/train_simple.py", line 327, in <module>
    fire.Fire(main)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/ubuntu/weak-to-strong/train_simple.py", line 280, in main
    test_results, weak_ds = train_and_save_model(
  File "/home/ubuntu/weak-to-strong/weak_to_strong/train.py", line 271, in train_and_save_model
    (model if hasattr(model, "save_pretrained") else model.module).save_pretrained(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2905, in save_pretrained
    raise RuntimeError(
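The reported +/- matches the binomial standard error of the accuracy over the 1,000 test docs mentioned earlier, sqrt(p(1 - p)/n). This is an inference from the logged numbers, not from reading the evaluation code:

```python
import math

# Accuracy and test-set size taken from the log above.
p, n = 0.656, 1000

# Binomial standard error of the mean accuracy.
stderr = math.sqrt(p * (1 - p) / n)
print(stderr)  # ~0.0150221..., matching the reported +/-
```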
RuntimeError: The weights trying to be saved contained shared tensors [{'transformer.wte.weight', 'lm.transformer.wte.weight'}, {'lm.transformer.wpe.weight', 'transformer.wpe.weight'}, {'transformer.h.0.ln_1.weight', 'lm.transformer.h.0.ln_1.weight'}, {'lm.transformer.h.0.ln_1.bias', 'transformer.h.0.ln_1.bias'}, {'transformer.h.0.attn.c_attn.weight', 'lm.transformer.h.0.attn.c_attn.weight'}, {'lm.transformer.h.0.attn.c_attn.bias', 'transformer.h.0.attn.c_attn.bias'}, {'transformer.h.0.attn.c_proj.weight', 'lm.transformer.h.0.attn.c_proj.weight'}, {'lm.transformer.h.0.attn.c_proj.bias', 'transformer.h.0.attn.c_proj.bias'}, {'transformer.h.0.ln_2.weight', 'lm.transformer.h.0.ln_2.weight'}, {'transformer.h.0.ln_2.bias', 'lm.transformer.h.0.ln_2.bias'}, {'lm.transformer.h.0.mlp.c_fc.weight', 'transformer.h.0.mlp.c_fc.weight'}, {'lm.transformer.h.0.mlp.c_fc.bias', 'transformer.h.0.mlp.c_fc.bias'}, {'lm.transformer.h.0.mlp.c_proj.weight', 'transformer.h.0.mlp.c_proj.weight'}, {'transformer.h.0.mlp.c_proj.bias', 'lm.transformer.h.0.mlp.c_proj.bias'}, {'transformer.h.1.ln_1.weight', 'lm.transformer.h.1.ln_1.weight'}, {'lm.transformer.h.1.ln_1.bias', 'transformer.h.1.ln_1.bias'}, {'lm.transformer.h.1.attn.c_attn.weight', 'transformer.h.1.attn.c_attn.weight'}, {'transformer.h.1.attn.c_attn.bias', 'lm.transformer.h.1.attn.c_attn.bias'}, {'lm.transformer.h.1.attn.c_proj.weight', 'transformer.h.1.attn.c_proj.weight'}, {'lm.transformer.h.1.attn.c_proj.bias', 'transformer.h.1.attn.c_proj.bias'}, {'lm.transformer.h.1.ln_2.weight', 'transformer.h.1.ln_2.weight'}, {'lm.transformer.h.1.ln_2.bias', 'transformer.h.1.ln_2.bias'}, {'lm.transformer.h.1.mlp.c_fc.weight', 'transformer.h.1.mlp.c_fc.weight'}, {'lm.transformer.h.1.mlp.c_fc.bias', 'transformer.h.1.mlp.c_fc.bias'}, {'transformer.h.1.mlp.c_proj.weight', 'lm.transformer.h.1.mlp.c_proj.weight'}, {'transformer.h.1.mlp.c_proj.bias', 'lm.transformer.h.1.mlp.c_proj.bias'}, {'lm.transformer.h.2.ln_1.weight', 
'transformer.h.2.ln_1.weight'}, {'lm.transformer.h.2.ln_1.bias', 'transformer.h.2.ln_1.bias'}, {'transformer.h.2.attn.c_attn.weight', 'lm.transformer.h.2.attn.c_attn.weight'}, {'transformer.h.2.attn.c_attn.bias', 'lm.transformer.h.2.attn.c_attn.bias'}, {'lm.transformer.h.2.attn.c_proj.weight', 'transformer.h.2.attn.c_proj.weight'}, {'transformer.h.2.attn.c_proj.bias', 'lm.transformer.h.2.attn.c_proj.bias'}, {'transformer.h.2.ln_2.weight', 'lm.transformer.h.2.ln_2.weight'}, {'lm.transformer.h.2.ln_2.bias', 'transformer.h.2.ln_2.bias'}, {'transformer.h.2.mlp.c_fc.weight', 'lm.transformer.h.2.mlp.c_fc.weight'}, {'transformer.h.2.mlp.c_fc.bias', 'lm.transformer.h.2.mlp.c_fc.bias'}, {'transformer.h.2.mlp.c_proj.weight', 'lm.transformer.h.2.mlp.c_proj.weight'}, {'lm.transformer.h.2.mlp.c_proj.bias', 'transformer.h.2.mlp.c_proj.bias'}, {'transformer.h.3.ln_1.weight', 'lm.transformer.h.3.ln_1.weight'}, {'transformer.h.3.ln_1.bias', 'lm.transformer.h.3.ln_1.bias'}, {'transformer.h.3.attn.c_attn.weight', 'lm.transformer.h.3.attn.c_attn.weight'}, {'lm.transformer.h.3.attn.c_attn.bias', 'transformer.h.3.attn.c_attn.bias'}, {'transformer.h.3.attn.c_proj.weight', 'lm.transformer.h.3.attn.c_proj.weight'}, {'transformer.h.3.attn.c_proj.bias', 'lm.transformer.h.3.attn.c_proj.bias'}, {'lm.transformer.h.3.ln_2.weight', 'transformer.h.3.ln_2.weight'}, {'transformer.h.3.ln_2.bias', 'lm.transformer.h.3.ln_2.bias'}, {'transformer.h.3.mlp.c_fc.weight', 'lm.transformer.h.3.mlp.c_fc.weight'}, {'transformer.h.3.mlp.c_fc.bias', 'lm.transformer.h.3.mlp.c_fc.bias'}, {'lm.transformer.h.3.mlp.c_proj.weight', 'transformer.h.3.mlp.c_proj.weight'}, {'transformer.h.3.mlp.c_proj.bias', 'lm.transformer.h.3.mlp.c_proj.bias'}, {'lm.transformer.h.4.ln_1.weight', 'transformer.h.4.ln_1.weight'}, {'lm.transformer.h.4.ln_1.bias', 'transformer.h.4.ln_1.bias'}, {'lm.transformer.h.4.attn.c_attn.weight', 'transformer.h.4.attn.c_attn.weight'}, {'lm.transformer.h.4.attn.c_attn.bias', 
'transformer.h.4.attn.c_attn.bias'}, {'lm.transformer.h.4.attn.c_proj.weight', 'transformer.h.4.attn.c_proj.weight'}, {'lm.transformer.h.4.attn.c_proj.bias', 'transformer.h.4.attn.c_proj.bias'}, {'transformer.h.4.ln_2.weight', 'lm.transformer.h.4.ln_2.weight'}, {'lm.transformer.h.4.ln_2.bias', 'transformer.h.4.ln_2.bias'}, {'lm.transformer.h.4.mlp.c_fc.weight', 'transformer.h.4.mlp.c_fc.weight'}, {'lm.transformer.h.4.mlp.c_fc.bias', 'transformer.h.4.mlp.c_fc.bias'}, {'lm.transformer.h.4.mlp.c_proj.weight', 'transformer.h.4.mlp.c_proj.weight'}, {'transformer.h.4.mlp.c_proj.bias', 'lm.transformer.h.4.mlp.c_proj.bias'}, {'lm.transformer.h.5.ln_1.weight', 'transformer.h.5.ln_1.weight'}, {'lm.transformer.h.5.ln_1.bias', 'transformer.h.5.ln_1.bias'}, {'lm.transformer.h.5.attn.c_attn.weight', 'transformer.h.5.attn.c_attn.weight'}, {'lm.transformer.h.5.attn.c_attn.bias', 'transformer.h.5.attn.c_attn.bias'}, {'lm.transformer.h.5.attn.c_proj.weight', 'transformer.h.5.attn.c_proj.weight'}, {'lm.transformer.h.5.attn.c_proj.bias', 'transformer.h.5.attn.c_proj.bias'}, {'transformer.h.5.ln_2.weight', 'lm.transformer.h.5.ln_2.weight'}, {'transformer.h.5.ln_2.bias', 'lm.transformer.h.5.ln_2.bias'}, {'transformer.h.5.mlp.c_fc.weight', 'lm.transformer.h.5.mlp.c_fc.weight'}, {'transformer.h.5.mlp.c_fc.bias', 'lm.transformer.h.5.mlp.c_fc.bias'}, {'lm.transformer.h.5.mlp.c_proj.weight', 'transformer.h.5.mlp.c_proj.weight'}, {'lm.transformer.h.5.mlp.c_proj.bias', 'transformer.h.5.mlp.c_proj.bias'}, {'lm.transformer.h.6.ln_1.weight', 'transformer.h.6.ln_1.weight'}, {'transformer.h.6.ln_1.bias', 'lm.transformer.h.6.ln_1.bias'}, {'lm.transformer.h.6.attn.c_attn.weight', 'transformer.h.6.attn.c_attn.weight'}, {'transformer.h.6.attn.c_attn.bias', 'lm.transformer.h.6.attn.c_attn.bias'}, {'transformer.h.6.attn.c_proj.weight', 'lm.transformer.h.6.attn.c_proj.weight'}, {'transformer.h.6.attn.c_proj.bias', 'lm.transformer.h.6.attn.c_proj.bias'}, {'transformer.h.6.ln_2.weight', 
'lm.transformer.h.6.ln_2.weight'}, {'transformer.h.6.ln_2.bias', 'lm.transformer.h.6.ln_2.bias'}, {'transformer.h.6.mlp.c_fc.weight', 'lm.transformer.h.6.mlp.c_fc.weight'}, {'transformer.h.6.mlp.c_fc.bias', 'lm.transformer.h.6.mlp.c_fc.bias'}, {'transformer.h.6.mlp.c_proj.weight', 'lm.transformer.h.6.mlp.c_proj.weight'}, {'transformer.h.6.mlp.c_proj.bias', 'lm.transformer.h.6.mlp.c_proj.bias'}, {'transformer.h.7.ln_1.weight', 'lm.transformer.h.7.ln_1.weight'}, {'transformer.h.7.ln_1.bias', 'lm.transformer.h.7.ln_1.bias'}, {'lm.transformer.h.7.attn.c_attn.weight', 'transformer.h.7.attn.c_attn.weight'}, {'lm.transformer.h.7.attn.c_attn.bias', 'transformer.h.7.attn.c_attn.bias'}, {'transformer.h.7.attn.c_proj.weight', 'lm.transformer.h.7.attn.c_proj.weight'}, {'lm.transformer.h.7.attn.c_proj.bias', 'transformer.h.7.attn.c_proj.bias'}, {'lm.transformer.h.7.ln_2.weight', 'transformer.h.7.ln_2.weight'}, {'transformer.h.7.ln_2.bias', 'lm.transformer.h.7.ln_2.bias'}, {'transformer.h.7.mlp.c_fc.weight', 'lm.transformer.h.7.mlp.c_fc.weight'}, {'lm.transformer.h.7.mlp.c_fc.bias', 'transformer.h.7.mlp.c_fc.bias'}, {'lm.transformer.h.7.mlp.c_proj.weight', 'transformer.h.7.mlp.c_proj.weight'}, {'transformer.h.7.mlp.c_proj.bias', 'lm.transformer.h.7.mlp.c_proj.bias'}, {'lm.transformer.h.8.ln_1.weight', 'transformer.h.8.ln_1.weight'}, {'lm.transformer.h.8.ln_1.bias', 'transformer.h.8.ln_1.bias'}, {'lm.transformer.h.8.attn.c_attn.weight', 'transformer.h.8.attn.c_attn.weight'}, {'transformer.h.8.attn.c_attn.bias', 'lm.transformer.h.8.attn.c_attn.bias'}, {'lm.transformer.h.8.attn.c_proj.weight', 'transformer.h.8.attn.c_proj.weight'}, {'lm.transformer.h.8.attn.c_proj.bias', 'transformer.h.8.attn.c_proj.bias'}, {'transformer.h.8.ln_2.weight', 'lm.transformer.h.8.ln_2.weight'}, {'transformer.h.8.ln_2.bias', 'lm.transformer.h.8.ln_2.bias'}, {'lm.transformer.h.8.mlp.c_fc.weight', 'transformer.h.8.mlp.c_fc.weight'}, {'transformer.h.8.mlp.c_fc.bias', 'lm.transformer.h.8.mlp.c_fc.bias'}, 
{'transformer.h.8.mlp.c_proj.weight', 'lm.transformer.h.8.mlp.c_proj.weight'}, {'transformer.h.8.mlp.c_proj.bias', 'lm.transformer.h.8.mlp.c_proj.bias'}, {'lm.transformer.h.9.ln_1.weight', 'transformer.h.9.ln_1.weight'}, {'transformer.h.9.ln_1.bias', 'lm.transformer.h.9.ln_1.bias'}, {'transformer.h.9.attn.c_attn.weight', 'lm.transformer.h.9.attn.c_attn.weight'}, {'transformer.h.9.attn.c_attn.bias', 'lm.transformer.h.9.attn.c_attn.bias'}, {'lm.transformer.h.9.attn.c_proj.weight', 'transformer.h.9.attn.c_proj.weight'}, {'transformer.h.9.attn.c_proj.bias', 'lm.transformer.h.9.attn.c_proj.bias'}, {'lm.transformer.h.9.ln_2.weight', 'transformer.h.9.ln_2.weight'}, {'transformer.h.9.ln_2.bias', 'lm.transformer.h.9.ln_2.bias'}, {'lm.transformer.h.9.mlp.c_fc.weight', 'transformer.h.9.mlp.c_fc.weight'}, {'lm.transformer.h.9.mlp.c_fc.bias', 'transformer.h.9.mlp.c_fc.bias'}, {'lm.transformer.h.9.mlp.c_proj.weight', 'transformer.h.9.mlp.c_proj.weight'}, {'transformer.h.9.mlp.c_proj.bias', 'lm.transformer.h.9.mlp.c_proj.bias'}, {'transformer.h.10.ln_1.weight', 'lm.transformer.h.10.ln_1.weight'}, {'lm.transformer.h.10.ln_1.bias', 'transformer.h.10.ln_1.bias'}, {'lm.transformer.h.10.attn.c_attn.weight', 'transformer.h.10.attn.c_attn.weight'}, {'transformer.h.10.attn.c_attn.bias', 'lm.transformer.h.10.attn.c_attn.bias'}, {'transformer.h.10.attn.c_proj.weight', 'lm.transformer.h.10.attn.c_proj.weight'}, {'transformer.h.10.attn.c_proj.bias', 'lm.transformer.h.10.attn.c_proj.bias'}, {'lm.transformer.h.10.ln_2.weight', 'transformer.h.10.ln_2.weight'}, {'transformer.h.10.ln_2.bias', 'lm.transformer.h.10.ln_2.bias'}, {'lm.transformer.h.10.mlp.c_fc.weight', 'transformer.h.10.mlp.c_fc.weight'}, {'lm.transformer.h.10.mlp.c_fc.bias', 'transformer.h.10.mlp.c_fc.bias'}, {'transformer.h.10.mlp.c_proj.weight', 'lm.transformer.h.10.mlp.c_proj.weight'}, {'lm.transformer.h.10.mlp.c_proj.bias', 'transformer.h.10.mlp.c_proj.bias'}, {'transformer.h.11.ln_1.weight', 'lm.transformer.h.11.ln_1.weight'}, 
{'lm.transformer.h.11.ln_1.bias', 'transformer.h.11.ln_1.bias'}, {'lm.transformer.h.11.attn.c_attn.weight', 'transformer.h.11.attn.c_attn.weight'}, {'lm.transformer.h.11.attn.c_attn.bias', 'transformer.h.11.attn.c_attn.bias'}, {'transformer.h.11.attn.c_proj.weight', 'lm.transformer.h.11.attn.c_proj.weight'}, {'lm.transformer.h.11.attn.c_proj.bias', 'transformer.h.11.attn.c_proj.bias'}, {'transformer.h.11.ln_2.weight', 'lm.transformer.h.11.ln_2.weight'}, {'lm.transformer.h.11.ln_2.bias', 'transformer.h.11.ln_2.bias'}, {'lm.transformer.h.11.mlp.c_fc.weight', 'transformer.h.11.mlp.c_fc.weight'}, {'lm.transformer.h.11.mlp.c_fc.bias', 'transformer.h.11.mlp.c_fc.bias'}, {'transformer.h.11.mlp.c_proj.weight', 'lm.transformer.h.11.mlp.c_proj.weight'}, {'transformer.h.11.mlp.c_proj.bias', 'lm.transformer.h.11.mlp.c_proj.bias'}, {'lm.transformer.ln_f.weight', 'transformer.ln_f.weight'}, {'lm.transformer.ln_f.bias', 'transformer.ln_f.bias'}] that are mismatching the transformers base configuration. Try saving using `safe_serialization=False` or remove this tensor sharing.
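The save fails because safetensors refuses to serialize a state dict in which two keys alias the same underlying tensor: here every `transformer.*` weight is also reachable as `lm.transformer.*`. The error message's own suggestions apply: pass `safe_serialization=False` to `save_pretrained` (which falls back to `torch.save` and tolerates aliases), or drop one name per group before saving. A minimal stdlib sketch of the aliasing check (illustrative only; real code compares tensor storage pointers, not Python `id`s):

```python
def find_shared(state_dict):
    """Group state-dict keys that alias the same underlying object,
    mirroring the check behind the RuntimeError above."""
    by_storage = {}
    for name, tensor in state_dict.items():
        by_storage.setdefault(id(tensor), []).append(name)
    return [set(names) for names in by_storage.values() if len(names) > 1]

# Two keys bound to one object stand in for the shared tensors.
wte = [0.1, 0.2]
state = {"transformer.wte.weight": wte, "lm.transformer.wte.weight": wte}
print(find_shared(state))  # one group: the two aliased wte keys
```

With a real model, `model.save_pretrained(path, safe_serialization=False)` is the quickest workaround; `safe_serialization` is an actual `save_pretrained` keyword in recent transformers versions.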