Gist mdouze/c85f69f7ac997cdc9b9096e3640e0423, created November 28, 2023 15:35
$ python -u train_qinco.py --db bigann1B --M 8 --L 2 --h 256 --lr 0.001 --ngpu 4 --model models/test_model.pt
args: Namespace(todo=['train_rq', 'train', 'train_ivf'], db='bigann1B', training_data='', nt=500000, nval=10000, db_scale=-1, ivf=False, M=8, L=2, K=256, h=256, rq_beam_size=1, ngpu=4, lr=0.001,
max_epochs=1000, batch_size=1024, RQ_filename='', IVF_filename='', model='models/test_model.pt', checkpoint='')
nb processors 80
model name : Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
Tue Nov 28 07:57:28 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB           On  | 00000000:06:00.0 Off |                    0 |
| N/A   31C    P0              42W / 300W |      0MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-16GB           On  | 00000000:07:00.0 Off |                    0 |
| N/A   33C    P0              43W / 300W |      0MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2-16GB           On  | 00000000:0A:00.0 Off |                    0 |
| N/A   33C    P0              42W / 300W |      0MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2-16GB           On  | 00000000:85:00.0 Off |                    0 |
| N/A   31C    P0              43W / 300W |      0MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
Loading dataset bigann1B
dataset in dimension 128, with metric L2, size: Q 10000 B 1000000000 T 100000000
Training set: (500000, 128), validation: (10000, 128)
====================== residual quantizer training
training RQ 8x8, beam_size=1
[14.87 s] training done
train set MSE=26038 validation MSE=26406.3
RQ centroids size (8, 256, 128)
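The RQ stage above trains M=8 codebooks of K=256 centroids in dimension 128, encoded greedily (beam_size=1): at each step the centroid nearest to the current residual is picked and subtracted. A minimal NumPy sketch of that greedy encoding, with small random stand-in data rather than BigANN vectors (the real run uses Faiss):

```python
import numpy as np

def rq_encode(x, codebooks):
    """Greedy residual quantization (beam_size=1).

    x: (n, d) vectors; codebooks: (M, K, d) centroids.
    Returns codes (n, M) and the reconstruction (n, d).
    """
    residual = x.copy()
    recons = np.zeros_like(x)
    codes = np.empty((x.shape[0], len(codebooks)), dtype=np.int64)
    for m, C in enumerate(codebooks):                      # C: (K, d)
        # squared distance of each residual to each centroid
        d2 = ((residual[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        codes[:, m] = d2.argmin(axis=1)                    # nearest centroid
        chosen = C[codes[:, m]]
        recons += chosen
        residual -= chosen
    return codes, recons

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 16)).astype(np.float32)
codebooks = rng.standard_normal((4, 8, 16)).astype(np.float32)
codes, recons = rq_encode(x, codebooks)
mse = ((x - recons) ** 2).sum(axis=1).mean()
```

Decoding is just a table lookup and sum, which is why the stored representation is only M code bytes per vector.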
====================== training
Initializing model from RQ
QINCo(
  (codebook0): Embedding(256, 128)
  (step1): QINCoStep(
    (codebook): Embedding(256, 128)
    (MLPconcat): Linear(in_features=256, out_features=128, bias=True)
    (residual_block0): Sequential(
      (0): Linear(in_features=128, out_features=256, bias=False)
      (1): ReLU()
      (2): Linear(in_features=256, out_features=128, bias=False)
    )
    (residual_block1): Sequential(
      (0): Linear(in_features=128, out_features=256, bias=False)
      (1): ReLU()
      (2): Linear(in_features=256, out_features=128, bias=False)
    )
  )
  (step2): QINCoStep(
    (codebook): Embedding(256, 128)
    (MLPconcat): Linear(in_features=256, out_features=128, bias=True)
    (residual_block0): Sequential(
      (0): Linear(in_features=128, out_features=256, bias=False)
      (1): ReLU()
      (2): Linear(in_features=256, out_features=128, bias=False)
    )
    (residual_block1): Sequential(
      (0): Linear(in_features=128, out_features=256, bias=False)
      (1): ReLU()
      (2): Linear(in_features=256, out_features=128, bias=False)
    )
  )
  (step3): QINCoStep(
    (codebook): Embedding(256, 128)
    (MLPconcat): Linear(in_features=256, out_features=128, bias=True)
    (residual_block0): Sequential(
      (0): Linear(in_features=128, out_features=256, bias=False)
      (1): ReLU()
      (2): Linear(in_features=256, out_features=128, bias=False)
    )
    (residual_block1): Sequential(
      (0): Linear(in_features=128, out_features=256, bias=False)
      (1): ReLU()
      (2): Linear(in_features=256, out_features=128, bias=False)
    )
  )
  (step4): QINCoStep(
    (codebook): Embedding(256, 128)
    (MLPconcat): Linear(in_features=256, out_features=128, bias=True)
    (residual_block0): Sequential(
      (0): Linear(in_features=128, out_features=256, bias=False)
      (1): ReLU()
      (2): Linear(in_features=256, out_features=128, bias=False)
    )
    (residual_block1): Sequential(
      (0): Linear(in_features=128, out_features=256, bias=False)
      (1): ReLU()
      (2): Linear(in_features=256, out_features=128, bias=False)
    )
  )
  (step5): QINCoStep(
    (codebook): Embedding(256, 128)
    (MLPconcat): Linear(in_features=256, out_features=128, bias=True)
    (residual_block0): Sequential(
      (0): Linear(in_features=128, out_features=256, bias=False)
      (1): ReLU()
      (2): Linear(in_features=256, out_features=128, bias=False)
    )
    (residual_block1): Sequential(
      (0): Linear(in_features=128, out_features=256, bias=False)
      (1): ReLU()
      (2): Linear(in_features=256, out_features=128, bias=False)
    )
  )
  (step6): QINCoStep(
    (codebook): Embedding(256, 128)
    (MLPconcat): Linear(in_features=256, out_features=128, bias=True)
    (residual_block0): Sequential(
      (0): Linear(in_features=128, out_features=256, bias=False)
      (1): ReLU()
      (2): Linear(in_features=256, out_features=128, bias=False)
    )
    (residual_block1): Sequential(
      (0): Linear(in_features=128, out_features=256, bias=False)
      (1): ReLU()
      (2): Linear(in_features=256, out_features=128, bias=False)
    )
  )
  (step7): QINCoStep(
    (codebook): Embedding(256, 128)
    (MLPconcat): Linear(in_features=256, out_features=128, bias=True)
    (residual_block0): Sequential(
      (0): Linear(in_features=128, out_features=256, bias=False)
      (1): ReLU()
      (2): Linear(in_features=256, out_features=128, bias=False)
    )
    (residual_block1): Sequential(
      (0): Linear(in_features=128, out_features=256, bias=False)
      (1): ReLU()
      (2): Linear(in_features=256, out_features=128, bias=False)
    )
  )
)
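The module print above repeats one QINCoStep seven times: a per-step codebook, an MLPconcat layer taking the concatenated (codeword, partial reconstruction) pair from 256 down to 128 dimensions, and two bias-free 128-256-128 residual blocks. The exact update rule lives in the QINCo code; the sketch below is only a shape-level illustration in NumPy of how such a step can consume that concatenated pair, with all weights as random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, h = 128, 256, 256   # data dim, codebook size, hidden width (--h 256)

# Random stand-ins mirroring the printed QINCoStep shapes.
codebook = rng.standard_normal((K, d)) * 0.01
W_concat = rng.standard_normal((2 * d, d)) * 0.01   # MLPconcat: 256 -> 128
b_concat = np.zeros(d)
W0a = rng.standard_normal((d, h)) * 0.01            # residual_block0
W0b = rng.standard_normal((h, d)) * 0.01
W1a = rng.standard_normal((d, h)) * 0.01            # residual_block1
W1b = rng.standard_normal((h, d)) * 0.01

def qinco_step(xhat, code):
    """Shape-level sketch of one step: look up the codeword, mix it with
    the partial reconstruction xhat, refine through two residual MLP
    blocks, and return the updated reconstruction (assumed update rule)."""
    c = codebook[code]                               # (n, d)
    z = np.concatenate([c, xhat], axis=1) @ W_concat + b_concat
    z = z + np.maximum(z @ W0a, 0.0) @ W0b           # residual block 0
    z = z + np.maximum(z @ W1a, 0.0) @ W1b           # residual block 1
    return xhat + z

xhat = rng.standard_normal((32, d))
codes = rng.integers(0, K, size=32)
out = qinco_step(xhat, codes)
```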
nb trainable parameters 1409920
Setting scaling factor to 246.0
Running on 4 GPUs
Start train_job rank=0
Setting up distributed data parallel bs=1024
Before optimization: val MSE=568655
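The printed count of 1,409,920 trainable parameters can be reproduced from the module shapes above: codebook0, plus seven QINCoSteps, each holding a codebook, a biased 256-to-128 MLPconcat, and two bias-free residual blocks:

```python
d, K, h = 128, 256, 256   # data dim, codebook size, hidden width (--h 256)

codebook0 = K * d                     # Embedding(256, 128)
per_step = (
    K * d                             # step codebook: Embedding(256, 128)
    + (2 * d) * d + d                 # MLPconcat: Linear(256, 128) with bias
    + 2 * (d * h + h * d)             # two residual blocks, bias-free
)
total = codebook0 + 7 * per_step      # M=8 codes -> codebook0 + 7 steps
print(total)  # -> 1409920
```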
[3.18 s] epoch 0 lr=0.001
End of epoch 0 train loss 0.0474199 val MSE=24048.6
Best validation loss so far, storing models/test_model.pt
[21.16 s] epoch 1 lr=0.001
End of epoch 1 train loss 0.0418424 val MSE=22044.5
Best validation loss so far, storing models/test_model.pt
[38.31 s] epoch 2 lr=0.001
End of epoch 2 train loss 0.0403962 val MSE=20648.4
Best validation loss so far, storing models/test_model.pt
[55.48 s] epoch 3 lr=0.001
End of epoch 3 train loss 0.039486 val MSE=19536.5
Best validation loss so far, storing models/test_model.pt
[72.58 s] epoch 4 lr=0.001
End of epoch 4 train loss 0.0388234 val MSE=18717.2
Best validation loss so far, storing models/test_model.pt
[89.75 s] epoch 5 lr=0.001
End of epoch 5 train loss 0.038346 val MSE=18023.5
Best validation loss so far, storing models/test_model.pt
[106.88 s] epoch 6 lr=0.001
End of epoch 6 train loss 0.0379748 val MSE=17650.8
Best validation loss so far, storing models/test_model.pt
[124.07 s] epoch 7 lr=0.001
End of epoch 7 train loss 0.0376687 val MSE=17181.9
Best validation loss so far, storing models/test_model.pt
[141.17 s] epoch 8 lr=0.001
End of epoch 8 train loss 0.0374314 val MSE=16818.1
Best validation loss so far, storing models/test_model.pt
[158.35 s] epoch 9 lr=0.001
End of epoch 9 train loss 0.0372263 val MSE=16460.2
Best validation loss so far, storing models/test_model.pt
[175.42 s] epoch 10 lr=0.001
End of epoch 10 train loss 0.0370482 val MSE=16194.4
Best validation loss so far, storing models/test_model.pt
[192.62 s] epoch 11 lr=0.001
End of epoch 11 train loss 0.0369036 val MSE=16079
Best validation loss so far, storing models/test_model.pt
[209.69 s] epoch 12 lr=0.001
End of epoch 12 train loss 0.036769 val MSE=15892.3
Best validation loss so far, storing models/test_model.pt
[226.88 s] epoch 13 lr=0.001
End of epoch 13 train loss 0.0366591 val MSE=15725.9
Best validation loss so far, storing models/test_model.pt
[243.97 s] epoch 14 lr=0.001
End of epoch 14 train loss 0.0365515 val MSE=15519.1
Best validation loss so far, storing models/test_model.pt
[261.08 s] epoch 15 lr=0.001
End of epoch 15 train loss 0.0364743 val MSE=15442.4
Best validation loss so far, storing models/test_model.pt
[278.22 s] epoch 16 lr=0.001
End of epoch 16 train loss 0.0363938 val MSE=15313.2
Best validation loss so far, storing models/test_model.pt
[295.33 s] epoch 17 lr=0.001
End of epoch 17 train loss 0.0363131 val MSE=15185.6
Best validation loss so far, storing models/test_model.pt
[312.43 s] epoch 18 lr=0.001
End of epoch 18 train loss 0.0362477 val MSE=15110.6
Best validation loss so far, storing models/test_model.pt
[329.55 s] epoch 19 lr=0.001
End of epoch 19 train loss 0.0361872 val MSE=15137.5
[346.63 s] epoch 20 lr=0.001
End of epoch 20 train loss 0.0361288 val MSE=14985.2
Best validation loss so far, storing models/test_model.pt
[363.72 s] epoch 21 lr=0.001
End of epoch 21 train loss 0.0360852 val MSE=14950.1
Best validation loss so far, storing models/test_model.pt
[380.86 s] epoch 22 lr=0.001
End of epoch 22 train loss 0.036039 val MSE=14874.6
Best validation loss so far, storing models/test_model.pt
[398.00 s] epoch 23 lr=0.001
End of epoch 23 train loss 0.0359996 val MSE=14845.6
Best validation loss so far, storing models/test_model.pt
[415.11 s] epoch 24 lr=0.001
End of epoch 24 train loss 0.0359464 val MSE=14748.7
Best validation loss so far, storing models/test_model.pt
[432.20 s] epoch 25 lr=0.001
End of epoch 25 train loss 0.0359245 val MSE=14725.5
Best validation loss so far, storing models/test_model.pt
[449.32 s] epoch 26 lr=0.001
End of epoch 26 train loss 0.0358768 val MSE=14605.4
Best validation loss so far, storing models/test_model.pt
[466.45 s] epoch 27 lr=0.001
End of epoch 27 train loss 0.0358431 val MSE=14694.2
[483.48 s] epoch 28 lr=0.001
End of epoch 28 train loss 0.0358138 val MSE=14610.1
[500.53 s] epoch 29 lr=0.001
End of epoch 29 train loss 0.0357878 val MSE=14601
Best validation loss so far, storing models/test_model.pt
[517.61 s] epoch 30 lr=0.001
End of epoch 30 train loss 0.0357512 val MSE=14527.5
Best validation loss so far, storing models/test_model.pt
[534.74 s] epoch 31 lr=0.001
End of epoch 31 train loss 0.0357195 val MSE=14466.7
Best validation loss so far, storing models/test_model.pt
[551.82 s] epoch 32 lr=0.001
End of epoch 32 train loss 0.0356953 val MSE=14502.5
[568.88 s] epoch 33 lr=0.001
End of epoch 33 train loss 0.035675 val MSE=14487.2
[585.94 s] epoch 34 lr=0.001
End of epoch 34 train loss 0.035653 val MSE=14484.8
[603.02 s] epoch 35 lr=0.001
End of epoch 35 train loss 0.0356293 val MSE=14475
[620.05 s] epoch 36 lr=0.001
End of epoch 36 train loss 0.0356023 val MSE=14385.8
Best validation loss so far, storing models/test_model.pt
[637.15 s] epoch 37 lr=0.001
End of epoch 37 train loss 0.035583 val MSE=14360.3
Best validation loss so far, storing models/test_model.pt
[654.26 s] epoch 38 lr=0.001
End of epoch 38 train loss 0.0355544 val MSE=14337.7
Best validation loss so far, storing models/test_model.pt
[671.37 s] epoch 39 lr=0.001
End of epoch 39 train loss 0.0355401 val MSE=14360.6
[688.43 s] epoch 40 lr=0.001
End of epoch 40 train loss 0.0355217 val MSE=14288.9
Best validation loss so far, storing models/test_model.pt
[705.53 s] epoch 41 lr=0.001
End of epoch 41 train loss 0.0355137 val MSE=14323.7
[722.58 s] epoch 42 lr=0.001
End of epoch 42 train loss 0.0354883 val MSE=14270.8
Best validation loss so far, storing models/test_model.pt
[739.69 s] epoch 43 lr=0.001
End of epoch 43 train loss 0.0354726 val MSE=14311.4
[756.79 s] epoch 44 lr=0.001
End of epoch 44 train loss 0.0354563 val MSE=14178.6
Best validation loss so far, storing models/test_model.pt
[773.98 s] epoch 45 lr=0.001
End of epoch 45 train loss 0.035442 val MSE=14214.2
[791.10 s] epoch 46 lr=0.001
End of epoch 46 train loss 0.0354202 val MSE=14211
[808.15 s] epoch 47 lr=0.001
End of epoch 47 train loss 0.0354138 val MSE=14179.1
[825.22 s] epoch 48 lr=0.001
End of epoch 48 train loss 0.0353981 val MSE=14136.5
Best validation loss so far, storing models/test_model.pt
[842.29 s] epoch 49 lr=0.001
End of epoch 49 train loss 0.0353856 val MSE=14136.8
[859.41 s] epoch 50 lr=0.001
End of epoch 50 train loss 0.0353724 val MSE=14186.3
[876.48 s] epoch 51 lr=0.001
End of epoch 51 train loss 0.0353531 val MSE=14086.1
Best validation loss so far, storing models/test_model.pt
[893.58 s] epoch 52 lr=0.001
End of epoch 52 train loss 0.0353404 val MSE=14076.8
Best validation loss so far, storing models/test_model.pt
[910.65 s] epoch 53 lr=0.001
End of epoch 53 train loss 0.0353271 val MSE=14112.9
[927.76 s] epoch 54 lr=0.001
End of epoch 54 train loss 0.0353135 val MSE=14077.5
[944.81 s] epoch 55 lr=0.001
End of epoch 55 train loss 0.0353028 val MSE=14129.4
[961.92 s] epoch 56 lr=0.001
End of epoch 56 train loss 0.035289 val MSE=14139.3
[979.33 s] epoch 57 lr=0.001
End of epoch 57 train loss 0.0352794 val MSE=14057.3
Best validation loss so far, storing models/test_model.pt
[996.84 s] epoch 58 lr=0.001
End of epoch 58 train loss 0.035267 val MSE=14141.5
[1013.77 s] epoch 59 lr=0.001
End of epoch 59 train loss 0.0352646 val MSE=14043.6
Best validation loss so far, storing models/test_model.pt
[1030.71 s] epoch 60 lr=0.001
End of epoch 60 train loss 0.0352546 val MSE=14031.1
Best validation loss so far, storing models/test_model.pt
[1047.77 s] epoch 61 lr=0.001
End of epoch 61 train loss 0.0352413 val MSE=14035.5
[1064.96 s] epoch 62 lr=0.001
End of epoch 62 train loss 0.0352205 val MSE=14077.1
[1081.92 s] epoch 63 lr=0.001
End of epoch 63 train loss 0.0352137 val MSE=14115.9
[1098.95 s] epoch 64 lr=0.001
End of epoch 64 train loss 0.0352076 val MSE=14017.5
Best validation loss so far, storing models/test_model.pt
[1116.44 s] epoch 65 lr=0.001
End of epoch 65 train loss 0.0352072 val MSE=13964.2
Best validation loss so far, storing models/test_model.pt
[1133.42 s] epoch 66 lr=0.001
End of epoch 66 train loss 0.0351961 val MSE=14037.7
[1150.38 s] epoch 67 lr=0.001
End of epoch 67 train loss 0.035179 val MSE=13938
Best validation loss so far, storing models/test_model.pt
[1167.38 s] epoch 68 lr=0.001
End of epoch 68 train loss 0.0351727 val MSE=14010.6
[1184.37 s] epoch 69 lr=0.001
End of epoch 69 train loss 0.0351638 val MSE=13925.7
Best validation loss so far, storing models/test_model.pt
[1201.35 s] epoch 70 lr=0.001
End of epoch 70 train loss 0.0351474 val MSE=13972.8
[1218.37 s] epoch 71 lr=0.001
End of epoch 71 train loss 0.0351449 val MSE=13949.5
[1235.34 s] epoch 72 lr=0.001
End of epoch 72 train loss 0.0351377 val MSE=14018
[1252.32 s] epoch 73 lr=0.001
End of epoch 73 train loss 0.0351269 val MSE=13963.5
[1269.25 s] epoch 74 lr=0.001
End of epoch 74 train loss 0.0351198 val MSE=13973.7
[1286.21 s] epoch 75 lr=0.001
End of epoch 75 train loss 0.0351066 val MSE=13872.6
Best validation loss so far, storing models/test_model.pt
[1303.22 s] epoch 76 lr=0.001
End of epoch 76 train loss 0.0351025 val MSE=13919.4
[1320.24 s] epoch 77 lr=0.001
End of epoch 77 train loss 0.0350998 val MSE=13896.3
[1337.40 s] epoch 78 lr=0.001
End of epoch 78 train loss 0.0350904 val MSE=13911.9
[1354.56 s] epoch 79 lr=0.001
End of epoch 79 train loss 0.035081 val MSE=13940.7
[1371.65 s] epoch 80 lr=0.001
End of epoch 80 train loss 0.0350802 val MSE=13912.2
[1389.01 s] epoch 81 lr=0.001
End of epoch 81 train loss 0.03507 val MSE=13861.8
Best validation loss so far, storing models/test_model.pt
[1405.98 s] epoch 82 lr=0.001
End of epoch 82 train loss 0.0350648 val MSE=13946.6
[1423.00 s] epoch 83 lr=0.001
End of epoch 83 train loss 0.0350582 val MSE=13929.8
[1440.05 s] epoch 84 lr=0.001
End of epoch 84 train loss 0.0350461 val MSE=13899.6
[1457.10 s] epoch 85 lr=0.001
End of epoch 85 train loss 0.0350462 val MSE=13883.6
[1474.14 s] epoch 86 lr=0.001
End of epoch 86 train loss 0.0350334 val MSE=13848.7
Best validation loss so far, storing models/test_model.pt
[1491.24 s] epoch 87 lr=0.001
End of epoch 87 train loss 0.0350324 val MSE=13918.4
[1508.28 s] epoch 88 lr=0.001
End of epoch 88 train loss 0.0350158 val MSE=13880.4
[1525.29 s] epoch 89 lr=0.001
End of epoch 89 train loss 0.0350155 val MSE=13932.9
[1542.32 s] epoch 90 lr=0.001
End of epoch 90 train loss 0.0350152 val MSE=13881.5
[1559.41 s] epoch 91 lr=0.001
End of epoch 91 train loss 0.0350087 val MSE=13903.9
[1576.38 s] epoch 92 lr=0.001
End of epoch 92 train loss 0.0350062 val MSE=13822.7
Best validation loss so far, storing models/test_model.pt
[1593.41 s] epoch 93 lr=0.001
End of epoch 93 train loss 0.0350026 val MSE=13866.3
[1610.34 s] epoch 94 lr=0.001
End of epoch 94 train loss 0.034997 val MSE=13850.8
[1627.40 s] epoch 95 lr=0.001
End of epoch 95 train loss 0.03499 val MSE=13786.5
Best validation loss so far, storing models/test_model.pt
[1644.37 s] epoch 96 lr=0.001
End of epoch 96 train loss 0.0349817 val MSE=13861
[1661.36 s] epoch 97 lr=0.001
End of epoch 97 train loss 0.0349857 val MSE=13849.2
[1678.29 s] epoch 98 lr=0.001
End of epoch 98 train loss 0.0349745 val MSE=13825.3
[1695.30 s] epoch 99 lr=0.001
End of epoch 99 train loss 0.0349738 val MSE=13823.2
[1712.24 s] epoch 100 lr=0.001
End of epoch 100 train loss 0.0349657 val MSE=13851.2
[1729.18 s] epoch 101 lr=0.001
End of epoch 101 train loss 0.0349583 val MSE=13892
[1746.11 s] epoch 102 lr=0.001
End of epoch 102 train loss 0.0349473 val MSE=13819.2
[1763.09 s] epoch 103 lr=0.001
End of epoch 103 train loss 0.0349428 val MSE=13819.1
[1780.05 s] epoch 104 lr=0.001
End of epoch 104 train loss 0.0349477 val MSE=13811.5
[1796.98 s] epoch 105 lr=0.001
End of epoch 105 train loss 0.0349389 val MSE=13755.3
Best validation loss so far, storing models/test_model.pt
[1813.96 s] epoch 106 lr=0.001
End of epoch 106 train loss 0.0349356 val MSE=13820.1
[1830.95 s] epoch 107 lr=0.001
End of epoch 107 train loss 0.0349218 val MSE=13838.6
[1847.92 s] epoch 108 lr=0.001
End of epoch 108 train loss 0.0349209 val MSE=13812
[1865.34 s] epoch 109 lr=0.001
End of epoch 109 train loss 0.0349153 val MSE=13732.1
Best validation loss so far, storing models/test_model.pt
[1882.79 s] epoch 110 lr=0.001
End of epoch 110 train loss 0.0349254 val MSE=13757.6
[1900.19 s] epoch 111 lr=0.001
End of epoch 111 train loss 0.034914 val MSE=13773.8
[1917.32 s] epoch 112 lr=0.001
End of epoch 112 train loss 0.0349089 val MSE=13829.4
[1934.26 s] epoch 113 lr=0.001
End of epoch 113 train loss 0.0349052 val MSE=13823.3
[1951.24 s] epoch 114 lr=0.001
End of epoch 114 train loss 0.0348975 val MSE=13791
[1968.27 s] epoch 115 lr=0.001
End of epoch 115 train loss 0.0348917 val MSE=13761.4
[1985.36 s] epoch 116 lr=0.001
End of epoch 116 train loss 0.0348968 val MSE=13797.3
[2002.41 s] epoch 117 lr=0.001
End of epoch 117 train loss 0.0348868 val MSE=13734.2
[2019.48 s] epoch 118 lr=0.001
End of epoch 118 train loss 0.0348782 val MSE=13688
Best validation loss so far, storing models/test_model.pt
[2036.54 s] epoch 119 lr=0.001
End of epoch 119 train loss 0.0348779 val MSE=13720.6
[2053.64 s] epoch 120 lr=0.001
End of epoch 120 train loss 0.0348834 val MSE=13757
[2070.69 s] epoch 121 lr=0.001
End of epoch 121 train loss 0.0348723 val MSE=13733.7
[2087.76 s] epoch 122 lr=0.001
End of epoch 122 train loss 0.0348692 val MSE=13845.6
[2104.81 s] epoch 123 lr=0.001
End of epoch 123 train loss 0.0348614 val MSE=13707.3
[2121.89 s] epoch 124 lr=0.001
End of epoch 124 train loss 0.0348586 val MSE=13777.7
[2138.96 s] epoch 125 lr=0.001
End of epoch 125 train loss 0.0348534 val MSE=13762
[2155.99 s] epoch 126 lr=0.001
End of epoch 126 train loss 0.0348556 val MSE=13764
[2173.03 s] epoch 127 lr=0.001
End of epoch 127 train loss 0.0348563 val MSE=13759.1
[2190.12 s] epoch 128 lr=0.001
End of epoch 128 train loss 0.0348509 val MSE=13698.4
Val loss did not improve for 10 epochs, reduce LR
[2207.17 s] epoch 129 lr=0.0001
End of epoch 129 train loss 0.0339622 val MSE=12991.3
Best validation loss so far, storing models/test_model.pt
[2224.25 s] epoch 130 lr=0.0001
End of epoch 130 train loss 0.0337109 val MSE=12915.7
Best validation loss so far, storing models/test_model.pt
[2241.32 s] epoch 131 lr=0.0001
End of epoch 131 train loss 0.033613 val MSE=12893
Best validation loss so far, storing models/test_model.pt
[2258.43 s] epoch 132 lr=0.0001
End of epoch 132 train loss 0.0335497 val MSE=12912.7
[2275.48 s] epoch 133 lr=0.0001
End of epoch 133 train loss 0.0335045 val MSE=12925.9
[2292.52 s] epoch 134 lr=0.0001
End of epoch 134 train loss 0.0334663 val MSE=12902.4
[2309.56 s] epoch 135 lr=0.0001
End of epoch 135 train loss 0.0334346 val MSE=12931.5
[2326.65 s] epoch 136 lr=0.0001
End of epoch 136 train loss 0.0334061 val MSE=12904.1
[2343.70 s] epoch 137 lr=0.0001
End of epoch 137 train loss 0.0333828 val MSE=12890.6
Best validation loss so far, storing models/test_model.pt
[2360.77 s] epoch 138 lr=0.0001
End of epoch 138 train loss 0.0333611 val MSE=12885.1
Best validation loss so far, storing models/test_model.pt
[2377.86 s] epoch 139 lr=0.0001
End of epoch 139 train loss 0.0333408 val MSE=12878.6
Best validation loss so far, storing models/test_model.pt
[2394.97 s] epoch 140 lr=0.0001
End of epoch 140 train loss 0.0333246 val MSE=12870.6
Best validation loss so far, storing models/test_model.pt
[2412.07 s] epoch 141 lr=0.0001
End of epoch 141 train loss 0.0333066 val MSE=12877.2
[2429.12 s] epoch 142 lr=0.0001
End of epoch 142 train loss 0.0332918 val MSE=12857.1
Best validation loss so far, storing models/test_model.pt
[2446.22 s] epoch 143 lr=0.0001
End of epoch 143 train loss 0.0332767 val MSE=12872.6
[2463.31 s] epoch 144 lr=0.0001
End of epoch 144 train loss 0.0332651 val MSE=12884.6
[2480.35 s] epoch 145 lr=0.0001
End of epoch 145 train loss 0.0332513 val MSE=12864.2
[2497.42 s] epoch 146 lr=0.0001
End of epoch 146 train loss 0.0332435 val MSE=12886.3
[2514.57 s] epoch 147 lr=0.0001
End of epoch 147 train loss 0.0332279 val MSE=12898.8
[2531.67 s] epoch 148 lr=0.0001
End of epoch 148 train loss 0.0332206 val MSE=12892.7
[2548.68 s] epoch 149 lr=0.0001
End of epoch 149 train loss 0.0332094 val MSE=12880.3
[2565.73 s] epoch 150 lr=0.0001
End of epoch 150 train loss 0.0332005 val MSE=12871.1
[2582.76 s] epoch 151 lr=0.0001
End of epoch 151 train loss 0.0331914 val MSE=12861
[2599.81 s] epoch 152 lr=0.0001
End of epoch 152 train loss 0.0331836 val MSE=12865.8
Val loss did not improve for 10 epochs, reduce LR
[2616.82 s] epoch 153 lr=1e-05
End of epoch 153 train loss 0.0330098 val MSE=12807.1
Best validation loss so far, storing models/test_model.pt
[2633.89 s] epoch 154 lr=1e-05
End of epoch 154 train loss 0.0329883 val MSE=12809.3
[2650.98 s] epoch 155 lr=1e-05
End of epoch 155 train loss 0.0329796 val MSE=12800.4
Best validation loss so far, storing models/test_model.pt
[2668.06 s] epoch 156 lr=1e-05
End of epoch 156 train loss 0.0329735 val MSE=12811.2
[2685.12 s] epoch 157 lr=1e-05
End of epoch 157 train loss 0.0329685 val MSE=12806.8
[2702.14 s] epoch 158 lr=1e-05
End of epoch 158 train loss 0.0329637 val MSE=12809.2
[2719.89 s] epoch 159 lr=1e-05
End of epoch 159 train loss 0.0329608 val MSE=12807.8
[2736.97 s] epoch 160 lr=1e-05
End of epoch 160 train loss 0.0329578 val MSE=12806.5
[2754.06 s] epoch 161 lr=1e-05
End of epoch 161 train loss 0.0329552 val MSE=12809.8
[2771.14 s] epoch 162 lr=1e-05
End of epoch 162 train loss 0.0329529 val MSE=12804.8
[2788.27 s] epoch 163 lr=1e-05
End of epoch 163 train loss 0.0329498 val MSE=12810.1
[2805.36 s] epoch 164 lr=1e-05
End of epoch 164 train loss 0.0329475 val MSE=12823.1
[2822.45 s] epoch 165 lr=1e-05
End of epoch 165 train loss 0.0329453 val MSE=12811.2
Val loss did not improve for 10 epochs, reduce LR
[2839.50 s] epoch 166 lr=1e-06
End of epoch 166 train loss 0.0329214 val MSE=12814.9
[2856.64 s] epoch 167 lr=1e-06
End of epoch 167 train loss 0.0329197 val MSE=12811.3
[2873.73 s] epoch 168 lr=1e-06
End of epoch 168 train loss 0.0329192 val MSE=12812.5
[2890.79 s] epoch 169 lr=1e-06
End of epoch 169 train loss 0.0329189 val MSE=12811.2
[2907.87 s] epoch 170 lr=1e-06
End of epoch 170 train loss 0.0329187 val MSE=12814.8
[2924.98 s] epoch 171 lr=1e-06
End of epoch 171 train loss 0.0329184 val MSE=12816.7
[2942.07 s] epoch 172 lr=1e-06
End of epoch 172 train loss 0.0329184 val MSE=12812.9
[2959.57 s] epoch 173 lr=1e-06
End of epoch 173 train loss 0.0329184 val MSE=12812.9
[2977.04 s] epoch 174 lr=1e-06
End of epoch 174 train loss 0.0329173 val MSE=12814.7
[2994.30 s] epoch 175 lr=1e-06
End of epoch 175 train loss 0.0329176 val MSE=12814.6
[3011.38 s] epoch 176 lr=1e-06
End of epoch 176 train loss 0.0329165 val MSE=12812.6
Val loss did not improve for 10 epochs, reduce LR
[3028.87 s] epoch 177 lr=1e-07
End of epoch 177 train loss 0.0329145 val MSE=12815.6
[3046.31 s] epoch 178 lr=1e-07
End of epoch 178 train loss 0.032914 val MSE=12815.6
[3063.71 s] epoch 179 lr=1e-07
End of epoch 179 train loss 0.0329146 val MSE=12815.3
[3081.04 s] epoch 180 lr=1e-07
End of epoch 180 train loss 0.0329149 val MSE=12814.9
[3098.53 s] epoch 181 lr=1e-07
End of epoch 181 train loss 0.0329145 val MSE=12814.2
[3115.63 s] epoch 182 lr=1e-07
End of epoch 182 train loss 0.032914 val MSE=12814.1
[3132.66 s] epoch 183 lr=1e-07
End of epoch 183 train loss 0.0329136 val MSE=12814.5
[3149.66 s] epoch 184 lr=1e-07
End of epoch 184 train loss 0.0329143 val MSE=12815.4
[3166.70 s] epoch 185 lr=1e-07
End of epoch 185 train loss 0.0329141 val MSE=12813.4
[3183.75 s] epoch 186 lr=1e-07
End of epoch 186 train loss 0.032914 val MSE=12814.7
[3200.78 s] epoch 187 lr=1e-07
End of epoch 187 train loss 0.0329136 val MSE=12813.9
Val loss did not improve for 10 epochs, reduce LR
LR too small, stopping
Stop train_job rank=0
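The learning-rate trajectory visible in the log (0.001 down to 1e-07, one division by 10 each time the validation loss stalls for 10 epochs, then "LR too small, stopping") follows a standard reduce-on-plateau policy. A hedged sketch of that policy; the patience of 10, the factor of 10, and the 1e-07 floor are read off the log, while the function itself is illustrative rather than the train_qinco.py implementation:

```python
def plateau_schedule(val_losses, lr0=1e-3, patience=10, factor=0.1, min_lr=1e-7):
    """Replay the LR policy seen in the log: divide the LR by 10 after
    `patience` epochs without a new best validation loss, and stop once
    the reduced LR falls clearly below `min_lr` (the 0.5 factor guards
    against float rounding in the repeated multiplications)."""
    lr, best, stale, lrs = lr0, float("inf"), 0, []
    for v in val_losses:
        lrs.append(lr)
        if v < best:
            best, stale = v, 0
        else:
            stale += 1
            if stale >= patience:
                lr *= factor
                stale = 0
                if lr < 0.5 * min_lr:
                    break          # "LR too small, stopping"
    return lrs

# A flat validation curve walks the LR through 1e-3 ... 1e-7, then stops.
lrs = plateau_schedule([1.0] * 60)
```

This mirrors why the run ends at epoch 187: each plateau burns 10 epochs of patience, and the fifth reduction would take the LR below the floor.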