Created December 30, 2016 19:38
seq2seq with layer normalization enabled
Loading data from 'data/small-train.t7'... | |
* vocabulary size: source = 50004; target = 50004 | |
* additional features: source = 0; target = 0 | |
* maximum sequence length: source = 50; target = 51 | |
* number of training sentences: 100000 | |
* maximum batch size: 64 | |
Building model... | |
* using input feeding | |
Initializing parameters... | |
* number of parameters: 84818004 | |
Preparing memory optimization... | |
* sharing 76% of output/gradInput tensors memory between clones | |
Start training... | |
Epoch 1 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 739 ; Perplexity 262030.13 | |
Epoch 1 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1041 ; Perplexity 55043.06 | |
Epoch 1 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1215 ; Perplexity 25610.45 | |
Epoch 1 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1290 ; Perplexity 14892.75 | |
Epoch 1 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1351 ; Perplexity 9645.93 | |
Epoch 1 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1392 ; Perplexity 7104.51 | |
Epoch 1 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1416 ; Perplexity 5650.19 | |
Epoch 1 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1454 ; Perplexity 4640.61 | |
Epoch 1 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1485 ; Perplexity 3900.92 | |
Epoch 1 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1504 ; Perplexity 3392.54 | |
Epoch 1 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1519 ; Perplexity 3023.87 | |
Epoch 1 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1533 ; Perplexity 2739.26 | |
Epoch 1 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1552 ; Perplexity 2507.95 | |
Epoch 1 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1567 ; Perplexity 2316.60 | |
Epoch 1 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1572 ; Perplexity 2161.89 | |
Epoch 1 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1582 ; Perplexity 2023.06 | |
Epoch 1 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1589 ; Perplexity 1905.28 | |
Epoch 1 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1593 ; Perplexity 1801.04 | |
Epoch 1 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1598 ; Perplexity 1705.17 | |
Epoch 1 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1607 ; Perplexity 1618.89 | |
Epoch 1 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1611 ; Perplexity 1545.47 | |
Epoch 1 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1614 ; Perplexity 1481.86 | |
Epoch 1 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1617 ; Perplexity 1422.67 | |
Epoch 1 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1619 ; Perplexity 1368.80 | |
Epoch 1 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1627 ; Perplexity 1314.19 | |
Epoch 1 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1626 ; Perplexity 1271.36 | |
Epoch 1 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1629 ; Perplexity 1228.62 | |
Epoch 1 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1626 ; Perplexity 1191.57 | |
Epoch 1 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1634 ; Perplexity 1152.65 | |
Epoch 1 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1637 ; Perplexity 1118.02 | |
Epoch 1 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1638 ; Perplexity 1086.31 | |
Validation perplexity: 369.20971129663 | |
Saving checkpoint to 'models/layer2_epoch1_369.21.t7'... | |
Epoch 2 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1722 ; Perplexity 429.12 | |
Epoch 2 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1670 ; Perplexity 427.39 | |
Epoch 2 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1676 ; Perplexity 427.64 | |
Epoch 2 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1682 ; Perplexity 424.14 | |
Epoch 2 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 422.36 | |
Epoch 2 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 416.05 | |
Epoch 2 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 413.06 | |
Epoch 2 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1699 ; Perplexity 408.36 | |
Epoch 2 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 405.32 | |
Epoch 2 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 401.17 | |
Epoch 2 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1716 ; Perplexity 399.63 | |
Epoch 2 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 396.20 | |
Epoch 2 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 394.26 | |
Epoch 2 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 391.51 | |
Epoch 2 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 388.30 | |
Epoch 2 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 385.28 | |
Epoch 2 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 382.33 | |
Epoch 2 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 380.26 | |
Epoch 2 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 377.71 | |
Epoch 2 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 375.50 | |
Epoch 2 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 373.46 | |
Epoch 2 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 371.79 | |
Epoch 2 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 369.53 | |
Epoch 2 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 367.05 | |
Epoch 2 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 365.41 | |
Epoch 2 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 363.58 | |
Epoch 2 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 361.58 | |
Epoch 2 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 360.03 | |
Epoch 2 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 357.75 | |
Epoch 2 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 355.94 | |
Epoch 2 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 354.67 | |
Validation perplexity: 256.58764506613 | |
Saving checkpoint to 'models/layer2_epoch2_256.59.t7'... | |
Epoch 3 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1829 ; Perplexity 303.51 | |
Epoch 3 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1732 ; Perplexity 296.83 | |
Epoch 3 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1693 ; Perplexity 294.55 | |
Epoch 3 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 294.69 | |
Epoch 3 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1713 ; Perplexity 294.09 | |
Epoch 3 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1711 ; Perplexity 293.10 | |
Epoch 3 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1726 ; Perplexity 293.54 | |
Epoch 3 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1725 ; Perplexity 293.74 | |
Epoch 3 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1725 ; Perplexity 292.27 | |
Epoch 3 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1713 ; Perplexity 290.13 | |
Epoch 3 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1717 ; Perplexity 288.81 | |
Epoch 3 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1715 ; Perplexity 288.26 | |
Epoch 3 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1716 ; Perplexity 287.32 | |
Epoch 3 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1714 ; Perplexity 286.14 | |
Epoch 3 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1717 ; Perplexity 285.04 | |
Epoch 3 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1718 ; Perplexity 284.09 | |
Epoch 3 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1716 ; Perplexity 282.92 | |
Epoch 3 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1713 ; Perplexity 281.71 | |
Epoch 3 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 280.73 | |
Epoch 3 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 279.32 | |
Epoch 3 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 277.81 | |
Epoch 3 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1701 ; Perplexity 276.67 | |
Epoch 3 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1701 ; Perplexity 275.63 | |
Epoch 3 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 274.77 | |
Epoch 3 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 273.34 | |
Epoch 3 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 272.09 | |
Epoch 3 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 271.47 | |
Epoch 3 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 270.67 | |
Epoch 3 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 269.97 | |
Epoch 3 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 269.18 | |
Epoch 3 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 268.43 | |
Validation perplexity: 211.26914097121 | |
Saving checkpoint to 'models/layer2_epoch3_211.27.t7'... | |
Epoch 4 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1721 ; Perplexity 242.01 | |
Epoch 4 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1701 ; Perplexity 241.64 | |
Epoch 4 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1678 ; Perplexity 240.22 | |
Epoch 4 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 239.09 | |
Epoch 4 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1722 ; Perplexity 239.13 | |
Epoch 4 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1719 ; Perplexity 239.40 | |
Epoch 4 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1715 ; Perplexity 239.28 | |
Epoch 4 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 242.00 | |
Epoch 4 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1717 ; Perplexity 244.38 | |
Epoch 4 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1711 ; Perplexity 245.71 | |
Epoch 4 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 246.82 | |
Epoch 4 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1712 ; Perplexity 251.09 | |
Epoch 4 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 252.86 | |
Epoch 4 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 252.61 | |
Epoch 4 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 251.86 | |
Epoch 4 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 251.06 | |
Epoch 4 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 251.26 | |
Epoch 4 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 250.95 | |
Epoch 4 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 250.35 | |
Epoch 4 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 250.24 | |
Epoch 4 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 251.09 | |
Epoch 4 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1713 ; Perplexity 252.06 | |
Epoch 4 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 252.18 | |
Epoch 4 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 252.95 | |
Epoch 4 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1713 ; Perplexity 253.35 | |
Epoch 4 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1711 ; Perplexity 253.39 | |
Epoch 4 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 253.01 | |
Epoch 4 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 252.77 | |
Epoch 4 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1712 ; Perplexity 252.40 | |
Epoch 4 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 251.76 | |
Epoch 4 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 251.39 | |
Validation perplexity: 194.63330533388 | |
Saving checkpoint to 'models/layer2_epoch4_194.63.t7'... | |
Epoch 5 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1689 ; Perplexity 234.53 | |
Epoch 5 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1656 ; Perplexity 230.86 | |
Epoch 5 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1679 ; Perplexity 227.81 | |
Epoch 5 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1697 ; Perplexity 228.31 | |
Epoch 5 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 228.68 | |
Epoch 5 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 227.03 | |
Epoch 5 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1715 ; Perplexity 226.53 | |
Epoch 5 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1712 ; Perplexity 224.88 | |
Epoch 5 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 224.52 | |
Epoch 5 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1700 ; Perplexity 223.65 | |
Epoch 5 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1700 ; Perplexity 223.45 | |
Epoch 5 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 222.96 | |
Epoch 5 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 222.45 | |
Epoch 5 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 221.56 | |
Epoch 5 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 220.92 | |
Epoch 5 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 220.29 | |
Epoch 5 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1699 ; Perplexity 219.87 | |
Epoch 5 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1698 ; Perplexity 219.42 | |
Epoch 5 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 219.52 | |
Epoch 5 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 219.21 | |
Epoch 5 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 219.33 | |
Epoch 5 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 219.21 | |
Epoch 5 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1701 ; Perplexity 219.05 | |
Epoch 5 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 218.63 | |
Epoch 5 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 218.62 | |
Epoch 5 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 218.34 | |
Epoch 5 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 218.08 | |
Epoch 5 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 217.91 | |
Epoch 5 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 218.22 | |
Epoch 5 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 217.86 | |
Epoch 5 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 217.52 | |
Validation perplexity: 170.92977721733 | |
Saving checkpoint to 'models/layer2_epoch5_170.93.t7'... | |
Epoch 6 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1644 ; Perplexity 192.60 | |
Epoch 6 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 201.10 | |
Epoch 6 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1714 ; Perplexity 203.90 | |
Epoch 6 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 202.93 | |
Epoch 6 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1717 ; Perplexity 202.45 | |
Epoch 6 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1715 ; Perplexity 203.62 | |
Epoch 6 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1718 ; Perplexity 202.28 | |
Epoch 6 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1720 ; Perplexity 201.32 | |
Epoch 6 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1712 ; Perplexity 199.89 | |
Epoch 6 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 200.01 | |
Epoch 6 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1716 ; Perplexity 200.41 | |
Epoch 6 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1717 ; Perplexity 200.79 | |
Epoch 6 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1719 ; Perplexity 200.81 | |
Epoch 6 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1728 ; Perplexity 201.30 | |
Epoch 6 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1727 ; Perplexity 201.53 | |
Epoch 6 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1726 ; Perplexity 202.11 | |
Epoch 6 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1725 ; Perplexity 202.34 | |
Epoch 6 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1727 ; Perplexity 203.33 | |
Epoch 6 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1727 ; Perplexity 203.70 | |
Epoch 6 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1723 ; Perplexity 203.42 | |
Epoch 6 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1719 ; Perplexity 203.16 | |
Epoch 6 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1718 ; Perplexity 202.99 | |
Epoch 6 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1716 ; Perplexity 202.68 | |
Epoch 6 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1713 ; Perplexity 202.31 | |
Epoch 6 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 201.99 | |
Epoch 6 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 201.51 | |
Epoch 6 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 201.52 | |
Epoch 6 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 201.39 | |
Epoch 6 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 201.18 | |
Epoch 6 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 201.04 | |
Epoch 6 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 200.95 | |
Validation perplexity: 161.07151983724 | |
Saving checkpoint to 'models/layer2_epoch6_161.07.t7'... | |
Epoch 7 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1643 ; Perplexity 185.46 | |
Epoch 7 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1629 ; Perplexity 182.62 | |
Epoch 7 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1663 ; Perplexity 185.53 | |
Epoch 7 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1679 ; Perplexity 186.52 | |
Epoch 7 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1676 ; Perplexity 185.86 | |
Epoch 7 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1688 ; Perplexity 188.68 | |
Epoch 7 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 189.29 | |
Epoch 7 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 188.79 | |
Epoch 7 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1715 ; Perplexity 189.59 | |
Epoch 7 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1713 ; Perplexity 189.04 | |
Epoch 7 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 189.09 | |
Epoch 7 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 188.59 | |
Epoch 7 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1712 ; Perplexity 188.85 | |
Epoch 7 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1714 ; Perplexity 188.72 | |
Epoch 7 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1714 ; Perplexity 188.56 | |
Epoch 7 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1717 ; Perplexity 188.83 | |
Epoch 7 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1713 ; Perplexity 188.66 | |
Epoch 7 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1715 ; Perplexity 188.35 | |
Epoch 7 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1712 ; Perplexity 187.94 | |
Epoch 7 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 187.07 | |
Epoch 7 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 186.45 | |
Epoch 7 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 186.51 | |
Epoch 7 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 186.53 | |
Epoch 7 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 186.07 | |
Epoch 7 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 185.90 | |
Epoch 7 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 185.62 | |
Epoch 7 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 185.27 | |
Epoch 7 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 185.07 | |
Epoch 7 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 185.14 | |
Epoch 7 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 185.00 | |
Epoch 7 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 184.80 | |
Validation perplexity: 150.74778202042 | |
Saving checkpoint to 'models/layer2_epoch7_150.75.t7'... | |
Epoch 8 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1565 ; Perplexity 173.57 | |
Epoch 8 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1639 ; Perplexity 172.95 | |
Epoch 8 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1666 ; Perplexity 173.36 | |
Epoch 8 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1654 ; Perplexity 173.30 | |
Epoch 8 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1662 ; Perplexity 174.01 | |
Epoch 8 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1659 ; Perplexity 174.91 | |
Epoch 8 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1676 ; Perplexity 174.82 | |
Epoch 8 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1681 ; Perplexity 175.32 | |
Epoch 8 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1689 ; Perplexity 176.03 | |
Epoch 8 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1685 ; Perplexity 175.77 | |
Epoch 8 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1696 ; Perplexity 176.51 | |
Epoch 8 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1693 ; Perplexity 176.66 | |
Epoch 8 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1690 ; Perplexity 176.70 | |
Epoch 8 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1690 ; Perplexity 176.52 | |
Epoch 8 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1695 ; Perplexity 176.89 | |
Epoch 8 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1693 ; Perplexity 177.36 | |
Epoch 8 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1694 ; Perplexity 177.94 | |
Epoch 8 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1701 ; Perplexity 178.35 | |
Epoch 8 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1697 ; Perplexity 178.17 | |
Epoch 8 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1693 ; Perplexity 177.98 | |
Epoch 8 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1694 ; Perplexity 178.24 | |
Epoch 8 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1698 ; Perplexity 178.53 | |
Epoch 8 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 179.08 | |
Epoch 8 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 178.99 | |
Epoch 8 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 178.98 | |
Epoch 8 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 179.16 | |
Epoch 8 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 179.52 | |
Epoch 8 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 179.30 | |
Epoch 8 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 179.31 | |
Epoch 8 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 178.93 | |
Epoch 8 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 178.77 | |
Validation perplexity: 144.36360414428 | |
Saving checkpoint to 'models/layer2_epoch8_144.36.t7'... | |
Epoch 9 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1621 ; Perplexity 157.25 | |
Epoch 9 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1664 ; Perplexity 161.18 | |
Epoch 9 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1696 ; Perplexity 166.22 | |
Epoch 9 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1691 ; Perplexity 167.00 | |
Epoch 9 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1695 ; Perplexity 169.06 | |
Epoch 9 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1694 ; Perplexity 169.24 | |
Epoch 9 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1683 ; Perplexity 169.18 | |
Epoch 9 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1688 ; Perplexity 169.81 | |
Epoch 9 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1687 ; Perplexity 170.27 | |
Epoch 9 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1688 ; Perplexity 169.48 | |
Epoch 9 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1697 ; Perplexity 169.71 | |
Epoch 9 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1696 ; Perplexity 169.23 | |
Epoch 9 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1692 ; Perplexity 168.96 | |
Epoch 9 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1698 ; Perplexity 168.93 | |
Epoch 9 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1699 ; Perplexity 168.71 | |
Epoch 9 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1697 ; Perplexity 168.94 | |
Epoch 9 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1696 ; Perplexity 168.62 | |
Epoch 9 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 168.58 | |
Epoch 9 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 168.74 | |
Epoch 9 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 168.25 | |
Epoch 9 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 168.11 | |
Epoch 9 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 167.96 | |
Epoch 9 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 167.36 | |
Epoch 9 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 167.23 | |
Epoch 9 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 167.12 | |
Epoch 9 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 167.14 | |
Epoch 9 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 166.62 | |
Epoch 9 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 166.45 | |
Epoch 9 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 166.47 | |
Epoch 9 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 166.54 | |
Epoch 9 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 166.47 | |
Validation perplexity: 136.15701878949 | |
Saving checkpoint to 'models/layer2_epoch9_136.16.t7'... | |
Epoch 10 ; Iteration 50/1587 ; Learning rate 0.5000 ; Source tokens/s 1657 ; Perplexity 134.50 | |
Epoch 10 ; Iteration 100/1587 ; Learning rate 0.5000 ; Source tokens/s 1689 ; Perplexity 138.81 | |
Epoch 10 ; Iteration 150/1587 ; Learning rate 0.5000 ; Source tokens/s 1709 ; Perplexity 139.99 | |
Epoch 10 ; Iteration 200/1587 ; Learning rate 0.5000 ; Source tokens/s 1704 ; Perplexity 140.88 | |
Epoch 10 ; Iteration 250/1587 ; Learning rate 0.5000 ; Source tokens/s 1706 ; Perplexity 141.07 | |
Epoch 10 ; Iteration 300/1587 ; Learning rate 0.5000 ; Source tokens/s 1695 ; Perplexity 140.48 | |
Epoch 10 ; Iteration 350/1587 ; Learning rate 0.5000 ; Source tokens/s 1708 ; Perplexity 141.52 | |
Epoch 10 ; Iteration 400/1587 ; Learning rate 0.5000 ; Source tokens/s 1723 ; Perplexity 142.02 | |
Epoch 10 ; Iteration 450/1587 ; Learning rate 0.5000 ; Source tokens/s 1712 ; Perplexity 141.03 | |
Epoch 10 ; Iteration 500/1587 ; Learning rate 0.5000 ; Source tokens/s 1709 ; Perplexity 141.20 | |
Epoch 10 ; Iteration 550/1587 ; Learning rate 0.5000 ; Source tokens/s 1708 ; Perplexity 140.81 | |
Epoch 10 ; Iteration 600/1587 ; Learning rate 0.5000 ; Source tokens/s 1706 ; Perplexity 141.01 | |
Epoch 10 ; Iteration 650/1587 ; Learning rate 0.5000 ; Source tokens/s 1705 ; Perplexity 141.41 | |
Epoch 10 ; Iteration 700/1587 ; Learning rate 0.5000 ; Source tokens/s 1702 ; Perplexity 142.02 | |
Epoch 10 ; Iteration 750/1587 ; Learning rate 0.5000 ; Source tokens/s 1698 ; Perplexity 141.56 | |
Epoch 10 ; Iteration 800/1587 ; Learning rate 0.5000 ; Source tokens/s 1696 ; Perplexity 141.36 | |
Epoch 10 ; Iteration 850/1587 ; Learning rate 0.5000 ; Source tokens/s 1694 ; Perplexity 141.15 | |
Epoch 10 ; Iteration 900/1587 ; Learning rate 0.5000 ; Source tokens/s 1694 ; Perplexity 140.80 | |
Epoch 10 ; Iteration 950/1587 ; Learning rate 0.5000 ; Source tokens/s 1693 ; Perplexity 140.83 | |
Epoch 10 ; Iteration 1000/1587 ; Learning rate 0.5000 ; Source tokens/s 1692 ; Perplexity 140.77 | |
Epoch 10 ; Iteration 1050/1587 ; Learning rate 0.5000 ; Source tokens/s 1694 ; Perplexity 140.93 | |
Epoch 10 ; Iteration 1100/1587 ; Learning rate 0.5000 ; Source tokens/s 1699 ; Perplexity 141.35 | |
Epoch 10 ; Iteration 1150/1587 ; Learning rate 0.5000 ; Source tokens/s 1703 ; Perplexity 141.65 | |
Epoch 10 ; Iteration 1200/1587 ; Learning rate 0.5000 ; Source tokens/s 1704 ; Perplexity 141.26 | |
Epoch 10 ; Iteration 1250/1587 ; Learning rate 0.5000 ; Source tokens/s 1700 ; Perplexity 141.07 | |
Epoch 10 ; Iteration 1300/1587 ; Learning rate 0.5000 ; Source tokens/s 1702 ; Perplexity 140.88 | |
Epoch 10 ; Iteration 1350/1587 ; Learning rate 0.5000 ; Source tokens/s 1700 ; Perplexity 140.74 | |
Epoch 10 ; Iteration 1400/1587 ; Learning rate 0.5000 ; Source tokens/s 1700 ; Perplexity 140.74 | |
Epoch 10 ; Iteration 1450/1587 ; Learning rate 0.5000 ; Source tokens/s 1701 ; Perplexity 140.59 | |
Epoch 10 ; Iteration 1500/1587 ; Learning rate 0.5000 ; Source tokens/s 1704 ; Perplexity 140.57 | |
Epoch 10 ; Iteration 1550/1587 ; Learning rate 0.5000 ; Source tokens/s 1704 ; Perplexity 140.64 | |
Validation perplexity: 123.56979918086 | |
Saving checkpoint to 'models/layer2_epoch10_123.57.t7'... | |
Epoch 11 ; Iteration 50/1587 ; Learning rate 0.2500 ; Source tokens/s 1705 ; Perplexity 140.05 | |
Epoch 11 ; Iteration 100/1587 ; Learning rate 0.2500 ; Source tokens/s 1756 ; Perplexity 137.90 | |
Epoch 11 ; Iteration 150/1587 ; Learning rate 0.2500 ; Source tokens/s 1743 ; Perplexity 136.79 | |
Epoch 11 ; Iteration 200/1587 ; Learning rate 0.2500 ; Source tokens/s 1743 ; Perplexity 134.28 | |
Epoch 11 ; Iteration 250/1587 ; Learning rate 0.2500 ; Source tokens/s 1755 ; Perplexity 136.32 | |
Epoch 11 ; Iteration 300/1587 ; Learning rate 0.2500 ; Source tokens/s 1745 ; Perplexity 135.24 | |
Epoch 11 ; Iteration 350/1587 ; Learning rate 0.2500 ; Source tokens/s 1735 ; Perplexity 134.41 | |
Epoch 11 ; Iteration 400/1587 ; Learning rate 0.2500 ; Source tokens/s 1730 ; Perplexity 134.01 | |
Epoch 11 ; Iteration 450/1587 ; Learning rate 0.2500 ; Source tokens/s 1717 ; Perplexity 133.43 | |
Epoch 11 ; Iteration 500/1587 ; Learning rate 0.2500 ; Source tokens/s 1713 ; Perplexity 134.15 | |
Epoch 11 ; Iteration 550/1587 ; Learning rate 0.2500 ; Source tokens/s 1708 ; Perplexity 133.58 | |
Epoch 11 ; Iteration 600/1587 ; Learning rate 0.2500 ; Source tokens/s 1704 ; Perplexity 132.87 | |
Epoch 11 ; Iteration 650/1587 ; Learning rate 0.2500 ; Source tokens/s 1705 ; Perplexity 133.19 | |
Epoch 11 ; Iteration 700/1587 ; Learning rate 0.2500 ; Source tokens/s 1708 ; Perplexity 133.45 | |
Epoch 11 ; Iteration 750/1587 ; Learning rate 0.2500 ; Source tokens/s 1704 ; Perplexity 133.10 | |
Epoch 11 ; Iteration 800/1587 ; Learning rate 0.2500 ; Source tokens/s 1704 ; Perplexity 132.91 | |
Epoch 11 ; Iteration 850/1587 ; Learning rate 0.2500 ; Source tokens/s 1705 ; Perplexity 132.84 | |
Epoch 11 ; Iteration 900/1587 ; Learning rate 0.2500 ; Source tokens/s 1702 ; Perplexity 132.70 | |
Epoch 11 ; Iteration 950/1587 ; Learning rate 0.2500 ; Source tokens/s 1704 ; Perplexity 132.92 | |
Epoch 11 ; Iteration 1000/1587 ; Learning rate 0.2500 ; Source tokens/s 1702 ; Perplexity 132.62 | |
Epoch 11 ; Iteration 1050/1587 ; Learning rate 0.2500 ; Source tokens/s 1699 ; Perplexity 132.45 | |
Epoch 11 ; Iteration 1100/1587 ; Learning rate 0.2500 ; Source tokens/s 1700 ; Perplexity 132.50 | |
Epoch 11 ; Iteration 1150/1587 ; Learning rate 0.2500 ; Source tokens/s 1702 ; Perplexity 132.43 | |
Epoch 11 ; Iteration 1200/1587 ; Learning rate 0.2500 ; Source tokens/s 1704 ; Perplexity 132.41 | |
Epoch 11 ; Iteration 1250/1587 ; Learning rate 0.2500 ; Source tokens/s 1706 ; Perplexity 132.32 | |
Epoch 11 ; Iteration 1300/1587 ; Learning rate 0.2500 ; Source tokens/s 1706 ; Perplexity 132.51 | |
Epoch 11 ; Iteration 1350/1587 ; Learning rate 0.2500 ; Source tokens/s 1705 ; Perplexity 132.44 | |
Epoch 11 ; Iteration 1400/1587 ; Learning rate 0.2500 ; Source tokens/s 1706 ; Perplexity 132.35 | |
Epoch 11 ; Iteration 1450/1587 ; Learning rate 0.2500 ; Source tokens/s 1704 ; Perplexity 132.03 | |
Epoch 11 ; Iteration 1500/1587 ; Learning rate 0.2500 ; Source tokens/s 1704 ; Perplexity 131.91 | |
Epoch 11 ; Iteration 1550/1587 ; Learning rate 0.2500 ; Source tokens/s 1703 ; Perplexity 131.72 | |
Validation perplexity: 122.53690966475 | |
Saving checkpoint to 'models/layer2_epoch11_122.54.t7'... | |
Epoch 12 ; Iteration 50/1587 ; Learning rate 0.1250 ; Source tokens/s 1696 ; Perplexity 126.63 | |
Epoch 12 ; Iteration 100/1587 ; Learning rate 0.1250 ; Source tokens/s 1733 ; Perplexity 132.78 | |
Epoch 12 ; Iteration 150/1587 ; Learning rate 0.1250 ; Source tokens/s 1719 ; Perplexity 130.82 | |
Epoch 12 ; Iteration 200/1587 ; Learning rate 0.1250 ; Source tokens/s 1706 ; Perplexity 130.99 | |
Epoch 12 ; Iteration 250/1587 ; Learning rate 0.1250 ; Source tokens/s 1705 ; Perplexity 131.75 | |
Epoch 12 ; Iteration 300/1587 ; Learning rate 0.1250 ; Source tokens/s 1691 ; Perplexity 130.67 | |
Epoch 12 ; Iteration 350/1587 ; Learning rate 0.1250 ; Source tokens/s 1697 ; Perplexity 129.66 | |
Epoch 12 ; Iteration 400/1587 ; Learning rate 0.1250 ; Source tokens/s 1698 ; Perplexity 129.25 | |
Epoch 12 ; Iteration 450/1587 ; Learning rate 0.1250 ; Source tokens/s 1690 ; Perplexity 127.47 | |
Epoch 12 ; Iteration 500/1587 ; Learning rate 0.1250 ; Source tokens/s 1689 ; Perplexity 127.06 | |
Epoch 12 ; Iteration 550/1587 ; Learning rate 0.1250 ; Source tokens/s 1696 ; Perplexity 127.82 | |
Epoch 12 ; Iteration 600/1587 ; Learning rate 0.1250 ; Source tokens/s 1706 ; Perplexity 128.56 | |
Epoch 12 ; Iteration 650/1587 ; Learning rate 0.1250 ; Source tokens/s 1703 ; Perplexity 128.26 | |
Epoch 12 ; Iteration 700/1587 ; Learning rate 0.1250 ; Source tokens/s 1705 ; Perplexity 128.16 | |
Epoch 12 ; Iteration 750/1587 ; Learning rate 0.1250 ; Source tokens/s 1706 ; Perplexity 128.12 | |
Epoch 12 ; Iteration 800/1587 ; Learning rate 0.1250 ; Source tokens/s 1706 ; Perplexity 127.95 | |
Epoch 12 ; Iteration 850/1587 ; Learning rate 0.1250 ; Source tokens/s 1705 ; Perplexity 127.59 | |
Epoch 12 ; Iteration 900/1587 ; Learning rate 0.1250 ; Source tokens/s 1703 ; Perplexity 127.26 | |
Epoch 12 ; Iteration 950/1587 ; Learning rate 0.1250 ; Source tokens/s 1702 ; Perplexity 126.98 | |
Epoch 12 ; Iteration 1000/1587 ; Learning rate 0.1250 ; Source tokens/s 1705 ; Perplexity 127.22 | |
Epoch 12 ; Iteration 1050/1587 ; Learning rate 0.1250 ; Source tokens/s 1699 ; Perplexity 126.84 | |
Epoch 12 ; Iteration 1100/1587 ; Learning rate 0.1250 ; Source tokens/s 1700 ; Perplexity 126.85 | |
Epoch 12 ; Iteration 1150/1587 ; Learning rate 0.1250 ; Source tokens/s 1702 ; Perplexity 127.02 | |
Epoch 12 ; Iteration 1200/1587 ; Learning rate 0.1250 ; Source tokens/s 1704 ; Perplexity 127.18 | |
Epoch 12 ; Iteration 1250/1587 ; Learning rate 0.1250 ; Source tokens/s 1707 ; Perplexity 127.51 | |
Epoch 12 ; Iteration 1300/1587 ; Learning rate 0.1250 ; Source tokens/s 1706 ; Perplexity 127.47 | |
Epoch 12 ; Iteration 1350/1587 ; Learning rate 0.1250 ; Source tokens/s 1708 ; Perplexity 127.63 | |
Epoch 12 ; Iteration 1400/1587 ; Learning rate 0.1250 ; Source tokens/s 1709 ; Perplexity 127.79 | |
Epoch 12 ; Iteration 1450/1587 ; Learning rate 0.1250 ; Source tokens/s 1707 ; Perplexity 127.91 | |
Epoch 12 ; Iteration 1500/1587 ; Learning rate 0.1250 ; Source tokens/s 1705 ; Perplexity 127.87 | |
Epoch 12 ; Iteration 1550/1587 ; Learning rate 0.1250 ; Source tokens/s 1705 ; Perplexity 127.82 | |
Validation perplexity: 116.08285402734 | |
Saving checkpoint to 'models/layer2_epoch12_116.08.t7'... | |
Epoch 13 ; Iteration 50/1587 ; Learning rate 0.0625 ; Source tokens/s 1689 ; Perplexity 124.97 | |
Epoch 13 ; Iteration 100/1587 ; Learning rate 0.0625 ; Source tokens/s 1692 ; Perplexity 128.43 | |
Epoch 13 ; Iteration 150/1587 ; Learning rate 0.0625 ; Source tokens/s 1667 ; Perplexity 127.50 | |
Epoch 13 ; Iteration 200/1587 ; Learning rate 0.0625 ; Source tokens/s 1677 ; Perplexity 127.47 | |
Epoch 13 ; Iteration 250/1587 ; Learning rate 0.0625 ; Source tokens/s 1661 ; Perplexity 125.64 | |
Epoch 13 ; Iteration 300/1587 ; Learning rate 0.0625 ; Source tokens/s 1656 ; Perplexity 125.52 | |
Epoch 13 ; Iteration 350/1587 ; Learning rate 0.0625 ; Source tokens/s 1667 ; Perplexity 126.32 | |
Epoch 13 ; Iteration 400/1587 ; Learning rate 0.0625 ; Source tokens/s 1681 ; Perplexity 126.17 | |
Epoch 13 ; Iteration 450/1587 ; Learning rate 0.0625 ; Source tokens/s 1681 ; Perplexity 125.55 | |
Epoch 13 ; Iteration 500/1587 ; Learning rate 0.0625 ; Source tokens/s 1679 ; Perplexity 125.35 | |
Epoch 13 ; Iteration 550/1587 ; Learning rate 0.0625 ; Source tokens/s 1682 ; Perplexity 125.38 | |
Epoch 13 ; Iteration 600/1587 ; Learning rate 0.0625 ; Source tokens/s 1672 ; Perplexity 124.34 | |
Epoch 13 ; Iteration 650/1587 ; Learning rate 0.0625 ; Source tokens/s 1676 ; Perplexity 124.24 | |
Epoch 13 ; Iteration 700/1587 ; Learning rate 0.0625 ; Source tokens/s 1686 ; Perplexity 124.74 | |
Epoch 13 ; Iteration 750/1587 ; Learning rate 0.0625 ; Source tokens/s 1688 ; Perplexity 124.75 | |
Epoch 13 ; Iteration 800/1587 ; Learning rate 0.0625 ; Source tokens/s 1690 ; Perplexity 125.13 | |
Epoch 13 ; Iteration 850/1587 ; Learning rate 0.0625 ; Source tokens/s 1694 ; Perplexity 125.10 | |
Epoch 13 ; Iteration 900/1587 ; Learning rate 0.0625 ; Source tokens/s 1698 ; Perplexity 125.66 | |
Epoch 13 ; Iteration 950/1587 ; Learning rate 0.0625 ; Source tokens/s 1695 ; Perplexity 125.21 | |
Epoch 13 ; Iteration 1000/1587 ; Learning rate 0.0625 ; Source tokens/s 1695 ; Perplexity 124.89 | |
Epoch 13 ; Iteration 1050/1587 ; Learning rate 0.0625 ; Source tokens/s 1696 ; Perplexity 125.06 | |
Epoch 13 ; Iteration 1100/1587 ; Learning rate 0.0625 ; Source tokens/s 1696 ; Perplexity 125.01 | |
Epoch 13 ; Iteration 1150/1587 ; Learning rate 0.0625 ; Source tokens/s 1699 ; Perplexity 124.86 | |
Epoch 13 ; Iteration 1200/1587 ; Learning rate 0.0625 ; Source tokens/s 1704 ; Perplexity 125.07 | |
Epoch 13 ; Iteration 1250/1587 ; Learning rate 0.0625 ; Source tokens/s 1704 ; Perplexity 124.86 | |
Epoch 13 ; Iteration 1300/1587 ; Learning rate 0.0625 ; Source tokens/s 1703 ; Perplexity 125.09 | |
Epoch 13 ; Iteration 1350/1587 ; Learning rate 0.0625 ; Source tokens/s 1706 ; Perplexity 125.07 | |
Epoch 13 ; Iteration 1400/1587 ; Learning rate 0.0625 ; Source tokens/s 1703 ; Perplexity 124.81 | |
Epoch 13 ; Iteration 1450/1587 ; Learning rate 0.0625 ; Source tokens/s 1702 ; Perplexity 124.65 | |
Epoch 13 ; Iteration 1500/1587 ; Learning rate 0.0625 ; Source tokens/s 1703 ; Perplexity 124.74 | |
Epoch 13 ; Iteration 1550/1587 ; Learning rate 0.0625 ; Source tokens/s 1704 ; Perplexity 124.72 | |
Validation perplexity: 114.32841593845 | |
Saving checkpoint to 'models/layer2_epoch13_114.33.t7'... |