@rachtsingh
Created December 30, 2016 19:38
seq2seq with layer normalization enabled
Loading data from 'data/small-train.t7'...
* vocabulary size: source = 50004; target = 50004
* additional features: source = 0; target = 0
* maximum sequence length: source = 50; target = 51
* number of training sentences: 100000
* maximum batch size: 64
Building model...
* using input feeding
Initializing parameters...
* number of parameters: 84818004
Preparing memory optimization...
* sharing 76% of output/gradInput tensors memory between clones
Start training...
Epoch 1 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 739 ; Perplexity 262030.13
Epoch 1 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1041 ; Perplexity 55043.06
Epoch 1 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1215 ; Perplexity 25610.45
Epoch 1 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1290 ; Perplexity 14892.75
Epoch 1 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1351 ; Perplexity 9645.93
Epoch 1 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1392 ; Perplexity 7104.51
Epoch 1 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1416 ; Perplexity 5650.19
Epoch 1 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1454 ; Perplexity 4640.61
Epoch 1 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1485 ; Perplexity 3900.92
Epoch 1 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1504 ; Perplexity 3392.54
Epoch 1 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1519 ; Perplexity 3023.87
Epoch 1 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1533 ; Perplexity 2739.26
Epoch 1 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1552 ; Perplexity 2507.95
Epoch 1 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1567 ; Perplexity 2316.60
Epoch 1 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1572 ; Perplexity 2161.89
Epoch 1 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1582 ; Perplexity 2023.06
Epoch 1 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1589 ; Perplexity 1905.28
Epoch 1 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1593 ; Perplexity 1801.04
Epoch 1 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1598 ; Perplexity 1705.17
Epoch 1 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1607 ; Perplexity 1618.89
Epoch 1 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1611 ; Perplexity 1545.47
Epoch 1 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1614 ; Perplexity 1481.86
Epoch 1 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1617 ; Perplexity 1422.67
Epoch 1 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1619 ; Perplexity 1368.80
Epoch 1 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1627 ; Perplexity 1314.19
Epoch 1 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1626 ; Perplexity 1271.36
Epoch 1 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1629 ; Perplexity 1228.62
Epoch 1 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1626 ; Perplexity 1191.57
Epoch 1 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1634 ; Perplexity 1152.65
Epoch 1 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1637 ; Perplexity 1118.02
Epoch 1 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1638 ; Perplexity 1086.31
Validation perplexity: 369.20971129663
Saving checkpoint to 'models/layer2_epoch1_369.21.t7'...
Epoch 2 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1722 ; Perplexity 429.12
Epoch 2 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1670 ; Perplexity 427.39
Epoch 2 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1676 ; Perplexity 427.64
Epoch 2 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1682 ; Perplexity 424.14
Epoch 2 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 422.36
Epoch 2 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 416.05
Epoch 2 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 413.06
Epoch 2 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1699 ; Perplexity 408.36
Epoch 2 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 405.32
Epoch 2 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 401.17
Epoch 2 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1716 ; Perplexity 399.63
Epoch 2 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 396.20
Epoch 2 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 394.26
Epoch 2 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 391.51
Epoch 2 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 388.30
Epoch 2 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 385.28
Epoch 2 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 382.33
Epoch 2 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 380.26
Epoch 2 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 377.71
Epoch 2 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 375.50
Epoch 2 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 373.46
Epoch 2 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 371.79
Epoch 2 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 369.53
Epoch 2 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 367.05
Epoch 2 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 365.41
Epoch 2 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 363.58
Epoch 2 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 361.58
Epoch 2 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 360.03
Epoch 2 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 357.75
Epoch 2 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 355.94
Epoch 2 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 354.67
Validation perplexity: 256.58764506613
Saving checkpoint to 'models/layer2_epoch2_256.59.t7'...
Epoch 3 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1829 ; Perplexity 303.51
Epoch 3 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1732 ; Perplexity 296.83
Epoch 3 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1693 ; Perplexity 294.55
Epoch 3 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 294.69
Epoch 3 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1713 ; Perplexity 294.09
Epoch 3 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1711 ; Perplexity 293.10
Epoch 3 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1726 ; Perplexity 293.54
Epoch 3 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1725 ; Perplexity 293.74
Epoch 3 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1725 ; Perplexity 292.27
Epoch 3 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1713 ; Perplexity 290.13
Epoch 3 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1717 ; Perplexity 288.81
Epoch 3 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1715 ; Perplexity 288.26
Epoch 3 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1716 ; Perplexity 287.32
Epoch 3 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1714 ; Perplexity 286.14
Epoch 3 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1717 ; Perplexity 285.04
Epoch 3 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1718 ; Perplexity 284.09
Epoch 3 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1716 ; Perplexity 282.92
Epoch 3 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1713 ; Perplexity 281.71
Epoch 3 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 280.73
Epoch 3 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 279.32
Epoch 3 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 277.81
Epoch 3 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1701 ; Perplexity 276.67
Epoch 3 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1701 ; Perplexity 275.63
Epoch 3 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 274.77
Epoch 3 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 273.34
Epoch 3 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 272.09
Epoch 3 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 271.47
Epoch 3 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 270.67
Epoch 3 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 269.97
Epoch 3 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 269.18
Epoch 3 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 268.43
Validation perplexity: 211.26914097121
Saving checkpoint to 'models/layer2_epoch3_211.27.t7'...
Epoch 4 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1721 ; Perplexity 242.01
Epoch 4 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1701 ; Perplexity 241.64
Epoch 4 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1678 ; Perplexity 240.22
Epoch 4 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 239.09
Epoch 4 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1722 ; Perplexity 239.13
Epoch 4 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1719 ; Perplexity 239.40
Epoch 4 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1715 ; Perplexity 239.28
Epoch 4 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 242.00
Epoch 4 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1717 ; Perplexity 244.38
Epoch 4 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1711 ; Perplexity 245.71
Epoch 4 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 246.82
Epoch 4 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1712 ; Perplexity 251.09
Epoch 4 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 252.86
Epoch 4 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 252.61
Epoch 4 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 251.86
Epoch 4 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 251.06
Epoch 4 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 251.26
Epoch 4 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 250.95
Epoch 4 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 250.35
Epoch 4 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 250.24
Epoch 4 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 251.09
Epoch 4 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1713 ; Perplexity 252.06
Epoch 4 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 252.18
Epoch 4 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 252.95
Epoch 4 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1713 ; Perplexity 253.35
Epoch 4 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1711 ; Perplexity 253.39
Epoch 4 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 253.01
Epoch 4 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 252.77
Epoch 4 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1712 ; Perplexity 252.40
Epoch 4 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 251.76
Epoch 4 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 251.39
Validation perplexity: 194.63330533388
Saving checkpoint to 'models/layer2_epoch4_194.63.t7'...
Epoch 5 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1689 ; Perplexity 234.53
Epoch 5 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1656 ; Perplexity 230.86
Epoch 5 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1679 ; Perplexity 227.81
Epoch 5 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1697 ; Perplexity 228.31
Epoch 5 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 228.68
Epoch 5 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 227.03
Epoch 5 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1715 ; Perplexity 226.53
Epoch 5 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1712 ; Perplexity 224.88
Epoch 5 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1710 ; Perplexity 224.52
Epoch 5 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1700 ; Perplexity 223.65
Epoch 5 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1700 ; Perplexity 223.45
Epoch 5 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 222.96
Epoch 5 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 222.45
Epoch 5 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 221.56
Epoch 5 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 220.92
Epoch 5 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 220.29
Epoch 5 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1699 ; Perplexity 219.87
Epoch 5 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1698 ; Perplexity 219.42
Epoch 5 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 219.52
Epoch 5 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 219.21
Epoch 5 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 219.33
Epoch 5 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 219.21
Epoch 5 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1701 ; Perplexity 219.05
Epoch 5 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 218.63
Epoch 5 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 218.62
Epoch 5 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 218.34
Epoch 5 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 218.08
Epoch 5 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 217.91
Epoch 5 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 218.22
Epoch 5 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 217.86
Epoch 5 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 217.52
Validation perplexity: 170.92977721733
Saving checkpoint to 'models/layer2_epoch5_170.93.t7'...
Epoch 6 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1644 ; Perplexity 192.60
Epoch 6 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 201.10
Epoch 6 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1714 ; Perplexity 203.90
Epoch 6 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 202.93
Epoch 6 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1717 ; Perplexity 202.45
Epoch 6 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1715 ; Perplexity 203.62
Epoch 6 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1718 ; Perplexity 202.28
Epoch 6 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1720 ; Perplexity 201.32
Epoch 6 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1712 ; Perplexity 199.89
Epoch 6 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 200.01
Epoch 6 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1716 ; Perplexity 200.41
Epoch 6 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1717 ; Perplexity 200.79
Epoch 6 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1719 ; Perplexity 200.81
Epoch 6 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1728 ; Perplexity 201.30
Epoch 6 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1727 ; Perplexity 201.53
Epoch 6 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1726 ; Perplexity 202.11
Epoch 6 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1725 ; Perplexity 202.34
Epoch 6 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1727 ; Perplexity 203.33
Epoch 6 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1727 ; Perplexity 203.70
Epoch 6 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1723 ; Perplexity 203.42
Epoch 6 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1719 ; Perplexity 203.16
Epoch 6 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1718 ; Perplexity 202.99
Epoch 6 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1716 ; Perplexity 202.68
Epoch 6 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1713 ; Perplexity 202.31
Epoch 6 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 201.99
Epoch 6 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 201.51
Epoch 6 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 201.52
Epoch 6 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 201.39
Epoch 6 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 201.18
Epoch 6 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 201.04
Epoch 6 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 200.95
Validation perplexity: 161.07151983724
Saving checkpoint to 'models/layer2_epoch6_161.07.t7'...
Epoch 7 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1643 ; Perplexity 185.46
Epoch 7 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1629 ; Perplexity 182.62
Epoch 7 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1663 ; Perplexity 185.53
Epoch 7 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1679 ; Perplexity 186.52
Epoch 7 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1676 ; Perplexity 185.86
Epoch 7 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1688 ; Perplexity 188.68
Epoch 7 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 189.29
Epoch 7 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 188.79
Epoch 7 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1715 ; Perplexity 189.59
Epoch 7 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1713 ; Perplexity 189.04
Epoch 7 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 189.09
Epoch 7 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 188.59
Epoch 7 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1712 ; Perplexity 188.85
Epoch 7 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1714 ; Perplexity 188.72
Epoch 7 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1714 ; Perplexity 188.56
Epoch 7 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1717 ; Perplexity 188.83
Epoch 7 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1713 ; Perplexity 188.66
Epoch 7 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1715 ; Perplexity 188.35
Epoch 7 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1712 ; Perplexity 187.94
Epoch 7 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 187.07
Epoch 7 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 186.45
Epoch 7 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 186.51
Epoch 7 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 186.53
Epoch 7 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 186.07
Epoch 7 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 185.90
Epoch 7 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 185.62
Epoch 7 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 185.27
Epoch 7 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 185.07
Epoch 7 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 185.14
Epoch 7 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 185.00
Epoch 7 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 184.80
Validation perplexity: 150.74778202042
Saving checkpoint to 'models/layer2_epoch7_150.75.t7'...
Epoch 8 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1565 ; Perplexity 173.57
Epoch 8 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1639 ; Perplexity 172.95
Epoch 8 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1666 ; Perplexity 173.36
Epoch 8 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1654 ; Perplexity 173.30
Epoch 8 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1662 ; Perplexity 174.01
Epoch 8 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1659 ; Perplexity 174.91
Epoch 8 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1676 ; Perplexity 174.82
Epoch 8 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1681 ; Perplexity 175.32
Epoch 8 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1689 ; Perplexity 176.03
Epoch 8 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1685 ; Perplexity 175.77
Epoch 8 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1696 ; Perplexity 176.51
Epoch 8 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1693 ; Perplexity 176.66
Epoch 8 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1690 ; Perplexity 176.70
Epoch 8 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1690 ; Perplexity 176.52
Epoch 8 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1695 ; Perplexity 176.89
Epoch 8 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1693 ; Perplexity 177.36
Epoch 8 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1694 ; Perplexity 177.94
Epoch 8 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1701 ; Perplexity 178.35
Epoch 8 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1697 ; Perplexity 178.17
Epoch 8 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1693 ; Perplexity 177.98
Epoch 8 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1694 ; Perplexity 178.24
Epoch 8 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1698 ; Perplexity 178.53
Epoch 8 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 179.08
Epoch 8 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 178.99
Epoch 8 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 178.98
Epoch 8 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 179.16
Epoch 8 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 179.52
Epoch 8 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 179.30
Epoch 8 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 179.31
Epoch 8 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 178.93
Epoch 8 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 178.77
Validation perplexity: 144.36360414428
Saving checkpoint to 'models/layer2_epoch8_144.36.t7'...
Epoch 9 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1621 ; Perplexity 157.25
Epoch 9 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1664 ; Perplexity 161.18
Epoch 9 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1696 ; Perplexity 166.22
Epoch 9 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1691 ; Perplexity 167.00
Epoch 9 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1695 ; Perplexity 169.06
Epoch 9 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1694 ; Perplexity 169.24
Epoch 9 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1683 ; Perplexity 169.18
Epoch 9 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1688 ; Perplexity 169.81
Epoch 9 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1687 ; Perplexity 170.27
Epoch 9 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1688 ; Perplexity 169.48
Epoch 9 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1697 ; Perplexity 169.71
Epoch 9 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1696 ; Perplexity 169.23
Epoch 9 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1692 ; Perplexity 168.96
Epoch 9 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1698 ; Perplexity 168.93
Epoch 9 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1699 ; Perplexity 168.71
Epoch 9 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1697 ; Perplexity 168.94
Epoch 9 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1696 ; Perplexity 168.62
Epoch 9 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 168.58
Epoch 9 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 168.74
Epoch 9 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 168.25
Epoch 9 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 168.11
Epoch 9 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 167.96
Epoch 9 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1702 ; Perplexity 167.36
Epoch 9 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1704 ; Perplexity 167.23
Epoch 9 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1708 ; Perplexity 167.12
Epoch 9 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 167.14
Epoch 9 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 166.62
Epoch 9 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1706 ; Perplexity 166.45
Epoch 9 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1705 ; Perplexity 166.47
Epoch 9 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1707 ; Perplexity 166.54
Epoch 9 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1709 ; Perplexity 166.47
Validation perplexity: 136.15701878949
Saving checkpoint to 'models/layer2_epoch9_136.16.t7'...
Epoch 10 ; Iteration 50/1587 ; Learning rate 0.5000 ; Source tokens/s 1657 ; Perplexity 134.50
Epoch 10 ; Iteration 100/1587 ; Learning rate 0.5000 ; Source tokens/s 1689 ; Perplexity 138.81
Epoch 10 ; Iteration 150/1587 ; Learning rate 0.5000 ; Source tokens/s 1709 ; Perplexity 139.99
Epoch 10 ; Iteration 200/1587 ; Learning rate 0.5000 ; Source tokens/s 1704 ; Perplexity 140.88
Epoch 10 ; Iteration 250/1587 ; Learning rate 0.5000 ; Source tokens/s 1706 ; Perplexity 141.07
Epoch 10 ; Iteration 300/1587 ; Learning rate 0.5000 ; Source tokens/s 1695 ; Perplexity 140.48
Epoch 10 ; Iteration 350/1587 ; Learning rate 0.5000 ; Source tokens/s 1708 ; Perplexity 141.52
Epoch 10 ; Iteration 400/1587 ; Learning rate 0.5000 ; Source tokens/s 1723 ; Perplexity 142.02
Epoch 10 ; Iteration 450/1587 ; Learning rate 0.5000 ; Source tokens/s 1712 ; Perplexity 141.03
Epoch 10 ; Iteration 500/1587 ; Learning rate 0.5000 ; Source tokens/s 1709 ; Perplexity 141.20
Epoch 10 ; Iteration 550/1587 ; Learning rate 0.5000 ; Source tokens/s 1708 ; Perplexity 140.81
Epoch 10 ; Iteration 600/1587 ; Learning rate 0.5000 ; Source tokens/s 1706 ; Perplexity 141.01
Epoch 10 ; Iteration 650/1587 ; Learning rate 0.5000 ; Source tokens/s 1705 ; Perplexity 141.41
Epoch 10 ; Iteration 700/1587 ; Learning rate 0.5000 ; Source tokens/s 1702 ; Perplexity 142.02
Epoch 10 ; Iteration 750/1587 ; Learning rate 0.5000 ; Source tokens/s 1698 ; Perplexity 141.56
Epoch 10 ; Iteration 800/1587 ; Learning rate 0.5000 ; Source tokens/s 1696 ; Perplexity 141.36
Epoch 10 ; Iteration 850/1587 ; Learning rate 0.5000 ; Source tokens/s 1694 ; Perplexity 141.15
Epoch 10 ; Iteration 900/1587 ; Learning rate 0.5000 ; Source tokens/s 1694 ; Perplexity 140.80
Epoch 10 ; Iteration 950/1587 ; Learning rate 0.5000 ; Source tokens/s 1693 ; Perplexity 140.83
Epoch 10 ; Iteration 1000/1587 ; Learning rate 0.5000 ; Source tokens/s 1692 ; Perplexity 140.77
Epoch 10 ; Iteration 1050/1587 ; Learning rate 0.5000 ; Source tokens/s 1694 ; Perplexity 140.93
Epoch 10 ; Iteration 1100/1587 ; Learning rate 0.5000 ; Source tokens/s 1699 ; Perplexity 141.35
Epoch 10 ; Iteration 1150/1587 ; Learning rate 0.5000 ; Source tokens/s 1703 ; Perplexity 141.65
Epoch 10 ; Iteration 1200/1587 ; Learning rate 0.5000 ; Source tokens/s 1704 ; Perplexity 141.26
Epoch 10 ; Iteration 1250/1587 ; Learning rate 0.5000 ; Source tokens/s 1700 ; Perplexity 141.07
Epoch 10 ; Iteration 1300/1587 ; Learning rate 0.5000 ; Source tokens/s 1702 ; Perplexity 140.88
Epoch 10 ; Iteration 1350/1587 ; Learning rate 0.5000 ; Source tokens/s 1700 ; Perplexity 140.74
Epoch 10 ; Iteration 1400/1587 ; Learning rate 0.5000 ; Source tokens/s 1700 ; Perplexity 140.74
Epoch 10 ; Iteration 1450/1587 ; Learning rate 0.5000 ; Source tokens/s 1701 ; Perplexity 140.59
Epoch 10 ; Iteration 1500/1587 ; Learning rate 0.5000 ; Source tokens/s 1704 ; Perplexity 140.57
Epoch 10 ; Iteration 1550/1587 ; Learning rate 0.5000 ; Source tokens/s 1704 ; Perplexity 140.64
Validation perplexity: 123.56979918086
Saving checkpoint to 'models/layer2_epoch10_123.57.t7'...
Epoch 11 ; Iteration 50/1587 ; Learning rate 0.2500 ; Source tokens/s 1705 ; Perplexity 140.05
Epoch 11 ; Iteration 100/1587 ; Learning rate 0.2500 ; Source tokens/s 1756 ; Perplexity 137.90
Epoch 11 ; Iteration 150/1587 ; Learning rate 0.2500 ; Source tokens/s 1743 ; Perplexity 136.79
Epoch 11 ; Iteration 200/1587 ; Learning rate 0.2500 ; Source tokens/s 1743 ; Perplexity 134.28
Epoch 11 ; Iteration 250/1587 ; Learning rate 0.2500 ; Source tokens/s 1755 ; Perplexity 136.32
Epoch 11 ; Iteration 300/1587 ; Learning rate 0.2500 ; Source tokens/s 1745 ; Perplexity 135.24
Epoch 11 ; Iteration 350/1587 ; Learning rate 0.2500 ; Source tokens/s 1735 ; Perplexity 134.41
Epoch 11 ; Iteration 400/1587 ; Learning rate 0.2500 ; Source tokens/s 1730 ; Perplexity 134.01
Epoch 11 ; Iteration 450/1587 ; Learning rate 0.2500 ; Source tokens/s 1717 ; Perplexity 133.43
Epoch 11 ; Iteration 500/1587 ; Learning rate 0.2500 ; Source tokens/s 1713 ; Perplexity 134.15
Epoch 11 ; Iteration 550/1587 ; Learning rate 0.2500 ; Source tokens/s 1708 ; Perplexity 133.58
Epoch 11 ; Iteration 600/1587 ; Learning rate 0.2500 ; Source tokens/s 1704 ; Perplexity 132.87
Epoch 11 ; Iteration 650/1587 ; Learning rate 0.2500 ; Source tokens/s 1705 ; Perplexity 133.19
Epoch 11 ; Iteration 700/1587 ; Learning rate 0.2500 ; Source tokens/s 1708 ; Perplexity 133.45
Epoch 11 ; Iteration 750/1587 ; Learning rate 0.2500 ; Source tokens/s 1704 ; Perplexity 133.10
Epoch 11 ; Iteration 800/1587 ; Learning rate 0.2500 ; Source tokens/s 1704 ; Perplexity 132.91
Epoch 11 ; Iteration 850/1587 ; Learning rate 0.2500 ; Source tokens/s 1705 ; Perplexity 132.84
Epoch 11 ; Iteration 900/1587 ; Learning rate 0.2500 ; Source tokens/s 1702 ; Perplexity 132.70
Epoch 11 ; Iteration 950/1587 ; Learning rate 0.2500 ; Source tokens/s 1704 ; Perplexity 132.92
Epoch 11 ; Iteration 1000/1587 ; Learning rate 0.2500 ; Source tokens/s 1702 ; Perplexity 132.62
Epoch 11 ; Iteration 1050/1587 ; Learning rate 0.2500 ; Source tokens/s 1699 ; Perplexity 132.45
Epoch 11 ; Iteration 1100/1587 ; Learning rate 0.2500 ; Source tokens/s 1700 ; Perplexity 132.50
Epoch 11 ; Iteration 1150/1587 ; Learning rate 0.2500 ; Source tokens/s 1702 ; Perplexity 132.43
Epoch 11 ; Iteration 1200/1587 ; Learning rate 0.2500 ; Source tokens/s 1704 ; Perplexity 132.41
Epoch 11 ; Iteration 1250/1587 ; Learning rate 0.2500 ; Source tokens/s 1706 ; Perplexity 132.32
Epoch 11 ; Iteration 1300/1587 ; Learning rate 0.2500 ; Source tokens/s 1706 ; Perplexity 132.51
Epoch 11 ; Iteration 1350/1587 ; Learning rate 0.2500 ; Source tokens/s 1705 ; Perplexity 132.44
Epoch 11 ; Iteration 1400/1587 ; Learning rate 0.2500 ; Source tokens/s 1706 ; Perplexity 132.35
Epoch 11 ; Iteration 1450/1587 ; Learning rate 0.2500 ; Source tokens/s 1704 ; Perplexity 132.03
Epoch 11 ; Iteration 1500/1587 ; Learning rate 0.2500 ; Source tokens/s 1704 ; Perplexity 131.91
Epoch 11 ; Iteration 1550/1587 ; Learning rate 0.2500 ; Source tokens/s 1703 ; Perplexity 131.72
Validation perplexity: 122.53690966475
Saving checkpoint to 'models/layer2_epoch11_122.54.t7'...
Epoch 12 ; Iteration 50/1587 ; Learning rate 0.1250 ; Source tokens/s 1696 ; Perplexity 126.63
Epoch 12 ; Iteration 100/1587 ; Learning rate 0.1250 ; Source tokens/s 1733 ; Perplexity 132.78
Epoch 12 ; Iteration 150/1587 ; Learning rate 0.1250 ; Source tokens/s 1719 ; Perplexity 130.82
Epoch 12 ; Iteration 200/1587 ; Learning rate 0.1250 ; Source tokens/s 1706 ; Perplexity 130.99
Epoch 12 ; Iteration 250/1587 ; Learning rate 0.1250 ; Source tokens/s 1705 ; Perplexity 131.75
Epoch 12 ; Iteration 300/1587 ; Learning rate 0.1250 ; Source tokens/s 1691 ; Perplexity 130.67
Epoch 12 ; Iteration 350/1587 ; Learning rate 0.1250 ; Source tokens/s 1697 ; Perplexity 129.66
Epoch 12 ; Iteration 400/1587 ; Learning rate 0.1250 ; Source tokens/s 1698 ; Perplexity 129.25
Epoch 12 ; Iteration 450/1587 ; Learning rate 0.1250 ; Source tokens/s 1690 ; Perplexity 127.47
Epoch 12 ; Iteration 500/1587 ; Learning rate 0.1250 ; Source tokens/s 1689 ; Perplexity 127.06
Epoch 12 ; Iteration 550/1587 ; Learning rate 0.1250 ; Source tokens/s 1696 ; Perplexity 127.82
Epoch 12 ; Iteration 600/1587 ; Learning rate 0.1250 ; Source tokens/s 1706 ; Perplexity 128.56
Epoch 12 ; Iteration 650/1587 ; Learning rate 0.1250 ; Source tokens/s 1703 ; Perplexity 128.26
Epoch 12 ; Iteration 700/1587 ; Learning rate 0.1250 ; Source tokens/s 1705 ; Perplexity 128.16
Epoch 12 ; Iteration 750/1587 ; Learning rate 0.1250 ; Source tokens/s 1706 ; Perplexity 128.12
Epoch 12 ; Iteration 800/1587 ; Learning rate 0.1250 ; Source tokens/s 1706 ; Perplexity 127.95
Epoch 12 ; Iteration 850/1587 ; Learning rate 0.1250 ; Source tokens/s 1705 ; Perplexity 127.59
Epoch 12 ; Iteration 900/1587 ; Learning rate 0.1250 ; Source tokens/s 1703 ; Perplexity 127.26
Epoch 12 ; Iteration 950/1587 ; Learning rate 0.1250 ; Source tokens/s 1702 ; Perplexity 126.98
Epoch 12 ; Iteration 1000/1587 ; Learning rate 0.1250 ; Source tokens/s 1705 ; Perplexity 127.22
Epoch 12 ; Iteration 1050/1587 ; Learning rate 0.1250 ; Source tokens/s 1699 ; Perplexity 126.84
Epoch 12 ; Iteration 1100/1587 ; Learning rate 0.1250 ; Source tokens/s 1700 ; Perplexity 126.85
Epoch 12 ; Iteration 1150/1587 ; Learning rate 0.1250 ; Source tokens/s 1702 ; Perplexity 127.02
Epoch 12 ; Iteration 1200/1587 ; Learning rate 0.1250 ; Source tokens/s 1704 ; Perplexity 127.18
Epoch 12 ; Iteration 1250/1587 ; Learning rate 0.1250 ; Source tokens/s 1707 ; Perplexity 127.51
Epoch 12 ; Iteration 1300/1587 ; Learning rate 0.1250 ; Source tokens/s 1706 ; Perplexity 127.47
Epoch 12 ; Iteration 1350/1587 ; Learning rate 0.1250 ; Source tokens/s 1708 ; Perplexity 127.63
Epoch 12 ; Iteration 1400/1587 ; Learning rate 0.1250 ; Source tokens/s 1709 ; Perplexity 127.79
Epoch 12 ; Iteration 1450/1587 ; Learning rate 0.1250 ; Source tokens/s 1707 ; Perplexity 127.91
Epoch 12 ; Iteration 1500/1587 ; Learning rate 0.1250 ; Source tokens/s 1705 ; Perplexity 127.87
Epoch 12 ; Iteration 1550/1587 ; Learning rate 0.1250 ; Source tokens/s 1705 ; Perplexity 127.82
Validation perplexity: 116.08285402734
Saving checkpoint to 'models/layer2_epoch12_116.08.t7'...
Epoch 13 ; Iteration 50/1587 ; Learning rate 0.0625 ; Source tokens/s 1689 ; Perplexity 124.97
Epoch 13 ; Iteration 100/1587 ; Learning rate 0.0625 ; Source tokens/s 1692 ; Perplexity 128.43
Epoch 13 ; Iteration 150/1587 ; Learning rate 0.0625 ; Source tokens/s 1667 ; Perplexity 127.50
Epoch 13 ; Iteration 200/1587 ; Learning rate 0.0625 ; Source tokens/s 1677 ; Perplexity 127.47
Epoch 13 ; Iteration 250/1587 ; Learning rate 0.0625 ; Source tokens/s 1661 ; Perplexity 125.64
Epoch 13 ; Iteration 300/1587 ; Learning rate 0.0625 ; Source tokens/s 1656 ; Perplexity 125.52
Epoch 13 ; Iteration 350/1587 ; Learning rate 0.0625 ; Source tokens/s 1667 ; Perplexity 126.32
Epoch 13 ; Iteration 400/1587 ; Learning rate 0.0625 ; Source tokens/s 1681 ; Perplexity 126.17
Epoch 13 ; Iteration 450/1587 ; Learning rate 0.0625 ; Source tokens/s 1681 ; Perplexity 125.55
Epoch 13 ; Iteration 500/1587 ; Learning rate 0.0625 ; Source tokens/s 1679 ; Perplexity 125.35
Epoch 13 ; Iteration 550/1587 ; Learning rate 0.0625 ; Source tokens/s 1682 ; Perplexity 125.38
Epoch 13 ; Iteration 600/1587 ; Learning rate 0.0625 ; Source tokens/s 1672 ; Perplexity 124.34
Epoch 13 ; Iteration 650/1587 ; Learning rate 0.0625 ; Source tokens/s 1676 ; Perplexity 124.24
Epoch 13 ; Iteration 700/1587 ; Learning rate 0.0625 ; Source tokens/s 1686 ; Perplexity 124.74
Epoch 13 ; Iteration 750/1587 ; Learning rate 0.0625 ; Source tokens/s 1688 ; Perplexity 124.75
Epoch 13 ; Iteration 800/1587 ; Learning rate 0.0625 ; Source tokens/s 1690 ; Perplexity 125.13
Epoch 13 ; Iteration 850/1587 ; Learning rate 0.0625 ; Source tokens/s 1694 ; Perplexity 125.10
Epoch 13 ; Iteration 900/1587 ; Learning rate 0.0625 ; Source tokens/s 1698 ; Perplexity 125.66
Epoch 13 ; Iteration 950/1587 ; Learning rate 0.0625 ; Source tokens/s 1695 ; Perplexity 125.21
Epoch 13 ; Iteration 1000/1587 ; Learning rate 0.0625 ; Source tokens/s 1695 ; Perplexity 124.89
Epoch 13 ; Iteration 1050/1587 ; Learning rate 0.0625 ; Source tokens/s 1696 ; Perplexity 125.06
Epoch 13 ; Iteration 1100/1587 ; Learning rate 0.0625 ; Source tokens/s 1696 ; Perplexity 125.01
Epoch 13 ; Iteration 1150/1587 ; Learning rate 0.0625 ; Source tokens/s 1699 ; Perplexity 124.86
Epoch 13 ; Iteration 1200/1587 ; Learning rate 0.0625 ; Source tokens/s 1704 ; Perplexity 125.07
Epoch 13 ; Iteration 1250/1587 ; Learning rate 0.0625 ; Source tokens/s 1704 ; Perplexity 124.86
Epoch 13 ; Iteration 1300/1587 ; Learning rate 0.0625 ; Source tokens/s 1703 ; Perplexity 125.09
Epoch 13 ; Iteration 1350/1587 ; Learning rate 0.0625 ; Source tokens/s 1706 ; Perplexity 125.07
Epoch 13 ; Iteration 1400/1587 ; Learning rate 0.0625 ; Source tokens/s 1703 ; Perplexity 124.81
Epoch 13 ; Iteration 1450/1587 ; Learning rate 0.0625 ; Source tokens/s 1702 ; Perplexity 124.65
Epoch 13 ; Iteration 1500/1587 ; Learning rate 0.0625 ; Source tokens/s 1703 ; Perplexity 124.74
Epoch 13 ; Iteration 1550/1587 ; Learning rate 0.0625 ; Source tokens/s 1704 ; Perplexity 124.72
Validation perplexity: 114.32841593845
Saving checkpoint to 'models/layer2_epoch13_114.33.t7'...