@rachtsingh
Created December 30, 2016 03:25
Baseline log
Loading data from 'data/small-train.t7'...
* vocabulary size: source = 50004; target = 50004
* additional features: source = 0; target = 0
* maximum sequence length: source = 50; target = 51
* number of training sentences: 100000
* maximum batch size: 64
Building model...
* using input feeding
Initializing parameters...
* number of parameters: 84814004
Preparing memory optimization...
* sharing 69% of output/gradInput tensors memory between clones
Start training...
Epoch 1 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 732 ; Perplexity 615210.38
Epoch 1 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1069 ; Perplexity 105114.37
Epoch 1 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1227 ; Perplexity 38544.59
Epoch 1 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1371 ; Perplexity 18725.93
Epoch 1 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1455 ; Perplexity 11406.14
Epoch 1 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1514 ; Perplexity 8083.89
Epoch 1 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1543 ; Perplexity 6289.71
Epoch 1 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1562 ; Perplexity 5138.64
Epoch 1 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1579 ; Perplexity 4310.08
Epoch 1 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1615 ; Perplexity 3626.10
Epoch 1 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1632 ; Perplexity 3157.90
Epoch 1 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1653 ; Perplexity 2777.50
Epoch 1 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1659 ; Perplexity 2503.01
Epoch 1 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1670 ; Perplexity 2271.96
Epoch 1 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1680 ; Perplexity 2080.22
Epoch 1 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1693 ; Perplexity 1912.76
Epoch 1 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1696 ; Perplexity 1781.79
Epoch 1 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1703 ; Perplexity 1661.91
Epoch 1 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1700 ; Perplexity 1567.93
Epoch 1 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1712 ; Perplexity 1471.73
Epoch 1 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1715 ; Perplexity 1389.92
Epoch 1 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1723 ; Perplexity 1314.89
Epoch 1 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1726 ; Perplexity 1250.81
Epoch 1 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1729 ; Perplexity 1194.76
Epoch 1 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1731 ; Perplexity 1144.58
Epoch 1 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1740 ; Perplexity 1094.53
Epoch 1 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1743 ; Perplexity 1048.14
Epoch 1 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1745 ; Perplexity 1008.05
Epoch 1 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1751 ; Perplexity 967.57
Epoch 1 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1755 ; Perplexity 932.41
Epoch 1 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1759 ; Perplexity 898.71
Validation perplexity: 275.10064334583
Saving checkpoint to 'models/baseline_epoch1_275.10.t7'...
Epoch 2 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1788 ; Perplexity 270.24
Epoch 2 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1855 ; Perplexity 274.90
Epoch 2 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1824 ; Perplexity 275.08
Epoch 2 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1805 ; Perplexity 271.69
Epoch 2 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1818 ; Perplexity 267.60
Epoch 2 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1822 ; Perplexity 264.24
Epoch 2 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1805 ; Perplexity 260.15
Epoch 2 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1797 ; Perplexity 255.19
Epoch 2 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1801 ; Perplexity 250.83
Epoch 2 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1803 ; Perplexity 246.35
Epoch 2 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1820 ; Perplexity 243.61
Epoch 2 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1820 ; Perplexity 240.35
Epoch 2 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1826 ; Perplexity 237.36
Epoch 2 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1821 ; Perplexity 233.45
Epoch 2 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1821 ; Perplexity 230.06
Epoch 2 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1826 ; Perplexity 226.20
Epoch 2 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1825 ; Perplexity 222.75
Epoch 2 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1836 ; Perplexity 220.01
Epoch 2 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1838 ; Perplexity 216.68
Epoch 2 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1834 ; Perplexity 213.42
Epoch 2 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1835 ; Perplexity 210.49
Epoch 2 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1835 ; Perplexity 207.81
Epoch 2 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1833 ; Perplexity 204.49
Epoch 2 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1835 ; Perplexity 201.47
Epoch 2 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1834 ; Perplexity 198.52
Epoch 2 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1833 ; Perplexity 195.71
Epoch 2 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1838 ; Perplexity 192.95
Epoch 2 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1841 ; Perplexity 190.10
Epoch 2 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1839 ; Perplexity 187.43
Epoch 2 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1840 ; Perplexity 184.93
Epoch 2 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1837 ; Perplexity 182.65
Validation perplexity: 94.961664704637
Saving checkpoint to 'models/baseline_epoch2_94.96.t7'...
Epoch 3 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1821 ; Perplexity 104.79
Epoch 3 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1850 ; Perplexity 105.00
Epoch 3 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1832 ; Perplexity 103.46
Epoch 3 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1847 ; Perplexity 102.35
Epoch 3 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1833 ; Perplexity 101.39
Epoch 3 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1842 ; Perplexity 101.58
Epoch 3 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1841 ; Perplexity 100.87
Epoch 3 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1835 ; Perplexity 100.89
Epoch 3 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1834 ; Perplexity 99.90
Epoch 3 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1836 ; Perplexity 99.40
Epoch 3 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1833 ; Perplexity 98.06
Epoch 3 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1834 ; Perplexity 97.24
Epoch 3 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1831 ; Perplexity 96.49
Epoch 3 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1830 ; Perplexity 95.83
Epoch 3 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1821 ; Perplexity 94.96
Epoch 3 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1826 ; Perplexity 94.36
Epoch 3 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1824 ; Perplexity 93.93
Epoch 3 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1823 ; Perplexity 93.00
Epoch 3 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1822 ; Perplexity 92.36
Epoch 3 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1825 ; Perplexity 91.76
Epoch 3 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1826 ; Perplexity 91.31
Epoch 3 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1826 ; Perplexity 90.59
Epoch 3 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 1827 ; Perplexity 89.91
Epoch 3 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 1827 ; Perplexity 89.37
Epoch 3 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 1828 ; Perplexity 88.79
Epoch 3 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 1835 ; Perplexity 88.23
Epoch 3 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 1833 ; Perplexity 87.68
Epoch 3 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 1831 ; Perplexity 87.09
Epoch 3 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 1834 ; Perplexity 86.62
Epoch 3 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 1837 ; Perplexity 86.11
Epoch 3 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 1837 ; Perplexity 85.62
Validation perplexity: 55.471133808158
Saving checkpoint to 'models/baseline_epoch3_55.47.t7'...
Epoch 4 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 1928 ; Perplexity 60.31
Epoch 4 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 1864 ; Perplexity 59.71
Epoch 4 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 1869 ; Perplexity 60.24
Epoch 4 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 1863 ; Perplexity 59.44
Epoch 4 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 1863 ; Perplexity 59.33
Epoch 4 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 1858 ; Perplexity 58.78
Epoch 4 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 1784 ; Perplexity 58.58
Epoch 4 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 1564 ; Perplexity 58.83
Epoch 4 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 1415 ; Perplexity 58.76
Epoch 4 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 1334 ; Perplexity 58.35
Epoch 4 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 1264 ; Perplexity 58.25
Epoch 4 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 1212 ; Perplexity 58.03
Epoch 4 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 1169 ; Perplexity 57.93
Epoch 4 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 1133 ; Perplexity 57.86
Epoch 4 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 1108 ; Perplexity 57.66
Epoch 4 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 1084 ; Perplexity 57.55
Epoch 4 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 1064 ; Perplexity 57.44
Epoch 4 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 1047 ; Perplexity 57.39
Epoch 4 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 1034 ; Perplexity 57.12
Epoch 4 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 1021 ; Perplexity 57.00
Epoch 4 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 1012 ; Perplexity 56.93
Epoch 4 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 1002 ; Perplexity 56.60
Epoch 4 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 993 ; Perplexity 56.51
Epoch 4 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 985 ; Perplexity 56.18
Epoch 4 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 980 ; Perplexity 56.12
Epoch 4 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 973 ; Perplexity 55.96
Epoch 4 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 967 ; Perplexity 55.83
Epoch 4 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 962 ; Perplexity 55.71
Epoch 4 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 956 ; Perplexity 55.48
Epoch 4 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 953 ; Perplexity 55.32
Epoch 4 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 55.15
Validation perplexity: 43.754144572934
Saving checkpoint to 'models/baseline_epoch4_43.75.t7'...
Epoch 5 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 874 ; Perplexity 42.76
Epoch 5 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 850 ; Perplexity 41.09
Epoch 5 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 824 ; Perplexity 41.11
Epoch 5 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 831 ; Perplexity 40.90
Epoch 5 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 831 ; Perplexity 41.27
Epoch 5 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 836 ; Perplexity 41.87
Epoch 5 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 838 ; Perplexity 41.56
Epoch 5 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 836 ; Perplexity 41.56
Epoch 5 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 829 ; Perplexity 41.35
Epoch 5 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 832 ; Perplexity 41.30
Epoch 5 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 828 ; Perplexity 41.23
Epoch 5 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 830 ; Perplexity 41.34
Epoch 5 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 831 ; Perplexity 41.51
Epoch 5 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 832 ; Perplexity 41.60
Epoch 5 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 831 ; Perplexity 41.63
Epoch 5 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 832 ; Perplexity 41.76
Epoch 5 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 834 ; Perplexity 41.86
Epoch 5 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 834 ; Perplexity 41.73
Epoch 5 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 835 ; Perplexity 41.66
Epoch 5 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 835 ; Perplexity 41.62
Epoch 5 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 839 ; Perplexity 41.62
Epoch 5 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 839 ; Perplexity 41.52
Epoch 5 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 837 ; Perplexity 41.48
Epoch 5 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 835 ; Perplexity 41.38
Epoch 5 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 837 ; Perplexity 41.30
Epoch 5 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 837 ; Perplexity 41.30
Epoch 5 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 838 ; Perplexity 41.20
Epoch 5 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 839 ; Perplexity 41.12
Epoch 5 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 842 ; Perplexity 41.14
Epoch 5 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 843 ; Perplexity 41.12
Epoch 5 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 840 ; Perplexity 41.07
Validation perplexity: 35.248375765707
Saving checkpoint to 'models/baseline_epoch5_35.25.t7'...
Epoch 6 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 811 ; Perplexity 32.54
Epoch 6 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 807 ; Perplexity 32.51
Epoch 6 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 818 ; Perplexity 32.59
Epoch 6 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 801 ; Perplexity 32.15
Epoch 6 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 812 ; Perplexity 32.23
Epoch 6 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 818 ; Perplexity 32.35
Epoch 6 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 830 ; Perplexity 32.59
Epoch 6 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 831 ; Perplexity 32.73
Epoch 6 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 831 ; Perplexity 32.69
Epoch 6 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 829 ; Perplexity 32.74
Epoch 6 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 829 ; Perplexity 32.93
Epoch 6 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 829 ; Perplexity 33.09
Epoch 6 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 830 ; Perplexity 32.98
Epoch 6 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 829 ; Perplexity 32.94
Epoch 6 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 831 ; Perplexity 32.84
Epoch 6 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 831 ; Perplexity 32.74
Epoch 6 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 831 ; Perplexity 32.69
Epoch 6 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 831 ; Perplexity 32.63
Epoch 6 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 832 ; Perplexity 32.73
Epoch 6 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 832 ; Perplexity 32.67
Epoch 6 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 832 ; Perplexity 32.71
Epoch 6 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 831 ; Perplexity 32.67
Epoch 6 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 834 ; Perplexity 32.79
Epoch 6 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 834 ; Perplexity 32.82
Epoch 6 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 833 ; Perplexity 32.81
Epoch 6 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 832 ; Perplexity 32.84
Epoch 6 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 832 ; Perplexity 32.84
Epoch 6 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 833 ; Perplexity 32.88
Epoch 6 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 833 ; Perplexity 32.91
Epoch 6 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 834 ; Perplexity 32.91
Epoch 6 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 835 ; Perplexity 32.89
Validation perplexity: 31.98985843374
Saving checkpoint to 'models/baseline_epoch6_31.99.t7'...
Epoch 7 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 846 ; Perplexity 27.00
Epoch 7 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 864 ; Perplexity 26.83
Epoch 7 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 847 ; Perplexity 26.66
Epoch 7 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 839 ; Perplexity 26.58
Epoch 7 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 846 ; Perplexity 26.90
Epoch 7 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 841 ; Perplexity 26.73
Epoch 7 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 844 ; Perplexity 26.69
Epoch 7 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 842 ; Perplexity 26.64
Epoch 7 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 843 ; Perplexity 26.70
Epoch 7 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 843 ; Perplexity 26.71
Epoch 7 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 843 ; Perplexity 26.69
Epoch 7 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 838 ; Perplexity 26.82
Epoch 7 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 838 ; Perplexity 27.01
Epoch 7 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 834 ; Perplexity 26.99
Epoch 7 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 836 ; Perplexity 27.11
Epoch 7 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 834 ; Perplexity 27.15
Epoch 7 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 836 ; Perplexity 27.09
Epoch 7 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 840 ; Perplexity 27.18
Epoch 7 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 838 ; Perplexity 27.17
Epoch 7 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 836 ; Perplexity 27.20
Epoch 7 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 836 ; Perplexity 27.30
Epoch 7 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 839 ; Perplexity 27.31
Epoch 7 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 839 ; Perplexity 27.35
Epoch 7 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 839 ; Perplexity 27.39
Epoch 7 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 840 ; Perplexity 27.40
Epoch 7 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 841 ; Perplexity 27.43
Epoch 7 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 840 ; Perplexity 27.42
Epoch 7 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 841 ; Perplexity 27.45
Epoch 7 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 840 ; Perplexity 27.48
Epoch 7 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 840 ; Perplexity 27.50
Epoch 7 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 840 ; Perplexity 27.50
Validation perplexity: 28.895515910504
Saving checkpoint to 'models/baseline_epoch7_28.90.t7'...
Epoch 8 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 823 ; Perplexity 22.72
Epoch 8 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 839 ; Perplexity 22.44
Epoch 8 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 863 ; Perplexity 22.41
Epoch 8 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 855 ; Perplexity 22.78
Epoch 8 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 849 ; Perplexity 22.36
Epoch 8 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 853 ; Perplexity 22.67
Epoch 8 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 851 ; Perplexity 22.65
Epoch 8 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 849 ; Perplexity 22.61
Epoch 8 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 847 ; Perplexity 22.70
Epoch 8 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 846 ; Perplexity 22.68
Epoch 8 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 843 ; Perplexity 22.76
Epoch 8 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 841 ; Perplexity 22.86
Epoch 8 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 844 ; Perplexity 23.01
Epoch 8 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 844 ; Perplexity 23.06
Epoch 8 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 844 ; Perplexity 23.13
Epoch 8 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 845 ; Perplexity 23.19
Epoch 8 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 843 ; Perplexity 23.17
Epoch 8 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 838 ; Perplexity 23.15
Epoch 8 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 840 ; Perplexity 23.27
Epoch 8 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 838 ; Perplexity 23.28
Epoch 8 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 838 ; Perplexity 23.37
Epoch 8 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 838 ; Perplexity 23.48
Epoch 8 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 836 ; Perplexity 23.49
Epoch 8 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 835 ; Perplexity 23.47
Epoch 8 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 836 ; Perplexity 23.48
Epoch 8 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 835 ; Perplexity 23.48
Epoch 8 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 836 ; Perplexity 23.55
Epoch 8 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 837 ; Perplexity 23.55
Epoch 8 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 835 ; Perplexity 23.56
Epoch 8 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 837 ; Perplexity 23.62
Epoch 8 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 836 ; Perplexity 23.60
Validation perplexity: 27.403117648073
Saving checkpoint to 'models/baseline_epoch8_27.40.t7'...
Epoch 9 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 856 ; Perplexity 19.06
Epoch 9 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 832 ; Perplexity 19.52
Epoch 9 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 822 ; Perplexity 19.22
Epoch 9 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 814 ; Perplexity 19.31
Epoch 9 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 837 ; Perplexity 19.79
Epoch 9 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 850 ; Perplexity 19.65
Epoch 9 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 842 ; Perplexity 19.66
Epoch 9 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 836 ; Perplexity 19.62
Epoch 9 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 837 ; Perplexity 19.76
Epoch 9 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 844 ; Perplexity 19.93
Epoch 9 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 841 ; Perplexity 19.95
Epoch 9 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 839 ; Perplexity 19.93
Epoch 9 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 837 ; Perplexity 19.93
Epoch 9 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 840 ; Perplexity 20.04
Epoch 9 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 839 ; Perplexity 20.09
Epoch 9 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 843 ; Perplexity 20.21
Epoch 9 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 838 ; Perplexity 20.21
Epoch 9 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 841 ; Perplexity 20.33
Epoch 9 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 843 ; Perplexity 20.33
Epoch 9 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 842 ; Perplexity 20.38
Epoch 9 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 841 ; Perplexity 20.37
Epoch 9 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 839 ; Perplexity 20.43
Epoch 9 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 840 ; Perplexity 20.47
Epoch 9 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 839 ; Perplexity 20.51
Epoch 9 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 837 ; Perplexity 20.48
Epoch 9 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 837 ; Perplexity 20.48
Epoch 9 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 837 ; Perplexity 20.52
Epoch 9 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 837 ; Perplexity 20.54
Epoch 9 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 837 ; Perplexity 20.56
Epoch 9 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 838 ; Perplexity 20.59
Epoch 9 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 839 ; Perplexity 20.64
Validation perplexity: 25.825762106436
Saving checkpoint to 'models/baseline_epoch9_25.83.t7'...
Epoch 10 ; Iteration 50/1587 ; Learning rate 0.5000 ; Source tokens/s 818 ; Perplexity 15.92
Epoch 10 ; Iteration 100/1587 ; Learning rate 0.5000 ; Source tokens/s 822 ; Perplexity 15.73
Epoch 10 ; Iteration 150/1587 ; Learning rate 0.5000 ; Source tokens/s 817 ; Perplexity 15.87
Epoch 10 ; Iteration 200/1587 ; Learning rate 0.5000 ; Source tokens/s 823 ; Perplexity 15.53
Epoch 10 ; Iteration 250/1587 ; Learning rate 0.5000 ; Source tokens/s 827 ; Perplexity 15.60
Epoch 10 ; Iteration 300/1587 ; Learning rate 0.5000 ; Source tokens/s 826 ; Perplexity 15.51
Epoch 10 ; Iteration 350/1587 ; Learning rate 0.5000 ; Source tokens/s 829 ; Perplexity 15.45
Epoch 10 ; Iteration 400/1587 ; Learning rate 0.5000 ; Source tokens/s 820 ; Perplexity 15.37
Epoch 10 ; Iteration 450/1587 ; Learning rate 0.5000 ; Source tokens/s 819 ; Perplexity 15.29
Epoch 10 ; Iteration 500/1587 ; Learning rate 0.5000 ; Source tokens/s 817 ; Perplexity 15.23
Epoch 10 ; Iteration 550/1587 ; Learning rate 0.5000 ; Source tokens/s 820 ; Perplexity 15.21
Epoch 10 ; Iteration 600/1587 ; Learning rate 0.5000 ; Source tokens/s 822 ; Perplexity 15.28
Epoch 10 ; Iteration 650/1587 ; Learning rate 0.5000 ; Source tokens/s 822 ; Perplexity 15.29
Epoch 10 ; Iteration 700/1587 ; Learning rate 0.5000 ; Source tokens/s 824 ; Perplexity 15.30
Epoch 10 ; Iteration 750/1587 ; Learning rate 0.5000 ; Source tokens/s 826 ; Perplexity 15.26
Epoch 10 ; Iteration 800/1587 ; Learning rate 0.5000 ; Source tokens/s 827 ; Perplexity 15.31
Epoch 10 ; Iteration 850/1587 ; Learning rate 0.5000 ; Source tokens/s 827 ; Perplexity 15.32
Epoch 10 ; Iteration 900/1587 ; Learning rate 0.5000 ; Source tokens/s 826 ; Perplexity 15.33
Epoch 10 ; Iteration 950/1587 ; Learning rate 0.5000 ; Source tokens/s 826 ; Perplexity 15.35
Epoch 10 ; Iteration 1000/1587 ; Learning rate 0.5000 ; Source tokens/s 828 ; Perplexity 15.41
Epoch 10 ; Iteration 1050/1587 ; Learning rate 0.5000 ; Source tokens/s 832 ; Perplexity 15.45
Epoch 10 ; Iteration 1100/1587 ; Learning rate 0.5000 ; Source tokens/s 832 ; Perplexity 15.48
Epoch 10 ; Iteration 1150/1587 ; Learning rate 0.5000 ; Source tokens/s 833 ; Perplexity 15.43
Epoch 10 ; Iteration 1200/1587 ; Learning rate 0.5000 ; Source tokens/s 832 ; Perplexity 15.42
Epoch 10 ; Iteration 1250/1587 ; Learning rate 0.5000 ; Source tokens/s 832 ; Perplexity 15.41
Epoch 10 ; Iteration 1300/1587 ; Learning rate 0.5000 ; Source tokens/s 833 ; Perplexity 15.44
Epoch 10 ; Iteration 1350/1587 ; Learning rate 0.5000 ; Source tokens/s 836 ; Perplexity 15.46
Epoch 10 ; Iteration 1400/1587 ; Learning rate 0.5000 ; Source tokens/s 839 ; Perplexity 15.49
Epoch 10 ; Iteration 1450/1587 ; Learning rate 0.5000 ; Source tokens/s 840 ; Perplexity 15.48
Epoch 10 ; Iteration 1500/1587 ; Learning rate 0.5000 ; Source tokens/s 841 ; Perplexity 15.49
Epoch 10 ; Iteration 1550/1587 ; Learning rate 0.5000 ; Source tokens/s 840 ; Perplexity 15.50
Validation perplexity: 23.989800570611
Saving checkpoint to 'models/baseline_epoch10_23.99.t7'...
Epoch 11 ; Iteration 50/1587 ; Learning rate 0.2500 ; Source tokens/s 780 ; Perplexity 12.87
Epoch 11 ; Iteration 100/1587 ; Learning rate 0.2500 ; Source tokens/s 831 ; Perplexity 13.25
Epoch 11 ; Iteration 150/1587 ; Learning rate 0.2500 ; Source tokens/s 839 ; Perplexity 13.02
Epoch 11 ; Iteration 200/1587 ; Learning rate 0.2500 ; Source tokens/s 844 ; Perplexity 12.91
Epoch 11 ; Iteration 250/1587 ; Learning rate 0.2500 ; Source tokens/s 847 ; Perplexity 13.01
Epoch 11 ; Iteration 300/1587 ; Learning rate 0.2500 ; Source tokens/s 841 ; Perplexity 12.95
Epoch 11 ; Iteration 350/1587 ; Learning rate 0.2500 ; Source tokens/s 839 ; Perplexity 12.90
Epoch 11 ; Iteration 400/1587 ; Learning rate 0.2500 ; Source tokens/s 836 ; Perplexity 12.89
Epoch 11 ; Iteration 450/1587 ; Learning rate 0.2500 ; Source tokens/s 833 ; Perplexity 12.87
Epoch 11 ; Iteration 500/1587 ; Learning rate 0.2500 ; Source tokens/s 835 ; Perplexity 12.79
Epoch 11 ; Iteration 550/1587 ; Learning rate 0.2500 ; Source tokens/s 834 ; Perplexity 12.77
Epoch 11 ; Iteration 600/1587 ; Learning rate 0.2500 ; Source tokens/s 839 ; Perplexity 12.80
Epoch 11 ; Iteration 650/1587 ; Learning rate 0.2500 ; Source tokens/s 843 ; Perplexity 12.86
Epoch 11 ; Iteration 700/1587 ; Learning rate 0.2500 ; Source tokens/s 842 ; Perplexity 12.89
Epoch 11 ; Iteration 750/1587 ; Learning rate 0.2500 ; Source tokens/s 842 ; Perplexity 12.87
Epoch 11 ; Iteration 800/1587 ; Learning rate 0.2500 ; Source tokens/s 839 ; Perplexity 12.86
Epoch 11 ; Iteration 850/1587 ; Learning rate 0.2500 ; Source tokens/s 841 ; Perplexity 12.89
Epoch 11 ; Iteration 900/1587 ; Learning rate 0.2500 ; Source tokens/s 838 ; Perplexity 12.86
Epoch 11 ; Iteration 950/1587 ; Learning rate 0.2500 ; Source tokens/s 837 ; Perplexity 12.83
Epoch 11 ; Iteration 1000/1587 ; Learning rate 0.2500 ; Source tokens/s 839 ; Perplexity 12.85
Epoch 11 ; Iteration 1050/1587 ; Learning rate 0.2500 ; Source tokens/s 837 ; Perplexity 12.80
Epoch 11 ; Iteration 1100/1587 ; Learning rate 0.2500 ; Source tokens/s 837 ; Perplexity 12.80
Epoch 11 ; Iteration 1150/1587 ; Learning rate 0.2500 ; Source tokens/s 838 ; Perplexity 12.85
Epoch 11 ; Iteration 1200/1587 ; Learning rate 0.2500 ; Source tokens/s 838 ; Perplexity 12.84
Epoch 11 ; Iteration 1250/1587 ; Learning rate 0.2500 ; Source tokens/s 837 ; Perplexity 12.83
Epoch 11 ; Iteration 1300/1587 ; Learning rate 0.2500 ; Source tokens/s 838 ; Perplexity 12.84
Epoch 11 ; Iteration 1350/1587 ; Learning rate 0.2500 ; Source tokens/s 837 ; Perplexity 12.86
Epoch 11 ; Iteration 1400/1587 ; Learning rate 0.2500 ; Source tokens/s 837 ; Perplexity 12.86
Epoch 11 ; Iteration 1450/1587 ; Learning rate 0.2500 ; Source tokens/s 837 ; Perplexity 12.87
Epoch 11 ; Iteration 1500/1587 ; Learning rate 0.2500 ; Source tokens/s 837 ; Perplexity 12.86
Epoch 11 ; Iteration 1550/1587 ; Learning rate 0.2500 ; Source tokens/s 835 ; Perplexity 12.83
Validation perplexity: 22.960085550097
Saving checkpoint to 'models/baseline_epoch11_22.96.t7'...
Epoch 12 ; Iteration 50/1587 ; Learning rate 0.1250 ; Source tokens/s 863 ; Perplexity 12.03
Epoch 12 ; Iteration 100/1587 ; Learning rate 0.1250 ; Source tokens/s 835 ; Perplexity 11.52
Epoch 12 ; Iteration 150/1587 ; Learning rate 0.1250 ; Source tokens/s 847 ; Perplexity 11.65
Epoch 12 ; Iteration 200/1587 ; Learning rate 0.1250 ; Source tokens/s 845 ; Perplexity 11.59
Epoch 12 ; Iteration 250/1587 ; Learning rate 0.1250 ; Source tokens/s 853 ; Perplexity 11.71
Epoch 12 ; Iteration 300/1587 ; Learning rate 0.1250 ; Source tokens/s 852 ; Perplexity 11.88
Epoch 12 ; Iteration 350/1587 ; Learning rate 0.1250 ; Source tokens/s 849 ; Perplexity 11.82
Epoch 12 ; Iteration 400/1587 ; Learning rate 0.1250 ; Source tokens/s 841 ; Perplexity 11.70
Epoch 12 ; Iteration 450/1587 ; Learning rate 0.1250 ; Source tokens/s 838 ; Perplexity 11.64
Epoch 12 ; Iteration 500/1587 ; Learning rate 0.1250 ; Source tokens/s 839 ; Perplexity 11.69
Epoch 12 ; Iteration 550/1587 ; Learning rate 0.1250 ; Source tokens/s 835 ; Perplexity 11.57
Epoch 12 ; Iteration 600/1587 ; Learning rate 0.1250 ; Source tokens/s 836 ; Perplexity 11.58
Epoch 12 ; Iteration 650/1587 ; Learning rate 0.1250 ; Source tokens/s 835 ; Perplexity 11.60
Epoch 12 ; Iteration 700/1587 ; Learning rate 0.1250 ; Source tokens/s 835 ; Perplexity 11.57
Epoch 12 ; Iteration 750/1587 ; Learning rate 0.1250 ; Source tokens/s 836 ; Perplexity 11.62
Epoch 12 ; Iteration 800/1587 ; Learning rate 0.1250 ; Source tokens/s 841 ; Perplexity 11.63
Epoch 12 ; Iteration 850/1587 ; Learning rate 0.1250 ; Source tokens/s 843 ; Perplexity 11.67
Epoch 12 ; Iteration 900/1587 ; Learning rate 0.1250 ; Source tokens/s 841 ; Perplexity 11.65
Epoch 12 ; Iteration 950/1587 ; Learning rate 0.1250 ; Source tokens/s 845 ; Perplexity 11.67
Epoch 12 ; Iteration 1000/1587 ; Learning rate 0.1250 ; Source tokens/s 844 ; Perplexity 11.68
Epoch 12 ; Iteration 1050/1587 ; Learning rate 0.1250 ; Source tokens/s 843 ; Perplexity 11.66
Epoch 12 ; Iteration 1100/1587 ; Learning rate 0.1250 ; Source tokens/s 842 ; Perplexity 11.65
Epoch 12 ; Iteration 1150/1587 ; Learning rate 0.1250 ; Source tokens/s 842 ; Perplexity 11.65
Epoch 12 ; Iteration 1200/1587 ; Learning rate 0.1250 ; Source tokens/s 841 ; Perplexity 11.64
Epoch 12 ; Iteration 1250/1587 ; Learning rate 0.1250 ; Source tokens/s 843 ; Perplexity 11.66
Epoch 12 ; Iteration 1300/1587 ; Learning rate 0.1250 ; Source tokens/s 842 ; Perplexity 11.64
Epoch 12 ; Iteration 1350/1587 ; Learning rate 0.1250 ; Source tokens/s 841 ; Perplexity 11.61
Epoch 12 ; Iteration 1400/1587 ; Learning rate 0.1250 ; Source tokens/s 841 ; Perplexity 11.61
Epoch 12 ; Iteration 1450/1587 ; Learning rate 0.1250 ; Source tokens/s 842 ; Perplexity 11.63
Epoch 12 ; Iteration 1500/1587 ; Learning rate 0.1250 ; Source tokens/s 841 ; Perplexity 11.61
Epoch 12 ; Iteration 1550/1587 ; Learning rate 0.1250 ; Source tokens/s 839 ; Perplexity 11.58
Validation perplexity: 22.833532364382
Saving checkpoint to 'models/baseline_epoch12_22.83.t7'...
Epoch 13 ; Iteration 50/1587 ; Learning rate 0.0625 ; Source tokens/s 815 ; Perplexity 10.74
Epoch 13 ; Iteration 100/1587 ; Learning rate 0.0625 ; Source tokens/s 824 ; Perplexity 11.03
Epoch 13 ; Iteration 150/1587 ; Learning rate 0.0625 ; Source tokens/s 828 ; Perplexity 11.10
Epoch 13 ; Iteration 200/1587 ; Learning rate 0.0625 ; Source tokens/s 821 ; Perplexity 10.92
Epoch 13 ; Iteration 250/1587 ; Learning rate 0.0625 ; Source tokens/s 828 ; Perplexity 10.89
Epoch 13 ; Iteration 300/1587 ; Learning rate 0.0625 ; Source tokens/s 832 ; Perplexity 11.01
Epoch 13 ; Iteration 350/1587 ; Learning rate 0.0625 ; Source tokens/s 830 ; Perplexity 10.93
Epoch 13 ; Iteration 400/1587 ; Learning rate 0.0625 ; Source tokens/s 831 ; Perplexity 10.95
Epoch 13 ; Iteration 450/1587 ; Learning rate 0.0625 ; Source tokens/s 829 ; Perplexity 10.93
Epoch 13 ; Iteration 500/1587 ; Learning rate 0.0625 ; Source tokens/s 829 ; Perplexity 10.94
Epoch 13 ; Iteration 550/1587 ; Learning rate 0.0625 ; Source tokens/s 828 ; Perplexity 10.90
Epoch 13 ; Iteration 600/1587 ; Learning rate 0.0625 ; Source tokens/s 829 ; Perplexity 10.89
Epoch 13 ; Iteration 650/1587 ; Learning rate 0.0625 ; Source tokens/s 843 ; Perplexity 10.89
Epoch 13 ; Iteration 700/1587 ; Learning rate 0.0625 ; Source tokens/s 877 ; Perplexity 10.84
Epoch 13 ; Iteration 750/1587 ; Learning rate 0.0625 ; Source tokens/s 906 ; Perplexity 10.85
Epoch 13 ; Iteration 800/1587 ; Learning rate 0.0625 ; Source tokens/s 939 ; Perplexity 10.89
Epoch 13 ; Iteration 850/1587 ; Learning rate 0.0625 ; Source tokens/s 930 ; Perplexity 10.88
Epoch 13 ; Iteration 900/1587 ; Learning rate 0.0625 ; Source tokens/s 921 ; Perplexity 10.85
Epoch 13 ; Iteration 950/1587 ; Learning rate 0.0625 ; Source tokens/s 916 ; Perplexity 10.82
Epoch 13 ; Iteration 1000/1587 ; Learning rate 0.0625 ; Source tokens/s 910 ; Perplexity 10.85
Epoch 13 ; Iteration 1050/1587 ; Learning rate 0.0625 ; Source tokens/s 904 ; Perplexity 10.82
Epoch 13 ; Iteration 1100/1587 ; Learning rate 0.0625 ; Source tokens/s 899 ; Perplexity 10.81
Epoch 13 ; Iteration 1150/1587 ; Learning rate 0.0625 ; Source tokens/s 897 ; Perplexity 10.83
Epoch 13 ; Iteration 1200/1587 ; Learning rate 0.0625 ; Source tokens/s 896 ; Perplexity 10.89
Epoch 13 ; Iteration 1250/1587 ; Learning rate 0.0625 ; Source tokens/s 894 ; Perplexity 10.93
Epoch 13 ; Iteration 1300/1587 ; Learning rate 0.0625 ; Source tokens/s 891 ; Perplexity 10.91
Epoch 13 ; Iteration 1350/1587 ; Learning rate 0.0625 ; Source tokens/s 890 ; Perplexity 10.95
Epoch 13 ; Iteration 1400/1587 ; Learning rate 0.0625 ; Source tokens/s 888 ; Perplexity 10.96
Epoch 13 ; Iteration 1450/1587 ; Learning rate 0.0625 ; Source tokens/s 886 ; Perplexity 10.95
Epoch 13 ; Iteration 1500/1587 ; Learning rate 0.0625 ; Source tokens/s 884 ; Perplexity 10.95
Epoch 13 ; Iteration 1550/1587 ; Learning rate 0.0625 ; Source tokens/s 883 ; Perplexity 10.94
Validation perplexity: 22.913811798902
Saving checkpoint to 'models/baseline_epoch13_22.91.t7'...