rachtsingh/batch_norm.log

## batch_norm.log
Loading data from 'data/small-train.t7'...
 * vocabulary size: source = 50004; target = 50004
 * additional features: source = 0; target = 0
 * maximum sequence length: source = 50; target = 51
 * number of training sentences: 100000
 * maximum batch size: 64
Building model...
 * using input feeding
Initializing parameters...
 * number of parameters: 84834004
Preparing memory optimization...
 * sharing 79% of output/gradInput tensors memory between clones
Start training...

Epoch 1 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 568 ; Perplexity 89976.49
Epoch 1 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 696 ; Perplexity 38050.13
Epoch 1 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 767 ; Perplexity 24768.86
Epoch 1 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 808 ; Perplexity 18641.86
Epoch 1 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 825 ; Perplexity 14317.86
Epoch 1 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 833 ; Perplexity 12003.24
Epoch 1 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 849 ; Perplexity 9978.76
Epoch 1 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 861 ; Perplexity 8682.25
Epoch 1 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 872 ; Perplexity 7726.82
Epoch 1 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 881 ; Perplexity 6839.93
Epoch 1 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 893 ; Perplexity 6090.45
Epoch 1 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 895 ; Perplexity 5571.32
Epoch 1 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 898 ; Perplexity 5135.74
Epoch 1 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 899 ; Perplexity 4752.13
Epoch 1 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 902 ; Perplexity 4420.40
Epoch 1 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 905 ; Perplexity 4134.08
Epoch 1 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 910 ; Perplexity 3876.58
Epoch 1 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 911 ; Perplexity 3688.81
Epoch 1 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 912 ; Perplexity 3499.98
Epoch 1 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 915 ; Perplexity 3343.33
Epoch 1 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 916 ; Perplexity 3201.50
Epoch 1 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 917 ; Perplexity 3074.55
Epoch 1 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 918 ; Perplexity 2962.87
Epoch 1 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 918 ; Perplexity 2862.08
Epoch 1 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 920 ; Perplexity 2769.04
Epoch 1 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 921 ; Perplexity 2689.65
Epoch 1 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 923 ; Perplexity 2615.62
Epoch 1 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 924 ; Perplexity 2549.87
Epoch 1 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 926 ; Perplexity 2486.84
Epoch 1 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 925 ; Perplexity 2430.59
Epoch 1 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 925 ; Perplexity 2380.55
Validation perplexity: 950.45464907671
Saving checkpoint to 'models/master2_c_epoch1_950.45.t7'...

Epoch 2 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 968 ; Perplexity 1200.15
Epoch 2 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 967 ; Perplexity 1212.35
Epoch 2 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 1195.37
Epoch 2 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 961 ; Perplexity 1194.98
Epoch 2 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 958 ; Perplexity 1173.84
Epoch 2 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 951 ; Perplexity 1159.89
Epoch 2 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 953 ; Perplexity 1146.81
Epoch 2 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 955 ; Perplexity 1137.07
Epoch 2 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 1127.32
Epoch 2 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 1120.15
Epoch 2 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 1112.84
Epoch 2 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 950 ; Perplexity 1110.48
Epoch 2 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 1107.41
Epoch 2 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 1103.62
Epoch 2 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 1099.63
Epoch 2 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 1097.62
Epoch 2 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 1096.00
Epoch 2 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 1094.80
Epoch 2 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 1094.88
Epoch 2 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 1094.81
Epoch 2 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 944 ; Perplexity 1094.91
Epoch 2 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 941 ; Perplexity 1094.17
Epoch 2 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 942 ; Perplexity 1093.79
Epoch 2 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 942 ; Perplexity 1093.59
Epoch 2 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 942 ; Perplexity 1094.23
Epoch 2 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 943 ; Perplexity 1094.88
Epoch 2 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 944 ; Perplexity 1094.58
Epoch 2 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 943 ; Perplexity 1093.94
Epoch 2 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 944 ; Perplexity 1094.14
Epoch 2 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 1094.49
Epoch 2 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 1094.91
Validation perplexity: 970.56549385217
Saving checkpoint to 'models/master2_c_epoch2_970.57.t7'...

Epoch 3 ; Iteration 50/1587 ; Learning rate 0.5000 ; Source tokens/s 935 ; Perplexity 1071.35
Epoch 3 ; Iteration 100/1587 ; Learning rate 0.5000 ; Source tokens/s 941 ; Perplexity 1090.61
Epoch 3 ; Iteration 150/1587 ; Learning rate 0.5000 ; Source tokens/s 933 ; Perplexity 1084.42
Epoch 3 ; Iteration 200/1587 ; Learning rate 0.5000 ; Source tokens/s 933 ; Perplexity 1091.91
Epoch 3 ; Iteration 250/1587 ; Learning rate 0.5000 ; Source tokens/s 933 ; Perplexity 1089.04
Epoch 3 ; Iteration 300/1587 ; Learning rate 0.5000 ; Source tokens/s 931 ; Perplexity 1089.90
Epoch 3 ; Iteration 350/1587 ; Learning rate 0.5000 ; Source tokens/s 932 ; Perplexity 1090.33
Epoch 3 ; Iteration 400/1587 ; Learning rate 0.5000 ; Source tokens/s 936 ; Perplexity 1090.96
Epoch 3 ; Iteration 450/1587 ; Learning rate 0.5000 ; Source tokens/s 937 ; Perplexity 1090.25
Epoch 3 ; Iteration 500/1587 ; Learning rate 0.5000 ; Source tokens/s 944 ; Perplexity 1088.60
Epoch 3 ; Iteration 550/1587 ; Learning rate 0.5000 ; Source tokens/s 950 ; Perplexity 1089.68
Epoch 3 ; Iteration 600/1587 ; Learning rate 0.5000 ; Source tokens/s 950 ; Perplexity 1089.66
Epoch 3 ; Iteration 650/1587 ; Learning rate 0.5000 ; Source tokens/s 951 ; Perplexity 1089.48
Epoch 3 ; Iteration 700/1587 ; Learning rate 0.5000 ; Source tokens/s 950 ; Perplexity 1088.60
Epoch 3 ; Iteration 750/1587 ; Learning rate 0.5000 ; Source tokens/s 948 ; Perplexity 1088.10
Epoch 3 ; Iteration 800/1587 ; Learning rate 0.5000 ; Source tokens/s 947 ; Perplexity 1086.87
Epoch 3 ; Iteration 850/1587 ; Learning rate 0.5000 ; Source tokens/s 946 ; Perplexity 1086.54
Epoch 3 ; Iteration 900/1587 ; Learning rate 0.5000 ; Source tokens/s 944 ; Perplexity 1086.92
Epoch 3 ; Iteration 950/1587 ; Learning rate 0.5000 ; Source tokens/s 945 ; Perplexity 1086.27
Epoch 3 ; Iteration 1000/1587 ; Learning rate 0.5000 ; Source tokens/s 946 ; Perplexity 1086.65
Epoch 3 ; Iteration 1050/1587 ; Learning rate 0.5000 ; Source tokens/s 947 ; Perplexity 1087.17
Epoch 3 ; Iteration 1100/1587 ; Learning rate 0.5000 ; Source tokens/s 947 ; Perplexity 1087.47
Epoch 3 ; Iteration 1150/1587 ; Learning rate 0.5000 ; Source tokens/s 947 ; Perplexity 1087.76
Epoch 3 ; Iteration 1200/1587 ; Learning rate 0.5000 ; Source tokens/s 947 ; Perplexity 1088.23
Epoch 3 ; Iteration 1250/1587 ; Learning rate 0.5000 ; Source tokens/s 947 ; Perplexity 1088.20
Epoch 3 ; Iteration 1300/1587 ; Learning rate 0.5000 ; Source tokens/s 946 ; Perplexity 1087.65
Epoch 3 ; Iteration 1350/1587 ; Learning rate 0.5000 ; Source tokens/s 945 ; Perplexity 1088.28
Epoch 3 ; Iteration 1400/1587 ; Learning rate 0.5000 ; Source tokens/s 945 ; Perplexity 1088.59
Epoch 3 ; Iteration 1450/1587 ; Learning rate 0.5000 ; Source tokens/s 946 ; Perplexity 1089.03
Epoch 3 ; Iteration 1500/1587 ; Learning rate 0.5000 ; Source tokens/s 946 ; Perplexity 1089.19
Epoch 3 ; Iteration 1550/1587 ; Learning rate 0.5000 ; Source tokens/s 945 ; Perplexity 1089.26
Validation perplexity: 962.52278471218
Saving checkpoint to 'models/master2_c_epoch3_962.52.t7'...

Epoch 4 ; Iteration 50/1587 ; Learning rate 0.2500 ; Source tokens/s 981 ; Perplexity 1086.34
Epoch 4 ; Iteration 100/1587 ; Learning rate 0.2500 ; Source tokens/s 977 ; Perplexity 1089.36
Epoch 4 ; Iteration 150/1587 ; Learning rate 0.2500 ; Source tokens/s 965 ; Perplexity 1094.90
Epoch 4 ; Iteration 200/1587 ; Learning rate 0.2500 ; Source tokens/s 964 ; Perplexity 1095.74
Epoch 4 ; Iteration 250/1587 ; Learning rate 0.2500 ; Source tokens/s 956 ; Perplexity 1091.06
Epoch 4 ; Iteration 300/1587 ; Learning rate 0.2500 ; Source tokens/s 954 ; Perplexity 1093.44
Epoch 4 ; Iteration 350/1587 ; Learning rate 0.2500 ; Source tokens/s 949 ; Perplexity 1091.02
Epoch 4 ; Iteration 400/1587 ; Learning rate 0.2500 ; Source tokens/s 947 ; Perplexity 1090.95
Epoch 4 ; Iteration 450/1587 ; Learning rate 0.2500 ; Source tokens/s 948 ; Perplexity 1091.66
Epoch 4 ; Iteration 500/1587 ; Learning rate 0.2500 ; Source tokens/s 946 ; Perplexity 1091.39
Epoch 4 ; Iteration 550/1587 ; Learning rate 0.2500 ; Source tokens/s 948 ; Perplexity 1091.22
Epoch 4 ; Iteration 600/1587 ; Learning rate 0.2500 ; Source tokens/s 950 ; Perplexity 1089.93
Epoch 4 ; Iteration 650/1587 ; Learning rate 0.2500 ; Source tokens/s 950 ; Perplexity 1089.40
Epoch 4 ; Iteration 700/1587 ; Learning rate 0.2500 ; Source tokens/s 949 ; Perplexity 1090.39
Epoch 4 ; Iteration 750/1587 ; Learning rate 0.2500 ; Source tokens/s 947 ; Perplexity 1090.05
Epoch 4 ; Iteration 800/1587 ; Learning rate 0.2500 ; Source tokens/s 948 ; Perplexity 1089.72
Epoch 4 ; Iteration 850/1587 ; Learning rate 0.2500 ; Source tokens/s 946 ; Perplexity 1089.31
Epoch 4 ; Iteration 900/1587 ; Learning rate 0.2500 ; Source tokens/s 948 ; Perplexity 1089.51
Epoch 4 ; Iteration 950/1587 ; Learning rate 0.2500 ; Source tokens/s 947 ; Perplexity 1088.93
Epoch 4 ; Iteration 1000/1587 ; Learning rate 0.2500 ; Source tokens/s 946 ; Perplexity 1089.00
Epoch 4 ; Iteration 1050/1587 ; Learning rate 0.2500 ; Source tokens/s 946 ; Perplexity 1089.05
Epoch 4 ; Iteration 1100/1587 ; Learning rate 0.2500 ; Source tokens/s 945 ; Perplexity 1088.49
Epoch 4 ; Iteration 1150/1587 ; Learning rate 0.2500 ; Source tokens/s 945 ; Perplexity 1089.00
Epoch 4 ; Iteration 1200/1587 ; Learning rate 0.2500 ; Source tokens/s 945 ; Perplexity 1089.06
Epoch 4 ; Iteration 1250/1587 ; Learning rate 0.2500 ; Source tokens/s 944 ; Perplexity 1089.66
Epoch 4 ; Iteration 1300/1587 ; Learning rate 0.2500 ; Source tokens/s 944 ; Perplexity 1089.17
Epoch 4 ; Iteration 1350/1587 ; Learning rate 0.2500 ; Source tokens/s 945 ; Perplexity 1089.43
Epoch 4 ; Iteration 1400/1587 ; Learning rate 0.2500 ; Source tokens/s 945 ; Perplexity 1089.70
Epoch 4 ; Iteration 1450/1587 ; Learning rate 0.2500 ; Source tokens/s 946 ; Perplexity 1089.70
Epoch 4 ; Iteration 1500/1587 ; Learning rate 0.2500 ; Source tokens/s 946 ; Perplexity 1089.65
Epoch 4 ; Iteration 1550/1587 ; Learning rate 0.2500 ; Source tokens/s 945 ; Perplexity 1089.34
Validation perplexity: 968.62977799903
Saving checkpoint to 'models/master2_c_epoch4_968.63.t7'...

Epoch 5 ; Iteration 50/1587 ; Learning rate 0.1250 ; Source tokens/s 946 ; Perplexity 1056.67
Epoch 5 ; Iteration 100/1587 ; Learning rate 0.1250 ; Source tokens/s 960 ; Perplexity 1073.13
Epoch 5 ; Iteration 150/1587 ; Learning rate 0.1250 ; Source tokens/s 956 ; Perplexity 1073.70
Epoch 5 ; Iteration 200/1587 ; Learning rate 0.1250 ; Source tokens/s 954 ; Perplexity 1073.78
Epoch 5 ; Iteration 250/1587 ; Learning rate 0.1250 ; Source tokens/s 951 ; Perplexity 1082.66
Epoch 5 ; Iteration 300/1587 ; Learning rate 0.1250 ; Source tokens/s 955 ; Perplexity 1085.42
Epoch 5 ; Iteration 350/1587 ; Learning rate 0.1250 ; Source tokens/s 950 ; Perplexity 1083.44
Epoch 5 ; Iteration 400/1587 ; Learning rate 0.1250 ; Source tokens/s 956 ; Perplexity 1082.75
Epoch 5 ; Iteration 450/1587 ; Learning rate 0.1250 ; Source tokens/s 957 ; Perplexity 1086.22
Epoch 5 ; Iteration 500/1587 ; Learning rate 0.1250 ; Source tokens/s 955 ; Perplexity 1085.63
Epoch 5 ; Iteration 550/1587 ; Learning rate 0.1250 ; Source tokens/s 953 ; Perplexity 1084.17
Epoch 5 ; Iteration 600/1587 ; Learning rate 0.1250 ; Source tokens/s 953 ; Perplexity 1085.38
Epoch 5 ; Iteration 650/1587 ; Learning rate 0.1250 ; Source tokens/s 949 ; Perplexity 1084.69
Epoch 5 ; Iteration 700/1587 ; Learning rate 0.1250 ; Source tokens/s 947 ; Perplexity 1085.55
Epoch 5 ; Iteration 750/1587 ; Learning rate 0.1250 ; Source tokens/s 945 ; Perplexity 1086.10
Epoch 5 ; Iteration 800/1587 ; Learning rate 0.1250 ; Source tokens/s 947 ; Perplexity 1087.42
Epoch 5 ; Iteration 850/1587 ; Learning rate 0.1250 ; Source tokens/s 950 ; Perplexity 1088.29
Epoch 5 ; Iteration 900/1587 ; Learning rate 0.1250 ; Source tokens/s 948 ; Perplexity 1089.14
Epoch 5 ; Iteration 950/1587 ; Learning rate 0.1250 ; Source tokens/s 947 ; Perplexity 1088.31
Epoch 5 ; Iteration 1000/1587 ; Learning rate 0.1250 ; Source tokens/s 946 ; Perplexity 1087.61
Epoch 5 ; Iteration 1050/1587 ; Learning rate 0.1250 ; Source tokens/s 946 ; Perplexity 1087.26
Epoch 5 ; Iteration 1100/1587 ; Learning rate 0.1250 ; Source tokens/s 947 ; Perplexity 1087.29
Epoch 5 ; Iteration 1150/1587 ; Learning rate 0.1250 ; Source tokens/s 946 ; Perplexity 1086.35
Epoch 5 ; Iteration 1200/1587 ; Learning rate 0.1250 ; Source tokens/s 946 ; Perplexity 1087.10
Epoch 5 ; Iteration 1250/1587 ; Learning rate 0.1250 ; Source tokens/s 946 ; Perplexity 1086.96
Epoch 5 ; Iteration 1300/1587 ; Learning rate 0.1250 ; Source tokens/s 946 ; Perplexity 1087.26
Epoch 5 ; Iteration 1350/1587 ; Learning rate 0.1250 ; Source tokens/s 945 ; Perplexity 1087.63
Epoch 5 ; Iteration 1400/1587 ; Learning rate 0.1250 ; Source tokens/s 944 ; Perplexity 1088.35
Epoch 5 ; Iteration 1450/1587 ; Learning rate 0.1250 ; Source tokens/s 946 ; Perplexity 1088.85
Epoch 5 ; Iteration 1500/1587 ; Learning rate 0.1250 ; Source tokens/s 946 ; Perplexity 1088.42
Epoch 5 ; Iteration 1550/1587 ; Learning rate 0.1250 ; Source tokens/s 945 ; Perplexity 1088.10
Validation perplexity: 965.09351712684
Saving checkpoint to 'models/master2_c_epoch5_965.09.t7'...

Epoch 6 ; Iteration 50/1587 ; Learning rate 0.0625 ; Source tokens/s 971 ; Perplexity 1072.91
Epoch 6 ; Iteration 100/1587 ; Learning rate 0.0625 ; Source tokens/s 971 ; Perplexity 1087.93
Epoch 6 ; Iteration 150/1587 ; Learning rate 0.0625 ; Source tokens/s 974 ; Perplexity 1089.82
Epoch 6 ; Iteration 200/1587 ; Learning rate 0.0625 ; Source tokens/s 964 ; Perplexity 1082.23
Epoch 6 ; Iteration 250/1587 ; Learning rate 0.0625 ; Source tokens/s 966 ; Perplexity 1083.00
Epoch 6 ; Iteration 300/1587 ; Learning rate 0.0625 ; Source tokens/s 973 ; Perplexity 1088.59
Epoch 6 ; Iteration 350/1587 ; Learning rate 0.0625 ; Source tokens/s 966 ; Perplexity 1088.21
Epoch 6 ; Iteration 400/1587 ; Learning rate 0.0625 ; Source tokens/s 962 ; Perplexity 1089.78
Epoch 6 ; Iteration 450/1587 ; Learning rate 0.0625 ; Source tokens/s 962 ; Perplexity 1091.51
Epoch 6 ; Iteration 500/1587 ; Learning rate 0.0625 ; Source tokens/s 962 ; Perplexity 1090.06
Epoch 6 ; Iteration 550/1587 ; Learning rate 0.0625 ; Source tokens/s 959 ; Perplexity 1092.91
Epoch 6 ; Iteration 600/1587 ; Learning rate 0.0625 ; Source tokens/s 958 ; Perplexity 1094.88
Epoch 6 ; Iteration 650/1587 ; Learning rate 0.0625 ; Source tokens/s 958 ; Perplexity 1095.66
Epoch 6 ; Iteration 700/1587 ; Learning rate 0.0625 ; Source tokens/s 957 ; Perplexity 1095.19
Epoch 6 ; Iteration 750/1587 ; Learning rate 0.0625 ; Source tokens/s 958 ; Perplexity 1093.79
Epoch 6 ; Iteration 800/1587 ; Learning rate 0.0625 ; Source tokens/s 956 ; Perplexity 1093.30
Epoch 6 ; Iteration 850/1587 ; Learning rate 0.0625 ; Source tokens/s 950 ; Perplexity 1091.97
Epoch 6 ; Iteration 900/1587 ; Learning rate 0.0625 ; Source tokens/s 949 ; Perplexity 1090.74
Epoch 6 ; Iteration 950/1587 ; Learning rate 0.0625 ; Source tokens/s 947 ; Perplexity 1091.07
Epoch 6 ; Iteration 1000/1587 ; Learning rate 0.0625 ; Source tokens/s 947 ; Perplexity 1091.13
Epoch 6 ; Iteration 1050/1587 ; Learning rate 0.0625 ; Source tokens/s 946 ; Perplexity 1090.87
Epoch 6 ; Iteration 1100/1587 ; Learning rate 0.0625 ; Source tokens/s 947 ; Perplexity 1090.72
Epoch 6 ; Iteration 1150/1587 ; Learning rate 0.0625 ; Source tokens/s 946 ; Perplexity 1091.13
Epoch 6 ; Iteration 1200/1587 ; Learning rate 0.0625 ; Source tokens/s 946 ; Perplexity 1091.02
Epoch 6 ; Iteration 1250/1587 ; Learning rate 0.0625 ; Source tokens/s 947 ; Perplexity 1090.97
Epoch 6 ; Iteration 1300/1587 ; Learning rate 0.0625 ; Source tokens/s 945 ; Perplexity 1090.11
Epoch 6 ; Iteration 1350/1587 ; Learning rate 0.0625 ; Source tokens/s 945 ; Perplexity 1090.08
Epoch 6 ; Iteration 1400/1587 ; Learning rate 0.0625 ; Source tokens/s 945 ; Perplexity 1089.14
Epoch 6 ; Iteration 1450/1587 ; Learning rate 0.0625 ; Source tokens/s 945 ; Perplexity 1089.28
Epoch 6 ; Iteration 1500/1587 ; Learning rate 0.0625 ; Source tokens/s 946 ; Perplexity 1089.73
Epoch 6 ; Iteration 1550/1587 ; Learning rate 0.0625 ; Source tokens/s 945 ; Perplexity 1088.52
Validation perplexity: 966.980620381
Saving checkpoint to 'models/master2_c_epoch6_966.98.t7'...

Epoch 7 ; Iteration 50/1587 ; Learning rate 0.0312 ; Source tokens/s 950 ; Perplexity 1084.67
Epoch 7 ; Iteration 100/1587 ; Learning rate 0.0312 ; Source tokens/s 951 ; Perplexity 1083.14
Epoch 7 ; Iteration 150/1587 ; Learning rate 0.0312 ; Source tokens/s 936 ; Perplexity 1079.40
Epoch 7 ; Iteration 200/1587 ; Learning rate 0.0312 ; Source tokens/s 927 ; Perplexity 1077.76
Epoch 7 ; Iteration 250/1587 ; Learning rate 0.0312 ; Source tokens/s 926 ; Perplexity 1078.60
Epoch 7 ; Iteration 300/1587 ; Learning rate 0.0312 ; Source tokens/s 930 ; Perplexity 1078.65
Epoch 7 ; Iteration 350/1587 ; Learning rate 0.0312 ; Source tokens/s 934 ; Perplexity 1079.10
Epoch 7 ; Iteration 400/1587 ; Learning rate 0.0312 ; Source tokens/s 928 ; Perplexity 1077.53
Epoch 7 ; Iteration 450/1587 ; Learning rate 0.0312 ; Source tokens/s 931 ; Perplexity 1077.56
Epoch 7 ; Iteration 500/1587 ; Learning rate 0.0312 ; Source tokens/s 934 ; Perplexity 1077.31
Epoch 7 ; Iteration 550/1587 ; Learning rate 0.0312 ; Source tokens/s 933 ; Perplexity 1078.69
Epoch 7 ; Iteration 600/1587 ; Learning rate 0.0312 ; Source tokens/s 937 ; Perplexity 1079.58
Epoch 7 ; Iteration 650/1587 ; Learning rate 0.0312 ; Source tokens/s 938 ; Perplexity 1080.93
Epoch 7 ; Iteration 700/1587 ; Learning rate 0.0312 ; Source tokens/s 941 ; Perplexity 1081.42
Epoch 7 ; Iteration 750/1587 ; Learning rate 0.0312 ; Source tokens/s 940 ; Perplexity 1082.54
Epoch 7 ; Iteration 800/1587 ; Learning rate 0.0312 ; Source tokens/s 941 ; Perplexity 1082.83
Epoch 7 ; Iteration 850/1587 ; Learning rate 0.0312 ; Source tokens/s 940 ; Perplexity 1083.91
Epoch 7 ; Iteration 900/1587 ; Learning rate 0.0312 ; Source tokens/s 942 ; Perplexity 1084.72
Epoch 7 ; Iteration 950/1587 ; Learning rate 0.0312 ; Source tokens/s 940 ; Perplexity 1084.26
Epoch 7 ; Iteration 1000/1587 ; Learning rate 0.0312 ; Source tokens/s 939 ; Perplexity 1084.69
Epoch 7 ; Iteration 1050/1587 ; Learning rate 0.0312 ; Source tokens/s 940 ; Perplexity 1084.43
Epoch 7 ; Iteration 1100/1587 ; Learning rate 0.0312 ; Source tokens/s 940 ; Perplexity 1085.22
Epoch 7 ; Iteration 1150/1587 ; Learning rate 0.0312 ; Source tokens/s 940 ; Perplexity 1086.08
Epoch 7 ; Iteration 1200/1587 ; Learning rate 0.0312 ; Source tokens/s 941 ; Perplexity 1087.03
Epoch 7 ; Iteration 1250/1587 ; Learning rate 0.0312 ; Source tokens/s 941 ; Perplexity 1087.32
Epoch 7 ; Iteration 1300/1587 ; Learning rate 0.0312 ; Source tokens/s 941 ; Perplexity 1087.39
Epoch 7 ; Iteration 1350/1587 ; Learning rate 0.0312 ; Source tokens/s 942 ; Perplexity 1087.68
Epoch 7 ; Iteration 1400/1587 ; Learning rate 0.0312 ; Source tokens/s 943 ; Perplexity 1088.23
Epoch 7 ; Iteration 1450/1587 ; Learning rate 0.0312 ; Source tokens/s 943 ; Perplexity 1088.46
Epoch 7 ; Iteration 1500/1587 ; Learning rate 0.0312 ; Source tokens/s 942 ; Perplexity 1087.71
Epoch 7 ; Iteration 1550/1587 ; Learning rate 0.0312 ; Source tokens/s 944 ; Perplexity 1088.10
Validation perplexity: 967.23047079327
Saving checkpoint to 'models/master2_c_epoch7_967.23.t7'...

Epoch 8 ; Iteration 50/1587 ; Learning rate 0.0156 ; Source tokens/s 962 ; Perplexity 1076.28
Epoch 8 ; Iteration 100/1587 ; Learning rate 0.0156 ; Source tokens/s 973 ; Perplexity 1075.29
Epoch 8 ; Iteration 150/1587 ; Learning rate 0.0156 ; Source tokens/s 963 ; Perplexity 1083.62
Epoch 8 ; Iteration 200/1587 ; Learning rate 0.0156 ; Source tokens/s 958 ; Perplexity 1083.32
Epoch 8 ; Iteration 250/1587 ; Learning rate 0.0156 ; Source tokens/s 961 ; Perplexity 1087.49
Epoch 8 ; Iteration 300/1587 ; Learning rate 0.0156 ; Source tokens/s 963 ; Perplexity 1090.44
Epoch 8 ; Iteration 350/1587 ; Learning rate 0.0156 ; Source tokens/s 953 ; Perplexity 1086.70
Epoch 8 ; Iteration 400/1587 ; Learning rate 0.0156 ; Source tokens/s 954 ; Perplexity 1087.24
Epoch 8 ; Iteration 450/1587 ; Learning rate 0.0156 ; Source tokens/s 949 ; Perplexity 1085.94
Epoch 8 ; Iteration 500/1587 ; Learning rate 0.0156 ; Source tokens/s 951 ; Perplexity 1085.05
Epoch 8 ; Iteration 550/1587 ; Learning rate 0.0156 ; Source tokens/s 950 ; Perplexity 1085.28
Epoch 8 ; Iteration 600/1587 ; Learning rate 0.0156 ; Source tokens/s 949 ; Perplexity 1082.99
Epoch 8 ; Iteration 650/1587 ; Learning rate 0.0156 ; Source tokens/s 943 ; Perplexity 1082.97
Epoch 8 ; Iteration 700/1587 ; Learning rate 0.0156 ; Source tokens/s 941 ; Perplexity 1082.53
Epoch 8 ; Iteration 750/1587 ; Learning rate 0.0156 ; Source tokens/s 940 ; Perplexity 1081.47
Epoch 8 ; Iteration 800/1587 ; Learning rate 0.0156 ; Source tokens/s 938 ; Perplexity 1081.44
Epoch 8 ; Iteration 850/1587 ; Learning rate 0.0156 ; Source tokens/s 937 ; Perplexity 1081.63
Epoch 8 ; Iteration 900/1587 ; Learning rate 0.0156 ; Source tokens/s 937 ; Perplexity 1082.12
Epoch 8 ; Iteration 950/1587 ; Learning rate 0.0156 ; Source tokens/s 939 ; Perplexity 1083.17
Epoch 8 ; Iteration 1000/1587 ; Learning rate 0.0156 ; Source tokens/s 939 ; Perplexity 1083.11
Epoch 8 ; Iteration 1050/1587 ; Learning rate 0.0156 ; Source tokens/s 939 ; Perplexity 1083.51