@rachtsingh
Created January 1, 2017 06:22
Batch norm with no memory optimization on the nn.BatchNormalization modules
Loading data from 'data/small-train.t7'...
* vocabulary size: source = 50004; target = 50004
* additional features: source = 0; target = 0
* maximum sequence length: source = 50; target = 51
* number of training sentences: 100000
* maximum batch size: 64
Building model...
* using input feeding
Initializing parameters...
* number of parameters: 84834004
Preparing memory optimization...
* sharing 58% of output/gradInput tensors memory between clones
Start training...
Epoch 1 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 568 ; Perplexity 32510.26
Epoch 1 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 696 ; Perplexity 17496.41
Epoch 1 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 767 ; Perplexity 11462.83
Epoch 1 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 807 ; Perplexity 8655.52
Epoch 1 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 824 ; Perplexity 6919.21
Epoch 1 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 832 ; Perplexity 5663.48
Epoch 1 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 848 ; Perplexity 4643.32
Epoch 1 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 860 ; Perplexity 3965.22
Epoch 1 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 870 ; Perplexity 3407.11
Epoch 1 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 879 ; Perplexity 2967.13
Epoch 1 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 891 ; Perplexity 2591.89
Epoch 1 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 894 ; Perplexity 2351.04
Epoch 1 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 896 ; Perplexity 2141.00
Epoch 1 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 897 ; Perplexity 1960.14
Epoch 1 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 900 ; Perplexity 1807.57
Epoch 1 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 903 ; Perplexity 1673.53
Epoch 1 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 907 ; Perplexity 1548.82
Epoch 1 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 908 ; Perplexity 1451.75
Epoch 1 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 910 ; Perplexity 1359.06
Epoch 1 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 912 ; Perplexity 1277.61
Epoch 1 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 913 ; Perplexity 1206.43
Epoch 1 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 914 ; Perplexity 1142.65
Epoch 1 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 915 ; Perplexity 1083.88
Epoch 1 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 914 ; Perplexity 1031.65
Epoch 1 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 917 ; Perplexity 982.63
Epoch 1 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 918 ; Perplexity 940.26
Epoch 1 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 919 ; Perplexity 900.46
Epoch 1 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 920 ; Perplexity 863.49
Epoch 1 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 922 ; Perplexity 827.18
Epoch 1 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 922 ; Perplexity 796.47
Epoch 1 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 922 ; Perplexity 768.12
Validation perplexity: 180.53028700487
Saving checkpoint to 'models/batch_clones_2_epoch1_180.53.t7'...
Epoch 2 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 967 ; Perplexity 214.64
Epoch 2 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 965 ; Perplexity 213.82
Epoch 2 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 207.01
Epoch 2 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 959 ; Perplexity 202.22
Epoch 2 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 957 ; Perplexity 197.79
Epoch 2 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 950 ; Perplexity 194.84
Epoch 2 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 952 ; Perplexity 191.98
Epoch 2 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 955 ; Perplexity 188.35
Epoch 2 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 185.09
Epoch 2 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 181.83
Epoch 2 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 178.44
Epoch 2 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 950 ; Perplexity 176.28
Epoch 2 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 173.65
Epoch 2 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 170.90
Epoch 2 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 168.55
Epoch 2 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 165.95
Epoch 2 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 163.67
Epoch 2 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 161.13
Epoch 2 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 159.46
Epoch 2 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 157.83
Epoch 2 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 944 ; Perplexity 156.19
Epoch 2 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 941 ; Perplexity 154.66
Epoch 2 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 942 ; Perplexity 153.04
Epoch 2 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 943 ; Perplexity 151.18
Epoch 2 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 942 ; Perplexity 149.61
Epoch 2 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 944 ; Perplexity 148.25
Epoch 2 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 146.80
Epoch 2 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 944 ; Perplexity 145.17
Epoch 2 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 944 ; Perplexity 143.73
Epoch 2 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 142.27
Epoch 2 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 140.86
Validation perplexity: 83.079661384295
Saving checkpoint to 'models/batch_clones_2_epoch2_83.08.t7'...
Epoch 3 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 932 ; Perplexity 87.34
Epoch 3 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 940 ; Perplexity 87.64
Epoch 3 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 933 ; Perplexity 86.48
Epoch 3 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 933 ; Perplexity 86.08
Epoch 3 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 933 ; Perplexity 85.23
Epoch 3 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 931 ; Perplexity 84.49
Epoch 3 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 932 ; Perplexity 83.60
Epoch 3 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 936 ; Perplexity 83.37
Epoch 3 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 938 ; Perplexity 82.81
Epoch 3 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 944 ; Perplexity 82.62
Epoch 3 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 950 ; Perplexity 82.44
Epoch 3 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 950 ; Perplexity 82.01
Epoch 3 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 952 ; Perplexity 81.81
Epoch 3 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 951 ; Perplexity 81.63
Epoch 3 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 81.26
Epoch 3 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 81.09
Epoch 3 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 80.58
Epoch 3 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 80.23
Epoch 3 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 80.03
Epoch 3 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 79.71
Epoch 3 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 79.40
Epoch 3 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 79.09
Epoch 3 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 78.87
Epoch 3 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 78.54
Epoch 3 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 78.18
Epoch 3 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 77.78
Epoch 3 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 77.49
Epoch 3 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 77.10
Epoch 3 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 76.77
Epoch 3 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 76.44
Epoch 3 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 76.13
Validation perplexity: 56.82965026429
Saving checkpoint to 'models/batch_clones_2_epoch3_56.83.t7'...
Epoch 4 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 985 ; Perplexity 51.95
Epoch 4 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 978 ; Perplexity 53.01
Epoch 4 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 966 ; Perplexity 53.63
Epoch 4 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 965 ; Perplexity 54.28
Epoch 4 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 957 ; Perplexity 54.04
Epoch 4 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 955 ; Perplexity 54.30
Epoch 4 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 950 ; Perplexity 53.95
Epoch 4 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 53.93
Epoch 4 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 53.90
Epoch 4 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 53.65
Epoch 4 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 53.75
Epoch 4 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 951 ; Perplexity 53.78
Epoch 4 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 951 ; Perplexity 53.89
Epoch 4 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 950 ; Perplexity 53.81
Epoch 4 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 53.86
Epoch 4 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 53.95
Epoch 4 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 53.69
Epoch 4 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 53.72
Epoch 4 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 53.67
Epoch 4 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 53.49
Epoch 4 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 53.41
Epoch 4 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 53.31
Epoch 4 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 53.29
Epoch 4 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 53.26
Epoch 4 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 53.23
Epoch 4 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 53.06
Epoch 4 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 53.04
Epoch 4 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 53.00
Epoch 4 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 52.89
Epoch 4 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 52.87
Epoch 4 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 52.75
Validation perplexity: 45.867913380411
Saving checkpoint to 'models/batch_clones_2_epoch4_45.87.t7'...
Epoch 5 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 39.07
Epoch 5 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 962 ; Perplexity 38.63
Epoch 5 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 956 ; Perplexity 39.19
Epoch 5 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 955 ; Perplexity 38.71
Epoch 5 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 952 ; Perplexity 39.69
Epoch 5 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 956 ; Perplexity 39.56
Epoch 5 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 950 ; Perplexity 39.42
Epoch 5 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 957 ; Perplexity 39.84
Epoch 5 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 958 ; Perplexity 40.14
Epoch 5 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 956 ; Perplexity 40.27
Epoch 5 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 953 ; Perplexity 40.00
Epoch 5 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 953 ; Perplexity 40.17
Epoch 5 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 950 ; Perplexity 40.12
Epoch 5 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 40.01
Epoch 5 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 40.18
Epoch 5 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 40.23
Epoch 5 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 951 ; Perplexity 40.28
Epoch 5 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 40.29
Epoch 5 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 40.27
Epoch 5 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 40.27
Epoch 5 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 40.20
Epoch 5 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 40.20
Epoch 5 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 40.11
Epoch 5 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 40.12