@ozancaglayan
Created November 13, 2015 15:21
testarch3 log
Creating a new machine
- initializing projections with random values in the range 0.1
- initializing weights with random values in the range 0.1
Initializing Nvidia GPU card
- found 8 cards:
0: Tesla K40c with 15 CPUs x 192 threads running at 0.74 Ghz, 11519 MBytes of memory, use -arch=sm_35, utilization 0%
1: Tesla K40c with 15 CPUs x 192 threads running at 0.74 Ghz, 11519 MBytes of memory, use -arch=sm_35, utilization 0%
2: Tesla K40c with 15 CPUs x 192 threads running at 0.74 Ghz, 11519 MBytes of memory, use -arch=sm_35, utilization 0%
3: Tesla K40c with 15 CPUs x 192 threads running at 0.74 Ghz, 11519 MBytes of memory, use -arch=sm_35, utilization 0%
4: Tesla K40c with 15 CPUs x 192 threads running at 0.74 Ghz, 11519 MBytes of memory, use -arch=sm_35, utilization 0%
5: Tesla K40c with 15 CPUs x 192 threads running at 0.74 Ghz, 11519 MBytes of memory, use -arch=sm_35, utilization 0%
6: Tesla K40c with 15 CPUs x 192 threads running at 0.74 Ghz, 11519 MBytes of memory, use -arch=sm_35, utilization 0%
7: Tesla K40c with 15 CPUs x 192 threads running at 0.74 Ghz, 11519 MBytes of memory, use -arch=sm_35, utilization 0%
- using device 0
#### GPU allocate local data for 1 GPU
#### GPU 0: use local data_in from MachSplit
... (previous two lines repeated 30 more times)
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
WARNING: MachSplit::SetGradOut() has no output gradient for the whole machine
#### CUDA set data_in for one GPU
WARNING: MachSplit::GetDataOut() has no output data for the whole machine
WARNING: MachSplit::GetGradOut() has no output gradient for the whole machine
- Sequential machine [4] 32- .. -640096, bs=256, passes=0/0
- Parallel machine 32-12288, bs=256, passes=0/0
- MachTab p-[20003]-384, bs=256, passes=0/0, on GPU 0, LookupTable=0x204bc0000
- MachTab s-[20003]-384, bs=256, passes=0/0, on GPU 0, LookupTable=0x204bc0000
... (previous line repeated 30 more times; all 32 MachTab machines share LookupTable=0x204bc0000)
- total number of parameters: 7681152 (29 MBytes)
- MachTanh p-[12288]-1536, bs=256, passes=0/0, on GPU 0
- MachTanh p-[1536]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-640096, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -620093, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-620093, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -600090, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-600090, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -580087, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-580087, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -560084, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-560084, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -540081, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-540081, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -520078, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-520078, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -500075, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-500075, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -480072, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-480072, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -460069, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-460069, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -440066, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-440066, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -420063, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-420063, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -400060, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-400060, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -380057, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-380057, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -360054, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-360054, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -340051, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-340051, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -320048, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-320048, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -300045, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-300045, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -280042, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-280042, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -260039, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-260039, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -240036, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-240036, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -220033, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-220033, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -200030, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-200030, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -180027, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-180027, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -160024, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-160024, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -140021, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-140021, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -120018, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-120018, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -100015, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-100015, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -80012, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-80012, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -60009, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-60009, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -40006, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- Split machine 256-40006, bs=256, passes=0/0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- Sequential machine [2] 256- .. -20003, bs=256, passes=0/0
- MachTanh p-[256]-256, bs=256, passes=0/0, on GPU 0
- MachSoftmaxStable 256-20003, bs=256, passes=0/0, on GPU 0
- total number of parameters: 5206563 (19 MBytes)
- total number of parameters: 10347334 (39 MBytes)
- total number of parameters: 10413126 (39 MBytes)
- total number of parameters: 15553897 (59 MBytes)
- total number of parameters: 15619689 (59 MBytes)
- total number of parameters: 20760460 (79 MBytes)
- total number of parameters: 20826252 (79 MBytes)
- total number of parameters: 25967023 (99 MBytes)
- total number of parameters: 26032815 (99 MBytes)
- total number of parameters: 31173586 (118 MBytes)
- total number of parameters: 31239378 (119 MBytes)
- total number of parameters: 36380149 (138 MBytes)
- total number of parameters: 36445941 (139 MBytes)
- total number of parameters: 41586712 (158 MBytes)
- total number of parameters: 41652504 (158 MBytes)
- total number of parameters: 46793275 (178 MBytes)
- total number of parameters: 46859067 (178 MBytes)
- total number of parameters: 51999838 (198 MBytes)
- total number of parameters: 52065630 (198 MBytes)
- total number of parameters: 57206401 (218 MBytes)
- total number of parameters: 57272193 (218 MBytes)
- total number of parameters: 62412964 (238 MBytes)
- total number of parameters: 62478756 (238 MBytes)
- total number of parameters: 67619527 (257 MBytes)
- total number of parameters: 67685319 (258 MBytes)
- total number of parameters: 72826090 (277 MBytes)
- total number of parameters: 72891882 (278 MBytes)
- total number of parameters: 78032653 (297 MBytes)
- total number of parameters: 78098445 (297 MBytes)
- total number of parameters: 83239216 (317 MBytes)
- total number of parameters: 83305008 (317 MBytes)
- total number of parameters: 88445779 (337 MBytes)
- total number of parameters: 88511571 (337 MBytes)
- total number of parameters: 93652342 (357 MBytes)
- total number of parameters: 93718134 (357 MBytes)
- total number of parameters: 98858905 (377 MBytes)
- total number of parameters: 98924697 (377 MBytes)
- total number of parameters: 104065468 (396 MBytes)
- total number of parameters: 104131260 (397 MBytes)
- total number of parameters: 109272031 (416 MBytes)
- total number of parameters: 109337823 (417 MBytes)
- total number of parameters: 114478594 (436 MBytes)
- total number of parameters: 114544386 (436 MBytes)
- total number of parameters: 119685157 (456 MBytes)
- total number of parameters: 119750949 (456 MBytes)
- total number of parameters: 124891720 (476 MBytes)
- total number of parameters: 124957512 (476 MBytes)
- total number of parameters: 130098283 (496 MBytes)
- total number of parameters: 130164075 (496 MBytes)
- total number of parameters: 135304846 (516 MBytes)
- total number of parameters: 135370638 (516 MBytes)
- total number of parameters: 140511409 (536 MBytes)
- total number of parameters: 140577201 (536 MBytes)
- total number of parameters: 145717972 (555 MBytes)
- total number of parameters: 145783764 (556 MBytes)
- total number of parameters: 150924535 (575 MBytes)
- total number of parameters: 150990327 (575 MBytes)
- total number of parameters: 156131098 (595 MBytes)
- total number of parameters: 156196890 (595 MBytes)
- total number of parameters: 161337661 (615 MBytes)
- total number of parameters: 161403453 (615 MBytes)
- total number of parameters: 166544224 (635 MBytes)
- total number of parameters: 193494752 (738 MBytes)
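[Editorial note, not part of the log: the parameter totals printed above can be cross-checked by hand. The sketch below recomputes them from the layer shapes the log reports, assuming each fully connected layer stores in×out weights plus out biases, and that the 32 MachTab embeddings share a single 20003×384 lookup table (as the identical LookupTable addresses suggest).]

```python
# Sanity check of the parameter totals reported in the log.
# Assumption: a dense/softmax layer has n_in*n_out weights + n_out biases;
# the 32 MachTab machines share one 20003 x 384 lookup table.

def dense(n_in, n_out):
    return n_in * n_out + n_out

V, E = 20003, 384                           # vocabulary size, embedding dimension

embeddings = V * E                          # shared lookup table
assert embeddings == 7681152                # "total number of parameters: 7681152"

branch = dense(256, 256) + dense(256, V)    # MachTanh 256-256 + MachSoftmaxStable 256-20003
assert branch == 5206563                    # "total number of parameters: 5206563"

split = dense(256, V) + 31 * branch         # first output branch has no extra MachTanh
assert split == 166544224                   # "total number of parameters: 166544224"

total = embeddings + dense(12288, 1536) + dense(1536, 256) + split
assert total == 193494752                   # "total number of parameters: 193494752"
print("all parameter totals consistent")
```

Every intermediate total in the log matches this scheme, which supports the shared-table reading of the MachTab lines.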
Opening data description 'train.df'
Prefix for all data files: /lium/trad5a/iwslt/2015/caglayan/en-fr/cstm_data/w32-iwslt-vocab20k
- reading word list from file /lium/trad5a/iwslt/2015/caglayan/en-fr/cstm_data/w32-iwslt-vocab20k/all.32.en.wlist, stable sort w/r frequency, got 20003 words
- reading word list from file /lium/trad5a/iwslt/2015/caglayan/en-fr/cstm_data/w32-iwslt-vocab20k/all.32.fr.wlist, stable sort w/r frequency, got 20003 words
- all.32.en-fr.bph binary phrase pairs with 167264 entries of max length of 32, mode=0
source: vocabulary of 20003 words (bos=2, eos=1, unk=0, empty=-1)
target: vocabulary of 20003 words (bos=2, eos=1, unk=0, empty=-1)
statistics:
source: 167264 10 550 1489 2071 4462 6450 7871 9094 9260 9859 9507 9293 8862 8639 7857 7436 7028 6564 6247 5768 5405 5190 4659 4348 4065 3538 3073 2637 2160 1694 1228 950
target: 167264 86 1353 1140 2231 3707 5533 7016 8075 8479 8729 8843 8858 8368 8191 7664 7226 6822 6603 6183 5703 5461 5010 4905 4581 4328 4014 3754 3561 3195 2863 2552 2230
- 167264 phrase pairs of full length (from header)
Summary of used data: (1 factors)
- all.32.en-fr.bph 1.0000 * 167264 = 167264
- total number of examples: 167264
- dimensions: input=32, output=32
- resampling with seed 12345678
- allocating preload buffer of 0.0 GBytes
- all resampling coefficients are set to one, loading data once
- loading all data into memory ... done (0m0s)
- shuffling data 10 times ... done (0m1s)
Opening data description 'dev.df'
Prefix for all data files: /lium/trad5a/iwslt/2015/caglayan/en-fr/cstm_data/w32-iwslt-vocab20k
- reading word list from file /lium/trad5a/iwslt/2015/caglayan/en-fr/cstm_data/w32-iwslt-vocab20k/all.32.en.wlist, stable sort w/r frequency, got 20003 words
- reading word list from file /lium/trad5a/iwslt/2015/caglayan/en-fr/cstm_data/w32-iwslt-vocab20k/all.32.fr.wlist, stable sort w/r frequency, got 20003 words
- liumdev15.32.en-fr.bph binary phrase pairs with 2882 entries of max length of 32, mode=0
source: vocabulary of 20003 words (bos=2, eos=1, unk=0, empty=-1)
target: vocabulary of 20003 words (bos=2, eos=1, unk=0, empty=-1)
statistics:
source: 2882 0 11 40 29 83 106 138 141 172 156 148 179 128 132 135 123 118 134 105 96 115 76 84 76 64 73 51 61 34 37 25 12
target: 2882 1 39 26 48 60 100 106 134 142 144 149 173 133 127 118 118 117 132 111 127 89 95 87 71 72 65 58 60 50 50 39 41
- 2882 phrase pairs of full length (from header)
Summary of used data: (1 factors)
- liumdev15.32.en-fr.bph 1.0000 * 2882 = 2882
- total number of examples: 2882
- dimensions: input=32, output=32
- resampling with seed 12345678
- allocating preload buffer of 0.0 GBytes
- all resampling coefficients are set to one, loading data once
- loading all data into memory ... done (0m0s)
- WARNING: output dimension of the training data should be 2, found 32
- Let's assume this is the cascade architecture..
256, 20003
... (previous line repeated 30 more times)
- this machine can predict up to 32 phrases, each with an output layer of dimension 20003
Starting training on host nv13.clusterparole.univ-lemans.fr pid 18622
- training on train.df
- validation on dev.df
- stopping training at 100 epochs
- learning rate: 3.00e-02, multiplied by 5.00e-01 if the performance deteriorates on the development data
lower bound: 1.000000e-04, stopping after 5 iterations without improvement
- scaling learning rate by sqrt of batch size
- Validation metric is perplexity.
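[Editorial note, not part of the log: if "scaling learning rate by sqrt of batch size" means multiplying the configured rate by √bs — an assumption; the log only reports the unscaled rate — the effective starting rate and the halving schedule work out as follows.]

```python
import math

# Hypothetical reading of "scaling learning rate by sqrt of batch size":
# effective_lr = base_lr * sqrt(batch_size). The log reports the unscaled
# rate (3.00e-02) and bs=256.
base_lr, bs = 3.00e-2, 256
effective_lr = base_lr * math.sqrt(bs)
print(effective_lr)  # 0.48

# The schedule halves the rate whenever dev perplexity deteriorates,
# with a lower bound of 1e-4 on the (unscaled) rate:
lr, halvings = base_lr, 0
while lr * 0.5 >= 1.0e-4:
    lr *= 0.5
    halvings += 1
print(halvings)  # 8 halvings before the 1e-4 lower bound is reached
```

So under this reading, at most 8 performance-driven rate reductions are possible; training also stops after 5 iterations without improvement, whichever comes first.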
Starting epoch 1 at Fri Nov 13 16:20:59 2015
- initial unscaled lrate=3.0000e-02, wdecay=5.0000e-04
- all data is already loaded into memory
- shuffling data 10 times ... done (0m0s)
MachSplit: machs.size: 2
MachSplit::Backw() this=0x2afeb6f0, the output gradient of machine 1 @0x2afede10 is not set