aokomoriuta/cuda.log

## cuda.log
----------------------------------------------
               Device Info
----------------------------------------------

----------------------------------------------
----------------------------------------------
## Benchmark :: Dense Matrix-Matrix product
----------------------------------------------

   -------------------------------
   # benchmarking single-precision
   -------------------------------
 ------ Benchmark 1: Matrix-Matrix product ------
 - Execution time on device (no setup time included): 0.082591
 - GFLOPs (counting multiply&add as separate operations): 208.011

 ------ Benchmark 2: Matrix-Matrix product using ranges ------
 - Execution time on device (no setup time included): 0.010358
 - GFLOPs (counting multiply&add as separate operations): 207.326

 ------ Benchmark 3: Matrix-Matrix product using slices ------
 - Execution time on device (no setup time included): 0.010391
 - GFLOPs (counting multiply&add as separate operations): 206.668

 ------ Benchmark 4: LU factorization ------
 - Execution time on device (no setup time included): 0.775203
 - GFLOPs (counting multiply&add as separate operations): 22.1618


   -------------------------------
   # benchmarking double-precision
   -------------------------------
 ------ Benchmark 1: Matrix-Matrix product ------
 - Execution time on device (no setup time included): 0.082417
 - GFLOPs (counting multiply&add as separate operations): 208.451

 ------ Benchmark 2: Matrix-Matrix product using ranges ------
 - Execution time on device (no setup time included): 0.010409
 - GFLOPs (counting multiply&add as separate operations): 206.31

 ------ Benchmark 3: Matrix-Matrix product using slices ------
 - Execution time on device (no setup time included): 0.012321
 - GFLOPs (counting multiply&add as separate operations): 174.295

 ------ Benchmark 4: LU factorization ------
 - Execution time on device (no setup time included): 0.823958
 - GFLOPs (counting multiply&add as separate operations): 20.8504


## opencl.log
----------------------------------------------
               Device Info
----------------------------------------------
CL Device Vendor ID: 4318
CL Device Name: GeForce GTX TITAN
CL Driver Version: 319.37
--------------------------------
CL Device Max Compute Units: 14
CL Device Max Work Group Size: 1024
CL Device Global Mem Size: 6441730048
CL Device Local Mem Size: 49152


----------------------------------------------
----------------------------------------------
## Benchmark :: Dense Matrix-Matrix product
----------------------------------------------

   -------------------------------
   # benchmarking single-precision
   -------------------------------
 ------ Benchmark 1: Matrix-Matrix product ------
 - Device Name: GeForce GTX TITAN
 - Execution time on device (no setup time included): 0.024256
 - GFLOPs (counting multiply&add as separate operations): 708.273

 ------ Benchmark 2: Matrix-Matrix product using ranges ------
 - Device Name: GeForce GTX TITAN
 - Execution time on device (no setup time included): 0.003088
 - GFLOPs (counting multiply&add as separate operations): 695.429

 ------ Benchmark 3: Matrix-Matrix product using slices ------
 - Device Name: GeForce GTX TITAN
 - Execution time on device (no setup time included): 0.003666
 - GFLOPs (counting multiply&add as separate operations): 585.784

 ------ Benchmark 4: LU factorization ------
 - Device Name: GeForce GTX TITAN
 - Execution time on device (no setup time included): 0.775367
 - GFLOPs (counting multiply&add as separate operations): 22.1571


   -------------------------------
   # benchmarking double-precision
   -------------------------------
 ------ Benchmark 1: Matrix-Matrix product ------
 - Device Name: GeForce GTX TITAN
 - Execution time on device (no setup time included): 0.045137
 - GFLOPs (counting multiply&add as separate operations): 380.616

 ------ Benchmark 2: Matrix-Matrix product using ranges ------
 - Device Name: GeForce GTX TITAN
 - Execution time on device (no setup time included): 0.005956
 - GFLOPs (counting multiply&add as separate operations): 360.558

 ------ Benchmark 3: Matrix-Matrix product using slices ------
 - Device Name: GeForce GTX TITAN
 - Execution time on device (no setup time included): 0.007147
 - GFLOPs (counting multiply&add as separate operations): 300.473

 ------ Benchmark 4: LU factorization ------
 - Device Name: GeForce GTX TITAN
 - Execution time on device (no setup time included): 0.825527
 - GFLOPs (counting multiply&add as separate operations): 20.8108
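
The GFLOPs lines in both logs follow from the execution time once a problem size is fixed. The logs do not state the matrix dimensions, so the sizes below are an assumption inferred from the numbers: a square 2048 x 2048 product for Benchmarks 1 and 4, and a 1024 x 1024 sub-block for the range and slice variants. The helper name `gflops` is hypothetical and is not part of the benchmark itself. A minimal sketch, counting each multiply and each add as one operation (2 * n^3 in total):

```python
def gflops(n, seconds):
    """GFLOPs for an n x n by n x n dense product.

    Counts multiply and add as separate operations, i.e. 2 * n**3
    floating-point operations in total. The matrix size n is an
    assumption; the logs do not print it.
    """
    return 2.0 * n ** 3 / seconds / 1e9


# Times copied from the logs above; sizes 2048 and 1024 are inferred, not logged.
cuda_full = gflops(2048, 0.082591)     # cuda.log, single precision, Benchmark 1
cuda_ranges = gflops(1024, 0.010358)   # cuda.log, single precision, Benchmark 2
opencl_full = gflops(2048, 0.024256)   # opencl.log, single precision, Benchmark 1

print(f"CUDA   full product : {cuda_full:.3f} GFLOPs")
print(f"CUDA   ranges       : {cuda_ranges:.3f} GFLOPs")
print(f"OpenCL full product : {opencl_full:.3f} GFLOPs")
print(f"OpenCL / CUDA ratio : {opencl_full / cuda_full:.2f}x")
```

Under these assumed sizes the computed values agree with the logged GFLOPs figures to within rounding, which suggests that the LU factorization entries use the same 2 * n^3 operation count as the matrix products.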