@kristianlm
Created November 19, 2012 00:50
Running ViennaCL on Amazon GPU cluster (cg1.4xlarge)
[ec2-user@ip-10-33-4-246 grub]$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 3D controller: nVidia Corporation GF100 [Tesla S2050] (rev a3)
00:04.0 3D controller: nVidia Corporation GF100 [Tesla S2050] (rev a3)
00:05.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
[ec2-user@ip-10-33-4-246 benchmarks]$ ./blas3bench
----------------------------------------------
Device Info
----------------------------------------------
CL Device Vendor ID: 4318
CL Device Name: Tesla M2050
CL Driver Version: 304.43
--------------------------------
CL Device Max Compute Units: 14
CL Device Max Work Group Size: 1024
CL Device Global Mem Size: 2817982464
CL Device Local Mem Size: 49152
----------------------------------------------
----------------------------------------------
## Benchmark :: Dense Matrix-Matrix product
----------------------------------------------
-------------------------------
# benchmarking single-precision
-------------------------------
------ Benchmark 1: Matrix-Matrix product ------
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.006588
- GFLOPs (counting multiply&add as one operation): 162.984
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.006632
- GFLOPs (counting multiply&add as one operation): 161.903
------ Benchmark 2: Matrix-Matrix product using ranges ------
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.001577
- GFLOPs (counting multiply&add as one operation): 85.1095
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.001575
- GFLOPs (counting multiply&add as one operation): 85.2176
------ Benchmark 3: Matrix-Matrix product using slices ------
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.001717
- GFLOPs (counting multiply&add as one operation): 78.1699
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.001669
- GFLOPs (counting multiply&add as one operation): 80.4181
-------------------------------
# benchmarking double-precision
-------------------------------
------ Benchmark 1: Matrix-Matrix product ------
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.014737
- GFLOPs (counting multiply&add as one operation): 72.8603
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.014698
- GFLOPs (counting multiply&add as one operation): 73.0536
------ Benchmark 2: Matrix-Matrix product using ranges ------
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.002634
- GFLOPs (counting multiply&add as one operation): 50.9559
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.002553
- GFLOPs (counting multiply&add as one operation): 52.5726
------ Benchmark 3: Matrix-Matrix product using slices ------
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.002977
- GFLOPs (counting multiply&add as one operation): 45.0849
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.002863
- GFLOPs (counting multiply&add as one operation): 46.8801
[ec2-user@ip-10-33-4-246 benchmarks]$ ./vectorbench
----------------------------------------------
Device Info
----------------------------------------------
CL Device Vendor ID: 4318
CL Device Name: Tesla M2050
CL Driver Version: 304.43
--------------------------------
CL Device Max Compute Units: 14
CL Device Max Work Group Size: 1024
CL Device Global Mem Size: 2817982464
CL Device Local Mem Size: 49152
----------------------------------------------
----------------------------------------------
## Benchmark :: Vector
----------------------------------------------
-------------------------------
# benchmarking single-precision
-------------------------------
------- Vector inner products ----------
CPU time: 0.252503
CPU GFLOPS: 0.11881
Result:1.58445e+08
GPU time: 0.004479
GPU GFLOPS: 6.69792
Result: 1.58455e+08
------- Vector addition ----------
CPU time: 0.330705
CPU GFLOPS: 0.0907153
GPU time: 0.004666
GPU GFLOPS: 6.42949
------- Vector multiply add ----------
CPU time: 0.250788
CPU GFLOPS: 0.119623
GPU time: 0.00466
GPU GFLOPS: 6.43777
------- Vector complicated expression ----------
CPU time: 0.496616
CPU GFLOPS: 0.181227
GPU time: 0.055736
GPU GFLOPS: 1.61476
-------------------------------
# benchmarking double-precision
-------------------------------
------- Vector inner products ----------
CPU time: 0.25693
CPU GFLOPS: 0.116763
Result:2.01213e+08
GPU time: 0.007718
GPU GFLOPS: 3.88702
Result: 2.01213e+08
------- Vector addition ----------
CPU time: 0.339278
CPU GFLOPS: 0.0884231
GPU time: 0.008573
GPU GFLOPS: 3.49936
------- Vector multiply add ----------
CPU time: 0.256333
CPU GFLOPS: 0.117035
GPU time: 0.008537
GPU GFLOPS: 3.51412
------- Vector complicated expression ----------
CPU time: 0.501489
CPU GFLOPS: 0.179466
GPU time: 0.066808
GPU GFLOPS: 1.34714
[ec2-user@ip-10-33-4-246 sys]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5866.80
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 1
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5866.22
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 2
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5865.16
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5865.10
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 4
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 0
cpu cores : 4
apicid : 16
initial apicid : 16
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5868.28
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 5
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 1
cpu cores : 4
apicid : 18
initial apicid : 18
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5846.82
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 6
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 2
cpu cores : 4
apicid : 20
initial apicid : 20
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5861.83
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 3
cpu cores : 4
apicid : 22
initial apicid : 22
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5525.50
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 8
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5746.34
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 9
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 1
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5866.53
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 10
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 2
cpu cores : 4
apicid : 5
initial apicid : 5
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5866.61
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 11
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5866.53
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 12
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 0
cpu cores : 4
apicid : 17
initial apicid : 17
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5869.50
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 13
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 1
cpu cores : 4
apicid : 19
initial apicid : 19
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5866.48
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 14
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 2
cpu cores : 4
apicid : 21
initial apicid : 21
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5869.71
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 15
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 3
cpu cores : 4
apicid : 23
initial apicid : 23
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5866.38
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: