Benchmarks from http://qiita.com/telmin_orca/items/2d30323a7c96db929ecf run on a Xeon E5-1650 v3
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
stepping : 2
microcode : 0x38
cpu MHz : 1199.920
cache size : 15360 KB
physical id : 0
siblings : 12
core id : 0
cpu cores : 6
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 15
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb intel_ppin tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts
bugs :
bogomips : 6997.99
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
DDR4-2133 ECC DR 16GB x 4 DIMMs
BIOS Information
    Vendor: American Megatrends Inc.
    Version: P3.20
    Release Date: 08/16/2016
    BIOS Revision: 5.11
Base Board Information
    Manufacturer: ASRock
    Product Name: X99M Extreme4
Memory Device
    Array Handle: 0x000E
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 72 bits
    Size: 16384 MB
    Form Factor: RIMM
    Set: None
    Locator: DIMM_A1
    Bank Locator: NODE 1
    Type: <OUT OF SPEC>
    Type Detail: Synchronous
    Speed: 2133 MHz
    Manufacturer: Micron
    Asset Tag: DIMM_A1_AssetTag
    Part Number: 36ASF2G72PZ-2G1A2
    Rank: 2
    Configured Clock Speed: 2133 MHz
$ OMP_NUM_THREADS=6 ./fftw
3-D FFT: 128 x 128 x 128
On-board: 0.014684 msec, 14.995488 GFLOPS.
On-board: 0.009763 msec, 22.554633 GFLOPS.
3-D FFT: 256 x 256 x 256
On-board: 0.277897 msec, 7.244636 GFLOPS.
On-board: 0.268828 msec, 7.489059 GFLOPS.
$ OMP_NUM_THREADS=12 ./fftw
3-D FFT: 128 x 128 x 128
On-board: 0.010458 msec, 21.055685 GFLOPS.
On-board: 0.008329 msec, 26.439007 GFLOPS.
3-D FFT: 256 x 256 x 256
On-board: 0.279447 msec, 7.204465 GFLOPS.
On-board: 0.272284 msec, 7.393980 GFLOPS.
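A note on reading the FFT numbers: the reported GFLOPS values are consistent with the conventional estimate of 5 N log2(N) floating-point operations for a complex FFT of N points, provided the time column is read as seconds rather than milliseconds. That reading is an inference from the numbers, not something the benchmark source confirms. A minimal sketch of the check (the function name `fft_gflops` is made up for illustration):

```python
import math


def fft_gflops(n_per_dim: int, elapsed_seconds: float) -> float:
    """Estimate GFLOPS for a cubic 3-D complex FFT.

    Assumes the conventional 5 * N * log2(N) flop count, where N is the
    total number of points (n_per_dim ** 3). This is an estimate commonly
    used for FFT benchmarks, not an exact operation count.
    """
    n_total = n_per_dim ** 3
    flops = 5.0 * n_total * math.log2(n_total)
    return flops / elapsed_seconds / 1e9


# Times taken from the 6-thread run, interpreted as seconds (assumption).
print(fft_gflops(128, 0.014684))  # close to the reported 14.995488
print(fft_gflops(256, 0.277897))  # close to the reported 7.244636
```

The small differences from the logged values come from the elapsed time being printed with only six decimal places.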
M 10240 N 10240 K 10240 al -1 b 1
Dgemm start
memory use 2.34375 GB
3:3:16 initialize done.
254.995 GFlops, 1.73491
Sgemm start
3:3:51 initialize done.
511.707 GFlops, 5.02656
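The "memory use 2.34375 GB" line for the Dgemm run matches the footprint of three 10240 x 10240 double-precision matrices (A, B and C) when counted in GiB. The sketch below assumes that is what the benchmark allocates; the helper name `gemm_matrix_bytes` is hypothetical.

```python
def gemm_matrix_bytes(m: int, n: int, k: int, elem_size: int = 8) -> int:
    """Bytes needed for A (m x k), B (k x n) and C (m x n).

    elem_size=8 corresponds to double precision (an assumption about
    the Dgemm run; use 4 for single precision).
    """
    return (m * k + k * n + m * n) * elem_size


size = 10240
gib = gemm_matrix_bytes(size, size, size) / 2**30
print(gib)  # 2.34375
```

The second number printed on each GFlops line is not explained by the output itself, so it is left uninterpreted here.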
$ mpirun -np 1 ./lu2_mpi -n 32768
(omitted)
Error = 5.930995e-08 1.890872e-09 [41/1489]
update time= 2.91993e+11 ops/cycle= 80.0177
update matmul time= 2.77436e+11 ops/cycle= 37.9941
update swap+bcast time= 1.27753e+10 ops/cycle= 0.0344883
total time= 6.09222e+11 ops/cycle= 38.5019
rfact time= 3.14754e+11 ops/cycle= 0.00179896
ldmul time= 3.53161e+10 ops/cycle= 58.3752
colum dec with trans time= 1.60102e+11 ops/cycle= 0.00335411
colum dec right time= 8.19711e+10 ops/cycle= 6.95994
colum dec left time= 3.63658e+08 ops/cycle= 0.0919088
rowtocol time= 2.27866e+10 ops/cycle= 0.0235665
column dec in trans time= 1.24084e+11 ops/cycle= 0.0346219
coltorow time= 1.32282e+10 ops/cycle= 0.0405952
dgemm8 time= 5.89311e+09 ops/cycle= 0.728812
dgemm16 time= 5.34285e+09 ops/cycle= 1.60774
dgemm32 time= 5.71966e+09 ops/cycle= 3.00365
dgemm64 time= 6.63841e+09 ops/cycle= 5.1759
dgemm128 time= 5.24282e+09 ops/cycle= 13.1074
main dgemm time= 3.07159e+11 ops/cycle= 69.4599
col trsm time= 8.79853e+09 ops/cycle= 5.64071
col update time= 7.27637e+10 ops/cycle= 0.629603
col r dgemm time= 5.02651e+10 ops/cycle= 21.7888
col right misc time= 2.24946e+10 ops/cycle= 0.0954667
backsub time= 8.72095e+08 ops/cycle= 1.23122
col dec total time= 2.42443e+11 ops/cycle= 0.00233553
DGEMM2k time= 3.39549e+11 ops/cycle= 68.906
DGEMM1k time= 1.12077e+10 ops/cycle= 58.2487
DGEMM512 time= 8.232e+09 ops/cycle= 37.5653
DGEMMrest time= 3.83344e+10 ops/cycle= 7.44828
TRSM U time= 5.40459e+09 ops/cycle= 29.6683
col r swap/scale time= 4.05956e+08 ops/cycle= 0.164665
rfact ex coldec time= 7.22276e+10 ops/cycle= 0
ld_phase1 time= 59263 ops/cycle= 1132.39
rfact misc1 time= 5.00091e+09 ops/cycle= 0
rfact misc2 time= 3.16592e+10 ops/cycle= 0
rfact misc3 time= 2.75225e+09 ops/cycle= 0
rfact misc4 time= 3.28153e+10 ops/cycle= 0
copysubmats time= 1.78958e+10 ops/cycle= 0.128906
============================================================================ [3/1489]
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR01R2C32      32768  2048     1     1             174.36          1.345e+02
----------------------------------------------------------------------------
============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR01R2C32      32768  2048     1     1             174.36          1.345e+02
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0348843 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0280407 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0048949 ...... PASSED
============================================================================
Finished 1 tests with the following results:
1 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
----------------------------------------------------------------------------
End of Tests.
============================================================================
cpusec = 1578.84 wsec=174.364 134.525 Gflops
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0348843 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0280407 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0048949 ...... PASSED
============================================================================
Finished 1 tests with the following results:
1 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
----------------------------------------------------------------------------
End of Tests.
============================================================================
cpusec = 1578.84 wsec=174.364 134.525 Gflops
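The final "134.525 Gflops" figure can be reproduced from N and the wall-clock time using the leading-order LU operation count of 2N^3/3. Whether lu2_mpi counts exactly this, or also includes lower-order terms, is not visible in the output, so the sketch below is only a consistency check under that assumption (the function name `lu_gflops` is made up for illustration).

```python
def lu_gflops(n: int, wall_seconds: float) -> float:
    """GFLOPS for solving an n x n dense system by LU factorization.

    Uses only the leading-order term 2/3 * n**3 (assumption); the
    lower-order O(n**2) terms change the result by well under 0.01%
    at this problem size.
    """
    flops = 2.0 * n**3 / 3.0
    return flops / wall_seconds / 1e9


# N and wsec are taken from the log above.
print(lu_gflops(32768, 174.364))  # close to the reported 134.525
```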
$ uname -a
Linux vivace 4.10.0-gentoo #1 SMP Tue Feb 21 00:33:47 JST 2017 x86_64 Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz GenuineIntel GNU/Linux
$ gcc --version
gcc (Gentoo 6.3.0 p1.0) 6.3.0
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
fftw: 3.3.4
openblas: 0.2.19
openmpi: 2.0.2
$ ./stream_cxx.out --arraysize 409600000
-------------------------------------------------------------
STREAM version $Revision : 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array Size = 409600000 (elements), Offset = 0 (elements)
Memory per array = 3125 MiB (= 3.05176 GiB).
Total Memory required = 9375 MiB (= 9.15527 GiB).
Each kernel will be executed 10 times.
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           30946.3     0.214962     0.211773     0.228922
Scale:          31006.5     0.213416     0.211362     0.216842
Add:            33919.8     0.290569     0.289813     0.292838
Triad:          33978.6     0.291207     0.289311     0.295740
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
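For reference, the "Best Rate" column follows the usual STREAM convention: bytes moved per kernel (two arrays for Copy and Scale, three for Add and Triad) divided by the minimum time, in units of 10^6 bytes per second. The sketch below recomputes the rates from the log and compares Triad against a theoretical peak. The peak figure assumes the four DDR4-2133 DIMMs each sit on their own 64-bit channel, which the dmidecode output suggests but does not prove; all names in the code are illustrative.

```python
ARRAY_ELEMENTS = 409_600_000   # from --arraysize
BYTES_PER_ELEMENT = 8          # "8 bytes per array element"

# Number of arrays each kernel reads or writes (STREAM convention).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Minimum times in seconds, copied from the table above.
MIN_TIME = {"Copy": 0.211773, "Scale": 0.211362,
            "Add": 0.289813, "Triad": 0.289311}


def best_rate_mb_s(kernel: str) -> float:
    """Best rate in MB/s (1 MB = 10**6 bytes) for one kernel."""
    moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_ELEMENTS
    return moved / MIN_TIME[kernel] / 1e6


for name in ARRAYS_TOUCHED:
    print(f"{name}: {best_rate_mb_s(name):.1f} MB/s")

# Assumed peak: 4 channels x 2133 MT/s x 8 bytes per transfer.
peak_mb_s = 4 * 2133 * 8
print(f"Triad / assumed peak: {best_rate_mb_s('Triad') / peak_mb_s:.2f}")
```

Under that assumption, Triad reaches roughly half of the theoretical memory bandwidth.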