Created November 6, 2021 17:12
nvcr.io/nvidia/hpc-benchmarks:21.4-hpl xhpl crash
.....
Prog= 85.55% N_left= 1188288 Time= 870.74 Time_left= 147.12 iGF= 6253382.31 GF= 7603226.15 iGF_per= 12407.50 GF_per= 15085.77
Prog= 85.94% N_left= 1177344 Time= 876.23 Time_left= 143.33 iGF= 5573961.76 GF= 7590502.86 iGF_per= 11059.45 GF_per= 15060.52
Prog= 86.35% N_left= 1165824 Time= 881.27 Time_left= 139.31 iGF= 6271848.56 GF= 7582957.89 iGF_per= 12444.14 GF_per= 15045.55
Prog= 86.75% N_left= 1154304 Time= 886.36 Time_left= 135.37 iGF= 6095663.79 GF= 7574422.65 iGF_per= 12094.57 GF_per= 15028.62
Prog= 87.12% N_left= 1143360 Time= 891.10 Time_left= 131.69 iGF= 6100617.39 GF= 7566590.79 iGF_per= 12104.40 GF_per= 15013.08
Prog= 88.60% N_left= 1097856 Time= 910.53 Time_left= 117.14 iGF= 5880744.58 GF= 7530604.49 iGF_per= 11668.14 GF_per= 14941.68
Prog= 89.96% N_left= 1052352 Time= 929.69 Time_left= 103.75 iGF= 5491677.90 GF= 7488589.51 iGF_per= 10896.19 GF_per= 14858.31
Prog= 91.19% N_left= 1007424 Time= 948.52 Time_left= 91.61 iGF= 5061996.93 GF= 7440414.22 iGF_per= 10043.64 GF_per= 14762.73
Prog= 92.33% N_left= 961920 Time= 966.67 Time_left= 80.27 iGF= 4862198.60 GF= 7392002.82 iGF_per= 9647.22 GF_per= 14666.67
Prog= 93.36% N_left= 916992 Time= 983.82 Time_left= 70.00 iGF= 4624393.64 GF= 7343750.82 iGF_per= 9175.38 GF_per= 14570.93
Prog= 94.30% N_left= 871488 Time= 1001.26 Time_left= 60.54 iGF= 4174093.56 GF= 7288545.19 iGF_per= 8281.93 GF_per= 14461.40
Prog= 95.15% N_left= 825984 Time= 1017.29 Time_left= 51.90 iGF= 4092061.82 GF= 7238193.34 iGF_per= 8119.17 GF_per= 14361.49
[vla3-7009-hpl-test:65524:0:65651] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x80)
==== backtrace (tid: 65651) ====
0 0x0000000000026975 ucs_debug_print_backtrace() /build-result/src/hpcx-v2.8.1-gcc-MLNX_OFED_LINUX-5.1-0.6.6.0-ubuntu16.04-x86_64/ucx-96422ce/src/ucs/debug/debug.c:656
1 0x0000000000011390 __funlockfile() ???:0
2 0x00000000004c4430 __kmpc_omp_task_with_deps() ???:0
3 0x00000000004cba51 __kmpc_omp_task_alloc() ???:0
4 0x00000000004d109d __kmpc_omp_task_alloc() ???:0
5 0x000000000051cab7 __kmp_external___intel_sse2_strspn() ???:0
6 0x00000000005212f4 __kmp_external___intel_sse2_strspn() ???:0
7 0x0000000000498f40 omp_in_parallel() ???:0
8 0x00000000004dba60 __kmp_external___intel_sse2_strspn() ???:0
9 0x00000000000076ba start_thread() ???:0
10 0x000000000010751d clone() ???:0
=================================
[vla3-7009-hpl-test:65524] *** Process received signal ***
[vla3-7009-hpl-test:65524] Signal: Segmentation fault (11)
[vla3-7009-hpl-test:65524] Signal code: (-6)
[vla3-7009-hpl-test:65524] Failing at address: 0xfff4
[vla3-7009-hpl-test:65524] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f6d03f8c390]
[vla3-7009-hpl-test:65524] [ 1] /opt/hpc-benchmarks/workspace/hpl-linux-x86_64/xhpl[0x4c4430]
[vla3-7009-hpl-test:65524] [ 2] /opt/hpc-benchmarks/workspace/hpl-linux-x86_64/xhpl[0x4cba51]
[vla3-7009-hpl-test:65524] [ 3] /opt/hpc-benchmarks/workspace/hpl-linux-x86_64/xhpl[0x4d109d]
[vla3-7009-hpl-test:65524] [ 4] /opt/hpc-benchmarks/workspace/hpl-linux-x86_64/xhpl[0x51cab7]
[vla3-7009-hpl-test:65524] [ 5] /opt/hpc-benchmarks/workspace/hpl-linux-x86_64/xhpl[0x5212f4]
[vla3-7009-hpl-test:65524] [ 6] /opt/hpc-benchmarks/workspace/hpl-linux-x86_64/xhpl[0x498f40]
[vla3-7009-hpl-test:65524] [ 7] /opt/hpc-benchmarks/workspace/hpl-linux-x86_64/xhpl[0x4dba60]
[vla3-7009-hpl-test:65524] [ 8] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f6d03f826ba]
[vla3-7009-hpl-test:65524] [ 9] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f6d03cb851d]
[vla3-7009-hpl-test:65524] *** End of error message ***
./hpl.sh: line 321: 65524 Segmentation fault (core dumped) numactl --physcpubind=${CPU} ${MEMBIND} ${XHPL} ${DAT}
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[8832,1],501]
Exit code: 139
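The crash is caused by /opt/intel/compilers_and_libraries_2020.4.304 (the failing frames are in the `__kmpc_*` OpenMP runtime symbols inside xhpl). As a workaround, use the nvcr.io/nvidia/hpc-benchmarks:20.10-hpl Docker image instead.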
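
The `Exit code: 139` reported by mpirun matches the segmentation fault in the log: by shell convention, a process killed by a signal is reported as 128 plus the signal number, and 139 - 128 = 11 (SIGSEGV). A minimal Python sketch of that decoding follows; the helper name `describe_exit_status` is hypothetical and is not part of hpl.sh or mpirun.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Translate a shell-style exit status into a readable description.

    Assumes the common shell convention that a status above 128 means
    "terminated by signal (status - 128)".
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            return f"killed by unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited with code {status}"


if __name__ == "__main__":
    # 139 is the exit code mpirun reported for rank 501 in the log above.
    print(describe_exit_status(139))
```

On Linux this reports SIGSEGV for 139, which agrees with the `Signal: Segmentation fault (11)` line in the log.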