@simonbyrne
Created April 25, 2019 15:50
$ mpiexec -debug -n 2 julia --project test/test_reduce.jl
[00:7284] host tree:
[00:7284] host: MSEDGEWIN10, parent: 0, id: 1
[00:7284] mpiexec started smpd manager listening on port 4439e991-40a0-4df8-9c51-55de66a74419
[00:7284] create manager process (using mpiexec credentials)
[00:7284] Launching smpd as 'C:\Program Files\Microsoft MPI\Bin\smpd.exe "C:\Program Files\Microsoft MPI\Bin\smpd.exe" -p 8677 -d 11 -mgr 256 "job" -localonly'
[00:7284] smpd reading the port string from the manager
[-1:7112] Launching smpd manager instance.
[-1:7112] created set for manager listener 484
[-1:7112] smpd manager listening on port 189aabd2-52fd-4bc1-b80f-886454b16dc6
[00:7284] closing the pipe to the manager
[00:7284] MSEDGEWIN10 posting a re-connect to MSEDGEWIN10:189aabd2-52fd-4bc1-b80f-886454b16dc6 in left child context.
[-1:7112] version check complete, using PMP version 4.
[-1:7112] Received session header from parent id=1, parent=0, level=0
[01:7112] Connecting back to parent using host MSEDGEWIN10 and endpoint 4439e991-40a0-4df8-9c51-55de66a74419
[00:7284] version check complete, using PMP version 4.
[00:7284] posting command SMPD_COLLECT to left child, src=0, dest=1.
[01:7112] handling command SMPD_COLLECT src=0
[00:7284] Handling cmd=SMPD_COLLECT result
[00:7284] cmd=SMPD_COLLECT result will be handled locally
[00:7284] Finished collecting hardware summary.
[00:7284] posting command SMPD_STARTDBS to left child, src=0, dest=1.
[01:7112] handling command SMPD_STARTDBS src=0
[01:7112] sending start_dbs result command kvs = 14feea71-98fe-4215-9145-53d65207f3cd.
[00:7284] Handling cmd=SMPD_STARTDBS result
[00:7284] cmd=SMPD_STARTDBS result will be handled locally
[00:7284] start_dbs succeeded, kvs_name: '14feea71-98fe-4215-9145-53d65207f3cd', domain_name: '9f6fed5a-0807-4fa3-ad8b-34ab5336517f'
[00:7284] creating a process group of size 2 on node 0 called 14feea71-98fe-4215-9145-53d65207f3cd
[00:7284] launching the processes.
[00:7284] posting command SMPD_LAUNCH to left child, src=0, dest=1.
[01:7112] handling command SMPD_LAUNCH src=0
[01:7112] Successfully handled bcast nodeids command.
[01:7112] setting environment variable: <MPIEXEC_HOSTNAME> = <MSEDGEWIN10>
[01:7112] env: PMI_SIZE=2
[01:7112] env: PMI_KVS=14feea71-98fe-4215-9145-53d65207f3cd
[01:7112] env: PMI_DOMAIN=9f6fed5a-0807-4fa3-ad8b-34ab5336517f
[01:7112] env: PMI_HOST=localhost
[01:7112] env: PMI_PORT=189aabd2-52fd-4bc1-b80f-886454b16dc6
[01:7112] env: PMI_SMPD_ID=1
[01:7112] env: PMI_APPNUM=0
[01:7112] env: PMI_SPAWN=0
[01:7112] env: PMI_NODE_IDS=smp_region_7112
[01:7112] env: PMI_RANK_AFFINITIES=affinity_region_7112
[01:7112] searching for 'julia' in workdir 'C:\Users\IEUser\.julia\dev\MPI'
[01:7112] searching for 'julia' in path ''
[01:7112] searching for 'julia' in system path
[01:7112] C:\Users\IEUser\.julia\dev\MPI>CreateProcess(C:\Users\IEUser\AppData\Local\Julia-1.1.0\bin\julia.exe julia --project test/test_reduce.jl)
[01:7112] env: PMI_RANK=1
[01:7112] env: PMI_SMPD_KEY=0
[01:7112] C:\Users\IEUser\.julia\dev\MPI>CreateProcess(C:\Users\IEUser\AppData\Local\Julia-1.1.0\bin\julia.exe julia --project test/test_reduce.jl)
[01:7112] env: PMI_RANK=0
[01:7112] env: PMI_SMPD_KEY=1
[00:7284] Handling cmd=SMPD_LAUNCH result
[00:7284] cmd=SMPD_LAUNCH result will be handled locally
[00:7284] successfully launched process 1
[00:7284] successfully launched process 0
[00:7284] root process launched, starting stdin redirection.
[01:7112] version check complete, using PMP version 4.
[01:7112] forwarding command SMPD_INIT src=1 ctx_key=1
[01:7112] 1 -> 0 : returning parent_context: 0 < 1
[01:7112] posting command SMPD_INIT to parent, src=1, ctx_key=1, dest=0.
[00:7284] handling command SMPD_INIT src=1 ctx_key=1
[00:7284] init: 0:2:14feea71-98fe-4215-9145-53d65207f3cd
[01:7112] Handling cmd=SMPD_INIT result
[01:7112] forward SMPD_INIT result to dest=1 ctx_key=1
[01:7112] version check complete, using PMP version 4.
[01:7112] forwarding command SMPD_INIT src=1 ctx_key=0
[01:7112] 1 -> 0 : returning parent_context: 0 < 1
[01:7112] posting command SMPD_INIT to parent, src=1, ctx_key=0, dest=0.
[00:7284] handling command SMPD_INIT src=1 ctx_key=0
[00:7284] init: 1:2:14feea71-98fe-4215-9145-53d65207f3cd
[01:7112] Handling cmd=SMPD_INIT result
[01:7112] forward SMPD_INIT result to dest=1 ctx_key=0
[01:7112] handling command SMPD_BCPUT src=1 ctx_key=1
[01:7112] Handling SMPD_BCPUT command from smpd 1
ctx_key=1
rank=0
value=port=0 description="10.0.2.15 MSEDGEWIN10 " shm_host=MSEDGEWIN10 shm_queue=4368:816
result=success
[01:7112] handling command SMPD_BARRIER src=1 ctx_key=1
[01:7112] Handling SMPD_BARRIER src=1 ctx_key=1
[01:7112] initializing barrier(14feea71-98fe-4215-9145-53d65207f3cd): in=1 size=2
[01:7112] incrementing barrier(14feea71-98fe-4215-9145-53d65207f3cd) incount from 0 to 1 out of 2
[01:7112] handling command SMPD_BCPUT src=1 ctx_key=0
[01:7112] Handling SMPD_BCPUT command from smpd 1
ctx_key=0
rank=1
value=port=0 description="10.0.2.15 MSEDGEWIN10 " shm_host=MSEDGEWIN10 shm_queue=6360:816
result=success
[01:7112] handling command SMPD_BARRIER src=1 ctx_key=0
[01:7112] Handling SMPD_BARRIER src=1 ctx_key=0
[01:7112] incrementing barrier(14feea71-98fe-4215-9145-53d65207f3cd) incount from 1 to 2 out of 2
[01:7112] all in barrier, release the barrier.
[01:7112] sending reply to barrier command '14feea71-98fe-4215-9145-53d65207f3cd'.
[01:7112] sending reply to barrier command '14feea71-98fe-4215-9145-53d65207f3cd'.
[01:7112] handling command SMPD_BCGET src=1 ctx_key=0
[01:7112] Handling SMPD_BCGET command from smpd 1
ctx_key=0
rank=0
value=port=0 description="10.0.2.15 MSEDGEWIN10 " shm_host=MSEDGEWIN10 shm_queue=4368:816
result=success
[01:7112] handling command SMPD_BCGET src=1 ctx_key=1
[01:7112] Handling SMPD_BCGET command from smpd 1
ctx_key=1
rank=1
value=port=0 description="10.0.2.15 MSEDGEWIN10 " shm_host=MSEDGEWIN10 shm_queue=6360:816
result=success
[01:7112] reading failed, assuming stdout is closed. error 0xc000014b
[01:7112] process_id=1 process refcount == 2, stdout closed.
[01:7112] reading failed, assuming stderr is closed. error 0xc000014b
[01:7112] process_id=1 process refcount == 1, stderr closed.
[01:7112] process_id=1 process refcount == 0, pmi client closed.
[01:7112] process_id=1 rank=0 refcount=0, waiting for the process to finish exiting.
[01:7112] creating an exit command for process id=1 rank=0, pid=4368, exit code=-1073741819.
[01:7112] posting command SMPD_EXIT to parent, src=1, dest=0.
[00:7284] handling command SMPD_EXIT src=1
[00:7284] saving exit code: rank 0, exitcode -1073741819, pg <14feea71-98fe-4215-9145-53d65207f3cd>
[00:7284] Suspending rank 1, smpd id = 1, ctx_key=0
[00:7284] posting command SMPD_SUSPEND to left child, src=0, dest=1.
[01:7112] handling command SMPD_SUSPEND src=0
[01:7112] suspending proc_id=0 succeeded, sending result to parent context
[01:7112] Handling cmd=SMPD_EXIT result
[01:7112] cmd=SMPD_EXIT result will be handled locally
[00:7284] Handling cmd=SMPD_SUSPEND result
[00:7284] cmd=SMPD_SUSPEND result will be handled locally
[00:7284] suspended rank 0 already exited, no need to kill it.
[00:7284] posting kill command smpd id=1, ctx_key=0
[00:7284] posting command SMPD_KILL to left child, src=0, dest=1.
[01:7112] handling command SMPD_KILL src=0
[01:7112] process_id=0 process refcount == 2, pmi client closed.
[01:7112] process_id=0 rank=1 refcount=2, waiting for the process to finish exiting.
[01:7112] creating an exit command for process id=0 rank=1, pid=6360, exit code=-1.
[01:7112] posting command SMPD_EXIT to parent, src=1, dest=0.
[01:7112] reading failed, assuming stdout is closed. error 0xc000014b
[01:7112] reading failed, assuming stderr is closed. error 0xc000014b
[00:7284] Handling cmd=SMPD_KILL result
[00:7284] cmd=SMPD_KILL result will be handled locally
[01:7112] handling command SMPD_CLOSE from parent
[01:7112] sending 'closed' command to parent context
[01:7112] posting command SMPD_CLOSED to parent, src=1, dest=0.
[00:7284] handling command SMPD_EXIT src=1
job aborted:
[ranks] message
[0] process exited without calling finalize
[1] terminated
---- error analysis -----
[0] on MSEDGEWIN10
julia ended prematurely and may have crashed. exit code 0xc0000005
---- error analysis -----
[00:7284] last process exited, tearing down the job tree num_exited=2 num_procs=2.
[01:7112] Handling cmd=SMPD_EXIT result
[01:7112] cmd=SMPD_EXIT result will be handled locally
[00:7284] handling command SMPD_CLOSED src=1
[00:7284] closed command received from left child.
[01:7112] Handling cmd=SMPD_CLOSED result
[01:7112] cmd=SMPD_CLOSED result will be handled locally
[01:7112] smpd manager successfully stopped listening.
[01:7112] SMPD exiting with error code 0.
[00:7284] closed context with error 1726.
[00:7284] smpd manager successfully stopped listening.
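
Note on the exit codes: the log reports rank 0's exit code twice, once as -1073741819 and once as 0xc0000005. These are the same 32-bit value, printed first as a signed decimal and then as unsigned hex. A minimal Python sketch of the conversion follows; the helper name `to_ntstatus` is hypothetical and is not part of MS-MPI or MPI.jl. 0xC0000005 is the Windows NTSTATUS code STATUS_ACCESS_VIOLATION, which suggests the Julia process on rank 0 hit a memory access fault rather than exiting on its own.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned hex.

    Hypothetical helper for reading the log above; masking with
    0xFFFFFFFF maps a negative Python int onto its 32-bit
    two's-complement bit pattern.
    """
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


# Exit code reported by smpd for rank 0 (pid 4368) in the log
print(to_ntstatus(-1073741819))
```

The same conversion applies to any negative exit code that smpd reports on Windows. It only decodes the value and does not identify which call in the test triggered the fault.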
@qisun1995

So, do you know how to solve this problem? I am confused.

@simonbyrne
Author

See discussion at JuliaParallel/MPI.jl#246