@lcw
Last active February 24, 2023 18:50
Julia 1.9 MPI OOM
[176985] signal (15): Terminated
in expression starting at none:0
jl_update_all_fptrs at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/staticdata.c:1871 [inlined]
jl_restore_system_image_from_stream_ at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/staticdata.c:3228
jl_restore_system_image_from_stream at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/staticdata.c:3359 [inlined]
ijl_restore_system_image_data at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/staticdata.c:3422
jl_load_sysimg_so at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/staticdata.c:498 [inlined]
ijl_restore_system_image at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/staticdata.c:3395
_finish_julia_init at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/init.c:812
julia_init at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/init.c:799
jl_repl_entrypoint at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/jlapi.c:711
main at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/cli/loader_exe.c:59
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
unknown function (ip: (nil))
Allocations: 0 (Pool: 0; Big: 0); GC: 0
[176987] signal (15): Terminated
in expression starting at none:0
[176987] signal (11.1): Segmentation fault
in expression starting at none:0
[176987] signal (6.-6): Aborted
in expression starting at none:0
[176993] signal (15): Terminated
in expression starting at none:0
[176993] signal (11.1): Segmentation fault
in expression starting at none:0
[176993] signal (6.-6): Aborted
in expression starting at none:0
[176995] signal (15): Terminated
in expression starting at none:0
[176995] signal (11.1): Segmentation fault
in expression starting at none:0
[176995] signal (6.-6): Aborted
in expression starting at none:0
[176997] signal (15): Terminated
in expression starting at none:0
[176997] signal (11.1): Segmentation fault
in expression starting at none:0
[176997] signal (6.-6): Aborted
in expression starting at none:0
[176999] signal (15): Terminated
in expression starting at none:0
[176999] signal (11.1): Segmentation fault
in expression starting at none:0
[176999] signal (6.-6): Aborted
in expression starting at none:0
[177002] signal (15): Terminated
in expression starting at none:0
[177002] signal (11.1): Segmentation fault
in expression starting at none:0
[177002] signal (6.-6): Aborted
in expression starting at none:0
[177006] signal (15): Terminated
in expression starting at none:0
[177006] signal (11.1): Segmentation fault
in expression starting at none:0
[177006] signal (6.-6): Aborted
in expression starting at none:0
[177008] signal (15): Terminated
in expression starting at none:0
[177008] signal (11.1): Segmentation fault
in expression starting at none:0
[177008] signal (6.-6): Aborted
in expression starting at none:0
[177011] signal (15): Terminated
in expression starting at none:0
[177011] signal (11.1): Segmentation fault
in expression starting at none:0
[177011] signal (6.-6): Aborted
in expression starting at none:0
[177014] signal (15): Terminated
in expression starting at none:0
[177014] signal (11.1): Segmentation fault
in expression starting at none:0
[177014] signal (6.-6): Aborted
in expression starting at none:0
[177018] signal (15): Terminated
in expression starting at none:0
[177018] signal (11.1): Segmentation fault
in expression starting at none:0
[177018] signal (6.-6): Aborted
in expression starting at none:0
[177020] signal (15): Terminated
in expression starting at none:0
[177020] signal (11.1): Segmentation fault
in expression starting at none:0
[177020] signal (6.-6): Aborted
in expression starting at none:0
[177023] signal (15): Terminated
in expression starting at none:0
[177023] signal (11.1): Segmentation fault
in expression starting at none:0
[177023] signal (6.-6): Aborted
in expression starting at none:0
[177026] signal (15): Terminated
in expression starting at none:0
[177026] signal (11.1): Segmentation fault
in expression starting at none:0
[177026] signal (6.-6): Aborted
in expression starting at none:0
[177029] signal (15): Terminated
in expression starting at none:0
[177029] signal (11.1): Segmentation fault
in expression starting at none:0
[177029] signal (6.-6): Aborted
in expression starting at none:0
[177032] signal (15): Terminated
in expression starting at none:0
[177032] signal (11.1): Segmentation fault
in expression starting at none:0
[177032] signal (6.-6): Aborted
in expression starting at none:0
[177036] signal (15): Terminated
in expression starting at none:0
[177036] signal (11.1): Segmentation fault
in expression starting at none:0
[177036] signal (6.-6): Aborted
in expression starting at none:0
[177039] signal (15): Terminated
in expression starting at none:0
[177039] signal (11.1): Segmentation fault
in expression starting at none:0
[177039] signal (6.-6): Aborted
in expression starting at none:0
[177041] signal (15): Terminated
in expression starting at none:0
[177041] signal (11.1): Segmentation fault
in expression starting at none:0
[177041] signal (6.-6): Aborted
in expression starting at none:0
[177044] signal (15): Terminated
in expression starting at none:0
[177044] signal (11.1): Segmentation fault
in expression starting at none:0
[177044] signal (6.-6): Aborted
in expression starting at none:0
[177047] signal (15): Terminated
in expression starting at none:0
[177047] signal (11.1): Segmentation fault
in expression starting at none:0
[177047] signal (6.-6): Aborted
in expression starting at none:0
[177051] signal (15): Terminated
in expression starting at none:0
[177051] signal (11.1): Segmentation fault
in expression starting at none:0
[177051] signal (6.-6): Aborted
in expression starting at none:0
[177053] signal (15): Terminated
in expression starting at none:0
[177053] signal (11.1): Segmentation fault
in expression starting at none:0
[177053] signal (6.-6): Aborted
in expression starting at none:0
[177057] signal (15): Terminated
in expression starting at none:0
[177057] signal (11.1): Segmentation fault
in expression starting at none:0
[177057] signal (6.-6): Aborted
in expression starting at none:0
[177059] signal (15): Terminated
in expression starting at none:0
[177059] signal (11.1): Segmentation fault
in expression starting at none:0
[177059] signal (6.-6): Aborted
in expression starting at none:0
[177063] signal (15): Terminated
in expression starting at none:0
[177063] signal (11.1): Segmentation fault
in expression starting at none:0
[177063] signal (6.-6): Aborted
in expression starting at none:0
[177066] signal (15): Terminated
in expression starting at none:0
[177066] signal (11.1): Segmentation fault
in expression starting at none:0
[177066] signal (6.-6): Aborted
in expression starting at none:0
[177068] signal (15): Terminated
in expression starting at none:0
[177068] signal (11.1): Segmentation fault
in expression starting at none:0
[177068] signal (6.-6): Aborted
in expression starting at none:0
[197443] signal (15): Terminated
in expression starting at none:0
memset at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_map_object_from_fd at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_map_object at /lib64/ld-linux-x86-64.so.2 (unknown line)
openaux at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_catch_error at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_map_object_deps at /lib64/ld-linux-x86-64.so.2 (unknown line)
dl_open_worker at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_catch_error at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_open at /lib64/ld-linux-x86-64.so.2 (unknown line)
dlopen_doit at /lib64/libdl.so.2 (unknown line)
_dl_catch_error at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dlerror_run at /lib64/libdl.so.2 (unknown line)
dlopen at /lib64/libdl.so.2 (unknown line)
ijl_load_dynamic_library at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/dlload.c:348
[197447] signal (15): Terminated
in expression starting at /home/lwilcox/scalable/MPI/hello.jl:1
apply_cl at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:1795
do_trycatch at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:895
apply_cl at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:1804
[197449] signal (15): Terminated
in expression starting at none:0
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:562
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:641
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:572
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:641
fl_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:724 [inlined]
fl_load_system_image at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:2456
jl_init_ast_ctx at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/ast.c:221
jl_init_flisp at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/ast.c:278 [inlined]
jl_init_flisp at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/ast.c:273
[197453] signal (15): Terminated
in expression starting at none:0
alloc_vector at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:398 [inlined]
alloc_vector at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:389 [inlined]
vector_grow at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:417 [inlined]
read_vector at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:443
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:670
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:572
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:657
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:572
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:657
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:572
[197460] signal (15): Terminated
in expression starting at none:0
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:562
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:657
read_vector at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:447
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:670
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:572
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:657
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:572
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:641
fl_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:724 [inlined]
fl_load_system_image at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:2456
[197463] signal (15): Terminated
in expression starting at none:0
_int_malloc at /lib64/libc.so.6 (unknown line)
__libc_malloc at /lib64/libc.so.6 (unknown line)
mk_symbol at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:222 [inlined]
symbol at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:262
peek at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:405
peek at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:213 [inlined]
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:575
[197466] signal (15): Terminated
in expression starting at none:0
[197466] signal (11.1): Segmentation fault
in expression starting at none:0
[197466] signal (6.-6): Aborted
in expression starting at none:0
[197470] signal (15): Terminated
in expression starting at none:0
[197470] signal (11.1): Segmentation fault
in expression starting at none:0
[197470] signal (6.-6): Aborted
in expression starting at none:0
[197478] signal (15): Terminated
in expression starting at none:0
[197478] signal (11.1): Segmentation fault
in expression starting at none:0
[197478] signal (6.-6): Aborted
in expression starting at none:0
[197480] signal (15): Terminated
in expression starting at none:0
[197480] signal (11.1): Segmentation fault
in expression starting at none:0
[197480] signal (6.-6): Aborted
in expression starting at none:0
[197485] signal (15): Terminated
in expression starting at none:0
[197485] signal (11.1): Segmentation fault
in expression starting at none:0
[197485] signal (6.-6): Aborted
in expression starting at none:0
[197488] signal (15): Terminated
in expression starting at none:0
[197488] signal (11.1): Segmentation fault
in expression starting at none:0
[197488] signal (6.-6): Aborted
in expression starting at none:0
[197491] signal (15): Terminated
in expression starting at none:0
[197491] signal (11.1): Segmentation fault
in expression starting at none:0
[197491] signal (6.-6): Aborted
in expression starting at none:0
[197494] signal (15): Terminated
in expression starting at none:0
[197494] signal (11.1): Segmentation fault
in expression starting at none:0
[197494] signal (6.-6): Aborted
in expression starting at none:0
[197497] signal (15): Terminated
in expression starting at none:0
[197497] signal (11.1): Segmentation fault
in expression starting at none:0
[197497] signal (6.-6): Aborted
in expression starting at none:0
[197501] signal (15): Terminated
in expression starting at none:0
[197501] signal (11.1): Segmentation fault
in expression starting at none:0
[197501] signal (6.-6): Aborted
in expression starting at none:0
[197504] signal (15): Terminated
in expression starting at none:0
[197504] signal (11.1): Segmentation fault
in expression starting at none:0
[197504] signal (6.-6): Aborted
in expression starting at none:0
[197509] signal (15): Terminated
in expression starting at none:0
[197509] signal (11.1): Segmentation fault
in expression starting at none:0
[197509] signal (6.-6): Aborted
in expression starting at none:0
[197513] signal (15): Terminated
in expression starting at none:0
[197513] signal (11.1): Segmentation fault
in expression starting at none:0
[197513] signal (6.-6): Aborted
in expression starting at none:0
[197515] signal (15): Terminated
in expression starting at none:0
[197515] signal (11.1): Segmentation fault
in expression starting at none:0
[197515] signal (6.-6): Aborted
in expression starting at none:0
[197518] signal (15): Terminated
in expression starting at none:0
[197518] signal (11.1): Segmentation fault
in expression starting at none:0
[197518] signal (6.-6): Aborted
in expression starting at none:0
[197521] signal (15): Terminated
in expression starting at none:0
[197521] signal (11.1): Segmentation fault
in expression starting at none:0
[197521] signal (6.-6): Aborted
in expression starting at none:0
[197524] signal (15): Terminated
in expression starting at none:0
[197524] signal (11.1): Segmentation fault
in expression starting at none:0
[197524] signal (6.-6): Aborted
in expression starting at none:0
using MPI
MPI.Init()
comm = MPI.COMM_WORLD
println("Hello, world, I am rank $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm)).")
#!/bin/bash
#SBATCH --time=0:10:00
#SBATCH --exclusive
#SBATCH --nodes=2 # node count
#SBATCH --ntasks-per-node=32 # number of tasks per node
#SBATCH --cpus-per-task=1 # cpu-cores per task
#SBATCH --mem=2G
# SBATCH --mem-per-cpu=2G
#SBATCH --output=output-%j.txt
# SBATCH --partition=math
# SBATCH --gres=gpu:1
# SBATCH --nodelist=compute-8-21,compute-8-25
echo ------------------------------------------------------
if [ "${SLURM_NNODES}" -eq "1" ]
then
  echo 'CPUS(xNODES): '${SLURM_JOB_CPUS_PER_NODE}'(x1)'
else
  echo 'CPUS(xNODES): '${SLURM_JOB_CPUS_PER_NODE}
fi
echo 'Job is running on nodes:'
echo $SLURM_JOB_NODELIST
echo ------------------------------------------------------
echo SLURM: submission node: $SLURM_SUBMIT_HOST
echo SLURM: partition: $SLURM_JOB_PARTITION
echo SLURM: submission directory: $SLURM_SUBMIT_DIR
echo SLURM: job identifier: $SLURM_JOBID
echo SLURM: job name: $SLURM_JOB_NAME
echo SLURM: current home directory: $HOME
echo SLURM: PATH: $PATH
echo ------------------------------------------------------
source /etc/profile
module load sdk/nvidia/22.7
# have the shell echo the commands that are run
set -x
JULIA=/home/lwilcox/local/julia/1.9.0-beta4/bin/julia
export JULIA_NUM_THREADS=1
$JULIA --project -e 'using Pkg; pkg"instantiate"'
$JULIA --project -e 'using Pkg; pkg"precompile"'
mpiexec --mca mpi_cuda_support 0 -output-filename $SLURM_JOB_ID $JULIA hello.jl
#mpiexec --mca mpi_cuda_support 0 -output-filename $SLURM_JOB_ID ./hello
exit 0
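The job output warns that the `pkg"..."` REPL-mode string macro is not meant for scripts and recommends the functional API. A minimal sketch of the same setup step using `Pkg.instantiate()` and `Pkg.precompile()` follows. The `DRY_RUN` switch and the `PKG_SETUP` variable are hypothetical names introduced only for this illustration, and `JULIA` falls back to whatever `julia` is on `PATH` rather than the local 1.9.0-beta4 build used in the job script.

```shell
#!/bin/bash
# JULIA defaults to the `julia` found on PATH (an assumption for this sketch);
# the job script sets it to a local 1.9.0-beta4 build instead.
JULIA=${JULIA:-julia}

# Functional Pkg API calls, intended to replace pkg"instantiate" and pkg"precompile".
PKG_SETUP='using Pkg; Pkg.instantiate(); Pkg.precompile()'

# DRY_RUN is a hypothetical switch: by default only print the command,
# so the sketch can be checked on a machine without Julia installed.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$JULIA --project -e '$PKG_SETUP'"
else
    "$JULIA" --project -e "$PKG_SETUP"
fi
```

Running it with `DRY_RUN=0` executes both Pkg steps in a single Julia process instead of two, which should also avoid the REPL-mode warning.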
------------------------------------------------------
CPUS(xNODES): 128(x2)
Job is running on nodes:
compute-7-[3,5]
------------------------------------------------------
SLURM: submission node: submit-1.hamming.cluster
SLURM: partition: primary
SLURM: submission directory: /home/lwilcox/scalable/MPI
SLURM: job identifier: 45886739
SLURM: job name: mpi-job.sh
SLURM: current home directory: /home/lwilcox
SLURM: PATH: /home/lwilcox/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/tmux-3.1b-qfqajvvt23flztkmtjzrx4wdc3hry5rv/bin:/home/lwilcox/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/environment-modules-4.6.1-zdvbiw6n4iwdghhmstuqqewwl4hjdfsg/bin:/home/lwilcox/.local/bin:/home/lwilcox/local/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/lwilcox/.fzf/bin:/home/lwilcox/bin
------------------------------------------------------
+ JULIA=/home/lwilcox/local/julia/1.9.0-beta4/bin/julia
+ export JULIA_NUM_THREADS=1
+ JULIA_NUM_THREADS=1
+ /home/lwilcox/local/julia/1.9.0-beta4/bin/julia --project -e 'using Pkg; pkg"instantiate"'
┌ Warning: The Pkg REPL mode is intended for interactive use only, and should not be used from scripts. It is recommended to use the functional API instead.
└ @ Pkg.REPLMode ~/local/julia/1.9.0-beta4/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:382
+ /home/lwilcox/local/julia/1.9.0-beta4/bin/julia --project -e 'using Pkg; pkg"precompile"'
┌ Warning: The Pkg REPL mode is intended for interactive use only, and should not be used from scripts. It is recommended to use the functional API instead.
└ @ Pkg.REPLMode ~/local/julia/1.9.0-beta4/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:382
+ mpiexec --mca mpi_cuda_support 0 -output-filename 45886739 /home/lwilcox/local/julia/1.9.0-beta4/bin/julia hello.jl
[177068] signal (15): Terminated
in expression starting at none:0
[177068] signal (11.1): Segmentation fault
in expression starting at none:0
[177068] signal (6.-6): Aborted
in expression starting at none:0
[176985] signal (15): Terminated
in expression starting at none:0
jl_update_all_fptrs at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/staticdata.c:1871 [inlined]
jl_restore_system_image_from_stream_ at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/staticdata.c:3228
jl_restore_system_image_from_stream at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/staticdata.c:3359 [inlined]
ijl_restore_system_image_data at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/staticdata.c:3422
jl_load_sysimg_so at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/staticdata.c:498 [inlined]
ijl_restore_system_image at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/staticdata.c:3395
_finish_julia_init at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/init.c:812
julia_init at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/init.c:799
jl_repl_entrypoint at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/jlapi.c:711
main at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/cli/loader_exe.c:59
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
unknown function (ip: (nil))
Allocations: 0 (Pool: 0; Big: 0); GC: 0
[176987] signal (15): Terminated
in expression starting at none:0
[176987] signal (11.1): Segmentation fault
in expression starting at none:0
[176987] signal (6.-6): Aborted
in expression starting at none:0
[176993] signal (15): Terminated
in expression starting at none:0
[176993] signal (11.1): Segmentation fault
in expression starting at none:0
[176993] signal (6.-6): Aborted
in expression starting at none:0
[176995] signal (15): Terminated
in expression starting at none:0
[176995] signal (11.1): Segmentation fault
in expression starting at none:0
[176995] signal (6.-6): Aborted
in expression starting at none:0
[176997] signal (15): Terminated
in expression starting at none:0
[176997] signal (11.1): Segmentation fault
in expression starting at none:0
[176997] signal (6.-6): Aborted
in expression starting at none:0
[176999] signal (15): Terminated
in expression starting at none:0
[176999] signal (11.1): Segmentation fault
in expression starting at none:0
[176999] signal (6.-6): Aborted
in expression starting at none:0
[177002] signal (15): Terminated
in expression starting at none:0
[177002] signal (11.1): Segmentation fault
in expression starting at none:0
[177002] signal (6.-6): Aborted
in expression starting at none:0
[177006] signal (15): Terminated
in expression starting at none:0
[177006] signal (11.1): Segmentation fault
in expression starting at none:0
[177006] signal (6.-6): Aborted
in expression starting at none:0
[177008] signal (15): Terminated
in expression starting at none:0
[177008] signal (11.1): Segmentation fault
in expression starting at none:0
[177008] signal (6.-6): Aborted
in expression starting at none:0
[177011] signal (15): Terminated
in expression starting at none:0
[177011] signal (11.1): Segmentation fault
in expression starting at none:0
[177011] signal (6.-6): Aborted
in expression starting at none:0
[177014] signal (15): Terminated
in expression starting at none:0
[177014] signal (11.1): Segmentation fault
in expression starting at none:0
[177014] signal (6.-6): Aborted
in expression starting at none:0
[177018] signal (15): Terminated
in expression starting at none:0
[177018] signal (11.1): Segmentation fault
in expression starting at none:0
[177018] signal (6.-6): Aborted
in expression starting at none:0
[177020] signal (15): Terminated
in expression starting at none:0
[177020] signal (11.1): Segmentation fault
in expression starting at none:0
[177020] signal (6.-6): Aborted
in expression starting at none:0
[177023] signal (15): Terminated
in expression starting at none:0
[177023] signal (11.1): Segmentation fault
in expression starting at none:0
[177023] signal (6.-6): Aborted
in expression starting at none:0
[177026] signal (15): Terminated
in expression starting at none:0
[177026] signal (11.1): Segmentation fault
in expression starting at none:0
[177026] signal (6.-6): Aborted
in expression starting at none:0
[177029] signal (15): Terminated
in expression starting at none:0
[177029] signal (11.1): Segmentation fault
in expression starting at none:0
[177029] signal (6.-6): Aborted
in expression starting at none:0
[177032] signal (15): Terminated
in expression starting at none:0
[177032] signal (11.1): Segmentation fault
in expression starting at none:0
[177032] signal (6.-6): Aborted
in expression starting at none:0
[177036] signal (15): Terminated
in expression starting at none:0
[177036] signal (11.1): Segmentation fault
in expression starting at none:0
[177036] signal (6.-6): Aborted
in expression starting at none:0
[177039] signal (15): Terminated
in expression starting at none:0
[177039] signal (11.1): Segmentation fault
in expression starting at none:0
[177039] signal (6.-6): Aborted
in expression starting at none:0
[177041] signal (15): Terminated
in expression starting at none:0
[177041] signal (11.1): Segmentation fault
in expression starting at none:0
[177041] signal (6.-6): Aborted
in expression starting at none:0
[177044] signal (15): Terminated
in expression starting at none:0
[177044] signal (11.1): Segmentation fault
in expression starting at none:0
[177044] signal (6.-6): Aborted
in expression starting at none:0
[177047] signal (15): Terminated
in expression starting at none:0
[177047] signal (11.1): Segmentation fault
in expression starting at none:0
[177047] signal (6.-6): Aborted
in expression starting at none:0
[177051] signal (15): Terminated
in expression starting at none:0
[177051] signal (11.1): Segmentation fault
in expression starting at none:0
[177051] signal (6.-6): Aborted
in expression starting at none:0
[177053] signal (15): Terminated
in expression starting at none:0
[177053] signal (11.1): Segmentation fault
in expression starting at none:0
[177053] signal (6.-6): Aborted
in expression starting at none:0
[177057] signal (15): Terminated
in expression starting at none:0
[177057] signal (11.1): Segmentation fault
in expression starting at none:0
[177057] signal (6.-6): Aborted
in expression starting at none:0
[177059] signal (15): Terminated
in expression starting at none:0
[177059] signal (11.1): Segmentation fault
in expression starting at none:0
[177059] signal (6.-6): Aborted
in expression starting at none:0
[177063] signal (15): Terminated
in expression starting at none:0
[177063] signal (11.1): Segmentation fault
in expression starting at none:0
[177063] signal (6.-6): Aborted
in expression starting at none:0
[177066] signal (15): Terminated
in expression starting at none:0
[177066] signal (11.1): Segmentation fault
in expression starting at none:0
[177066] signal (6.-6): Aborted
in expression starting at none:0
[197524] signal (15): Terminated
in expression starting at none:0
[197524] signal (11.1): Segmentation fault
in expression starting at none:0
[197524] signal (6.-6): Aborted
in expression starting at none:0
[197443] signal (15): Terminated
in expression starting at none:0
memset at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_map_object_from_fd at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_map_object at /lib64/ld-linux-x86-64.so.2 (unknown line)
openaux at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_catch_error at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_map_object_deps at /lib64/ld-linux-x86-64.so.2 (unknown line)
dl_open_worker at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_catch_error at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_open at /lib64/ld-linux-x86-64.so.2 (unknown line)
dlopen_doit at /lib64/libdl.so.2 (unknown line)
_dl_catch_error at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dlerror_run at /lib64/libdl.so.2 (unknown line)
dlopen at /lib64/libdl.so.2 (unknown line)
ijl_load_dynamic_library at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/dlload.c:348
[197447] signal (15): Terminated
in expression starting at /home/lwilcox/scalable/MPI/hello.jl:1
apply_cl at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:1795
do_trycatch at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:895
apply_cl at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:1804
[197449] signal (15): Terminated
in expression starting at none:0
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:562
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:641
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:572
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:641
fl_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:724 [inlined]
fl_load_system_image at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:2456
jl_init_ast_ctx at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/ast.c:221
jl_init_flisp at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/ast.c:278 [inlined]
jl_init_flisp at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/ast.c:273
[197453] signal (15): Terminated
in expression starting at none:0
alloc_vector at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:398 [inlined]
alloc_vector at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:389 [inlined]
vector_grow at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:417 [inlined]
read_vector at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:443
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:670
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:572
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:657
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:572
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:657
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:572
[197460] signal (15): Terminated
in expression starting at none:0
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:562
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:657
read_vector at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:447
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:670
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:572
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:657
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:572
do_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:641
fl_read_sexpr at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:724 [inlined]
fl_load_system_image at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:2456
[197463] signal (15): Terminated
in expression starting at none:0
_int_malloc at /lib64/libc.so.6 (unknown line)
__libc_malloc at /lib64/libc.so.6 (unknown line)
mk_symbol at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:222 [inlined]
symbol at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/flisp.c:262
peek at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:405
peek at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:213 [inlined]
read_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/flisp/read.c:575
[197466] signal (15): Terminated
in expression starting at none:0
[197466] signal (11.1): Segmentation fault
in expression starting at none:0
[197466] signal (6.-6): Aborted
in expression starting at none:0
[197470] signal (15): Terminated
in expression starting at none:0
[197470] signal (11.1): Segmentation fault
in expression starting at none:0
[197470] signal (6.-6): Aborted
in expression starting at none:0
[197478] signal (15): Terminated
in expression starting at none:0
[197478] signal (11.1): Segmentation fault
in expression starting at none:0
[197478] signal (6.-6): Aborted
in expression starting at none:0
[197480] signal (15): Terminated
in expression starting at none:0
[197480] signal (11.1): Segmentation fault
in expression starting at none:0
[197480] signal (6.-6): Aborted
in expression starting at none:0
[197485] signal (15): Terminated
in expression starting at none:0
[197485] signal (11.1): Segmentation fault
in expression starting at none:0
[197485] signal (6.-6): Aborted
in expression starting at none:0
[197488] signal (15): Terminated
in expression starting at none:0
[197488] signal (11.1): Segmentation fault
in expression starting at none:0
[197488] signal (6.-6): Aborted
in expression starting at none:0
[197491] signal (15): Terminated
in expression starting at none:0
[197491] signal (11.1): Segmentation fault
in expression starting at none:0
[197491] signal (6.-6): Aborted
in expression starting at none:0
[197494] signal (15): Terminated
in expression starting at none:0
[197494] signal (11.1): Segmentation fault
in expression starting at none:0
[197494] signal (6.-6): Aborted
in expression starting at none:0
[197497] signal (15): Terminated
in expression starting at none:0
[197497] signal (11.1): Segmentation fault
in expression starting at none:0
[197497] signal (6.-6): Aborted
in expression starting at none:0
[197501] signal (15): Terminated
in expression starting at none:0
[197501] signal (11.1): Segmentation fault
in expression starting at none:0
[197501] signal (6.-6): Aborted
in expression starting at none:0
[197504] signal (15): Terminated
in expression starting at none:0
[197504] signal (11.1): Segmentation fault
in expression starting at none:0
[197504] signal (6.-6): Aborted
in expression starting at none:0
[197509] signal (15): Terminated
in expression starting at none:0
[197509] signal (11.1): Segmentation fault
in expression starting at none:0
[197509] signal (6.-6): Aborted
in expression starting at none:0
[197513] signal (15): Terminated
in expression starting at none:0
[197513] signal (11.1): Segmentation fault
in expression starting at none:0
[197513] signal (6.-6): Aborted
in expression starting at none:0
[197515] signal (15): Terminated
in expression starting at none:0
[197515] signal (11.1): Segmentation fault
in expression starting at none:0
[197515] signal (6.-6): Aborted
in expression starting at none:0
[197518] signal (15): Terminated
in expression starting at none:0
[197518] signal (11.1): Segmentation fault
in expression starting at none:0
[197518] signal (6.-6): Aborted
in expression starting at none:0
[197521] signal (15): Terminated
in expression starting at none:0
[197521] signal (11.1): Segmentation fault
in expression starting at none:0
[197521] signal (6.-6): Aborted
in expression starting at none:0
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node compute-7-3 exited on signal 9 (Killed).
--------------------------------------------------------------------------
+ exit 0
slurmstepd: error: Detected 200 oom-kill event(s) in StepId=45886739.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.
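For context on the oom-kill message: Slurm documents `--mem` as a per-node limit, so the `--mem=2G` request in the job script is shared by all 32 tasks on a node. The arithmetic below is a rough sketch of the resulting per-rank budget, assuming `G` means 1024 MiB and that the limit is split evenly across ranks. It is one plausible reading of the logs, not a confirmed diagnosis. The variable names are illustrative only.

```shell
#!/bin/bash
# Values copied from the #SBATCH lines in the job script.
mem_per_node_mib=$((2 * 1024))   # --mem=2G, assuming 1G = 1024 MiB
tasks_per_node=32                # --ntasks-per-node=32

# Average memory available to each MPI rank if the node limit is shared evenly.
mem_per_task_mib=$((mem_per_node_mib / tasks_per_node))
echo "approx. ${mem_per_task_mib} MiB per rank"
```

This prints `approx. 64 MiB per rank`. The job script already carries a disabled `# SBATCH --mem-per-cpu=2G` line, which would instead request the memory per allocated CPU.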