@moble
Last active August 17, 2021 03:34
Speed up execution of `@everywhere` in julia
NOTE: A somewhat streamlined version of this approach can be found in this other gist, which involves only one simple script. You could still use the timing files from this gist if you want to check the details.

As described in detail here, julia can take an excessive amount of time to execute the first `@everywhere` statement on many processes (around 1 hour for thousands of processes), even if the code being executed everywhere is trivial. Basically, the `Distributed` functions need to be precompiled for this to happen quickly.
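
The first-call overhead can be observed on a small scale. The following is only a minimal sketch, assuming two local workers started with plain `addprocs` rather than a Slurm allocation; the variable names `first_call` and `second_call` are illustrative and not part of this gist's scripts.

```julia
using Distributed

# Assumption: two local workers stand in for the many Slurm workers discussed above.
addprocs(2)

# The first call includes the compilation cost of the Distributed machinery ...
first_call = @elapsed @everywhere 1+2
# ... while a repeated call reuses the already-compiled code.
second_call = @elapsed @everywhere 1+2

println("first @everywhere:  ", first_call, " s")
println("second @everywhere: ", second_call, " s")

rmprocs(workers())
```

With only two local workers the difference is small in absolute terms; the point of the gist is that this cost grows with the number of processes.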

This gist provides a simple way to do so, at least on Slurm clusters (though the same principles should apply elsewhere). The key file is `organizer.jl`; just submit it as a batch job (adjusting the `SBATCH` directives as needed), and it should create a sysimage that you can use to run future batch jobs.

The file `timing_sys.jl` provides an example of how to use the resulting sysimage. Specifically, note that both the original julia process and all processes created with `addprocs` use the `--sysimage=/path/to/sys_everywhere.so` argument. Doing so reduces the time taken to execute the first `@everywhere` statement by a factor of ~20 for ~100 processes, and possibly more for a greater number of processes.
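
One way to confirm that the workers really picked up the intended sysimage is to ask each process which image it loaded. This is a hedged sketch rather than part of the gist: it assumes local workers instead of `addprocs_slurm`, the helper name `current_sysimage` is hypothetical, and it relies on `Base.JLOptions()`, which is internal to julia rather than documented public API.

```julia
using Distributed

# Path of the sysimage that the current process was started with.
# Note: Base.JLOptions is internal to julia rather than documented public API.
current_sysimage() = unsafe_string(Base.JLOptions().image_file)

# Assumption: local workers stand in for `addprocs_slurm`; on a cluster the
# path would be the custom image, e.g. /path/to/sys_everywhere.so.
sysimage = current_sysimage()
addprocs(2; exeflags="--sysimage=$sysimage")

# Ask each worker which sysimage it actually loaded.
worker_images = Dict(
    p => remotecall_fetch(() -> unsafe_string(Base.JLOptions().image_file), p)
    for p in workers()
)
for (p, img) in worker_images
    println("worker ", p, " uses ", img)
end

rmprocs(workers())
```

If a worker reports the default image instead of `sys_everywhere.so`, the `exeflags` argument was most likely not passed to `addprocs`.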

File: organizer.jl

#!/bin/bash -l
# -*- mode: julia -*-
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=56
#SBATCH --time=00:05:00
#SBATCH --partition=development
#SBATCH --output=%x_%j.log
#SBATCH --job-name=organizer
# Submit this script as a batch job like `sbatch organizer.jl`
#=
# This block will execute in bash. The second command runs the remainder of this file as a julia script,
# then returns so that the remaining commands can run, eventually generating `sys_everywhere.so`.
julia -e 'using Pkg; Pkg.add(["PackageCompiler", "Distributed", "ClusterManagers", "Sockets", "Serialization", "Logging", "LinearAlgebra", "REPL"])'
julia --trace-compile=precompile01.jl "${BASH_SOURCE[0]}"
# Write the `using` line to its own file so the worker's trace in precompile02.jl is not overwritten,
# and put it first so the packages are loaded before the traced precompile statements run.
echo "using Distributed, ClusterManagers, Sockets, Serialization, Logging, LinearAlgebra, REPL" > precompile00.jl
cat precompile00.jl precompile01.jl precompile02.jl > precompile03.jl
julia precompile.jl
# Note: timing_sys.jl hard-codes its sysimage path, so update /path/to/sys_everywhere.so there before submitting.
sbatch timing_sys.jl --sysimage sys_everywhere.so
sbatch timing.jl
exec echo "Submitted timing jobs; when they finish, run 'tail -n 3 timing*.log' to compare times with and without the sysimage"
=#
using Distributed, ClusterManagers
addprocs_slurm(1; partition="development", time="00:04:50", exeflags="--trace-compile=precompile02.jl")
@time @everywhere 1+2
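
File: precompile.jl

using PackageCompiler
using Distributed, ClusterManagers, Sockets, Serialization, Logging, LinearAlgebra, REPL
# Build a sysimage containing these packages, using the statements collected
# by organizer.jl (precompile03.jl) to decide what gets compiled into it.
create_sysimage(
    [:Distributed, :ClusterManagers, :Sockets, :Serialization, :Logging, :LinearAlgebra, :REPL],
    sysimage_path="sys_everywhere.so",
    precompile_execution_file="precompile03.jl"
)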

File: timing.jl

#!/bin/bash -l
# -*- mode: julia -*-
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=56
#SBATCH --time=00:25:00
#SBATCH --partition=development
#SBATCH --output=%x_%j.log
#SBATCH --job-name=timing
#=
exec julia "${BASH_SOURCE[0]}" "$@"
=#
using Dates
using Distributed, ClusterManagers
addprocs_slurm(100; partition="development", time="00:20:00")
@info "Beginning computation"
starttime = now()
flush(stdout); flush(stderr)
@time @everywhere 1+2
endtime = now()
@info "Finished computation after $(endtime-starttime)"
flush(stdout); flush(stderr)
rmprocs(workers())

File: timing_sys.jl

#!/bin/bash -l
# -*- mode: julia -*-
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=56
#SBATCH --time=00:05:00
#SBATCH --partition=development
#SBATCH --output=%x_%j.log
#SBATCH --job-name=timing_sys
#=
exec julia --sysimage=/path/to/sys_everywhere.so "${BASH_SOURCE[0]}" "$@"
=#
using Dates
using Distributed, ClusterManagers
addprocs_slurm(100; partition="development", time="00:04:30", exeflags="--sysimage=/path/to/sys_everywhere.so")
@info "Beginning computation"
starttime = now()
flush(stdout); flush(stderr)
@time @everywhere 1+2
endtime = now()
@info "Finished computation after $(endtime-starttime)"
flush(stdout); flush(stderr)
rmprocs(workers())