Skip to content

Instantly share code, notes, and snippets.

@sdwfrost
Last active July 2, 2022 14:15
Show Gist options
  • Save sdwfrost/7c660322c6c33961297a826df4cbc30d to your computer and use it in GitHub Desktop.
Save sdwfrost/7c660322c6c33961297a826df4cbc30d to your computer and use it in GitHub Desktop.
Comparing simple simulations in Julia, Nim, C++ and R

This gist compares the performance of Julia, Nim, C++ and R - the latter using either POMP, or LibBi in a simple simulation of an SIR epidemiological model. In addition to keeping track of susceptibles, infecteds and recovereds, I also store the cumulative number of infections. Time moves in discrete steps, and the algorithm avoids language-specific syntax features to make the comparison as fair as possible, including using the same algorithm for generating binomial random numbers and the same random number generator; the exception are the R versions, POMP uses the standard R Mersenne Twister for the random number generator; I'm not sure what LibBi uses. The algorithm for generating random binomial numbers is only really suitable for small np.

Benchmarks were run on a Mac Pro (Late 2013), with 3 Ghz 8-core Intel Xeon E3, 64GB 1866 Mhz RAM, running OSX v 10.11.3 (El Capitan), using Julia v0.6.1, Nim v0.17.3, clang v5.0.0, emscripten v1.37.34, node v8.9.1, and R v3.3.2.

Nim version

Native

Compile using:

nim c -d:release --passC="-flto -ffast-math" sir

Run:

./sir

Output:

0.038819
20.75

The first number is time taken in s, the second the result of averaging the cumulative number of infections (Y in the code).

Native, optimised version

Based on the comments below, I wrote an optimised version, sir_opt.nim.

nim c -d:release --passC="-flto -ffast-math" sir_opt
./sir_opt

Output:

0.031404
20.964

Webassembly version

Nim can also compile to Webassembly via Emscripten; using nim.cfg linked below:

nim c -d:release -d:wasm -o:sir_opt.js  sir_opt
node ./sir_opt.js

Output:

0.052
20.964

Julia version

Run:

julia sir.jl

Output:

0.058961 seconds (2.09 k allocations: 232.998 KiB)
21.019

There is also a modified version that uses tuples (and hence is also a bit neater), adds in a couple of macros, and uses an algorithm to work out random binomials that is faster for small np. Thanks to @DNF2 and @ChrisRackauckas (see below) for tips here.

julia sir_opt.jl
0.045802 seconds (87 allocations: 14.123 KiB)
21.028

C++ version

Native build

Compilation:

g++ -std=c++11 -O3 -o sir_cpp sir.cpp

Running:

./sir_cpp

Output:

0.050181
20.708

Webassembly

To compile the C++ code into Webassembly, I use Emscripten.

Compilation:

em++ -s WASM=1 -s SINGLE_FILE=1 -std=c++11 -O3 -o sir_cpp.js sir.cpp

Running:

node sir_cpp.js

Output:

0.07
20.708

This C++ code uses code from random_real.c, copyright Taylor G. Campbell (2014) and a portable implementation of clzll.

R, using POMP

Running:

Rscript sir_pomp.R

Output:

Loading required package: methods
   user  system elapsed 
  0.741   0.108   0.849

R, using LibBi

Running:

Rscript sir_bi.R

Output:

Model:  sir 
Run time:  0.073982  seconds
Number of samples:  1000 
[1] 20.251

Note that it takes much longer in practice than above for rbi to run the model, due to the need for compilation, which takes a lot longer than Julia. Thanks to @sbfnk for some excellent examples.

Any suggestions for improvements welcome!

@if asmjs or wasm:
d:emscripten
@end
@if emscripten or asmjs or wasm:
o:"index.html"
@if not wasm:
d:asmjs
@end
cc = clang
clang.exe = "emcc"
clang.linkerexe = "emcc"
clang.options.linker = ""
cpu = "i386"
@if wasm:
passC = "-s WASM=1 -s 'BINARYEN_METHOD=\"native-wasm\"' -Iemscripten -s SINGLE_FILE=1 "
passL = "-s WASM=1 -s 'BINARYEN_METHOD=\"native-wasm\"' -Lemscripten -s SINGLE_FILE=1 -s ALLOW_MEMORY_GROWTH=1 "
@elif asmjs:
passC = "-s ASM_JS=1 -Iemscripten" #-s USE_PTHREADS=1
passL = "-s ASM_JS=1 -Lemscripten" #-s ALLOW_MEMORY_GROWTH=1"
@end
@if release:
passC %= "-O3 -flto -ffast-math"
passL %= "-O3 -flto -ffast-math"
@end
#Fix _setjmp/longjmp problem. https://irclogs.nim-lang.org/24-09-2017.html#12:19:50
d:nimStdSetjmp # https://irclogs.nim-lang.org/24-09-2017.html#20:13:18
@end
#include <array>
#include <vector>
#include <cmath>
#include <iostream>
#include <climits>
#include <limits>
#include <iterator>
#include <numeric>
#include <chrono>
using namespace std;
// The following is used to turn the uint64 random numbers into a double
// between 0.0 and 1.0
int clz64(uint64_t v) {
const unsigned char y[] = {
0, 47, 1, 56, 48, 27, 2, 60, 57, 49, 41, 37, 28, 16, 3, 61,
54, 58, 35, 52, 50, 42, 21, 44, 38, 32, 29, 23, 17, 11, 4, 62,
46, 55, 26, 59, 40, 36, 15, 53, 34, 51, 20, 43, 31, 22, 10, 45,
25, 39, 14, 33, 19, 30, 9, 24, 13, 18, 8, 12, 7, 6, 5, 63
};
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
v |= v >> 32;
return ((sizeof(uint64_t) * CHAR_BIT) - 1) -
y[(uint64_t)(v * 0x03F79D71B4CB0A89ULL) >> 58];
}
double random_real(uint64_t x)
{
int exponent = -64;
uint64_t significand;
unsigned shift;
while ((significand = x) == 0) {
exponent -= 64;
if (exponent < -1074) return 0;
}
shift = clz64(significand);
if (shift != 0) {
exponent -= shift;
significand <<= shift;
significand |= (x >> (64 - shift));
}
significand |= 1;
return ldexp((double)significand, exponent);
}
// Random number generator
uint64_t xorshift128plus(array<uint64_t,2> &rng) {
uint64_t x = rng[0];
uint64_t const y = rng[1];
rng[0] = y;
x ^= x << 23; // a
rng[1] = x ^ y ^ (x >> 17) ^ (y >> 26); // b, c
uint64_t z = rng[1] + y;
return(z);
}
double randunif(array<uint64_t,2> &rng){
uint64_t z = xorshift128plus(rng);
double dz = random_real(z);
return dz;
}
int64_t randbn(int64_t n, double p, array<uint64_t,2> &rng)
{
const double log_q = log(1.0 - p);
int64_t x = 0;
double sum = 0.0;
while(1){
sum += log(randunif(rng))/static_cast<double>(n - x);
if(sum < log_q) break;
x += 1;
}
return(x);
}
void sir(int64_t t, array<int64_t,4> &u, array<int64_t,4> &du, const array<double,5> parms, array<uint64_t,2> &rng)
{
int64_t S=u[0];
int64_t I=u[1];
int64_t R=u[2];
int64_t Y=u[3];
double beta=parms[0];
double gamma=parms[1];
double iota=parms[2];
double N=parms[3];
double dt=parms[4];
double lambd=beta*(static_cast<double>(I)+iota)/N;
double ifrac=1.0-exp(-lambd*dt);
double rfrac=1.0-exp(-gamma*dt);
int64_t infection=randbn(S,ifrac,rng);
int64_t recovery=randbn(I,rfrac,rng);
du[0]=S-infection;
du[1]=I+infection-recovery;
du[2]=R+recovery;
du[3]=Y+infection;
return;
}
double simulate()
{
array<double,5> parms = {2.62110617498984, 0.5384615384615384, 0.5, 403.0, 0.1};
int64_t tf=540;
int64_t nsims = 1000;
array<int64_t,1000> yvec;
array<uint64_t,2> rng = {123,456};
for(int64_t i=0;i<nsims;i++){
array<int64_t,4> u = {60,1,342,0};
array<int64_t,4> du = {0,0,0,0};
for(int64_t j=1;j<=tf;j++){
sir(j,u,du,parms,rng);
u=du;
}
yvec[i] = u[3];
}
int64_t numerator = accumulate(begin(yvec),end(yvec),0);
double result = static_cast<double>(numerator)/static_cast<double>(nsims);
return(result);
}
int main(int argc, char* argv[]) {
std::clock_t startcputime = std::clock();
double m=simulate();
double cpu_duration = (std::clock() - startcputime) / (double)CLOCKS_PER_SEC;
cout << cpu_duration << "\n";
cout << m << "\n";
}
using RandomNumbers
function randbn(n, p, rng)
log_q = log(1.0 - p)
x = 0
sum = 0.0
while true
sum += log(rand(rng)) / (n - x)
sum < log_q && break
x += 1
end
return x
end
function sir(t,u,du,parms,rng)
S=u[1]
I=u[2]
R=u[3]
Y=u[4]
β=parms[1]
γ=parms[2]
ι=parms[3]
N=parms[4]
δt=parms[5]
λ=β*(convert(Float64,I)+ι)/N
ifrac=1.0-exp(-λ*δt)
rfrac=1.0-exp(-γ*δt)
infection=randbn(S,ifrac,rng)
recovery=randbn(I,rfrac,rng)
du[1]=S-infection
du[2]=I+infection-recovery
du[3]=R+recovery
du[4]=Y+infection
end
function simulate()
parms = [2.62110617498984, 0.5384615384615384, 0.5, 403.0, 0.1]
seed=123
r = Xorshifts.Xorshift128Plus(seed)
tf = 540
nsims = 1000
yvec = Vector{Int64}(nsims)
for i in 1:nsims
u = [60,1,342,0]
du = [0,0,0,0]
for j in 1:tf
sir(j,u,du,parms,r)
u=du
end
yvec[i]=du[4]
end
convert(Float64,sum(yvec))/convert(Float64,nsims)
end
simulate()
@time m=simulate()
println(m)
import random, random.xorshift, random.common
import math
import times
proc randbn(n: int64; p: float64; rng: var RNG): int64 =
let logq:float64 = math.ln(1.0-p)
var x:int64 = 0
var sum:float64 = 0.0
while true:
sum += math.ln(rng.random())/float64(n - x)
if sum < log_q:
return x
x=x+1
proc sir(t:int64;u:array[4,int64];du: var array[4,int64]; parms:array[5,float64]; r: var RNG): int64 {.discardable.}=
let S=u[0]
let I=u[1]
let R=u[2]
let Y=u[3]
let beta=parms[0]
let gamma=parms[1]
let iota=parms[2]
let N=parms[3]
let dt=parms[4]
let lambd=beta*(float64(I)+iota)/N
let ifrac=1.0-exp(-lambd*dt)
let rfrac=1.0-exp(-gamma*dt)
let infection=int(randbn(S,ifrac,r))
let recovery=int(randbn(I,rfrac,r))
du[0]=S-infection
du[1]=I+infection-recovery
du[2]=R+recovery
du[3]=Y+infection
result = 1
proc simulate():float64 =
let parms:array[5,float64] = [2.62110617498984, 0.5384615384615384, 0.5, 403.0, 0.1]
var seed:uint64 = 123
var r = initXorshift128Plus(seed)
let tf:int64=540
let nsims:int64 = 1000
var yvec:array[1000,int64]
for i in 1..nsims:
var u:array[4,int64] = [60.int64,1,342,0]
var du:array[4,int64] = [0.int64,0,0,0]
for j in 1..tf:
discard sir(j,u,du,parms,r)
u=du
yvec[i-1] = u[3]
result = float64(sum(yvec))/float64(nsims)
let t0=cpuTime()
let m = simulate()
let t1=cpuTime()
echo t1-t0
echo m
library(rbi)
sir.bi <- "
model sir {
const N = 403
const timestep = 0.1
state S, I, R, Y
param bet, gamm, iota
noise infection, recovery
sub parameter {
bet <- 2.62110617498984
gamm <- 0.5384615384615384
iota <- 0.5
}
sub initial {
S <- 60
I <- 1
R <- 342
Y <- 0
}
sub transition (delta=timestep) {
inline lambd = bet*(I+iota)/N
inline ifrac = 1.0-exp(-lambd*timestep)
inline rfrac = 1.0-exp(-gamm*timestep)
infection ~ binomial(S,ifrac)
recovery ~ binomial(I,rfrac)
S <- S - infection
I <- I + infection - recovery
R <- R + recovery
Y <- Y + infection
}
}
"
sir.bi.model <- bi_model(lines=sir.bi)
sir.bi.obj <- libbi(model=sir.bi.model)
sir.bi.sample <- sample(sir.bi.obj,
target="joint",
end_time=54.0,
nsamples=1000,
seed=1)
sir.bi.sample
sir.bi.results <- bi_read(sir.bi.sample$output_file_name)
mean(sir.bi.results$Y$value)
using RandomNumbers
@inline @fastmath function randbn(n,p,rng)
q = 1.0 - p
s = p/q
a = (n+1)*s
r = exp(n*log(q))
x = 0
u = rand(rng)
while true
if (u < r)
return x
end
u -= r
x += 1
r *= (a/x)-s
end
end
@inline @fastmath function sir(t, u, parms, rng)
(S, I, R, Y) = u
(β, γ, ι, N, δt) = parms
λ = β * (I + ι) / N
ifrac = 1.0 - exp(-λ * δt)
rfrac = 1.0 - exp(-γ * δt)
infection = randbn(S, ifrac, rng)
recovery = randbn(I, rfrac, rng)
return (S - infection, I + infection - recovery, R + recovery, Y + infection)
end
function simulate()
parms = (2.62110617498984, 0.5384615384615384, 0.5, 403.0, 0.1)
seed = 123
r = Xorshifts.Xorshift128Plus(seed)
tf = 540
nsims = 1000
yvec = Vector{Int64}(nsims)
u0 = (60, 1, 342, 0)
for i in 1:nsims
u = u0
for j in 1:tf
u = sir(j, u, parms, r)
end
yvec[i] = u[4]
end
return sum(yvec) / nsims
end
simulate()
@time m=simulate()
println(m)
import random, random.xorshift, random.common
import math
import times
type
Params = tuple[beta,gamma,iota,N,dt: float64]
State = tuple[S,I,R,Y: int]
proc randbn(n: int; p: float64; rng: var RNG): int {.inline.} =
let q:float64 = 1.0-p
let s:float64 = p/q
let a:float64 = (float64(n)+1.0)*s
var r:float64 = math.exp(float64(n)*math.ln(q))
var x:int = 0
var u:float64 = rng.random()
while true:
if u < r:
return x
u = u-r
x = x+1
r = r*((a/float64(x))-s)
proc sir(t:int; u: State;du: var State; parms: Params; r: var RNG): int {.inline.}=
let (S,I,R,Y) = u
let (beta,gamma,iota,N,dt)=parms
let lambd=beta*(float64(I)+iota)/N
let ifrac=1.0-exp(-lambd*dt)
let rfrac=1.0-exp(-gamma*dt)
let infection=int(randbn(S,ifrac,r))
let recovery=int(randbn(I,rfrac,r))
du.S=S-infection
du.I=I+infection-recovery
du.R=R+recovery
du.Y=Y+infection
result = 1
proc simulate():float64 =
let parms:Params = (2.62110617498984, 0.5384615384615384, 0.5, 403.0, 0.1)
var seed:array[2,uint64] = [123.uint64,456]
var r = initXorshift128Plus(seed)
let tf:int=540
let nsims:int = 1000
var yvec:array[1000,int]
for i in 1..nsims:
var u:State = (60,1,342,0)
var du:State = (0,0,0,0)
for j in 1..tf:
discard sir(j,u,du,parms,r)
u=du
yvec[i-1] = u[3]
result = float64(sum(yvec))/float64(nsims)
when isMainModule:
let t0 = cpuTime()
let m = simulate()
let t1 = cpuTime()
echo t1 - t0
echo m
library(pomp)
sir.step <- "
double lambd = bet * (I+iota) / N;
double ifrac = 1.0 - exp(-lambd *dt);
double rfrac = 1.0 - exp(-gamm*dt);
double infection = rbinom(S, ifrac);
double recovery = rbinom(I, rfrac);
S += -infection;
I += infection - recovery;
R += recovery;
Y += infection;
"
rmeas <- "
Z = Y;
"
pomp(
data=data.frame(time=seq(0,54,by=0.1),Z=rep(0,541)),
times="time",
t0=0,
rmeasure=Csnippet(rmeas),
rprocess=euler.sim(
step.fun=Csnippet(sir.step),
delta.t=0.1
),
statenames=c("S","I","R","Y"),
paramnames=c(
"bet","gamm","iota", "N"
),
initializer=function(params, t0, ...) {
x0 <- c(S=60,I=1,R=342,Y=0)
x0
},
params=c(bet=2.62110617498984, gamm=0.5384615384615384, iota=0.5, N=403.0)
) -> sir
set.seed(1)
system.time(replicate(1000,simulate(sir)))
@sdwfrost
Copy link
Author

sdwfrost commented Aug 9, 2020

Feel free to suggest additions/changes! I'm no expert in C++.

@snawaz
Copy link

snawaz commented Aug 10, 2020

@sdwfrost,

I did the following things:

  • I copied the C++ code, and compiled it with clang++ -O3 as you did, and on my machine it took ~0.056s. I consider this original speed on my machine. Now let's see how much we can improve it with minor tweaks.
  • I passed -ffast-math as well, and nothing else; it took ~0.036s.
  • I passed -march=native as well, and nothing else; it took ~0.029s.
  • I used __attribute__((always_inline)) for all functions, except main() and simulate(), and it took ~0.024s.

So going from 0.056s to 0.024s is more than 2x performant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment