[package]
name = "agent_based_trading_julia"
version = "0.1.0"
authors = ["cgmossa <cgmossa@gmail.com>"]
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
rand = {version = "0.7.3", features = ["small_rng"]}
rand_distr = "0.2.2"

[dev-dependencies]
criterion = "0.3"

[[bench]]
name = "cont_run"
harness = false
// resides in benches/cont_run.rs
//
use agent_based_trading_julia::agent_simulation::cont_run;
use criterion::{black_box, criterion_group, criterion_main, Criterion};

pub fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("cont_run default", |b| {
        b.iter(|| {
            black_box(cont_run(
                black_box(10_000),
                black_box(10_000),
                black_box(0.05),
                black_box(0.1),
            ))
        })
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
using Random
using StatsBase

function cont_run(time=10000, n=10000, λ=0.05, q=0.1)
    r = zeros(time)
    θ = zeros(n)
    pchange = zeros(n)
    for t = 1:time
        ε = randn()
        if ε > 0
            r[t] = sum(<(ε), θ) / (λ * n)
        else
            r[t] = -sum(<(-ε), θ) / (λ * n)
        end
        θ .= ifelse.(rand!(pchange) .< q, abs(r[t]), θ)
    end
    return kurtosis(r)
end
// resides in src/lib.rs
use rand::prelude::SmallRng;
use rand::{thread_rng, Rng, SeedableRng};

pub fn cont_run(time: usize, n: usize, lambda: f64, q: f64) -> f64 {
    let mut theta = vec![0.; n];
    let n = n as f64;
    // One sampler for the standard-normal signal, one for the uniform update draws.
    let mut eps_sampler = SmallRng::from_rng(thread_rng())
        .unwrap()
        .sample_iter(rand_distr::StandardNormal);
    let mut pchange_sampler = SmallRng::from_rng(thread_rng())
        .unwrap()
        .sample_iter(rand::distributions::Uniform::new_inclusive(0., 1.));
    let r = std::iter::repeat_with(move || {
        let eps: f64 = eps_sampler.next().unwrap();
        // Signed count of thresholds below |eps|, scaled by lambda * n.
        let r_t = if eps > 0. {
            theta.iter().filter(|&&x| eps > x).count() as f64 / (lambda * n)
        } else {
            -(theta.iter().filter(|&&x| -eps > x).count() as f64) / (lambda * n)
        };
        // Each threshold is reset to |r_t| with probability q.
        theta
            .iter_mut()
            .filter(|_| pchange_sampler.next().unwrap() < q)
            .for_each(|x| {
                *x = r_t.abs();
            });
        r_t
    });
    let r = r.take(time).collect::<Vec<_>>();
    kurtosis(r)
}

fn kurtosis(x: Vec<f64>) -> f64 {
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    // Centered values; the iterator is cloned so it can be consumed twice.
    let x = x.iter().copied().map(|x| x - mean_x);
    let r: f64 = n * x.clone().map(|x| x.powi(4)).sum::<f64>()
        / (x.map(|x| x.powi(2)).sum::<f64>().powi(2));
    r * (1. - 1. / n).powi(2) - 3.
}
I've added you to the repo in case you'd like to make any further changes.
I think from here it would be neat if SIMD RNGs in Rust could progress to a state where they are easy to use (and hard to misuse).
For example, I don't think there is a convenience function to generate an array of `f64`s while making use of SIMD.
This is non-trivial when the number of elements in the array isn't divisible by the number of lanes in the SIMD vector, or when the array isn't aligned properly.
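To make the length problem concrete, here is a minimal sketch using only the standard library. `fill_lanes` and the `next4` closure are hypothetical names; `next4` stands in for a SIMD PRNG that yields four `f64` lanes per call, which is an assumption and not the API of any crate mentioned in this thread.

```rust
/// Fill `buf` four lanes at a time from `next4`, a hypothetical stand-in
/// for a SIMD PRNG that produces four f64 values per call.
fn fill_lanes(buf: &mut [f64], mut next4: impl FnMut() -> [f64; 4]) {
    // Full groups of four are written directly.
    let mut chunks = buf.chunks_exact_mut(4);
    for chunk in &mut chunks {
        chunk.copy_from_slice(&next4());
    }
    // Fewer than four elements may remain: draw one more vector
    // and keep only as many lanes as still fit.
    let tail = chunks.into_remainder();
    if !tail.is_empty() {
        let last = next4();
        tail.copy_from_slice(&last[..tail.len()]);
    }
}
```

For a buffer of 10 elements this calls `next4` three times and discards two lanes of the last vector. That waste is the price of keeping the generator purely vectorized.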
Additionally, there's still a performance gap to fill with the RNG.
In fact, it should run faster than the Mersenne Twister, since it's a lot simpler.
Now talking about alignment, I noticed that my use of the SIMD RNG wasn't correct - I was using the write_aligned function but the allocation wasn't aligned the way it needed to be.
We're now down to 10.9us for Rust only when using `write_to_slice_unaligned`.
This can probably be optimized by writing the first few `f64`s using a scalar implementation and then using SIMD on the aligned regions.
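The scalar-head, aligned-body split can be sketched with the standard library's `slice::align_to_mut`. Everything here is illustrative: `F64x4` is a hypothetical 32-byte-aligned stand-in for a real SIMD vector type, `fill_split` is a made-up name, and the body loop writes lanes one by one where a real implementation would issue an aligned vector store.

```rust
/// Four f64 lanes with 32-byte alignment (hypothetical stand-in for a SIMD vector type).
#[repr(C, align(32))]
struct F64x4([f64; 4]);

/// Fill `buf` from `next` and return the (head, body, tail) element counts.
fn fill_split(buf: &mut [f64], mut next: impl FnMut() -> f64) -> (usize, usize, usize) {
    // SAFETY: F64x4 is exactly four f64s with no padding, and any bit pattern is a valid f64.
    let (head, body, tail) = unsafe { buf.align_to_mut::<F64x4>() };
    // Unaligned leading elements: scalar path.
    for x in head.iter_mut() {
        *x = next();
    }
    // Aligned middle: with a real SIMD type each iteration would be one aligned store.
    for v in body.iter_mut() {
        v.0 = [next(), next(), next(), next()];
    }
    // Leftover elements that do not fill a whole vector: scalar path again.
    for x in tail.iter_mut() {
        *x = next();
    }
    (head.len(), body.len() * 4, tail.len())
}
```

Note that `align_to_mut` only promises a best-effort split and is allowed to return the whole slice as the head, so the code must stay correct (if slower) in that case.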
I figured out that writing `.compiler("clang")` in the `build.rs` will force me to use clang, which is the C compiler I suspect you used. The specific flag `-flto` is not defined for `gcc` or `cl` (the MSVC C compiler). But somehow `cl` had no problem.
I am running the benchmarks now. Just FYI, the `rand` crate has `xorshift` implemented. I'll try to get it in and see.
Are there any known deficiencies with the Mersenne Twister?
Right, I always set CC=clang, sorry I didn't communicate that.
That sounds better, but I don't think it has SIMD support?
Mersenne Twister typically comes with a very large state size, and still has some statistical deficiencies. There's a paper summarizing a few arguments against it: https://arxiv.org/pdf/1910.06437
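To put the state-size point in perspective: MT19937 keeps 624 32-bit words of state, while a plain xorshift generator fits in a single `u64`. Below is a minimal sketch of Marsaglia's xorshift64 with the (13, 7, 17) shift triple. The struct and method names are made up for illustration, and this is not necessarily the variant any crate in this thread implements. It is also not a recommendation, since plain xorshift is known to fail some statistical tests as well.

```rust
/// Marsaglia's xorshift64 with the (13, 7, 17) shift triple.
/// The entire generator state is one u64.
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // The state must never be zero, otherwise the generator stays at zero forever.
        let state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
        XorShift64 { state }
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }

    /// Uniform f64 in [0, 1): keep the top 53 bits and scale by 2^-53.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 * (1.0 / (1u64 << 53) as f64)
    }
}
```

Seeded with 1, the first `next_u64()` call returns 1082269761, which is handy as a quick sanity check against other implementations of the same triple.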
Just for completeness' sake: Julia v1.5 didn't improve the benchmark results at all.
Unfortunately I can't use `msvc` and don't have a Windows system available, so I guess I'll stick with `perf annotate` and flamegraphs.
I've tried using the XorShift generator, which is supposedly a lot faster than MT, but to no avail. It seems the optimized implementation is just really good.
The `xorshift` crate (https://docs.rs/xorshift/0.1.3/xorshift/) is really old and uses a really old version of the `rand` crate, but seems fairly solid otherwise.
I also found out that there's experimental SIMD support in the `rand` crate, gated behind the `simd_support` feature.
Here's a really interesting discussion regarding SIMD PRNGs and SIMD distributions: rust-random/rand#377
However, it seems non-trivial to use, since you have to explicitly use the `packed_simd` types.
Here is a list of SIMD PRNGs: https://github.com/TheIronBorn/simd_prngs (linked in the discussion).
I've made an absolutely horrible implementation using this in-development crate and it comes in at `15us` for 10k random `f64`.
While 2x faster than the previous Rust-only implementation, this is 2.4x slower than dSFMT.