[package]
name = "agent_based_trading_julia"
version = "0.1.0"
authors = ["cgmossa <cgmossa@gmail.com>"]
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
rand = {version = "0.7.3", features = ["small_rng"]}
rand_distr = "0.2.2"

[dev-dependencies]
criterion = "0.3"

[[bench]]
name = "cont_run"
harness = false
// resides in benches/cont_run.rs
//
use agent_based_trading_julia::agent_simulation::cont_run;
use criterion::{black_box, criterion_group, criterion_main, Criterion};

pub fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("cont_run default", |b| {
        b.iter(|| {
            black_box(cont_run(
                black_box(10_000),
                black_box(10_000),
                black_box(0.05),
                black_box(0.1),
            ))
        })
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
using Random
using StatsBase

function cont_run(time=10000, n=10000, λ=0.05, q=0.1)
    r = zeros(time)
    θ = zeros(n)
    pchange = zeros(n)
    for t = 1:time
        ε = randn()
        if ε > 0
            r[t] = sum(<(ε), θ) / (λ * n)
        else
            r[t] = -sum(<(-ε), θ) / (λ * n)
        end
        θ .= ifelse.(rand!(pchange) .< q, abs(r[t]), θ)
    end
    return kurtosis(r)
end
// resides in src/lib.rs
use rand::prelude::SmallRng;
use rand::{thread_rng, Rng, SeedableRng};

pub fn cont_run(time: usize, n: usize, lambda: f64, q: f64) -> f64 {
    let mut theta = vec![0.; n];
    let n = n as f64;
    let mut eps_sampler = SmallRng::from_rng(thread_rng())
        .unwrap()
        .sample_iter(rand_distr::StandardNormal);
    let mut pchange_sampler = SmallRng::from_rng(thread_rng())
        .unwrap()
        .sample_iter(rand::distributions::Uniform::new_inclusive(0., 1.));
    let r = std::iter::repeat_with(move || {
        let eps: f64 = eps_sampler.next().unwrap();
        let r_t = if eps > 0. {
            theta.iter().filter(|&&x| eps > x).count() as f64 / (lambda * n)
        } else {
            -(theta.iter().filter(|&&x| -eps > x).count() as f64) / (lambda * n)
        };
        theta
            .iter_mut()
            .filter(|_| pchange_sampler.next().unwrap() < q)
            .for_each(|x| {
                *x = r_t.abs();
            });
        r_t
    });
    let r = r.take(time).collect::<Vec<_>>();
    kurtosis(r)
}

fn kurtosis(x: Vec<f64>) -> f64 {
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let x = x.iter().copied().map(|x| x - mean_x);
    let r: f64 = n * x.clone().map(|x| x.clone().powi(4)).sum::<f64>()
        / (x.map(|x| x.powi(2)).sum::<f64>().powi(2));
    r * (1. - 1. / n).powi(2) - 3.
}
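A side note on the `kurtosis` helper above: it scales the moment ratio by `(1 - 1/n)^2` before subtracting 3. Assuming `StatsBase.kurtosis` returns the plain population excess kurtosis `m4 / m2^2 - 3` (an assumption worth checking against the StatsBase docs), the two versions differ slightly. The sketch below is std-only and uses hypothetical function names, not the ones from `src/lib.rs`; it compares both definitions on a tiny sample.

```rust
/// Excess kurtosis from population moments: m4 / m2^2 - 3.
/// (Assumed to match what the Julia version computes.)
fn kurtosis_population(x: &[f64]) -> f64 {
    let n = x.len() as f64;
    let mean = x.iter().sum::<f64>() / n;
    let m2 = x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let m4 = x.iter().map(|v| (v - mean).powi(4)).sum::<f64>() / n;
    m4 / (m2 * m2) - 3.0
}

/// Same moment ratio, but with the extra (1 - 1/n)^2 factor
/// that the Rust helper applies before subtracting 3.
fn kurtosis_scaled(x: &[f64]) -> f64 {
    let n = x.len() as f64;
    (kurtosis_population(x) + 3.0) * (1.0 - 1.0 / n).powi(2) - 3.0
}

fn main() {
    let sample = [1.0, 2.0, 3.0, 4.0, 5.0];
    println!("population: {}", kurtosis_population(&sample));
    println!("scaled:     {}", kurtosis_scaled(&sample));
}
```

For `n = 10_000` the factor is about `0.9998`, so it should not matter for the benchmark timings, only for comparing the returned numbers.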
Unfortunately I can't use `msvc` and don't have a Windows system available, so I guess I'll stick with `perf annotate` and flamegraphs.
I've tried using the XorShift generator, which is supposedly a lot faster than MT, but to no avail. It seems the optimized implementation is just really good.
The xorshift crate (https://docs.rs/xorshift/0.1.3/xorshift/) is really old and depends on a really old version of the `rand` crate, but seems fairly solid otherwise.
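For readers unfamiliar with the xorshift family, here is a minimal std-only sketch of one variant (xorshift64*). It is an illustration only, not the implementation from the `xorshift` or `rand` crates, and the type and method names are made up for this example.

```rust
/// Minimal xorshift64* generator: three shift/xor steps on a 64-bit state,
/// followed by a multiplicative output scramble.
pub struct XorShift64Star {
    state: u64,
}

impl XorShift64Star {
    /// The state must never be zero, so a zero seed is replaced by a constant.
    pub fn new(seed: u64) -> Self {
        Self {
            state: if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed },
        }
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.state = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Uniform f64 in [0, 1): keep the top 53 bits and scale by 2^-53.
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 * (1.0 / (1u64 << 53) as f64)
    }
}

fn main() {
    let mut rng = XorShift64Star::new(42);
    let mut buf = vec![0.0_f64; 10_000];
    for x in buf.iter_mut() {
        *x = rng.next_f64();
    }
    let mean = buf.iter().sum::<f64>() / buf.len() as f64;
    println!("mean of 10k draws: {mean:.4}");
}
```

The whole state is a single `u64`, which is why it is tempting to expect it to beat Mersenne Twister; the scalar version still produces only one value per call, though.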
I also found out that there's experimental SIMD support in the `rand` crate, gated behind the `simd_support` feature.
Here's a really interesting discussion regarding SIMD PRNGs and SIMD distributions: rust-random/rand#377
However, it seems non-trivial to use, since you have to explicitly use the `packed_simd` types.
Here is a list of SIMD PRNGs: https://github.com/TheIronBorn/simd_prngs (linked in the discussion).
I've made an absolutely horrible implementation using this in-development crate and it comes in at `15us` for 10k random `f64`.
While 2x faster than the previous Rust-only implementation, this is 2.4x slower than dSFMT.
I've added you to the repo in case you'd like to make any further changes.
I think from here it would be neat if SIMD RNGs in Rust could progress to a state where they're easy to use (and hard to misuse).
For example, I don't think there is a convenience function to generate an array of `f64`s while making use of SIMD.
This is non-trivial when the number of elements in the array isn't divisible by the number of elements in the SIMD array, or when the array isn't aligned properly.
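The divisibility problem can be sketched without any SIMD crate. In the std-only example below, `next_block` is a hypothetical stand-in for one step of a SIMD generator that yields `LANES` values at once (here just a counter, so the output is easy to check). Full chunks take whole blocks, and the leftover tail takes a truncated block.

```rust
/// Number of f64 lanes in the hypothetical SIMD register (4 x f64 = 256 bits).
const LANES: usize = 4;

/// Stand-in for a SIMD generator step: returns LANES values at once.
/// It is a plain counter here so the result is deterministic.
fn next_block(counter: &mut u64) -> [f64; LANES] {
    let mut out = [0.0; LANES];
    for lane in out.iter_mut() {
        *lane = *counter as f64;
        *counter += 1;
    }
    out
}

/// Fill `buf` of any length: whole blocks first, then a partial block.
pub fn fill(buf: &mut [f64], counter: &mut u64) {
    let mut chunks = buf.chunks_exact_mut(LANES);
    for chunk in &mut chunks {
        chunk.copy_from_slice(&next_block(counter));
    }
    // Fewer than LANES elements remain; generate one more block
    // and discard the lanes that do not fit.
    let tail = chunks.into_remainder();
    if !tail.is_empty() {
        let block = next_block(counter);
        tail.copy_from_slice(&block[..tail.len()]);
    }
}

fn main() {
    let mut buf = vec![0.0_f64; 10];
    let mut counter = 0;
    fill(&mut buf, &mut counter);
    println!("{buf:?} (generator advanced {counter} values)");
}
```

Discarding the unused lanes wastes a little generator output, but keeps the hot loop free of per-element branches.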
Additionally, there's still a performance gap to fill with the RNG.
In fact, it should run faster than the Mersenne Twister, since it's a lot simpler.
Now talking about alignment, I noticed that my use of the SIMD RNG wasn't correct - I was using the write_aligned function but the allocation wasn't aligned the way it needed to be.
We're now down to `10.9us` for Rust only when using `write_to_slice_unaligned`.
This can probably be optimized by writing the first few `f64`s using a scalar implementation and then using SIMD on the aligned regions.
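One way to find the aligned region is the standard library's `slice::align_to_mut`. The sketch below is std-only and hedged: `Block` and `fill_aligned` are hypothetical names, the 32-byte alignment is an assumption about the SIMD width, and the inner loop merely stands in for a real aligned SIMD store.

```rust
/// Four f64 values with 32-byte alignment, standing in for a 256-bit SIMD vector.
#[repr(C, align(32))]
struct Block([f64; 4]);

/// Fill `buf` from `scalar`, splitting it into an unaligned head,
/// an aligned body of whole blocks, and an unaligned tail.
/// Returns the lengths of the three parts (body counted in blocks).
fn fill_aligned(buf: &mut [f64], mut scalar: impl FnMut() -> f64) -> (usize, usize, usize) {
    // SAFETY: Block is exactly four f64 values with no padding,
    // and every bit pattern is a valid f64.
    let (head, body, tail) = unsafe { buf.align_to_mut::<Block>() };
    for x in head.iter_mut() {
        *x = scalar();
    }
    for block in body.iter_mut() {
        // A real implementation would do one aligned SIMD store here.
        for lane in block.0.iter_mut() {
            *lane = scalar();
        }
    }
    for x in tail.iter_mut() {
        *x = scalar();
    }
    (head.len(), body.len(), tail.len())
}

fn main() {
    let mut buf = vec![0.0_f64; 10_001];
    let mut next = 0.0;
    let (head, body, tail) = fill_aligned(&mut buf, || {
        next += 1.0;
        next
    });
    println!("head: {head}, aligned blocks: {body}, tail: {tail}");
}
```

The sizes of the head and tail depend on where the allocator placed the buffer, so they can change from run to run.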
I figured out that writing `.compiler("clang")` in the `build.rs` will force me to use clang, which is the C compiler I suspect you used. The specific flag `-flto` is not defined for `gcc` or `cl` (the MSVC C compiler). But somehow `cl` had no problem.
I am running the benchmarks now. Just FYI, the `rand` crate has `xorshift` implemented. I'll try to get it in and see.
Are there any known deficiencies with Mersenne Twister?
Right, I always set CC=clang, sorry I didn't communicate that.
That sounds better, but I don't think it has SIMD support?
Mersenne Twister typically comes with a very large state size, and still has some statistical deficiencies. There's a paper summarizing a few arguments against it https://arxiv.org/pdf/1910.06437.
Just for completeness' sake: Julia v1.5 didn't improve the benchmark results at all.
Thanks. I tried to just `cargo build` this repository of yours and it didn't work at all. As usual, I am having an absolutely terrible time linking with a C library.
My Cygwin toolchain is the dominant one, but I cannot figure out how to install clang, the lld linker, or whatever else is needed for this on Cygwin. Googling it is not yielding the right instructions.
I've included my terminal outputs. Maybe you can spot what is going on here. I'm beyond angry that linking to C is such a terrible experience.
As an aside, I found dsfmt-rs, a Rust port of the PRNG you said Julia uses.
I'm sure the C library you linked to is faster, since it uses SIMD (SSE2).