Skip to content

Instantly share code, notes, and snippets.

@jeancroy
Last active July 4, 2019 13:59
Show Gist options
  • Save jeancroy/c887f3c592e997b9c11818711a014a65 to your computer and use it in GitHub Desktop.
Save jeancroy/c887f3c592e997b9c11818711a014a65 to your computer and use it in GitHub Desktop.
// Explore very small and very large, positive and negative numbers.
//
// Spend about 50% of time in + and 50% in - region. (Or fix m to 1 fo positive only)
// Spend about 50% of time in +- 10^N and 50% in +- 10^-N region.
// Spend about 50% of time between +-[1/1000, 1000] range and 50% of time outside of it, without limit.
//
// More precisely 50% of values are in [3^-s, 3^s] with "s" is the power of "q".
// As a guideline q^6: 1/1000..1000, q^4: 1/100..100, q^2: 1/10..10
// One may notice than 3^6=729<1000,
// In practice what this mean is that 1000 is in the 60% most likely outcomes instead of 50%.
//
// m estimate a sign function ( average of positive values is 1, average of negatives is -1)
// q^s draw from a log-logistic distribution with median=1, scale = s
//
// To recenter, multiply the output of this to the new desired median value (currently 1)
function randomLogScale1000(){
m = 4.0 * Math.random() - 2.0;
p = Math.random();
q = p/(1.0-p);
q2 = q*q;
q4 = q2*q2;
return m*q2*q4;
}
// One way to deal with unknown range is to decompose number in both mantissa and exponenet
// (number and scale of magnitude) and have both be random.
//
// One way to acheive it would be:
// x = m*exp(y), with m in Uniform[-2,2] and y in Gaussian[mu=0, std=10]
//
// - y standard deviation of 10 is choosen because to get "engineering range" values.
// -- qartile of y happens at +- 0.67 standard devidation. (50% of data between q1 and q3)
// -- exp(0.67std) = 1000, std = ln(1000)/0.67 = 10.31
//
// - m in [-2,2] is choosen so positive values of m average to 1, and negatives averages to -1
// ( on avergae this is a branchless sign function )
//
//
// But we don't need gaussian, bell-like is enough and we'll choose logistic distribution for ease of computation.
// To sample from a distribution, we can use the quantile function (inverse of cumulative distribution)
// In this case the quantile function of logistic distribution is the logit function:
// logit(p,s,mu) = mu + s*ln(1/(1-p))
//
// Given p in [0,1], x now become
// x = m*exp(y) = x = m*exp(mu + s*ln(1/(1-p)))
//
// With mu = 0 and with exp and ln canceling out:
// x = m*(1/(1-p))^s
//
// For ease of computation, we choose s to be an interger
// This way, we can implement the power using multiplications.
// We also want the standard deviation of y to be about 10. (engineering range)
//
// We use the variance formula for logistic distribution to isolate parameter "s"
// var(y) = std(y)^2 = s^2* pi^2 / 3
// s = std(y)*sqrt(3)/pi
//
// Choose s as approx 10*sqrt(3)/pi = 5.51
// ==> Choose s integer as 6
//
// ----------------------------
//
// Additionally, exp(y) with y in logistic distribution now follow a log-logistic distribution.
// Lower quartile is 3^-s and upper quartile is 3^s, median is 1
// https://en.wikipedia.org/wiki/Log-logistic_distribution#Quantiles
//
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment