Skip to content

Instantly share code, notes, and snippets.

@nkartashov
Last active August 29, 2015 14:17
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
Star You must be signed in to star a gist
Save nkartashov/e46fd146b1df2d79aaf3 to your computer and use it in GitHub Desktop.
Proposal draft

Introduction

Pseudorandom number generator (PRNG) is one of fundamental tools of applied mathematics, used in various applications such as cryptography, non-deterministic algorithms or modelling. However, "there is no silver bullet" and one PRNG doesn't fit all. More specifically, because PRNGs are

  • Deterministic
  • Periodic

usage of PRNGs in an environment unfit for them can lead to predictability or slow execution times and both are disastrous for industrial applications. Therefore, it is imperative to choose an appropriate PRNG for the right application, based on its statistical properties. One of such properties is splittability - the property of a PRNG which allows to split it into two PRNGs, each of which retains (at least to some extent) its qualities.

Motivation

One of the core means of organising parallelism in any language is separation by data, which separates the workers so that they don't modify shared data. One of ways to achieve such separation is by splitting resources, so that each worker accesses its own copy. PRNG can be viewed as one of such resources. Thus, we come to the fact that splittable PRNG is one of a key factors for fast and parallel applications involving nondeterminism.

  1. Current standard generator of System.Random has no statistical foundation for its stdSplit function, which is also noted in the source. On the other hand, SplitMix generator from [1] is backed by a published paper, analysis and is specifically aimed at providing a splittable implementation.
  2. Current standard generator is known for its slow performance, splitting into n generators worsens the problem n-fold.

Outline

In order to be efficient while writing an implementation of a PRNG, one needs to first establish a benchmark for assessing its performance and the quality of generated random numbers. Luckily, [1] provides us with DieHarder and TestU01 test suites for the latter, while for the former one would need to

  1. Write test programs including most common usage cases a. Monte-Carlo simulations b. Quick-Check testing
  2. Inspect low-level representations of implemented algorithms

Also, it would be good to run cryptographic test suites such as [3] in order to evaluate SplitMix implementation's fitness for cryptographic purposes.

Therefore, as a result of this proposal one should produce:

  1. A well-documented SplitMix implementation in Haskell
  2. Results from running benchmarks for the written implementation

Timeline

Before April 13: Read the papers relevant to the project such as [4] (paper on which current standard PRNG is based) and [5] (both referenced from the source), read [1] more thoroughly and look for more relevant papers (especially for ones about industrial applications using parallel random algorithms)

April 13 - May 25: Familiarize myself with Core representation, look at already implemented PRNGs' code, remember optimisations used, get development environment ready

May 26 - June 9: Implement SplitMix PRNG in Haskell, document it

June 10 - June 17: Find and fix bugs by using PRNG test suites mentioned above as well as self-written tests

June 18 - July 2: Write applications using splittable PRNG and estimate its performance, apply optimisations on Haskell level, document and explain changes made

July 3 - July 24: Look into Core representation of the implemented PRNG, write applications which exploit its weaknesses, optimize it on Core level, document and explain changes made

July 25 - August 10: Use [3] to find if SplitMix implementation from July 17 is fit for cryptographic purposes, if so try to apply the subsequent optimisations, produce a separate implementation if some are not possible. If implementation is unfit, find a way to fix it if the problems are minor, otherwise explore additional suggestions.

August 10 - August 21: Wrap up the work on both implementations, compile the documentation so far into a cohesive report

References

http://dl.acm.org/citation.cfm?id=2660195 http://www.inf.utfsm.cl/~hallende/download/Simul-2-2002/pads98.pdf http://csrc.nist.gov/publications/nistpubs/800-22-rev1a/SP800-22rev1a.pdf http://www.iro.umontreal.ca/~lecuyer/myftp/papers/cacm88.pdf http://www.cs.ou.edu/~rlpage/burtonpagerngjfp.pdf

About me

I'm a second year graduate student at St Petersburg Academic University. After being introduced to functional programming I felt that its core ideas are very close to what I thought are the guidelines for writing reliable applications. From that time I wanted to be involved in an open-source Haskell project and considering that I am into theoretic cryptography, I think the fulfillment of this project will both benefit the community and be a great experience for me.

Unfortunately, I will probably need to take a week off around late May (step 2 of the timeline) to finish the year's business at the university, but I am pretty sure that it will not hinder the fulfillment of this project. I will notify the mentor of the exact dates, when I know them (around late April).

Github: https://github.com/nkartashov

Current resume: https://drive.google.com/file/d/0B4H5UegQoqcVWVRxdHpKZ0l2Xzg

Email: snailandmail@gmail.com

@jvoigtlaender
Copy link

I'm surprised you don't mention the following work:

Are you aware of it? Have you considered it? Would it be an alternative to the approach you plan to take? That work is already targeted at Haskell. There might already be a prototype implementation around. Its interaction with QuickCheck specifically has been studied by the authors. (One of the authors is one of the original inventors of QuickCheck.) What motivates your use of a different PRNG?

(I have no opinion about the qualities of the different PRNGs. I'm just asking those questions because without considering them, your project might not have the best outcome.)

@nkartashov
Copy link
Author

Thanks for the comment, I have not missed that paper but indeed forgot to mention it. The algorithm from the paper is already implemented in tf-random package, but it is more complex and also it will be interesting to compare the two. I will update my proposal accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment