luczsoma/randomness-demystified.md

## randomness-demystified.md

      
    Raw
  

              randomness-demystified.md
            
          
    Randomness, demystified

The original version of this article was published in Tresorit Engineering on Medium. This version is textually identical to the original, but illustrations are omitted.
Randomness is everywhere. Every atom in the air moves randomly. On subatomic levels, there is even more randomness. Casinos, lotteries, and many games — virtual or not — are based on randomness. Cryptocurrencies or even opening websites involve dealing with “some cryptographic random”, behind the scenes. Developers have to prepare for the ubiquity of randomness, or they will need to face some seriously ugly bugs.
This article mainly focuses on randomness in cryptography. Its goal is not to provide a complete and sound scientific introduction, but rather to provide useful concepts based on abstractions and simplifications, and even concrete tips and rules for developers dealing with cryptographic randomness. The key takeaway — so important that it has to be emphasized up front — is:

Do not implement a random generator for cryptographic purposes (like generating passwords, salts or encryption keys) in a production system by yourself. Most probably you should not use low-level cryptographic libraries at all, but this depends on the level of your expertise.


If you need to keep your data secure, use a high-level, secure-by-design cryptosystem  —  like Tresorit  —  with a strong password and multi-factor authentication.


If you are a developer needing to implement something crypto-related, first make sure to get relevant expertise in the field. This expertise can even be another person or a consultant, but somehow you must be able to identify and fix weak points in your system’s security. And still, in most of the cases it’s best to use a high-level library (like NaCl), designed to avoid low-level crypto vulnerabilities.


If you need secure random numbers, either use a true random generator device, a well-known library, or rather a service provided by your operating system’s kernel for generating such numbers.


Feel free to experiment with any part of cryptography though, plan and implement, then learn from your mistakes by requesting reviews from crypto experts. Just don’t do it in production.
Introduction

Let’s start with the basics. The term randomness can be defined as the lack of predictability or lack of patterns in a sequence of “things” (like events or symbols, or anything else of one kind). More precise mathematical definitions exist, but they are beyond the scope of this article. The lack of predictability concerns an individual event, like tossing a coin: the outcome of a given attempt cannot be predicted (unless special conditions are set up beforehand, which is usually called cheating), even if all outcomes of previous attempts are known in advance. The lack of patterns concerns a sequence of events: e.g. a sequence of fair coin toss outcomes does not follow any patterns or combinations.
A common “measurement of unpredictability” is entropy (in the information theoretic sense, not in the thermodynamic one, although the two are related). Entropy is usually measured in bits. Its value is 0 bit for an outcome that is certain to occur. For an outcome that has maximum uncertainty, its value is log2(n) bits, where n is the number of independent outcomes of the experiment. For example, a fair coin toss has 2 possible outcomes: heads or tails. There is equally 1/2 chance for both, thus the actual outcome of an attempt is of maximum uncertainty. Independent fair coin tosses therefore have log2(2) = 1 bit entropy per toss. Flipping a coin with one side heavier than the other yields less than 1 bit of entropy, since it’s more probable that the coin will land on its heavier side. Flipping a coin having heads on both sides delivers exactly 0 bits of entropy, since it’s certain that the outcome will be heads.
Let’s take a look at a 256-bit-long key for today’s prevalent Advanced Encryption Standard (AES-256) algorithm. If the key was generated with a series of fair coin tosses (256 of them — one for each bit; e.g. heads is ’1’, tails is ’0’), then each bit of the key contains 1 bit of entropy, so the 256-bit-long key contains the most entropy (256 bits) its size allows. Which is good.
In connection with entropy, two other important terms are uniform distribution and statistical independence. Staying with the example of fair coin tosses: heads and tails are equally likely to happen, so we say there is a uniform distribution amongst the two possible outcomes. If we look at a sequence of fair coin toss outcomes, we say the elements of the sequence are statistically independent from each other.
In cryptography, the lack of predictability is an absolute requirement e.g. for key generation (or for initializing a block cipher in some modes). Being able to predict the next or regenerate a previous key under known conditions (e.g. at a known time, like registration) undermines the whole system’s security. Using the above terms:

A good cryptographic key is of high entropy, thus unpredictable. A good key generation method produces keys with uniform distribution within the whole key space, thus each generated key is statistically independent from every other one.

Two kinds of randomness

Computers work by following algorithms, which makes them deterministic, thus completely predictable. This seems to contradict the fact that cryptographic key generation — which involves randomness by definition — is entirely based on computers. How can we incorporate randomness into our computer-based cryptosystems? For answering this, we need to look for TRNGs and (CS)PRNGs.
True Random Number Generators

If we toss a coin or throw a die, we cannot predict the outcome. The concept of true random generation with a computer relies on such physical phenomena: basically we ask the computer to toss a coin. But usually a computer cannot toss a coin, and even if it could, the detection of the outcome would be problematic. For a computer, there are much better alternative random sources.
TRNG devices usually consist of a random source and a detection instrument. The source — the physical phenomenon itself — generates random noise, which is detected by the instrument. The result is then translated to a numeric value, often simply to 0 or 1. This is repeated each time a random number is needed.
Suitable physical phenomena for TRNGs include e.g. atmospheric noise detected by a radio, or radioactive decay of atoms detected by a Geiger counter. Such phenomena are considered to be completely unpredictable (more on that later), and their outcomes are much easier to detect and feed into a computer than a coin toss or a die roll.
(Cryptographically Secure) Pseudo-Random Number Generators

Most computers have no easy access to true randomness, even though there are commercially available integrable or external devices (mostly with a high price tag) and APIs over the internet (mostly with limited usage) offering true randomness. In contrast, pseudo-randomness — being only an algorithm — is there in every computer.
We need to be careful with that, though. PRNG values are not random in the same way as e.g. die rolls. A PRNG is only an algorithm, and since algorithms are deterministic by definition, that makes a PRNG predictable. Knowing the so-called seed — the initial value, which completely determines all subsequent states of the PRNG — all pseudo-random values can be predicted and reproduced. After good amount of research, today we have high-quality PRNGs with statistical properties approximating true random numbers. Several of them are standardized. Several of them are suitable for cryptographic purposes.
What makes a PRNG cryptographically secure?

Looking up the definitions at the beginning of this article, two basic — simplified — rules can be formulated of a CSPRNG: the generated values should not follow a pattern, and the next value should be unpredictable. Simply put, the generated values should look truly random, while not being truly random. But we need more precision for that.
There are two groups of cryptographically secure PRNG requirements:


They should pass statistical randomness tests. Statistics can tell. The US Government’s National Institute of Standards and Technology in Special Publication 800–22 specifically advises a set of statistical tests against (P)RNGs to determine if their generated values “look random enough” for cryptographic purposes. Robert G. Brown from Duke University’s Physics Department composed another test suite, called Dieharder. But passing the statistics is not enough on its own.


They should not break if their state is compromised. If a PRNG reveals its state to an attacker (by reverse engineering or even guesswork), the attacker should not be able to reconstruct the generated values before that. Also — assuming the PRNG has an external entropy source — the attacker should not be able to predict future values based only on observing that entropy source.


So how to get secure random values in my app?

You usually don’t want true random. You could pursue the goal to incorporate true randomness into your application, but it’s unnecessary in most cases: either the security increase by true randomness will be demolished by a user choosing 123456 or password as their password, or the level of effort and costs to have true randomness in your application are simply not comparable to the value of the guarded data. Although, if you can use true randomness without sacrificing too much, you should definitely use it.
The first and obvious choice is to get a dedicated hardware device. These are usually quite expensive though. Another choice would be to use random.org. Its API is simple and straightforward, yet powerful — and not free, unless you use it for development or non-profit purposes. However, getting random over the internet (be it a trusted or untrusted, signed or unsigned API, with or without TLS) is great for scientific or research purposes, but for cryptographic key generation — where it’s all about keeping that random ultimately secret — is not really sufficient.
Most times the sensible way is to use a high-quality CSPRNG: these are available virtually everywhere. I advise against writing your own CSPRNG. Even though it’s not that hard to get it right, it’s very easy to get it wrong. Even if you implemented a well-functioning one complying with all NIST-advised statistical tests, it would take years of public scrutiny to verify that your implementation is sufficiently resistant to determined reverse engineering.
You should get random numbers in your application from the following CSPRNGs. Note that this list is prioritized, and you should always use the first available option*:


Your operating system’s randomness API. Always use /dev/urandom instead of /dev/random on Unix-based distributions. On Windows, use BCryptGenRandom.


Really, use /dev/urandom (on Unix) or BCryptGenRandom (on Windows).


Indeed, you should use /dev/urandom (on Unix) or BCryptGenRandom (on Windows).


If an OS-level secure random source is not available in the given context, use OpenSSL (or another cryptographic library scrutinized to the same extent).


If your application runs in a web browser, always use the Web Crypto API’s native window.crypto.getRandomValues(), do not use JavaScript CSPRNGs. Also avoid using Math.random() for crypto, because it’s predictable (in Opera it’s actually a CSPRNG, but you don’t want your web application to be secure only in Opera).


Default random implementations in most programming languages are absolutely unsuitable for cryptography, so some more fair warnings: do not use srand() + rand() in C/C++, do not use random in Python, etc. If unsure about a method or library (let’s call it X), google it like “Is the PRNG of X cryptographically secure?”, then read as many posts, articles and SO or forum answers as you can.
*The goal of this article is to help developers with little cryptographic experience messing up randomness as least as possible. And since it’s easier to mess up with OpenSSL or another library than with /dev/urandom, it finishes only in second place (or fourth, actually). The quality of their CSPRNGs against each other can be disputed, but the security of a whole system is much likely to depend on other factors.
tl;dr

Use /dev/urandom (on Unix) or BCryptGenRandom (on Windows) where possible.
The philosophical end: is random really random?

The basic question is, whether the Universe is deterministic or not. If it is, then nothing is random, since every “happening” is predetermined by all previous “happenings” since the Big Bang. This means if we knew the Universe’s state to the fullest extent at a point in time, we could predict all subsequent states. But in practice, even if the Universe is deterministic, most likely it’s infeasible for human-created technology to acquire and process such an enormous system’s full state description. We would also face other barriers, like Heisenberg’s uncertainty principle, fundamentally limiting the precision of measuring a given state.
There are two basic approaches of RNGs relying on true randomness: chaotic systems and quantum effects. Chaotic systems are based on the concept of tiny changes in initial conditions causing substantial, untraceable changes in the overall result. In layman’s terms, a chaotic system is so complex and volatile that it’s infeasible to predict or even track its state. A good example for a chaotic system-based TRNG is the random.org API: it uses atmospheric noise as its random source. While our atmosphere can be considered as a chaotic but deterministic system, it’s hard to imagine that one will ever be able to precisely describe its overall state, reason about its noise and predict numbers generated by random.org, given the fact that we usually don’t even get next week’s weather forecasts right.
This approach applies to coin tosses, die rolls, roulette wheels, etc. as well. Even though it seems that only Newtonian mechanics is involved, there are much more factors to consider: air density, friction of the used materials, the wind caused by the AC unit… This causes the system to be so complex, that we cannot predict the outcome, and that is what we call random.
Randomness based on quantum physics is a bit less straightforward. On subatomic levels, particles seem to have objectively random behavior, therefore quantum effects are considered to be the “gold standard” of random generation. With a broad simplification, quantum effects carry “more randomness” than anything else. For example, it’s considered to be fundamentally unpredictable, when will an unstable atom decay. Not because we are not smart enough to perform precise measurements for predictions on this level, but because it’s fundamentally impossible to perform such measurements. The uncertainty principle stands firmly on that end of the scale.
Determinists argue with this. They claim that everything is predetermined, even on subatomic levels, and the reason why we see quantum effects as random is that our physical models are not yet complete enough to describe such systems. And being so restricts us from understanding causalities. To determinists, quantum effects carry the same kind of randomness as chaotic systems. From the cryptographer’s aspect, this doesn’t really matter though.
tl;dr

Until we are not perfectly sure about how the Universe works, we don’t really know if random is truly random. But it doesn’t really matter either. Even if it suddenly turns out that the Universe is deterministic from the smallest to the largest scales, the practical level of cryptographic security will not be limited or decreased by today’s available RNG tools.
Takeaway


Do not implement a random generator for cryptographic purposes (like generating passwords, salts or encryption keys) in a production system by yourself. Most probably you should not use low-level cryptographic libraries at all, but this depends on the level of your expertise.


Feel free to experiment with any part of cryptography, plan and implement, then learn from your mistakes by requesting reviews from crypto experts. Just don’t do it in production.


For generating random numbers, use /dev/urandom (on Unix) or BCryptGenRandom (on Windows) where possible.