Skip to content

Instantly share code, notes, and snippets.

@joepie91
Last active December 4, 2024 19:25
Show Gist options
  • Save joepie91/7105003c3b26e65efcea63f3db82dfba to your computer and use it in GitHub Desktop.
Save joepie91/7105003c3b26e65efcea63f3db82dfba to your computer and use it in GitHub Desktop.
Secure random values (in Node.js)

Not all random values are created equal - for security-related code, you need a specific kind of random value.

A summary of this article, if you don't want to read the entire thing:

  • Don't use Math.random(). There are extremely few cases where Math.random() is the right answer. Don't use it, unless you've read this entire article, and determined that it's necessary for your case.
  • Don't use crypto.getRandomBytes directly. While it's a CSPRNG, it's easy to bias the result when 'transforming' it, such that the output becomes more predictable.
  • If you want to generate random tokens or API keys: Use uuid, specifically the uuid.v4() method. Avoid node-uuid - it's not the same package, and doesn't produce reliably secure random values.
  • If you want to generate random numbers in a range: Use random-number-csprng.

You should seriously consider reading the entire article, though - it's not that long :)

Types of "random"

There exist roughly three types of "random":

  • Truly random: Exactly as the name describes. True randomness, to which no pattern or algorithm applies. It's debatable whether this really exists.
  • Unpredictable: Not truly random, but impossible for an attacker to predict. This is what you need for security-related code - it doesn't matter how the data is generated, as long as it can't be guessed.
  • Irregular: This is what most people think of when they think of "random". An example is a game with a background of a star field, where each star is drawn in a "random" position on the screen. This isn't truly random, and it isn't even unpredictable - it just doesn't look like there's a pattern to it, visually.

Irregular data is fast to generate, but utterly worthless for security purposes - even if it doesn't seem like there's a pattern, there is almost always a way for an attacker to predict what the values are going to be. The only realistic usecase for irregular data is things that are represented visually, such as game elements or randomly generated phrases on a joke site.

Unpredictable data is a bit slower to generate, but still fast enough for most cases, and it's sufficiently hard to guess that it will be attacker-resistant. Unpredictable data is provided by what's called a CSPRNG.

Types of RNGs (Random Number Generators)

  • CSPRNG: A Cryptographically Secure Pseudo-Random Number Generator. This is what produces unpredictable data that you need for security purposes.
  • PRNG: A Pseudo-Random Number Generator. This is a broader category that includes CSPRNGs and generators that just return irregular values - in other words, you cannot rely on a PRNG to provide you with unpredictable values.
  • RNG: A Random Number Generator. The meaning of this term depends on the context. Most people use it as an even broader category that includes PRNGs and truly random number generators.

Every random value that you need for security-related purposes (ie. anything where there exists the possibility of an "attacker"), should be generated using a CSPRNG. This includes verification tokens, reset tokens, lottery numbers, API keys, generated passwords, encryption keys, and so on, and so on.

Bias

In Node.js, the most widely available CSPRNG is the crypto.randomBytes function, but you shouldn't use this directly, as it's easy to mess up and "bias" your random values - that is, making it more likely that a specific value or set of values is picked.

A common example of this mistake is using the % modulo operator when you have less than 256 possibilities (since a single byte has 256 possible values). Doing so actually makes lower values more likely to be picked than higher values.

For example, let's say that you have 36 possible random values - 0-9 plus every lowercase letter in a-z. A naive implementation might look something like this:

let randomCharacter = randomByte % 36;

That code is broken and insecure. With the code above, you essentially create the following ranges (all inclusive):

  • 0-35 stays 0-35.
  • 36-71 becomes 0-35.
  • 72-107 becomes 0-35.
  • 108-143 becomes 0-35.
  • 144-179 becomes 0-35.
  • 180-215 becomes 0-35.
  • 216-251 becomes 0-35.
  • 252-255 becomes 0-3.

If you look at the above list of ranges you'll notice that while there are 7 possible values for each randomCharacter between 4 and 35 (inclusive), there are 8 possible values for each randomCharacter between 0 and 3 (inclusive). This means that while there's a 2.64% chance of getting a value between 4 and 35 (inclusive), there's a 3.02% chance of getting a value between 0 and 3 (inclusive).

This kind of difference may look small, but it's an easy and effective way for an attacker to reduce the amount of guesses they need when bruteforcing something. And this is only one way in which you can make your random values insecure, despite them originally coming from a secure random source.

So, how do I obtain random values securely?

In Node.js:

  • If you need a sequence of random bytes: Use crypto.randomBytes.
  • If you need individual random numbers in a certain range: use crypto.randomInt.
  • If you need a random string: You have two good options here, depending on your needs.
    1. Use a v4 UUID. Safe ways to generate this are crypto.randomUUID, and the uuid library (only the v4 variant!).
    2. Use a nanoid, using the nanoid library. This also allows specifying a custom alphabet to use for your random string.

Both of these use a CSPRNG, and 'transform' the bytes in an unbiased (ie. secure) way.

In the browser:

  • When using the Node.js options, your bundler should automatically select equivalently safe browser implementations for all of these.
  • If not using a bundler:
    • If you need a sequence of random bytes: Use crypto.getRandomValues with a Uint8Array. Other array types will get you numbers in different ranges.
    • If you need a random string: You have two good options here, depending on your needs.
      1. Use a v4 UUID, with the crypto.randomUUID method.
      2. Use a nanoid, using the standalone build of the nanoid library. This also allows specifying a custom alphabet to use for your random string.

However, it is strongly recommended that you use a bundler, in general.

@serrand
Copy link

serrand commented Feb 20, 2019

I still don't get how uuidv4() is more secure than crypto.randomBytes(16).toString('hex')?

Who is using the modulo operator on single bytes and why?

@erikeckhardt
Copy link

Taking nested deps into account:

Query: "node-uuid@
6190711 node-uuid
...

It was specifically called out to use uuid, NOT node-uuid which has been deprecated.

@onury
Copy link

onury commented Sep 15, 2019

There are cases where you need to use crypto.randomBytes() and a uuid cannot be used.

For example, you need a fixed-bit key for encryption.
AES-256 requires a 256-bit key.
To generate a random key using hex encoding:

crypto.randomBytes(256 / 8).toString('hex')
// 4313894765b672473ae6afea85087ae3538a3272afb4831a1d3ac97a63d58e37

@qtangs
Copy link

qtangs commented Oct 2, 2020

@cozzbie
Copy link

cozzbie commented Jan 8, 2021

Been struggling to understand the bias section and how we came up with those numbers. 4 - 35? Why do we have 7 values and 8 at other times? Can someone please break down that section a little dumber for me?

@joepie91
Copy link
Author

joepie91 commented Jan 9, 2021

@cozzble Have a look at the following list of ranges, from the original post:

Selection_401

All of the ranges within the blue box (ie. 8 ranges) can produce an end result between 0 and 3 (inclusive). But only the ranges within the red box (ie. 7 ranges) can produce an end result between 4 and 35 (inclusive).

In other words, there are 8 opportunities for the resulting value to lie between 0 and 3 (inclusive), but only 7 opportunities for it to lie between 4 and 35 (inclusive). That's where the bias comes from.

@jeremybradbury
Copy link

jeremybradbury commented Jan 15, 2021

@joepie91 you may want to note that random-number-cspring is only for safely getting a non-zero-starting and non-byte divisible ending range with a cspring...

you can const hex = crypto.randomBytes(32).toString("hex") to get a safe random 64 char hex string representing 32 random bytes.

the bias is generally introduced when creating a decimal range.

you can also safely generate 0-255 randomly and safely like this... no extra deps.

const hex = crypto.randomBytes(1).toString("hex");
const dec = parseInt(parseInt(hex, 16).toString(10));

this is because any byte can hold up to 256 possibilities. if you try with 2 bytes it will hold 256*256 possibilities. the more data the more possibilities grow exponentially.

for example a 2 byte in C integer can hold 65536 (256*256) possible numbers. but half of those numbers are below 0

this is the reason you can't go past 32 bits in the positive direction... because JS's 64 bit number type goes negative, it even holds exponent data to really extend the range.

this is perfectly secure and unbiased. here we're not transforming our results... only converting bytes to other formats.

if you want to safely generate a random range without using any deps... you can do it yourself.

Here is an example that works well for small number ranges (refresh button on the mini browser to get a new number)... https://codesandbox.io/s/node-js-forked-twhud?file=/src/index.js

If you go up to 2 random bytes (0-65536) and keep 10-100 there are far more attempts. but they happen so quickly it's hard to notice a delay. random-number-cspring uses a mask to reduce the attempt count. but really unbiased number generation in a range is not more complicated than this.

it's not rocket science... it's simply retrying until you get something that fits the criteria.

EDIT: feel free to use fork/edit the code in the sandbox openly or commercially there's no license.

@cozzbie
Copy link

cozzbie commented Jan 15, 2021

@cozzble Have a look at the following list of ranges, from the original post:

Selection_401

All of the ranges within the blue box (ie. 8 ranges) can produce an end result between 0 and 3 (inclusive). But only the ranges within the red box (ie. 7 ranges) can produce an end result between 4 and 35 (inclusive).

In other words, there are 8 opportunities for the resulting value to lie between 0 and 3 (inclusive), but only 7 opportunities for it to lie between 4 and 35 (inclusive). That's where the bias comes from.

Thanks a lot @joepie91. This helped.

@audionerd
Copy link

Related: n-digit-token:

This tiny module generates an n-digit cryptographically strong pseudo-random token in constant time and avoids modulo bias.

https://github.com/almasen/n-digit-token

@joshxyzhimself
Copy link

Wrote this

/* eslint-disable no-constant-condition */

const util = require('util');
const assert = require('assert');
const crypto = require('crypto');

const random_bytes = util.promisify(crypto.randomBytes);

/**
 * Random integer between 1 to 256
 *
 * @param {number} max
 * @returns {number}
 */
const random_integer = async (max) => {
  assert(typeof max === 'number');
  assert(Number.isFinite(max) === true);
  assert(Number.isInteger(max) === true);
  assert(max > 1);
  assert(max <= 256);
  const range = max;
  const range_floor = Math.floor(256 / range);
  const range_product = range * range_floor;
  while (true) {
    const bytes = await random_bytes(1);
    const byte = bytes[0];
    if (byte < range_product) {
      let value = byte + 1;
      if (value > range) {
        const remainder = byte % range;
        value = remainder === 0 ? range : remainder;
      }
      return value;
    }
  }
};

process.nextTick(async () => {
  const random_int = await random_integer(256);
  console.log({ random_int });
});

Example distribution

  max: 256,
  range: 256,
  iterations: 100000,
  hits: {
    '1': 363,
    '2': 399,
    '3': 375,
    '4': 439,
    '5': 344,
    '6': 385,
    '7': 379,
    '8': 376,
    '9': 401,
    '10': 343,
    '11': 395,
    '12': 374,
    '13': 391,
    '14': 410,
    '15': 404,
    '16': 429,
    '17': 414,
    '18': 395,
    '19': 397,
    '20': 377,
    '21': 387,
    '22': 394,
    '23': 366,
    '24': 395,
    '25': 401,
    '26': 404,
    '27': 406,
    '28': 386,
    '29': 413,
    '30': 372,
    '31': 367,
    '32': 370,
    '33': 406,
    '34': 383,
    '35': 387,
    '36': 372,
    '37': 377,
    '38': 361,
    '39': 400,
    '40': 407,
    '41': 390,
    '42': 397,
    '43': 383,
    '44': 425,
    '45': 366,
    '46': 390,
    '47': 407,
    '48': 391,
    '49': 394,
    '50': 371,
    '51': 351,
    '52': 380,
    '53': 387,
    '54': 385,
    '55': 405,
    '56': 409,
    '57': 389,
    '58': 366,
    '59': 357,
    '60': 401,
    '61': 424,
    '62': 371,
    '63': 383,
    '64': 420,
    '65': 382,
    '66': 366,
    '67': 430,
    '68': 383,
    '69': 376,
    '70': 394,
    '71': 418,
    '72': 403,
    '73': 405,
    '74': 387,
    '75': 430,
    '76': 422,
    '77': 429,
    '78': 400,
    '79': 386,
    '80': 371,
    '81': 382,
    '82': 350,
    '83': 368,
    '84': 437,
    '85': 357,
    '86': 417,
    '87': 391,
    '88': 379,
    '89': 414,
    '90': 394,
    '91': 406,
    '92': 418,
    '93': 402,
    '94': 367,
    '95': 410,
    '96': 359,
    '97': 380,
    '98': 397,
    '99': 407,
    '100': 374,
    '101': 381,
    '102': 386,
    '103': 430,
    '104': 373,
    '105': 400,
    '106': 433,
    '107': 372,
    '108': 390,
    '109': 385,
    '110': 404,
    '111': 375,
    '112': 368,
    '113': 427,
    '114': 373,
    '115': 377,
    '116': 384,
    '117': 384,
    '118': 394,
    '119': 386,
    '120': 388,
    '121': 384,
    '122': 386,
    '123': 385,
    '124': 401,
    '125': 384,
    '126': 375,
    '127': 370,
    '128': 405,
    '129': 411,
    '130': 370,
    '131': 372,
    '132': 408,
    '133': 390,
    '134': 362,
    '135': 377,
    '136': 364,
    '137': 390,
    '138': 403,
    '139': 410,
    '140': 421,
    '141': 400,
    '142': 395,
    '143': 414,
    '144': 349,
    '145': 390,
    '146': 402,
    '147': 390,
    '148': 389,
    '149': 372,
    '150': 380,
    '151': 440,
    '152': 371,
    '153': 392,
    '154': 405,
    '155': 407,
    '156': 408,
    '157': 373,
    '158': 423,
    '159': 382,
    '160': 387,
    '161': 396,
    '162': 416,
    '163': 375,
    '164': 362,
    '165': 415,
    '166': 398,
    '167': 386,
    '168': 361,
    '169': 391,
    '170': 378,
    '171': 368,
    '172': 427,
    '173': 375,
    '174': 375,
    '175': 394,
    '176': 385,
    '177': 363,
    '178': 396,
    '179': 367,
    '180': 393,
    '181': 393,
    '182': 424,
    '183': 388,
    '184': 397,
    '185': 397,
    '186': 386,
    '187': 407,
    '188': 399,
    '189': 408,
    '190': 389,
    '191': 378,
    '192': 422,
    '193': 361,
    '194': 391,
    '195': 384,
    '196': 380,
    '197': 414,
    '198': 381,
    '199': 414,
    '200': 369,
    '201': 379,
    '202': 382,
    '203': 402,
    '204': 382,
    '205': 400,
    '206': 390,
    '207': 396,
    '208': 414,
    '209': 401,
    '210': 397,
    '211': 401,
    '212': 349,
    '213': 379,
    '214': 397,
    '215': 366,
    '216': 402,
    '217': 408,
    '218': 392,
    '219': 370,
    '220': 356,
    '221': 399,
    '222': 433,
    '223': 411,
    '224': 406,
    '225': 408,
    '226': 384,
    '227': 380,
    '228': 365,
    '229': 379,
    '230': 372,
    '231': 387,
    '232': 392,
    '233': 383,
    '234': 418,
    '235': 434,
    '236': 371,
    '237': 383,
    '238': 386,
    '239': 419,
    '240': 402,
    '241': 403,
    '242': 383,
    '243': 381,
    '244': 381,
    '245': 393,
    '246': 368,
    '247': 419,
    '248': 387,
    '249': 399,
    '250': 391,
    '251': 390,
    '252': 354,
    '253': 410,
    '254': 376,
    '255': 351,
    '256': 412
  }
}

@crowjdh
Copy link

crowjdh commented Mar 25, 2021

It's quite concerning not to illustrate why those libraries are safer than others. I was looking for the reason why uuid and random-number-csprng are safer.
No one should read this article and just think "well, I'm gonna use these libraries because some post on the internet said so."

@jeremybradbury
Copy link

jeremybradbury commented Mar 29, 2021

@joshxyzhimself what are you attempting to do here? you created 256 random numbers that don't fit in a requested range. These are also biased using the very method this gist aims to solve: transforming numbers using the modulus operator (and/or multiplication/division).

if (byte < range_product) {
      let value = byte + 1;
      if (value > range) {
        const remainder = byte % range;
        value = remainder === 0 ? range : remainder;
      }
      return value;
    }

it still took 100,000 iterations to come up with these, and they don't fit into an input range like: 1-255... they just create the amount of requested randoms... but in what range? looks like you're using a range of 256 to 512, based on the input.... all of these result are above the requested maximum of 256.

@crowjdh there is a very easy way to do this without any libraries. the issue comes when looking for ranges of random, by transforming the random result (as i explained above), rather than simply trying again when the random is not in range. I broke down the topic pretty well above and left a sandbox example https://gist.github.com/joepie91/7105003c3b26e65efcea63f3db82dfba#gistcomment-3595666. Since then I have created a super efficient version that chooses the array size based on the input range.

You need the lib to prevent you from making obvious mistakes, if you understand the right way to implement it, you don't need a lib. Here's my Browser implementation, for node you'll need to rework slightly as the 2 crypto APIs differ slightly (see my comment above for a link to a node sandbox version), all of this free to reuse anywhere:

const getCSPRNG = (min, max) => {
      // validate
      if (min < 0) {
        return console.error("min must be at least 0");
      }
      if (min > max) {
        return console.error("min must be less than max");
      }
      let array;
      // choose the correct array type
      switch (true) {
        case max < 1:
          console.error("max must be at least 1");
          return;
        case max < 256: // 1-255
          array = new Uint8Array(1);
          break;
        case max < 65536: // 256-65535
          array = new Uint16Array(1); // default
          break;
        case max < 4294967296: // 65536-4294967295
          array = new Uint32Array(1);
          break;
        default:
          console.error("max must be no more than 4294967295");
          return;
      }
      let randomNumber = crypto.getRandomValues(array)[0];
      let attempts = 1;
      while (randomNumber > max || randomNumber < min) {
        randomNumber = crypto.getRandomValues(array)[0]; // retry until result is in range
        attempts++; // track failed generations
      }
      console.info({
        randomNumber,
        attempts,
      });
      return randomNumber;
    };

if "crypto safe" isn't exactly what you need, just "better random numbers with less repetition", there is another simpler method below.

again in this case, written for browser crypto api.

// first make math random use crypto instead
    Math.random = crypto.getRandomValues(new Uint32Array(1))[0] / 4294967295;

// typical biased range 
    const getMathRandomRange = (min, max) => {
      min = Math.ceil(min); // minimum is inclusive
      max = Math.floor(max); // maximum is inclusive
      return Math.floor(Math.random() * (max - min + 1)) + min;
    };

@bipinstha7
Copy link

For the API keys or secret keys we can use crypto.pbkdf2(secret, salt, iteration, length, algorithm) with just single iteration. We can use different secret and salt according to our needs and different length. As it is cryptographically generated string values, just single iteration is sufficient so performance is not a problem and secret & salt makes it more random and unique.

@joepie91
Copy link
Author

@bipinstha7 This is bad practice. Never derive "random" keys from any other kind of data - it will never make it more secure than truly random data, all it can do is weaken the unpredictability of the key. "Cryptographically generated" also means nothing if you're using the wrong cryptographic tool for the job.

The only correct way to obtain a secure random value is from a cryptographically-secure random source, not a key derivation function.

@bipinstha7
Copy link

@bipinstha7 This is bad practice. Never derive "random" keys from any other kind of data - it will never make it more secure than truly random data, all it can do is weaken the unpredictability of the key. "Cryptographically generated" also means nothing if you're using the wrong cryptographic tool for the job.

The only correct way to obtain a secure random value is from a cryptographically-secure random source, not a key derivation function.

@joepie91 Thank you for correcting me. I wasn't aware of its weakness. Could you please point us about how it weakens the predictability despite having secret key and salt?
UUIDv4 is one of the best and nanoid looks promising. Beyond these tools, what could be the native way for the API keys or secret keys?

@joepie91
Copy link
Author

@bipinstha7 So there's a general principle in cryptography: never add more complexity than is actually needed to effectively solve the problem. This is because every bit of complexity you add, increases the chance of failure; whether it's because of a bug in the implementation, or because you put the pieces together incorrectly. And because a single tiny cryptographic failure can compromise an entire system, it's crucial to reduce this risk to the absolute minimum.

It's for this same reason that when you see a cryptographic tool advertising itself with how complex it is ("triple hashing", "double encryption", "proprietary algorithm", etc.), you should treat it as completely insecure. When someone presents complexity as a feature, that's a sure sign that they don't understand how to handle risk in cryptography. More about that and related topics here.

But to get back to your case: consider that in your approach, you still need to have a secret key, which needs to be randomly generated, and a salt, which also needs to be randomly generated. So in the end, you still need a random source, you're just adding a bunch of cryptographic complexity on top of that. That means more things that could break.

Consider, for example, what would happen if a weakness were found in the key derivation function that allows someone with 10 outputs (with 10 different salts) to determine the original secret key. Suddenly, your key security got a lot weaker; a problem that would not have existed if you had just used a CSPRNG with a simple encoding scheme.

You're not avoiding the encoding scheme either - you say that it produces "string values", but it doesn't, at least not originally! Like all cryptographic operations, PBKDF2 operates on series of bytes, and produces a series of bytes too. It likely just gets encoded (hex-encoded, base64-encoded, etc.) before it is returned to you.

So why not just take a random string from the CSPRNG, and encode it in some simple and non-biased way, completely skipping the whole "cryptographic key derivation with a salt" dance inbetween those two steps? And that is indeed exactly what UUID and nanoid implementations do.

You may also find this a useful read.

@bipinstha7
Copy link

@joepie91 Thank you, that's precious information. You cleared the misunderstanding I had with cryptography. Thank you :)

@hhamilto
Copy link

hhamilto commented Jun 10, 2021

@Joepie I'm not able to derive the same percentages that you quote regarding the relative chance of generating a certain number mod 36:

2.64% chance of getting a value between 4 and 35 (inclusive), there's a 3.02% chance of getting a value between 0 and 3 (inclusive).

By my calculation the numbers should be a ~2.73% chance of getting a value between 4 and 35 (inclusive) and a ~3.13% chance of getting a value between 0 and 3 inclusive.

I calculate that there for 0-3 range, we can say there are 8 chances of getting that value out of a total of 256 or 0.03125, and for the 4-35 range we have 7 chances out of 256 or 0.02734375.

This is also borne out empirically by this test code:

const crypto = require("crypto")

// Create an array of 0's to hold a count of the number of times a given number is observed
counts = [];
for (let i = 0; i < 36; i++) {
    counts[i] = 0;
}

// Do insecure modulo method NUM_OBSERVATIONS_TO_MAKE times and count how many times each number is observed
const NUM_OBSERVATIONS_TO_MAKE = 1000000
for (let i = 0; i < NUM_OBSERVATIONS_TO_MAKE; i++) {
    const randomByte = crypto.randomBytes(1).readUInt8(0);
    const randomNumber = randomByte%36;
    counts[randomNumber]++;
}

// Calculate the ratio of the number of times a number was observed out of the total number of observations made
const ratios = counts.map(c => c / NUM_OBSERVATIONS_TO_MAKE)
console.log(ratios[0]) // Approximately 0.03125
console.log(ratios[5]) // Approximately 0.02734375

Very possible I'm missing something about we are supposed to derive these numbers!

I also realize that this doesn't affect your higher level points, but I think accurate numbers are also important.

Thank you for taking the time to put this post together!

@harryadel
Copy link

harryadel commented Nov 12, 2021

import crypto from 'crypto';
crypto.randomUUID();

https://developer.mozilla.org/en-US/docs/Web/API/Crypto/randomUUID

@emirsavran
Copy link

I'm thinking of using 32 bytes randomBytes for raffle tickets. Is it safe to use? I will order them and take first 100. My concern is can ordering make it biased?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment