Navigation Menu

Skip to content

Instantly share code, notes, and snippets.

@neizod
Created September 30, 2019 23:12
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save neizod/038b3cc481ae3fff88a61843812cd7a6 to your computer and use it in GitHub Desktop.
Save neizod/038b3cc481ae3fff88a61843812cd7a6 to your computer and use it in GitHub Desktop.
Mersenne Twister
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Mersenne Twister, the Pseudorandom Number Generator Algorithm for Modern Computer\n",
"\n",
"- Author: Nattawut Phetmak\n",
"- Date: 2019-09-29\n",
"- Copyright © 2019\n",
"\n",
"---\n",
"\n",
"Mersenne Twister is a pseudorandom number generator (PRNG) that generate an integer $x$ such that $0 \\le x \\lt 2^{32}$. It is developed by Makoto Matsumoto and Takuji Nishimura in 1997. With a standard variant of MT-19937, which has random length of $2^{19937}-1$, a Mersenne prime number.\n",
"\n",
"Since the output of the algorithm looks random enough by statistical test. And it cover up all mistakes from other PRNGs. It is wildly use in modern computer nowaday, despite being designed for 20 years.\n",
"\n",
"In this notebook, we will deassemble the algorithm and study what it is doing. The layout of the notebook contains 4 sections. We dedicate the first section to recap on how computer represent number in bits, and necessary bitwise operations that will be used through out the notebook. Mersenne Twister from 1997 is discussed in section two, and improved version from 2002 in the section three. Finally, the final section will discuss some statistical test on the algorithm.\n",
"\n",
"First, we import everything necessary for this notebook."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from types import SimpleNamespace\n",
"from itertools import cycle, islice\n",
"from collections import Counter\n",
"from statistics import mean, stdev\n",
"\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Integer Representation in Computer\n",
"\n",
"Data in computer is just a series of bits (each can be 0 or 1). By nowaday standard, we let 1 byte consist of 8 bits. Other common length are 4 bytes (32 bits) and 8 bytes (64 bits).\n",
"\n",
"An integer in computer is represent with a binary number (number in base 2). For example, if we want to store a value 29 (in base 10) into 1 byte slot, first we convert it to a binary number 1_1101 (underscore for easier reading), then we padded the output with zeros to 0001_1101, and finally we store this value into that slot.\n",
"\n",
"But if a number is larger than our allocated slot, most value bits from the left will be discarded. For example, 333 (base 10) has binary representation as 1_0100_1101, which will be truncated to 0100_1101 if we attempt to store this number into 1 byte slot. We can still store this number without loss of information in larger slot, such as in a 4 bytes slot, however.\n",
"\n",
"Negative integers can be store and use in computer by some magic conversion, but we will not discuss and use them in this notebook.\n",
"\n",
"Now, we will play around with these (non negative) integers in NumPy to get familiar with them. We force it's type to be __unsigned integer 8-bits__ (`uint8`)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 0 1 163 255]\n",
"[ 77 157 224 42]\n"
]
}
],
"source": [
"xs = np.array([ 0, 1, 163, 255], dtype=np.uint8)\n",
"ys = np.array([333, -99, 224, 42], dtype=np.uint8)\n",
"\n",
"print(xs)\n",
"print(ys) # observe that 333 and -99 is converted to 77 and 157, respectively."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also view it's binary representation with string format, here's a help on that task."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"xs = ['00000000', '00000001', '10100011', '11111111']\n",
"ys = ['01001101', '10011101', '11100000', '00101010']\n"
]
}
],
"source": [
"def view_binary(ns):\n",
" return [f'{n:08b}' for n in ns]\n",
"\n",
"print('xs =', view_binary(xs))\n",
"print('ys =', view_binary(ys))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First operation (that we already familiar with) is __addition__. Keep in mind that exceed bits (carry) after computation will be discard."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"add = ['01001101', '10011110', '10000011', '00101001']\n"
]
}
],
"source": [
"print('add =', view_binary(xs + ys))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To __subtract__, just convert the negative operand (right operand) to positive, and then do adding."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" ys = ['01001101', '10011101', '11100000', '00101010']\n",
"-ys = ['10110011', '01100011', '00100000', '11010110']\n",
" xs = ['00000000', '00000001', '10100011', '11111111']\n",
"res = ['10110011', '01100100', '11000011', '11010101']\n",
"sub = ['10110011', '01100100', '11000011', '11010101']\n"
]
}
],
"source": [
"print(' ys =', view_binary(ys))\n",
"print('-ys =', view_binary(-ys))\n",
"print(' xs =', view_binary(xs))\n",
"print('res =', view_binary(xs + (-ys)))\n",
"print('sub =', view_binary(xs - ys))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A __bit shift__ (`<<` and `>>`) is a simple operation that shift bit to the left or right (and discard any exceed bits), see the examples."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<<2 = ['00110100', '01110100', '10000000', '10101000']\n",
"ys = ['01001101', '10011101', '11100000', '00101010']\n",
">>2 = ['00010011', '00100111', '00111000', '00001010']\n"
]
}
],
"source": [
"print('<<2 =', view_binary(ys << 2))\n",
"print('ys =', view_binary(ys))\n",
"print('>>2 =', view_binary(ys >> 2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now comes the unfamiliar operation __bitwise or__ (`|`). To compute the result, take a look at each pair of bits from left and right operand, if both bits has value of 0, then that result bit is also 0. Otherwise the result bit will be 1."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"xs = ['00000000', '00000001', '10100011', '11111111']\n",
"ys = ['01001101', '10011101', '11100000', '00101010']\n",
"OR = ['01001101', '10011101', '11100011', '11111111']\n"
]
}
],
"source": [
"print('xs =', view_binary(xs))\n",
"print('ys =', view_binary(ys))\n",
"print('OR =', view_binary(xs | ys))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similar operation is __bitwise exclusive or__ (`^`), this time only one bit must be 1 to make the result bit 1. Both 1 or both 0 will result in 0."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" xs = ['00000000', '00000001', '10100011', '11111111']\n",
" ys = ['01001101', '10011101', '11100000', '00101010']\n",
"XOR = ['01001101', '10011100', '01000011', '11010101']\n"
]
}
],
"source": [
"print(' xs =', view_binary(xs))\n",
"print(' ys =', view_binary(ys))\n",
"print('XOR =', view_binary(xs ^ ys))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lastly is __bitwise and__ (`&`), if any bit is 0, then the result is 0. We can also view this operation as masking, given data as first operand and a mask as second (or vice versa), the result is to get a bit from data if a bit in the mask is 1."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" xs = ['00000000', '00000001', '10100011', '11111111']\n",
" ys = ['01001101', '10011101', '11100000', '00101010']\n",
"AND = ['00000000', '00000001', '10100000', '00101010']\n"
]
}
],
"source": [
"print(' xs =', view_binary(xs))\n",
"print(' ys =', view_binary(ys))\n",
"print('AND =', view_binary(xs & ys))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We've cover all basic operations that will be use throughout the notebook. From now on, we will use NumPy's __unsigned integer 32-bits__ (`uint32`) only.\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mersenne Twister 19937, version 1997\n",
"\n",
"We first state list of magic numbers for this algorithm. (These numbers are unchange in version 2002 too.)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"SIZE = 624\n",
"SKIP = 397\n",
"LEFT, RIGHT = 0x80000000, 0x7FFFFFFF\n",
"MAGNITUDE = 0x9908B0DF\n",
"\n",
"SHIFT_1, MASK_1 = 11, 0xFFFFFFFF\n",
"SHIFT_2, MASK_2 = 7, 0x9D2C5680\n",
"SHIFT_3, MASK_3 = 15, 0xEFC60000\n",
"SHIFT_4, MASK_4 = 18, 0xFFFFFFFF"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We build a lookup array of 624 elements (`SIZE`), each store 32 bits integer. This array is for keeping the seed and its expansion."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"state = SimpleNamespace()\n",
"state.array = np.zeros(SIZE, dtype=np.uint32)\n",
"state.index = SIZE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Though the seed can be vary from 0 to $2^{32}-1$, if we continue to use a single seed, the length of the pseudorandom algorithm will be at most $2^{32}$ before repeating. So we have to expand this seed to prolong the length. The strategy for seed expansion is to use previous seed multiply by some magic constant and store this new seed. We expand it for 623 remaining slot in the lookup array. (And set flag that the array are not processed and can not be extracted a random number, yet.)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def expand_seed(seed):\n",
" state.array[0] = seed\n",
" for i in range(1, SIZE):\n",
" state.array[i] = np.uint32(state.array[i-1] * 69069)\n",
" state.index = SIZE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Heres we view first 30 seeds from the array after expansion. "
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 4357 300933633 1838352333 1039116329 1822211541 2902250641\n",
" 835884317 743498041 2037202853 140269601 3129818989 3468776265\n",
" 3042141813 3597794481 2144163517 462622297 2597716549 3720499777\n",
" 3305777933 2019631721 1995498261 1568860369 1886915677 891264889\n",
" 3303332069 589975649 2673363629 1613469065 3473388469 3774879985]\n"
]
}
],
"source": [
"expand_seed(4357)\n",
"print(state.array[:30])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we will define the strategy for twisting array. This intend to prevent someone trying to reversing current seed array back to previous seed array. The overview of this routine is to look at each seed in the array and compute a new value for it. To compute a new seed value at that index, use that seed, a seed next to it, and jump 397 step forward to another seed (`SKIP`), then combine these 3 seeds and replace the old one."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def twist():\n",
" for i in range(SIZE):\n",
" succ = (i + 1) % SIZE\n",
" skip = (i + SKIP) % SIZE\n",
" state.array[i] = (state.array[i] & LEFT) | (state.array[succ] & RIGHT)\n",
" state.array[i] >>= 1\n",
" state.array[i] ^= state.array[skip] ^ (state.array[succ] % 2) * MAGNITUDE\n",
" state.index = 0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, these are first 30 seeds from the array after twisting."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2454627182 1332197252 1496154258 2811530864 2431206102 3291005788\n",
" 2620758058 272633368 1483018782 1885656372 1360407554 3856790784\n",
" 390338790 436405036 940293242 2796748872 204531470 3422570148\n",
" 1879591794 4122534096 565337142 3238795964 434740682 2509097720\n",
" 4140411902 4122039316 3720575394 557386016 3109380614 506878412]\n"
]
}
],
"source": [
"expand_seed(4357)\n",
"twist()\n",
"print(state.array[:30])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Result array from twisting might look random enough, but the algorithm does not return each element from this array right away. It does take each element and obfuscate it with shifts and masks before outputting it as a random number. The obfuscate process is intends to be a one way function, so someone can not reverse the result back to seed array and guess the next random number. (It fail to hide this information, however.)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"def random_uint32():\n",
" if state.index == SIZE:\n",
" twist()\n",
" result = state.array[state.index]\n",
" state.index += 1\n",
" result ^= (result >> SHIFT_1) & MASK_1\n",
" result ^= (result << SHIFT_2) & MASK_2\n",
" result ^= (result << SHIFT_3) & MASK_3\n",
" result ^= (result >> SHIFT_4) & MASK_4\n",
" return np.uint32(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now we have a Mersenne Twister! Heres are first 30 random result from seed = 4357."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[3510405877 4290933890 2191955339 564929546 152112058 4262624192\n",
" 2687398418 268830360 1763988213 578848526 4212814465 3596577449\n",
" 4146913070 950422373 1908844540 1452005258 3029421110 142578355\n",
" 1583761762 1816660702 2530498888 1339965000 3874409922 3044234909\n",
" 1962617717 2324289180 310281170 981016607 908202274 3371937721]\n"
]
}
],
"source": [
"expand_seed(4357)\n",
"print(np.array([random_uint32() for _ in range(30)]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Back to the magic numbers, these numbers are choosen to ensure length of the PRNG. So by changing these numbers, the PRNG length might be shorten."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# Exercise 1\n",
"# ==========\n",
"#\n",
"# What if we change the magic number SHIFT_4 to zero?\n",
"# Try getting first few random number from it,\n",
"# and guess why this happen?\n",
"#\n",
"# Caution: don't forget to change this value back to 18,\n",
"# before proceed next chapter.\n",
"\n",
"\n",
"#SHIFT_4 = 0\n",
"#[random_uint32() for _ in range(30)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mersenne Twister 19937, version 2002\n",
"\n",
"Observe the fatal failure when we input seed = 0 into the former version."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]\n",
"[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]\n"
]
}
],
"source": [
"expand_seed(0)\n",
"print(state.array[:30])\n",
"print(np.array([random_uint32() for _ in range(30)]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What just happen? Why the algorithm does not random anything but spit out zeros forever? Well, if we take a look at the code, we'll see that every step manipulate seed by add, sub, mul, and, xor, shift. All these operations gives 0 if input is 0. So if we send in 0 as a seed, it will surely fail.\n",
"\n",
"To fix this, the seed expansion is designed to be more complex, but first we define a special way to fetch an element in the array."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"def special_fetch_array(i):\n",
" return state.array[i] ^ (state.array[i] >> 30)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can build up the seed expansion again, this time we add a loop index into each element to ensure the entire array will not be zeros like before."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"def expand_seed_fixed(seed):\n",
" state.array[0] = seed\n",
" for i in range(1, SIZE):\n",
" state.array[i] = np.uint32(special_fetch_array(i-1) * 1812433253)\n",
" state.array[i] += i\n",
" state.index = SIZE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Look, the array is now fixed!"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 0 1 1812433255 1900727105 1208447044 2481403966\n",
" 4042607538 337614300 3232553940 1018809052 3202401494 1775180719\n",
" 3192392114 594215549 184016991 829906058 610491522 3879932251\n",
" 3139825610 297902587 4075895579 2943625357 3530655617 1423771745\n",
" 2135928312 2891506774 1066338622 135451537 933040465 2759011858]\n"
]
}
],
"source": [
"expand_seed_fixed(0)\n",
"print(state.array[:30])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also introduce new way to initialize seed array, by inputting multiple seeds at the beginning. The strategy is to combine every input seeds into every elements the array, if the seeds run out then we reuse the seeds again.\n",
"\n",
"Implementation details, from `itertools` module, we use `cycle` function to get elements from a list over and over, and `islice` function tell how many elements to take from that (repeating) list. "
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"def expand_seeds(seeds):\n",
" expand_seed_fixed(19650218)\n",
" loop_pool = cycle(range(1, SIZE))\n",
" seed_pool = cycle(enumerate(seeds))\n",
" pred = 0\n",
" for i in islice(loop_pool, max(SIZE, len(seeds))):\n",
" state.array[i] ^= np.uint32(special_fetch_array(pred) * 1664525)\n",
" state.array[i] += sum(next(seed_pool))\n",
" pred = i\n",
" for i in islice(loop_pool, SIZE-1):\n",
" state.array[i] ^= np.uint32(special_fetch_array(pred) * 1566083941)\n",
" state.array[i] -= i\n",
" pred = i\n",
" state.array[0] = 0x80000000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we have a final random algorithm! Here are some random number with that initial seeds."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1067595299 955945823 477289528 4107218783 4228976476 3344332714\n",
" 3355579695 227628506 810200273 2591290167 2560260675 3242736208\n",
" 646746669 1479517882 4245472273 1143372638 3863670494 3221021970\n",
" 1773610557 1138697238 1421897700 1269916527 2859934041 1764463362\n",
" 3874892047 3965319921 72549643 2383988930 2600218693 3237492380]\n"
]
}
],
"source": [
"expand_seeds([0x123, 0x234, 0x345, 0x456])\n",
"print(np.array([random_uint32() for _ in range(30)]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These conclude the algorithm for Mersenne Twister. Also observe that Python's Random will not gives exactly same random sequence despite given the same initial seeds. This happened becase Python pad extra seed and modify output to suite it's 64-bits designed.\n",
"\n",
"---"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"# Exercise 2\n",
"# ==========\n",
"#\n",
"# With these given seeds, use expand_seeds and observe state.array:\n",
"# 1. seeds = [42]\n",
"# 2. seeds = [42, 41, 40]\n",
"# We see that the result state.array are the same,\n",
"# which will surely leads to the same sequence of random_uint32().\n",
"# Are there any other seeds that result with similar phenomenon?\n",
"\n",
"\n",
"#expand_seeds([42])\n",
"#print(state.array[:30])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Some Statistical Test on Mersenne Twister\n",
"\n",
"There are many tests to test if a PRNG is a good random. Here we show a few simple tests.\n",
"\n",
"We first obtain some random data with:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"expand_seeds([4, 8, 15, 16, 23, 42])\n",
"raw_data = [random_uint32() for _ in range(8)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Right now result data is a list of 32-bit integer, which are a little bit too hard to use. We will convert the data to a list of binary bit."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1]\n"
]
}
],
"source": [
"data = [int(bit) for x in raw_data for bit in f'{x:032b}']\n",
"print(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see some summaries on the data. "
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total bits: 256\n",
"count 0: 118\n",
"count 1: 138\n",
"mean: 0.5390625\n",
"stdev: 0.4994482249560656\n"
]
}
],
"source": [
"counter = Counter(data)\n",
"print(f'total bits: {len(data)}')\n",
"print(f'count 0: {counter[0]}')\n",
"print(f'count 1: {counter[1]}')\n",
"print(f'mean: {mean(data)}')\n",
"print(f'stdev: {stdev(data)}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Hypothesis Test Random Likely\n",
"\n",
"A good PRNG should produce number of bit 0 and 1 (close to or) equally. So we state null hypothesis that $Pr(0) = Pr(1) = 0.5$. In hypothesis we already know mean and s.d.; $\\mu = 0.5$ and $\\sigma = 0.5$ respectively. So we compute z-score as follow:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"z-score: 1.25\n"
]
}
],
"source": [
"mu = 0.5\n",
"sigma = 0.5\n",
"z = (mean(data) - mu) / (sigma / len(data)**0.5)\n",
"print(f'z-score: {z}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The z-score of 1.25 tells that our random data is above standard mean by 1.25 s.d. By looking up the table, area under standard curve in interval $[-1.25, 1.25]$ is about 79%. So the remaining area, in other word the p-value, is at 21%.\n",
"\n",
"Recalling significant level, it is a probability of rejecting null hypothesis when it is actually true (false positive error). So if we set our significant level at 5%, we __can not__ reject the null hypothesis. However, if we set the significant level to 25%\\*, we can reject the null hypothesis and say that this PRNG is bias.\n",
"\n",
"\\* Note: in practical, common significant level are 10%, 5%, and 1%. The number 25% is too high and should not be used."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Group Distribution\n",
"\n",
"We can't only test equal distribution of 0 and 1, but we must also take it as consecutive bits. We define routine for getting group of bits as:"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"def group(data, size):\n",
" ls = [''.join(map(str, data[i:i+size])) for i in range(0, len(data), size)]\n",
" if len(ls[0]) != len(ls[-1]):\n",
" del ls[-1]\n",
" return ls"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lets plot the data, bit by bit."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<BarContainer object of 2 artists>"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD4CAYAAAAXUaZHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAMxklEQVR4nO3cb4xl9V3H8fdHRqzUNEB3IHR346LZoNhoSiYUbaJNV+PSNl0elARidFM32RipVqspi03kUROIpmgTJVkLsk0QSrCGja1/yEpDTAo60JZ/W2RDcZnuyl5CwT9NrKtfH8xZcx3uMHPvuXcGfrxfyebe8zvn3Pt9MHnvyZm5N1WFJKkt37PZA0iSps+4S1KDjLskNci4S1KDjLskNWhuswcA2LJlS+3YsWOzx5CkN5RHHnnkxaqaH7XvdRH3HTt2sLi4uNljSNIbSpJ/Xm2ft2UkqUHGXZIaZNwlqUFrxj3J7UlOJXlixL7fTlJJtnTbSfKZJMeSPJbkslkMLUl6beu5cr8D2L1yMcl24OeA40PLVwI7u3/7gVv7jyhJGteaca+qB4GXRuy6BfgEMPzNY3uAz9Wyh4Bzk1w0lUklSes20T33JB8CvlVVX1+xayvw/ND2Urc26jX2J1lMsjgYDCYZQ5K0irHjnuQc4JPA747aPWJt5HcKV9XBqlqoqoX5+ZF/gy9JmtAkH2L6YeBi4OtJALYBjya5nOUr9e1Dx24DTvQdUpI0nrHjXlWPAxec2U7yHLBQVS8mOQx8NMndwLuBV6rq5LSGHWXHgS/O8uX1BvfcTR/Y7BGkTbGeP4W8C/gKcEmSpST7XuPwLwHPAseAPwF+dSpTSpLGsuaVe1Vdu8b+HUPPC7iu/1iSpD78hKokNci4S1KDjLskNci4S1KDjLskNci4S1KDjLskNci4S1KDjLskNci4S1KDjLskNci4S1KDjLskNci4S1KDjLskNci4S1KDjLskNci4S1KDjLskNci4S1KDjLskNWjNuCe5PcmpJE8Mrf1ekm8keSzJXyQ5d2jfDUmOJXk6yc/PanBJ0urWc+V+B7B7xdr9wDur6seBfwJuAEhyKXAN8GPdOX+c5KypTStJWpc1415VDwIvrVj726o63W0+BGzrnu8B7q6q/6yqbwLHgMunOK8kaR2mcc/9l4G/6p5vBZ4f2rfUrb1Kkv1JFpMsDgaDKYwhSTqjV9yTfBI4Ddx5ZmnEYTXq3Ko6WFULVbUwPz/fZwxJ0gpzk56YZC/wQWBXVZ0J+BKwfeiwbcCJyceTJE1iorgn2Q1cD/xMVX1naNdh4M+SfBp4B7AT+IfeU0pvcDsOfHGzR9Dr1HM3fWAmr7tm3JPcBbwX2JJkCbiR5b+O+T7g/iQAD1XVr1TVk0nuAZ5i+XbNdVX13zOZXJK0qjXjXlXXjli+7TWO/xTwqT5DSZL68ROqktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktSgNeOe5PYkp5I8MbR2fpL7kzzTPZ7XrSfJZ5IcS/JYkstmObwkabT1XLnfAexesXYAOFJVO4Ej3TbAlcDO7t9+4NbpjClJGseaca+qB4GXVizvAQ51zw8BVw2tf66WPQScm+SiaQ0rSVqfSe+5X1hVJwG6xwu69a3A80PHLXVrr5Jkf5LFJIuDwWDCMSRJo0z7F6oZsVajDqyqg1W1UFUL8/PzUx5Dkt7cJo37C2dut3SPp7r1JWD70HHbgBOTjydJmsSkcT8M7O2e7wXuG1r/pe6vZq4AXjlz+0aStHHm1jogyV3Ae4EtSZaAG4GbgHuS7AOOA1d3h38JeD9wDPgO8JEZzCxJWsOaca+qa1fZtWvEsQVc13coSVI/fkJVkhpk3CWpQcZdkhpk3CWpQcZdkhpk3CWpQcZdkhpk3CWpQcZdkhpk3CWpQcZdkhpk3CWpQcZdkhpk3CWpQcZdkhpk3CWpQcZdkhpk3CWpQcZdkhpk3CWpQcZdkhrUK+5JfjPJk0meSHJXkrckuTjJw0meSfL5JGdPa1hJ0vpMHPckW4FfBxaq6p3AWcA1wM3ALVW1E/g2sG8ag0qS1q/vbZk54PuTzAHnACeB9wH3dvsPAVf1fA9J0pgmjntVfQv4feA4y1F/BXgEeLmqTneHLQFbR52fZH+SxSSLg8Fg0jEkSSP0uS1zHrAHuBh4B/BW4MoRh9ao86vqYFUtVNXC/Pz8pGNIkkboc1vmZ4FvVtWgqv4L+ALwU8C53W0agG3AiZ4zSpLG1Cfux4ErkpyTJMAu4CngAeDD3TF7gfv6jShJGlefe+4Ps/yL00eBx7vXOghcD3w8yTHg7cBtU5hTkjSGubUPWV1V3QjcuGL5WeDyPq8rSerHT6hKUoOMuyQ1yLhLUoOMuyQ1yLhLUoOMuyQ1yLhLUoOMuyQ1yLhLUoOMuyQ1yLhLUoOMuyQ1yLhLUoOMuyQ1yLhLUoOMuyQ1yLhLUoOMuyQ1yLhLUoOMuyQ1yLhLUoN6xT3JuUnuTfKNJEeT/GSS85Pcn+SZ7vG8aQ0rSVqfvlfufwj8dVX9CPATwFHgAHCkqnYCR7ptSdIGmjjuSd4G/DRwG0BVfbeqXgb2AIe6ww4BV/UdUpI0nj5X7j8EDIA/TfLVJJ9N8lbgwqo6CdA9XjDq5CT7kywmWRwMBj3GkCSt1Cfuc8BlwK1V9S7gPxjjFkxVHayqhapamJ+f7zGGJGmlPnFfApaq6uFu+16WY/9CkosAusdT/UaUJI1r4rhX1b8Azye5pFvaBTwFHAb2dmt7gft6TShJGttcz/N/DbgzydnAs8BHWP4P454k+4DjwNU930OSNKZeca+qrwELI3bt6vO6kqR+/ISqJDXIuEtSg4y7JDXIuEtSg4y7JDXIuEtSg4y7JDXIuEtSg4y7JDXIuEtSg4y7JDXIuEtSg4y7JDXIuEtSg4y7JDXIuEtSg4y7JDXIuEtSg4y7JDXIuEtSg4y7JDWod9yTnJXkq0n+stu+OMnDSZ5J8vkkZ/cfU5I0jmlcuX8MODq0fTNwS1XtBL4N7JvCe0iSxtAr7km2AR8APtttB3gfcG93yCHgqj7vIUkaX98r9z8APgH8T7f9duDlqjrdbS8BW0edmGR/ksUki4PBoOcYkqRhE8c9yQeBU1X1yPDyiENr1PlVdbCqFqpqYX5+ftIxJEkjzPU49z3Ah5K8H3gL8DaWr+TPTTLXXb1vA070H1OSNI6Jr9yr6oaq2lZVO4BrgL+rql8AHgA+3B22F7iv95SSpLHM4u/crwc+nuQYy/fgb5vBe0iSXkOf2zL/p6q+DHy5e/4scPk0XleSNBk/oSpJDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDZo47km2J3kgydEkTyb5WLd+fpL7kzzTPZ43vXElSevR58r9NPBbVfWjwBXAdUkuBQ4AR6pqJ3Ck25YkbaCJ415VJ6vq0e75vwFHga3AHuBQd9gh4Kq+Q0qSxjOVe+5JdgDvAh4GLqyqk7D8HwBwwSrn7E+ymGRxMBhMYwxJUqd33JP8APDnwG9U1b+u97yqOlhVC1W1MD8/33cMSdKQXnFP8r0sh/3OqvpCt/xCkou6/RcBp/qNKEkaV5+/lglwG3C0qj49tOswsLd7vhe4b/LxJEmTmOtx7nuAXwQeT/K1bu13gJuAe5LsA44DV/cbUZI0ronjXlV/D2SV3bsmfV1JUn9+QlWSGmTcJalBxl2SGmTcJalBxl2SGmTcJalBxl2SGmTcJalBxl2SGmTcJalBxl2SGmTcJalBxl2SGmTcJalBxl2SGmTcJalBxl2SGmTcJalBxl2SGmTcJalBxl2SGjSzuCfZneTpJMeSHJjV+0iSXm0mcU9yFvBHwJXApcC1SS6dxXtJkl5tVlfulwPHqurZqvoucDewZ0bvJUlaYW5Gr7sVeH5oewl49/ABSfYD+7vNf0/y9IxmebPZAry42UO8XuTmzZ5AI/gzOqTnz+gPrrZjVnHPiLX6fxtVB4GDM3r/N60ki1W1sNlzSKvxZ3RjzOq2zBKwfWh7G3BiRu8lSVphVnH/R2BnkouTnA1cAxye0XtJklaYyW2Zqjqd5KPA3wBnAbdX1ZOzeC+9ire69Hrnz+gGSFWtfZQk6Q3FT6hKUoOMuyQ1yLg3IsntSU4leWKzZ5FW49eSbBzj3o47gN2bPYS0Gr+WZGMZ90ZU1YPAS5s9h/Qa/FqSDWTcJW2UUV9LsnWTZmmecZe0Udb8WhJNj3GXtFH8WpINZNwlbRS/lmQDGfdGJLkL+ApwSZKlJPs2eyZpWFWdBs58LclR4B6/lmR2/PoBSWqQV+6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1KD/BSP/zhtoNHVUAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"counter = Counter(group(data, 1))\n",
"plt.bar(counter.keys(), counter.values())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, considered a sequence of bit in data. The next plot is sequence of 2 bits."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<BarContainer object of 4 artists>"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAD4CAYAAAD1jb0+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAMvElEQVR4nO3dUYhc53mH8ecfSWlCE7BdjY2wrSqkwo0pRIat6uKb1LGLmlxYLg7UF0YXLptCRRMaStTcNIEWHGjiqxBQalW6SJ0YJ8EmTpsK1cG4BKerRFGkKkGuq6SKhbTGMbFvXGS/vdBR2axnNKOdmZ39pOcHy858c0bzcsCPh7NnzqSqkCS1522zHkCStDIGXJIaZcAlqVEGXJIaZcAlqVHrV/PFNm7cWFu2bFnNl5Sk5h0+fPilquotX1/VgG/ZsoWFhYXVfElJal6Sn/Zb9xCKJDXKgEtSowy4JDXKgEtSowy4JDXKgEtSowy4JDXKgEtSowy4JDVqVT+JOY4te56a9QgzdeqhD896BElrjO/AJalRBlySGmXAJalRBlySGmXAJalRBlySGmXAJalRBlySGmXAJalRBlySGmXAJalRBlySGjU04EnekeR7SX6Y5HiSz3Tr+5P8d5Ij3c+26Y8rSbpolKsRvg7cWVWvJdkAPJvkn7vH/qqqHp/eeJKkQYYGvKoKeK27u6H7qWkOJUkabqRj4EnWJTkCnAMOVtVz3UN/l+RokoeT/NqA584nWUiysLi4OKGxJUkjBbyq3qiqbcBNwPYkvwP8NfDbwO8C1wGfHPDcvVU1V1VzvV5vQmNLki7rLJSqegX4DrCjqs7UBa8D/whsn8J8kqQBRjkLpZfkmu72O4G7gB8n2dStBdgJHJvmoJKkXzXKWSibgANJ1nEh+I9V1TeT/FuSHhDgCPBnU5xTkrTMKGehHAVu67N+51QmkiSNxE9iSlKjDLgkNcqAS1KjDLgkNcqAS1KjDLgkNcqAS1KjDLgkNcqAS1KjDLgkNWqUa6HoCrBlz1OzHmGmTj304VmPIE2c78AlqVEGXJIaZcAlqVEGXJIaZcAlqVEGXJIaZcAlqVGjfKnxO5J8L8kPkxxP8plu/T1JnktyMslXk7x9+uNKki4a5R3468CdVfV+YBuwI8ntwGeBh6tqK/AL4MHpjSlJWm5owOuC17q7G7qfAu4EHu/WDwA7pzKhJKmvkT5Kn2QdcBj4LeALwH8Br1TV+W6T08CNA547D8wDbN68edx5JTXoar+UA0zncg4j/RGzqt6oqm3ATcB24H39Nhvw3L1VNVdVc71eb+WTSpJ+xWWdhVJVrwDfAW4Hrkly8R38TcCLkx1NknQpo5yF0ktyTXf7ncBdwAngaeC+brNdwBPTGlKS9FajHAPfBBzojoO/DXisqr6Z5D+BryT5W+AHwCNTnFOStMzQgFfVUeC2PusvcOF4uCRpBvwkpiQ1yoBLUqMMuCQ1yoBLUqMMuCQ1ym+ll0ZwtX8UfBofA9f4fAcuSY0y4JLUKAMuSY0y4JLUKAMuSY0y4JLUKAMuSY0y4JLUKAMuSY0y4JLUKAMuSY0y4JLUqFG+1PjmJE8nOZHkeJKPdeufTvLzJEe6nw9Nf1xJ0kWjXI3wPPCJqvp+kncDh5Mc7B57uKr+fnrjSZIGGeVLjc8AZ7rbryY5Adw47cEkSZd2WcfAk2zhwjfUP9ct7U5yNMm+JNdOeDZJ0iWMHPAk7wK+Bny8qn4JfBF4L7CNC+/QPzfgefNJFpIsLC4uTmBkSRKMGPAkG7gQ7y9X1dcBqupsVb1RVW8CXwK293tuVe2tqrmqmuv1epOaW5KueqOchRLgEeBEVX1+yfqmJZvdCxyb/HiSpEFGOQvlDuAB4EdJjnRrnwLuT7INKOAU8NGpTChJ6muUs1CeBdLnoW9NfhxJ0qj8JKYkNcqAS1KjDLgkNcqAS1KjDLgkNcqAS1KjDLgkNcqAS1KjDLgkNcqAS1KjDLgkNcqAS1KjDLgkNcqAS1KjDLgkNcqAS1KjDLgkNcqAS1KjDLgkNWqUb6W/OcnTSU4kOZ7kY936dUkOJjnZ/b52+uNKki4a5R34eeATVfU+4Hbgz5PcCuwBDlXVVuBQd1+StEqGBryqzlTV97vbrwIngBuBe4AD3WYHgJ3TGlKS9FaXdQw8yRbgNuA54IaqOgMXIg9cP+A580kWkiwsLi6ON60k6f+NHPAk7wK+Bny8qn456vOqam9VzVXVXK/XW8mMkqQ+Rgp4kg1ciPeXq+rr3fLZJJu6xzcB56YzoiSpn1HOQgnwCHCiqj6/5KEngV3d7V3AE5MfT5I0yPoRtrkDeAD4UZIj3dqngIeAx5I8CPwM+Mh0RpQk9TM04FX1LJABD39wsuNIkkblJzElqVEGXJIaZcAlqVEGXJIaZcAlqVEGXJIaZcAlqVEGXJIaZcAlqVEGXJIaZcAlqVEGXJIaZcAlqVEGXJIaZcAlqVEGXJIaZcAlqVEGXJIaNcqXGu9Lci7JsSVrn07y8yRHup8PTXdMSdJyo7wD3w/s6LP+cFVt636+NdmxJEnDDA14VT0DvLwKs0iSLsM4x8B3JznaHWK5dtBGSeaTLCRZWFxcHOPlJElLrTTgXwTeC2wDzgCfG7RhVe2tqrmqmuv1eit8OUnScisKeFWdrao3qupN4EvA9smOJUkaZkUBT7Jpyd17gWODtpUkTcf6YRskeRT4ALAxyWngb4APJNkGFHAK+OgUZ5Qk9TE04FV1f5/lR6YwiyTpMvhJTElqlAGXpEYZcElqlAGXpEYZcElqlAGXpEYZcElqlAGXpEYZcElqlAGXpEYZcElqlAGXpEYZcElqlAGXpEYZcElqlAGXpEYZcElqlAGXpEYNDXiSfUnOJTm2ZO26JAeTnOx+XzvdMSVJy43yDnw/sGPZ2h7gUFVtBQ519yVJq2howKvqGeDlZcv3AAe62weAnROeS5I0xEqPgd9QVWcAut/XD9owyXyShSQLi4uLK3w5SdJyU/8jZlXtraq5qprr9XrTfjlJumqsNOBnk2wC6H6fm9xIkqRRrDTgTwK7utu7gCcmM44kaVSjnEb4KPBd4JYkp5M8CDwE3J3kJHB3d1+StIrWD9ugqu4f8NAHJzyLJOky+ElMSWqUAZekRhlwSWqUAZekRhlwSWqUAZekRhlwSWqUAZekRhlwSWqUAZekRhlwSWqUAZekRhlwSWqUAZekRhlwSWqUAZekRhlwSWqUAZekRg39SrVLSXIKeBV4AzhfVXOTGEqSNNxYAe/8QVW9NIF/R5J0GTyEIkmNGjfgBfxrksNJ5icxkCRpNOMeQrmjql5Mcj1wMMmPq+qZpRt0YZ8H2Lx585gvJ0m6aKx34FX1Yvf7HPANYHufbfZW1VxVzfV6vXFeTpK0xIoDnuTXk7z74m3gD4FjkxpMknRp4xxCuQH4RpKL/84/VdW/TGQqSdJQKw54Vb0AvH+Cs0iSLoOnEUpSowy4JDXKgEtSowy4JDXKgEtSowy4JDXKgEtSowy4JDXKgEtSowy4JDXKgEtSowy4JDXKgEtSowy4JDXKgEtSowy4JDXKgEtSowy4JDXKgEtSo8YKeJIdSX6S5PkkeyY1lCRpuBUHPMk64AvAHwG3AvcnuXVSg0mSLm2cd+Dbgeer6oWq+l/gK8A9kxlLkjRMqmplT0zuA3ZU1Z929x8Afq+qdi/bbh6Y7+7eAvxk5ePO1EbgpVkP0TD333jcf+Npff/9ZlX1li+uH+MfTJ+1t/zfoKr2AnvHeJ01IclCVc3Neo5Wuf/G4/4bz5W6/8Y5hHIauHnJ/ZuAF8cbR5I0qnEC/h/A1iTvSfJ24E+AJyczliRpmBUfQqmq80l2A98G1gH7qur4xCZbe5o/DDRj7r/xuP/Gc0XuvxX/EVOSNFt+ElOSGmXAJalRBryPJPuSnEtybMnaR5IcT/JmkivudKRJGrD/rktyMMnJ7ve1s5xxret3mYruhIHnun341e7kAfUxYP/t7u5Xko2znnESDHh/+4Edy9aOAX8MPLPq07RnP2/df3uAQ1W1FTjU3Vcfl7hMxWeBh7t9+AvgwdlNuXZdYv/9O3AX8NMZjjdRBryPqnoGeHnZ2omqavVTpKuq3/7jwmUWDnS3DwA7V3Wotgy6TMWdwOPdNu7Dwfruv6r6QVWdmu1ok2XAtVpuqKozAN3v62c8z1p2I/A/S+6f7tZeqarzy9b0VoP23xXHgEtrT7/LVKzrs+Y5wP2NdJmPK4EB12o5m2QTQPf73IznWcv6XabiZ8A1SdYvWfPSFf1dNZf5MOBaLU8Cu7rbu4AnZjjLWjfoMhVPA/d127gPB7tqLvNhwPtI8ijwXeCWJKeTPJjk3iSngd8Hnkry7dlOuXb123/AQ8DdSU4Cd3f31Ud3nPviZSpOAI91l6n4JPCXSZ4HfgN4ZHZTrl2D9l+Sv+j+G74JOJrkH2Y55yT4UXpJapTvwCWpUQZckhplwCWpUQZckhplwCWpUQZckhplwCWpUf8HomkNKenMiFMAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"counter = Counter(group(data, 2))\n",
"plt.bar(counter.keys(), counter.values())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And a sequence of 3 bits..."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<BarContainer object of 8 artists>"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAD4CAYAAAD1jb0+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAANmUlEQVR4nO3dfYxl9V3H8ffHXaGCrTzs0FZgHTAEJU0TyFTRGkxA4haaUhNMlqQVFTNRg61GUpeQ2L9M8CE+BSPZUIRGAlSsKbHRsqEQ0oRil+elW8q2XemWlV0kbS0qdMPXP+asGWZn5t6599w7+1ver2Qy9zzs/D5zLufD2XPuOZuqQpLUnh9Y7wCSpNFY4JLUKAtckhplgUtSoyxwSWrUxmkOtmnTppqdnZ3mkJLUvEcfffSlqppZOn+qBT47O8vOnTunOaQkNS/Jvy8331MoktQoC1ySGmWBS1KjLHBJapQFLkmNssAlqVEDCzzJrUkOJNm1zLLrklSSTZOJJ0layTBH4LcBW5bOTHImcCnwfM+ZJElDGFjgVfUQ8PIyi/4C+BjgA8UlaR2MdCdmkg8A36qqJ5MMWncemAfYvHnzKMNJasTsts+u6/h7b7x8XceftjVfxExyAnAD8IfDrF9V26tqrqrmZmaOuJVfkjSiUT6F8uPAWcCTSfYCZwCPJXlHn8EkSatb8ymUqnoaOO3wdFfic1X1Uo+5JEkDDPMxwjuBh4Fzk+xLcs3kY0mSBhl4BF5VVw1YPttbGknS0LwTU5IaZYFLUqMscElqlAUuSY2ywCWpURa4JDXKApekRlngktQoC1ySGjXS42SlY916Phb1zfZIVI3OI3BJapQFLkmNssAlqVEWuCQ1ygKXpEZZ4JLUKAtckhplgUtSoyxwSWqUBS5JjRrmX6W/NcmBJLsWzfvTJF9J8lSSf0py0mRjSpKWGuYI/DZgy5J5O4B3VdW7ga8C1/ecS5I0wMACr6qHgJeXzLuvqg51k18EzphANknSKvp4GuGvA3evtDDJPDAPsHnz5h6Gk97cfFKiDhvrImaSG4BDwB0rrVNV26tqrqrmZmZmxhlOkrTIyEfgSa4G3g9cUlXVXyRJ0jBGKvAkW4A/AH6+qv6730iSpGEM8zHCO4GHgXOT7EtyDXAT8FZgR5Inktw84ZySpCUGHoFX1VXLzP7EBLJIktbAOzElqVEWuCQ1ygKXpEZZ4JLUKAtckhplgUtSoyxwSWqUBS5JjbLAJalRfTxOdip8hKYkvZFH4JLUKAtckhplgUtSoyxwSWqUBS5JjbLAJalRFrgkNcoCl6RGWeCS1CgLXJIaZYFLUqMGFniSW5McSLJr0bxTkuxI8lz3/eTJxpQkLTXMEfhtwJYl87YB91fVOcD93bQkaYoGFnhVPQS8vGT2FcDt3evbgQ/2nEuSNMCoj5N9e1XtB6iq/UlOW2nFJPPAPMDmzZtHHE6j8jG80oL13BdgMvvDxC9iVtX2qpqrqrmZmZlJDydJbxqjFviLSd4J0H0/0F8kSdIwRi3we4Gru9dXA5/pJ44kaVjDfIzwTuBh4Nwk+5JcA9wIXJrkOeDSblqSNEUDL2JW1VUrLLqk5yySpDXwTkxJapQFLkmNssAlqVEWuCQ1ygKXpEZZ4JLUKAtckhplgUtSoyxwSWqUBS5JjbLAJalRFrgkNcoCl6RGWeCS1CgLXJIaZYFLUqMscElqlAUuSY2ywCWpURa4JDVqrAJP8ntJnkmyK8mdSd7SVzBJ0upGLvAkpwMfAeaq6l3ABmBrX8EkSasb9xTKRuCHkmwETgBeGD+SJGkYG0f9g1X1rSR/BjwP/A9wX1Xdt3S9JPPAPMDmzZtHHU7HoNltn123sffeePm6jS31ZZxTKCcDVwBnAT8KnJjkQ0vXq6rtVTVXVXMzMzOjJ5UkvcE4p1B+AfhGVR2squ8DnwZ+tp9YkqRBxinw54ELk5yQJMAlwO5+YkmSBhm5wKvqEeAe4DHg6e5nbe8plyRpgJEvYgJU1ceBj/eURZK0Bt6JKUmNssAlqVEWuCQ1ygKXpEZZ4JLUKAtckhplgUtSoyxwSWqUBS5JjRrrTkwt8LGoktaDR+CS1CgLXJIaZYFLUqMscElqlAUuSY2ywCWpURa4JDXKApekRlngktQoC1ySGmWBS1KjxirwJCcluSfJV5LsTvIzfQWTJK1u3IdZ/RXwr1V1ZZLjgBN6yCRJGsLIBZ7kbcBFwK8CVNVrwGv9xJIkDTLOKZSzgYPA3yV5PMktSU5culKS+SQ7k+w8ePDgGMNJkhYbp8A3AhcAf1tV5wOvANuWrlRV26tqrqrmZmZmxhhOkrTYOAW+D9hXVY900/ewUOiSpCkYucCr6j+AbyY5t5t1CfDlXlJJkgYa91MovwPc0X0C5evAr40fSZI0jLEKvKqeAOZ6yiJJWgPvxJSkRlngktQoC1ySGmWBS1KjLHBJapQFLkmNssAlqVEWuCQ1ygKXpEZZ4JLUKAtckhplgUtSoyxwSWqUBS5JjbLAJalRFrgkNcoCl6RGWeCS1CgLXJIaZYFLUqPGLvAkG5I8nuSf+wgkSRpOH0fgHwV29/BzJElrMFaBJzkDuBy4pZ84kqRhjXsE/pfAx4DXe8giSVqDkQs8yfuBA1X16ID15pPsTLLz4MGDow4nSVpinCPw9wIfSLIXuAu4OMnfL12pqrZX1VxVzc3MzIwxnCRpsZELvKqur6ozqmoW2Ap8vqo+1FsySdKq/By4JDVqYx8/pKoeBB7s42dJkobjEbgkNcoCl6RGWeCS1CgLXJIaZYFLUqMscElqlAUuSY2ywCWpURa4JDXKApekRlngktQoC1ySGmWBS1KjLHBJapQFLkmNssAlqVEWuCQ1ygKXpEZZ4JLUKAtckhplgUtSo0Yu8CRnJnkgye4kzyT5aJ/BJEmr2zjGnz0E/H5VPZbkrcCjSXZU1Zd7yiZJWsXIR+BVtb+qHute/xewGzi9r2CSpNX1cg48ySxwPvDIMsvmk+xMsvPgwYN9DCdJoocCT/LDwD8Cv1tV3126vKq2V9VcVc3NzMyMO5wkqTNWgSf5QRbK+46q+nQ/kSRJwxjnUygBPgHsrqo/7y+SJGkY4xyBvxf4MHBxkie6r8t6yiVJGmDkjxFW1ReA9JhFkrQG3okpSY2ywCWpURa4JDXKApekRlngktQoC1ySGmWBS1KjLHBJapQFLkmNssAlqVEWuCQ1ygKXpEZZ4JLUKAtckhplgUtSoyxwSWqUBS5JjbLAJalRFrgkNcoCl6RGWeCS1KixCjzJliTPJtmTZFtfoSRJg41c4Ek2AH8DvA84D7gqyXl9BZMkrW6cI/CfAvZU1der6jXgLuCKfmJJkgZJVY32B5MrgS1V9Rvd9IeBn66qa5esNw/Md5PnAs+OHncsm4CX1mnsQcw2GrONxmyjWc9sP1ZVM0tnbhzjB2aZeUf836CqtgPbxxinF0l2VtXceudYjtlGY7bRmG00R2O2cU6h7APOXDR9BvDCeHEkScMap8C/BJyT5KwkxwFbgXv7iSVJGmTkUyhVdSjJtcDngA3ArVX1TG/J+rfup3FWYbbRmG00ZhvNUZdt5IuYkqT15Z2YktQoC1ySGnVMFHiSW5McSLJr0bxfTvJMkteTzC2af2qSB5J8L8lNU8p3xCMHuou/jyR5Lsnd3YVgkhzfTe/pls9OMNdy2+2UJDu6XDuSnNzN/4kkDyd5Ncl1k8o0Rr4k+etuuz2V5IIJZ1vuPb22m64kmxatO9Vtt8Zs095uQ++r3bLru2zPJvnFCWdby356UZLHkhzq7olZH1XV/BdwEXABsGvRvJ9k4cahB4G5RfNPBH4O+E3gpilk2wB8DTgbOA54koVHD3wK2NqtczPwW93r3wZu7l5vBe6e8nb7E2Bb93ob8Mfd69OA9wB/BFy3ju/rSvkuA/6FhfsTLgQeWYf39HxgFtgLbFq0/tS23QjZprbdVnlPV9pXz+vyHw+c1f1eG6a83VbaT2eBdwOfBK6cxv6w3NcxcQReVQ8BLy+Zt7uqjrjrs6peqaovAP87pXgrPXLgYuCebp3bgQ92r6/opumWX5JkuZumxrbcdlsy/v/nqqoDVfUl4PuTyDJuvm7+J2vBF4GTkrxzQtGWfU+r6vGq2rt05SlvuzVlY7rbbU37apftrqp6taq+Aexh4febhDXtp1W1t6qeAl6fUJ6hHBMFfpQ7Hfjmoul93bxvV9WhJfPesH63/DvAqdOJCsDbq2p/N/5+Fo4ejyYr5VtpO0/CNMdaq7VmO5Z+l0mMtdJ+elSwwCdvuaPnDcvMO/x5zqEeUaAjTHO7Hc3v0VqzHUu/S99jrbafHhUs8Mlb7pEDz7PwV9WNi+a9sHT9bvmPcORphEl68fBfobvvB6Y49jBWyjfNRzsczY+RWGu2Y+l36Xus1fbTo4IFPnkrPXLgAeDw1eurgc90r+/tpumWf766qyZTsnj8xbmOFivluxf4le5TFRcC3zl8qmUCjubHSKw12zS321rdC2ztPpl1FnAO8G8TGmut++nRYb2unvb5BdwJ7GfhItE+4Brgl7rXrwIvAp9btP5eFo5qv9etc96E810GfJWFq9w3dPPOZuE/xj3APwDHd/Pf0k3v6ZafPeXtdipwP/Bc9/2Ubt13dOt8F/h29/pt6/C+rpQvLPwDI18DnmbRpxmm+J5+pMt5iIUjtVvWY9utMdu0t9ta99UbumzPAu9bh/d0pf30PV3mV4D/BJ6ZZLaVvryVXpIa5SkUSWqUBS5JjbLAJalRFrgkNcoCl6RGWeCS1CgLXJIa9X+/B8+tS+f5zQAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"counter = Counter(group(data, 3))\n",
"plt.bar(counter.keys(), counter.values())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sequence of 3 bits seems bias toward some values, but this occur because of too small size of the sample data in the first place. If we random more number and test again, then each bar in the plot should have almost the same height."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"# Exercise 3\n",
"# ==========\n",
"#\n",
"# Initial random seed as you like, then random a lot of numbers.\n",
"# Observe the bar chart when grouping result into 4, 5, 6 bits.\n",
"# What is the random size that makes each bar has similar hight?\n",
"\n",
"\n",
"# EX_SEEDS = [1, 2, 3]\n",
"# EX_RANDOM_SIZE = 50\n",
"# EX_GROUP_SIZE = 4\n",
"\n",
"# expand_seeds(EX_SEEDS)\n",
"# raw_data = [random_uint32() for _ in range(EX_RANDOM_SIZE)]\n",
"# data = [int(bit) for x in raw_data for bit in f'{x:032b}']\n",
"# counter = Counter(group(data, EX_GROUP_SIZE))\n",
"# plt.bar(counter.keys(), counter.values())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are many more other tests, such as, birthday spacing, overlapping permutation, monkeys, parking lot, etc. Which require further knowledge on the statistics that we can not cover all in such a limit time. So please use the build-ins random and do not create your own. This notebook is merely for education on one specific PRNG only."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"\n",
"- [Mersenne Twister from the creator](http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html)\n",
"- [Diehard test](http://webhome.phy.duke.edu/~rgb/General/dieharder.php)\n",
"- [Hypothesis testing fair coin](https://stats.stackexchange.com/questions/21581)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment