@ykarikos
Last active December 14, 2015 01:38
Select random letters according to the distribution of letters in the Finnish language
#!/usr/bin/python
# Select random letters according to the distribution of letters in the Finnish language:
# https://docs.google.com/spreadsheet/ccc?key=0AiZHeDrg3BuddFZfTXVnclQ5UWNkaGVuWmdVT3dzMEE&usp=sharing
# Ignore the uncommon letters c, z, w, q, x and ao (a with ring).
# "AE" and "OE" below stand for a and o with diaeresis.
import random
from collections import Counter
random.seed()
# Cumulative distribution: each entry is (cumulative probability, letter).
alphabet = ((0.11607, "I"), (0.23011, "A"), (0.31651, "T"), (0.39098, "S"), (0.45836, "E"),
            (0.52488, "U"), (0.59058, "K"), (0.65494, "N"), (0.71334, "L"), (0.76748, "O"), (0.80699, "R"),
            (0.83889, "AE"), (0.86920, "P"), (0.89892, "M"), (0.92313, "V"), (0.94481, "H"), (0.96644, "Y"),
            (0.97941, "J"), (0.98683, "D"), (0.99417, "OE"), (0.99682, "G"), (0.99847, "F"), (1.00000, "B"))
def find(items, f):
    # Return the first element of items for which f(element) is true.
    for item in items:
        if f(item):
            return item
    raise ValueError("no element satisfies the predicate")
def getLetter(alphabet):
    r = random.random()
    return find(alphabet, lambda x: x[0] > r)[1]
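# Draw one million letters and count how often each one occurs.
letters = Counter(getLetter(alphabet) for _ in range(1000000)).most_common()
for a, b in letters:
    # count / 1000000 * 100 == count / 10000.0, i.e. the observed percentage
    print(a, b/10000.0)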
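
The lookup in the script above scans the cumulative table linearly. A minimal sketch of the same idea using binary search from the standard `bisect` module follows. The names `ALPHABET`, `CUMULATIVE`, `LETTERS`, `get_letter_bisect` and `get_letters_choices` are hypothetical and not part of the original gist, and the second helper assumes Python 3.6 or later, where `random.choices` accepts `cum_weights`.

```python
import bisect
import random

# Same cumulative table as in the script above: (cumulative probability, letter).
ALPHABET = (
    (0.11607, "I"), (0.23011, "A"), (0.31651, "T"), (0.39098, "S"), (0.45836, "E"),
    (0.52488, "U"), (0.59058, "K"), (0.65494, "N"), (0.71334, "L"), (0.76748, "O"),
    (0.80699, "R"), (0.83889, "AE"), (0.86920, "P"), (0.89892, "M"), (0.92313, "V"),
    (0.94481, "H"), (0.96644, "Y"), (0.97941, "J"), (0.98683, "D"), (0.99417, "OE"),
    (0.99682, "G"), (0.99847, "F"), (1.00000, "B"),
)

# Split the pairs into two parallel tuples (hypothetical helper names).
CUMULATIVE, LETTERS = zip(*ALPHABET)


def get_letter_bisect(r=None):
    """Pick a letter by binary search over the cumulative probabilities."""
    if r is None:
        r = random.random()  # uniform in [0.0, 1.0)
    # bisect_right returns the index of the first entry strictly greater
    # than r, which mirrors the `x[0] > r` test of the linear search.
    return LETTERS[bisect.bisect_right(CUMULATIVE, r)]


def get_letters_choices(k):
    """Draw k letters at once; assumes Python 3.6+ for cum_weights."""
    return random.choices(LETTERS, cum_weights=CUMULATIVE, k=k)


if __name__ == "__main__":
    print(get_letter_bisect())
    print("".join(get_letters_choices(10)))
```

With only 23 letters the speed difference is negligible; the point of the sketch is that the cumulative table is already in the sorted form that `bisect` and `random.choices` expect.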