Skip to content

Instantly share code, notes, and snippets.

@tuzz
Last active February 24, 2017 00:37
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save tuzz/74540f7b37554058b2d84716bba486bc to your computer and use it in GitHub Desktop.
Save tuzz/74540f7b37554058b2d84716bba486bc to your computer and use it in GitHub Desktop.
A jumble of thoughts

I've been thinking about pre-computing a database of pangram solutions. Why?

  • I think it would be fun
  • I like the idea of showing solutions in real-time as you type
  • I'm interested in patterns/stats relating to the set

I started thinking about which pangrams I should pre-compute, e.g.

  • "This seventy first pangram contains ..."
  • "This pangram is dedicated to John Smith and it contains ..."

Neither of these achieve my goals. It's very unlikely someone would choose to type an arbitrary number as a seed and I suspect there would be a lot of bias in the set if a repetitive pattern is used, skewing statistics.

I started wondering, instead, about basing it on the frequency of letters in English text. E is the most common letter and appears 13% of the time, followed by T, 9% of the time. It's far more likely people would choose seeds that resemble this distribution.

Therein lies the crux of this gist. How would you enumerate seeds in a way that approximates a probability distribution? Here's what that might look like:

     A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
1) `[_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _]`
2) `[_ _ _ _ 1 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _]`
3) `[_ _ _ _ 1 _ _ _ _ _ _ _ _ _ _ _ _ _ _ 1 _ _ _ _ _ _]`
4) `[_ _ _ _ 2 _ _ _ _ _ _ _ _ _ _ _ _ _ _ 1 _ _ _ _ _ _]`
5) `[_ _ _ _ 1 _ _ _ 1 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _]`        (_ represents 0)

On each step, the enumerator should yield the next most likely combination of letters. This might not mean the total number of letters always increases. For example, two E's is more likely than one Z.

I can think of a way to do this by keeping track of all possible next moves and picking the one, that when normalised, has the smallest Euclidean distance to the probability distribution, but that requires an exponential amount of memory. I suppose I could use a priority queue. Maybe I'm over-complicating it?

Additionally, it would be desirable to be able to pause and continue execution at a later date without the need to presere a large object graph. I also thought about using a random number generater based on the probability distribution, but I think there's probably a better way.

Is this already a solved problem in mathematics/computer science?

Finally, I wrote something along these lines years ago, but it's logically flawed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment