tuzz/README.md

## README.md

      
    Raw
  

              README.md
            
          
    I've been thinking about pre-computing a database of pangram solutions. Why?

I think it would be fun
I like the idea of showing solutions in real-time as you type
I'm interested in patterns/stats relating to the set

I started thinking about which pangrams I should pre-compute, e.g.

"This seventy first pangram contains ..."
"This pangram is dedicated to John Smith and it contains ..."

Neither of these achieve my goals. It's very unlikely someone would choose to
type an arbitrary number as a seed and I suspect there would be a lot of bias in
the set if a repetitive pattern is used, skewing statistics.
I started wondering, instead, about basing it on the frequency of letters in
English text. E is the most common letter and appears 13% of the time, followed
by T, 9% of the time. It's far more likely people would choose seeds that
resemble this distribution.
Therein lies the crux of this gist. How would you enumerate seeds in a way that
approximates a probability distribution? Here's what that might look like:
     A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
1) `[_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _]`
2) `[_ _ _ _ 1 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _]`
3) `[_ _ _ _ 1 _ _ _ _ _ _ _ _ _ _ _ _ _ _ 1 _ _ _ _ _ _]`
4) `[_ _ _ _ 2 _ _ _ _ _ _ _ _ _ _ _ _ _ _ 1 _ _ _ _ _ _]`
5) `[_ _ _ _ 1 _ _ _ 1 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _]`        (_ represents 0)

On each step, the enumerator should yield the next most likely combination of
letters. This might not mean the total number of letters always increases. For
example, two E's is more likely than one Z.
I can think of a way to do this by keeping track of all possible next moves and
picking the one, that when normalised, has the smallest Euclidean distance to
the probability distribution, but that requires an exponential amount of memory.
I suppose I could use a priority queue. Maybe I'm over-complicating it?
Additionally, it would be desirable to be able to pause and continue execution
at a later date without the need to presere a large object graph. I also thought
about using a random number generater based on the probability distribution, but
I think there's probably a better way.
Is this already a solved problem in mathematics/computer science?
Finally, I wrote something along these lines years ago, but it's logically flawed.