@javallone
Created March 14, 2014 01:33
Generate random words

Generating weights

Before generating any words, you first need a weights file. Generate one using build_weights.rb like so:

cat /path/to/dictionary | ./build_weights.rb 5 > weights.json

This generates weights.json from the words in /path/to/dictionary with a look-back of 5. The input dictionary is a list of words, one per line. The look-back is the maximum number of preceding characters used as context when choosing the next character. A look-back of 1 tends to produce gibberish; higher values produce more coherent words. A value of 5 seems to give very good results.
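To make the look-back concrete, here is a minimal sketch of the (state, next character) pairs recorded for a single word. The helper name `transitions` is hypothetical and is not part of the scripts in this gist; it only mirrors the idea that the state is the last `look_back` characters seen so far, and that an empty string marks the end of the word.

```ruby
# Hypothetical helper (not in the gist): list the [state, next_char] pairs
# for one word. The state is the last `look_back` characters seen so far;
# the empty string stands for "end of word".
def transitions(word, look_back)
  state = word[0]
  (word[1..-1].chars + ['']).map do |ch|
    pair = [state, ch]
    state = (state + ch).chars.last(look_back).join
    pair
  end
end

p transitions('hello', 2)
# [["h", "e"], ["he", "l"], ["el", "l"], ["ll", "o"], ["lo", ""]]
```

With a look-back of 1 the states shrink to single characters ("h", "e", "l", "l", "o"), which is why short look-backs carry less context.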

Generating words

Once you have a weights file, you can generate words using make_word.rb like so:

./make_word.rb weights.json
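#!/usr/bin/env ruby
# build_weights.rb: read words from STDIN (one per line) and print JSON transition weights.
require 'json'

LOOK_BACK = ARGV[0].to_i

# Count how often each character follows each state (the last LOOK_BACK characters or fewer).
counts = STDIN.readlines.map(&:downcase).uniq
  .map(&:chars).flat_map do |chars|
    last_state = chars.first
    chars.slice(1..chars.length).map do |ch|
      ch = '' if ch == "\n" # the empty string marks the end of a word
      tr = [last_state, ch]
      last_state = (last_state + ch).chars.last(LOOK_BACK).join
      tr
    end
  end.reduce({}) do |memo, (start, result)|
    memo[start] ||= {}
    memo[start][result] ||= 0
    memo[start][result] += 1
    memo
  end

# Replace each count with a cumulative probability.
weights = counts.merge(counts) do |_, count|
  total = count.values.inject(:+)
  acc = 0
  count.merge(count) do |_, value|
    # Accumulate as a Rational so the last entry comes out as exactly 1.0.
    acc += Rational(value, total)
    acc.to_f
  end
end

puts(JSON.dump(weights))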
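#!/usr/bin/env ruby
# make_word.rb: print one random word generated from a weights file.
require 'json'

# Note: the weights file path is resolved relative to this script's directory.
weights_file = File.join(File.dirname(__FILE__), ARGV[0])
weights = JSON.parse(File.read(weights_file))
LOOK_BACK = weights.keys.map(&:length).max

# Start from a random letter that has recorded transitions.
state = ('a'..'z').select { |ch| weights.key?(ch) }.sample
result = []
loop do
  result << state
  r = rand
  choices = weights[result.last(LOOK_BACK).join]
  # Pick the first character whose cumulative probability exceeds r;
  # fall back to the last entry in case rounding left the final value below 1.
  state = (choices.find { |_, prob| prob > r } || choices.to_a.last)[0]
  break if state == ''
end
puts(result.join)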
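The weights are stored as cumulative probabilities, so choosing the next character amounts to drawing a random number and taking the first entry that exceeds it. The following is a small self-contained sketch of that idea; the counts and the helper names `cumulative` and `pick` are made up for illustration and do not appear in the scripts.

```ruby
# Hypothetical counts for a single state: how often each character followed it.
# The empty string stands for "end of word".
counts = { 'e' => 6, 'a' => 3, '' => 1 }

# Turn counts into cumulative probabilities, keeping insertion order.
def cumulative(counts)
  total = counts.values.sum
  acc = 0
  counts.transform_values { |n| (acc += Rational(n, total)).to_f }
end

# Return the first character whose cumulative probability exceeds r (0 <= r < 1).
def pick(cumulative_weights, r = rand)
  cumulative_weights.find { |_, prob| prob > r }.first
end

table = cumulative(counts) # 'e' => 0.6, 'a' => 0.9, '' => 1.0
p pick(table, 0.75)        # "a", because 0.6 <= 0.75 < 0.9
p pick(table)              # a random pick, weighted 60% / 30% / 10%
```

Because each entry covers an interval as wide as its share of the total, characters that followed the state more often are picked proportionally more often.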