@WilHall
Created September 7, 2017 02:36
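# dictionary: http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b
import string
import json

dictionary = {}
# the file appears to contain a few non-ASCII bytes, so read it as Latin-1 (Python 3)
with open('cmudict-0.7b', 'r', encoding='latin-1') as dictionary_file:
    for line in dictionary_file:
        # skip comment lines and blank lines
        if not line.strip() or line[0] == ';':
            continue
        # each entry is "WORD  PH1 PH2 ..."; split once on whitespace to separate the word from its phonemes
        word, token_string = line.split(None, 1)
        # skip words that start with punctuation (punctuation characters, or just trash)
        # skip words that end with punctuation (additional pronunciations, or trash)
        if word[0] in string.punctuation or word[-1] in string.punctuation:
            continue
        # split() with no argument also discards the trailing newline
        tokens = token_string.split()
        # vowel phonemes end in a stress digit (0, 1, or 2); each one counts as a syllable
        syllables = [token for token in tokens if token[-1] in string.digits]
        dictionary[word.lower()] = len(syllables)

with open('cmudict-0.7b.json', 'w') as json_file:
    json.dump(dictionary, json_file)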
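
The per-line logic of the script can be tried without downloading the dictionary. The following is a minimal sketch, not part of the original gist: `parse_entry` is a hypothetical helper name, and the sample lines are illustrative entries written in the CMUdict layout (word, whitespace, then phonemes with stress digits on the vowels).

```python
import string


def parse_entry(line):
    """Return (word, syllable_count) for one dictionary line, or None if the line is skipped.

    Hypothetical helper that mirrors the filtering rules of the script above.
    """
    # skip blank lines and comment lines
    if not line.strip() or line.startswith(';'):
        return None
    # separate the headword from its phoneme string
    word, phonemes = line.split(None, 1)
    # skip punctuation entries and alternate pronunciations such as "HELLO(1)"
    if word[0] in string.punctuation or word[-1] in string.punctuation:
        return None
    # count phonemes that end in a stress digit
    count = sum(1 for phoneme in phonemes.split() if phoneme[-1] in string.digits)
    return word.lower(), count


# illustrative lines in the CMUdict layout (assumed sample data)
sample_lines = [
    ";;; a comment line\n",
    "HELLO  HH AH0 L OW1\n",
    "HELLO(1)  HH EH0 L OW1\n",
    "DICTIONARY  D IH1 K SH AH0 N EH2 R IY0\n",
]

for sample in sample_lines:
    print(parse_entry(sample))
```

Keeping the rules in one small function makes it easier to check edge cases, such as a vowel phoneme at the very end of a line, before running over the full file.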