@pebreo
Last active July 28, 2018 21:02
An n-gram generator in Python (newbie program)
```python
from collections import Counter
from random import choice
import re


class Cup:
    """A class defining a cup that holds the words we will pull out."""

    def __init__(self):
        # keeps track of how many times each word appears in this cup
        self.next_word = Counter()

    def add_next_word(self, word):
        """Add a word to the cup and keep track of how many times we see it."""
        self.next_word[word] += 1
```
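A `Cup` is essentially a thin wrapper around `collections.Counter`: a missing key reads as 0, so `self.next_word[word] += 1` needs no initialisation, and `most_common()` ranks the words by count. A minimal standalone sketch of what a cup for the one-word token `("to",)` would collect; the word list is made up for illustration, loosely based on the Hamlet passage used later in the gist.

```python
from collections import Counter

# Made-up sample (an assumption for illustration, loosely based on Hamlet)
hamlet_words = ["to", "be", "or", "not", "to", "be", "to", "die", "to", "sleep"]

cup = Counter()  # plays the role of Cup.next_word for the token ("to",)
for current, following in zip(hamlet_words, hamlet_words[1:]):
    if current == "to":
        cup[following] += 1  # missing keys start at 0, so no setup is needed

print(cup.most_common())  # [('be', 2), ('die', 1), ('sleep', 1)]
```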
```python
class NGram:
    def make_cups(self, order, string):
        """Split the input text into words and make one cup per unique token
        (a tuple of `order` consecutive words), holding the words that follow it."""
        cups = {}
        words = re.findall(r"\w+", string)
        for i in range(len(words) - order):
            token = tuple(words[i:i + order])
            next_word = words[i + order]
            # create a cup if we've never seen this token before
            if token not in cups:
                cups[token] = Cup()
            cups[token].add_next_word(next_word)
        return cups
```
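The loop in `make_cups` slides a window of `order` words across the text and records the word right after each window. The same table can be built with `collections.defaultdict(Counter)`, which removes the "create a cup if we've never seen this token" branch. A sketch under that assumption; `build_table` is a hypothetical name, not part of the gist.

```python
import re
from collections import Counter, defaultdict


def build_table(text, order):
    """Map each tuple of `order` consecutive words to a Counter of next words.

    Hypothetical helper mirroring NGram.make_cups, using defaultdict(Counter)
    in place of a dict of Cup objects.
    """
    words = re.findall(r"\w+", text)
    table = defaultdict(Counter)
    for i in range(len(words) - order):
        token = tuple(words[i:i + order])  # the current window of words
        table[token][words[i + order]] += 1  # the word right after it
    return table


table = build_table("the cat sat on the mat and the cat ran", order=2)
print(table[("the", "cat")])  # Counter({'sat': 1, 'ran': 1})
```

One design caveat: reading a missing key from a `defaultdict` inserts an empty `Counter`, so a generator built on it should test `token in table` before indexing.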
```python
    # (NGram methods, continued)
    def print_cups(self, cups):
        """Print every cup (token) and the words inside it."""
        for token, cup in cups.items():
            print("key %s : nextword %s" % (token, cup.next_word))

    def topword(self, n, cup):
        """Pick a random word from among the n most frequent words in a cup.
        I realized I could've just picked randomly from a list of words,
        but instead I made a Counter() of the words to keep track of them.
        """
        # the n most frequent next words, most frequent first
        top = [word for word, _ in cup.next_word.most_common(n)]
        if top:
            return choice(top)  # random element from the top-n list
        return ""
```
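`topword` chooses uniformly among the `n` most frequent next words, so the counts only decide which words make the cut. A common alternative in Markov-chain text generators is to pick each next word with probability proportional to its count, which `random.choices` supports through its `weights` argument. A sketch of that swapped-in technique, not the gist's method; `weighted_next` is a hypothetical name.

```python
import random
from collections import Counter


def weighted_next(cup, rng=random):
    """Pick a next word with probability proportional to its count in `cup`.

    Hypothetical alternative to NGram.topword; like topword, it returns ""
    for an empty cup.
    """
    if not cup:
        return ""
    words = list(cup)
    weights = [cup[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


cup = Counter({"be": 3, "die": 1})
rng = random.Random(0)  # seeded so runs are repeatable
samples = Counter(weighted_next(cup, rng) for _ in range(1000))
print(samples)  # 'be' should come up about three times as often as 'die'
```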
```python
    def generate_from_file(self, order, filename):
        """Generate 10 words (after a random starting token) based on a text file."""
        # put the file in one big string
        with open(filename, "r") as f:
            string = f.read()
        # strip punctuation; the hyphen is listed last so it is a literal
        # hyphen, not a range (this joins "heart-ache" into "heartache")
        string = re.sub("[,.?\"'!:;-]", "", string)
        cups = self.make_cups(order, string)
        this_token = choice(list(cups))  # random first token
        generated_words = list(this_token)
        for _ in range(10):
            # the last token of the text has no cup: jump to a random token
            if this_token not in cups:
                this_token = choice(list(cups))
            next_word = self.topword(1, cups[this_token])
            # slide the window: drop the first word, add the new one at the end
            this_token = this_token[1:] + (next_word,)
            generated_words.append(next_word)
        print(" ".join(generated_words))
```
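Inside a regex character class, a hyphen between two characters forms a range. Written as `'[,\.?"-\'!:;]'`, the class reads `"-'` as the range from `"` (0x22) to `'` (0x27), so it removes `#`, `$`, `%` and `&` but leaves hyphens alone; in a non-raw string, `\.` also triggers a `SyntaxWarning` on Python 3.12+. Placing the hyphen last makes it literal. A small check comparing the two classes (both spelled out in the code, with escapes adjusted so they compile cleanly):

```python
import re

# Range version: after Python's string escapes this is [,\.?"-'!:;],
# where "-' covers every character from " (0x22) to ' (0x27)
RANGE_CLASS = "[,\\.?\"-'!:;]"
# Literal version: the hyphen comes last, so it is just a hyphen
LITERAL_CLASS = "[,.?\"'!:;-]"

sample = "heart-ache, #1!"
print(re.sub(RANGE_CLASS, "", sample))    # heart-ache 1
print(re.sub(LITERAL_CLASS, "", sample))  # heartache #1
```

Because `make_cups` later keeps only `\w+` runs, most punctuation disappears either way; the practical difference is whether hyphenated words become two words ("heart", "ache") or one ("heartache").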
```python
    def generate_from_string(self, order, string, number_of_words):
        """Generate words based on a string."""
        # strip punctuation (hyphen listed last so it is literal)
        string = re.sub("[,.?\"'!:;-]", "", string)
        cups = self.make_cups(order, string)
        this_token = choice(list(cups))  # random first token
        generated_words = list(this_token)
        for _ in range(1, number_of_words):
            # if the token has no cup (a dead end), restart from a random token
            if this_token not in cups:
                this_token = choice(list(cups))
            next_word = self.topword(2, cups[this_token])
            # slide the window: drop the first word, add the new word at the end
            this_token = this_token[1:] + (next_word,)
            generated_words.append(next_word)
        print(" ".join(generated_words))
```
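Both generators run the same loop: start from a random token, pick a next word from its cup, slide the window one word to the right, and jump to a random token when the current one has no cup (as happens at the end of the text). A self-contained sketch of that loop with a seedable `random.Random`, so a run can be reproduced; `generate`, `start` and `rng` are hypothetical names, and this version draws uniformly from a cup's distinct words instead of calling `topword`.

```python
import random
from collections import Counter


def generate(table, number_of_words, rng, start=None):
    """Walk an n-gram table (token tuple -> Counter of next words).

    Hypothetical sketch of the loop in generate_from_string; next words are
    drawn uniformly from each cup's distinct words.
    """
    token = start if start is not None else rng.choice(list(table))
    words = list(token)
    for _ in range(number_of_words):
        if token not in table:  # dead end: jump to a random token
            token = rng.choice(list(table))
        next_word = rng.choice(list(table[token]))
        words.append(next_word)
        token = token[1:] + (next_word,)  # slide the window one word right
    return words


table = {
    ("a",): Counter({"b": 1}),
    ("b",): Counter({"c": 1}),
    ("c",): Counter({"d": 1}),
}
print(generate(table, 3, random.Random(1), start=("a",)))  # ['a', 'b', 'c', 'd']
```

Swapping in a top-n or count-weighted pick only changes the `next_word` line.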
```python
ngram = NGram()
# uncomment to use if you have a file you want to generate from
# ngram.generate_from_file(2, 'shakespeare.txt')
strings = {}
strings['sal'] = "Let's say we have the equation 7 times x is equal to 14. Now before even trying to solve this equation, what I want to do is think a little bit about what this actually means. 7x equals 14, this is the exact same thing as saying 7 times x - let me write it this way -- 7 times x -- we'll do the x in orange again -- 7 times x is equal to 14. Now you might be able to do this in your head. You could literally go through the 7 times table. You say well 7 times 1 is equal to 7, so that won't work. 7 times 2 is equal to 14, so 2 works here. So you would immediately be able to solve it. You would immediately, just by trying different numbers out, say hey, that's going to be a 2. But what we're going to do in this video is to think about how to solve this systematically. Because what we're going to find is as these equations get more and more complicated, you're not going to be able to just think about it and do it in your head. So it's really important that one, you understand how to manipulate these equations, but even more important to understand what they actually represent. This literally just says 7 times x is equal to 14. In algebra we don't write the times there. When you write two numbers next to each other or a number next"
strings['sherlock'] = "It was in the spring of the year 1894 that all London was interested, and the fashionable world dismayed, by the murder of the Honourable Ronald Adair under most unusual and inexplicable circumstances. The public has already learned those particulars of the crime which came out in the police investigation; but a good deal was suppressed upon that occasion, since the case for the prosecution was so overwhelmingly strong that it was not necessary to bring forward all the facts. Only now, at the end of nearly ten years, am I allowed to supply those missing links which make up the whole of that remarkable chain. The crime was of interest in itself, but that interest was as nothing to me compared to the inconceivable sequel, which afforded me the greatest shock and surprise of any event in my adventurous life. Even now, after this long interval, I find myself thrilling as I think of it, and feeling once more that sudden flood of joy, amazement, and incredulity which utterly submerged my mind. Let me say to that public which has shown some interest in those glimpses which I have occasionally given them of the thoughts and actions of a very remarkable man that they are not to blame me if I have not shared my knowledge with them, for I should have considered it my first duty to have done so had I not been barred by a positive prohibition from his own lips, which was only withdrawn upon the third of last month."
strings['hamlet'] = "To be, or not to be: that is the question: whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune, or to take arms against a sea of troubles, and by opposing end them? To die: to sleep; no more; and, by a sleep to say we end the heart-ache and the thousand natural shocks that flesh is heir to, 'tis a consummation devoutly to be wish'd. To die, to sleep; to sleep: perchance to dream: ay, there's the rub; for in that sleep of death what dreams may come when we have shuffled off this mortal coil, must give us pause. There's the respect that makes calamity of so long a life; for who would bear the whips and scorns of time, the oppressor's wrong, the proud man's contumely, the pangs of dispriz'd love, the law's delay, the insolence of office, and the spurns that patient merit of the unworthy takes, when he himself might his quietus make with a bare bodkin? Who would fardels bear, to grunt and sweat under a weary life, but that the dread of something after"
strings['winnie'] = "Once upon a time, a very long time ago now, about last Friday, Winnie-the-Pooh lived in a forest all by himself under the name of Sanders. ('What does 'under the name' mean?' asked Christopher Robin. 'It means he had the name over the door in gold letters, and lived under it.' 'Winnie-the-Pooh wasn't quite sure,' said Christopher Robin. 'Now I am,' said a growly voice. 'Then I will go on,' said I.) One day when he was out walking, he came to an open place in the middle of the forest, and in the middle of this place was a large oak-tree, and, from the top of the tree, there came a loud buzzing-noise. Winnie-the-Pooh sat down at the foot of the tree, put his head between his paws and began to think. First of all he said to himself: 'That buzzing-noise means something. You don't get a buzzing-noise like that, just buzzing and buzzing, without its meaning something. If there's a buzzing-noise, somebody's making a buzzing-noise, and the only reason for making a buzzing-noise that I know of is because you're a bee.' Then he thought another long time, and said: 'And the only reason for being a bee that I know of is making honey.' And then he got up, and said: 'And the only reason for making honey is so as I can eat it.' So he began to climb the tree"
print("\nSal: ")
ngram.generate_from_string(order=2, string=strings['sal'], number_of_words=100)
print("\nSherlock: ")
ngram.generate_from_string(order=2, string=strings['sherlock'], number_of_words=100)
print("\nHamlet: ")
ngram.generate_from_string(order=2, string=strings['hamlet'], number_of_words=100)
print("\nWinnie the Pooh: ")
ngram.generate_from_string(order=2, string=strings['winnie'], number_of_words=100)
```
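With passages as short as these samples, many two-word tokens appear only once, so their cup holds a single word and `topword(2, ...)` has nothing to choose between; the output then tends to copy runs of the source verbatim. A small sketch for measuring this for a given text and order, using the same `\w+` tokenisation as `make_cups`; `single_successor_share` is a hypothetical helper, and a higher share suggests more verbatim copying.

```python
import re
from collections import Counter, defaultdict


def single_successor_share(text, order):
    """Fraction of tokens whose cup holds exactly one distinct next word."""
    words = re.findall(r"\w+", text)
    table = defaultdict(Counter)
    for i in range(len(words) - order):
        table[tuple(words[i:i + order])][words[i + order]] += 1
    if not table:
        return 0.0
    single = sum(1 for cup in table.values() if len(cup) == 1)
    return single / len(table)


text = "the cat sat on the mat and the cat ran"
for order in (1, 2):
    print(order, single_successor_share(text, order))
```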