
@elyasha
Created October 31, 2020 15:09
Bag-of-words represents a text as a count of how often each word appears, ignoring grammar and word order. It can be an excellent way of looking at language when you want to predict the topic or sentiment of a text. When grammar and word order are irrelevant to the task, it is probably a good model to use.
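To make "word order is irrelevant" concrete, here is a minimal sketch using only the standard library. Lowercasing and splitting on whitespace is a simplifying assumption for illustration, not the tokenizer the script below uses.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count each word (simplified tokenizer)."""
    return Counter(text.lower().split())

a = bag_of_words("the egg sat on the wall")
b = bag_of_words("on the wall the egg sat")

print(a == b)      # True: same words with the same counts, order ignored
print(a["the"])    # 2

# The flip side: sentences with different meanings can share a bag
print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))  # True
```

The last comparison shows the model's limitation: once order is discarded, "dog bites man" and "man bites dog" are indistinguishable, which is why bag-of-words suits topic and sentiment tasks better than tasks that depend on syntax.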
```python
# looking_glass.py
looking_glass_text = """
However, the egg only got larger and larger, and more and more human: when she had come within a few yards of it, she saw that it had eyes and a nose and mouth; and when she had come close to it, she saw clearly that it was HUMPTY DUMPTY himself. 'It can't be anybody else!' she said to herself. 'I'm as certain of it, as if his name were written all over his face.'
It might have been written a hundred times, easily, on that enormous face. Humpty Dumpty was sitting with his legs crossed, like a Turk, on the top of a high wall--such a narrow one that Alice quite wondered how he could keep his balance--and, as his eyes were steadily fixed in the opposite direction, and he didn't take the least notice of her, she thought he must be a stuffed figure after all.
'And how exactly like an egg he is!' she said aloud, standing with her hands ready to catch him, for she was every moment expecting him to fall.
'It's very provoking,' Humpty Dumpty said after a long silence, looking away from Alice as he spoke, 'to be called an egg--Very!'
'I said you looked like an egg, Sir,' Alice gently explained. 'And some eggs are very pretty, you know,' she added, hoping to turn her remark into a sort of a compliment.
'Some people,' said Humpty Dumpty, looking away from her as usual, 'have no more sense than a baby!'
Alice didn't know what to say to this: it wasn't at all like conversation, she thought, as he never said anything to her; in fact, his last remark was evidently addressed to a tree--so she stood and softly repeated to herself:
'Humpty Dumpty sat on a wall:
Humpty Dumpty had a great fall.
All the King's horses and all the King's men
Couldn't put Humpty Dumpty in his place again.'
'That last line is much too long for the poetry,' she added, almost out loud, forgetting that Humpty Dumpty would hear her.
'Don't stand there chattering to yourself like that,' Humpty Dumpty said, looking at her for the first time, 'but tell me your name and your business.'
'My name is Alice, but--'
'It's a stupid enough name!' Humpty Dumpty interrupted impatiently. 'What does it mean?'
'Must a name mean something?' Alice asked doubtfully.
'Of course it must,' Humpty Dumpty said with a short laugh: 'my name means the shape I am--and a good handsome shape it is, too. With a name like yours, you might be any shape, almost.'
'Why do you sit out here all alone?' said Alice, not wishing to begin an argument.
'Why, because there's nobody with me!' cried Humpty Dumpty. 'Did you think I didn't know the answer to that? Ask another.'
'Don't you think you'd be safer down on the ground?' Alice went on, not with any idea of making another riddle, but simply in her good-natured anxiety for the queer creature. 'That wall is so very narrow!'
'What tremendously easy riddles you ask!' Humpty Dumpty growled out. 'Of course I don't think so! Why, if ever I did fall off--which there's no chance of--but if I did--' Here he pursed his lips and looked so solemn and grand that Alice could hardly help laughing. 'If I did fall,' he went on, 'The King has promised me--with his very own mouth--to--to--'
'To send all his horses and all his men,' Alice interrupted, rather unwisely.
'Now I declare that's too bad!' Humpty Dumpty cried, breaking into a sudden passion. 'You've been listening at doors--and behind trees--and down chimneys--or you couldn't have known it!'
'I haven't, indeed!' Alice said very gently. 'It's in a book.'
'Ah, well! They may write such things in a book,' Humpty Dumpty said in a calmer tone. 'That's what you call a History of England, that is. Now, take a good look at me! I'm one that has spoken to a King, I am: mayhap you'll never see such another: and to show you I'm not proud, you may shake hands with me!' And he grinned almost from ear to ear, as he leant forwards (and as nearly as possible fell off the wall in doing so) and offered Alice his hand. She watched him a little anxiously as she took it. 'If he smiled much more, the ends of his mouth might meet behind,' she thought: 'and then I don't know what would happen to his head! I'm afraid it would come off!'
'Yes, all his horses and all his men,' Humpty Dumpty went on. 'They'd pick me up again in a minute, they would! However, this conversation is going on a little too fast: let's go back to the last remark but one.'
'I'm afraid I can't quite remember it,' Alice said very politely.
'In that case we start fresh,' said Humpty Dumpty, 'and it's my turn to choose a subject--' ('He talks about it just as if it was a game!' thought Alice.) 'So here's a question for you. How old did you say you were?'
Alice made a short calculation, and said 'Seven years and six months.'
'Wrong!' Humpty Dumpty exclaimed triumphantly. 'You never said a word like it!'
'I thought you meant "How old are you?"' Alice explained.
'If I'd meant that, I'd have said it,' said Humpty Dumpty.
"""
```
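```python
# part_of_speech.py
# Requires the WordNet corpus: run nltk.download('wordnet') once beforehand.
from collections import Counter

from nltk.corpus import wordnet


def get_part_of_speech(word):
    """Guess a word's part of speech from its WordNet synsets.

    Counts how many synsets are nouns ('n'), verbs ('v'), adjectives ('a')
    and adverbs ('r') and returns the most common tag. Ties go to the tag
    counted first, so a word with no synsets gets 'n'.
    """
    probable_part_of_speech = wordnet.synsets(word)
    pos_counts = Counter()
    pos_counts["n"] = len([item for item in probable_part_of_speech if item.pos() == "n"])
    pos_counts["v"] = len([item for item in probable_part_of_speech if item.pos() == "v"])
    pos_counts["a"] = len([item for item in probable_part_of_speech if item.pos() == "a"])
    pos_counts["r"] = len([item for item in probable_part_of_speech if item.pos() == "r"])
    most_likely_part_of_speech = pos_counts.most_common(1)[0][0]
    return most_likely_part_of_speech
```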
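`get_part_of_speech()` works by majority vote: it counts how often each WordNet tag ('n', 'v', 'a', 'r') appears among a word's synsets and returns the winner, which the script then passes to `WordNetLemmatizer.lemmatize()` (that method otherwise treats every word as a noun). The sketch below isolates the voting step so it runs without NLTK or its data. The tag lists are hypothetical stand-ins for what `wordnet.synsets(word)` would return, and `most_likely_pos` is an illustrative name, not part of the gist.

```python
from collections import Counter

def most_likely_pos(synset_tags):
    """Return the most frequent of 'n', 'v', 'a', 'r' in a list of POS tags.

    Mirrors get_part_of_speech(): the four tags are inserted in a fixed
    order, and Counter.most_common keeps first-inserted order on ties, so
    an empty list yields 'n'. Any other tag (for example WordNet's 's' for
    satellite adjectives) is not counted, as in the original function.
    """
    counts = Counter()
    for tag in ("n", "v", "a", "r"):
        counts[tag] = sum(1 for t in synset_tags if t == tag)
    return counts.most_common(1)[0][0]

# Hypothetical tag lists standing in for wordnet.synsets(word)
print(most_likely_pos(["v", "v", "n"]))  # v
print(most_likely_pos([]))               # n (no synsets found)
```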
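```python
# importing regex and nltk
import re

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
# importing Counter to get word counts for bag of words
from collections import Counter
# importing a passage from Through the Looking Glass
from looking_glass import looking_glass_text
# importing part-of-speech function for lemmatization
from part_of_speech import get_part_of_speech

# One-time downloads of the NLTK data used below; uncomment on first run
# (newer NLTK releases need 'punkt_tab' instead of 'punkt').
# nltk.download('punkt'); nltk.download('stopwords'); nltk.download('wordnet')

# Swap in another string here to build a bag of words for a different text
text = looking_glass_text
# Replace every run of non-word characters with one space, then lowercase
cleaned = re.sub(r'\W+', ' ', text).lower()
tokenized = word_tokenize(cleaned)
# A set makes each stop-word membership test constant time
stop_words = set(stopwords.words('english'))
filtered = [word for word in tokenized if word not in stop_words]
# Lemmatize each token using its most likely part of speech
normalizer = WordNetLemmatizer()
normalized = [normalizer.lemmatize(token, get_part_of_speech(token)) for token in filtered]
# print(normalized)  # uncomment to inspect the lemmatized tokens

# The bag of words: a Counter mapping each lemma to its frequency
bag_of_looking_glass_words = Counter(normalized)
print(bag_of_looking_glass_words)
```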
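If the NLTK data packages are not installed, the pipeline above (clean, tokenize, drop stop words, count) can be approximated with the standard library alone. This is a sketch under simplifying assumptions: whitespace splitting replaces `word_tokenize`, a small hypothetical `STOP_WORDS` set stands in for NLTK's English list, and lemmatization is skipped, so its counts will not match the real script. It does reproduce one side effect of the `\W+` cleaning step: the apostrophe in a contraction becomes a space, so "can't" turns into the tokens "can" and "t", which is why single-letter fragments need to be treated as stop words.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word set; the script above uses
# nltk.corpus.stopwords.words('english'), which is much larger.
STOP_WORDS = {"a", "an", "and", "at", "be", "can", "i", "in", "is",
              "it", "me", "of", "on", "s", "t", "that", "the", "to", "you"}

def bag_of_words(text):
    """Clean, tokenize, drop stop words, and count (no lemmatization)."""
    # Same cleaning as the script: collapse each run of non-word
    # characters into a single space, then lowercase.
    cleaned = re.sub(r"\W+", " ", text).lower()
    # A plain whitespace split stands in for nltk's word_tokenize.
    tokens = cleaned.split()
    return Counter(token for token in tokens if token not in STOP_WORDS)

sample = "It can't be anybody else! It's Humpty Dumpty, Humpty Dumpty himself."
bag = bag_of_words(sample)
print(bag.most_common(2))  # [('humpty', 2), ('dumpty', 2)]
```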