amnrzv

amnrzv / nltk_tokenize.py
Last active Nov 1, 2017
A little example of NLTK's word and sentence tokenization. Output here: https://gist.github.com/amnrzv/2cbaad89e016acc0db410ec79a5ff40f
View nltk_tokenize.py
from nltk.tokenize import word_tokenize, sent_tokenize
text = "Hello, Mr. Jacobs. Nice to meet you!"
sentences = sent_tokenize(text)
words = word_tokenize(text)
print(sentences)
print(words)
View nltk_pos_tags.py
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
text1 = "I'm going to watch a play tonight."
text2 = "I like to play guitar."
words1 = word_tokenize(text1)
pos_tags1 = nltk.pos_tag(words1)
words2 = word_tokenize(text2)
amnrzv / nltk_pos_tags
Created Oct 19, 2017
A list of POS tags used in NLTK and what they mean
View nltk_pos_tags
POS tag list:
CC coordinating conjunction
CD cardinal digit
DT determiner
EX existential there (like: "there is" ... think of it like "there exists")
FW foreign word
IN preposition/subordinating conjunction
JJ adjective 'big'
JJR adjective, comparative 'bigger'
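The tag list above can be turned into a small lookup table for annotating tagger output. The sketch below is only an illustration: `POS_TAG_MEANINGS` and `describe_tags` are hypothetical names, the table holds just the tags visible in this excerpt, and the tagged pairs are hand-written in the `(word, tag)` shape that `nltk.pos_tag` returns rather than produced by running the tagger.

```python
# Meanings copied from the tag list excerpt above (only the tags shown there).
POS_TAG_MEANINGS = {
    "CC": "coordinating conjunction",
    "CD": "cardinal digit",
    "DT": "determiner",
    "EX": "existential there",
    "FW": "foreign word",
    "IN": "preposition/subordinating conjunction",
    "JJ": "adjective",
    "JJR": "adjective, comparative",
}


def describe_tags(tagged_tokens):
    """Attach a human-readable meaning to each (word, tag) pair."""
    return [
        (word, tag, POS_TAG_MEANINGS.get(tag, "not in this excerpt"))
        for word, tag in tagged_tokens
    ]


# Hand-written sample in the (word, tag) format; not real tagger output.
sample = [("a", "DT"), ("bigger", "JJR"), ("play", "NN")]
for word, tag, meaning in describe_tags(sample):
    print(f"{word}\t{tag}\t{meaning}")
```

Tags missing from the table fall back to a placeholder string, so the helper keeps working until the rest of the list is filled in.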
amnrzv / ntlk_lemmatizer.py
Last active Nov 1, 2017
An example of NLTK's WordNet Lemmatizer.
View ntlk_lemmatizer.py
from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()
print(wordnet_lemmatizer.lemmatize("geese"))
print(wordnet_lemmatizer.lemmatize("bottles", 'n'))
print(wordnet_lemmatizer.lemmatize("said", 'v'))
print(wordnet_lemmatizer.lemmatize("better", 'a'))
print(wordnet_lemmatizer.lemmatize("quickly", 'r'))
amnrzv / a_language_analysis.py
Last active Nov 1, 2017
Python NLTK vocabulary analysis example.
View a_language_analysis.py
import nltk
import re
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
input_file = "./input.txt"
words_file = "./words.txt"
output_file = "./output.txt"
curriculum_words = []
pos_tagged_array = []
View input.txt
Ruby... Ruby, can you hear me?
Moli? Moli, where are you?
Moli?
Ruby, I've crashed.
Yeah... But where?
I'm hurt, Ruby. Can you find me?
Okay, I can see plants.
I can see rocks.
View words.txt
act
be
begin
believe
break
call
can
change
choose
clean
View output.txt
act | 0
be | 6
begin | 0
believe | 0
break | 0
call | 0
can | 5
change | 0
choose | 0
clean | 0
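The three files above suggest a pipeline that reads dialogue lines, reduces each word to a base form, and writes one `word | count` row per curriculum word. Since the script preview is cut off, the following is a minimal standard-library sketch of that idea, not the gist's actual code. The regex tokenizer and the small `LEMMA_OVERRIDES` table are hypothetical stand-ins for `word_tokenize` and `WordNetLemmatizer`, and the function names are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for a real lemmatizer: a few forms of "be" only.
LEMMA_OVERRIDES = {"am": "be", "are": "be", "is": "be", "'m": "be", "'re": "be"}


def tokenize(line):
    """Split a line into lowercase words and clitics such as 'm or 've."""
    return re.findall(r"'[a-z]+|[a-z]+", line.lower())


def count_curriculum_words(lines, curriculum_words):
    """Count how often each curriculum word occurs, after crude lemmatization."""
    counts = Counter()
    for line in lines:
        for token in tokenize(line):
            counts[LEMMA_OVERRIDES.get(token, token)] += 1
    # Preserve the order of the curriculum list, including zero counts.
    return [(word, counts[word]) for word in curriculum_words]


def format_report(rows):
    """Render rows in the 'word | count' layout used by output.txt."""
    return "\n".join(f"{word} | {count}" for word, count in rows)


sample_lines = [
    "Moli? Moli, where are you?",
    "I'm hurt, Ruby. Can you find me?",
    "I can see rocks.",
]
print(format_report(count_curriculum_words(sample_lines, ["act", "be", "can"])))
```

The sample uses only three of the input lines, so its counts are not expected to match the totals in output.txt.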
amnrzv / nltk_pos_workaround.py
Last active Nov 1, 2017
Python NLTK POS tagger workaround example.
View nltk_pos_workaround.py
import nltk
import re
from nltk.tokenize import word_tokenize, sent_tokenize
text = "I'm not going to the party."
words = word_tokenize(text)
pos_tags = nltk.pos_tag(words)
print(pos_tags)
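The preview above stops before the workaround itself, so the approach taken in the gist is not visible here. One possible approach, offered purely as an assumption, is to expand unambiguous contractions with `re` before tokenizing, so that the tagger sees "am" instead of the clitic "'m". The names `EXPANSIONS` and `expand_contractions` are hypothetical, and the table deliberately leaves out ambiguous forms such as "'s" and "'d".

```python
import re

# Hypothetical table of unambiguous clitics and their full forms.
EXPANSIONS = {"m": "am", "re": "are", "ve": "have", "ll": "will"}

# Match an apostrophe plus clitic that directly follows a word character.
_CLITIC = re.compile(r"(?<=\w)'(m|re|ve|ll)\b", re.IGNORECASE)


def expand_contractions(text):
    """Rewrite forms like "I'm" as "I am" before tokenizing and tagging."""
    return _CLITIC.sub(lambda m: " " + EXPANSIONS[m.group(1).lower()], text)


print(expand_contractions("I'm not going to the party."))
```

The expanded string can then be passed to `word_tokenize` and `nltk.pos_tag` in the same way as the original text.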
amnrzv / a_PhoneticTranslations_main.py
Last active Nov 1, 2017
Get phonetic transcriptions of words by scraping them from the website http://www.phonemicchart.com/
View a_PhoneticTranslations_main.py
import urllib.request
import urllib.error
import urllib.parse
import re
from bs4 import BeautifulSoup
from bs4 import UnicodeDammit
lines = []
base_url = "http://www.phonemicchart.com/transcribe/?w=%s"
output_file = open("output.txt", 'w', encoding='utf-8')
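The `base_url` template above takes the word as a query parameter, so words containing spaces or non-ASCII characters need percent-encoding first. The sketch below shows only that URL-building step, using `urllib.parse.quote`; `build_transcription_url` is a hypothetical helper name, and fetching and parsing the page are left out because the rest of the script is not shown.

```python
import urllib.parse

# Same template as in the preview above.
base_url = "http://www.phonemicchart.com/transcribe/?w=%s"


def build_transcription_url(word):
    """Percent-encode the word and substitute it into the URL template."""
    return base_url % urllib.parse.quote(word.strip())


for word in ["hello", "ice cream"]:
    print(build_transcription_url(word))
```

Stripping whitespace before encoding avoids sending a trailing newline when words are read line by line from a file.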