amnrzv

  • London
@amnrzv
amnrzv / nltk_pos_tags
Created Oct 19, 2017
A list of POS tags used in NLTK and what they mean
View nltk_pos_tags
POS tag list:
CC coordinating conjunction
CD cardinal number
DT determiner
EX existential there (like: "there is" ... think of it like "there exists")
FW foreign word
IN preposition/subordinating conjunction
JJ adjective 'big'
JJR adjective, comparative 'bigger'
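A minimal sketch of how the tags listed so far could be kept in a lookup table for annotating tagger output. The names `POS_TAGS` and `describe_tag` are hypothetical, the descriptions are paraphrased from the list above, and the real tag set is larger than the eight entries shown here.

```python
# Partial lookup table: only the tags shown in the list above.
POS_TAGS = {
    "CC": "coordinating conjunction",
    "CD": "cardinal number",
    "DT": "determiner",
    "EX": "existential there",
    "FW": "foreign word",
    "IN": "preposition/subordinating conjunction",
    "JJ": "adjective",
    "JJR": "adjective, comparative",
}


def describe_tag(tag):
    """Return a human-readable description, or a fallback for unlisted tags."""
    return POS_TAGS.get(tag, "unknown tag")


print(describe_tag("JJR"))
print(describe_tag("XYZ"))
```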
View input.txt
Ruby... Ruby, can you hear me?
Moli? Moli, where are you?
Moli?
Ruby, I've crashed.
Yeah... But where?
I'm hurt, Ruby. Can you find me?
Okay, I can see plants.
I can see rocks.
View words.txt
act
be
begin
believe
break
call
can
change
choose
clean
View output.txt
act | 0
be | 6
begin | 0
believe | 0
break | 0
call | 0
can | 5
change | 0
choose | 0
clean | 0
amnrzv / a_language_analysis.py
Last active Nov 1, 2017
Python NLTK vocabulary analysis example.
View a_language_analysis.py
import nltk
import re
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
input_file = "./input.txt"
words_file = "./words.txt"
output_file = "./output.txt"
curriculum_words = []
pos_tagged_array = []
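The preview of the script stops here, so the rest of its logic is not shown. As a rough illustration of how counts in the `word | count` format of output.txt could be produced, here is a standard-library sketch. It is not the author's method: the script imports NLTK's `word_tokenize` and `WordNetLemmatizer`, whereas this sketch swaps in a regular expression and a small hand-written `LEMMA_MAP`. All function names, the map, and the sample text (assembled from a few lines of input.txt) are assumptions.

```python
import re
from collections import Counter

# Hypothetical stand-in for a real lemmatizer: inflected form -> base form.
LEMMA_MAP = {"am": "be", "is": "be", "are": "be", "'m": "be", "'ve": "have"}


def count_curriculum_words(text, curriculum_words, lemma_map=LEMMA_MAP):
    """Count how often each curriculum word occurs after crude lemmatization."""
    # Lowercase, then pull out plain words and clitics such as 'm or 've.
    tokens = re.findall(r"'[a-z]+|[a-z]+", text.lower())
    lemmas = Counter(lemma_map.get(token, token) for token in tokens)
    return [(word, lemmas[word]) for word in curriculum_words]


def format_counts(counts):
    """Render (word, count) pairs one per line as 'word | count'."""
    return "\n".join(f"{word} | {count}" for word, count in counts)


sample = "Moli, where are you? I'm hurt, Ruby. Can you find me? I can see rocks."
print(format_counts(count_curriculum_words(sample, ["act", "be", "can"])))
```

Mapping inflected forms to a base form before counting is what lets "are" and "'m" both count toward "be"; a real lemmatizer would cover far more forms than this map.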
amnrzv / nltk_tokenize.py
Last active Nov 1, 2017
A little example of NLTK's word and sentence tokenization. Output here: https://gist.github.com/amnrzv/2cbaad89e016acc0db410ec79a5ff40f
View nltk_tokenize.py
from nltk.tokenize import word_tokenize, sent_tokenize
text = "Hello, Mr. Jacobs. Nice to meet you!"
sentences = sent_tokenize(text)
words = word_tokenize(text)
print(sentences)
print(words)
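For comparison, a naive standard-library split of the same text is sketched below. It is not NLTK's algorithm; it only illustrates the cases a dedicated tokenizer is meant to handle, such as the abbreviation "Mr." and punctuation attached to words. The helper names `naive_sentences` and `naive_words` are hypothetical.

```python
import re

text = "Hello, Mr. Jacobs. Nice to meet you!"


def naive_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (no abbreviation handling)."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def naive_words(text):
    """Split on whitespace only, so punctuation stays glued to the words."""
    return text.split()


print(naive_sentences(text))  # ['Hello, Mr.', 'Jacobs.', 'Nice to meet you!']
print(naive_words(text))      # ['Hello,', 'Mr.', 'Jacobs.', 'Nice', 'to', 'meet', 'you!']
```

The naive sentence split breaks after "Mr.", and the naive word split leaves "Hello," and "you!" as single tokens.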
View nltk_pos_tags.py
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
text1 = "I'm going to watch a play tonight."
text2 = "I like to play guitar."
words1 = word_tokenize(text1)
pos_tags1 = nltk.pos_tag(words1)
words2 = word_tokenize(text2)
amnrzv / ntlk_lemmatizer.py
Last active Nov 1, 2017
An example of NLTK's WordNet Lemmatizer.
View ntlk_lemmatizer.py
from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()
print(wordnet_lemmatizer.lemmatize("geese"))
print(wordnet_lemmatizer.lemmatize("bottles", 'n'))
print(wordnet_lemmatizer.lemmatize("said", 'v'))
print(wordnet_lemmatizer.lemmatize("better", 'a'))
print(wordnet_lemmatizer.lemmatize("quickly", 'r'))
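The second argument in the calls above is a WordNet part-of-speech code ('n', 'v', 'a', 'r'), while `nltk.pos_tag` returns Penn Treebank tags such as JJ or VB. A common way to connect the two is a small mapping helper. The sketch below is one possible version; the name `penn_to_wordnet` is hypothetical, and falling back to 'n' is an assumption chosen to match the lemmatizer's default.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS code ('a', 'v', 'r' or 'n')."""
    if tag.startswith("JJ"):
        return "a"  # adjectives: JJ, JJR, JJS
    if tag.startswith("VB"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("RB"):
        return "r"  # adverbs: RB, RBR, RBS
    return "n"      # everything else is treated as a noun


print(penn_to_wordnet("JJR"))
print(penn_to_wordnet("VBD"))
```

The returned code could then be passed as the second argument to `lemmatize`.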
amnrzv / nltk_pos_workaround.py
Last active Nov 1, 2017
Python NLTK POS tagger workaround example.
View nltk_pos_workaround.py
import nltk
import re
from nltk.tokenize import word_tokenize, sent_tokenize
text = "I'm not going to the party."
words = word_tokenize(text)
pos_tags = nltk.pos_tag(words)
print(pos_tags)
amnrzv / a_PhoneticTranslations_main.py
Last active Nov 1, 2017
Get phonetic transcriptions of words by scraping them from the website http://www.phonemicchart.com/
View a_PhoneticTranslations_main.py
import urllib.request
import urllib.error
import urllib.parse
import re
from bs4 import BeautifulSoup
from bs4 import UnicodeDammit
lines = []
base_url = "http://www.phonemicchart.com/transcribe/?w=%s"
output_file = open("output.txt", 'w', encoding='utf-8')
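The preview ends before any request is made. As a hedged sketch of one step the imports suggest, the snippet below fills the `%s` placeholder in `base_url` with a percent-encoded word using `urllib.parse.quote`. The helper name `build_url` is hypothetical, and whether the original script encodes words this way is an assumption. No network request is made.

```python
import urllib.parse

base_url = "http://www.phonemicchart.com/transcribe/?w=%s"


def build_url(word):
    """Insert a percent-encoded word into the query string of base_url."""
    return base_url % urllib.parse.quote(word.strip())


print(build_url("hello"))
print(build_url("ice cream"))  # the space is encoded as %20
```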