@bluerid
Created September 11, 2019 04:36
NLP using Python NLTK Frequency Distribution
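import urllib.request

import nltk
from bs4 import BeautifulSoup
from nltk.corpus import stopwords

# Fetch the stopword list on first run (does nothing if it is already present)
nltk.download('stopwords', quiet=True)

# Download the page and extract its visible text
response = urllib.request.urlopen('https://en.wikipedia.org/wiki/SpaceX')
html = response.read()
soup = BeautifulSoup(html, 'html5lib')
# separator=' ' keeps words from adjacent elements from being glued together
text = soup.get_text(separator=' ', strip=True)

# Split on whitespace, then drop English stopwords
tokens = text.split()
sr = set(stopwords.words('english'))  # build the set once for fast lookups
clean_tokens = [token for token in tokens if token not in sr]

# Count how often each remaining token occurs and print every count
freq = nltk.FreqDist(clean_tokens)
for key, val in freq.items():
    print(str(key) + ':' + str(val))

# Plot the 20 most frequent tokens
freq.plot(20, cumulative=False)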
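
The counting step can be tried without a network connection or the NLTK data files. The sketch below is a minimal stand-in, not part of the original gist: it uses collections.Counter (the standard-library class that nltk.FreqDist builds on), and SAMPLE_TEXT, SAMPLE_STOPWORDS, and count_clean_tokens are hypothetical names invented for illustration.

```python
from collections import Counter

# Hypothetical stand-ins for the scraped page text and NLTK's stopword list
SAMPLE_TEXT = "the rocket landed and the rocket flew again"
SAMPLE_STOPWORDS = {"the", "and"}


def count_clean_tokens(text, stop_words):
    """Split on whitespace, drop stopwords, and count what remains."""
    tokens = text.split()
    clean_tokens = [t for t in tokens if t not in stop_words]
    return Counter(clean_tokens)


freq = count_clean_tokens(SAMPLE_TEXT, SAMPLE_STOPWORDS)

# Print the two most frequent tokens in the same key:value format
for key, val in freq.most_common(2):
    print(str(key) + ':' + str(val))
```

As in the script, matching is case-sensitive, so a capitalized "The" would survive the filter unless tokens are lowercased before the comparison.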