Last active February 9, 2023 22:47
Gist: wtype/a8d20629917f11d49b830a654b74de45
Generate a word cloud image from a URL
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import requests
from bs4 import BeautifulSoup, Comment


def is_wanted_element(element):
    """Return True if a text node is visible on the rendered page."""
    # Text inside these tags (or directly under the document root) is never displayed
    if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
        return False
    # find_all(string=True) also returns HTML comments; skip them
    if isinstance(element, Comment):
        return False
    return True


def words_in_html(body):
    """Join the visible text of an HTML document into one space-separated string."""
    soup = BeautifulSoup(body, 'html.parser')
    texts = soup.find_all(string=True)
    visible_texts = filter(is_wanted_element, texts)
    return " ".join(t.strip() for t in visible_texts)


# enter a URL to make a word cloud (input() is the Python 3 name for raw_input())
url = input('Please enter the URL you want a word cloud of...\n For example: https://en.wikipedia.org/wiki/Lauterbrunnen\n\n URL: ')
page = requests.get(url, timeout=30)
page.raise_for_status()  # fail early on 4xx/5xx responses instead of plotting an error page
html = page.content
words = words_in_html(html)
# print(words)  # uncomment to inspect the extracted text
wordcloud = WordCloud(width=1000, height=700, max_font_size=150).generate(words)
plt.figure()
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
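If you want to check the text-extraction step on its own, without installing BeautifulSoup or fetching a page, here is a rough standard-library sketch of the same idea. It swaps in Python's built-in html.parser.HTMLParser, which is not what the script uses; the class VisibleTextParser, the function visible_words and the HIDDEN_TAGS set are hypothetical names for this illustration. One difference from is_wanted_element: the script only checks a text node's immediate parent, whereas this sketch hides everything nested anywhere inside a hidden tag.

```python
from html.parser import HTMLParser

# Hypothetical set mirroring the script's list. 'meta' is left out because it is a
# void element (no closing tag, no text), so tracking it would never be undone.
HIDDEN_TAGS = {"style", "script", "head", "title"}


class VisibleTextParser(HTMLParser):
    """Collect text that is not inside a hidden tag (stdlib sketch, not the script's code)."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # how many hidden tags we are currently nested inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self.hidden_depth > 0:
            self.hidden_depth -= 1

    def handle_data(self, data):
        # Comments are routed to handle_comment (not overridden), so they are ignored
        if self.hidden_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_words(html):
    """Return the visible text of an HTML string joined with single spaces."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = (
    "<html><head><title>T</title><style>p{color:red}</style></head>"
    "<body><!-- note --><h1>Hello</h1><p>word cloud</p>"
    "<script>var x = 1;</script></body></html>"
)
print(visible_words(sample))  # prints "Hello word cloud"
```

Because it works on a plain string, you can also point it at a saved copy of a page to see roughly which words would end up in the cloud.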
Web Page to Word Cloud
Generate a word cloud image from a URL using Python.
There are lots of free word cloud generators online. I tried to use one recently and was met with a barrage of popups asking me to "OK" cookies and warnings that "We may sell data to advertisers". So here's a quick script to make your own word cloud images for free.
📃 → ☁️
Create a Word Cloud
Run the script, enter your URL at the prompt, and press Enter.
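After you press Enter, the script passes the page's visible text to WordCloud.generate, which counts how often each word occurs and then calls generate_from_frequencies, so more frequent words are drawn larger. The sketch below is a rough, standard-library approximation of that counting step, meant only to show the idea. The function word_frequencies and the tiny STOP_WORDS set are hypothetical; wordcloud's own tokenizer differs (it uses its much larger built-in STOPWORDS list, can merge plurals, and adds two-word collocations by default).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# wordcloud ships its own, much larger STOPWORDS set.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def word_frequencies(text, top_n=10):
    """Return the top_n most common non-stop-words in text as a {word: count} dict.

    A rough approximation of what a word cloud sizes words by,
    not a reimplementation of wordcloud's tokenizer.
    """
    # A word starts with a word character and may contain apostrophes, e.g. "here's"
    words = re.findall(r"\w[\w']*", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return dict(counts.most_common(top_n))


text = "The cloud, the cloud and the rain. Rain falls in the valley; cloud!"
print(word_frequencies(text))
```

If you prefer to control the counting yourself, a dict like this can be passed to `WordCloud.generate_from_frequencies` in place of calling `.generate(words)`.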