n3tl0kr / utilities.py
Created Oct 10, 2020 — forked from linwoodc3/utilities.py
A Python script to scrape text from websites. It works well on most news websites when you have the URL of the story. Use GDELT URLs for the best results.
# Author: Linwood Creekmore
# Email: valinvescap@gmail.com
# Description: Python script to pull content from a website (works on news stories).
# Licensed under GNU GPLv3; see https://choosealicense.com/licenses/lgpl-3.0/ for details.
# Notes
"""
23 Oct 2017: updated to include readability based on PyCon talk: https://github.com/DistrictDataLabs/PyCon2016/blob/master/notebooks/tutorial/Working%20with%20Text%20Corpora.ipynb
18 Jul 2018: added keywords and summary