Paul Goffar (n3tl0kr)
n3tl0kr / utilities.py
Created October 10, 2020 19:39, forked from linwoodc3/utilities.py
A Python script to scrape text from websites. It works surprisingly well on most news websites when you have the URL of the story. Use GDELT URLs for the best results.
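The core idea, pulling the readable body text out of a news page's HTML, can be illustrated with a minimal sketch that uses only the standard library's `html.parser`. This is not the script's own method: per its notes, the script builds on readability-based extraction. The heuristic here (keep text inside `<p>` tags, skip `<script>` and `<style>`), along with the names `ParagraphExtractor` and `extract_text`, is a hypothetical simplification for illustration.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Hypothetical helper: collect text inside <p> tags, ignoring <script>/<style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.paragraphs = []
        self._in_p = False
        self._skip_depth = 0
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p" and self._in_p:
            # Collapse runs of whitespace and drop empty paragraphs.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        # Inline tags such as <a> or <b> are ignored, but their text is kept.
        if self._in_p and not self._skip_depth:
            self._buffer.append(data)


def extract_text(html):
    """Return the page's paragraph text, separated by blank lines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)
```

Navigation menus, footers, and other text outside `<p>` tags is dropped. That is roughly why paragraph-focused heuristics work well on news stories and poorly on pages whose content is not in paragraphs. Fetching the page (for example, from a GDELT URL) is left out so the sketch runs offline.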
# Author: Linwood Creekmore
# Email: valinvescap@gmail.com
# Description: Python script to pull content from a website (works on news stories).
# Licensed under GNU LGPLv3; see https://choosealicense.com/licenses/lgpl-3.0/ for details
# Notes
"""
23 Oct 2017: updated to include readability based on PyCon talk: https://github.com/DistrictDataLabs/PyCon2016/blob/master/notebooks/tutorial/Working%20with%20Text%20Corpora.ipynb
18 Jul 2018: added keywords and summary