@kamotos
Created December 5, 2014 16:54
Text extraction
beautifulsoup4==4.3.2
python-readability
requests
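import bs4
import requests
from readability import readability


def get_summary(url):
    """Fetch a page and return its main text and short title."""
    response = requests.get(url)
    response.raise_for_status()  # fail early on HTTP error responses
    # readability isolates the main article markup from the full page
    document = readability.Document(response.content)
    # Strip the remaining tags; name the parser explicitly
    body = bs4.BeautifulSoup(document.summary(), 'html.parser').get_text()
    return {'text': body, 'title': document.short_title()}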
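
The last step of get_summary reduces the cleaned-up HTML to plain text. As a rough illustration of that idea only, here is a minimal standard-library sketch. It is not how BeautifulSoup implements get_text(). The names TextCollector and html_to_text are hypothetical, and skipping script and style content is an assumption made for this example.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers text nodes, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Return the concatenated text content of an HTML string."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.chunks)


if __name__ == "__main__":
    print(html_to_text("<p>Hello <b>world</b></p>"))
```

For real pages, the bs4 call in get_summary is still the better choice, since it tolerates malformed markup far more gracefully than this sketch.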