@omiq
Last active February 19, 2018 19:33
Readability
# helper modules: HTML stripping, HTTP requests, command-line arguments
import html2text
import requests
import sys
# this is the important library that actually does the work
from textstat.textstat import textstat
# this is an easy way to strip html
h = html2text.HTML2Text()
# leave out link tags as well, so URLs don't end up in the text
h.ignore_links = True
# function to remove html - could be more robust
def remove_html(in_text):
    return h.handle(in_text)
# grab the URL that was specified on the command line
url = sys.argv[1]
# we need to strip html tags
test_string = remove_html(requests.get( url ).text)
# show what we grabbed
print( test_string )
print()
# So how readable is it?
# the score is a number, so convert it to a string before joining
print( str(textstat.flesch_reading_ease(test_string)) + " /100" )
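
To show what the score printed above measures, here is a minimal standard-library sketch of the Flesch Reading Ease formula: 206.835 - 1.015 * (words per sentence) - 84.6 * (syllables per word). The names `count_syllables` and `approx_flesch_reading_ease` are hypothetical helpers for this illustration, and the vowel-group syllable count is a rough assumption. This is not how textstat computes its result, so the two numbers will usually differ a little.

```python
import re


def count_syllables(word):
    # Naive heuristic (an assumption for this sketch): count runs of
    # consecutive vowels, with a minimum of one syllable per word.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def approx_flesch_reading_ease(text):
    # Sentences end at ., ! or ?; words are runs of letters and apostrophes.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Higher scores mean easier text: short sentences, short words.
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    sample = "The cat sat on the mat. It was a good day."
    print(f"{approx_flesch_reading_ease(sample):.1f} /100")
```

Very simple text can score above 100, so the "/100" label is a rule of thumb and not a hard upper limit.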