@fnielsen
Created September 14, 2014 18:56
{
"metadata": {
"name": "",
"signature": "sha256:b56b6af508759731f57ef4be9a7881aa10a8b90a2b2c70f694285db2c6cfbaa3"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Change in Twitter's Terms of Service\n",
"================================\n",
"\n",
"On 2014-09-09 I received an email from Twitter stating that they had changed their Terms of Service. But what precisely changed?\n",
"\n",
"The following code downloads two archived snapshots of the Terms of Service page from the Internet Archive and reports the differences in their text.\n",
"\n",
"Note that the Privacy Policy also changed; the code does not examine that change.\n",
"\n",
"Author \n",
"-----\n",
"Finn \u00c5rup Nielsen\n",
"http://www.compute.dtu.dk/~faan/"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from __future__ import division, print_function, unicode_literals\n",
"\n",
"from bs4 import BeautifulSoup\n",
"import difflib\n",
"import nltk\n",
"import requests"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
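"# The Terms of Service text sits inside <div id=\"pageContent\">.\n",
"def html_to_text(html):\n",
"    # Name the parser explicitly so the result does not depend on which parsers are installed.\n",
"    soup = BeautifulSoup(html, 'html.parser')\n",
"    elements = soup.find_all('div', attrs={'id': 'pageContent'})\n",
"    text = elements[0].get_text()\n",
"    return text"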
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def text_to_sentences(text):\n",
"    lines = text.splitlines()\n",
"    sentences = [sentence for line in lines\n",
"                 for sentence in nltk.sent_tokenize(line)]\n",
"    return sentences"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"urls = ['https://web.archive.org/web/20140625035128/https://twitter.com/tos',\n",
"        'https://web.archive.org/web/20140908194359/https://twitter.com/tos']"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Download\n",
"htmls = [requests.get(url).text for url in urls]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Extract relevant information\n",
"list_of_sentences = [text_to_sentences(html_to_text(html))\n",
"                     for html in htmls]\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"differ = difflib.Differ()\n",
"for sents1, sents2 in zip(list_of_sentences[:-1], list_of_sentences[1:]):\n",
"    diff = differ.compare(sents1, sents2)\n",
"    for line in diff:\n",
"        if not line.startswith(' '):\n",
"            print(line)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"- These Terms of Service (\"Terms\") govern your access to and use of the services, including our various websites, SMS, APIs, email notifications, applications, buttons, and widgets, (the \"Services\" or \u201cTwitter\u201d), and any information, text, graphics, photos or other materials uploaded, downloaded or appearing on the Services (collectively referred to as \"Content\").\n",
"? ----\n",
"\n",
"+ These Terms of Service (\"Terms\") govern your access to and use of the services, including our various websites, SMS, APIs, email notifications, applications, buttons, widgets, ads, and commerce services (the \"Services\" or \u201cTwitter\u201d), and any information, text, graphics, photos or other materials uploaded, downloaded or appearing on the Services (collectively referred to as \"Content\").\n",
"? ++++++++++++++++++++++++++++\n",
"\n",
"+ If you use commerce features of the Services that require credit or debit card information, such as our Buy Now feature, you agree to our Twitter Commerce Terms.\n",
"- Effective: June 25, 2012\n",
"+ Effective: September 8, 2014\n"
]
}
],
"prompt_number": 7
}
],
"metadata": {}
}
]
}
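
The comparison step in the notebook relies on `difflib.Differ`, which prefixes every output line with a two-character code. The following is a minimal offline sketch of that step, not the notebook's own code: `changed_lines` is a helper name introduced here for illustration, and `old` and `new` are made-up sample sentences rather than text from the Terms of Service.

```python
import difflib


def changed_lines(old_sentences, new_sentences):
    """Return only the Differ output lines that mark a change.

    Differ prefixes each line with a two-character code:
    '  ' unchanged, '- ' only in the first sequence,
    '+ ' only in the second, '? ' an intraline hint line.
    """
    differ = difflib.Differ()
    return [line for line in differ.compare(old_sentences, new_sentences)
            if not line.startswith('  ')]


# Hypothetical sample data (an assumption for this sketch).
old = ['Be kind.', 'Effective: June 25, 2012']
new = ['Be kind.', 'We may show ads.', 'Effective: September 8, 2014']

for line in changed_lines(old, new):
    print(line)
```

Lines beginning with `- ` appear only in the first list and lines beginning with `+ ` only in the second. When two sentences are similar enough, `Differ` also emits `? ` hint lines that mark where they differ, which is how the output recorded in the notebook should be read.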