@philshem philshem

💭
🐙
View GitHub Profile
@philshem
philshem / cleanlist.txt
Last active August 29, 2015 13:55
Scrape the count of Google search results (which is very approximate). May require tweaking based on your browser language, etc.
able,academic,addiction,afraid,agricultural,analog,analogue,architectural,art,artistic,assistant,associate,audio,bad,bank,beauty,beauty ,benefits,best,birth,brave,business,busy,campaign,care,career,careers,careful,cheap,chief,clean,clever,client,clinical,co,comfortable,communications,competent,compliance,confidential,congressional,consumer,content,contigencies,core,course,court,customer,dangerous,database,deputy,difficult,digital,dirty,district,doctoral,dramatic,early,economic,education,ejaculation,emotional intelligence,employment ,empty,enrollment,enrolment,environmental,equal opportunity,exciting,executive,expensive,expert,external,faculty,fair,family,famous,fashion,fast,favorite,favourite,fifth,finance,financial,fine,first,food,fourth,free,full,funny,gastronomic,general,goal,good,google,graduate,great,green building,hairstyle,happy,health,home,important,industrial,information,insurance,interesting,internal,investment,jewellry,jewelry,job,junior,kind,language,late,law,lay,lazy,learning,learning development
philshem / get_wiki_pv.py
Last active August 29, 2015 13:56
Collect daily Wikipedia page view counts for an array of terms. In this case, it's 'Advisor' and 'Adviser'. It helps to check that the Wikipedia page exists, first.
import requests
import collections
import time
searchlist = ['Advisor','Adviser']
minyear = 2008
maxyear = 2014
for search in searchlist:
    views = {}
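The preview above stops at the start of the loop. A minimal sketch of how the year range could be walked month by month is shown here. The helper names `month_keys` and `collect_views`, and the `fetch_month` callback that stands in for the actual HTTP request, are hypothetical and not taken from the gist.

```python
def month_keys(minyear, maxyear):
    """Return 'YYYYMM' strings for every month from minyear to maxyear inclusive."""
    return ['%04d%02d' % (year, month)
            for year in range(minyear, maxyear + 1)
            for month in range(1, 13)]


def collect_views(term, minyear, maxyear, fetch_month):
    """Merge per-month {date: count} dicts for one search term.

    fetch_month(term, 'YYYYMM') is a hypothetical callback; in a real
    script it would wrap a requests.get() call to a page view service.
    """
    views = {}
    for key in month_keys(minyear, maxyear):
        views.update(fetch_month(term, key))
    return views
```

Passing the fetch step in as a function keeps the date logic testable without network access.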
philshem / twitter_search.py
Last active August 29, 2015 13:56
Search Twitter via the API and download all matching tweets for later analysis.
import json
import twitter # https://github.com/bear/python-twitter
import time
def main():
    api = twitter.Api(consumer_key='INSERT',
                      consumer_secret='INSERT',
                      access_token_key='INSERT',
                      access_token_secret='INSERT')
philshem / create_csv_unicode.py
Last active August 29, 2015 13:58
Code to create a CSV file with a Unicode lookup and HTML escapes.
import sys
with open('unicode.csv','wb') as output:
    for i in xrange(sys.maxunicode):
        output.write(unicode(i))
        output.write(u',')
        output.write(unichr(i).encode('utf-8'))
        output.write(u',')
        output.write(unichr(i).encode('ascii', 'xmlcharrefreplace'))
        output.write(u'\n')
print sys.maxunicode
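The snippet above is Python 2 (`xrange`, `unichr`, the `print` statement). A rough Python 3 sketch of the same idea follows. The function names `unicode_rows` and `write_csv` are made up for illustration. Two choices here are assumptions rather than the gist's behaviour: surrogate code points are skipped because Python 3 cannot encode them as UTF-8, and the `csv` module is used so that characters such as the comma and newline are quoted.

```python
import csv
import io


def unicode_rows(start, stop):
    """Yield (code point, character, HTML escape) for code points in [start, stop)."""
    for cp in range(start, stop):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates cannot be written as UTF-8
        ch = chr(cp)
        escaped = ch.encode('ascii', 'xmlcharrefreplace').decode('ascii')
        yield cp, ch, escaped


def write_csv(stream, start, stop):
    """Write the rows to an open text stream as CSV."""
    writer = csv.writer(stream, lineterminator='\n')
    writer.writerows(unicode_rows(start, stop))


# Example: write a small range to an in-memory buffer.
# For the full table, one could instead open
# 'unicode.csv' with encoding='utf-8', newline='' and pass
# start=0, stop=sys.maxunicode + 1.
buf = io.StringIO()
write_csv(buf, 233, 235)
```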
# encoding: utf-8
import os
import shelve
import boto.glacier
import boto
from boto.glacier.exceptions import UnexpectedHTTPResponseError
ACCESS_KEY_ID = "XXXXXXXXXXXXX"
SECRET_ACCESS_KEY = "XXXXXXXXXXX"
SHELVE_FILE = os.path.expanduser("~/.glaciervault.db")
philshem / revgeo.py
Last active August 29, 2015 14:09
Google reverse geocoding for a list of latitude/longitude pairs (CSV output)
import requests
urlbase = 'http://maps.googleapis.com/maps/api/geocode/json?latlng='
key = None
# list of latitude, longitude pairs
latlong = [(40.714224,-73.961452), (47.3667, 8.5500)]
for xy in latlong:
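The loop body is cut off in the preview. As a hedged sketch of what one iteration might involve, the helpers below build the request URL from a pair and pull the first formatted address out of a decoded JSON response. `build_url` and `first_address` are hypothetical names. The `key` query parameter and the `results` / `formatted_address` fields follow the Geocoding API response format, but how the gist itself handles them is not shown.

```python
URLBASE = 'http://maps.googleapis.com/maps/api/geocode/json?latlng='


def build_url(latlng, key=None):
    """Return the request URL for one (latitude, longitude) pair."""
    url = URLBASE + '%s,%s' % latlng
    if key:
        url += '&key=' + key  # only appended when an API key is supplied
    return url


def first_address(payload):
    """Return the first formatted address in a decoded response, or None."""
    results = payload.get('results', [])
    return results[0].get('formatted_address') if results else None
```

With `requests`, one iteration would then look roughly like `first_address(requests.get(build_url(xy, key)).json())`.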
philshem / Get_stackexchange_stats.py
Last active August 29, 2015 14:10
Collect traffic stats to compare sites in the Stack Exchange network.
# -*- coding: utf-8 -*-
# collect traffic data from the stackexchange sites page
import requests
from bs4 import BeautifulSoup
from collections import defaultdict
def main():
    url = 'http://stackexchange.com/sites?view=list#traffic'
philshem / twitter-search-language-mapping.csv
Created January 12, 2015 19:21
Data file to map Twitter advanced search language code (lang:en) to language name
Language Name,Language Code
Amharic,am
Arabic,ar
Bulgarian,bg
Bengali,bn
Tibetan,bo
Cherokee,chr
Danish,da
German,de
Maldivian,dv
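A small sketch of how such a mapping could be used: load the rows into a dict keyed by code and build the `lang:` search operator from it. This assumes the file is comma-delimited with the two header names shown; `load_language_map` and `lang_operator` are illustrative names, and the sample holds only the rows visible in the preview.

```python
import csv
import io

SAMPLE = """Language Name,Language Code
Amharic,am
Arabic,ar
Bulgarian,bg
Bengali,bn
Tibetan,bo
Cherokee,chr
Danish,da
German,de
Maldivian,dv
"""


def load_language_map(text):
    """Return {language code: language name} from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row['Language Code']: row['Language Name'] for row in reader}


def lang_operator(code):
    """Return the advanced search operator for a language code, e.g. 'lang:de'."""
    return 'lang:' + code


languages = load_language_map(SAMPLE)
```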
philshem / swissa4.py
Created February 27, 2015 19:41
Comparing Switzerland to an A4 document
from geopy.distance import great_circle
# how similar is switzerland to an A4?
# http://isithackday.com/geoplanet-explorer/index.php?woeid=23424957
topright = (47.808380, 10.492030)
topleft = (47.808380, 5.955870)
bottomleft = (45.818020, 5.955870)
bottomright = (45.818020, 10.492030)
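The preview ends before any comparison is made. One way to finish the thought is to compare the bounding box's width-to-height ratio with the A4 ratio of the square root of 2. The sketch below swaps in a plain haversine function for geopy's `great_circle` so that it has no dependencies; the 6371 km Earth radius and the choice to average the top and bottom widths are assumptions, not the gist's method.

```python
import math

topright = (47.808380, 10.492030)
topleft = (47.808380, 5.955870)
bottomleft = (45.818020, 5.955870)
bottomright = (45.818020, 10.492030)


def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


# Width differs at the top and bottom edges, so average the two.
width = (haversine_km(topleft, topright)
         + haversine_km(bottomleft, bottomright)) / 2
height = haversine_km(topleft, bottomleft)

box_ratio = width / height      # landscape ratio of the bounding box
a4_ratio = math.sqrt(2)         # long side / short side of an A4 sheet
```

Comparing `box_ratio` with `a4_ratio` then gives a rough measure of how A4-like the bounding box is.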
philshem / twitter_notification_website_change.py
Last active August 29, 2015 14:19
Send a Twitter status or direct message when a webpage changes
# -*- coding: utf-8 -*-
# 1. scrape a webpage
# 2. compare to previous version
# 3. send a tweet (or direct message) when page is updated
import requests
import os
from lxml import html
from datetime import datetime
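Step 2 in the comments above (compare to the previous version) can be sketched with a content hash. This is only one possible approach and may differ from what the gist does; `detect_change` is a hypothetical helper, and storing the digest between runs (for example in a small file) is left to the caller.

```python
import hashlib


def detect_change(content, previous_digest):
    """Compare page text against the digest saved from the last run.

    Returns (changed, digest). On the first run, when previous_digest
    is None, changed is False so that no notification is sent.
    """
    digest = hashlib.sha256(content.encode('utf-8')).hexdigest()
    changed = previous_digest is not None and digest != previous_digest
    return changed, digest
```

Hashing the extracted text rather than the raw HTML would avoid false alarms from markup that changes on every request, at the cost of missing changes outside the extracted region.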