@hughdbrown
hughdbrown / data-wikipedia.md
Created August 31, 2015 17:26
Data project in wikipedia

Wikipedia data

Description

I like Wikipedia. There must be some sort of project I could do with its data.

Data source

  • Wikipedia: there are accessible dumps of Wikipedia data.
@hughdbrown
hughdbrown / data-UN-voting-blocs.md
Created August 31, 2015 17:31
UN global warming voting blocs

UN global warming voting blocs

Description

I was listening to NPR today and heard that within the UN, there are about a dozen different blocs that vote together on global warming issues:

  • Switzerland alone
  • Developed countries
  • European group
  • "77 countries plus China" ... which is actually 134 countries
  • Various island nations most affected
@hughdbrown
hughdbrown / data-chronic-kidney-disease.md
Last active August 31, 2015 17:35
Chronic kidney disease predictor

Chronic kidney disease

Description

Data source

@hughdbrown
hughdbrown / data-job-recommender.md
Created August 31, 2015 22:59
Job recommender that bootstraps from list of job postings

Job recommender

Description

Job sites often show candidates listings that are far off topic. The job title is often not applicable to the candidate and, less often, the location does not match the candidate's location.

Question

Can we build a better system for users by applying a recommender system to existing public listings?

Data source

  • glassdoor.com API
  • indeed.com web scraping
@hughdbrown
hughdbrown / data-homeaway.md
Created August 31, 2015 23:04
Homeaway data

Homeaway data

Description

Homeaway has data on vacation rentals. The data is not nearly as worked over as Airbnb data. Possibly there is something interesting in there to discover.

Data source

  • Homeaway API access. The main problem with the project is that the Homeaway API is pretty opaque: I can't figure out how to get a data dump. Also, the API requires registration and advance permission.
@hughdbrown
hughdbrown / data-bitly.md
Last active September 1, 2015 10:42
Analysis of bit.ly data

Bit.ly data

Description

Germanwings crash/suicide news story spreads over bit.ly links.

Data source

  • bit.ly
  • twitter

Display style

@hughdbrown
hughdbrown / aws-copy-s3-to-s3.md
Last active September 4, 2015 17:10
Copy s3 to s3

Here is how I copied data from one S3 bucket to another:

aws s3 sync s3://bitly-challenges/hdb_sanitized s3://hughdbrown/data-capstone

Adapted from Stack Overflow

@hughdbrown
hughdbrown / sha_backup.py
Created January 17, 2011 21:23
Idea for a git-like backup program
"""
Python script to backup data in src to dst using sha1 hashes of the files
in a backing directory.
Hugh Brown
hughdbrown@yahoo.com
"""
from hashlib import sha1
import os
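
The preview above stops after the imports. As a rough illustration of the idea in the docstring (back up `src` to `dst` using SHA-1 hashes of the files), here is a minimal sketch. It is not the author's code: the `objects` subdirectory, the manifest format, and the names `file_sha1` and `backup` are all assumptions made for the example.

```python
import os
import shutil
from hashlib import sha1


def file_sha1(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup(src, dst):
    """Copy every file under src into dst/objects/<sha1> (hypothetical layout).

    Files with identical content share one stored copy. Returns a manifest
    mapping each path (relative to src) to its digest.
    """
    objects = os.path.join(dst, "objects")
    os.makedirs(objects, exist_ok=True)
    manifest = {}
    for root, _dirs, files in os.walk(src):
        for name in files:
            path = os.path.join(root, name)
            digest = file_sha1(path)
            target = os.path.join(objects, digest)
            if not os.path.exists(target):  # identical content is stored once
                shutil.copy2(path, target)
            manifest[os.path.relpath(path, src)] = digest
    return manifest
```

Keying stored files by digest is what makes the approach git-like: a second backup of unchanged files copies nothing, and only the manifest needs to be saved per run. The sketch assumes `dst` lies outside `src`.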
@hughdbrown
hughdbrown / ds_a_b_test.py
Last active October 2, 2015 15:39
Data science: a-b-test
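import scipy.stats as scs


def a_b_test(new_views, new_clicks, old_views, old_clicks, size=10000):
    """Estimate P(new click-through rate > old) by sampling Beta posteriors."""
    # With a uniform prior the posterior is Beta(clicks + 1, misses + 1),
    # assuming views counts every impression, so misses = views - clicks.
    new_site = scs.beta(a=new_clicks + 1, b=new_views - new_clicks + 1).rvs(size=size)
    old_site = scs.beta(a=old_clicks + 1, b=old_views - old_clicks + 1).rvs(size=size)
    return (new_site > old_site).mean()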
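
To see the Beta-posterior comparison in action, here is a small self-contained sketch that uses only the standard library's `random.betavariate` in place of SciPy. The function name `prob_new_beats_old`, the `seed` parameter, and the sample counts are illustrative assumptions. It also assumes `views` counts every impression, so the second Beta parameter is `views - clicks + 1`.

```python
import random


def prob_new_beats_old(new_views, new_clicks, old_views, old_clicks,
                       size=10000, seed=0):
    """Monte Carlo estimate of P(new click-through rate > old)."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    wins = 0
    for _ in range(size):
        # Draw one plausible rate per site from its posterior (uniform prior).
        new_rate = rng.betavariate(new_clicks + 1, new_views - new_clicks + 1)
        old_rate = rng.betavariate(old_clicks + 1, old_views - old_clicks + 1)
        wins += new_rate > old_rate
    return wins / size


if __name__ == "__main__":
    # Made-up counts: 120 clicks vs 60 clicks, each from 1000 views.
    print(prob_new_beats_old(1000, 120, 1000, 60))
    # Identical counts should give a probability near one half.
    print(prob_new_beats_old(500, 50, 500, 50))
```

The return value is the fraction of paired draws in which the new site's sampled rate exceeds the old one's, which is the same quantity that `(new_site > old_site).mean()` computes.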