Hugh Brown hughdbrown

## data-wikipedia.md

      
hughdbrown / data-wikipedia.md, created August 31, 2015 17:26: Data project in wikipedia

Wikipedia data

Description

I like Wikipedia. There must be some sort of project I could do with this data.

Data source

Wikipedia
There are accessible dumps of Wikipedia data.
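
One way to start exploring a dump is to stream through it page by page instead of loading it whole. The sketch below assumes the MediaWiki XML export layout (`page` elements holding a `title` and a `revision`/`text`). The `iter_pages` and `local_name` helpers and the sample document are made up for illustration and are not taken from a real dump.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) pairs from a MediaWiki XML export, one page at a time."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release the finished page so memory use stays flat


# Tiny hypothetical sample in the shape of an export file (not real dump content).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Alpha</title>
    <revision><text>First article body.</text></revision>
  </page>
  <page>
    <title>Beta</title>
    <revision><text>Second article body.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for page_title, page_text in pages:
    print(page_title, len(page_text))
```

For a real, bz2-compressed dump file, passing `bz2.open(path, "rb")` in place of the `BytesIO` object should work the same way.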


## data-UN-voting-blocs.md

      
hughdbrown / data-UN-voting-blocs.md, created August 31, 2015 17:31: UN global warming voting blocs

UN global warming voting blocs

Description

I was listening to NPR today and heard that within the UN there are about a dozen different blocs that vote together on global warming issues, including:

Switzerland alone
Developed countries
European group
"77 countries plus China" (which is actually 134 countries)
Various island nations most affected


## data-chronic-kidney-disease.md

      
hughdbrown / data-chronic-kidney-disease.md, last active August 31, 2015 17:35: Chronic kidney disease predictor

Chronic kidney disease

Description

Data source


UC Irvine
400 rows by 25 attributes


## data-job-recommender.md

      
hughdbrown / data-job-recommender.md, created August 31, 2015 22:59: Job recommender that bootstraps from list of job postings

Job recommender

Description

Job sites often give candidates listings that are far off topic. Often the job title is not applicable to the candidate and, less often, the location does not match the candidate's location.

Question

Can we build a better system for users by applying a recommender system to existing public listings?

Data source


glassdoor.com API
indeed.com web scraping
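
One possible starting point for the question above is a content-based recommender: score each posting by the cosine similarity between its TF-IDF vector and the candidate's profile. This is only a sketch of one technique, chosen here as an assumption and not stated in the notes. The postings, the profile, and all function names are hypothetical, and no Glassdoor or Indeed calls are involved.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic words."""
    return [word for word in text.lower().split() if word.isalpha()]


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed inverse document frequency, so no term gets zero weight.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(profile, postings, k=3):
    """Rank postings by similarity to the profile; return (index, score) pairs."""
    vectors = tfidf_vectors(postings + [profile])
    profile_vec, posting_vecs = vectors[-1], vectors[:-1]
    scored = [(i, cosine(profile_vec, vec)) for i, vec in enumerate(posting_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical postings and candidate profile, for illustration only.
POSTINGS = [
    "Senior Python developer building data pipelines",
    "Registered nurse for pediatric ward",
    "Data scientist with Python and machine learning",
]
PROFILE = "Python data engineer machine learning"

ranked = recommend(PROFILE, POSTINGS)
for index, score in ranked:
    print(round(score, 3), POSTINGS[index])
```

Location could be handled as a separate filter before ranking, since it is a hard constraint and not a matter of text similarity.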


## data-homeaway.md

      
hughdbrown / data-homeaway.md, created August 31, 2015 23:04: Homeaway data

Homeaway data

Description

Homeaway has data on vacation rentals. The data is not nearly as worked over as Airbnb data. Possibly there is something interesting in there to discover.

Data source


Homeaway API access
The main problem with the project is that the Homeaway API is pretty opaque: I can't figure out how to get a data dump. Also, the API requires registration and advance permission.


## data-bitly.md

      
hughdbrown / data-bitly.md, last active September 1, 2015 10:42: Analysis of bit.ly data

Bit.ly data

Description

How the Germanwings crash/suicide news story spread over bit.ly links.

Data source


bit.ly
twitter

Display style


## data-projects.md

      
hughdbrown / data-projects.md, last active September 4, 2015 16:19: Projects for me to work on

Projects


Denver drivers
Real estate factors
Movie factors
Wikipedia data
UN global warming blocs
Job recommender
Homeaway data
Resume clustering


## aws-copy-s3-to-s3.md

      
hughdbrown / aws-copy-s3-to-s3.md, last active September 4, 2015 17:10: Copy s3 to s3

Here is how I copied data from one S3 bucket to another:

    aws s3 sync s3://bitly-challenges/hdb_sanitized s3://hughdbrown/data-capstone

Adapted from Stack Overflow.

  
## sha_backup.py
"""
Python script to back up data in src to dst using sha1 hashes of the files
in a backing directory.

Hugh Brown
hughdbrown@yahoo.com
"""

from hashlib import sha1
import os

## ds_a_b_test.py
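import scipy.stats as scs


def a_b_test(new_views, new_clicks, old_views, old_clicks, size=10000):
    """Estimate P(new click-through rate > old) by sampling Beta posteriors."""
    # Uniform Beta(1, 1) prior: a counts clicks + 1, b counts non-click views + 1.
    new_site = scs.beta(a=new_clicks + 1, b=new_views - new_clicks + 1).rvs(size=size)
    old_site = scs.beta(a=old_clicks + 1, b=old_views - old_clicks + 1).rvs(size=size)
    # Fraction of paired draws in which the new site beats the old one.
    return (new_site > old_site).mean()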
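
The same Monte Carlo comparison can be sketched with only the standard library, which makes it easy to try without SciPy installed. This version is an illustration and not the author's code: the name `a_b_test_stdlib` and the `seed` parameter are hypothetical, and it assumes a uniform Beta(1, 1) prior in which the second Beta parameter counts views that did not lead to a click.

```python
import random


def a_b_test_stdlib(new_views, new_clicks, old_views, old_clicks,
                    size=10000, seed=None):
    """Estimate P(new click-through rate > old) from paired Beta draws."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(size):
        # Posterior under a uniform prior: Beta(clicks + 1, misses + 1).
        new_rate = rng.betavariate(new_clicks + 1, new_views - new_clicks + 1)
        old_rate = rng.betavariate(old_clicks + 1, old_views - old_clicks + 1)
        wins += new_rate > old_rate
    return wins / size


# Made-up counts: 10% click-through on the new site against 5% on the old.
clear_win = a_b_test_stdlib(1000, 100, 1000, 50, seed=0)
# Identical counts on both sites should land near one half.
coin_flip = a_b_test_stdlib(1000, 50, 1000, 50, seed=1)
print(clear_win, coin_flip)
```

A result near 1 says the new site almost certainly has the higher click-through rate, while a result near 0.5 says the data cannot tell the two apart.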