@jseabold
Created November 28, 2011 04:25
Stata's webuse in Python
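```python
import pandas
import numpy as np


def webuse(data, baseurl='http://www.stata-press.com/data/r11/'):
    """
    Fetch a Stata example dataset over HTTP, like Stata's ``webuse``.

    Parameters
    ----------
    data : str
        Name of the dataset to fetch, without the ``.dta`` extension.
    baseurl : str
        URL of the directory that holds the ``.dta`` files.

    Examples
    --------
    >>> dta = webuse('auto')

    Notes
    -----
    Make sure ``baseurl`` has a trailing forward slash; otherwise
    ``urljoin`` replaces its last path component instead of appending
    to it. No error checking is done on the response.
    """
    # lazy imports (Python 2-era modules)
    from scikits.statsmodels.iolib import genfromdta
    from urllib2 import urlopen
    from urlparse import urljoin
    from StringIO import StringIO

    url = urljoin(baseurl, data + '.dta')
    dta = urlopen(url)
    # buffer the whole response in memory so it is truly file-like (seekable)
    dta = StringIO(dta.read())
    return genfromdta(dta)
```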
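On Python 3 the modules imported above live elsewhere (`urllib2` became `urllib.request`, `urlparse` became `urllib.parse`, `StringIO` became `io`), and `scikits.statsmodels` has since been renamed `statsmodels`. A minimal sketch of the same idea, assuming pandas' own `.dta` reader `pandas.read_stata` is an acceptable stand-in for `genfromdta`; the helpers `stata_url` and `parse_dta` are hypothetical names introduced only for this example:

```python
from io import BytesIO
from urllib.parse import urljoin
from urllib.request import urlopen

import pandas as pd

BASEURL = 'http://www.stata-press.com/data/r11/'


def stata_url(data, baseurl=BASEURL):
    """Build the URL of ``data``.dta under ``baseurl`` (hypothetical helper)."""
    # urljoin only appends when baseurl ends in '/'; otherwise it replaces
    # the last path segment, so add the slash if it is missing.
    if not baseurl.endswith('/'):
        baseurl += '/'
    return urljoin(baseurl, data + '.dta')


def parse_dta(raw):
    """Parse the raw bytes of a .dta file into a DataFrame (hypothetical helper)."""
    # read_stata accepts a path or a file-like object; wrap the bytes in one.
    return pd.read_stata(BytesIO(raw))


def webuse(data, baseurl=BASEURL):
    """Fetch a Stata example dataset over HTTP and return a DataFrame."""
    url = stata_url(data, baseurl)
    with urlopen(url) as response:  # raises HTTPError on 4xx/5xx responses
        return parse_dta(response.read())
```

Usage would be `df = webuse('auto')`, which needs network access. With its default `convert_missing=False`, `read_stata` should return Stata missing values as NaN, so the -999 recoding below would likely not be needed with this variant.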
```python
dta = webuse('auto')
df = pandas.DataFrame.from_records(dta)
# recode the -999 missing-value flag in rep78 to NaN
# how do I do boolean indexing on a whole DataFrame?
df.ix[df['rep78'] == -999, 'rep78'] = np.nan
```
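The open question in the comment above has a short answer in later pandas releases: comparing a DataFrame with a scalar yields a same-shaped boolean DataFrame, and that mask can be applied to the whole frame at once. Since `.ix` was deprecated in pandas 0.20 and removed in 1.0, the single-column case uses `.loc` instead. A small sketch on a made-up frame (the values are illustrative, not taken from auto.dta):

```python
import numpy as np
import pandas as pd

# Toy frame using -999 as a missing-value flag (illustrative values only).
df = pd.DataFrame({'rep78': [3.0, -999.0, 4.0], 'mpg': [22.0, 17.0, -999.0]})

# Single column: label-based .loc replaces the removed .ix indexer.
one_col = df.copy()
one_col.loc[one_col['rep78'] == -999, 'rep78'] = np.nan

# Whole frame: df == -999 is a boolean DataFrame; mask() returns a copy
# with NaN wherever the mask is True and the original values elsewhere.
whole = df.mask(df == -999)

# Equivalent in-place form: assign through the boolean DataFrame.
whole2 = df.copy()
whole2[whole2 == -999] = np.nan
```

`df.replace(-999, np.nan)` is another common way to recode a sentinel value across every column.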