
@jseabold jseabold/
Created Nov 28, 2011

Stata's webuse in Python
import pandas
import numpy as np


def webuse(data, baseurl=''):
    """
    Parameters
    ----------
    data : str
        Name of dataset to fetch.

    Examples
    --------
    >>> dta = webuse('auto')

    Notes
    -----
    Make sure baseurl has a trailing forward slash. Doesn't do any
    error checking in response URLs.
    """
    # lazy imports (urllib2, urlparse and StringIO are Python 2 modules)
    from scikits.statsmodels.iolib import genfromdta
    from urllib2 import urlopen
    from urlparse import urljoin
    from StringIO import StringIO

    url = urljoin(baseurl, data+'.dta')
    dta = urlopen(url)
    dta = StringIO(dta.read())  # make it truly file-like
    return genfromdta(dta)

dta = webuse('auto')
df = pandas.DataFrame.from_records(dta)
# how do I do boolean indexing on a whole DataFrame?
# .ix was removed in pandas 1.0; .loc does the same label-based assignment
df.loc[df['rep78'] == -999, 'rep78'] = np.nan
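
The listing above targets Python 2 and the old scikits.statsmodels namespace. What follows is a minimal sketch of the same idea, assuming Python 3 and a pandas version that provides read_stata. It also shows one possible answer to the question in the comment: comparing the whole DataFrame to the sentinel gives a boolean frame, which DataFrame.mask can use to blank every matching cell at once. BASEURL is a placeholder (the original default is not shown), and dataset_url and mask_sentinel are hypothetical helper names introduced here for illustration.

```python
from io import BytesIO
from urllib.parse import urljoin
from urllib.request import urlopen

import pandas as pd

# Placeholder only: the gist's real default for baseurl is not shown.
BASEURL = "https://example.com/data/"


def dataset_url(data, baseurl=BASEURL):
    """Build the URL of a .dta file; baseurl needs a trailing slash."""
    return urljoin(baseurl, data + ".dta")


def webuse(data, baseurl=BASEURL):
    """Download a Stata dataset and return it as a DataFrame.

    Like the original, this does no error checking on the response.
    """
    with urlopen(dataset_url(data, baseurl)) as response:
        buf = BytesIO(response.read())  # make it truly file-like
    return pd.read_stata(buf)


def mask_sentinel(df, sentinel=-999):
    """Return a copy of df with every cell equal to sentinel set to NaN."""
    # df == sentinel is a boolean DataFrame with the same shape as df
    return df.mask(df == sentinel)


# Small in-memory frame so the masking step can be tried without a download.
demo = pd.DataFrame({"rep78": [3, -999, 4], "mpg": [22.0, 17.0, -999.0]})
cleaned = mask_sentinel(demo)
```

With a real base URL, df = mask_sentinel(webuse('auto', baseurl=...)) would mirror the original usage. pandas.read_stata normally turns Stata missing values into NaN on its own, so the masking step mostly matters for data that carries an explicit sentinel such as -999.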