The Centers for Medicare and Medicaid Services (CMS) today released a 9-million-row dataset detailing payments to doctors and other providers who are paid by Medicare. Read more about the release.
The data, along with a PDF describing it, is here. It's tab-delimited and roughly 400MB zipped, 1.3GB unzipped.
I stashed the tab-delimited data file itself on S3. It's public, so you can grab it.
It's not a huge file, so you could load it into a local PostgreSQL instance on your laptop using the schema I made, but it's fun to play around with it on AWS Redshift. To load it into Redshift, start up a cluster, connect to it with psql, execute the schema SQL, and then run the following commands:
dev=# copy medicare_physician from 's3://paulsmith/medicare/medicare.txt.gz' credentials 'aws_access_key_id=<YOUR ACCESS KEY>;aws_secret_access_key=<YOUR SECRET KEY>' delimiter '\t' gzip ignoreheader 1;
dev=# copy place_of_service from 's3://paulsmith/medicare/place_of_service.txt' credentials 'aws_access_key_id=<YOUR ACCESS KEY>;aws_secret_access_key=<YOUR SECRET KEY>' delimiter '\t' ignoreheader 1;
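Once both COPY commands finish, a quick sanity check is to count the rows (the release is described as about 9 million) and, if a COPY errors out, to look at Redshift's STL_LOAD_ERRORS system table, which records the file, line number, column and reason for each rejected row. This is a minimal sketch; the columns selected from STL_LOAD_ERRORS are just a readable subset.

```sql
-- Sanity check: the medicare_physician count should be roughly 9 million.
select count(*) from medicare_physician;
select count(*) from place_of_service;

-- If a COPY fails, Redshift logs the details of rejected rows here.
-- Most recent errors first.
select starttime, filename, line_number, colname, err_reason
from stl_load_errors
order by starttime desc
limit 10;
```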
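For the local PostgreSQL route, a rough equivalent of the two Redshift COPY commands uses psql's client-side `\copy`. This is a sketch under stated assumptions: you've already run the schema SQL locally, you've downloaded `medicare.txt.gz` and `place_of_service.txt` from S3 into psql's working directory, `gunzip` is on your PATH, and you're on PostgreSQL 9.3 or later (needed for `from program`). Using CSV mode with a tab delimiter is a choice made here, not part of the original setup: it accepts `header` on older PostgreSQL versions and reads empty fields as NULL. If the file turns out to contain stray double-quote characters, you may need to adjust the `quote` option.

```sql
-- Assumes the schema SQL has been run and both files are in the current
-- directory. \copy reads files on the client, so no server file access is needed.
-- Each \copy meta-command must stay on a single line.

-- Decompress on the fly and skip the header row.
\copy medicare_physician from program 'gunzip -c medicare.txt.gz' with (format csv, delimiter E'\t', header true)

-- The lookup table is not gzipped.
\copy place_of_service from 'place_of_service.txt' with (format csv, delimiter E'\t', header true)
```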