Skip to content

Instantly share code, notes, and snippets.

@JoeGermuska
Last active August 29, 2015 13:56
Show Gist options
  • Save JoeGermuska/9240177 to your computer and use it in GitHub Desktop.
Save JoeGermuska/9240177 to your computer and use it in GitHub Desktop.

After some discussion about how a map showing that half of US income can be attributed to the major metro areas, I wanted to see if I could work out how public assistance is distributed.

The American Community Survey has a table, B19067, Aggregate Public Assistance Income for Households, which tells the total amount of dollars in public assistance income that go to each geography. As you might guess, the biggest metro areas have the most dollars.

So I downloaded some CSVs from Census Reporter for all CBSA (metro/micropolitan) areas in the US:

That last is because at first I was going to divide the public assistance income among people. Then I thought I should probably use households, and table B19057 includes total households, as well as the number of households receiving public assistance.

I downloaded the data as CSV from Census Reporter. That data comes as a ZIP file per table, with metadata that we can ignore for now. I unzipped the downloads, and renamed the CSV files to simpler names for easier typing. Then I used csvkit to join the tables together:

csvjoin -c geoid b01003.csv b19057.csv b19067.csv | csvcut -c 1,2,B01003001,B19057001,B19057002,B19057003,B19067001 > combined.csv

This command saves us from the duplicated geoid and name columns. It also discards the margin of error. That's just because this is a casual exercise.

Next, to pandas, which I don't really know all that well, but which I can get to do what I want here.

import pandas as pd
data = pd.read_csv("combined.csv",index_col="geoid")
data['perpahh'] = data['B19067001'].astype(float)/data['B19057002'].astype(float)
data.sort('perpahh',ascending=False,inplace=True)
print data[['name','perpahh']][:10]
                                                       name      perpahh
geoid
31000US22060            Faribault-Northfield, MN Micro Area  8023.707665
31000US36140                      Ocean City, NJ Metro Area  7441.758242
31000US45540                    The Villages, FL Micro Area  6983.609959
31000US27540                          Jasper, IN Micro Area  6964.754098
31000US43260                        Sheridan, WY Micro Area  6909.302326
31000US45000                      Susanville, CA Micro Area  6550.404858
31000US31460               Madera-Chowchilla, CA Metro Area  6364.470976
31000US41940  San Jose-Sunnyvale-Santa Clara, CA Metro Area  6195.218489
31000US27340                    Jacksonville, NC Metro Area  6185.892116
31000US28180                           Kapaa, HI Micro Area  6146.725441

So this shows us the top 10 CBSAs by average amount of public assistance given to households getting any public assistance.

We can add some more summary statistics:

data['perhh'] = data['B19067001'].astype(float)/data['B19057001'].astype(float)
data.sort('perhh',ascending=False,inplace=True)
print data[['name','perhh']][:10]
                                            name       perhh
geoid
31000US18860        Crescent City, CA Micro Area  502.916624
31000US47300  Visalia-Porterville, CA Metro Area  432.417921
31000US32900               Merced, CA Metro Area  407.005745
31000US23420               Fresno, CA Metro Area  401.295100
31000US44900          Summerville, GA Micro Area  376.434075
31000US31460    Madera-Chowchilla, CA Metro Area  367.527281
31000US16500            Centralia, WA Micro Area  359.936919
31000US12540   Bakersfield-Delano, CA Metro Area  358.303249
31000US49700            Yuba City, CA Metro Area  353.179243
31000US10260             Adjuntas, PR Micro Area  346.668980

data['percapita'] = data['B19067001'].astype(float)/data['B01003001']
data.sort('percapita',ascending=False,inplace=True)
print data[['name','percapita']][:10]
                                            name   percapita
geoid
31000US18860        Crescent City, CA Micro Area  171.245087
31000US16500            Centralia, WA Micro Area  140.759161
31000US44900          Summerville, GA Micro Area  135.780737
31000US47300  Visalia-Porterville, CA Metro Area  127.213229
31000US23420               Fresno, CA Metro Area  123.807088
31000US39780            Red Bluff, CA Micro Area  119.346519
31000US32900               Merced, CA Metro Area  118.254823
31000US49700            Yuba City, CA Metro Area  118.040381
31000US17340            Clearlake, CA Micro Area  115.843692
31000US12540   Bakersfield-Delano, CA Metro Area  108.102456

So I don't know if this is helpful, and I may have made a mistake or two, but I thought I'd log it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment