After some discussion about how a map showing that half of US income can be attributed to the major metro areas, I wanted to see if I could work out how public assistance is distributed.
The American Community Survey has a table, B19067, Aggregate Public Assistance Income for Households, which tells the total amount of dollars in public assistance income that go to each geography. As you might guess, the biggest metro areas have the most dollars.
So I downloaded some CSVs from Census Reporter for all CBSA (metro/micropolitan) areas in the US:
- B19067 Aggregate Public Assistance Income for Households
- B19057 Public Assistance Income in the Past 12 Months for Households
- B01003 Total Population
That last is because at first I was going to divide the public assistance income among people. Then I thought I should probably use households, and table B19057 includes total households, as well as the number of households receiving public assistance.
I downloaded the data as CSV from Census Reporter. That data comes as a ZIP file per table, with metadata that we can ignore for now. I unzipped the downloads, and renamed the CSV files to simpler names for easier typing. Then I used csvkit
to join the tables together:
csvjoin -c geoid b01003.csv b19057.csv b19067.csv | csvcut -c 1,2,B01003001,B19057001,B19057002,B19057003,B19067001 > combined.csv
This command saves us from the duplicated geoid
and name
columns. It also discards the margin of error. That's just because this is a casual exercise.
Next, to pandas
, which I don't really know all that well, but which I can get to do what I want here.
import pandas as pd
data = pd.read_csv("combined.csv",index_col="geoid")
data['perpahh'] = data['B19067001'].astype(float)/data['B19057002'].astype(float)
data.sort('perpahh',ascending=False,inplace=True)
print data[['name','perpahh']][:10]
name perpahh
geoid
31000US22060 Faribault-Northfield, MN Micro Area 8023.707665
31000US36140 Ocean City, NJ Metro Area 7441.758242
31000US45540 The Villages, FL Micro Area 6983.609959
31000US27540 Jasper, IN Micro Area 6964.754098
31000US43260 Sheridan, WY Micro Area 6909.302326
31000US45000 Susanville, CA Micro Area 6550.404858
31000US31460 Madera-Chowchilla, CA Metro Area 6364.470976
31000US41940 San Jose-Sunnyvale-Santa Clara, CA Metro Area 6195.218489
31000US27340 Jacksonville, NC Metro Area 6185.892116
31000US28180 Kapaa, HI Micro Area 6146.725441
So this shows us the top 10 CBSAs by average amount of public assistance given to households getting any public assistance.
We can add some more summary statistics:
data['perhh'] = data['B19067001'].astype(float)/data['B19057001'].astype(float)
data.sort('perhh',ascending=False,inplace=True)
print data[['name','perhh']][:10]
name perhh
geoid
31000US18860 Crescent City, CA Micro Area 502.916624
31000US47300 Visalia-Porterville, CA Metro Area 432.417921
31000US32900 Merced, CA Metro Area 407.005745
31000US23420 Fresno, CA Metro Area 401.295100
31000US44900 Summerville, GA Micro Area 376.434075
31000US31460 Madera-Chowchilla, CA Metro Area 367.527281
31000US16500 Centralia, WA Micro Area 359.936919
31000US12540 Bakersfield-Delano, CA Metro Area 358.303249
31000US49700 Yuba City, CA Metro Area 353.179243
31000US10260 Adjuntas, PR Micro Area 346.668980
data['percapita'] = data['B19067001'].astype(float)/data['B01003001']
data.sort('percapita',ascending=False,inplace=True)
print data[['name','percapita']][:10]
name percapita
geoid
31000US18860 Crescent City, CA Micro Area 171.245087
31000US16500 Centralia, WA Micro Area 140.759161
31000US44900 Summerville, GA Micro Area 135.780737
31000US47300 Visalia-Porterville, CA Metro Area 127.213229
31000US23420 Fresno, CA Metro Area 123.807088
31000US39780 Red Bluff, CA Micro Area 119.346519
31000US32900 Merced, CA Metro Area 118.254823
31000US49700 Yuba City, CA Metro Area 118.040381
31000US17340 Clearlake, CA Micro Area 115.843692
31000US12540 Bakersfield-Delano, CA Metro Area 108.102456
So I don't know if this is helpful, and I may have made a mistake or two, but I thought I'd log it.