Joe Germuska JoeGermuska

@JoeGermuska
JoeGermuska / 00_README.md
Last active December 14, 2023 19:55
Read Census 2020 PL94-171 ("redistricting") files into Pandas DataFrames

Some quick work to facilitate reading the Census 2020 PL94-171 data release into Pandas DataFrames.

Sample data for Providence County, RI can be downloaded from https://www.census.gov/programs-surveys/decennial-census/about/rdo/summary-files.html, as can auxiliary materials.

The file headers.py was created by parsing the SAS import scripts from the link above.

It seems as though the Census Bureau removed the sample data for Providence County, RI, against which this code was tested. You can get a copy of it from http://files.censusreporter.org/ri2018_2020Style.pl.zip

Note: The full data release is now available at https://www2.census.gov/programs-surveys/decennial/2020/data/01-Redistricting_File--PL_94-171/
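A minimal sketch of the reading step, assuming the PL94-171 files are pipe-delimited text with no header row, so column names have to be supplied separately (for example from headers.py, which is not reproduced here). The four column names below are only an illustrative subset, the function name `read_pl_segment` is hypothetical, and the sample rows are made up.

```python
import io

import pandas as pd

# Illustrative subset of column names (an assumption for this sketch);
# the real lists are much longer and would come from headers.py.
EXAMPLE_HEADER = ["FILEID", "STUSAB", "LOGRECNO", "P0010001"]


def read_pl_segment(source, names):
    """Read one pipe-delimited, header-less file into a DataFrame."""
    return pd.read_csv(
        source,
        sep="|",
        header=None,
        names=names,
        dtype=str,  # keep identifiers zero-padded instead of parsing them as numbers
    )


# Made-up rows standing in for a real file on disk.
sample = io.StringIO("PLST|RI|0000001|1000\nPLST|RI|0000002|0250\n")
df = read_pl_segment(sample, EXAMPLE_HEADER)

# Convert only the count columns to integers once the identifiers are safe.
df["P0010001"] = df["P0010001"].astype(int)
```

Reading everything as strings first is a deliberate choice in this sketch: record numbers and geographic codes carry leading zeros that numeric parsing would drop.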

JoeGermuska / encoding_fixup.sql
Last active September 8, 2022 19:41
SQL to fix importing Latin-1 text as if it were UTF-8
-- Sometimes one accidentally loads data that is in ISO8859-1 (aka "Latin-1") encoding having assumed that it was actually UTF-8
-- so far it seems like Ã is a good flag although if your data might also have that correctly, this is less simple...
update tiger2020.census_name_lookup
set simple_name = replace(simple_name, 'Ã±', 'ñ' ),
    display_name = replace(display_name, 'Ã±', 'ñ' ),
    prefix_match_name = replace(prefix_match_name, 'Ã±', 'ñ' );
update tiger2020.census_name_lookup
set simple_name = replace(simple_name, 'Ã¼', 'ü' ),
    display_name = replace(display_name, 'Ã¼', 'ü' ),
    prefix_match_name = replace(prefix_match_name, 'Ã¼', 'ü' );
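Sequences such as `Ã±` are what the UTF-8 bytes of `ñ` look like when they are decoded as Latin-1. The sketch below shows how that damage arises and how it could be reversed in Python before loading, rather than character by character in SQL. It is an alternative to the SQL above, not part of it; the helper name `fix_mojibake` is hypothetical, and it assumes exactly one round of mis-decoding.

```python
def fix_mojibake(text):
    """Undo one round of UTF-8 bytes having been decoded as Latin-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or the round trip
        # does not produce valid UTF-8: assume it was fine and leave it alone.
        return text


# How the damage arises: the two UTF-8 bytes of "ñ" read as two Latin-1 characters.
broken = "ñ".encode("utf-8").decode("latin-1")

repaired = fix_mojibake("Espa" + broken + "ola")
```

As the SQL comment notes, data that legitimately contains `Ã` is the hard case; this sketch only sidesteps it when the legitimate text fails the round trip.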
JoeGermuska / README.md
Last active August 18, 2022 21:00
Chicago Community Areas with region and key neighborhoods, and 2020 population by race

I had reason to want a list of Chicago Community Areas annotated to assign each to a region of the city. I couldn't find it in structured form, so I built it.

I used this research guide from Harold Washington College Libraries, which assigned each community area to a region and listed key neighborhoods. I simplified "Central, Near North, and Near South Side" to just "Central". I also simplified "West and Near West Side" to just "West."

Since part of my project included aligning population numbers, the CSV in this gist also includes a simplified version of the 2020 Decennial Census redistricting table P2, "Hispanic or Latino, and not Hispanic or Latino by Race," as downloaded from Census Reporter. The simplifications were to rename columns, and to omit all of the detailed columns for "two or more races." I also dropped the 2010 and percentage change columns, and computed the per

JoeGermuska / csvcut
Created September 1, 2010 20:51 — forked from bycoffe/csvcut
#!/usr/bin/env python
"""
Like cut, but for CSVs. To be used from a shell command line.
Note that fields are zero-based, as opposed to 'cut' where they are 1-based.
Should use something better than getopt, but this works...
Usage:
csvcut foobar.csv
JoeGermuska / csvcut
Created September 14, 2009 17:56
Like cut, but smart about CSV quoting
#!/usr/bin/env python
"""
Like cut, but for CSVs. To be used from a shell command line.
Change row[1] to the row index to be printed. row[1] will print the second
item in the row.
Note that fields are zero-based, as opposed to 'cut' where they are 1-based.
Leveraged from/motivated by an example from @bycoffe
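The preview above is cut off before the script body, so here is a minimal sketch of the idea only, assuming the standard library `csv` module does the quote handling. The function signature and the sample data are hypothetical and are not taken from either csvcut gist.

```python
import csv
import io


def csvcut(infile, outfile, fields):
    """Write only the given zero-based columns of each CSV row."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    for row in reader:
        # Skip indexes that a short row does not have instead of raising.
        writer.writerow([row[i] for i in fields if i < len(row)])


# A quoted comma and an escaped quote: the cases plain `cut -d,` gets wrong.
demo_in = io.StringIO('name,city,note\n"Smith, Jane",Chicago,"likes ""CSV"""\n')
demo_out = io.StringIO()
csvcut(demo_in, demo_out, [0, 2])
```

Field numbers are zero-based here, matching the note in the docstring, so `[0, 2]` keeps the first and third columns.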
JoeGermuska / README.md
Created September 10, 2020 22:22
Get last commit dates for remote git branches

I had a repo with dozens of abandoned or merged remote branches. Before deleting them wholesale, I wanted to know just how old they were.

I adapted this Stack Overflow answer, but dressed it up in a for loop since I wanted all of them. Also, since I wanted to sort them, I switched the date format to %cs (Short ISO), which didn't work in git 2.18.0 but does work in 2.28.0.
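The gist's own shell loop is not shown in this preview. As a rough sketch of the same goal, the Python below uses a different technique, `git for-each-ref` with its `committerdate:short` field, so that git itself lists every remote branch with a sortable date. The function names are hypothetical, and only the parsing step is exercised without a repository.

```python
import subprocess

# One line per ref: "YYYY-MM-DD origin/branch-name"
FORMAT = "%(committerdate:short) %(refname:short)"


def for_each_ref_command():
    """Build the git command that lists remote branches with commit dates."""
    return ["git", "for-each-ref", "--sort=committerdate",
            "--format=" + FORMAT, "refs/remotes"]


def parse_branch_dates(output):
    """Turn 'date branch' lines into (date, branch) tuples, oldest first."""
    pairs = []
    for line in output.splitlines():
        if line.strip():
            date, branch = line.split(" ", 1)
            pairs.append((date, branch))
    # ISO dates sort correctly as plain strings.
    return sorted(pairs)


def remote_branch_dates(repo_dir="."):
    """Run git in repo_dir and return its remote branches, oldest first."""
    result = subprocess.run(for_each_ref_command(), cwd=repo_dir,
                            capture_output=True, text=True, check=True)
    return parse_branch_dates(result.stdout)
```

Calling `remote_branch_dates()` inside a clone returns the oldest branches first, which is the order you would want when deciding what to delete.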

JoeGermuska / fipsToState.json
Last active August 13, 2020 20:34 — forked from wavded/fipsToState.json
State FIPS JSON
{
"01": "Alabama",
"02": "Alaska",
"04": "Arizona",
"05": "Arkansas",
"06": "California",
"08": "Colorado",
"09": "Connecticut",
"10": "Delaware",
"11": "District of Columbia",
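A small sketch of how a mapping like this might be used, assuming only the first few entries shown above. The point it illustrates is that the keys are two-character strings, so numeric codes need zero-padding before lookup. The helper name `state_name` is hypothetical.

```python
import json

# Only a few entries from the listing above, inlined so the sketch is self-contained.
FIPS_TO_STATE = json.loads('{"01": "Alabama", "02": "Alaska", "04": "Arizona"}')


def state_name(fips):
    """Look up a state by FIPS code, accepting ints or unpadded strings."""
    key = str(fips).zfill(2)
    # Returns None for codes that are not in the mapping (for example 03).
    return FIPS_TO_STATE.get(key)
```

With the full file, `json.load` on an open file handle would replace the inlined string.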
JoeGermuska / gct_tables.txt
Created November 17, 2015 22:36
A list of American Community Survey (ACS) tables in the Geographic Comparison Tables series
GCT0101 MEDIAN AGE OF THE TOTAL POPULATION - United States -- States; and Puerto Rico
GCT0101 MEDIAN AGE OF THE TOTAL POPULATION - United States -- American Indian Area/Alaska Native Area/Alaska Native Regional Corporation
GCT0101 MEDIAN AGE OF THE TOTAL POPULATION - United States -- Congressional District by State; and for Puerto Rico
GCT0101 MEDIAN AGE OF THE TOTAL POPULATION - United States -- County by State; and for Puerto Rico
GCT0101 MEDIAN AGE OF THE TOTAL POPULATION - United States -- Places by State; and for Puerto Rico
GCT0101 MEDIAN AGE OF THE TOTAL POPULATION - United States -- Metropolitan and Micropolitan Statistical Area; and for Puerto Rico
GCT0101 MEDIAN AGE OF THE TOTAL POPULATION - United States -- Urban/Rural and Inside/Outside Metropolitan and Micropolitan Area
GCT0101 MEDIAN AGE OF THE TOTAL POPULATION - United States -- Urbanized Area; and for Puerto Rico
GCT0101 MEDIAN AGE OF THE TOTAL POPULATION - United States -- Combined Statistical Area with Metropolitan and Micropolitan Statistic
JoeGermuska / 01_readme.md
Last active June 17, 2020 03:22
A cross-reference of ZCTAs by state, and how it was made

A question came up on the US Census slack, leading to the recognition that the US Census Bureau API doesn't support queries for data for "all ZCTAs in a state". Nothing about the Census Bureau's definition of ZCTA requires that they be contained within a single state, which is probably why the API rejects the query with a message, error: unknown/unsupported geography heirarchy.

I've been looking for a general method to answer these kinds of questions for a long time. This Gist demonstrates a workable approach. It's based on data published by the Census LEHD LODES program, which provides, for every Census block in the US, a crosswalk indicating which geographies that block is in. (The set of geographies is limited but still very useful. See the technical doc PDF for more details.)

For any two geography types, one can simply select those two columns from the crosswalk and eliminate dupli
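The select-two-columns-and-deduplicate step can be sketched as follows, using only the standard library. The column names `tabblk2010`, `stusps`, and `zcta` are assumptions about the crosswalk layout, the function name is hypothetical, and the sample rows are invented for illustration (including the ZCTA that appears in two states).

```python
import csv
import io


def cross_reference(rows, col_a, col_b):
    """Return the sorted unique (col_a, col_b) pairs found across all blocks."""
    pairs = {
        (row[col_a], row[col_b])
        for row in rows
        if row[col_a] and row[col_b]  # skip blocks with no value in either column
    }
    return sorted(pairs)


# Invented block-level rows standing in for the real crosswalk file.
sample = io.StringIO(
    "tabblk2010,stusps,zcta\n"
    "440070001001000,RI,02903\n"
    "440070001001001,RI,02903\n"
    "440070002001000,RI,02861\n"
    "250050001001000,MA,02861\n"
)
rows = list(csv.DictReader(sample))
state_zctas = cross_reference(rows, "stusps", "zcta")
```

Because the crosswalk has one row per block, the set collapses the many blocks in each ZCTA to a single pair per state, and a ZCTA that crosses a state line simply shows up once under each state.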

#!/usr/bin/env python
import json
import os

DEFAULT_PROFILE_NAME = "default.2m2"
SESSION_PATH = "Library/Application Support/Firefox/Profiles/%s/sessionstore.js"

# TODO consider generalizing the session store to open (different users, different profiles)
def session_path():
    """Return the path to the default profile's sessionstore.js under $HOME."""
    return os.path.join(os.environ['HOME'], SESSION_PATH % DEFAULT_PROFILE_NAME)