Joe Germuska JoeGermuska

## 00_README.md

      
Read Census 2020 PL94-171 ("redistricting") files into Pandas DataFrames

Last active December 14, 2023 19:55

Some quick work to facilitate reading data from the Census 2020 PL94-171 data release into Pandas DataFrames.

The code was tested against sample data for Providence County, RI, which was originally downloadable, along with auxiliary materials, from https://www.census.gov/programs-surveys/decennial-census/about/rdo/summary-files.html. The Census Bureau seems to have since removed that sample data; you can get a copy of it from http://files.censusreporter.org/ri2018_2020Style.pl.zip

The file headers.py was created by parsing the SAS import scripts from the Census Bureau link above.

Note: The full data release is now available at https://www2.census.gov/programs-surveys/decennial/2020/data/01-Redistricting_File--PL_94-171/
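
As a rough illustration of why a separate list of column names is useful: the PL94-171 files are pipe-delimited and carry no header row, so names have to be supplied when reading. The sketch below uses only the standard library; the column list and sample lines are made up for illustration and are not taken from headers.py. The resulting list of dicts could then be handed to `pandas.DataFrame`.

```python
import csv
import io

# Hypothetical, shortened column list; the real names live in headers.py.
SEGMENT_COLUMNS = ["FILEID", "STUSAB", "LOGRECNO", "P0010001"]


def read_pl_rows(handle, columns):
    """Read a headerless, pipe-delimited file into a list of dicts keyed by column name."""
    reader = csv.DictReader(handle, fieldnames=columns, delimiter="|")
    return list(reader)


# Made-up sample lines standing in for a real segment file.
sample = io.StringIO("PLST|RI|0000001|1000\nPLST|RI|0000002|250\n")
rows = read_pl_rows(sample, SEGMENT_COLUMNS)
print(rows[0]["LOGRECNO"], rows[0]["P0010001"])
```

Values are kept as strings here, which preserves leading zeros in identifiers such as the record number.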

  
## encoding_fixup.sql
-- Sometimes one accidentally loads UTF-8 data as though it were ISO8859-1 (aka "Latin-1"), which turns each accented character into two
-- so far it seems like Ã is a good flag, although if your data might also legitimately contain that character, the fix is less simple...
update tiger2020.census_name_lookup
    set simple_name =       replace(simple_name,       'Ã±', 'ñ' ),
        display_name =      replace(display_name,      'Ã±', 'ñ' ),
        prefix_match_name = replace(prefix_match_name, 'Ã±', 'ñ' );
update tiger2020.census_name_lookup
    set simple_name =       replace(simple_name,       'Ã¼', 'ü' ),
        display_name =      replace(display_name,      'Ã¼', 'ü' ),
        prefix_match_name = replace(prefix_match_name, 'Ã¼', 'ü' );
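
The sequences 'Ã±' and 'Ã¼' are what the UTF-8 bytes for 'ñ' and 'ü' look like when decoded as Latin-1. If the data can be repaired before loading, the same fix can be sketched in Python without listing each character; the function name `fix_mojibake` is made up for this example, and it assumes the damage came from exactly one wrong Latin-1 decode.

```python
def fix_mojibake(text):
    """Undo UTF-8 text that was mistakenly decoded as Latin-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable as Latin-1, or not valid UTF-8 once re-encoded:
        # assume the text was fine and leave it alone.
        return text


print(fix_mojibake("EspaÃ±ola"))
print(fix_mojibake("Española"))
```

Text that is already correct usually fails the round trip and is returned unchanged, which sidesteps the "what if Ã is legitimate" problem in most cases.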

## README.md

      
Chicago Community Areas with region and key neighborhoods, and 2020 population by race

Last active August 18, 2022 21:00

I had reason to want a list of Chicago Community Areas annotated to assign each to a region of the city. I couldn't find it in structured form, so I built it.

I used this research guide from Harold Washington College Libraries, which assigns each community area to a region and lists key neighborhoods. I simplified "Central, Near North, and Near South Side" to just "Central". I also simplified "West and Near West Side" to just "West".

Since part of my project included aligning population numbers, the CSV in this gist also includes a simplified version of the 2020 Decennial Census redistricting table P2, "Hispanic or Latino, and not Hispanic or Latino by Race," as downloaded from Census Reporter. The simplifications were to rename columns and to omit all of the detailed columns for "two or more races." I also dropped the 2010 and percentage change columns, and computed the per

  
## csvcut
#!/usr/bin/env python
"""
Like cut, but for CSVs. To be used from a shell command line.

Note that fields are zero-based, as opposed to 'cut' where they are 1-based.

Should use something better than getopt, but this works...

Usage:
    csvcut foobar.csv

## csvcut
#!/usr/bin/env python
"""
Like cut, but for CSVs. To be used from a shell command line.

Change the 1 in row[1] to the index of the field to be printed. row[1] will print the second
item in each row.

Note that fields are zero-based, as opposed to 'cut' where they are 1-based.

Leveraged from/motivated by an example from @bycoffe
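
The idea described in these docstrings can be sketched with the standard `csv` module. This is not the gist's code, which is truncated above; the function signature and the sample data are assumptions made for illustration.

```python
import csv
import io
import sys


def csvcut(lines, fields, out=sys.stdout):
    """Write only the requested zero-based columns of each CSV row to out."""
    writer = csv.writer(out)
    for row in csv.reader(lines):
        writer.writerow([row[i] for i in fields])


# Made-up input with a quoted comma, which plain `cut -d,` would split incorrectly.
sample = io.StringIO('name,note,count\n"Smith, J.",ok,3\n')
result = io.StringIO()
csvcut(sample, [0, 2], out=result)
print(result.getvalue())
```

Using a real CSV parser is the point of the tool: quoted fields that contain the delimiter stay intact.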

## README.md

      
Get last commit dates for remote git branches

Created September 10, 2020 22:22

I had a repo with dozens of abandoned or merged remote branches. Before deleting them wholesale, I wanted to know just how old they were.

I adapted this Stack Overflow answer, but dressed it up in a for loop since I wanted all of them. Also, since I wanted to sort them, I switched the date format to %cs (short ISO), which didn't work in git 2.18.0 but does work in 2.28.0.
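
The same loop can be sketched in Python. This is an illustration rather than the gist's script: the function names are invented here, and `remote_branch_dates()` assumes it is run inside a git repository with a git version that understands `%cs`.

```python
import subprocess


def git_lines(*args):
    """Run a git command in the current repo and return its non-empty output lines."""
    result = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def last_commit_dates(branches, get_date):
    """Pair each branch with its last commit date and sort oldest first."""
    return sorted((get_date(branch), branch) for branch in branches)


def remote_branch_dates():
    """Loop over remote branches, asking git for each one's last committer date."""
    branches = git_lines("for-each-ref", "--format=%(refname:short)", "refs/remotes")
    # %cs is the short ISO committer date (YYYY-MM-DD), so plain string sorting works.
    return last_commit_dates(
        branches, lambda b: git_lines("log", "-1", "--format=%cs", b)[0]
    )


# Demonstration with made-up branch names and dates instead of a real repository.
fake_dates = {"origin/old-idea": "2019-01-02", "origin/main": "2020-09-10"}
demo = last_commit_dates(fake_dates, fake_dates.get)
print(demo)
```

Because the short ISO format sorts lexicographically in date order, no date parsing is needed before sorting.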

  
## fipsToState.json
{
   "01": "Alabama",
   "02": "Alaska",
   "04": "Arizona",
   "05": "Arkansas",
   "06": "California",
   "08": "Colorado",
   "09": "Connecticut",
   "10": "Delaware",
   "11": "District of Columbia",
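
A small sketch of how a mapping like this might be used. The keys are two-character strings with leading zeros, so integer or unpadded input needs padding before lookup; the helper name `state_name` is made up, and only the first few entries shown above are included.

```python
# First entries from fipsToState.json; the full file continues past these.
FIPS_TO_STATE = {
    "01": "Alabama",
    "02": "Alaska",
    "04": "Arizona",
    "05": "Arkansas",
    "06": "California",
}


def state_name(fips):
    """Look up a state name, accepting ints or unpadded strings such as 6 or '6'."""
    return FIPS_TO_STATE.get(str(fips).zfill(2))


print(state_name(6))
print(state_name("03"))
```

Codes missing from the mapping, such as "03" here, come back as `None` rather than raising an error.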

## gct_tables.txt
GCT0101	MEDIAN AGE OF THE TOTAL POPULATION - United States -- States; and Puerto Rico
GCT0101	MEDIAN AGE OF THE TOTAL POPULATION - United States -- American Indian Area/Alaska Native Area/Alaska Native Regional Corporation
GCT0101	MEDIAN AGE OF THE TOTAL POPULATION - United States -- Congressional District by State; and for Puerto Rico
GCT0101	MEDIAN AGE OF THE TOTAL POPULATION - United States -- County by State; and for Puerto Rico
GCT0101	MEDIAN AGE OF THE TOTAL POPULATION - United States -- Places by State; and for Puerto Rico
GCT0101	MEDIAN AGE OF THE TOTAL POPULATION - United States -- Metropolitan and Micropolitan Statistical Area; and for Puerto Rico
GCT0101	MEDIAN AGE OF THE TOTAL POPULATION - United States -- Urban/Rural and Inside/Outside Metropolitan and Micropolitan Area
GCT0101	MEDIAN AGE OF THE TOTAL POPULATION - United States -- Urbanized Area; and for Puerto Rico
GCT0101	MEDIAN AGE OF THE TOTAL POPULATION - United States -- Combined Statistical Area with Metropolitan and Micropolitan Statistic

## 01_readme.md

      
A cross-reference of ZCTAs by state, and how it was made

Last active June 17, 2020 03:22

A question came up on the US Census Slack, leading to the recognition that the US Census Bureau API doesn't support queries for data for "all ZCTAs in a state". Nothing about the Census Bureau's definition of ZCTAs requires that they be contained within a single state, which is probably why the API rejects the query with the message `error: unknown/unsupported geography heirarchy`.

I've been looking for a general method to answer these kinds of questions for a long time. This Gist demonstrates a workable approach. It's based on data published by the Census LEHD LODES program, which provides a crosswalk indicating, for every Census block in the US, which geographies that block is in. (The set of geographies is limited but still very useful. See the technical doc PDF for more details.)

For any two geography types, one can simply select those two columns from the crosswalk and eliminate dupli
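
The select-two-columns-and-deduplicate idea can be sketched with the standard library. The rows below are made up, and the column names `stusps` and `zcta` are assumptions about the crosswalk layout that should be checked against the LODES technical documentation.

```python
import csv
import io


def unique_pairs(handle, col_a, col_b):
    """Return the sorted distinct (col_a, col_b) pairs found in a crosswalk CSV."""
    reader = csv.DictReader(handle)
    return sorted({(row[col_a], row[col_b]) for row in reader})


# Made-up block-level rows; two blocks share a ZCTA, so one pair is a duplicate.
sample = io.StringIO(
    "tabblk2010,stusps,zcta\n"
    "440070001001000,RI,02903\n"
    "440070001001001,RI,02903\n"
    "440070002001000,RI,02906\n"
)
pairs = unique_pairs(sample, "stusps", "zcta")
print(pairs)
```

A ZCTA that crosses a state line would simply appear in more than one pair, once per state.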

  
## Firefox Session Saver
#!/usr/bin/env python
import json
import os

DEFAULT_PROFILE_NAME = "default.2m2"
# Path relative to the user's home directory (macOS layout); %s is the profile directory name.
SESSION_PATH = "Library/Application Support/Firefox/Profiles/%s/sessionstore.js"

# TODO consider generalizing the session store to open (different users, different profiles)
def session_path():
    """Return the full path to the default profile's sessionstore.js."""
    return os.path.join(os.environ['HOME'], SESSION_PATH % DEFAULT_PROFILE_NAME)