@stevenleeg
Last active November 28, 2021 17:39
A description of the data sources and methodology used to generate the maps and conclusions in my recent blog post, "Visualizing the topography of Citibike".

Citibike topography methodology

This document outlines the methodology and data sources used in "Visualizing the topography of Citibike".

Data Sources

  • Used this Python script to download all Citibike stations.
  • Used NYC Neighborhood Tabulation Area (NTA) geography files.
  • Used MapPLUTO data for household calculations.
  • Fetched 2015-2019 ACS data and census tract geographies from the National Historical Geographic Information System data portal.
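The station-download script is not reproduced here. As a hypothetical stand-in, the sketch below assumes the stations come from a GBFS station_information.json feed (the open General Bikeshare Feed Specification format that Citi Bike publishes) and flattens it into the kind of station CSV imported in the steps below. The function name and the choice of columns are illustrative assumptions, not the author's script.

```python
import csv
import io
import json

# Columns taken from the GBFS station_information schema (assumed subset).
FIELDS = ["station_id", "name", "lat", "lon", "capacity"]


def gbfs_stations_to_csv(feed_json):
    """Flatten a GBFS station_information.json payload into CSV text.

    Assumes the standard GBFS layout {"data": {"stations": [...]}}.
    Stations missing an optional field (e.g. capacity) get an empty cell.
    """
    stations = json.loads(feed_json)["data"]["stations"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for station in stations:
        writer.writerow({field: station.get(field, "") for field in FIELDS})
    return out.getvalue()
```

In practice the JSON text would be fetched from the feed URL first; the parsing is kept separate so it can be tested offline with a made-up sample such as `{"data": {"stations": [{"station_id": "1", "name": "Example St", "lat": 40.7, "lon": -73.99, "capacity": 30}]}}`.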

Steps

  1. Areas within 0.5km of a Citibike station
    • Imported Citibike station CSV from Python script
    • Reprojected the stations to the New York/Long Island CRS
    • Created a 0.5km buffer around each station
    • Dissolved all buffers into a single polygon
    • Clipped the polygon using the NTA polygons
  2. Households served (within the 0.5km range)
    • Imported MapPLUTO data
    • Clipped the MapPLUTO data using the 0.5km station buffers
    • Ran the Basic statistics operation on:
      • Unclipped MapPLUTO data to get the total number of households
      • Clipped MapPLUTO data to get the total number of households within 0.5km of a station
    • Calculated percentages based on these values
  3. Neighborhood station capacity
    • Imported Citibike station CSV from Python script
    • Ran Join attributes by location (summary) operation
      • Summed the capacity column of the stations in each neighborhood
    • Created a new column: capacity_count / $area * 100 to generate capacity_per_100sqkm
    • Visualized the column on the NTA map
  4. Neighborhood station capacity in NTAs below the poverty line
    • Fetched and imported census tract geographies sourced from NHGIS
    • Fetched and joined NHGIS 2015-2019 ACS median household income data
    • Used the NTA geography files
    • Generated centroids of each census tract polygon
    • Ran Join attributes by location (summary) operation to merge the ACS data into the NTA polygons
      • Used the median of the median income field
    • Filtered the NTAs down to those below the poverty line of $35k
    • Ran Join attributes by location (summary) to merge station data with NTA polygons
      • Summed the capacity column
    • Created a new column: capacity_count / $area * 100 to generate capacity_per_100sqkm
    • Visualized the column on the map along with the station locations
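Step 1 (areas within 0.5km of a Citibike station) is done in QGIS, but the idea behind the dissolved buffer can be sketched in plain Python: the union of 0.5km buffers contains exactly the points whose nearest station is at most 0.5km away, so membership can be tested with a distance check instead of building the polygon. This is a minimal sketch under stated assumptions: it swaps the New York/Long Island CRS for a simple equirectangular approximation (adequate over a city-sized area), the function names are hypothetical, and the final clip against the NTA polygons is not shown.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def project(lat, lon, lat0):
    """Equirectangular projection to metres around reference latitude lat0.

    Stand-in for the New York/Long Island CRS (an assumption, not the
    post's projection); distortion is small over a city-sized area.
    """
    x = math.radians(lon) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat) * EARTH_RADIUS_M
    return x, y


def in_service_area(point, stations, radius_m=500.0):
    """Return True if point (lat, lon) falls inside the dissolved station buffers.

    A point lies in the union of circular buffers exactly when at least one
    station is within radius_m of it, so no explicit polygon union is built.
    """
    lat0 = point[0]
    px, py = project(point[0], point[1], lat0)
    return any(
        math.hypot(px - sx, py - sy) <= radius_m
        for sx, sy in (project(lat, lon, lat0) for lat, lon in stations)
    )


stations = [(40.7000, -74.0000)]
print(in_service_area((40.7030, -74.0000), stations))  # about 334 m north: True
print(in_service_area((40.7060, -74.0000), stations))  # about 667 m north: False
```

A real buffer polygon in QGIS is a segmented approximation of a circle, so results can differ from this exact distance test right at the 0.5km edge.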
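The percentage calculation in step 2 (households served) compares the household total from the unclipped MapPLUTO data with the total from the clipped data. The sketch below shows that arithmetic, assuming households are approximated by a per-lot residential unit count (for example MapPLUTO's UnitsRes column) and that each lot carries a flag saying whether it survived the clip. The function name and input shape are hypothetical.

```python
def household_coverage(lots):
    """Return (total, served, percent) of residential units.

    lots: iterable of (units, served) pairs, where units is a lot's
    residential-unit count (assumed, e.g. MapPLUTO UnitsRes) and served
    flags whether the lot survived the clip against the 0.5km buffers.
    """
    total = 0
    served = 0
    for units, is_served in lots:
        total += units
        if is_served:
            served += units
    # Guard against an empty input to avoid dividing by zero.
    percent = 100.0 * served / total if total else 0.0
    return total, served, percent


print(household_coverage([(10, True), (30, False), (60, True)]))  # (100, 70, 70.0)
```

Note that a clip keeps the full attribute value of any lot that only partly overlaps a buffer, so edge lots count in full.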
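Step 3 (neighborhood station capacity) sums station capacity per NTA and then normalizes by area. A minimal plain-Python sketch of that join and field calculation follows, assuming neighborhood polygons and station points are already in a projected CRS measured in metres. It converts area to square kilometres before applying capacity_count / area_km2 * 100, which yields capacity per 100 sq km; in QGIS the units returned by `$area` depend on the project's measurement settings. The ray-casting point-in-polygon test is a simplified stand-in for QGIS's spatial predicate, and all names are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices in projected metres."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count edges crossed by a horizontal ray extending to the right of (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def shoelace_area(ring):
    """Polygon area in the ring's squared units (shoelace formula)."""
    s = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def capacity_per_100sqkm(neighborhoods, stations):
    """Sum station capacity per neighborhood and normalize by area.

    neighborhoods: dict name -> ring (projected metres)
    stations: list of (x, y, capacity)
    Returns dict name -> (capacity_count, capacity_per_100sqkm).
    """
    result = {}
    for name, ring in neighborhoods.items():
        capacity_count = sum(c for x, y, c in stations if point_in_polygon(x, y, ring))
        area_sqkm = shoelace_area(ring) / 1_000_000  # m² to km²
        result[name] = (capacity_count, capacity_count / area_sqkm * 100)
    return result


square = [(0, 0), (10_000, 0), (10_000, 10_000), (0, 10_000)]  # 100 km²
stations = [(5_000, 5_000, 20), (1_000, 9_000, 30), (15_000, 5_000, 99)]
print(capacity_per_100sqkm({"Example NTA": square}, stations))
# {'Example NTA': (50, 50.0)}
```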
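The income side of step 4 (NTAs below the poverty line) assigns each census tract to an NTA by its centroid, takes the median of the tract median incomes, and keeps NTAs under $35k. The sketch below mirrors that sequence in plain Python under stated assumptions: polygons are simple rings in one projected CRS, each tract carries a single median household income value, and all names are hypothetical. The capacity summing afterward is the same as in step 3 and is not repeated. Note that a median of tract medians approximates, rather than equals, an NTA's true median household income.

```python
import statistics

POVERTY_LINE = 35_000  # threshold used in the post


def ring_centroid(ring):
    """Area-weighted centroid of a simple polygon ring."""
    a = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        a += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    a *= 0.5  # signed area; the sign cancels in the division below
    return cx / (6 * a), cy / (6 * a)


def point_in_polygon(x, y, ring):
    """Ray-casting point-in-polygon test."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def low_income_ntas(ntas, tracts, threshold=POVERTY_LINE):
    """Return {nta_name: median_of_tract_medians} for NTAs below threshold.

    ntas: dict name -> ring; tracts: list of (ring, median_household_income).
    Each tract is assigned to the NTA that contains its centroid.
    """
    centroids = [(ring_centroid(ring), income) for ring, income in tracts]
    result = {}
    for name, nta_ring in ntas.items():
        incomes = [inc for (cx, cy), inc in centroids if point_in_polygon(cx, cy, nta_ring)]
        if incomes:
            med = statistics.median(incomes)
            if med < threshold:
                result[name] = med
    return result


ntas = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
tracts = [
    ([(0, 0), (5, 0), (5, 5), (0, 5)], 30_000),
    ([(5, 0), (10, 0), (10, 5), (5, 5)], 32_000),
    ([(0, 5), (5, 5), (5, 10), (0, 10)], 50_000),
    ([(10, 0), (20, 0), (20, 10), (10, 10)], 60_000),
]
print(low_income_ntas(ntas, tracts))  # {'A': 32000}
```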