stevenleeg/methodology.md Secret

# Citibike topography methodology

This document outlines the methodology and data sources used in "Visualizing the topography of Citibike".

## Data Sources


- Used a Python script to download all Citibike stations.
- Used NYC Neighborhood Tabulation Area (NTA) geography files.
- Used MapPLUTO data for household calculations.
- Fetched 2015-2019 American Community Survey (ACS) data and census tract geographies from the National Historical Geographic Information System (NHGIS) data portal.
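
The download script itself is not reproduced in this document. As a rough sketch of what such a step might produce, the snippet below turns a station feed into CSV rows. It assumes the stations arrive in the GBFS `station_information` layout (`data.stations[]` with `station_id`, `name`, `lat`, `lon`, `capacity`); that feed format, the function name `stations_to_csv`, and the sample station are assumptions for illustration, not details taken from the original script.

```python
import csv
import io
import json


def stations_to_csv(gbfs_payload):
    """Convert a GBFS-style station_information payload (a dict) to CSV text."""
    fields = ["station_id", "name", "lat", "lon", "capacity"]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for station in gbfs_payload["data"]["stations"]:
        # Extra keys (e.g. region_id) are dropped; missing keys become empty cells.
        writer.writerow(station)
    return buffer.getvalue()


# Hypothetical payload standing in for a real feed response (no network access).
sample = json.loads(
    '{"data": {"stations": [{"station_id": "72", "name": "Example St & 1 Ave",'
    ' "lat": 40.7, "lon": -74.0, "capacity": 39, "region_id": "71"}]}}'
)
csv_text = stations_to_csv(sample)
print(csv_text)
```

The `capacity` column is the one summed per neighborhood in the later steps.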

## Steps

### Areas within 0.5km of a Citibike station

1. Imported the Citibike station CSV produced by the Python script
2. Reprojected the stations to the New York/Long Island CRS
3. Created a 0.5km buffer around each station
4. Dissolved all buffers into a single polygon
5. Clipped the polygon using the NTA polygons
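
The buffer and dissolve steps above amount to asking whether a location lies within 0.5km of at least one station. A minimal sketch of that test follows, assuming station coordinates are already in a projected CRS measured in meters (if the CRS uses feet, the radius needs converting). The function name and sample coordinates are hypothetical, and clipping to the NTA polygons is not covered.

```python
import math

BUFFER_RADIUS_M = 500.0  # 0.5 km, assuming the projected CRS uses meters


def within_station_buffer(point, stations, radius=BUFFER_RADIUS_M):
    """Return True if `point` is within `radius` of at least one station.

    Being inside the dissolved buffer polygon is equivalent to being
    within the buffer radius of any single station.
    """
    px, py = point
    return any(math.hypot(px - sx, py - sy) <= radius for sx, sy in stations)


# Hypothetical projected station coordinates (x, y) in meters.
stations = [(0.0, 0.0), (800.0, 0.0)]

on_edge = within_station_buffer((300.0, 400.0), stations)  # 500 m from the first station
outside = within_station_buffer((400.0, 400.0), stations)  # about 566 m from both stations
print(on_edge, outside)
```

In a GIS tool the same result comes from the buffer and dissolve operations; the distance test is only meant to make the geometry explicit.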


### Households served (within the 0.5km range)

1. Imported MapPLUTO data
2. Clipped it using the 0.5km station buffers
3. Ran the Basic statistics operation on:
   - Unclipped MapPLUTO data, to get the total number of households
   - Clipped MapPLUTO data, to get the number of households within 0.5km of a station
4. Calculated percentages based on these values
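
The percentage in the last step is the clipped household total divided by the unclipped total. A small sketch of that arithmetic, where the `households` field name and the lot values are hypothetical placeholders (the source does not say which MapPLUTO column was summed):

```python
def households_served_share(all_lots, served_lots, field="households"):
    """Percentage of households that fall inside the station buffers.

    `all_lots` is the unclipped lot table and `served_lots` the clipped one;
    `field` is a hypothetical column holding the household count per lot.
    """
    total = sum(lot[field] for lot in all_lots)
    served = sum(lot[field] for lot in served_lots)
    if total == 0:
        raise ValueError("total household count is zero")
    return 100.0 * served / total


# Hypothetical lots: four in the city, two of them inside the 0.5km buffers.
all_lots = [{"households": 120}, {"households": 80},
            {"households": 200}, {"households": 100}]
served_lots = all_lots[:2]

share = households_served_share(all_lots, served_lots)
print(f"{share:.1f}% of households are within 0.5km of a station")
```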


### Neighborhood station capacity

1. Imported the Citibike station CSV produced by the Python script
2. Ran the Join attributes by location (summary) operation
   - Summed the capacity column of each station per neighborhood
3. Created a new column, capacity_per_100sqkm, computed as capacity_count / $area * 100
4. Visualized the column on the NTA map
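
The join-and-sum step above assigns each station to the neighborhood polygon that contains it and adds up the capacities. A minimal sketch of that idea using a ray-casting point-in-polygon test; the NTA shapes, station tuples, and function names are hypothetical, and real NTA polygons (multipart, with holes) would need a proper GIS library:

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edges that a horizontal ray from (x, y) to the right crosses.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sum_capacity_by_nta(ntas, stations):
    """Sum station capacity per NTA; `stations` holds (x, y, capacity) tuples."""
    return {
        name: sum(cap for x, y, cap in stations if point_in_polygon(x, y, polygon))
        for name, polygon in ntas.items()
    }


# Hypothetical neighborhoods and stations in arbitrary projected units.
ntas = {
    "NTA-A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "NTA-B": [(2, 0), (4, 0), (4, 1), (2, 1)],
}
stations = [(0.5, 0.5, 30), (1.5, 1.0, 20), (3.0, 0.5, 27)]

capacity_count = sum_capacity_by_nta(ntas, stations)
print(capacity_count)
```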


### Neighborhood station capacity in NTAs below the poverty line

1. Fetched and imported census tract geographies sourced from NHGIS
2. Fetched and joined the NHGIS 2015-2019 ACS median household income data
3. Used the NTA geography files
4. Generated centroids of each polygon
5. Ran the Join attributes by location (summary) operation to merge the ACS data into the NTA polygons
   - Used the median of the median income field
6. Filtered to keep only the NTAs below the poverty line of $35k
7. Ran Join attributes by location (summary) to merge station data with the NTA polygons
   - Summed the capacity column
8. Created a new column, capacity_per_100sqkm, computed as capacity_count / $area * 100
9. Visualized the column on the map along with station locations
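
The income steps above reduce to taking the median of the tract-level median incomes in each NTA, keeping the NTAs under $35k, and computing the capacity density for those. A sketch under stated assumptions: tracts are already matched to NTAs, areas are in square kilometers, and all names and numbers below are hypothetical.

```python
from statistics import median

POVERTY_LINE = 35_000  # threshold stated in the steps above


def nta_median_income(tract_incomes_by_nta):
    """Median of the tract-level median incomes for each NTA."""
    return {
        nta: median(incomes)
        for nta, incomes in tract_incomes_by_nta.items()
        if incomes  # skip NTAs with no matched tracts
    }


def below_poverty_line(nta_incomes, threshold=POVERTY_LINE):
    """Keep only NTAs whose median income is strictly below the threshold."""
    return {nta: inc for nta, inc in nta_incomes.items() if inc < threshold}


def capacity_per_100sqkm(capacity_count, area_sqkm):
    """capacity_count / $area * 100, with the area assumed to be in sq km."""
    return capacity_count / area_sqkm * 100


# Hypothetical tract medians grouped by the NTA they fall in.
tract_incomes = {
    "NTA-A": [28_000, 31_000, 40_000],
    "NTA-B": [52_000, 61_000],
    "NTA-C": [34_000, 36_000],
}
incomes = nta_median_income(tract_incomes)
selected = below_poverty_line(incomes)

# Hypothetical summed capacity and area for the one selected NTA.
density = capacity_per_100sqkm(capacity_count=45, area_sqkm=1.5)
print(selected, density)
```

Note that a median of medians is only an approximation of the true NTA-level median household income, since tracts differ in population.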