Skip to content

Instantly share code, notes, and snippets.

@d-wasserman
Last active August 16, 2018 18:14
Show Gist options
  • Save d-wasserman/2988f4c0b0588616c5d0437340ae508e to your computer and use it in GitHub Desktop.
Save d-wasserman/2988f4c0b0588616c5d0437340ae508e to your computer and use it in GitHub Desktop.
Take input census blocks and census block groups. Develop spatial joins of relationships and proportionally allocate Census block group estimates to blocks based on block proportions.
---
title: "Census Proportional Downsampling"
author: "David Wasserman"
date: "August 6, 2018"
purpose: "Take input census blocks and census block groups. Develop spatial joins of relationships and proportionally allocate Census block group estimates to blocks based on block proportions."
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::knit_engines$set(python=reticulate::eng_python)
library(ggplot2)
library(arcgisbinding)
library(tidyr)
library(reticulate)
library(dplyr)
arc.check_product()
```
## Census Down Sampling
This R notebook downsamples ACS Estimates to Census Block Group level based on the relative proproportions of population and households in 2010 blocks that make up each block group.
Process is as follows:
1. Spatially join Census BG to Census Blocks by Centroid
2. Identify the percent of population and households each census block is of each block group geography
3. Apply those weights to the appropriate columns from ACS Estimates (Getting approximate households from ACS based on 2010 Housing Units)
```{python engine.path="C:/Users/dwasserman/AppData/Local/ESRI/conda/envs/geoscipy/python.exe"}
import arcpy
import os
arcpy.env.overwriteOutput = True
workspace = r"./Project_Data.gdb/BaseData"
census_bg= os.path.join(workspace, "Bay_Area_Census_BG_2016_5YR_WData")
blocks = os.path.join(workspace, "Bay_Area_Census_Blocks_2010_WJobs")
out_fc = os.path.join(workspace,"Bay_Area_Census_Blocks_WBG_Data")
print("Processing spatial join...")
def generate_statistical_fieldmap(target_features, join_features, prepended_name="_", merge_rule_dict={}):
"""Generates field map object based on passed field objects based on passed tables (list),
input_field_objects (list), and passed statistics fields to choose for numeric and categorical variables. Output
fields take the form of *merge rule*+*prepended_name*+*fieldname*
:params
target_features(str): target feature class that will maintain its field attributes
join_features(str): join feature class whose numeric fields will be joined based on the merge rule dictionary
prepended_name(str): modifies output join fields with param text between the statistics and the original field name
merge_rule_dict (dict): a dictionary of the form {statistic_type:[Fields,To,Summarize]}
:returns arcpy field mapping object"""
field_mappings = arcpy.FieldMappings()
# We want every field in 'target_features' and all fields in join_features that are present
# in the field statistics mappping.
field_mappings.addTable(target_features)
for merge_rule in merge_rule_dict:
for field in merge_rule_dict[merge_rule]:
new_field_map = arcpy.FieldMap()
new_field_map.addInputField(join_features, field)
new_field_map.mergeRule = merge_rule
out_field = new_field_map.outputField
out_field.name = str(merge_rule) + str(prepended_name) + str(field)
out_field.aliasName = str(merge_rule) + str(prepended_name) + str(field)
new_field_map.outputField = out_field
field_mappings.addFieldMap(new_field_map)
return field_mappings
fmap = generate_statistical_fieldmap(blocks,census_bg,merge_rule_dict={"FIRST":["GEOID","B01001e1","B11001e1"]})
arcpy.SpatialJoin_analysis(blocks,census_bg,out_fc,match_option="HAVE_THEIR_CENTER_IN",field_mapping=fmap)
print("Output FC Joined:",out_fc)
```
```{r ,echo=FALSE}
workspace = "./Project_Data.gdb/BaseData"
data.path = file.path(workspace,"Bay_Area_Census_Blocks_WBG_Data")
print("Block Group Columns for Population & Households: B01001e1 | B11001e1")
print("Block Data Columns for Population & Households: POP10 | HOUSING10")
arcds = arc.open(data.path)
df <- arc.select(arcds)
print(arc.shapeinfo(arc.shape(df)))
```
```{r , echo=FALSE}
head(df)
```
```{r , echo=FALSE}
df <- group_by(df,FIRST_GEOID) %>%
mutate(POP_BLKGRP_PERCENT = POP10/sum(POP10)) %>%
mutate(HH_BLKGRP_PERCENT = HOUSING10/sum(HOUSING10)) %>%
mutate(POP_BLKGRP_ALLOC= POP_BLKGRP_PERCENT*FIRST_B01001e1) %>%
mutate(HH_BLKGRP_ALLOC= HH_BLKGRP_PERCENT*FIRST_B11001e1) %>%
mutate(POP_BLKGRP_ALRND= round(POP_BLKGRP_PERCENT*FIRST_B01001e1)) %>%
mutate(HH_BLKGRP_ALRND= round(HH_BLKGRP_PERCENT*FIRST_B11001e1)) %>%
ungroup(df)
head(df)
```
## Review Plots
The following R code provides data review plots.
```{r , echo=FALSE}
df.totals <- select(df,POP10,HOUSING10,POP_BLKGRP_ALLOC,HH_BLKGRP_ALLOC,POP_BLKGRP_ALRND,HH_BLKGRP_ALRND) %>%
gather(BLKPOP10=POP10,BLKPOPAlloc=POP_BLKGRP_ALLOC,BLKPOPAlrnd=POP_BLKGRP_ALRND,BLKHOUSING10=HOUSING10, BLKHHAlloc=HH_BLKGRP_ALLOC, BLKHHAlrnd=HH_BLKGRP_ALRND) %>%
group_by(key) %>% summarize(Totals = sum(value,na.rm=TRUE),Std = sd(value,na.rm=TRUE), NaNs = sum(is.na(value)))
df.totals["Class"] <- c("HH","HH","HH", "POP", "POP", "POP")
df.totals
```
```{r , echo=FALSE}
plot <- ggplot(data = df.totals, aes(x=key, y=Totals,fill=Class)) +
geom_bar(stat="identity", position=position_dodge()) +
geom_errorbar(aes(ymin=Totals-Std*2, ymax=Totals+Std*2), width=.5, position=position_dodge(.9)) +
labs(title="Control Totals for Housing and Population", y = "Totals", x="Allocation Sub-Groups") +
theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))
plot
```
Export the results to shapefile.
```{r , echo=FALSE}
final_fc <- file.path(workspace,"Census_Blocks_2010_With_BG_Allocations_Fin")
arc.write(final_fc,df)
print("Export complete.")
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment