d-wasserman/Census_Proportional_Downsampling.Rmd

## Census_Proportional_Downsampling.Rmd
---
title: "Census Proportional Downsampling"
author: "David Wasserman"
date: "August 6, 2018"
purpose: "Take input census blocks and census block groups. Develop spatial joins of relationships and proportionally allocate Census block group estimates to blocks based on block proportions."
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::knit_engines$set(python=reticulate::eng_python)
library(ggplot2)
library(arcgisbinding)
library(tidyr)
library(reticulate)
library(dplyr)
arc.check_product()
```

## Census Down Sampling
This R notebook downsamples ACS Estimates to Census Block Group level based on the relative proproportions of population and households in 2010 blocks that make up each block group.

Process is as follows:

1. Spatially join Census BG to Census Blocks by Centroid
2. Identify the percent of population and households each census block is of each block group geography
3. Apply those weights to the appropriate columns from ACS Estimates (Getting approximate households from ACS based on 2010 Housing Units)

```{python engine.path="C:/Users/dwasserman/AppData/Local/ESRI/conda/envs/geoscipy/python.exe"}
import arcpy
import os
arcpy.env.overwriteOutput = True
workspace = r"./Project_Data.gdb/BaseData"
census_bg= os.path.join(workspace, "Bay_Area_Census_BG_2016_5YR_WData")
blocks = os.path.join(workspace, "Bay_Area_Census_Blocks_2010_WJobs")
out_fc = os.path.join(workspace,"Bay_Area_Census_Blocks_WBG_Data")
print("Processing spatial join...")
def generate_statistical_fieldmap(target_features, join_features, prepended_name="_", merge_rule_dict={}):
    """Generates field map object based on passed field objects based on passed tables (list),
    input_field_objects (list), and passed statistics fields to choose for numeric and categorical variables. Output
    fields take the form of *merge rule*+*prepended_name*+*fieldname*
    :params
    target_features(str): target feature class that will maintain its field attributes
    join_features(str): join feature class whose numeric fields will be joined based on the merge rule dictionary
    prepended_name(str): modifies output join fields with param text between the statistics and the original field name
    merge_rule_dict (dict): a  dictionary of the form {statistic_type:[Fields,To,Summarize]}
    :returns arcpy field mapping object"""
    field_mappings = arcpy.FieldMappings()
    # We want every field in 'target_features' and all fields in join_features that are present
    # in the field statistics mappping.
    field_mappings.addTable(target_features)
    for merge_rule in merge_rule_dict:
        for field in merge_rule_dict[merge_rule]:
            new_field_map = arcpy.FieldMap()
            new_field_map.addInputField(join_features, field)
            new_field_map.mergeRule = merge_rule
            out_field = new_field_map.outputField
            out_field.name = str(merge_rule) + str(prepended_name) + str(field)
            out_field.aliasName = str(merge_rule) + str(prepended_name) + str(field)
            new_field_map.outputField = out_field
            field_mappings.addFieldMap(new_field_map)
    return field_mappings
fmap = generate_statistical_fieldmap(blocks,census_bg,merge_rule_dict={"FIRST":["GEOID","B01001e1","B11001e1"]})
arcpy.SpatialJoin_analysis(blocks,census_bg,out_fc,match_option="HAVE_THEIR_CENTER_IN",field_mapping=fmap)
print("Output FC Joined:",out_fc)
```


```{r  ,echo=FALSE}
workspace = "./Project_Data.gdb/BaseData"
data.path = file.path(workspace,"Bay_Area_Census_Blocks_WBG_Data")
print("Block Group Columns for Population & Households:  B01001e1 | B11001e1")
print("Block Data Columns for Population & Households: POP10 | HOUSING10")
arcds = arc.open(data.path)
df <- arc.select(arcds)
print(arc.shapeinfo(arc.shape(df)))
```
```{r , echo=FALSE}
head(df)
```
```{r , echo=FALSE}
df <- group_by(df,FIRST_GEOID) %>%
  mutate(POP_BLKGRP_PERCENT = POP10/sum(POP10)) %>%
  mutate(HH_BLKGRP_PERCENT = HOUSING10/sum(HOUSING10)) %>%
  mutate(POP_BLKGRP_ALLOC= POP_BLKGRP_PERCENT*FIRST_B01001e1) %>%
  mutate(HH_BLKGRP_ALLOC= HH_BLKGRP_PERCENT*FIRST_B11001e1) %>%
  mutate(POP_BLKGRP_ALRND= round(POP_BLKGRP_PERCENT*FIRST_B01001e1)) %>%
  mutate(HH_BLKGRP_ALRND= round(HH_BLKGRP_PERCENT*FIRST_B11001e1)) %>%
  ungroup(df)
head(df)
```
## Review Plots

The following R code provides data review plots.
```{r , echo=FALSE}
df.totals  <- select(df,POP10,HOUSING10,POP_BLKGRP_ALLOC,HH_BLKGRP_ALLOC,POP_BLKGRP_ALRND,HH_BLKGRP_ALRND) %>%
  gather(BLKPOP10=POP10,BLKPOPAlloc=POP_BLKGRP_ALLOC,BLKPOPAlrnd=POP_BLKGRP_ALRND,BLKHOUSING10=HOUSING10, BLKHHAlloc=HH_BLKGRP_ALLOC,   BLKHHAlrnd=HH_BLKGRP_ALRND) %>%
  group_by(key) %>% summarize(Totals = sum(value,na.rm=TRUE),Std = sd(value,na.rm=TRUE), NaNs = sum(is.na(value)))
df.totals["Class"] <- c("HH","HH","HH", "POP", "POP", "POP")
df.totals

```
```{r , echo=FALSE}
plot <- ggplot(data = df.totals, aes(x=key, y=Totals,fill=Class)) +
  geom_bar(stat="identity", position=position_dodge()) +
  geom_errorbar(aes(ymin=Totals-Std*2, ymax=Totals+Std*2), width=.5, position=position_dodge(.9)) +
  labs(title="Control Totals for Housing and Population", y = "Totals", x="Allocation Sub-Groups") +
  theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))

plot

```
Export the results to shapefile.
```{r , echo=FALSE}
final_fc <- file.path(workspace,"Census_Blocks_2010_With_BG_Allocations_Fin")
arc.write(final_fc,df)
print("Export complete.")

```
	---
	title: "Census Proportional Downsampling"
	author: "David Wasserman"
	date: "August 6, 2018"
	purpose: "Take input census blocks and census block groups. Develop spatial joins of relationships and proportionally allocate Census block group estimates to blocks based on block proportions."
	output: html_document
	---

	```{r setup, include=FALSE}
	knitr::opts_chunk$set(echo = TRUE)
	knitr::knit_engines$set(python=reticulate::eng_python)
	library(ggplot2)
	library(arcgisbinding)
	library(tidyr)
	library(reticulate)
	library(dplyr)
	arc.check_product()
	```

	## Census Down Sampling
	This R notebook downsamples ACS Estimates to Census Block Group level based on the relative proproportions of population and households in 2010 blocks that make up each block group.

	Process is as follows:

	1. Spatially join Census BG to Census Blocks by Centroid
	2. Identify the percent of population and households each census block is of each block group geography
	3. Apply those weights to the appropriate columns from ACS Estimates (Getting approximate households from ACS based on 2010 Housing Units)

	```{python engine.path="C:/Users/dwasserman/AppData/Local/ESRI/conda/envs/geoscipy/python.exe"}
	import arcpy
	import os
	arcpy.env.overwriteOutput = True
	workspace = r"./Project_Data.gdb/BaseData"
	census_bg= os.path.join(workspace, "Bay_Area_Census_BG_2016_5YR_WData")
	blocks = os.path.join(workspace, "Bay_Area_Census_Blocks_2010_WJobs")
	out_fc = os.path.join(workspace,"Bay_Area_Census_Blocks_WBG_Data")
	print("Processing spatial join...")
	def generate_statistical_fieldmap(target_features, join_features, prepended_name="_", merge_rule_dict={}):
	"""Generates field map object based on passed field objects based on passed tables (list),
	input_field_objects (list), and passed statistics fields to choose for numeric and categorical variables. Output
	fields take the form of merge rule+prepended_name+fieldname
	:params
	target_features(str): target feature class that will maintain its field attributes
	join_features(str): join feature class whose numeric fields will be joined based on the merge rule dictionary
	prepended_name(str): modifies output join fields with param text between the statistics and the original field name
	merge_rule_dict (dict): a dictionary of the form {statistic_type:[Fields,To,Summarize]}
	:returns arcpy field mapping object"""
	field_mappings = arcpy.FieldMappings()
	# We want every field in 'target_features' and all fields in join_features that are present
	# in the field statistics mappping.
	field_mappings.addTable(target_features)
	for merge_rule in merge_rule_dict:
	for field in merge_rule_dict[merge_rule]:
	new_field_map = arcpy.FieldMap()
	new_field_map.addInputField(join_features, field)
	new_field_map.mergeRule = merge_rule
	out_field = new_field_map.outputField
	out_field.name = str(merge_rule) + str(prepended_name) + str(field)
	out_field.aliasName = str(merge_rule) + str(prepended_name) + str(field)
	new_field_map.outputField = out_field
	field_mappings.addFieldMap(new_field_map)
	return field_mappings
	fmap = generate_statistical_fieldmap(blocks,census_bg,merge_rule_dict={"FIRST":["GEOID","B01001e1","B11001e1"]})
	arcpy.SpatialJoin_analysis(blocks,census_bg,out_fc,match_option="HAVE_THEIR_CENTER_IN",field_mapping=fmap)
	print("Output FC Joined:",out_fc)
	```



	```{r ,echo=FALSE}
	workspace = "./Project_Data.gdb/BaseData"
	data.path = file.path(workspace,"Bay_Area_Census_Blocks_WBG_Data")
	print("Block Group Columns for Population & Households: B01001e1 \| B11001e1")
	print("Block Data Columns for Population & Households: POP10 \| HOUSING10")
	arcds = arc.open(data.path)
	df <- arc.select(arcds)
	print(arc.shapeinfo(arc.shape(df)))
	```
	```{r , echo=FALSE}
	head(df)
	```
	```{r , echo=FALSE}
	df <- group_by(df,FIRST_GEOID) %>%
	mutate(POP_BLKGRP_PERCENT = POP10/sum(POP10)) %>%
	mutate(HH_BLKGRP_PERCENT = HOUSING10/sum(HOUSING10)) %>%
	mutate(POP_BLKGRP_ALLOC= POP_BLKGRP_PERCENT*FIRST_B01001e1) %>%
	mutate(HH_BLKGRP_ALLOC= HH_BLKGRP_PERCENT*FIRST_B11001e1) %>%
	mutate(POP_BLKGRP_ALRND= round(POP_BLKGRP_PERCENT*FIRST_B01001e1)) %>%
	mutate(HH_BLKGRP_ALRND= round(HH_BLKGRP_PERCENT*FIRST_B11001e1)) %>%
	ungroup(df)
	head(df)
	```
	## Review Plots

	The following R code provides data review plots.
	```{r , echo=FALSE}
	df.totals <- select(df,POP10,HOUSING10,POP_BLKGRP_ALLOC,HH_BLKGRP_ALLOC,POP_BLKGRP_ALRND,HH_BLKGRP_ALRND) %>%
	gather(BLKPOP10=POP10,BLKPOPAlloc=POP_BLKGRP_ALLOC,BLKPOPAlrnd=POP_BLKGRP_ALRND,BLKHOUSING10=HOUSING10, BLKHHAlloc=HH_BLKGRP_ALLOC, BLKHHAlrnd=HH_BLKGRP_ALRND) %>%
	group_by(key) %>% summarize(Totals = sum(value,na.rm=TRUE),Std = sd(value,na.rm=TRUE), NaNs = sum(is.na(value)))
	df.totals["Class"] <- c("HH","HH","HH", "POP", "POP", "POP")
	df.totals

	```
	```{r , echo=FALSE}
	plot <- ggplot(data = df.totals, aes(x=key, y=Totals,fill=Class)) +
	geom_bar(stat="identity", position=position_dodge()) +
	geom_errorbar(aes(ymin=Totals-Std2, ymax=Totals+Std2), width=.5, position=position_dodge(.9)) +
	labs(title="Control Totals for Housing and Population", y = "Totals", x="Allocation Sub-Groups") +
	theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))

	plot

	```
	Export the results to shapefile.
	```{r , echo=FALSE}
	final_fc <- file.path(workspace,"Census_Blocks_2010_With_BG_Allocations_Fin")
	arc.write(final_fc,df)
	print("Export complete.")

	```