cokelly/USA Census Data

## USA Census Data
---
title: Child Poverty and Race, USA 2016
author: Ciaran
date: '2018-05-01'
slug: child-poverty-and-race-usa
categories:
  - rstats
tags:
  - tidy_tuesday
  - inequality
  - race
header:
  caption: ''
  image: ''
---


```{r setup, echo=FALSE, include=FALSE}
knitr::opts_chunk$set(cache = TRUE,
                      echo = FALSE)

library(tidyverse)
library(cowplot)
library(kableExtra)

```

This week's [Tidy Tuesday](https://github.com/rfordatascience/tidytuesday) invited us to examine county-level census data from 2016 in the United States (sourced [here](https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml)). I have been poking around in the data and discovered an interesting visual artefact in the following otherwise unsurprising quick plot:

```{r poverty_unemployment}

county_data_with_region <- county_data %>% # Divide states by regions
      mutate(Region = case_when(State ==  "Connecticut" |
                                      State ==  "Maine" |
                                      State ==  "Massachusetts" |
                                      State ==  "New Hampshire" |
                                      State ==  "Rhode Island" |
                                      State == "Vermont" |
                                      State ==  "New Jersey" |
                                      State ==  "New York" |
                                      State == "Pennsylvania" ~ "Northeast",
                                State ==  "Illinois" |
                                      State == "Indiana" |
                                      State == "Michigan" |
                                      State == "Ohio" |
                                      State == "Wisconsin" |
                                      State == "Iowa" |
                                      State == "Kansas" |
                                      State == "Minnesota" |
                                      State == "Missouri" |
                                      State == "Nebraska" |
                                      State == "North Dakota" |
                                      State == "South Dakota" ~ "Midwest",
                                State ==  "Delaware" |
                                      State == "Florida" |
                                      State == "Georgia" |
                                      State == "Maryland" |
                                      State == "North Carolina" |
                                      State == "South Carolina" |
                                      State == "Virginia" |
                                      State == "District of Columbia" |
                                      State == "West Virginia" |
                                      State == "Alabama" |
                                      State == "Kentucky" |
                                      State == "Mississippi" |
                                      State == "Tennessee" |
                                      State == "Arkansas" |
                                      State == "Louisiana" |
                                      State == "Oklahoma" |
                                      State == "Texas" ~ "South",
                                State == "Arizona" |
                                      State == "Colorado" |
                                      State == "Idaho" |
                                      State == "Montana" |
                                      State == "Nevada" |
                                      State == "New Mexico" |
                                      State == "Utah" |
                                      State == "Wyoming" |
                                      State == "Alaska" |
                                      State == "California" |
                                      State == "Hawaii" |
                                      State == "Oregon" |
                                      State == "Washington"  ~ "West",
                                TRUE ~ "Puerto Rico")) %>%
      filter(Region != "Puerto Rico")
```

```{r make_plot}
poverty_plot <- ggplot(county_data_with_region, aes(White, ChildPoverty)) + # Compare Child Poverty to total white people, by region
      geom_point(na.rm = TRUE, alpha = 0.2, colour = "lightblue") +
      geom_smooth(method = loess, na.rm = TRUE) +
      scale_color_brewer(type = "seq", palette = "Blues") +
      facet_grid(. ~ Region) +
      background_grid(major = "xy", minor = "none") +
      ggtitle("Child poverty falls as the proportion of white people increases.", subtitle = "Note the 'tipping point' at about 75White (%) in the South and Northeast") +
      xlab("White (%)") +
      ylab("% children in poverty")


poverty_plot
```

I split the USA's `r prettyNum(nrow(county_data_with_region), big.mark = ",")` counties (excluding Puerto Rico) into regions, based on the US Census Bureau's [four statistical regions](https://www2.census.gov/geo/pdfs/maps-data/maps/reference/us_regdiv.pdf) (pdf). I excluded Puetro Rico because its racial breakdown does not fit the exercise here. And I chose to classify counties by White (%) because that seems the best way to capture the percentage of the population from racial minorities. No doubt this is a hugely crude way to go about the exercise.

Anyway, even with my basic familiarity with the United States I am hardly surprised to see child poverty correlate negatively with the proportion of the population that is white. What _is_ surprising I think is the way that child poverty seems higher in the Northeast and especially in the South in more homogeneous counties.

Still, we have to be careful. When we look at the Southern counties what we are seeing here _might_ relate to a combination of [rural poverty](https://www.prb.org/childpoverty/) and an uneven distribution of people by race. Indeed, it is very striking _how_ segregated the United States are.

```{r segregation}
county_data_south <- county_data_with_region %>% filter(Region == "South")

segregation <- county_data_south %>%
      mutate(segregated = case_when(Hispanic >= 75 | #Test for any county having a 75% of more racial share
                                          White >= 75 |
                                          Black >= 75 |
                                          Native >= 75 |
                                          Asian >= 75 |
                                          Pacific >= 75 ~ "Yes", TRUE ~ "No")) %>%
      group_by(segregated, State) %>% # Group by whether segregated plus by state
      summarise(Segregated2 = n()) %>%
      tidyr::spread(., segregated, Segregated2) %>% # Spread to "yes" or "no" columns
      mutate_all(funs(replace(., is.na(.), 0))) %>% # Replace NAs
      mutate(Counties = (Yes+No)) %>% # Total number of counties
      mutate(`Segregated (%)` = round((Yes/Counties)*100, digits = 0)) %>% #Get segregated %
      select(State, Counties, Segregated = Yes, `Not Segregated` = No, `Segregated (%)`) %>%
      arrange(desc(`Segregated (%)`))

knitr::kable(segregation, format = "html", align = "l", caption = "Counties with population composed of 75% or more from one racial group") %>%
      kable_styling(full_width = FALSE, bootstrap_options = "striped")


```


But note also that each state (and I imagine at more granular levels than that) has its own population distributions:

```{r plot2, warning=FALSE}


south_plot <- ggplot(county_data_south, aes(White, ChildPoverty)) +
      geom_point(na.rm = TRUE, alpha = 0.2, colour = "lightblue") +
      geom_smooth(method = loess, na.rm = TRUE) +
      facet_wrap(~ State, ncol = 4) +
      scale_color_brewer(type = "seq", palette = "Blues") +
      background_grid(major = "xy", minor = "none") +
      xlab("White (%)") +
      ylab("% children in poverty") +
      theme(axis.ticks.y = element_blank(),
            axis.text.y = element_blank())


south_plot
```

So, as always, more study required.

Gist [here](https://gist.github.com/cokelly/0f6a6cc73d372989408d0fab73c7de04).
	---
	title: Child Poverty and Race, USA 2016
	author: Ciaran
	date: '2018-05-01'
	slug: child-poverty-and-race-usa
	categories:
	- rstats
	tags:
	- tidy_tuesday
	- inequality
	- race
	header:
	caption: ''
	image: ''
	---


	```{r setup, echo=FALSE, include=FALSE}
	knitr::opts_chunk$set(cache = TRUE,
	echo = FALSE)

	library(tidyverse)
	library(cowplot)
	library(kableExtra)

	```

	This week's [Tidy Tuesday](https://github.com/rfordatascience/tidytuesday) invited us to examine county-level census data from 2016 in the United States (sourced [here](https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml)). I have been poking around in the data and discovered an interesting visual artefact in the following otherwise unsurprising quick plot:

	```{r poverty_unemployment}

	county_data_with_region <- county_data %>% # Divide states by regions
	mutate(Region = case_when(State == "Connecticut" \|
	State == "Maine" \|
	State == "Massachusetts" \|
	State == "New Hampshire" \|
	State == "Rhode Island" \|
	State == "Vermont" \|
	State == "New Jersey" \|
	State == "New York" \|
	State == "Pennsylvania" ~ "Northeast",
	State == "Illinois" \|
	State == "Indiana" \|
	State == "Michigan" \|
	State == "Ohio" \|
	State == "Wisconsin" \|
	State == "Iowa" \|
	State == "Kansas" \|
	State == "Minnesota" \|
	State == "Missouri" \|
	State == "Nebraska" \|
	State == "North Dakota" \|
	State == "South Dakota" ~ "Midwest",
	State == "Delaware" \|
	State == "Florida" \|
	State == "Georgia" \|
	State == "Maryland" \|
	State == "North Carolina" \|
	State == "South Carolina" \|
	State == "Virginia" \|
	State == "District of Columbia" \|
	State == "West Virginia" \|
	State == "Alabama" \|
	State == "Kentucky" \|
	State == "Mississippi" \|
	State == "Tennessee" \|
	State == "Arkansas" \|
	State == "Louisiana" \|
	State == "Oklahoma" \|
	State == "Texas" ~ "South",
	State == "Arizona" \|
	State == "Colorado" \|
	State == "Idaho" \|
	State == "Montana" \|
	State == "Nevada" \|
	State == "New Mexico" \|
	State == "Utah" \|
	State == "Wyoming" \|
	State == "Alaska" \|
	State == "California" \|
	State == "Hawaii" \|
	State == "Oregon" \|
	State == "Washington" ~ "West",
	TRUE ~ "Puerto Rico")) %>%
	filter(Region != "Puerto Rico")
	```

	```{r make_plot}
	poverty_plot <- ggplot(county_data_with_region, aes(White, ChildPoverty)) + # Compare Child Poverty to total white people, by region
	geom_point(na.rm = TRUE, alpha = 0.2, colour = "lightblue") +
	geom_smooth(method = loess, na.rm = TRUE) +
	scale_color_brewer(type = "seq", palette = "Blues") +
	facet_grid(. ~ Region) +
	background_grid(major = "xy", minor = "none") +
	ggtitle("Child poverty falls as the proportion of white people increases.", subtitle = "Note the 'tipping point' at about 75White (%) in the South and Northeast") +
	xlab("White (%)") +
	ylab("% children in poverty")


	poverty_plot
	```

	I split the USA's `r prettyNum(nrow(county_data_with_region), big.mark = ",")` counties (excluding Puerto Rico) into regions, based on the US Census Bureau's [four statistical regions](https://www2.census.gov/geo/pdfs/maps-data/maps/reference/us_regdiv.pdf) (pdf). I excluded Puetro Rico because its racial breakdown does not fit the exercise here. And I chose to classify counties by White (%) because that seems the best way to capture the percentage of the population from racial minorities. No doubt this is a hugely crude way to go about the exercise.

	Anyway, even with my basic familiarity with the United States I am hardly surprised to see child poverty correlate negatively with the proportion of the population that is white. What _is_ surprising I think is the way that child poverty seems higher in the Northeast and especially in the South in more homogeneous counties.

	Still, we have to be careful. When we look at the Southern counties what we are seeing here _might_ relate to a combination of [rural poverty](https://www.prb.org/childpoverty/) and an uneven distribution of people by race. Indeed, it is very striking _how_ segregated the United States are.

	```{r segregation}
	county_data_south <- county_data_with_region %>% filter(Region == "South")

	segregation <- county_data_south %>%
	mutate(segregated = case_when(Hispanic >= 75 \| #Test for any county having a 75% of more racial share
	White >= 75 \|
	Black >= 75 \|
	Native >= 75 \|
	Asian >= 75 \|
	Pacific >= 75 ~ "Yes", TRUE ~ "No")) %>%
	group_by(segregated, State) %>% # Group by whether segregated plus by state
	summarise(Segregated2 = n()) %>%
	tidyr::spread(., segregated, Segregated2) %>% # Spread to "yes" or "no" columns
	mutate_all(funs(replace(., is.na(.), 0))) %>% # Replace NAs
	mutate(Counties = (Yes+No)) %>% # Total number of counties
	mutate(`Segregated (%)` = round((Yes/Counties)*100, digits = 0)) %>% #Get segregated %
	select(State, Counties, Segregated = Yes, `Not Segregated` = No, `Segregated (%)`) %>%
	arrange(desc(`Segregated (%)`))

	knitr::kable(segregation, format = "html", align = "l", caption = "Counties with population composed of 75% or more from one racial group") %>%
	kable_styling(full_width = FALSE, bootstrap_options = "striped")


	```




	But note also that each state (and I imagine at more granular levels than that) has its own population distributions:

	```{r plot2, warning=FALSE}


	south_plot <- ggplot(county_data_south, aes(White, ChildPoverty)) +
	geom_point(na.rm = TRUE, alpha = 0.2, colour = "lightblue") +
	geom_smooth(method = loess, na.rm = TRUE) +
	facet_wrap(~ State, ncol = 4) +
	scale_color_brewer(type = "seq", palette = "Blues") +
	background_grid(major = "xy", minor = "none") +
	xlab("White (%)") +
	ylab("% children in poverty") +
	theme(axis.ticks.y = element_blank(),
	axis.text.y = element_blank())


	south_plot
	```

	So, as always, more study required.

	Gist [here](https://gist.github.com/cokelly/0f6a6cc73d372989408d0fab73c7de04).