## map_mutate_solution.Rmd
I asked [a question on Stack Overflow][soq] about a super-annoying problem I was experiencing.
I created a [reprex][repr] for it and posted it [as a gist here][gist1], but in the end I did not need to point to the whole reprex: the slightly edited, shorter version I posted in the SO question was sufficient.

Within a matter of minutes the question had received a very accurate and helpful reply from [Eugene Chong][ec_up].

[soq]: https://stackoverflow.com/questions/60155799/how-can-i-use-map-and-mutate-to-convert-a-list-into-a-set-of-additional-columns
[repr]: https://reprex.tidyverse.org/articles/reprex-dos-and-donts.html
[gist1]: https://gist.github.com/francisbarton/3c9f755a7f17ce5624edb9d4da0f4f59
[ec_up]: https://www.design.upenn.edu/city-regional-planning/graduate/work/developing-new-metrics-transportation-safety-cyclist-and

I converted Eugene's answer into a function (`expand_table()`), then tested it by mapping it across my list of dataframes.

Once this was working (this didn't take long), I was then able to combine the three functions (`get_concept_list()`, `get_concept_info()` and `expand_table()`) into a single function `get_concept_data()`.

The whole bunch of code is given below.


```{r}
# load packages -----------------------------------------------------------

library(rlang)
library(dplyr)
library(tidyr)
library(magrittr)
library(purrr)
library(nomisr)
```

```{r}
# set up initial list of tibbles ------------------------------------------

df <- list(
  district_population = tibble(
    dataset_title = "Population estimates - local authority based by single year",
    dataset_id = "NM_2002_1"
  ),
  jsa_claimants = tibble(
    dataset_title = "Jobseeker's Allowance with rates and proportions",
    dataset_id = "NM_1_1"
  )
)
```

My initial functions:

```{r}
# function to get info on each concept (except geography) -----------------

get_concept_info <- function(df, concept_name) {
  dataset_id <- pluck(df, "dataset_id")
  nomis_overview(id = dataset_id, select = "dimensions") %>%
    pluck("value", 1, "dimension") %>%
    filter(concept == concept_name) %>%
    pluck("codes.code", 1) %>%
    select(name, value) %>%
    nest(data = everything()) %>%
    as.list() %>%
    pluck("data")
}


# function to list the concepts for a dataset (except geography) ----------

get_concept_list <- function(df) {
  dataset_id <- pluck(df, "dataset_id")
  nomis_overview(id = dataset_id,
                 select = c("dimensions", "codes")) %>%
    pluck("value", 1, "dimension") %>%
    filter(concept != "geography") %>%
    pull("concept")
}
```
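
To see what the last few steps of `get_concept_info()` produce, here is a minimal offline sketch. The `codes` table is made up for illustration, and it only assumes that a concept's `codes.code` entry is a data frame with `name` and `value` columns; nothing is fetched from Nomis.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Made-up codes table standing in for one concept's `codes.code` entry
# (an assumption for illustration, not real Nomis output).
codes <- tibble(
  name  = c("Male", "Female"),
  value = c(1, 2),
  extra = c("x", "y")
)

cell <- codes %>%
  select(name, value) %>%          # keep only the two columns of interest
  nest(data = everything()) %>%    # one-row tibble with a list-column `data`
  as.list() %>%                    # named list with a single element, `data`
  pluck("data")                    # that element: a list holding one tibble

length(cell)
names(cell[[1]])
```

The result is a list of length one, which is what lets `mutate()` store the whole table in a single cell of a one-row tibble.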

Eugene's answer, turned into a function:
```{r}
expand_table <- function(df) {
  map(get_concept_list(df),
      ~ mutate(df,
               !!.x := get_concept_info(df, .x))) %>%
    reduce(left_join, by = c("dataset_title", "dataset_id"))
}

# apply expand_table() to each tibble in the list
df2 <- map(df, expand_table)
```
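
The pattern in `expand_table()` can be tried without touching the Nomis API. The sketch below is only an illustration: `toy`, `toy_info()` and the concept names `"sex"` and `"age"` are hypothetical stand-ins for one of my tibbles, for `get_concept_info()` and for the output of `get_concept_list()`.

```r
library(dplyr)
library(purrr)

# A one-row tibble standing in for one element of `df` (hypothetical data).
toy <- tibble(dataset_title = "Toy dataset", dataset_id = "TOY_1")

# Hypothetical stand-in for get_concept_info(): returns a one-element list
# holding a small tibble, so no network call is needed.
toy_info <- function(concept_name) {
  list(tibble(name = paste(concept_name, c("a", "b")), value = 1:2))
}

toy_expanded <- c("sex", "age") %>%
  # one copy of `toy` per concept, each with a new list-column;
  # `!!.x :=` uses the concept name as the name of that column
  map(~ mutate(toy, !!.x := toy_info(.x))) %>%
  # join the copies back together on the columns they all share
  reduce(left_join, by = c("dataset_title", "dataset_id"))

names(toy_expanded)
```

Each pass through `map()` adds one column, and `reduce()` with `left_join()` folds the resulting list of tibbles back into a single one-row tibble.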

I then wanted to make a single function to flow from dataset input to dataset output.
I have read that this is not best practice as longer functions are less easily parsed, and are harder to debug.
But as long as I am sure it is working right, it would be good to just have everything in one place.
I think.
Hmmm.
Anyway, it was a good bit of learning to assemble this.


I was aware that my two `get_concept*` functions above each made a call to `nomis_overview()`, a duplication that could easily be eliminated by storing the result the first time and reusing it within a single function.

I realised I could also get the combined function to deal with the `geography` concept in a separate call.
But I will leave that for now.
I think I might disassemble this back into separate, more specific functions (cf. the Unix philosophy), a process which is much easier to envisage and to manage once you know you have actually got it working overall.

Here's my final function:

```{r}
get_concept_data <- function(df) {
  dataset_id <- pluck(df, "dataset_id")
  # fetch the overview once and reuse it below
  dimensions <- nomis_overview(id = dataset_id,
                               select = c("dimensions", "codes")) %>%
    pluck("value", 1, "dimension")
  dimensions %>%
    filter(concept != "geography") %>%
    pull("concept") %>%
    # one copy of `df` per concept, each with a list-column named after it
    map(~ mutate(df,
                 !!.x := filter(dimensions, concept == .x) %>%
                   pluck("codes.code", 1) %>%
                   select(name, value) %>%
                   nest(data = everything()) %>%
                   as.list() %>%
                   pluck("data"))) %>%
    reduce(left_join, by = c("dataset_title", "dataset_id"))
}

# apply get_concept_data() to each tibble in the list
df3 <- map(df, get_concept_data)
```
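
For what it's worth, splitting `get_concept_data()` back into smaller pieces might look something like the sketch below. The helper names (`list_concepts()`, `concept_codes()`, `add_concepts()`) are hypothetical, and `dims` is a hand-built stand-in for the table that `nomis_overview(...) %>% pluck("value", 1, "dimension")` returns, assuming it has a `concept` column and a `codes.code` list-column as in the code above. Nothing here calls Nomis.

```r
library(dplyr)
library(purrr)

# Hand-built stand-in for the overview's dimension table (assumed shape).
dims <- tibble(
  concept = c("geography", "sex", "age"),
  codes.code = list(
    tibble(name = "Anywhere", value = 1),
    tibble(name = c("Male", "Female"), value = c(1, 2)),
    tibble(name = c("0-15", "16+"), value = c(1, 2))
  )
)

# Hypothetical helper: concept names, leaving out geography.
list_concepts <- function(dims) {
  dims %>%
    filter(concept != "geography") %>%
    pull(concept)
}

# Hypothetical helper: the codes for one concept, wrapped in a list so the
# table fits in a single list-column cell.
concept_codes <- function(dims, concept_name) {
  dims %>%
    filter(concept == concept_name) %>%
    pluck("codes.code", 1) %>%
    select(name, value) %>%
    list()
}

# Hypothetical helper: add one list-column per concept to a one-row tibble.
add_concepts <- function(df, dims) {
  list_concepts(dims) %>%
    map(~ mutate(df, !!.x := concept_codes(dims, .x))) %>%
    reduce(left_join, by = c("dataset_title", "dataset_id"))
}

one_dataset <- tibble(dataset_title = "Toy dataset", dataset_id = "TOY_1")
split_result <- add_concepts(one_dataset, dims)
names(split_result)
```

With real data, `dims` would be fetched once per dataset and passed to `add_concepts()`, so the overview is still only requested once.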