@francisbarton
Last active October 20, 2021 03:23
solution to my terrible mutate problem, via Eugene Chong on StackOverflow
I asked [a question on Stack Overflow][soq] about a super-annoying problem I was experiencing.
I created a [reprex][repr] for it and posted it [as a gist here][gist1], but in the end I did not need to point to the whole reprex: the slightly edited, shorter reprex I posted in the SO question was sufficient.
Within a matter of minutes the question had received a very accurate and helpful reply from [Eugene Chong][ec_up].
[soq]: https://stackoverflow.com/questions/60155799/how-can-i-use-map-and-mutate-to-convert-a-list-into-a-set-of-additional-columns
[repr]: https://reprex.tidyverse.org/articles/reprex-dos-and-donts.html
[gist1]: https://gist.github.com/francisbarton/3c9f755a7f17ce5624edb9d4da0f4f59
[ec_up]: https://www.design.upenn.edu/city-regional-planning/graduate/work/developing-new-metrics-transportation-safety-cyclist-and
I converted Eugene's answer into a function (`expand_table()`) and tested it by mapping it across my list of dataframes.
Once this was working (this didn't take long), I was then able to combine the three functions (`get_concept_list()`, `get_concept_info()` and `expand_table()`) into a single function `get_concept_data()`.
The whole bunch of code is given below.
```{r}
# load packages -----------------------------------------------------------
library(rlang)
library(dplyr)
library(tidyr)
library(magrittr)
library(purrr)
library(nomisr)
```
```{r}
# set up initial list of tibbles ------------------------------------------
df <- list(
  district_population = tibble(
    dataset_title = "Population estimates - local authority based by single year",
    dataset_id = "NM_2002_1"
  ),
  jsa_claimants = tibble(
    dataset_title = "Jobseeker's Allowance with rates and proportions",
    dataset_id = "NM_1_1"
  )
)
```
My initial functions:
```{r}
# function to get info on each concept (except geography) -----------------
get_concept_info <- function(df, concept_name) {
  dataset_id <- pluck(df, "dataset_id")

  nomis_overview(id = dataset_id, select = "dimensions") %>%
    pluck("value", 1, "dimension") %>%
    filter(concept == concept_name) %>%
    pluck("codes.code", 1) %>%
    select(name, value) %>%
    # wrap the codes table in a one-element list so it fits in a single cell
    nest(data = everything()) %>%
    as.list() %>%
    pluck("data")
}

# function to list a dataset's concepts (except geography) ----------------
get_concept_list <- function(df) {
  dataset_id <- pluck(df, "dataset_id")

  nomis_overview(
    id = dataset_id,
    select = c("dimensions", "codes")
  ) %>%
    pluck("value", 1, "dimension") %>%
    filter(concept != "geography") %>%
    pull("concept")
}
```
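As a rough illustration of what the `nest()` / `as.list()` / `pluck("data")` tail of `get_concept_info()` hands back, here is a small offline sketch. The `toy_codes` table and the `toy_cell` name are made up for this example (they are not real Nomis output); the point is only that the result is a one-element list holding the whole codes table, which is the shape `mutate()` needs to fill a single list-column cell.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical codes table standing in for one concept's "codes.code" entry
toy_codes <- tibble(name = c("Male", "Female"), value = c("1", "2"))

toy_cell <- toy_codes %>%
  select(name, value) %>%
  nest(data = everything()) %>%  # one row, with one list-column called `data`
  as.list() %>%                  # a named list: list(data = <list-column>)
  pluck("data")                  # the list-column itself

length(toy_cell)  # number of elements in the list
toy_cell[[1]]     # the codes table stored in the first element
```

If that reading is right, `list(toy_codes)` would give much the same shape more directly, though the original pipeline is kept as it is here.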
Eugene's answer, turned into a function:
```{r}
expand_table <- function(df) {
  map(
    get_concept_list(df),
    ~ mutate(df, !!.x := get_concept_info(df, .x))
  ) %>%
    reduce(left_join, by = c("dataset_title", "dataset_id"))
}

df2 <- map(df, expand_table)
```
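The pattern in `expand_table()` can be tried without calling the Nomis API at all. The sketch below uses made-up stand-ins (`toy_df`, `toy_concepts` and `toy_info()` are hypothetical names, not part of `nomisr`) to show the three steps: `map()` builds one copy of the table per concept, `!!.x :=` names the new list-column after that concept, and `reduce(left_join, ...)` stitches the copies back together on the shared key columns.

```r
library(dplyr)
library(purrr)
library(rlang)

# Hypothetical one-row table, shaped like one element of `df` above
toy_df <- tibble(dataset_title = "Toy dataset", dataset_id = "TOY_1")

# Hypothetical concept names, standing in for get_concept_list()
toy_concepts <- c("sex", "age")

# Hypothetical stand-in for get_concept_info(): a one-element list
# holding a small name/value table for the given concept
toy_info <- function(concept_name) {
  list(tibble(name = paste(concept_name, c("a", "b")), value = 1:2))
}

toy_expanded <- toy_concepts %>%
  # one copy of toy_df per concept, each with a column named after it
  map(~ mutate(toy_df, !!.x := toy_info(.x))) %>%
  # join the copies back into a single row on the shared key columns
  reduce(left_join, by = c("dataset_title", "dataset_id"))

names(toy_expanded)
```

Each new column is a list-column, so `toy_expanded$sex[[1]]` gives back the small table for that concept.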
I then wanted to make a single function to flow from dataset input to dataset output.
I have read that this is not best practice as longer functions are less easily parsed, and are harder to debug.
But as long as I am sure it is working right, it would be good to just have everything in one place.
I think.
Hmmm.
Anyway, it was a good bit of learning to assemble this.
I was aware that in my two `get_concept*` functions above I was duplicating a call to `nomis_overview()`, a duplication that could easily be eliminated by storing the result first time within the function.
I realised I could also get this function to deal with the `geography` concept in a different call.
But I will leave that for now.
I think I might disassemble this back into separate, more specific functions (cf. the Unix philosophy), a process which is much easier to envisage and to manage once you know you have actually got the whole thing working.
Here's my final function:
```{r}
get_concept_data <- function(df) {
  dataset_id <- pluck(df, "dataset_id")

  # call nomis_overview() once and reuse the result below
  data <- nomis_overview(
    id = dataset_id,
    select = c("dimensions", "codes")
  ) %>%
    pluck("value", 1, "dimension")

  data %>%
    filter(concept != "geography") %>%
    pull("concept") %>%
    map(~ mutate(
      df,
      !!.x := filter(.data = data, concept == .x) %>%
        pluck("codes.code", 1) %>%
        select(name, value) %>%
        nest(data = everything()) %>%
        as.list() %>%
        pluck("data")
    )) %>%
    reduce(left_join, by = c("dataset_title", "dataset_id"))
}

df3 <- map(df, get_concept_data)
```