francisbarton/aea23b429688033af96ba1df96988a8a (last active October 20, 2021 03:23)

Solution to my terrible mutate problem, via Eugene Chong on Stack Overflow

I asked [a question on Stack Overflow][soq] about a super-annoying problem I was experiencing.
I created a [reprex][repr] for it and posted it [as a gist here][gist1], but in the end I did not need to point to the whole reprex: the slightly edited, shorter reprex I posted in the SO question was sufficient.
Within a matter of minutes the question had received a very accurate and helpful reply from [Eugene Chong][ec_up].

[soq]: https://stackoverflow.com/questions/60155799/how-can-i-use-map-and-mutate-to-convert-a-list-into-a-set-of-additional-columns
[repr]: https://reprex.tidyverse.org/articles/reprex-dos-and-donts.html
[gist1]: https://gist.github.com/francisbarton/3c9f755a7f17ce5624edb9d4da0f4f59
[ec_up]: https://www.design.upenn.edu/city-regional-planning/graduate/work/developing-new-metrics-transportation-safety-cyclist-and

I converted Eugene's answer into a function (`expand_table()`) and tested it, then tested it again by mapping it across my list of dataframes.
Once this was working (which didn't take long), I was able to combine the three functions (`get_concept_list()`, `get_concept_info()` and `expand_table()`) into a single function, `get_concept_data()`.
The full code is given below.

```{r}
# load packages -----------------------------------------------------------
library(rlang)
library(dplyr)
library(tidyr)
library(magrittr)
library(purrr)
library(nomisr)
```

```{r}
# set up initial list of tibbles ------------------------------------------
df <- list(
  district_population = tibble(
    dataset_title = "Population estimates - local authority based by single year",
    dataset_id = "NM_2002_1"
  ),
  jsa_claimants = tibble(
    dataset_title = "Jobseeker's Allowance with rates and proportions",
    dataset_id = "NM_1_1"
  )
)
```

My initial functions:

```{r}
# function to get info on each concept (except geography) -----------------
# returns a length-1 list holding a tibble of that concept's codes
get_concept_info <- function(df, concept_name) {
  dataset_id <- pluck(df, "dataset_id")
  nomis_overview(id = dataset_id, select = "dimensions") %>%
    pluck("value", 1, "dimension") %>%
    filter(concept == concept_name) %>%
    pluck("codes.code", 1) %>%
    select(name, value) %>%
    nest(data = everything()) %>%
    as.list() %>%
    pluck("data")
}

# function to list the names of all concepts except geography -------------
get_concept_list <- function(df) {
  dataset_id <- pluck(df, "dataset_id")
  nomis_overview(id = dataset_id,
                 select = c("dimensions", "codes")) %>%
    pluck("value", 1, "dimension") %>%
    filter(concept != "geography") %>%
    pull("concept")
}
```
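
To see what the tail of `get_concept_info()` hands back, here is a small offline sketch. The `codes` tibble is a made-up stand-in for whatever `pluck("codes.code", 1)` returns from the Nomis API (only the `name` and `value` columns are assumed, and the dataset id is hypothetical). The point is that `nest() %>% as.list() %>% pluck("data")` yields a length-1 list wrapping the codes tibble, which `mutate()` can then store in a one-row tibble as a list column.

```{r}
library(dplyr)
library(tidyr)
library(purrr)

# hypothetical stand-in for the codes table plucked from "codes.code"
codes <- tibble(
  name  = c("All persons", "Male", "Female"),
  value = c(0, 1, 2),
  extra = c("a", "b", "c") # dropped by select(), as in get_concept_info()
)

packed <- codes %>%
  select(name, value) %>%
  nest(data = everything()) %>% # one-row tibble with a list column `data`
  as.list() %>%                 # plain named list of columns
  pluck("data")                 # the list column itself, of length 1

# a length-1 list recycles to fill a one-row tibble as a list column
out <- tibble(dataset_id = "EXAMPLE_1") %>% # hypothetical dataset id
  mutate(sex = packed)
```
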

Eugene's answer, turned into a function:

```{r}
# add one list column per concept, then join the copies back into one row
expand_table <- function(df) {
  map(get_concept_list(df),
      ~ mutate(df,
               !!.x := get_concept_info(df, .x))) %>%
    reduce(left_join, by = c("dataset_title", "dataset_id"))
}

# apply to every tibble in the list
df2 <- map(df, expand_table)
```
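
Because both helpers call the Nomis API, here is an offline sketch of the same map, mutate and reduce pattern, with hypothetical stand-ins (`fake_concept_list()` and `fake_concept_info()`, using made-up concepts `sex` and `age`) in place of the real functions. `map()` produces one copy of the input row per concept, each with one extra list column whose name comes from `!!.x :=`; `reduce(left_join, ...)` then joins those copies back together on the two key columns.

```{r}
library(dplyr)
library(purrr)

# hypothetical offline stand-ins for get_concept_list() and
# get_concept_info(); the concepts and codes are made up
fake_concept_list <- function(df) c("sex", "age")
fake_concept_info <- function(df, concept_name) {
  list(tibble(name = paste(concept_name, 1:2), value = 1:2))
}

expand_table_offline <- function(df) {
  fake_concept_list(df) %>%
    # one copy of `df` per concept, each with one extra list column;
    # `!!.x :=` uses the string in .x as the new column's name
    map(~ mutate(df, !!.x := fake_concept_info(df, .x))) %>%
    # join the copies back together on the shared key columns
    reduce(left_join, by = c("dataset_title", "dataset_id"))
}

tables <- list(
  a = tibble(dataset_title = "Example A", dataset_id = "A_1"),
  b = tibble(dataset_title = "Example B", dataset_id = "B_1")
)
expanded <- map(tables, expand_table_offline)
```
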

I then wanted to make a single function to flow from dataset input to dataset output.
I have read that this is not best practice, as longer functions are harder to read and harder to debug.
But as long as I am sure it is working right, it would be good to just have everything in one place.
I think.
Hmm.
Anyway, it was a good bit of learning to assemble this.

I was aware that my two `get_concept*` functions above each made their own call to `nomis_overview()`, a duplication that could easily be eliminated by making the call once and storing the result within a single function.
I realised I could also get this function to handle the `geography` concept, in a separate call.
But I will leave that for now.
I think I might disassemble this back into separate, more specific functions (cf. the Unix philosophy), a process that is much easier to envisage and manage once you know you have actually got it working overall.

Here's my final function:

```{r}
get_concept_data <- function(df) {
  dataset_id <- pluck(df, "dataset_id")

  # a single API call, reused for every concept below
  data <- nomis_overview(id = dataset_id,
                         select = c("dimensions", "codes")) %>%
    pluck("value", 1, "dimension")

  data %>%
    filter(concept != "geography") %>%
    pull("concept") %>%
    # for each concept name in .x, add a list column named after it
    map(~ mutate(df,
                 !!.x := filter(.data = data, concept == .x) %>%
                   pluck("codes.code", 1) %>%
                   select(name, value) %>%
                   nest(data = everything()) %>%
                   as.list() %>%
                   pluck("data"))) %>%
    reduce(left_join, by = c("dataset_title", "dataset_id"))
}

df3 <- map(df, get_concept_data)
```
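
Following the idea of splitting this back into smaller functions, here is one possible sketch that separates the single `nomis_overview()` call from the pure reshaping step, so the reshaping can be tested offline. `fetch_dimensions()`, `add_concept_columns()` and `get_concept_data2()` are hypothetical names, and the `dims` tibble is a made-up imitation of the `dimension` table, assuming it has a `concept` column and a `codes.code` list column of code tables with `name` and `value` columns.

```{r}
library(dplyr)
library(tidyr)
library(purrr)

# impure step (hypothetical helper): the one API call; needs nomisr and a
# network connection, so it is only defined here, not run
fetch_dimensions <- function(dataset_id) {
  nomisr::nomis_overview(id = dataset_id,
                         select = c("dimensions", "codes")) %>%
    pluck("value", 1, "dimension")
}

# pure step (hypothetical helper): add one list column per non-geography
# concept, using a dimension table that has already been fetched
add_concept_columns <- function(df, dims) {
  dims %>%
    filter(concept != "geography") %>%
    pull("concept") %>%
    map(~ mutate(df,
                 !!.x := dims %>%
                   filter(concept == .x) %>%
                   pluck("codes.code", 1) %>%
                   select(name, value) %>%
                   nest(data = everything()) %>%
                   pull(data))) %>% # same as as.list() %>% pluck("data")
    reduce(left_join, by = c("dataset_title", "dataset_id"))
}

# hypothetical recombination of the two steps
get_concept_data2 <- function(df) {
  add_concept_columns(df, fetch_dimensions(pluck(df, "dataset_id")))
}

# offline check with a made-up dimension table
dims <- tibble(
  concept = c("geography", "sex", "age"),
  codes.code = list(
    tibble(name = "England", value = 1),
    tibble(name = c("Male", "Female"), value = c(1, 2), label = "x"),
    tibble(name = c("0-15", "16+"), value = c(1, 2))
  )
)
out <- add_concept_columns(
  tibble(dataset_title = "Example", dataset_id = "EX_1"), dims
)
```

With the live API this would be applied just as before, e.g. `map(df, get_concept_data2)`.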