Created
July 27, 2017 18:04
---
title: "A new analysis workflow"
output: github_document
---

# Organize your data processing program with MECE pieces

*MECE = Mutually exclusive, collectively exhaustive. From McKinsey.*

## Summary

When processing an input dataset, avoid creating a chain of copies with names like `data1`, `data2`, `data3`, which is hard to maintain. Instead, create mutually exclusive pieces and merge them together at the end.

## Details

You often read in a dataset and then need to manipulate it in several steps. The approach often taught in school is to save a new numbered copy at each step. It looks something like this:

```{r}
library(dplyr)

asl <- read.csv("asl.csv")
## each step saves a new numbered copy of the whole dataset
asl1 <- asl %>% mutate(newvar = oldvar / 12)
asl2 <- asl1 %>% mutate(usubjid = pt)
asl_final <- asl2
```

Problems with this approach:

* it is hard to keep track of all of these numbered copies
* if a step is added, removed, or reordered, you have to renumber every copy that follows

## A better approach

At each step, create a mutually exclusive data frame that contains only what that step needs, and at the end, merge all the pieces together. Use informative names for these pieces.

Advantages:

* no need to constantly reorder and rename pieces that end in numbers

Here is a real example:

```{r}
asl <- get_csv("data/clinical/asl.csv")

## STEP 1: process the input dataset using MECE pieces

## one piece:
asl_study_flags <- asl %>%
  select(usubjid, studyid) %>%
  mutate(...) %>%
  select(-studyid)

## another piece:
asl_new_censor_vars <- asl %>%
  select(usubjid, oscnsr, pfscnsr) %>%
  mutate(...)

## another piece:
asl_biomarker_flags <- asl %>%
  select(usubjid) %>%
  left_join(...) %>%
  mutate(...)

## STEP 2: at the end, join the mutually exclusive pieces
asl_edited <- asl %>%
  left_join(asl_study_flags) %>%
  left_join(asl_biomarker_flags) %>%
  left_join(asl_new_censor_vars)
```
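
The example above elides the `mutate()` and `left_join()` details, so here is a minimal runnable sketch of the same pattern on a made-up three-row dataset. The columns `os_days`, `is_study_a`, and `os_years` and the piece name `asl_os_years` are hypothetical, chosen only for illustration. The `stopifnot()` check is one possible way to confirm that the pieces are mutually exclusive; it is an assumption of this sketch, not part of the original workflow.

```r
library(dplyr)

# Toy stand-in for the clinical dataset; all values are made up.
asl <- data.frame(
  usubjid = c("S1", "S2", "S3"),
  studyid = c("A", "A", "B"),
  os_days = c(365, 730, 1095)
)

# Piece 1: a flag derived from studyid; keep only the key and the new column.
asl_study_flags <- asl %>%
  select(usubjid, studyid) %>%
  mutate(is_study_a = studyid == "A") %>%
  select(-studyid)

# Piece 2: a new variable derived from os_days; again key plus new column only.
asl_os_years <- asl %>%
  select(usubjid, os_days) %>%
  mutate(os_years = os_days / 365) %>%
  select(-os_days)

# Check mutual exclusivity: apart from the key, no new column appears in
# more than one piece, and none collides with a column already in asl.
pieces <- list(asl_study_flags, asl_os_years)
new_cols <- unlist(lapply(pieces, function(p) setdiff(names(p), "usubjid")))
stopifnot(!any(duplicated(new_cols)), !any(new_cols %in% names(asl)))

# Join the pieces back onto the input by the subject key.
asl_edited <- asl %>%
  left_join(asl_study_flags, by = "usubjid") %>%
  left_join(asl_os_years, by = "usubjid")
```

Because each piece carries only the key and its own new columns, the pieces can be joined in any order, and adding or dropping a piece does not require renaming anything else.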