@plpxsk
Created July 27, 2017 18:04
---
title: "A new analysis workflow"
output: github_document
---
# Organize your data processing program with MECE pieces
*MECE = mutually exclusive, collectively exhaustive. The term comes from McKinsey.*
## Summary
When processing an input dataset, avoid creating a chain of numbered copies
such as `data1`, `data2`, `data3`, which is hard to maintain. Instead, create
mutually exclusive pieces and merge them together at the end.
## Details
You often read in a dataset and then need to manipulate it in several steps.
This is often how you are taught to do it in school, and it looks something
like this:
```{r}
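library(dplyr)  # provides %>% and mutate()

asl <- read.csv("asl.csv")
# each step copies the previous object under a new numbered name
asl1 <- asl %>% mutate(newvar = oldvar / 12)
asl2 <- asl1 %>% mutate(usubjid = pt)
asl_final <- asl2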
```
Problems with this approach:
* it is hard to keep track of what each numbered object contains
* if a step is added, removed, or reordered, you have to renumber the names that follow it
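
To make the second problem concrete, here is a minimal sketch of what happens when a step is added in the middle of a numbered chain. The toy data frame and the `filter()` step are invented for illustration and are not part of the original analysis.

```{r}
library(dplyr)

# Toy stand-in for asl.csv; the values are made up for illustration
asl <- data.frame(
  pt = c("001", "002"),
  oldvar = c(24, 36),
  stringsAsFactors = FALSE
)

asl1 <- asl %>% mutate(newvar = oldvar / 12)
# A step added later: it needs an awkward name such as asl1b,
# or every object after it has to be renumbered
asl1b <- asl1 %>% filter(newvar > 2)
# The input of this step had to change from asl1 to asl1b
asl2 <- asl1b %>% mutate(usubjid = pt)
asl_final <- asl2
```

Every later line that refers to a numbered object is a place where a stale name can slip through unnoticed.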
## A better approach
At each step, create a data frame that contains only what that step needs, so
that the pieces are mutually exclusive. At the end, merge all the pieces
together. Use informative names for these pieces.

Advantages:
* no need to constantly reorder and rename pieces that end in numbers
Here is a real example:
```{r}
## get_csv() is not defined here; presumably a project helper
asl <- get_csv("data/clinical/asl.csv")
## STEP 1: process the input dataset using MECE pieces
## one piece:
asl_study_flags <- asl %>%
  select(usubjid, studyid) %>%
  mutate(...) %>%
  select(-studyid)
## another piece:
asl_new_censor_vars <- asl %>%
  select(usubjid, oscnsr, pfscnsr) %>%
  mutate(...)
## another piece:
asl_biomarker_flags <- asl %>%
  select(usubjid) %>%
  left_join(...) %>%
  mutate(...)
## STEP 2: at the end, join the mutually exclusive pieces
## (with no `by`, left_join() joins on all columns the two tables share)
asl_edited <- asl %>%
  left_join(asl_study_flags) %>%
  left_join(asl_biomarker_flags) %>%
  left_join(asl_new_censor_vars)
```
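
The example above elides the actual derivations, so it will not run as written. Below is a minimal runnable sketch of the same pattern on a toy data frame. The data, the derived columns `in_study_a` and `os_event`, and the exclusivity check are all assumptions made for illustration, not the original analysis. Here each piece drops its source columns, so the join key can be given explicitly as `by = "usubjid"`.

```{r}
library(dplyr)

# Toy stand-in for the clinical dataset; all values are invented
asl <- data.frame(
  usubjid = c("S1", "S2", "S3"),
  studyid = c("A", "B", "A"),
  oscnsr = c(0, 1, 0),
  stringsAsFactors = FALSE
)

# Piece 1: the key plus one new study flag (hypothetical derivation)
asl_study_flags <- asl %>%
  select(usubjid, studyid) %>%
  mutate(in_study_a = studyid == "A") %>%
  select(-studyid)

# Piece 2: the key plus one new censoring variable (hypothetical derivation)
asl_new_censor_vars <- asl %>%
  select(usubjid, oscnsr) %>%
  mutate(os_event = 1 - oscnsr) %>%
  select(-oscnsr)

# Check mutual exclusivity: apart from the key, no column may appear
# in more than one piece or already exist in the input
pieces <- list(asl_study_flags, asl_new_censor_vars)
new_cols <- unlist(lapply(pieces, function(p) setdiff(names(p), "usubjid")))
stopifnot(!any(duplicated(new_cols)), !any(new_cols %in% names(asl)))

# Join the pieces back onto the input by the key
asl_edited <- asl %>%
  left_join(asl_study_flags, by = "usubjid") %>%
  left_join(asl_new_censor_vars, by = "usubjid")
```

Because each piece is built directly from `asl`, a piece can be added, removed, or reordered without touching the others. Only the final join changes.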