Gist gvdr/3d9f9a30356e1252d307ccae5e5bb432, created October 25, 2017 17:54
How to group by numeric variables in a dataframe and compute percentiles
# install.packages("tidyverse") # Run this first if tidyverse is not yet installed
library(tidyverse) # Everything will be done in a tidyverse fashion
# This is the kind of data frame I think Roberta is dealing with:
# Vitd is an integer.
# Age is numeric (here simulated as whole years).
# Before grouping, we need to cut the numeric Age into a factor of age classes.
roberta_df <- tibble(
  Age = as.integer(runif(100, 10, 100)),  # Age, as an integer (10 to 99)
  Vitd = as.integer(runif(100, 80, 140))  # Vitd, as an integer (80 to 139)
)
# To inspect the data frame we just created, run the following line:
# roberta_df %>% View()
# To summarise Vitd by age group,
# we group by Age category
# and compute the median and quartiles for each group
# (any other summary statistic could be computed the same way).
roberta_df %>%
  mutate(Age_category = cut(Age,                              # We cut Age
                            breaks = c(0, 55, 65, 75, Inf))) %>%  # into classes with these boundaries
  group_by(Age_category) %>%                                  # We group by age class
  summarise(`25%` = quantile(Vitd, probs = 0.25),             # and compute the 25th percentile,
            med = median(Vitd),                               # the median,
            `75%` = quantile(Vitd, probs = 0.75))             # and the 75th percentile
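For readers without the tidyverse, the same cut-then-summarise idea can be sketched in base R. This is a swapped-in approach, using `split()` and `sapply()` instead of the `group_by()`/`summarise()` pipeline, run on a small made-up dataset (the `ages` and `vitd` vectors are hypothetical, not Roberta's data). It also makes two defaults visible: `cut()` builds right-closed intervals, so an age of exactly 55 falls in `(0,55]`, and `quantile()` uses its default type 7 interpolation.

```r
# Hypothetical toy data (not from the original gist): ages and Vitd values.
ages <- c(30, 55, 56, 60, 65, 70, 75, 80, 90)
vitd <- c(100, 110, 120, 90, 130, 85, 95, 140, 105)

# cut() makes right-closed intervals by default (right = TRUE):
# 55 falls in (0,55], 56 falls in (55,65].
age_class <- cut(ages, breaks = c(0, 55, 65, 75, Inf))

# split() gives one Vitd vector per age class; sapply() applies quantile()
# (default type 7) to each, one column per class; t() puts classes in rows.
summary_table <- t(sapply(split(vitd, age_class), quantile,
                          probs = c(0.25, 0.5, 0.75)))
print(summary_table)
```

With these toy values the `(0,55]` row reads 102.5, 105 and 107.5: with only two observations, type 7 interpolates linearly between them.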