@gvdr
Created October 25, 2017 17:54
How to group by numeric variables in a dataframe and compute percentiles
# install.packages("tidyverse") # If not yet installed, run this
library(tidyverse) # Everything will be done in a tidyverse fashion
# This is the kind of data frame I think Roberta is dealing with:
# Vitd is an integer,
# Age is numeric (simulated here as whole years).
# We first need to cut the numeric Age into a factor of age groups.
roberta_df <- tibble(
  Age  = as.integer(runif(100, 10, 100)), # Age, as an integer
  Vitd = as.integer(runif(100, 80, 140))  # Vitd, as an integer
)
# To inspect the data frame we created, uncomment and run the following line:
# roberta_df %>% View()
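# A quick check of how cut() assigns ages to groups, using a few made-up
# ages (example_ages and age_groups are illustrative names, not part of the
# original script). By default cut() builds right-closed intervals, so an
# age of exactly 55 lands in (0,55], not (55,65]; pass right = FALSE to get
# left-closed intervals instead. The labels are hypothetical, chosen only to
# read better than the default "(0,55]"-style level names, and labels such
# as "56-65" assume whole-year ages.
example_ages <- c(10, 55, 56, 65, 75, 99)
age_groups <- cut(example_ages,
                  breaks = c(0, 55, 65, 75, Inf),
                  labels = c("<=55", "56-65", "66-75", ">75"))
data.frame(age = example_ages, group = age_groups)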
# To summarise Vitd by age group,
# we group by Age category
# and compute the median and quartiles of Vitd within each group
# (any other summary statistic could be computed the same way).
roberta_df %>%
  mutate(Age_category = cut(Age,                                  # We cut Age
                            breaks = c(0, 55, 65, 75, Inf))) %>%  # into groups (0,55], (55,65], (65,75], (75,Inf]
  group_by(Age_category) %>%                                      # We group by age class
  summarise(`25%` = quantile(Vitd, probs = 0.25),                 # and compute the 25th percentile,
            med   = median(Vitd),                                 # the median
            `75%` = quantile(Vitd, probs = 0.75))                 # and the 75th percentile
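# A minimal sketch of the same per-group quartiles in base R, without the
# tidyverse, on a small made-up data frame (toy and quartile_table are
# illustrative names, not from the original script). quantile() accepts
# several probabilities at once, and with its default method (type = 7)
# the 50% quantile equals median().
toy <- data.frame(
  Age  = c(20, 40, 60, 62, 70, 80, 90),
  Vitd = c(100, 110, 120, 130, 90, 95, 105)
)
toy$Age_category <- cut(toy$Age, breaks = c(0, 55, 65, 75, Inf))
# split() gives one Vitd vector per age group, sapply() applies quantile()
# to each, and t() puts one group per row with columns 25%, 50%, 75%
quartile_table <- t(sapply(split(toy$Vitd, toy$Age_category),
                           quantile, probs = c(0.25, 0.5, 0.75)))
quartile_table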