@lejon
Last active January 10, 2019 05:57
Plot the distribution of variables in a data frame
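library(ggplot2)  # provides ggplot(), the aes helpers, geom_density(), geom_bar()

#' Plot the distribution of every variable in a data frame
#'
#' Takes a data frame and produces one distribution plot per variable:
#' a density plot for numeric variables and a bar chart for all others.
#' Filter out variables that contain only unique values (for example
#' row IDs) before calling this function: their distributions carry no
#' information and they are slow to plot.
#'
#' @param df input data frame
#' @param fill_var optional name (a string) of a column in `df` used to
#'   fill the plots by group; `NULL` (the default) draws unfilled plots
#'
#' @return a list of distribution plots, one for each variable in the data frame
#' @export
#'
#' @examples
#' plots <- plot_df_dists(iris)
#' plots[[1]]
#' # Split every distribution by species
#' plot_df_dists(iris, fill_var = "Species")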
plot_df_dists <- function(df, fill_var = NULL) {
  # Build one plot per column; the result is a list in column order.
  lapply(names(df), function(var_x) {
    # .data[[name]] looks a column up by its name (ggplot2 >= 3.0.0) and
    # replaces the deprecated aes_string(). labs() keeps the plain column
    # name as the axis label.
    p <- ggplot(df, aes(x = .data[[var_x]])) + labs(x = var_x)
    # Density for numeric columns, bar chart for everything else.
    geom_fn <- if (is.numeric(df[[var_x]])) geom_density else geom_bar
    if (is.null(fill_var)) {
      p + geom_fn()
    } else {
      # Semi-transparent fills keep overlapping groups visible.
      p + aes(fill = .data[[fill_var]]) + labs(fill = fill_var) +
        geom_fn(alpha = 0.5)
    }
  })
}
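
The documentation above recommends filtering out variables that contain only unique values before plotting. A minimal sketch of that filtering step in base R follows; `drop_all_unique_cols` and `example_df` are hypothetical names introduced here for illustration and are not part of the original gist.

```r
# Hypothetical helper (not part of the original gist): drop every column
# whose values are all distinct, such as row IDs.
drop_all_unique_cols <- function(df) {
  # anyDuplicated() returns 0 when no value repeats, so a column is kept
  # only if at least one of its values occurs more than once.
  keep <- vapply(df, function(col) anyDuplicated(col) > 0, logical(1))
  df[, keep, drop = FALSE]
}

# Made-up data: `id` is unique per row, the other two columns repeat values.
example_df <- data.frame(
  id = 1:6,
  group = c("a", "a", "b", "b", "c", "c"),
  score = c(1.5, 2.0, 2.0, 3.5, 4.0, 4.5)
)

filtered <- drop_all_unique_cols(example_df)
names(filtered)
```

The filtered data frame can then be passed on, for example as `plot_df_dists(drop_all_unique_cols(example_df))`. Note that this sketch treats a continuous numeric column with no repeated values as all-unique and drops it too, so a stricter or looser rule may suit some data better.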