@glamp
Last active December 3, 2018 04:36
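library(reshape2) # melt() is assumed to come from reshape2
library(ggplot2)  # ggplot(), geom_density(), facet_wrap()
#df is assumed to be a data frame of loans with a numeric 0/1 is_bad column
#figure out which columns are numeric (and hence we can look at the distribution)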
numeric_cols <- sapply(df, is.numeric)
#turn the data into long format (key->value esque)
df.lng <- melt(df[,numeric_cols], id="is_bad")
head(df.lng)
#plot the distribution for bads and goods for each variable
p <- ggplot(df.lng, aes(x=value, group=is_bad, colour=factor(is_bad)))
#quick and dirty way to figure out if you have any good variables
p + geom_density() +
  facet_wrap(~variable, scales="free")
#NOTES:
# - be careful of using variables that get created AFTER a loan is issued (principal/interest related)
# - any ID variables that are numeric will be plotted too. be sure to exclude those as well.
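
The two notes above can be handled before the melt step by dropping the offending columns up front. Here is a minimal base R sketch of one way to do that. The toy data frame and every column name other than is_bad (id, annual_inc, total_rec_int, grade) are made up for illustration, and exclude_cols is a hypothetical list you would fill in by hand for your own data.

```r
# Hypothetical toy data; only is_bad comes from the snippet above.
df <- data.frame(
  id            = 1:6,                                   # numeric ID, not a real feature
  is_bad        = c(0, 1, 0, 0, 1, 0),                   # target
  annual_inc    = c(52000, 31000, 78000, 64000, 28000, 91000),
  total_rec_int = c(1200, 150, 2300, 1800, 90, 2600),    # only known after issuance
  grade         = c("A", "D", "B", "A", "E", "A"),       # non-numeric, dropped anyway
  stringsAsFactors = FALSE
)

# Assumption: you list ID and post-issuance columns by hand.
exclude_cols <- c("id", "total_rec_int")

# Same numeric test as the snippet, minus the excluded columns.
numeric_cols <- sapply(df, is.numeric)
keep <- numeric_cols & !(names(df) %in% exclude_cols)
keep[names(df) == "is_bad"] <- TRUE  # always keep the target column

df.numeric <- df[, keep, drop = FALSE]
names(df.numeric)
```

With this in place, df.numeric could be passed to melt(..., id="is_bad") instead of df[,numeric_cols], so the density plots only show variables that would be available when the loan is issued.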
@rinze

rinze commented Feb 7, 2014

This is an incredibly useful snippet when doing exploratory analysis. Thanks a lot for sharing it!

@glamp
Author

glamp commented Aug 29, 2014

glad you like it!

@actsasflinn

+1 the whole blog post has been one of the most approachable and accessible tutorials on R and modeling that I've read.
