@machinatoonist
Created August 7, 2021 03:19
How to create a correlation funnel plot
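library(correlationfunnel)
library(dplyr)
library(ggplot2)  # provides labs() and geom_vline(), used in Step 3
# The `ames` data set ships with the modeldata package, not correlationfunnel
data("ames", package = "modeldata")
data_prep <- ames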
### Step 1 - Prepare Data as Binary Features
# We use the `binarize()` function to produce a feature set of binary (0/1) variables.
# Numeric data are binned (using `n_bins`) into categorical data, then all categorical data
# are one-hot encoded to produce binary features. To prevent low-frequency categories
# from increasing the dimensionality, we use `thresh_infreq = 0.01` and `name_infreq = "OTHER"`
# to group categories that fall below the threshold into a single "OTHER" category.
binarised_data <- data_prep %>%
  binarize(n_bins = 5, thresh_infreq = 0.01, name_infreq = "OTHER", one_hot = TRUE)
binarised_data %>% glimpse()
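# The target column name used in Step 2 depends on where `binarize()` places the
# bin edges, so it can help to list the Sale_Price bins before hard-coding one.
# A small sketch using dplyr's `starts_with()`; `sale_price_bins` is only an
# illustrative name and is not used elsewhere in this script.
sale_price_bins <- binarised_data %>%
  select(starts_with("Sale_Price__")) %>%
  names()
sale_price_bins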
### Step 2 - Correlate to the Target
# Next, we use `correlate()` to correlate the binary features to a target (in our case,
# the highest Sale_Price bin identified in the binarised data).
correlation_summary <- binarised_data %>%
  correlate(Sale_Price__230000_Inf)
correlation_summary
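# Optional sanity check (a sketch, not part of the correlationfunnel workflow).
# Assuming `correlate()` is left at its default Pearson method, recomputing one
# value with base R's `cor()` should agree with the matching row printed above.
# `check_col` and `manual_cor` are hypothetical names; any non-target column works.
check_col <- setdiff(names(binarised_data), "Sale_Price__230000_Inf")[1]
manual_cor <- cor(binarised_data[[check_col]],
                  binarised_data[["Sale_Price__230000_Inf"]])
manual_cor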
### Step 3 - Plot the Correlation Funnel
# Finally, we visualize the correlation using the `plot_correlation_funnel()` function.
correlation_summary %>%
  filter(abs(correlation) > .2) %>%
  plot_correlation_funnel(limits = c(-.75, 1), interactive = FALSE) +
  labs(title = "Correlation Funnel for Ames Housing Dataset",
       subtitle = "Using the correlationfunnel R package by Matt Dancho",
       x = "Pearson Correlation Coefficient",
       y = "Feature") +
  geom_vline(xintercept = .3, lty = 2) +
  geom_vline(xintercept = -.3, lty = 2)