How to create a correlation funnel plot
library(correlationfunnel)
library(dplyr)
library(ggplot2)  # provides labs() and geom_vline(), used in Step 3
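
# The `ames` housing data is loaded here from the modeldata package
# (assumes modeldata is installed).
data("ames", package = "modeldata")
data_prep <- ames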

### Step 1 - Prepare Data as Binary Features
# We use the `binarize()` function to produce a feature set of binary (0/1) variables.
# Numeric data are binned (using `n_bins`) into categorical data, then all categorical data
# is one-hot encoded to produce binary features. To prevent low frequency categories
# from increasing the dimensionality, we use `thresh_infreq = 0.01` and `name_infreq = "OTHER"`
# to group excess categories.
binarised_data <- data_prep %>%
  binarize(n_bins = 5, thresh_infreq = 0.01, name_infreq = "OTHER", one_hot = TRUE)

binarised_data %>% glimpse()
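
# Illustration (not part of the original gist): a rough base-R sketch of what
# "bin, then one-hot encode" does to a single numeric column. The toy vector and
# the names `toy_price`, `toy_bins` and `toy_onehot` are hypothetical, and
# `binarize()` itself may choose bin edges and column names differently.
toy_price <- c(100, 150, 200, 250, 300, 350, 400, 450)
# Cut at the quartiles so each of the four bins holds about a quarter of the values.
toy_bins <- cut(toy_price,
                breaks = quantile(toy_price, probs = seq(0, 1, by = 0.25)),
                include.lowest = TRUE)
# One 0/1 indicator column per bin (the "- 1" drops the intercept column).
toy_onehot <- model.matrix(~ toy_bins - 1)
colSums(toy_onehot)  # number of rows that fall into each bin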

### Step 2 - Correlate to the Target
# Next, we use `correlate()` to correlate the binary features to a target (in our case,
# the highest Sale Price bin identified in the binarised data). The exact column name
# depends on the bin edges computed in Step 1, so check the `glimpse()` output first.
correlation_summary <- binarised_data %>%
  correlate(Sale_Price__230000_Inf)

correlation_summary
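
# Illustration (not part of the original gist): between two 0/1 columns, a Pearson
# correlation is just `cor()` applied to the indicators. A base-R sketch on
# hypothetical toy data (`toy_target` and `toy_feature` are made up here):
toy_target  <- c(1, 1, 1, 0, 0, 0, 0, 0)  # 1 = row is in the top price bin
toy_feature <- c(1, 1, 0, 0, 0, 0, 0, 1)  # 1 = row has the feature level
toy_cor <- cor(toy_feature, toy_target, method = "pearson")
toy_cor  # positive: the feature level co-occurs with the target more often than chance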

### Step 3 - Plot the Correlation Funnel
# Finally, we visualize the correlation using the `plot_correlation_funnel()` function.
correlation_summary %>%
  filter(abs(correlation) > .2) %>%
  plot_correlation_funnel(limits = c(-.75, 1), interactive = FALSE) +
  labs(title = "Correlation Funnel for Ames Housing Dataset",
       subtitle = "Using the correlationfunnel R package by Matt Dancho",
       x = "Pearson Correlation Coefficient",
       y = "Feature") +
  geom_vline(xintercept = .3, lty = 2) +
  geom_vline(xintercept = -.3, lty = 2)