
@brshallo
Created February 12, 2019 17:44
Plot showing how entropy and the Gini index vary with the proportion of the event in a binary outcome, and that the two metrics follow the same pattern.
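For reference, here are the standard definitions for a binary node with event proportion $p$. Entropy uses the natural log, as `log()` does in R, and the Gini index is written in its usual form. The script plots $2H(p)$ and $2G(p) = 4p(1-p)$. Both derivatives vanish only at $p = 1/2$, are positive below it and negative above it, and both functions satisfy $f(p) = f(1-p)$. This is why the two curves have the same shape and differ only in peak height.

```latex
\begin{aligned}
H(p) &= -\left[p \ln p + (1-p)\ln(1-p)\right], &
G(p) &= 1 - p^2 - (1-p)^2 = 2p(1-p), \\
H'(p) &= \ln\frac{1-p}{p}, &
G'(p) &= 2 - 4p, \\
2H\!\left(\tfrac{1}{2}\right) &= 2\ln 2 \approx 1.386, &
2G\!\left(\tfrac{1}{2}\right) &= 1.
\end{aligned}
```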
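```r
library(tidyverse)

# Impurity of a binary node as a function of the event proportion `prob`.
# Both metrics are rescaled by 2: entropy (natural log) peaks at 2*log(2) ~ 1.39,
# and the Gini index 2*p*(1 - p) becomes 4*p*(1 - p), which peaks at 1.
df_metrics <- tibble(
  prob = seq.int(0.001, 0.999, length.out = 999),  # avoids log(0) at p = 0 and p = 1
  entropy = -2 * (prob * log(prob) + (1 - prob) * log(1 - prob)),
  gini_index = 4 * prob * (1 - prob)
) %>%
  # Long format: one row per (prob, metric) pair, used for colour and faceting
  pivot_longer(c(entropy, gini_index), names_to = "purity_metric", values_to = "value")

ggplot(df_metrics, aes(x = prob, y = value, colour = purity_metric)) +
  geom_line() +
  facet_wrap(~purity_metric, scales = "free_y", ncol = 1) +  # free y: compare shapes, not heights
  labs(title = "Plot of entropy and gini vs proportion of binary outcome after one split",
       subtitle = "number of regions = 2; number of outcomes = 2",
       x = "proportion K")
```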
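The plot shows the "same pattern" claim visually. The minimal base-R sketch below checks it numerically on the same grid. It confirms that both curves peak at the same proportion, are symmetric around it, rise to the left of the peak and fall to the right. The helper names `entropy_scaled` and `gini_scaled` are hypothetical and simply mirror the formulas used above.

```r
# Hypothetical helpers mirroring the (rescaled) formulas in the script above
entropy_scaled <- function(p) -2 * (p * log(p) + (1 - p) * log(1 - p))
gini_scaled <- function(p) 4 * p * (1 - p)

prob <- seq.int(0.001, 0.999, length.out = 999)  # same grid; element 500 is p = 0.5
ent <- entropy_scaled(prob)
gin <- gini_scaled(prob)

# Both curves peak at the same proportion
prob[which.max(ent)]  # 0.5
prob[which.max(gin)]  # 0.5

# Both are symmetric: f(p) equals f(1 - p), and the grid is symmetric around 0.5
isTRUE(all.equal(ent, rev(ent)))  # TRUE
isTRUE(all.equal(gin, rev(gin)))  # TRUE

# Both strictly increase up to the peak and strictly decrease after it
all(diff(ent[1:500]) > 0) && all(diff(gin[1:500]) > 0)      # TRUE
all(diff(ent[500:999]) < 0) && all(diff(gin[500:999]) < 0)  # TRUE

# Peak heights differ only because of the scale factors
max(ent)  # 1.386294, i.e. 2 * log(2)
max(gin)  # 1
```

This is also why the script uses `scales = "free_y"` in `facet_wrap()`. With independent y axes, the different peak heights do not hide the fact that the curves share the same shape.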