@mlopatka

mlopatka/BPCI.R Secret

Created August 31, 2018 16:05
Standard implementation of Binomial proportion confidence interval with example data
library(stats)
# Generate random data frames of hypothetical performance.
# - each row corresponds to the performance of a Fathom ruleset on a single page
# - each column corresponds to the success or failure of correctly identifying a single feature (price, image, title, description),
#   coded here as 1 = success and 0 = failure
# This could be extended to evaluate multiple Fathom rulesets in parallel by the addition of another index column.
fathom_classification_strategy1 <- data.frame(replicate(4, sample(0:1, 100, replace = TRUE)))
fathom_classification_strategy2 <- data.frame(replicate(4, sample(0:1, 100, replace = TRUE)))
# Add an indication of the fathom recipe
fathom_classification_strategy1[,'recipe'] <- 'fathom_recipe_1'
fathom_classification_strategy2[,'recipe'] <- 'fathom_recipe_2'
# Concatenate the data from multiple experiments.
exp_data <- rbind(fathom_classification_strategy1, fathom_classification_strategy2)
# Set the significance level (alpha = 0.05 gives a 95% confidence interval)
alpha <- 0.05
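# Cast the column (feature) of interest as a factor, fixing both levels so the table always has a "0" and a "1" entry
feature_1 <- factor(exp_data$X1, levels = c(0, 1))
# Count the number of points
n <- length(feature_1)
# Cast it to a table for handy builtin functions
as_table <- table(feature_1)
# Compute the proportion of correct classifications (the "1" entries; as_table[1] would count the failures instead)
p_hat <- as_table[["1"]] / n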
# Calculate the critical z-score for a two-sided interval
z <- qnorm(1 - alpha / 2)
# Print the point estimate, then its Wald (normal-approximation) confidence interval
p_hat
p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)
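
The script above pools both recipes into a single estimate. As a minimal sketch of the per-recipe extension mentioned in the comments, the block below wraps the same Wald formula in a function and applies it to each recipe separately. The helper names `wald_ci` and `wilson_ci`, the `demo` data frame, and the seed are hypothetical and not part of the original gist. `wilson_ci` is included only as a cross-check: it relies on `prop.test(..., correct = FALSE)` from the stats package, which returns the Wilson score interval and tends to behave better than the Wald interval when the proportion is close to 0 or 1.

```r
# Hypothetical helper (not in the original gist): Wald interval for a 0/1 vector,
# where 1 is assumed to mean a correct classification.
wald_ci <- function(x, alpha = 0.05) {
  n <- length(x)
  p_hat <- mean(x)                       # proportion of successes
  z <- qnorm(1 - alpha / 2)              # two-sided critical value
  half_width <- z * sqrt(p_hat * (1 - p_hat) / n)
  c(p_hat = p_hat, lower = p_hat - half_width, upper = p_hat + half_width)
}

# Hypothetical helper: Wilson score interval via stats::prop.test,
# with the continuity correction switched off.
wilson_ci <- function(x, alpha = 0.05) {
  ci <- prop.test(sum(x), length(x), conf.level = 1 - alpha, correct = FALSE)$conf.int
  c(lower = ci[1], upper = ci[2])
}

# Small stand-in for exp_data: one feature column plus the recipe label.
set.seed(42)  # arbitrary seed, only so the demo is reproducible
demo <- data.frame(
  X1 = sample(0:1, 200, replace = TRUE),
  recipe = rep(c("fathom_recipe_1", "fathom_recipe_2"), each = 100)
)

# One row per recipe, columns p_hat / lower / upper.
per_recipe_wald <- t(sapply(split(demo$X1, demo$recipe), wald_ci))
per_recipe_wilson <- t(sapply(split(demo$X1, demo$recipe), wilson_ci))
print(per_recipe_wald)
print(per_recipe_wilson)
```

With the real data, `split(exp_data$X1, exp_data$recipe)` would take the place of the `demo` columns, and the same call can be repeated for `X2` through `X4` to cover the remaining features.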