Last active
July 15, 2022 14:34
-
-
Save ribsy/111b2fa86d83b79517f7359d2991b357 to your computer and use it in GitHub Desktop.
This is a simple probabilistic programming example for AB testing security products. Its merely a simple starter example...with more to come. You can read more about this at https://www.soluble.ai/blog/security_budgets_at_risk
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#' --- | |
#' title: 3 Metrics For Getting Better Security ROI | |
#' author: Richard Seiersen | |
#' date: June 4, 2020 | |
#' output: | |
#' html_document: | |
#' toc: true | |
#' highlight: zenburn | |
#' --- | |
#' ##Why did I do this? | |
#' To answer this question in whole start here: [Soluble Blog: Security Budgets At Risk](www.soluble.ai/blogs/security-budgets-at-risk) | |
#' | |
#' Also, a fellow startup founder was having trouble demonstrating value in POCs. He wanted a way to more concretely | |
#' demonstrate product effectiveness in terms of overall costs impact. This is the result of that experiment. | |
#' | |
#' I make no claims about its absolute effectivness. But, I do believe it beats the competing model which is | |
#' nine time out of ten you unaided intuition. What I mean is, while this tool will not answer the question, | |
#' "should I buy this solution or not!?" It will move you in the direction of a mathematically unambigous and | |
#' consistent approach to comparing some products. | |
#' | |
#' ##Point Of View for this Demo | |
#' 1. When figuring out which security solution to buy you need to consider sticker costs and actual cost. | |
#' 2. Tools that do the right thing may produce cost - but I consider that the cost of doing business. | |
#' 3. Tools that make errors create waste in the form of extra re-work and risk (in the case of security) | |
#' 4. We want to measure all of that - but we emphasize **waste measurement** in this excercise. | |
#' 5. And we want to do that simply and relatively fast with small data. | |
#' 6. This script allows you to compare two products under test so you can | |
#' make a more informed buying choice...even if the test size is tiny. | |
#' 7. [See Soluble blog for more](www.soluble.ai/blogs/security-budgets-at-risk) | |
#' | |
#' | |
#' ###Software You Will Need | |
#' * [You need R](https://www.r-project.org/) | |
#' * [R Studio](https://rstudio.com/products/rstudio/download/) | |
#' | |
#' ###Libraries You Will Need | |
#' There are generally two classes of libraries in use: Bayesian and Visualization. I am not necessarily endorsing any of these, | |
#' they just got the job done. | |
#' <br> | |
#' * Bayes: ProbBayes, Bolstad, mc2d (sorta), bayestestR. | |
#' <br> | |
#' * Visual: ggplot2, ggdistribute, scales, knitr | |
#' <br> | |
# Specify packages, and install missing ones. | |
my_packages <- c("ProbBayes", "Bolstad", "ggplot2", "ggdistribute", | |
"mc2d", "bayestestR", "scales", "knitr") | |
not_installed <- my_packages[!(my_packages %in% installed.packages()[ , "Package"])] | |
if (length(not_installed)){ | |
msgVal = paste("Install packages: ",toString(not_installed)) | |
msgRes <- askYesNo(msgVal) | |
if( msgRes != TRUE){ | |
stopMsg <- paste("Quiting because you don't want to install:", not_installed) | |
stop(stopMsg) | |
} | |
} | |
if(length(not_installed)) install.packages(not_installed) | |
library(ProbBayes) | |
library(Bolstad) | |
library(ggplot2) | |
library(ggdistribute) | |
library(mc2d) | |
library(bayestestR) | |
library(scales) | |
library(knitr) | |
#' | |
#' Run this to get an html version: rmarkdown::render("generative_model_blog.R",output_file = "generative_model_blog_report.html") | |
#' | |
#' ##Configuration Elements | |
#' ### These are the only things you need to play with to run an anaysis. | |
#' | |
#' You edit these for your particular environment and test size. | |
#' You do not need to understand any of the code toward the bottom of this page. That would likely be torture | |
#' Your job is to change these parameters to fit your particular use case. | |
#' | |
#' ##Base Configs product A and product B defined | |
#' * Events (a_n_events/b_n_events) consists of errors and non-errors. It's the total set | |
#' * An error could be a false positive or a false negative as described in the article | |
#' * These break down into a few **implied** sets. I say implied because you are only | |
#' tracking the total events and the errors (false positives and negatives) | |
#' + Shared set of TRUE events | |
#' + Shared set of False Positives | |
#' + Shared set of False Negatives (discovered by some other solution) | |
#' + False Positive unique to one product | |
#' + False Negatives unique to one product | |
#' * IF the sum total events from one product are less than the TRUE events of another product | |
#' the script will throw an error. It means you missed one or more false negatives. That is the only | |
#' error checking that really matters. You could have countless false positives for one product and not | |
#' the other. | |
# These parameters set the basis for the analyssis in terms of simulations, events and timeframe | |
base_configs <- list() | |
base_configs["n_draws"] <- 100000 # Number for simulations 100K default - no need to change really | |
base_configs["a_n_events"] <- 47 # Total events for test like 100 vulns, phising events etc - could be 10 | |
base_configs["b_n_events"] <- 40 # Total events for test like 100 vulns, phising events etc - could be 10 | |
base_configs["time_multiple"] <- 12 # Time period: If a month then use 12, if a day use 365 | |
# Beliefs about error counts prior to testing | |
event_priors <- list() | |
event_priors["median_errors"] <- 0.20 # True rate is just as likely above/below this | |
event_priors["edge_errors"] <- 0.45 # 90% sure true value is below this | |
# What was the real count after testing. it's in relation to base_configs["n_events"] above. | |
# This includes false positives and discoverable false negatives (deltas found between the solutions) | |
# Note, the solutions also will correct for some of the false negative errors dynamically. | |
event_counts <- list() | |
event_counts["product_a_errors"] <- 14 # After conducting test, what are the real errors for prod a | |
event_counts["product_b_errors"] <- 7 # After conducting test, what are the real errors for prod b | |
# Forecasted range of error response hours that can happen per event | |
eng_hour_range <- list() | |
eng_hour_range["mode"] <- 3 # Expect hours | |
eng_hour_range["low"] <- 1 # Low hours | |
eng_hour_range["high"] <- 5 # Max hours | |
# Forecasted cost of responding to an error event for ONE HOUR | |
eng_cost_range <- list() | |
eng_cost_range["mode"] <- 600 # Expected cost per event | |
eng_cost_range["low"] <- 200 # Low end cost per event | |
eng_cost_range["high"] <- 2000 # High end cost per event | |
#Year 1 Deterministic Cost of Product A & B license or subscrption etc | |
product_costs <- list() | |
product_costs["prod_a_cost"] <- 72000.00 | |
product_costs["prod_b_cost"] <- 65000.00 | |
product_costs["prod_a_run"] <- 10000 # There could be a material cost to operate product A | |
product_costs["prod_b_run"] <- 10000 # Perhaps product B is less expensive becuase its SaaS | |
#' ##Simplifying Functions | |
#' This are tiny functions that do computations, graphs or print data to screen. | |
#' | |
# Check error inputs and quit if not correct | |
ErrorCheck <- function(base_configs, event_counts){ | |
a_total <- as.numeric(base_configs["a_n_events"]) | |
a_errors <- as.numeric(event_counts["product_a_errors"]) | |
b_total <- as.numeric(base_configs["b_n_events"]) | |
b_errors <- as.numeric(event_counts["product_b_errors"]) | |
#Remove errors to get on true events | |
a_true <- a_total - a_errors | |
b_true <- b_total - b_errors | |
#cat(paste("A Total:",a_total,"A Errors:", a_errors,"A True:", a_true, "\n")) | |
#cat(paste("B Total:",b_total,"A Errors:", b_errors,"B True:", b_true, "\n")) | |
#Is Prod A/B Missing errors - false negatives specifically? | |
if( a_true < b_true ){ | |
diff <- b_true - a_true | |
cat(paste("Adding ", diff ," FALSE NEGATIVE errors to product A for a new total of: ",diff + a_errors,"\n")) | |
return(c("product_a_errors", diff + a_errors)) | |
} | |
if( b_true < a_true ){ | |
diff <- a_true - b_true | |
cat(paste("Adding ", diff ," FALSE NEGATIVE errors to product B for a new total of: ",diff + b_errors,"\n")) | |
return(c("product_b_errors", diff + b_errors)) | |
} | |
if( b_total < a_true ){ | |
diff <- a_true - b_total | |
cat(paste("Product B is missing", diff, "false positive errors.")) | |
stop("Exiting") | |
} | |
if( a_total < a_errors){ | |
stop("Exiting: Your errors for product A exceed your total events for product A") | |
} | |
if( b_total < b_errors){ | |
stop("Exiting: Your errors for product B exceed your total events for product B") | |
} | |
return(c("No Change", 0)) | |
} | |
CatTotalCost <- function(total_costs, product_costs){ | |
#Get Cost Ratios | |
prod_a_total_costs <- total_costs[1] | |
prod_b_total_costs <- total_costs[2] | |
prod_ratio <- round( prod_a_total_costs / prod_b_total_costs, 3) * 100 | |
prod_ratio <- paste(prod_ratio, "%", sep = "") | |
#Format as dollars | |
prod_a_orginal_cost <- dollar_format()(as.numeric(product_costs["prod_a_cost"])) | |
prod_a_total_final <- dollar_format()(round(as.numeric(prod_a_total_costs),2)) | |
prod_b_orginal_cost <- dollar_format()(as.numeric(product_costs["prod_b_cost"])) | |
prod_b_total_final <- dollar_format()(round(as.numeric(prod_b_total_costs),2)) | |
#Print Message to stdout | |
cat("--------Which Product Cost Most To Operate Given Forecasted Errors-------\n") | |
cat(paste(" Product A is", prod_ratio , "the total cost of product B\n")) | |
cat(paste(" Product A (with base price of ", prod_a_orginal_cost ,") is expected to cost to operate with errors: ", prod_a_total_final,"\n")) | |
cat(paste(" Product B (with base price of ", prod_b_orginal_cost,") is expected to cost to operate with errors: ", prod_b_total_final,"\n")) | |
cat("\n Still uncertain? Get more data by running more tests!") | |
return(c(prod_ratio, prod_a_orginal_cost, prod_a_total_final, prod_b_orginal_cost, prod_b_total_final)) | |
#print(knitr::kable(out)) | |
} | |
GraphErrorImpact <- function(posterior, impact){ | |
newdf_a_impact <- data.frame(posterior = posterior$prod_a_impact, product = "Product A") | |
newdf_b_impact <- data.frame(posterior = posterior$prod_b_impact, product = "Product B") | |
newdf_ab_impact <- rbind(newdf_a_impact, newdf_b_impact) | |
impact_text <- paste(" Product A is", impact[1] , "the total cost of product B\n Product A (with base price of ", impact[2] ,") is expected to cost to operate with errors: ", impact[3],"\n Product B (with base price of ", impact[4],") is expected to cost to operate with errors: ", impact[5],"\n") | |
print(ggplot(newdf_ab_impact, aes(x = posterior, y = ..density.., fill = product)) + | |
stat_density_ci(alpha = 0.5, n = 1024, geom = "density",position = "identity") + | |
scale_fill_manual(values=c("#333399", "#CCCCFF")) + | |
labs(title = "FORECASTING APPSEC ERROR IMPACT\nUsing Bayesian A/B Testing\n", x="Using Small Samples: Predicted Difference In Product Error Impacts", | |
subtitle = impact_text, | |
y = "Product Error Comparisons")) | |
} | |
CatErrorDiffProp <- function(posterior){ | |
error_rate_diff <- sum(posterior$prop_diff_events > 0 )/length(posterior$prop_diff_events) | |
cat(paste("--------Error Rate Difference For This Simmulation--------\n Probability that product A makes more errors that Product B is ",round(error_rate_diff * 100) ,"%\n\n", sep = "")) | |
} | |
GraphErrorDiff <- function(posterior){ | |
xval <- posterior$prop_diff_events | |
breaks <- pretty(range(xval), n = nclass.FD(xval), min.n = 1) | |
bwidth <- breaks[2]-breaks[1] | |
print(ggplot(posterior, aes(x=prop_diff_events)) + | |
geom_histogram(aes(y=..density..), colour="black", fill="white", binwidth = bwidth)+ | |
geom_density(alpha=.2, fill="#333399") + labs(title="Event Amount Difference: Product A - Product B",x="Event Rate", y = "Amount")+ | |
geom_vline(xintercept=0, color="red", size=1, linetype = "dotted") | |
) | |
} | |
GraphErrorRates <- function(posterior){ | |
newdf_a <- data.frame(posterior = posterior$prod_a, product = "Product A") | |
newdf_b <- data.frame(posterior = posterior$prod_b, product = "Product B") | |
newdf_ab <- rbind(newdf_a, newdf_b) | |
print(ggplot(newdf_ab, aes(x = posterior, y = ..density.., fill = product)) + | |
stat_density_ci(alpha = 0.5, n = 1024, geom = "density",position = "identity") + | |
scale_fill_manual(values=c("#333399", "#CCCCFF")) + | |
labs(title = "Product A/B Testing", x="Using Small Samples: Predicted Difference In Product Error Rates", | |
y = "Product Comparisons")) | |
} | |
CatUpdatedBeliefRates <- function(posterior, event_counts, base_configs){ | |
out <- "--------Updated Forecasts Given Small Data, Beliefs & Lots of Simulations--------------\n" | |
# 89% Highest Density Intervals | |
prod_a_points <- point_estimate(posterior$prod_a) | |
ci_hdi_prod_a <- ci(posterior$prod_a, method = "HDI") | |
out <- paste(out, " The previous product A forecast combined with ", base_configs["a_n_events"] ," events and ", event_counts["product_a_errors"] ," errors is between:", | |
round(ci_hdi_prod_a$CI_low,2) * 100, "% and ", | |
round(ci_hdi_prod_a$CI_high,2) * 100, "% \n And centered near:", | |
round(prod_a_points[[1]],2) * 100, "%\n\n", sep="") | |
# 89% Highest Density Interval | |
prod_b_points <- point_estimate(posterior$prod_b) | |
ci_hdi_prod_b <- ci(posterior$prod_b, method = "HDI") | |
out <- paste(out, " The previous product B foreacst combined with ", base_configs["b_n_events"] ," events and ", event_counts["product_b_errors"]," errors is between:", | |
round(ci_hdi_prod_b$CI_low,2) * 100, "% and ", | |
round(ci_hdi_prod_b$CI_high,2) * 100, "% \n And centered near:", | |
round(prod_b_points[[1]],2) * 100, "%\n\n", sep="") | |
out <- paste(out, "See the markdown report or plot tab graphs in rstudio to see forecasts combined with data.\n") | |
cat(out) | |
} | |
CatPriorBeliefRates <- function(event_priors){ | |
#Pring Beta Curve | |
#print(MakeCurves(event_priors[["median_errors"]], event_priors[["edge_errors"]],color="Purple", CI=TRUE, CIT=TRUE)) | |
xval = "Event Probability" | |
yval = "Event Strength" | |
tval = "Credible Beliefs About Events" | |
sval = "With 95% Credible Belief Interval" | |
beta_vals <- GetBeliefsEvents(event_priors[["median_errors"]], event_priors[["edge_errors"]] ) | |
print(ggplotBeta(beta_vals[1], beta_vals[2], xlab = xval, ylab = yval, tlab = tval, sval = sval)) | |
#Get Beta Prio CIs | |
priorCI <- GetBetaCI(event_priors["median_errors"], event_priors["edge_errors"]) | |
cat("------Prior Beliefs About Error Rates--------\n") | |
cat(paste(" You forecasted an ERROR RATE equally ABOVE/BELOW ", round(event_priors[["median_errors"]]*100), "%.\n", sep="")) | |
cat(paste(" You're also 90% sure the error rate is LESS THAN ", round(event_priors[["edge_errors"]]*100), "%.\n", sep="")) | |
cat(paste(" Given that forecast and the model we are using you you believe the true rate likely sits between ", priorCI[1],"% and ", priorCI[3], "%\n\n", sep="")) | |
} | |
GetTotalCosts <- function(impacts, costs, time_multiple, event_counts, prior_events, base_configs){ | |
product_a_counts <- event_counts[[1]] | |
product_b_counts <- event_counts[[2]] | |
# Get Gamma Prior for product a which will get fed into poisson dist in GetEventFrequency() | |
a_number_events <- as.numeric(base_configs$a_n_events) | |
a_gamma_median_prior <- prior_events[[1]] * a_number_events | |
a_gamma_sd <- prior_events[[2]] * a_number_events | |
# Get Gamma Prior for product b which will get fed into poisson dist in GetEventFrequency() | |
b_number_events <- as.numeric(base_configs$b_n_events) | |
b_gamma_median_prior <- prior_events[[1]] * b_number_events | |
b_gamma_sd <- prior_events[[2]] * b_number_events | |
prod_a_event_results <- GetEventFrequency(product_a_counts, a_gamma_median_prior, a_gamma_sd) | |
prod_a_mean_events <- prod_a_event_results$mean | |
prod_a_event_totals <- prod_a_mean_events * as.numeric(time_multiple) | |
prod_b_event_results <- GetEventFrequency(product_b_counts, b_gamma_median_prior, b_gamma_sd) | |
prod_b_mean_events <- prod_b_event_results$mean | |
prod_b_event_totals <- prod_b_mean_events * as.numeric(time_multiple) | |
prod_a_total_cost <- prod_a_event_totals * impacts$prod_a_impact + costs$prod_a_cost + costs$prod_a_run | |
prod_b_total_cost <- prod_b_event_totals * impacts$prod_b_impact + costs$prod_b_cost + costs$prod_b_run | |
return(c(prod_a_total_cost, prod_b_total_cost) ) | |
} | |
GetEventFrequency <- function(event_counts, avg_event_count, event_variation, graph = FALSE){ | |
#rate vals can be a vector of results, perhaps events per day, week , month etc | |
# mode and sd will be converted into rate and scale. | |
event_count_priors <- GetRatePriors(avg_event_count, event_variation, graph) | |
result <- poisgamp(event_counts, event_count_priors[1], event_count_priors[2], plot = FALSE, suppressOutput = TRUE) | |
return(result) | |
} | |
GetRatePriors <- function(mode, sd, graph = FALSE){ | |
# This function allows you to think in terms of mode and SD. | |
# Note that everything is positive..so while the mode might be a small number that is close to 0 | |
# The SD can be large skewing far to the right, without going past zero on the left. | |
# Here are the corresponding rate and shape parameter values: | |
ra = ( mode + sqrt( mode^2 + 4*sd^2 ) ) / ( 2 * sd^2 ) | |
sh = 1 + mode * ra | |
if(graph){ | |
x = seq(0,mode+5*sd,len=1001) | |
plot( x , dgamma( x , shape=sh , rate=ra ) , type="l" , | |
main=paste("dgamma, mode=",mode,", sd=",sd,sep="") , | |
ylab=paste("dgamma( shape=",signif(sh,3)," , rate=",signif(ra,3)," )", | |
sep="") ) | |
abline( v=mode , lty="dotted" ) | |
} | |
return(c(sh, ra)) | |
} | |
GetEngCost <- function(size, eng_cost_pert, eng_hour_priors){ | |
mode <- as.numeric(eng_cost_pert[["mode"]]) | |
low <- as.numeric(eng_cost_pert[["low"]]) | |
high <- as.numeric(eng_cost_pert[["high"]]) | |
mode_hrs <- as.numeric(eng_hour_priors[["mode"]]) | |
low_hrs <- as.numeric(eng_hour_priors[["low"]]) | |
high_hrs <- as.numeric(eng_hour_priors[["high"]]) | |
eng_rate <- rpert(size, min = low, mode = mode, max = high, shape = 4) | |
eng_hours <- rpert(size, min = low_hrs, mode = mode_hrs, max = high_hrs, shape = 4) | |
eng_cost <- eng_hours * eng_rate | |
return(eng_cost) | |
} | |
GetErrors <- function(n_draws, n_events, proportion_events){ | |
n_errors <- rbinom(n = n_draws, size = n_events, prob = proportion_events) | |
return(n_errors) | |
} | |
GetPropotionOfEvents <- function(n_draws,prior_events){ | |
error_beliefs <- GetBeliefsEvents(prior_events[["median_errors"]], prior_events[["edge_errors"]]) | |
#Make an n_draws length list of error beliefs | |
proportion_events <- rbeta(as.numeric(n_draws), as.numeric(error_beliefs[[1]]) , as.numeric(error_beliefs[[2]]) ) | |
return(proportion_events) | |
} | |
GetBeliefsHits <- function(hits, max_hits, events ){ | |
alpha_ratio <- as.numeric(hits) / as.numeric(events) | |
beta_ratio <- as.numeric(max_hits) / as.numeric(events) | |
q1 = list(p=.5, x= alpha_ratio) | |
q2 = list(p=.9, x= beta_ratio) | |
return( beta.select(q1, q2) ) | |
} | |
GetBeliefsEvents <- function(belief, max_belief ){ | |
alpha_ratio <- as.numeric(belief) | |
beta_ratio <- as.numeric(max_belief) | |
q1 = list(p=.5, x= alpha_ratio) | |
q2 = list(p=.9, x= beta_ratio) | |
return( beta.select(q1, q2) ) | |
} | |
SimulateHits <- function(freq_hits, num_events){ | |
freq_miss <- 1 - freq_hits | |
res <- sample(c("hit", "miss"), size=num_events, replace=TRUE, | |
prob=c(freq_hits, freq_miss)) | |
hit <- sum(res == "hit") | |
miss <- sum(res == "miss") | |
return( c(hit,miss)) | |
} | |
GetBetaCI <- function(hit,edge){ | |
betas <- GetBeliefsEvents(hit, edge) | |
intervals <- qbeta(c(.025, .5, .975), betas[1], betas[2]) | |
intervals <- round(intervals * 100) | |
return(intervals) | |
} | |
MakeCurves <- function(alpha = 0, beta = 0 , color = "black", ytop = 20, CI = FALSE, CIT = FALSE, add_val = FALSE){ | |
alpha <- alpha + 1 | |
beta <- beta + 1 | |
curve(dbeta(x, alpha , beta ), from=0, to=1, | |
xlab="SME Forecast For Proportion of Events", ylab="Density", | |
col= color, lwd=4, las=1, add = add_val , ylim = c(0,ytop)) | |
if( CI ){ | |
intervals <- qbeta(c(.025, .975), alpha, beta) | |
abline(v = intervals[1], lty=2) | |
abline(v = intervals[2], lty=2) | |
if( CIT ){ | |
adj <- 0 | |
if( intervals[2] - intervals[1] < 0.11 ){ | |
adj <- 1 | |
} | |
text(intervals[1] + 0.03,ytop, round(intervals[1], 2)) | |
text(intervals[2] + 0.03 ,ytop - adj, round(intervals[2], 2)) | |
} | |
} | |
} | |
ggplotBeta <- function(alpha, beta, xlab = xval, ylab = yval, tlab = "Beliefs", sval = "Test"){ | |
x = seq(0, 1, 0.01) | |
y = dbeta(x, shape1=alpha, shape2=beta) | |
density = data.frame(x=x, y=y) | |
mean=alpha/(alpha+beta) | |
mode=(alpha-1)/(alpha+beta-2) | |
intervals <- qbeta(c(.025, .975), alpha, beta) | |
res <- ggplot(density, aes(x, y)) + | |
geom_area(fill= "darkslateblue", alpha = 0.6) + | |
theme_minimal()+ | |
theme(axis.text.x = element_text(angle = 90, hjust = 1), | |
axis.text.y = element_blank()) + | |
geom_vline(xintercept = intervals[1]) + | |
geom_vline(xintercept = intervals[2])+ | |
labs(x = xlab, y = ylab, title = tlab, subtitle = sval) | |
return(res) | |
} | |
GetSize <- function(size, first, second){ | |
res <- ifelse(length(first) >= length(second), length(second), length(first) ) | |
if( size > res){ | |
return(res) | |
} | |
return(size) | |
} | |
GetEngHours <- function(size, med = 10, ninety = 60, total_hours = 100){ | |
beta_vals <- GetBeliefsHits(med, ninety, total_hours) | |
prop_hours <- rbeta(size, beta_vals[1], beta_vals[2]) | |
hours <- rbinom(size, total_hours, prop_hours) | |
return(hours) | |
} | |
#' ##Wrapper Function | |
#' This just makes it easy for users to call one function without much fuss. | |
runProductCostComparison <- function(base_configs, event_priors, event_counts, eng_hour_range, eng_cost_range, product_costs){ | |
# A Quick Check for missed false negatives between products and other sundry. | |
update_count <- ErrorCheck(base_configs, event_counts) | |
if( update_count[2] > 0){ | |
event_counts[update_count[[1]]] <- as.numeric(update_count[[2]]) | |
} | |
#' ##Prior Beliefs About Error Rates | |
#' In the *event_priors* config element above you specified what you think is the likely error rate given N events. This graph shows you" | |
#' the spread of possible rates given our beliefs. It also gives some indication of where the most likely rates may cluster." | |
CatPriorBeliefRates(event_priors) | |
#Number of simulations (draws), tests (events), and default array index sizes | |
n_draws <- as.numeric(base_configs[["n_draws"]]) | |
a_n_events <- as.numeric(base_configs[["a_n_events"]]) | |
b_n_events <- as.numeric(base_configs[["b_n_events"]]) | |
size <- 10000 | |
#Create a list of simmulated events and error RATES based on priors | |
proportion_events <- GetPropotionOfEvents(n_draws, event_priors) | |
a_n_errors <- GetErrors(n_draws, a_n_events, proportion_events) | |
b_n_errors <- GetErrors(n_draws, b_n_events, proportion_events) | |
#Put simulations into dataframe | |
a_prior <- data.frame(proportion_events, a_n_errors) | |
b_prior <- data.frame(proportion_events, b_n_errors) | |
# Select the simulationed prior data that correlates to the real product errors | |
product_a <- a_prior[a_prior$a_n_errors == event_counts[["product_a_errors"]], ] | |
product_b <- b_prior[b_prior$b_n_errors == event_counts[["product_b_errors"]], ] | |
#Sample to create indices for posterior and load into data frame | |
sample_a_indices <- sample( nrow(product_a), size = size, replace = TRUE, prob = product_a$proportion_events) | |
sample_b_indices <- sample( nrow(product_b), size = size, replace = TRUE, prob = product_b$proportion_events) | |
posterior <- data.frame( | |
prod_a = product_a$proportion_events[sample_a_indices], | |
prod_b = product_b$proportion_events[sample_b_indices]) | |
#' Updated beliefs given real error rates | |
CatUpdatedBeliefRates(posterior, event_counts, base_configs) | |
#' Create and Print out Error Rate Graphs | |
GraphErrorRates(posterior) | |
#' Calculate the rate differences, graph and print out | |
posterior$prop_diff_events <- posterior$prod_a - posterior$prod_b | |
GraphErrorDiff(posterior) | |
CatErrorDiffProp(posterior) | |
#This gets our impact per event with lots of uncertainty packed in | |
eng_cost <- GetEngCost(size, eng_cost_range, eng_hour_range) | |
# Add forecasted expected cost column to posterior for product a and b and store in list | |
posterior$prod_a_impact <- posterior$prod_a * eng_cost | |
posterior$prod_b_impact <- posterior$prod_b * eng_cost | |
#Load Median values into list of use later | |
impacts <- list() | |
impacts["prod_a_impact"] <- median(posterior$prod_a_impact) | |
impacts["prod_b_impact"] <- median(posterior$prod_b_impact) | |
# Add the column that holds the difference between the two costs | |
posterior$prod_diff_impact <- posterior$prod_a_impact - posterior$prod_b_impact | |
print("Here is a glimpse into some of the data.") | |
print(head(posterior,10)) | |
total_costs <- GetTotalCosts(impacts, product_costs, base_configs["time_multiple"], event_counts, event_priors, base_configs) | |
#' #Total Costs Differences | |
impact_res <- CatTotalCost(total_costs, product_costs) | |
#' Graph Impact Difference | |
GraphErrorImpact(posterior, impact_res) | |
} | |
#' ## Run Program, Get Graphs, Data Output and The Code | |
#' This will spit out analytics and some graphs. | |
runProductCostComparison(base_configs,event_priors, event_counts, eng_hour_range, eng_cost_range, product_costs) | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment