We have explored 166 695 645 points in parameter space, and that covers only one ordinal pattern (A, B, BC, C) and 117 participants.
For the whole sample (approx. 3 * 117), the estimate grows threefold, to 500 086 935 points.
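As a quick sanity check on that scaling, the whole-sample figure is exactly three times the single-pattern figure:

```r
## sanity check: whole-sample count is exactly 3x the single-pattern count
single_pattern <- 166695645
whole_sample <- 3 * single_pattern
whole_sample
# [1] 500086935
```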
A single run of EXIT is very fast.
## install microbenchmark for much better accuracy than system.time
devtools::install_github("olafmersmann/microbenchmarkCore")
devtools::install_github("olafmersmann/microbenchmark")
library(microbenchmark)
library(ply207) # a package I quickly made for running simulations
## setup all we need for imacEXIT
i <- unique(dome21$ppt)[1]
trial_order <- data[ppt == i]$abstim
phase_index <- data[ppt == i]$phase
phase_index[phase_index == "training"] <- 0
phase_index[phase_index == "test"] <- 2
phase_index <- as.numeric(phase_index)
trials <- data[ppt == i]$trial
train <- dome21train(trial_order = trial_order, phase_index = phase_index,
trials = trials, ppt = i)
## define expression to evaluate by microbenchmark
## imac stands for inequality matrix constructor
expr <- function() {
  imacEXIT(tr = train, stimuli = c("A", "B", "C", "BC", "AB", "AC"))
}
microbenchmark(expr(), times = 100000, unit = "s")
# Unit: seconds
#    expr   min    lq         mean median    uq      max neval
#  expr() 1e-05 2e-05 2.355473e-05  2e-05 2e-05 0.003967 1e+05
This is really good. The model is super fast. Let's look at the pspEXIT
algorithm itself for 10 iterations. pspEXIT does parameter space partitioning
for all trial orders in the data by running imacEXIT. Here, I will only do a
single trial order.
microbenchmark({
  pspEXIT(data = data[ppt == i], stimuli = c("A", "B", "C", "BC"),
          iteration = 10)
}, times = 1)
# Unit: seconds
# min lq mean median uq max neval
# 2.290074 2.290074 2.290074 2.290074 2.290074 2.290074 1
## rough estimate of how many simulations per second we can run on a single core
1 / (2.29 / 10)
# 4.366812
That is roughly 4.37 simulations per second, including all the setup. Keep in mind that this is probably an underestimate: the number of ordinal patterns probably increases with each iteration, so 4.37 is only accurate if every iteration ran exactly 1 simulation.
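To see how much the underestimate could matter, here is a small sketch. The per-iteration simulation count `k` is a hypothetical assumption, not something measured above: if each of the 10 iterations actually ran `k` simulations, the effective throughput scales linearly with `k`.

```r
## hypothetical: k simulations per iteration instead of 1
k <- 1:5
## effective simulations per second under each assumed k
round(k * (1 / (2.29 / 10)), 2)
# [1]  4.37  8.73 13.10 17.47 21.83
```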
all <- 500086935 ## all sampled points
eps <- 1 / 0.229 ## evaluations per second
eps <- eps * 96 ## on 96 core
## calculate seconds on 96 core CPU
cpu <- all / eps
cpu
# [1] 1192916
## calculate the number of days it would take
day <- 24 * 60 * 60 ## number of seconds in a day (hour * minutes * seconds)
cpu / day
# [1] 13.80689
This is only 1 of the 20 simulations. The only thing that remains is to calculate the cost.
ph <- 0.83 ## per hour cost of the vm in £
runtime <- (cpu / day) * 24 # roughly 331 hours
ph * runtime
# [1] 275.0333
£275 seems like a lot, but my hunch is that it is standard for computationally intensive workloads.
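The back-of-the-envelope above can be wrapped into a small helper so the same estimate is easy to rerun for the other simulations. The function name and its defaults are mine for illustration, not part of ply207 or any package used above:

```r
## hedged sketch: runtime and cost estimate for a grid of points on a multi-core VM
## (estimate_cost is illustrative, not from ply207)
estimate_cost <- function(points, secs_per_eval, cores = 96, price_per_hour = 0.83) {
  eps <- (1 / secs_per_eval) * cores  # evaluations per second across all cores
  hours <- points / eps / 3600        # total runtime in hours
  list(hours = hours, cost = hours * price_per_hour)
}

estimate_cost(points = 500086935, secs_per_eval = 0.229)
# hours ~ 331, cost ~ £275, matching the figures above
```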