Readme for debug dataset in JASP

Debug Dataset


debug.csv is the test data file for JASP. It contains 100 observations on 29 variables, listed below, for testing functions in JASP.


  • 9 continuous variables
    • Normal: Normally distributed data
    • Gamma: Gamma distributed data (only positive!)
    • Binomial: random 0/1 data
    • Expon: Exponentiated normal data (wide interval)
    • Wide: a very wide interval on the order of 10^100
    • Narrow: a very narrow interval on the order of 10^-100
    • Outlier: normal data with several outliers
    • Cor1&2: Correlated normal data (population correlation 0.68)
  • 5 factor variables
    • Gender: m/f coded
    • Experim: Experimental/Control coded
    • Five: Factor with 5 levels
    • Fifty: Factor with 50 levels
    • Outlier: Factor with 4 levels, 2 of which have only one observation
  • 15 debug variables
    • String: Randomly sampled lowercase letters (a-z)
    • Miss1: Normal data with 1 missing value
    • Miss30: Normal data with 30 missing values
    • Miss80: Normal data with 80 missing values
    • Miss99: Normal data with 99 missing values
    • BinMiss20: Binomial data with 20 missing values
    • NaN: Full column with NaN
    • NaN10: Normal data with 10 NaN
    • Inf: Full column with Inf
    • Collin1&2&3: Three perfectly collinear variables (linear transformations of one another)
    • Equal1&2: Two variables with exactly the same values
    • Same: Column with all "12.3" values (no variance)

How to add variables

To recreate the dataset, run the R file testData.R, available as a gist. Additional columns can be added there. Once the dataset has been generated, save it as a CSV file, open it in a spreadsheet editor, and replace every cell containing the value NA with an empty cell.
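
The manual find-and-replace step can also be done at export time. The following is a minimal sketch, not the procedure from testData.R: it uses a small stand-in data frame called debug and a hypothetical new column debZero, and relies on the na argument of base R's write.csv to write NA as an empty cell.

```r
# Small stand-in for the data frame built by testData.R (values are illustrative)
debug <- data.frame(contNormal = c(0.5, -1.2, 0.3),
                    debMiss1   = c(10.1, NA, 7.4))

# Add a new debug column; the name debZero is hypothetical
debug$debZero <- rep(0, nrow(debug))

# na = "" writes NA as an empty cell instead of the string "NA"
out <- tempfile(fileext = ".csv")
write.csv(debug, out, row.names = FALSE, na = "")

csv_lines <- readLines(out)

# Reading the file back: empty cells in numeric columns become NA again
reread <- read.csv(out)
n_missing <- sum(is.na(reread$debMiss1))
```

Whether NaN and Inf cells survive this round trip unchanged should be checked on the real file before relying on it.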

The code


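library(MASS) # provides mvrnorm()

# Covariance matrix of two unit-variance normals with correlation 0.68
s <- matrix(c(1,0.68,0.68,1), nrow = 2)
mvn <- mvrnorm(100,c(0,0),s)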

cont <- data.frame(contNormal = rnorm(100), # Standard Normal
                   contGamma = rgamma(100,2), # Gamma Distributed
                   contBinom = rbinom(100, 1, 0.4), # Bernoulli trials
                   contExpon = exp(rnorm(100, sd = 50)), # Exponentiated normal
                   contWide = runif(100,-9e99,9e99), # Very wide interval
                   contNarrow = runif(100,-1e-99,1e-99), # Very narrow
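                   contOutlier = sample(c(rnorm(95), # With outliers
                                          # The source is cut off after rnorm(95); the 5 draws
                                          # below are an assumed stand-in for the outliers
                                          rnorm(5, sd = 10))),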
                   contcor1 = mvn[,1], # Multivariate normal with cor 0.68
                   contcor2 = mvn[,2])


fac <- data.frame(facGender = factor(sample(rep(c("m", "f"), 50), replace = FALSE)),
                  facExperim = factor(rep(c("control", "experimental"), 50)),
                  facFive = factor(rep(1:5, 20)),
                  facFifty = factor(c(1:50,1:50)),
                  facOutlier = factor(c(rep(c("f1","f2"),49), "f3",
                                        "f4"))) # "f4" is inferred: the source is cut off after "f3"


col <- rbeta(100, 23, 12)
eq <- rnorm(100,10,2.5) * rgamma(100,1)

deb <- data.frame(debString = sample(letters, 100, replace = TRUE), # Random letter string
                  debMiss1 = sample(c(rnorm(99,10,25), NA)), # Varying numbers of missing values
                  debMiss30 = sample(c(rnorm(70,10,25), rep(NA,30))),
                  debMiss80 = sample(c(rnorm(20,10,25), rep(NA,80))),
                  debMiss99 = sample(c(rnorm(1,10,25), rep(NA,99))),
                  debBinMiss20 = sample(c(rbinom(80,1,0.6), rep(NA, 20))),
                  debNaN = rep(NaN, 100), # All NaN
                  debNaN10 = sample(c(rnorm(90,10,25), rep(NaN,10))), # 10 NaN
                  debInf = rep(Inf, 100), # All Inf values
                  debCollin1 = col, # Three multicollinear variables
                  debCollin2 = col + 2,
                  debCollin3 = col * 2,
                  debEqual1 = eq, # Two exactly equal variables
                  debEqual2 = eq,
                  debSame = rep(12.3,100)) # Exactly the same value 100 times
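
To illustrate what the collinear columns are meant to provoke, the sketch below rebuilds only those columns and checks their properties. It is illustrative and not part of testData.R; the seed and the object names collin, X, rank_X, fit and n_dropped are assumptions made for the example.

```r
set.seed(123) # arbitrary seed, only so the example is reproducible

col <- rbeta(100, 23, 12)
eq  <- rnorm(100, 10, 2.5) * rgamma(100, 1)

collin <- data.frame(debCollin1 = col,
                     debCollin2 = col + 2,
                     debCollin3 = col * 2)

# Pairwise correlations are all 1 (up to floating-point error)
cors <- cor(collin)

# Design matrix with intercept: 4 columns that span only 2 dimensions
X <- cbind(Intercept = 1, as.matrix(collin))
rank_X <- qr(X)$rank

# lm() returns NA coefficients for the redundant predictors
fit <- lm(eq ~ debCollin1 + debCollin2 + debCollin3, data = collin)
n_dropped <- sum(is.na(coef(fit)))
```

Any analysis that inverts the predictor cross-product matrix without a rank check should fail or warn on these columns, which is the behaviour they are there to test.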
