The test data file for JASP, debug.csv, contains 100 observations on many variables and is used to test functions in JASP.
- 9 continuous variables
  - Normal: Normally distributed data
  - Gamma: Gamma distributed data (only positive!)
  - Binomial: Random 0/1 data
  - Expon: Exponentiated normal data (wide interval)
  - Wide: A very wide interval on the order of 10^100
  - Narrow: A very narrow interval on the order of 10^-100
  - Outlier: Normal data with several outliers
  - Cor1&2: Correlated normal data
- 5 factor variables
  - Gender: m/f coded
  - Experim: Experimental/Control coded
  - Five: Factor with 5 levels
  - Fifty: Factor with 50 levels
  - Outlier: Factor with 4 levels, 2 of which have only one observation
- 15 debug variables
  - String: Randomly sampled letters a-z
  - Miss1: Normal data with 1 missing value
  - Miss30: Normal data with 30 missing values
  - Miss80: Normal data with 80 missing values
  - Miss99: Normal data with 99 missing values
  - BinMiss20: Binomial data with 20 missing values
  - NaN: Full column of NaN
  - NaN10: Normal data with 10 NaN
  - Inf: Full column of Inf
  - Collin1&2&3: Three collinear variables (Beta distributed)
  - Equal1&2: Two variables with exactly the same values
  - Same: Column with all "12.3" values (no variance)
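The Outlier factor in the list above is the one most easily misread, so here is a small check of its level counts. It is a sketch that rebuilds only that one column, using the same construction as the generating script further down; the names `facOutlier` and `counts` are local to this example.

```r
# Rebuild only the Outlier factor: 49 x "f1", 49 x "f2", and two singleton levels
facOutlier <- factor(c(rep(c("f1", "f2"), 49), "f3",
                       "totallyridiculoussuperlongfactorname"))

# Tabulate how often each level occurs
counts <- table(facOutlier)
print(counts)

# Levels that occur exactly once
singletons <- names(counts)[counts == 1]
print(singletons)
```

Levels with a single observation are useful for testing how an analysis handles groups that have no within-group variance.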
Open the R file testData.R, available as a gist, to recreate the dataset. Additional columns can be added there. Once the dataset has been generated, save it as a CSV file, open it in a spreadsheet editor, and replace every cell containing the value NA with an empty cell.
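As a possible alternative to editing the file by hand, the `na` argument of `write.csv()` can write missing values as empty cells directly. The sketch below uses a small hypothetical data frame (`toy`) and a temporary file rather than the real dataset, and it only demonstrates the handling of NA; it is not the workflow the gist itself prescribes.

```r
# Hypothetical stand-in for the generated dataset
toy <- data.frame(x = c(1.5, NA, 3),
                  g = c("a", "b", NA))

# Write NA as an empty cell instead of the literal string "NA"
out <- tempfile(fileext = ".csv")
write.csv(toy, out, na = "", row.names = FALSE)

# Inspect the raw file and read it back
csv_lines <- readLines(out)
print(csv_lines)
reread <- read.csv(out)
```

When the file is read back with `read.csv()`, an empty cell in a numeric column is treated as missing again.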
Continuous
library(MASS)  # provides mvrnorm()
s <- matrix(c(1, 0.68, 0.68, 1), nrow = 2)  # Covariance matrix with correlation 0.68
mvn <- mvrnorm(100, c(0, 0), s)
cont <- data.frame(contNormal = rnorm(100),                    # Standard normal
                   contGamma = rgamma(100, 2),                 # Gamma distributed
                   contBinom = rbinom(100, 1, 0.4),            # Bernoulli trials
                   contExpon = exp(rnorm(100, sd = 50)),       # Exponentiated normal
                   contWide = runif(100, -9e99, 9e99),         # Very wide interval
                   contNarrow = runif(100, -1e-99, 1e-99),     # Very narrow interval
                   contOutlier = sample(c(rnorm(95),           # Normal data with 5 outliers
                                          c(12, -23, 4.5, 5.7, -3.12)), 100),
                   contcor1 = mvn[, 1],                        # Bivariate normal with cor 0.68
                   contcor2 = mvn[, 2])
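Because `mvrnorm()` draws a random sample, the observed correlation between contcor1 and contcor2 will only be close to 0.68, not equal to it. The sketch below contrasts the default behaviour with the `empirical = TRUE` argument of `MASS::mvrnorm()`, which makes the sample covariance match `s` exactly. The seed and the variable names `mvn_random` and `mvn_exact` are assumptions for illustration and are not part of the original script.

```r
library(MASS)  # mvrnorm() lives in MASS

set.seed(123)  # arbitrary seed, only so this sketch is repeatable
s <- matrix(c(1, 0.68, 0.68, 1), nrow = 2)

# Default: 0.68 is the population correlation, the sample value varies
mvn_random <- mvrnorm(100, c(0, 0), s)
r_random <- cor(mvn_random[, 1], mvn_random[, 2])

# empirical = TRUE: sample mean and covariance are forced to match mu and s
mvn_exact <- mvrnorm(100, c(0, 0), s, empirical = TRUE)
r_exact <- cor(mvn_exact[, 1], mvn_exact[, 2])

print(c(random = r_random, exact = r_exact))
```

Whether an exact sample correlation is desirable depends on the test; the random draw is closer to real data, while the exact version gives a known reference value.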
Factors
fac <- data.frame(facGender = factor(sample(rep(c("m", "f"), 50), replace = FALSE)),
                  facExperim = factor(rep(c("control", "experimental"), 50)),
                  facFive = factor(rep(1:5, 20)),
                  facFifty = factor(c(1:50, 1:50)),
                  facOutlier = factor(c(rep(c("f1", "f2"), 49), "f3",
                                        "totallyridiculoussuperlongfactorname")))
Debug
col <- rbeta(100, 23, 12)                     # Base variable for the collinear columns
eq <- rnorm(100, 10, 2.5) * rgamma(100, 1)    # Base variable for the equal columns
deb <- data.frame(debString = sample(letters, 100, replace = TRUE),      # Random letters
                  debMiss1 = sample(c(rnorm(99, 10, 25), NA)),           # Varying numbers of missing values
                  debMiss30 = sample(c(rnorm(70, 10, 25), rep(NA, 30))),
                  debMiss80 = sample(c(rnorm(20, 10, 25), rep(NA, 80))),
                  debMiss99 = sample(c(rnorm(1, 10, 25), rep(NA, 99))),
                  debBinMiss20 = sample(c(rbinom(80, 1, 0.6), rep(NA, 20))),
                  debNaN = rep(NaN, 100),                                # All NaN
                  debNaN10 = sample(c(rnorm(90, 10, 25), rep(NaN, 10))), # 10 NaN
                  debInf = rep(Inf, 100),                                # All Inf values
                  debCollin1 = col,                                      # Three multicollinear variables
                  debCollin2 = col + 2,
                  debCollin3 = col * 2,
                  debEqual1 = eq,                                        # Two exactly equal variables
                  debEqual2 = eq,
                  debSame = rep(12.3, 100))                              # Exactly the same value 100 times
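After generating the debug columns it can be worth confirming that each one contains the intended number of special values. The following is a minimal sketch on three of the columns only; the data frame name `deb_check` and the seed are assumptions made for this example. Note that `is.na()` is TRUE for both NA and NaN, so `is.nan()` is needed to tell them apart.

```r
set.seed(1)  # arbitrary seed, only so this sketch is repeatable

# Rebuild a subset of the debug columns
deb_check <- data.frame(
  debMiss30 = sample(c(rnorm(70, 10, 25), rep(NA, 30))),
  debNaN10  = sample(c(rnorm(90, 10, 25), rep(NaN, 10))),
  debInf    = rep(Inf, 100)
)

# Count special values per column
n_missing <- colSums(is.na(deb_check))                          # NA and NaN together
n_nan     <- sapply(deb_check, function(x) sum(is.nan(x)))      # NaN only
n_inf     <- sapply(deb_check, function(x) sum(is.infinite(x))) # Inf and -Inf

print(rbind(n_missing, n_nan, n_inf))
```

The same counts can be run on the full `deb` data frame once the script has been executed.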