@vankesteren
Last active October 10, 2016 09:49
Readme for debug dataset in JASP

Debug Dataset

Description

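debug.csv is the test data file for JASP. It contains 100 observations on 29 variables, chosen to exercise JASP's functions on both ordinary data and edge cases. In the generating code, column names carry the prefixes cont, fac, and deb (for example contNormal, facGender, debString).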

Columns

  • 9 continuous variables
    • Normal: Standard normal data
    • Gamma: Gamma-distributed data (strictly positive)
    • Binomial: Random 0/1 data (Bernoulli trials with p = 0.4)
    • Expon: Exponentiated normal data (spans many orders of magnitude)
    • Wide: Uniform data on a very wide interval, on the order of ±10^100
    • Narrow: Uniform data on a very narrow interval, on the order of ±10^-99
    • Outlier: Normal data with 5 outliers
    • Cor1&2: Two correlated normal variables (population correlation 0.68)
  • 5 factor variables
    • Gender: Coded m/f
    • Experim: Coded control/experimental
    • Five: Factor with 5 levels
    • Fifty: Factor with 50 levels
    • Outlier: Factor with 4 levels, 2 of which occur only once (one of them with a very long level name)
  • 15 debug variables
    • String: Random lowercase letters (a-z)
    • Miss1: Normal data with 1 missing value
    • Miss30: Normal data with 30 missing values
    • Miss80: Normal data with 80 missing values
    • Miss99: Normal data with 99 missing values
    • BinMiss20: Binomial data with 20 missing values
    • NaN: Column consisting entirely of NaN
    • NaN10: Normal data with 10 NaN values
    • Inf: Column consisting entirely of Inf
    • Collin1&2&3: Three perfectly collinear variables (beta-distributed; Collin2 = Collin1 + 2, Collin3 = Collin1 * 2)
    • Equal1&2: Two variables with exactly the same values
    • Same: Column in which every value is 12.3 (no variance)

How to add variables

The R script testData.R, available as a gist, recreates the dataset; add any new columns there. Once the dataset has been generated, save it as a CSV file, open it in a spreadsheet editor, and replace every cell containing NA with an empty cell.
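
The manual NA clean-up can probably be avoided at export time, because write.csv() accepts an na argument that sets the string written for missing values. A minimal sketch, assuming the three data frames from the code below are combined with cbind(); the tiny data frames and the temporary file path here are stand-ins for illustration, not part of testData.R:

```r
# Hypothetical stand-ins for the cont, fac and deb data frames in testData.R
cont <- data.frame(contNormal = c(0.5, -1.2, 0.3))
fac  <- data.frame(facGender = factor(c("m", "f", "m")))
deb  <- data.frame(debMiss1 = c(10.1, NA, 7.4))

debug_data <- cbind(cont, fac, deb)

# na = "" writes missing values as empty cells instead of the string NA
out_file <- tempfile(fileext = ".csv")
write.csv(debug_data, out_file, row.names = FALSE, na = "")

# Read the raw lines back to inspect what was written
csv_lines <- readLines(out_file)
print(csv_lines)
```

How the NaN and Inf columns come out, and whether JASP reads them as intended, is worth checking on the written file before replacing the manual step.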

The code

Continuous

library(MASS) # Provides mvrnorm()

s <- matrix(c(1,0.68,0.68,1), nrow = 2) # Covariance matrix with correlation 0.68
mvn <- mvrnorm(100,c(0,0),s)

cont <- data.frame(contNormal = rnorm(100), # Standard Normal
                   contGamma = rgamma(100,2), # Gamma Distributed
                   contBinom = rbinom(100, 1, 0.4), # Bernoulli trials
                   contExpon = exp(rnorm(100, sd = 50)), # Exponentiated normal
                   contWide = runif(100,-9e99,9e99), # Very wide interval
                   contNarrow = runif(100,-1e-99,1e-99), # Very narrow
                   contOutlier = sample(c(rnorm(95), # With outliers
                                          c(12,-23,4.5,5.7,-3.12)),100),
                   contcor1 = mvn[,1], # Multivariate normal with cor 0.68
                   contcor2 = mvn[,2])
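
The sample correlation of a random draw from mvrnorm() only approximates the 0.68 in s. A minimal sketch of how the correlated pair might be checked, assuming a fixed seed (the seed value is arbitrary and not part of the original script). The empirical = TRUE argument of MASS::mvrnorm() is shown as an optional alternative that makes the sample correlation match s exactly:

```r
library(MASS)

set.seed(1)  # arbitrary seed, an assumption for reproducibility only
s <- matrix(c(1, 0.68, 0.68, 1), nrow = 2)

# Default draw: the sample correlation only approximates 0.68
mvn <- mvrnorm(100, c(0, 0), s)
r_sample <- cor(mvn[, 1], mvn[, 2])

# empirical = TRUE makes the sample mean and covariance match mu and Sigma
mvn_exact <- mvrnorm(100, c(0, 0), s, empirical = TRUE)
r_exact <- cor(mvn_exact[, 1], mvn_exact[, 2])

print(c(sample = r_sample, exact = r_exact))
```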

Factors

fac <- data.frame(facGender = factor(sample(rep(c("m", "f"), 50), replace = FALSE)),
                  facExperim = factor(rep(c("control", "experimental"), 50)),
                  facFive = factor(rep(1:5, 20)),
                  facFifty = factor(c(1:50,1:50)),
                  facOutlier = factor(c(rep(c("f1","f2"),49), "f3",
                                        "totallyridiculoussuperlongfactorname")))

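To confirm that a factor has the intended level structure, its level counts can be tabulated. A small sketch that reuses the facOutlier definition above (the helper names level_counts and rare_levels are illustrative):

```r
facOutlier <- factor(c(rep(c("f1", "f2"), 49), "f3",
                       "totallyridiculoussuperlongfactorname"))

# Number of observations per level
level_counts <- table(facOutlier)
print(level_counts)

# Levels that occur exactly once
rare_levels <- names(level_counts)[as.vector(level_counts) == 1]
print(rare_levels)
```
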
Debug

col <- rbeta(100, 23, 12)
eq <- rnorm(100,10,2.5) * rgamma(100,1)

deb <- data.frame(debString = sample(letters, 100, replace = TRUE), # Random lowercase letters
                  debMiss1 = sample(c(rnorm(99,10,25), NA)), # Varying numbers of missing values
                  debMiss30 = sample(c(rnorm(70,10,25), rep(NA,30))),
                  debMiss80 = sample(c(rnorm(20,10,25), rep(NA,80))),
                  debMiss99 = sample(c(rnorm(1,10,25), rep(NA,99))),
                  debBinMiss20 = sample(c(rbinom(80,1,0.6), rep(NA, 20))),
                  debNaN = rep(NaN, 100), # All NaN
                  debNaN10 = sample(c(rnorm(90,10,25), rep(NaN,10))), # 10 NaN
                  debInf = rep(Inf, 100), # All Inf values
                  debCollin1 = col, # Three multicollinear variables
                  debCollin2 = col + 2,
                  debCollin3 = col * 2,
                  debEqual1 = eq, # Two exactly equal variables
                  debEqual2 = eq,
                  debSame = rep(12.3,100)) # Exactly the same value 100 times
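
The Miss and NaN columns follow the same pattern: a vector of values plus missing markers is shuffled with sample(). A short sketch of that pattern, and of how R distinguishes NA from NaN when the result is checked (the seed is an arbitrary assumption, not in the original script):

```r
set.seed(42)  # arbitrary seed, an assumption for reproducibility only

debMiss30 <- sample(c(rnorm(70, 10, 25), rep(NA, 30)))
debNaN10  <- sample(c(rnorm(90, 10, 25), rep(NaN, 10)))

# is.na() is TRUE for both NA and NaN; is.nan() is TRUE only for NaN
n_missing <- c(
  miss30_na  = sum(is.na(debMiss30)),
  miss30_nan = sum(is.nan(debMiss30)),
  nan10_na   = sum(is.na(debNaN10)),
  nan10_nan  = sum(is.nan(debNaN10))
)
print(n_missing)
```

Because is.na() also flags NaN, a count based on is.na() alone cannot tell the two kinds of column apart.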