JanGVoelkel/JASP_Design_New_R_Analysis.md

## JASP_Design_New_R_Analysis.md

      
JASP - Design for new R Analysis

The main goal of this design is to make it as easy as possible for R programmers to write their own analyses and to present the results in an intuitive way.

1. Current state of the art

Programming an analysis in R for JASP is currently difficult and error-prone. Many lines of code are needed to read the data, handle the state and options, and describe the results with complicated meta information. This code is not intuitive for R programmers.
Tim de Jong did initial work to simplify the R programmer's task. It tried to remove as much meta data as possible from the R analysis itself, by making JASP clever enough to infer from the types of the objects returned by the analysis (data.frame, plot or list) which output should be displayed. In addition, a JSON file specified all extra information needed for the output.
One main problem is that most analyses generate their results dynamically, which does not fit well with the static approach of a description file (in JSON or another format). This forces us to add ways of setting meta information in R after all, which contradicts the main goal. As a result, R programmers would be unsure when meta data must be set in R and when it must be set in the description file. Describing the result of the analysis in a file also sometimes doubles the work: for example, the column names of a table are already in the data.frame, so it makes no sense to have to name them again in the description file.
The JSON file was also used to describe the input (the options of the analysis). This part is no longer needed either, since we will use QML from Qt to generate the user interface for the analysis options. The QML file is much simpler and easier to edit than the UI file we currently use, so the options can easily be described directly in QML.
The dataset properties that Tim added to the JSON file, describing which variables should be read in factor or scale format, can also easily be moved to the QML file.
So the decision is to drop the JSON/descriptive file completely: a QML file and an R file should suffice to implement an analysis.
Another main problem in the current analyses is the way long computations are handled. The R programmer has to call callbacks in the middle of the analysis so that the user can already see part of the final output. However, the callback also has to check whether the user has changed the options, in which case the current run must be aborted. This is really cumbersome to program, and a better mechanism should be available.
2. Description of the new concepts

The R programmer should be able to describe the output with 'natural' R objects, i.e. data.frames, plots and lists. These objects lack some of the functionality needed to describe the output, for example adding an overtitle or footnotes to a data.frame. We will therefore extend them by creating our own jasp.data.frame, jasp.plot and jasp.container respectively. These objects will work like their 'natural' counterparts, but will have all the extra functions needed for describing the output. For example, jasp.data.frame also has an addOvertitle(columns = list(...)) function.
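
To make the idea concrete, here is a minimal sketch of how such an extended object could be built with S3 classes. Only the names jasp.data.frame and addOvertitle come from this design; the internals, the "overtitles" attribute, and the reading of `columns` as a mapping from an overtitle to the columns it spans are assumptions made for illustration.

```r
# Hypothetical constructor: an empty data.frame with the given column names,
# tagged with an extra class and an attribute that holds output metadata.
jasp.data.frame <- function(colnames) {
  columns <- stats::setNames(rep(list(numeric(0)), length(colnames)), colnames)
  df <- as.data.frame(columns, stringsAsFactors = FALSE)
  attr(df, "overtitles") <- list()
  class(df) <- c("jasp.data.frame", class(df))
  df
}

# Assumed semantics: `columns` maps an overtitle to the columns it spans.
addOvertitle <- function(x, columns) {
  stopifnot(inherits(x, "jasp.data.frame"))
  unknown <- setdiff(unlist(columns), names(x))
  if (length(unknown) > 0) {
    stop("unknown columns: ", paste(unknown, collapse = ", "))
  }
  attr(x, "overtitles") <- c(attr(x, "overtitles"), columns)
  x
}

tbl <- jasp.data.frame(colnames = c("estimate", "lower", "upper"))
tbl <- addOvertitle(tbl, columns = list("95% CI" = c("lower", "upper")))
is.data.frame(tbl)  # the object can still be used wherever a data.frame is expected
```

Because the extra class is placed in front of "data.frame", existing data.frame code keeps working, while JASP-specific functions can check for the extra class.
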
One great advantage of using our own objects is that we can implement them so that they generate the callbacks that update the output automatically. This will be completely transparent to the R programmer: for example, a loop that generates a new row in a jasp.data.frame will be displayed directly in the output:
   myDataFrame <- jasp.data.frame(colnames = c("col1", "col2", ...))
   result[["My Table"]] <- myDataFrame # This automatically adds a table to the output.
   for (row in rows) {
     myDataFrame <- rbind(myDataFrame, generateNewRow(row))
     result[["My Table"]] <- myDataFrame
     # The new row is now part of myDataFrame and is shown in "My Table".
   }
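
One way such automatic callbacks could work is an S3 assignment method on the container that notifies JASP on every update. The following is only a sketch: the `onUpdate` argument and the way the callback is stored are hypothetical, and a plain data.frame stands in for a jasp.data.frame.

```r
# Hypothetical container: a list that carries an update callback as an attribute.
jasp.container <- function(onUpdate) {
  structure(list(), class = c("jasp.container", "list"), onUpdate = onUpdate)
}

# Assignment method: store the value, then notify the callback.
`[[<-.jasp.container` <- function(x, i, value) {
  cls <- class(x)
  x <- unclass(x)       # fall back to plain list assignment
  x[[i]] <- value
  class(x) <- cls
  attr(x, "onUpdate")(i, value)
  x
}

# Stand-in for the JASP side: record every update that would be displayed.
updates <- character(0)
result <- jasp.container(onUpdate = function(name, value) {
  updates <<- c(updates, sprintf("%s: %d row(s)", name, nrow(value)))
})

tbl <- data.frame(col1 = numeric(0), col2 = numeric(0))
result[["My Table"]] <- tbl
for (i in 1:2) {
  tbl <- rbind(tbl, data.frame(col1 = i, col2 = i^2))
  result[["My Table"]] <- tbl   # triggers the callback without any explicit call
}
```

The analysis code never calls the callback itself; every assignment into the container is enough to refresh the output.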

To interrupt an analysis (because the user aborts it or changes the options), the callbacks can throw an exception that is caught directly by JASP (in common.R). Once again, the R programmer does not have to care about this.
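
A minimal sketch of this interruption mechanism, using R's condition system: the callback signals a custom condition, and the caller catches it with tryCatch. The names `abortAnalysis`, `runAnalysis`, `optionsChanged` and the condition class "jaspAbort" are hypothetical and do not describe the actual code in common.R.

```r
# Signal a custom condition that unwinds the running analysis.
abortAnalysis <- function() {
  stop(structure(
    list(message = "analysis aborted", call = NULL),
    class = c("jaspAbort", "error", "condition")
  ))
}

# Stand-in for the JASP side: run the analysis and catch the abort.
runAnalysis <- function(analysis, optionsChanged) {
  callback <- function() {
    if (optionsChanged()) abortAnalysis()
  }
  tryCatch(
    {
      analysis(callback)
      "complete"
    },
    jaspAbort = function(cond) "aborted"
  )
}

# Example: the options "change" after two steps, so the loop never finishes.
steps <- 0
status <- runAnalysis(
  analysis = function(callback) {
    for (i in 1:5) {
      callback()
      steps <<- steps + 1
    }
  },
  optionsChanged = function() steps >= 2
)
```

In the proposed design the callback would be triggered by the JASP objects themselves, so the analysis body would not even contain the explicit `callback()` call.
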
The result object will be given as a parameter of the analysis, and won't have to be given back at the end of the R code:
MyAnalysis <- function(result, dataset, options, state, ...) {
   plot <- generateOnePlot(...)
   result[["one plot"]] <- plot
   ...
}

result will in fact be a jasp.container object (that enhances the list object).
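
Note that ordinary R lists are copied on modification, so changes made inside the function would not be visible to the caller unless the container has reference semantics. A minimal sketch of that property, assuming (this is not specified in the design) that the container is backed by an environment:

```r
# An environment is passed by reference, so the analysis can fill it in place.
MyAnalysis <- function(result, dataset, options) {
  result[["row count"]] <- nrow(dataset)
  invisible(NULL)  # nothing needs to be returned
}

result <- new.env()
MyAnalysis(result, dataset = data.frame(x = 1:4), options = list())
result[["row count"]]  # the value set inside the function is visible here
```
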
Another advantage of this feature is that the 'init' phase of the analysis is no longer needed.
Since JASP calls the R analysis every time an option is changed, it is important that the R code does not recompute all plots and data.frame objects on every call. Currently this is done by putting the old results in the state and comparing the old options with the new ones. This is no longer needed, since the result argument will contain the results of the previous call. It will also tell whether a part of the old results is still valid after the options have changed. For this, the R programmer has to declare which options an object depends on:
MyAnalysis <- function(result, dataset, options, state, ...) {
  if (isTRUE(options[["option1"]]) && options[["option3"]] == "wide") {
    if (!isValid(result[["one plot"]])) {
      plot <- generateOnePlot(...)
      # Declaring the dependencies makes isValid(result[["one plot"]]) return TRUE
      # on later calls as long as option1 and option3 have not changed.
      plot[["depends"]] <- list("option1", "option3")
      result[["one plot"]] <- plot
    }
  }
  ...
}
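
The validity check itself could be sketched as follows. In the design, isValid takes only the stored object; here the current and previous options are passed explicitly to keep the example self-contained, which is an assumption about how JASP would supply them.

```r
# An object is valid if it exists and none of the options it depends on changed.
isValid <- function(object, options, oldOptions) {
  if (is.null(object)) {
    return(FALSE)  # nothing was computed in a previous call
  }
  deps <- unlist(object[["depends"]])
  all(vapply(
    deps,
    function(name) identical(options[[name]], oldOptions[[name]]),
    logical(1)
  ))
}

# A plain list stands in for a plot object with declared dependencies.
plot <- list(data = 1:3, depends = list("option1", "option3"))
oldOptions <- list(option1 = TRUE, option2 = 5, option3 = "wide")

# option2 changed, but the plot does not depend on it: still valid.
isValid(plot, list(option1 = TRUE, option2 = 9, option3 = "wide"), oldOptions)
# option3 changed: the plot must be regenerated.
isValid(plot, list(option1 = TRUE, option2 = 5, option3 = "narrow"), oldOptions)
```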

If an option consists of a list of columns (i.e. a variable option), and an object depends on one of these columns (e.g. there is one plot per column selected in the option), then the depends element will be:
  plot[["depends"]] <- list("option1" = list(colname))
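
One possible reading of this per-column form is that the object stays valid as long as its column is still selected in the option. The helper below sketches that interpretation; the function name and the exact semantics are assumptions, not part of the design.

```r
# TRUE if every column named in `depends` is still selected in the matching option.
columnsStillSelected <- function(depends, options) {
  all(vapply(
    names(depends),
    function(opt) all(unlist(depends[[opt]]) %in% unlist(options[[opt]])),
    logical(1)
  ))
}

depends <- list("option1" = list("age"))

# "age" is still selected, so the plot for "age" can be reused.
columnsStillSelected(depends, list(option1 = c("age", "height")))
# "age" was removed from the option, so its plot is no longer valid.
columnsStillSelected(depends, list(option1 = "height"))
```
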

If a table depends on a variable option and contains one row per selected column, it is useful to reuse the old data.frame: when a new column is added to the option, the analysis then only has to add a new row to the existing data.frame. For this, the option values used for the previous result must be known, so they must also be passed to the analysis as an argument:
MyAnalysis <- function(result, dataset, options, state, oldOptions, ...) {
  if (length(options[["variables"]]) > 0) {
    if (!isValid(result[["my table"]])) {
      myOldTable <- result[["my table"]]
      # generateTable can check whether a column has been added or removed,
      # and change the jasp.data.frame accordingly.
      myTable <- generateTable(myOldTable, options[["variables"]], oldOptions[["variables"]])
      myTable[["depends"]] <- list("variables")
      result[["my table"]] <- myTable
    }
  }
}
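
A sketch of what such an incremental generateTable could look like. The `dataset` argument, the table layout (one row per variable with its mean) and the use of a plain data.frame are assumptions made to keep the example runnable.

```r
generateTable <- function(oldTable, variables, oldVariables, dataset) {
  variables <- unlist(variables)
  oldVariables <- unlist(oldVariables)
  if (is.null(oldTable)) {
    # First run: start from an empty table.
    oldTable <- data.frame(variable = character(0), mean = numeric(0),
                           stringsAsFactors = FALSE)
    oldVariables <- character(0)
  }
  # Keep the rows of variables that are still selected.
  kept <- oldTable[oldTable$variable %in% variables, , drop = FALSE]
  # Compute rows only for newly selected variables.
  added <- setdiff(variables, oldVariables)
  newRows <- lapply(added, function(v) {
    data.frame(variable = v, mean = mean(dataset[[v]]), stringsAsFactors = FALSE)
  })
  table <- do.call(rbind, c(list(kept), newRows))
  # Restore the order in which the variables are selected.
  table <- table[match(variables, table$variable), , drop = FALSE]
  rownames(table) <- NULL
  table
}

dataset <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30), c = c(5, 5, 5))
first  <- generateTable(NULL, c("a", "b"), NULL, dataset)
# "a" is removed and "c" is added: only the row for "c" is computed.
second <- generateTable(first, c("b", "c"), c("a", "b"), dataset)
```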

The R programmer can add a jasp.data.frame, a jasp.plot or a jasp.container object to a jasp.container (like result). For convenience, jasp.container should also accept data.frame, plot and list objects, so that a simple analysis can be coded directly, without knowing the special JASP interface.
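
Accepting plain objects could be handled by converting them on the way in, for instance with an S3 generic. The sketch below covers only data.frame and list; the generic `asJaspObject` is a hypothetical name, and plot handling is left out because it depends on the plotting system used.

```r
# Convert 'natural' R objects into their JASP counterparts.
asJaspObject <- function(x) UseMethod("asJaspObject")

asJaspObject.data.frame <- function(x) {
  class(x) <- union("jasp.data.frame", class(x))
  x
}

asJaspObject.list <- function(x) {
  x[] <- lapply(x, asJaspObject)  # convert nested elements as well
  class(x) <- union("jasp.container", class(x))
  x
}

# Anything else is passed through unchanged.
asJaspObject.default <- function(x) x

res <- asJaspObject(list(tbl = data.frame(a = 1), nested = list(n = 1L)))
class(res[["tbl"]])
```
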
The state is no longer needed to save the old results or to check whether relevant options have changed. However, some analyses still need to save complex R objects in the state so that they do not have to be recomputed on every run.
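
As an illustration of that remaining use of the state, the sketch below caches an expensive object (here a fitted model) and reuses it on the next run. Representing the state as an environment and the helper `getOrCompute` are assumptions; the design does not specify how the state is stored.

```r
# Return the cached object if present; otherwise compute and store it.
getOrCompute <- function(state, key, compute) {
  if (is.null(state[[key]])) {
    state[[key]] <- compute()
  }
  state[[key]]
}

state <- new.env()
nFits <- 0
fitModel <- function() {
  nFits <<- nFits + 1              # count how often the model is really fitted
  lm(dist ~ speed, data = cars)    # `cars` ships with R's datasets package
}

fit1 <- getOrCompute(state, "model", fitModel)  # first run: fits the model
fit2 <- getOrCompute(state, "model", fitModel)  # second run: reuses the cached fit
```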