stain/00-review-datasciencehub-549-1529.md

## 00-review-datasciencehub-549-1529.md

      
Review of "Scholarly data analysis to aid scientific model development" (#549-1529)


Permalink: https://gist.github.com/stain/4cd4cb1763a4ac57f1de270aa6f1a996 (this review)

also at https://datasciencehub.net/paper/scholarly-data-analysis-aid-scientific-model-development-0#edit-fset3


Authors: 	Gabriele Scalia, Matteo Pelucchi, Alessandro Stagni, Alberto Cuoci, Tiziano Faravelli, Barbara Pernici
Title: Scholarly data analysis to aid scientific model development
Submitted to: Data Science (Research Paper)

Special issue: Selection of extended papers from SAVE-SD 2017 and SAVE-SD 2018
Submitted on: 2018-12-16
Reviewer guidelines


Responsible editor: Silvio Peroni
Reviewed by: Stian Soiland-Reyes (3/3)

Reviewed on: 2019-02-06


Submitted version: https://datasciencehub.net/paper/scholarly-data-analysis-aid-scientific-model-development-0 [pdf]

Previous version (workshop paper)
Annotated version [pdf]


Dataset/source code: https://doi.org/10.5281/zenodo.2629359 aka https://github.com/sciexpem/sciexpem
Outcome: Undecided 2019-05-06 (requesting revision)

Revised version accepted 2019-03-19
Published version: https://doi.org/10.3233/DS-190017 "Towards a scientific data framework to support scientific model development" (2019-04-24)


Evaluation


Overall impression: Good
Suggested decision: Accept
Reviewer's confidence: Medium

Significance

Does the work address an important problem within the research fields covered by the journal?
High significance
Background

Is the work appropriately based on and connected to the relevant related work?
Reasonable
Novelty

Does the work provide new insights or new methods of a substantial kind?
Limited novelty
Technical Quality

Are the methods adequate for the addressed problem, are they correctly and thoroughly applied, and are their results interpreted in a sound manner?
Good
Presentation

Are the text, figures, and tables of the work accessible, pleasant to read, clearly structured, and free of major errors in grammar or style?
Weak
Length of the manuscript

The length of this manuscript is about right
Data availability

Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Summary of paper in a few sentences

The authors address the development of scientific models from experimental data,
focusing on automation and semantic data integration from a use case of
chemical kinetics models, but deriving requirements for a framework that I would argue
is general enough to apply for any model/experiment research across domains (e.g. systems biology).
The paper also presents a service-oriented architecture to address the
requirements, which has been partially implemented in a prototype.
The prototype is shown by screenshot only (no name, URL or source code cited).
Additional requirements for future work are laid out,
summarising potential and existing methods from literature.
Reasons to accept


Identified requirements are generalizable to any modelling domain
Well-founded reasoning behind arguments
Domain example (kinetic combustion modelling) explained well

Reasons to reject


Language: Several grammar and phrasing issues (see attached PDF)
Length: Some repetition across sections (see PDF)
Confusion between "proposed architecture" and what has been implemented in
Architectural choices not clearly derived from requirements (e.g. SOA for functions)
No source code or URL provided for developed prototype

Further comments

Overview

The main value I find in this article is that it clearly identifies and describes the requirements for experiment-based model development, and in particular shows the issues that must be addressed when automating and scaling up such research across multiple open data sources. As I think this would apply across domains, I would have liked some citations to similar work on automating modelling in other fields, for instance in systems biology.
I think this paper should be accepted following a minor revision. More attention must be paid to the language.
A detailed annotated PDF is attached in the web version of this review at https://gist.github.com/stain/4cd4cb1763a4ac57f1de270aa6f1a996
Language

The presentation of this article is generally good and well reasoned. However, the grammar is of varying quality, so the language can be confusing in places. I have suggested numerous small modifications in the attached PDF (ISO 5776 notation), some of which I hope will simplify the text where I identified repetition or unnecessary phrasing.
As was pointed out in the SAVE-SD 2018 open peer reviews https://save-sd.github.io/2018/papers.html, it is odd to use exponential notation for small numbers. I understand the intention is to show scale rather than actual values or proportions, so I suggest changing them to "scale in the hundreds", "..thousands" and "..hundreds of thousands".
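The suggested rewording can be sketched in a few lines of Python. This is only an illustration of the phrasing; the function name `describe_scale` and the wording table are assumptions of this sketch, not something taken from the paper or its prototype.

```python
# Hypothetical helper: maps a count to an order-of-magnitude phrase
# instead of exponential notation. Names and wording are illustrative.
_SCALE_NAMES = {
    0: "ones",
    1: "tens",
    2: "hundreds",
    3: "thousands",
    4: "tens of thousands",
    5: "hundreds of thousands",
    6: "millions",
}


def describe_scale(count: int) -> str:
    """Return a phrase such as 'scale in the hundreds' for a positive count."""
    if count < 1:
        raise ValueError("count must be a positive integer")
    # Number of digits minus one is the base-10 order of magnitude;
    # this avoids floating-point rounding in log10.
    exponent = len(str(count)) - 1
    name = _SCALE_NAMES.get(exponent)
    if name is None:
        # Fall back to exponential notation only for very large counts.
        return f"scale of 10^{exponent}"
    return f"scale in the {name}"
```

For example, `describe_scale(300)` gives "scale in the hundreds", which conveys the intended magnitude without implying a precise value.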
Architecture section

The wording in the "Proposed Architecture" section wavers between describing a potential general architecture ("It could be translated in the future") and features of the existing developed prototype ("the database has been designed..to privilege performance").
While I can read between the lines that the architecture was partially derived from the development of the prototype (which is good), this section attempts to give the opposite picture. This artificially introduces a tension that confuses the reader as to which parts of the architecture have been realized and which have not.
I suggest being more concrete in the architecture section and focusing on what has been implemented. The other design ideas are well reasoned and should be kept, but I would move them to a new subsection on future architectural work. This will show more clearly the distinction between features you can prove with the prototype and potential benefits whose implementation (e.g. exploratory OLAP) may have hidden pitfalls yet to be discovered.
I have some questions on the choice of Service-Oriented Architecture. I understand the authors wanted to support multiple modelling systems and data formats, and so argue that individual workflow functions should be services to facilitate interoperability.  While I certainly recognize this reasoning (as a developer of the Web Service-based workflow system Apache Taverna), I would also disagree with the argument that simply using SOA means data interoperability is easy.
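To make this concern concrete, here is a minimal sketch, assuming two hypothetical services whose payloads differ in field names and units. None of these types or fields come from the paper or its prototype; they only illustrate that even when both services are reachable, an explicit mapping of names and units is still required.

```python
from dataclasses import dataclass


@dataclass
class ExperimentRecord:
    """Payload as a hypothetical experiment-repository service might return it."""
    temp_celsius: float
    pressure_bar: float


@dataclass
class SimulationInput:
    """Payload as a hypothetical simulation service might expect it."""
    temperature_kelvin: float
    pressure_pascal: float


def to_simulation_input(record: ExperimentRecord) -> SimulationInput:
    """Adapter between the two services: renames fields and converts units.

    Calling both services is the easy part; this mapping is the
    interoperability work that SOA by itself does not provide.
    """
    return SimulationInput(
        temperature_kelvin=record.temp_celsius + 273.15,  # Celsius to kelvin
        pressure_pascal=record.pressure_bar * 100_000.0,  # bar to pascal
    )
```

Every pair of services with differing conventions needs such an adapter, or a shared semantic model that both sides agree on, which is why the service boundary alone does not settle the interoperability question.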
(Lack of) availability

The article focuses in large part on the development of a prototype. Yet this prototype does not seem to be available, except for a couple of screenshots.
From https://datasciencehub.net/content/guidelines-authors

All relevant data that were used or produced for conducting the work presented in a paper must be made FAIR and compliant with the PLOS data availability guidelines prior to submission.

In addition to a URL, I would highly recommend that the authors provide the source code of the developed prototype as Open Source.
An associated Zenodo DOI (see https://guides.github.com/activities/citable-code/) can then be used as a code citation from the paper.
Comments to the editor (optional)


## ds-paper-549-annotated-ssr.pdf

      