Review of "Scholarly data analysis to aid scientific model development" (#549-1529)
- Permalink: https://gist.github.com/stain/4cd4cb1763a4ac57f1de270aa6f1a996 (this review)
- Authors: Gabriele Scalia, Matteo Pelucchi, Alessandro Stagni, Alberto Cuoci, Tiziano Faravelli, Barbara Pernici
- Title: Scholarly data analysis to aid scientific model development
- Submitted to: Data Science (Research Paper)
- Responsible editor: Silvio Peroni
- Reviewed by: Stian Soiland-Reyes (3/3)
- Reviewed on: 2019-02-06
- Submitted version: https://datasciencehub.net/paper/scholarly-data-analysis-aid-scientific-model-development-0 [pdf]
- Dataset/source code: https://doi.org/10.5281/zenodo.2629359 aka https://github.com/sciexpem/sciexpem
- Outcome: Undecided 2019-05-06 (requesting revision)
- Overall impression: Good
- Suggested decision: Accept
- Reviewer's confidence: Medium
Does the work address an important problem within the research fields covered by the journal?
Is the work appropriately based on and connected to the relevant related work?
Does the work provide new insights or new methods of a substantial kind?
Are the methods adequate for the addressed problem, are they correctly and thoroughly applied, and are their results interpreted in a sound manner?
Are the text, figures, and tables of the work accessible, pleasant to read, clearly structured, and free of major errors in grammar or style?
Length of the manuscript
The length of this manuscript is about right
Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Summary of paper in a few sentences
The authors address the development of scientific models from experimental data, focusing on automation and semantic data integration. Starting from a use case of chemical kinetics models, they derive requirements for a framework that I would argue is general enough to apply to any model/experiment research across domains (e.g. systems biology).
The paper also presents a service-oriented architecture to address the requirements, which has been partially implemented in a prototype. The prototype is shown by screenshot only (no name, URL or source code cited). Additional requirements for future work are laid out, summarising potential and existing methods from literature.
Reasons to accept
- Identified requirements are generalizable to any modelling domain
- Well-founded reasoning behind arguments
- Domain example (kinetic combustion modelling) explained well
Reasons to reject
- Language: Several grammar and phrasing issues (see attached PDF)
- Length: Some repetition across sections (see PDF)
- Confusion between "proposed architecture" and what has been implemented in the prototype
- Architectural choices not clearly derived from requirements (e.g. SOA for functions)
- No source code or URL provided for developed prototype
The main value I find in this article is that it identifies and describes well the requirements for experiment-based model development, and in particular shows the issues that must be addressed when automating and scaling up such research across multiple open data sources. As I think this would apply across domains, I would have liked some citations to similar work on automating modelling in other fields, for instance in systems biology.
I think this paper should be accepted following a minor revision. More attention should be paid to the language.
A detailed annotated PDF is attached in the web version of this review at https://gist.github.com/stain/4cd4cb1763a4ac57f1de270aa6f1a996
The presentation of this article is generally good and well reasoned. However, the grammar is of varying quality, so the language can get confusing in places. I have suggested numerous small modifications in the attached PDF (ISO 5776 notation), some of which I hope will simplify the text where I identified repetition or unnecessary phrasing.
As was already pointed out in the SAVE-SD 2018 open peer reviews https://save-sd.github.io/2018/papers.html, it is odd to use exponential notation for small numbers. I understand the intention is to show scale rather than actual values or proportions, so I suggest changing them to "scale in the hundreds", "..thousands" and "..hundred thousands".
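To illustrate the suggested wording, here is a minimal Python sketch that maps a count to such a scale phrase. The function name `describe_scale` and the table `SCALE_WORDS` are hypothetical and not taken from the paper or its prototype; the phrasing simply follows the suggestion above.

```python
# Hypothetical helper: the names and the wording table are illustrative
# assumptions, not part of the reviewed paper or its prototype.
SCALE_WORDS = {
    2: "hundreds",
    3: "thousands",
    4: "tens of thousands",
    5: "hundred thousands",
    6: "millions",
}


def describe_scale(count: int) -> str:
    """Return a phrase such as 'scale in the hundreds' for a positive count."""
    if count < 1:
        raise ValueError("count must be a positive integer")
    # Order of magnitude from the number of digits (avoids float rounding).
    exponent = len(str(count)) - 1
    if exponent < 2:
        return "scale below one hundred"
    word = SCALE_WORDS.get(exponent)
    if word is None:
        # Fall back to exponential notation only for genuinely large numbers.
        return f"scale of 10^{exponent}"
    return f"scale in the {word}"
```

For example, `describe_scale(300)` gives "scale in the hundreds", which reads more naturally in running text than an exponent for a number of that size.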
The wording in the "Proposed Architecture" section alternates between describing a potential general architecture ("It could be translated in the future") and features of the existing developed prototype ("the database has been designed..to privilege performance").
While I can read between the lines that the architecture was partially derived from the development of the prototype (which is good), this section attempts to give the opposite picture. This artificially introduces a tension that confuses the reader as to which parts of the architecture have been realized.
I suggest being more concrete in the architecture section and focusing on what has been implemented. The other design ideas are well reasoned and should be kept, but I would move them to a new subsection on future architectural work. This will show more clearly the distinction between features you can prove with the prototype and potential benefits whose implementation (e.g. exploratory OLAP) may have hidden pitfalls yet to be discovered.
I have some questions on the choice of Service-Oriented Architecture. I understand the authors wanted to support multiple modelling systems and data formats, and so argue that individual workflow functions should be services to facilitate interoperability. While I certainly recognize this reasoning (as a developer of the Web Service-based workflow system Apache Taverna), I disagree with the argument that simply using SOA makes data interoperability easy.
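A minimal Python sketch of this concern follows. It assumes two hypothetical services, with plain functions standing in for remote calls; the field names, units and values are invented for illustration and are not taken from the paper. Both services can be invoked in the same way, yet composing them still needs an explicit mapping of field names and units.

```python
# Hypothetical services: plain functions stand in for remote service calls.
# Field names, units and values are illustrative assumptions only.

def experiment_service() -> dict:
    """Return a measurement using its own field names (kelvin, pascal)."""
    return {"T": 1000.0, "P": 101325.0}


def model_service(request: dict) -> dict:
    """Expect 'temperature_c' (Celsius) and 'pressure_bar' (bar)."""
    return {
        "accepted": True,
        "temperature_c": request["temperature_c"],
        "pressure_bar": request["pressure_bar"],
    }


def adapt(measurement: dict) -> dict:
    """Map field names and convert units between the two services."""
    return {
        "temperature_c": measurement["T"] - 273.15,
        "pressure_bar": measurement["P"] / 100000.0,
    }


def try_direct_call() -> bool:
    """Chain the services without an adapter; report whether it worked."""
    try:
        model_service(experiment_service())
        return True
    except KeyError:
        # Same invocation mechanism, but the data schemas do not match.
        return False
```

Here `try_direct_call()` fails on the mismatched field names, while `model_service(adapt(experiment_service()))` succeeds. The service boundary by itself does not provide the shared data semantics; that work sits in `adapt`, and it has to be written for each pair of formats unless a common data model is agreed.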
(Lack of) availability
The article focuses largely on the development of a prototype. Yet this prototype does not seem to be available, except through a couple of screenshots.
All relevant data that were used or produced for conducting the work presented in a paper must be made FAIR and compliant with the PLOS data availability guidelines prior to submission.
In addition to a URL, I would highly recommend that the authors provide the Open Source code of the developed prototype.
An associated Zenodo DOI (see https://guides.github.com/activities/citable-code/) can then be used as a Code Citation from the paper.