
Review of "Scholarly data analysis to aid scientific model development" (#549-1529)

Evaluation

  • Overall impression: Good
  • Suggested decision: Accept
  • Reviewer's confidence: Medium

Significance

Does the work address an important problem within the research fields covered by the journal?

High significance

Background

Is the work appropriately based on and connected to the relevant related work?

Reasonable

Novelty

Does the work provide new insights or new methods of a substantial kind?

Limited novelty

Technical Quality

Are the methods adequate for the addressed problem, are they correctly and thoroughly applied, and are their results interpreted in a sound manner?

Good

Presentation

Are the text, figures, and tables of the work accessible, pleasant to read, clearly structured, and free of major errors in grammar or style?

Weak

Length of the manuscript

The length of this manuscript is about right

Data availability

Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this

Summary of paper in a few sentences

The authors address the development of scientific models from experimental data, focusing on automation and semantic data integration in a use case of chemical kinetics models, while deriving requirements for a framework that I would argue is general enough to apply to any model/experiment research across domains (e.g. systems biology).

The paper also presents a service-oriented architecture to address the requirements, which has been partially implemented in a prototype. The prototype is shown only as a screenshot (no name, URL or source code is cited). Additional requirements for future work are laid out, summarising potential and existing methods from the literature.

Reasons to accept

  • Identified requirements are generalizable to any modelling domain
  • Well-founded reasoning behind arguments
  • Domain example (kinetic combustion modelling) explained well

Reasons to reject

  • Language: Several grammar and phrasing issues (see attached PDF)
  • Length: Some repetition across sections (see PDF)
  • Confusion between the "proposed architecture" and what has been implemented in the prototype
  • Architectural choices not clearly derived from the requirements (e.g. SOA for functions)
  • No source code or URL provided for developed prototype

Further comments

Overview

The main value I find in this article is that it identifies and describes well the requirements for experiment-based model development, in particular by showing the issues that must be addressed when automating and scaling up such research across multiple open data sources. As I think this would apply across domains, I would have liked some citations to similar work on automating modelling in other fields, for instance in systems biology.

I think this paper should be accepted following a minor revision. More attention should be paid to the language.

A detailed annotated PDF is attached in the web version of this review at https://gist.github.com/stain/4cd4cb1763a4ac57f1de270aa6f1a996

Language

The presentation of this article is generally good and well reasoned; however, the grammar is of varying quality, so the language can get confusing in places. I have suggested numerous small modifications in the attached PDF (using ISO 5776 proofreading marks), some of which I hope will simplify the text where I identified repetition or unnecessary phrasing.

As was also pointed out in the SAVE-SD 2018 open peer reviews (https://save-sd.github.io/2018/papers.html), it is odd to use exponential notation for small numbers. I understand the intention is to show scale rather than actual values or proportions, so I suggest changing these to "scale in the hundreds", "… in the thousands" and "… in the hundred thousands".

Architecture section

The wording in the "Proposed Architecture" section wavers between describing a potential general architecture ("It could be translated in the future") and features of the existing prototype ("the database has been designed … to privilege performance").

While I can read between the lines that the architecture was partially derived from the development of the prototype (which is good), this section attempts to give the opposite picture. This artificially introduces a tension that confuses the reader as to which parts of the architecture have been realized and which have not.

I suggest being more concrete in the architecture section and focusing on what has been implemented. The other design ideas are well reasoned and should be kept, but I would move them to a new subsection on future architectural work. This would show more clearly the distinction between features you can demonstrate with the prototype and potential benefits whose implementation (e.g. exploratory OLAP) may have hidden pitfalls yet to be discovered.

I have some questions about the choice of a Service-Oriented Architecture. I understand the authors wanted to support multiple modelling systems and data formats, and therefore argue that individual workflow functions should be services to facilitate interoperability. While I certainly recognize this reasoning (as a developer of the Web Service-based workflow system Apache Taverna), I disagree with the suggestion that simply using SOA makes data interoperability easy: wrapping functions as services standardizes how they are invoked, not the structure, units or semantics of the data they exchange.
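To make this concrete, here is a minimal sketch (hypothetical provider payloads, field names and values, none of them taken from the paper): even when both data providers are exposed as services, the client still has to reconcile identifiers, field names and units before their records can be combined.

    import math

    def normalise_provider_a(record):
        # Provider A (hypothetical) already reports rate constants in cm3/mol/s;
        # only the field names need renaming.
        return {"reaction": record["reaction"], "k": record["rate_constant"], "units": "cm3/mol/s"}

    def normalise_provider_b(record):
        # Provider B (hypothetical) reports in m3/kmol/s; a unit conversion is
        # needed as well (1 m3/kmol/s = 1e3 cm3/mol/s).
        return {"reaction": record["rxn"], "k": record["k"] * 1e3, "units": "cm3/mol/s"}

    # The same (illustrative) reaction as it might be returned by the two services:
    a = {"reaction": "H + O2 -> OH + O", "rate_constant": 3.52e16}
    b = {"rxn": "H + O2 -> OH + O", "k": 3.52e13}

    merged = [normalise_provider_a(a), normalise_provider_b(b)]
    # The records only agree because of the explicit mapping above, not because
    # the providers happen to be services.
    assert math.isclose(merged[0]["k"], merged[1]["k"])

That mapping and unit-reconciliation effort is where the real interoperability cost lies, regardless of the invocation mechanism.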

(Lack of) availability

The article focuses to a large extent on the development of a prototype. Yet this prototype seems not to be available, except as a couple of screenshots.

From https://datasciencehub.net/content/guidelines-authors:

All relevant data that were used or produced for conducting the work presented in a paper must be made FAIR and compliant with the PLOS data availability guidelines prior to submission.

In addition to a URL, I would highly recommend that the authors provide the developed prototype as Open Source code.

An associated Zenodo DOI (see https://guides.github.com/activities/citable-code/) can then be used as a code citation from the paper.
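For instance, once a tagged GitHub release of the prototype has been archived on Zenodo, the paper could cite it with a biblatex @software entry along the following lines (the author names, title, version, DOI and URL below are placeholders only):

    @software{prototype_2019,
      author  = {Surname, First and Surname, Second},
      title   = {Prototype for scholarly data analysis to aid scientific model development},
      version = {0.1.0},
      year    = {2019},
      doi     = {10.5281/zenodo.0000000},
      url     = {https://github.com/example/prototype},
    }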

Comments to the editor (optional)
