@stain
Last active February 20, 2018 11:10
Review: LDOW2016 manuscript 6

This review is licensed under a Creative Commons Attribution 4.0 International License http://creativecommons.org/licenses/by/4.0/

Overall evaluation

  • 1: Weak Accept

This paper compares several machine learning approaches for semantic labelling of relational data sources such as CSV files. Compared to earlier approaches, it also explores how to add an "unknown" labelling rather than only classifying into existing semantic labels.

The authors give great detail on the learning techniques, the approaches to sampling and the results achieved, which they claim are reasonably good even for simple approaches such as character frequencies.
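
To illustrate what a character-frequency approach could look like, here is a minimal sketch. It is my own assumption of the general idea and not the authors' implementation: the function names, the cosine similarity measure and the `threshold` value are all hypothetical. Each column is reduced to the relative frequency of its characters, compared against labelled example columns, and labelled "unknown" when nothing is similar enough.

```python
from collections import Counter
from math import sqrt


def char_frequencies(values):
    """Return the relative frequency of each character across a column's values."""
    counts = Counter("".join(str(v) for v in values))
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()} if total else {}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse frequency vectors stored as dicts."""
    dot = sum(a[ch] * b.get(ch, 0.0) for ch in a)
    norm_a = sqrt(sum(x * x for x in a.values()))
    norm_b = sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_column(values, labelled_examples, threshold=0.5):
    """Pick the most similar known label, or "unknown" if none exceeds the threshold.

    The threshold of 0.5 is an arbitrary illustrative value, not taken from the paper.
    """
    features = char_frequencies(values)
    best_label, best_score = "unknown", threshold
    for label, example_values in labelled_examples.items():
        score = cosine_similarity(features, char_frequencies(example_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical labelled columns, invented for this illustration only.
    examples = {
        "phone": ["555-1234", "555-9876"],
        "name": ["alice", "bob"],
    }
    print(label_column(["555-0000", "555-4321"], examples))
    print(label_column(["@@@", "###"], examples))
```

A sketch like this is deliberately naive: it ignores character order and value length, which is partly why it is notable that the authors report reasonable results for this family of features.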

The benchmark is available on GitHub as open source, and I was eventually able to install and execute it. However, installation required significant effort because of the dependency chain. To improve on this, I recommend that the authors make their Python libraries and tools available on PyPI for easier installation, and provide the Docker image on Docker Hub. (A Docker Compose file might be needed to combine the different services for the benchmark.)

The source code repositories and datasets have not been made permanently available, e.g. by assigning a DOI through Zenodo's GitHub integration.

For some reason the authors have used datasets from weapons databases as part of the training data. I find this highly questionable for ethical reasons, and would request that it is not given any prominence if this work is presented at the conference.

The paper does not explore whether or how any generated Linked Data can be made available on the web, and no Linked Data datasets have been made available. As far as I can tell, the main link to the "Web" here is that the CSV files were previously scraped from websites. As such, I am not fully confident in recommending this work as relevant to the LDOW2018 workshop.

Given the above, I'm afraid my recommendation is "Weak Accept". However, I believe this work would be valuable if targeted at a venue focusing on benchmarks and on forming semantics from unstructured data, such as the ISWC Resources Track.

Reviewer's confidence

4: High

Open Review

I am happy for LDOW to publish my review online.

This open review is also available at the URL https://gist.github.com/stain/7c74026d8741b4ab74da34b1eca0e9bc

Transparent Review

I am Stian Soiland-Reyes http://orcid.org/0000-0001-9842-9718

Confidential remarks for the program committee

As the other reviewers indicate that they consider this paper relevant for the workshop, I will not be blocking its acceptance.
