
@tillmo
Last active September 24, 2016 02:11
A new architecture for Ontohub

We have decided to re-implement Ontohub from scratch, because

  • even though there is some decoupling into services, the core service (written in Ruby on Rails 3) is still too monolithic and therefore unmanageably complex. It shall therefore be split into a backend (in Ruby on Rails 5) and a frontend (in ember.js).
  • this further decoupling of services follows the Open Ontology Repository (OOR) architecture.
  • we would need to migrate from Ruby on Rails 3 (which will no longer be maintained) to Ruby on Rails 5 in several steps, which could be even more costly than a re-implementation.
  • we ran into many problems using Sidekiq as a background job processor. We hope that RabbitMQ, a true message broker, will work better here. Moreover, it makes it easier to register servers that provide Hets as a service.
  • starting from scratch could attract new programmers.
  • the ember.js frontend, written in JavaScript, could attract more programmers than the former Ruby on Rails core, because ember.js is easier to learn.
  • even a re-implementation can reuse parts of the existing code base.
  • last but not least, the user interface can be redesigned to improve usability.

# Cornerstones of the new Ontohub architecture

  • Separation of backend and frontend:
    • We have a full-featured JSON API backend, implemented with the current version of Rails.
    • There is a separate application written in ember.js that serves as the frontend and communicates with the backend.
  • More distributed communication with Hets:
    • Analysis is done by distributed instances of Hets.
    • OMS parsing workers are distributed as well.
    • Both kinds of workers listen on queues managed by RabbitMQ: the backend pushes jobs to the queues, and as soon as a worker (analysis or parsing) is free, it starts the next job. The results of the analysis (the JSON output of Hets) are pushed back to RabbitMQ so that parsing can start. Parsing results are written directly to the database and indexed in Elasticsearch, and the backend is notified via RabbitMQ so that the result can be pushed to the frontend via websockets.
  • There are several controversially discussed ways of integrating git, GitLab, and GitHub with Ontohub.
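The job flow described above can be sketched in a single process. This is a minimal sketch that rests on several assumptions. Python's `queue.Queue` stands in for the RabbitMQ queues, and plain dicts stand in for the database and the Elasticsearch index. `fake_hets_analysis` is a hypothetical placeholder rather than a real Hets interface. The sketch shows only two things. First, it shows the competing-consumer pattern, where whichever worker is free takes the next job. Second, it shows the chain that runs from analysis to parsing to a backend notification.

```python
import json
import queue
import threading

# Assumption: in-memory queues stand in for RabbitMQ queues.
analysis_queue = queue.Queue()   # backend -> Hets analysis workers
parsing_queue = queue.Queue()    # analysis results -> OMS parsing workers
backend_events = queue.Queue()   # parsing workers -> backend notifications

# Assumption: dicts stand in for the database and the Elasticsearch index.
database = {}
search_index = {}
db_lock = threading.Lock()


def fake_hets_analysis(document):
    """Hypothetical placeholder for a Hets call; returns a JSON string."""
    return json.dumps({"document": document, "sentences": document.count(".")})


def analysis_worker():
    while True:
        job = analysis_queue.get()       # blocks until a job is available
        if job is None:                  # sentinel: shut this worker down
            analysis_queue.task_done()
            break
        result = fake_hets_analysis(job["document"])
        # Push the analysis result back so that parsing can start.
        parsing_queue.put({"id": job["id"], "hets_json": result})
        analysis_queue.task_done()


def parsing_worker():
    while True:
        job = parsing_queue.get()
        if job is None:
            parsing_queue.task_done()
            break
        parsed = json.loads(job["hets_json"])
        with db_lock:                    # write to "database" and "index"
            database[job["id"]] = parsed
            search_index[job["id"]] = parsed["document"].lower()
        # Notify the backend, which would forward this via websockets.
        backend_events.put({"event": "parsed", "id": job["id"]})
        parsing_queue.task_done()


def run_pipeline(documents, n_analysis=2, n_parsing=2):
    workers = [threading.Thread(target=analysis_worker) for _ in range(n_analysis)]
    workers += [threading.Thread(target=parsing_worker) for _ in range(n_parsing)]
    for w in workers:
        w.start()
    for i, doc in enumerate(documents):  # the backend pushes analysis jobs
        analysis_queue.put({"id": i, "document": doc})
    analysis_queue.join()                # every analysis job has been forwarded
    parsing_queue.join()                 # every parsing job has been stored
    for _ in range(n_analysis):
        analysis_queue.put(None)
    for _ in range(n_parsing):
        parsing_queue.put(None)
    for w in workers:
        w.join()
    notifications = []
    while not backend_events.empty():
        notifications.append(backend_events.get())
    return notifications
```

A real broker would replace the in-memory queues with durable queues and per-message acknowledgements. Under this pattern, registering another Hets server would roughly amount to starting one more consumer on the analysis queue.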

## Ontohub components and their git repositories

@jjs0sbw

jjs0sbw commented Aug 28, 2016

What is the proposed schedule?

@tillmo
Author

tillmo commented Aug 29, 2016

The current schedule is:

  • Sep 5, 2016:
    • descriptions and code skeletons for the six Ontohub components (see above)
    • use cases (as issues in the ontohub-frontend repository)
  • end of October 2016: prototype showing the interaction of the components
    • then decisions about the further development and schedule

@baclawski

May I suggest using a content management system rather than a source code repository like GitHub? I don't know much about the services that GitHub provides, but it seems to me that a CM system would be a better match for an ontology repository. Full disclosure: I am on the CMIS standards committee (CMIS is the standard for content management systems).

@luanfg

luanfg commented Aug 29, 2016

Is it possible to help with the new development?

@jonquet

jonquet commented Aug 29, 2016

Hi guys. 2 things.

We are developing the AgroPortal (https://github.com/agroportal) ontology repository, reusing NCBO technology. While doing this, we have updated the metadata model for ontologies (adding a bunch of new metadata properties to describe ontologies)... and I think you might be interested in looking at the vocabularies used, especially for describing the content of the new Ontohub with similar vocabularies. We reviewed 21 vocabularies and 5 portals. We came up with 316 possible properties, which we reduced to 124 properties implemented in AgroPortal; we mostly populate these automatically by extracting metadata from the original ontology file.

Our new model went into production last July and we are still working on the UIs ;) For instance, for the BIOREFINERY ontology:
http://data.agroportal.lirmm.fr/ontologies/BIOREFINERY/latest_submission?display=all
And our work is under submission for EKAW. Please contact me for more information. I will be starting a task force on ontology metadata with a bunch of other interested folks around the end of 2016.

The second thing is more general: why not consider reusing, and then enhancing and capitalizing on, the NCBO technology in the new Ontohub? To be discussed with @graybeal.

@knowlengr

@baclawski is spot on re: CMS. GitHub's discursive value is inversely proportional to the degree of abstraction. Code talk, OK. But many considerations do not appear in this thread, e.g., why RoR? This introduction is written as though the design decisions have already been made and the goal is to seek like-minded developers. That's fine if mere recruitment is the objective, but in this API-first, microservices world, it may unintentionally self-select the probable user community.

@tillmo
Author

tillmo commented Sep 12, 2016

@baclawski, @knowlengr: actually, the current decision is to use git (not GitHub!) as a repository backend. The reasons are:

1. git provides good version control (I am not sure whether a CMS can really match this);
2. Ontohub has about 250 users who have already created lots of git repositories;
3. the COLORE repository, one of the largest non-OWL ontology repositories I know of, uses git;
4. users can work offline on their git repositories and then push their commits, which is not possible with a CMS.

Indeed, this last point gives me the feeling that git is much more decentralised than a CMS could be. But maybe I do not know the right CMS?

@tillmo
Author

tillmo commented Sep 12, 2016

@jonquet: how much is the NCBO technology tied to OWL? Note that Ontohub is meant to be a multi-logic repository, i.e. it should also support FOL, CL, etc.

@tillmo
Author

tillmo commented Sep 12, 2016

@luanfg: yes, help is very welcome.

@rickmurphy

Looks like great work here. I thought folks here (possibly Fabian) might be interested to hear that, starting from a few different design constraints, last year I mostly completed a multicast infrastructure that spreads data to interested receivers and then aggregates the results. The infrastructure demonstrates type-safe information flow control in monadic regions. Of interest here is that this year I hope to integrate my prior work on the constructive proof of the satisfaction condition of an institution and a Kan extension encoding of concepts, objects and representations. Folks can see some of the work on satisfaction here http://rickmurphy.org/Sat.hs and on concepts here http://rickmurphy.org/kan-concepts.html
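The satisfaction condition mentioned above is the defining axiom of an institution in the sense of Goguen and Burstall. This is also the notion behind Hets' support for multiple logics. The statement below uses the standard notation from the general literature, not anything taken from the linked files. It assumes a category of signatures $\mathbf{Sign}$, a sentence functor $\mathrm{Sen}\colon \mathbf{Sign} \to \mathbf{Set}$, and a model functor $\mathrm{Mod}\colon \mathbf{Sign}^{op} \to \mathbf{Cat}$. The condition must hold for every signature morphism $\sigma\colon \Sigma \to \Sigma'$, every $\Sigma'$-model $M'$, and every $\Sigma$-sentence $\varphi$:

```latex
M' \models_{\Sigma'} \mathrm{Sen}(\sigma)(\varphi)
\quad\Longleftrightarrow\quad
\mathrm{Mod}(\sigma)(M') \models_{\Sigma} \varphi
```

In words, truth is invariant under change of notation. Translating a sentence along $\sigma$ and then evaluating it in $M'$ gives the same result as reducing $M'$ along $\sigma$ and evaluating the original sentence.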
