Research Data Metadata Extraction - GSoC 2017

Research Data Metadata Extraction

The complete code is available as follows.

Here is a zip file containing all patches.

The code is split into three phases.

Phase One

In this phase, I got familiar with the structure of the Zenodo project and built a mock UI.

Phase Two

In this phase, I used GROBID to implement a PDF metadata extractor and moved the code into the invenio-files-processor module.
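As a rough illustration of what a GROBID-based extractor deals with, the sketch below parses the TEI XML that GROBID returns for a PDF header (for example from its `/api/processHeaderDocument` REST endpoint) into a small metadata dictionary. It is not the code from the patches: the function name `parse_tei_header` and the output keys are hypothetical, and the HTTP upload step is left out so the example needs only the standard library.

```python
import xml.etree.ElementTree as ET

# TEI namespace used in GROBID's XML output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def parse_tei_header(tei_xml):
    """Extract title and author names from a GROBID TEI header.

    Hypothetical helper for illustration; the dictionary layout is an
    assumption, not the format used by invenio-files-processor.
    """
    root = ET.fromstring(tei_xml)

    # The main title sits under teiHeader/fileDesc/titleStmt.
    title_el = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    title = ""
    if title_el is not None and title_el.text:
        title = title_el.text.strip()

    # Authors are listed under sourceDesc/biblStruct/analytic.
    authors = []
    for pers in root.findall(".//tei:analytic/tei:author/tei:persName", TEI_NS):
        parts = [(el.text or "").strip() for el in pers.findall("tei:forename", TEI_NS)]
        surname = pers.find("tei:surname", TEI_NS)
        if surname is not None and surname.text:
            parts.append(surname.text.strip())
        name = " ".join(p for p in parts if p)
        if name:
            authors.append(name)

    return {"title": title, "authors": authors}
```

Keeping the parsing separate from the HTTP call means it can be tested against saved TEI responses without a running GROBID server.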

Phase Three

In this phase, I updated the UI and integrated the OpenAIRE mining service to extract funding information from the PDF files.

For the Zenodo part,

For the invenio-files-processor module,
