

@xiaom
Last active October 13, 2017 12:58
Research Data Metadata Extraction - GSoC 2017

Research Data Metadata Extraction

The overall code is shown as follows.

Here is a zip file containing all patches.

The work was split into three phases.

Phase One

In this phase, I familiarized myself with the structure of the Zenodo project and built a mock UI.

Phase Two

In this phase, I used Grobid to implement a PDF metadata extractor and moved the code into the invenio-files-processor module.
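To illustrate the Grobid step, here is a minimal sketch (not the code from the patches) that sends a PDF to a Grobid server's `processHeaderDocument` REST endpoint and pulls the title, authors and abstract out of the returned TEI XML. It assumes a Grobid instance on its default port 8070 and the `requests` package for the HTTP call. The function names `fetch_header_tei` and `parse_tei_header` and the shape of the returned dict are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Grobid returns TEI documents in this namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def fetch_header_tei(pdf_path: str, grobid_url: str = "http://localhost:8070") -> str:
    """Hypothetical helper: POST a PDF to Grobid and return the TEI header XML."""
    import requests  # third-party; only needed when talking to a live Grobid server

    with open(pdf_path, "rb") as fh:
        resp = requests.post(
            f"{grobid_url}/api/processHeaderDocument",
            files={"input": fh},  # Grobid expects the PDF in the "input" form field
            headers={"Accept": "application/xml"},  # ask for TEI rather than BibTeX
            timeout=60,
        )
    resp.raise_for_status()
    return resp.text


def parse_tei_header(tei_xml: str) -> dict:
    """Hypothetical helper: extract title, authors and abstract from Grobid TEI."""
    root = ET.fromstring(tei_xml)

    title_el = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    title = "".join(title_el.itertext()).strip() if title_el is not None else None

    authors = []
    for pers in root.findall(".//tei:analytic/tei:author/tei:persName", TEI_NS):
        # Join first and middle names; keep the surname separately.
        forenames = [f.text.strip() for f in pers.findall("tei:forename", TEI_NS) if f.text]
        surname_el = pers.find("tei:surname", TEI_NS)
        surname = surname_el.text.strip() if surname_el is not None and surname_el.text else ""
        authors.append({"given": " ".join(forenames), "family": surname})

    abstract_el = root.find(".//tei:profileDesc/tei:abstract", TEI_NS)
    # Collapse the whitespace and line breaks that Grobid preserves from the PDF.
    abstract = " ".join(" ".join(abstract_el.itertext()).split()) if abstract_el is not None else None

    return {"title": title, "authors": authors, "abstract": abstract}
```

Keeping the network call separate from the XML parsing lets the parser be tested against saved TEI files without a running Grobid server.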

Phase Three

In this phase, I updated the UI and integrated the OpenAIRE mining service to extract funding information from the PDF files.
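One way the extracted fields could be used in the UI is to pre-fill a Zenodo deposit form. The sketch below assumes that setup and maps a header dict (title, authors, abstract) plus a list of grant identifiers onto Zenodo deposit metadata keys (`title`, `description`, `creators`, `grants`). The input shapes, the function name `to_zenodo_metadata`, and the choice of `publication`/`article` as the upload type are hypothetical. The OpenAIRE mining output is assumed to have already been reduced to plain grant ID strings.

```python
from __future__ import annotations


def to_zenodo_metadata(header: dict, grant_ids: list[str]) -> dict:
    """Hypothetical mapping from extracted metadata to Zenodo deposit metadata keys."""
    # Assumption: an uploaded PDF with a parsed header is treated as a journal article.
    metadata = {"upload_type": "publication", "publication_type": "article"}

    if header.get("title"):
        metadata["title"] = header["title"]
    if header.get("abstract"):
        metadata["description"] = header["abstract"]

    creators = []
    for author in header.get("authors", []):
        family, given = author.get("family", ""), author.get("given", "")
        # Zenodo expects creator names in "Family, Given" form.
        name = f"{family}, {given}" if family and given else (family or given)
        if name:
            creators.append({"name": name})
    if creators:
        metadata["creators"] = creators

    # Drop blanks and duplicates while keeping the original order.
    unique = list(dict.fromkeys(g.strip() for g in grant_ids if g.strip()))
    if unique:
        metadata["grants"] = [{"id": g} for g in unique]

    return metadata
```

Fields that could not be extracted are simply left out, so the user can still fill them in by hand.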

For the Zenodo part:

For the invenio-files-processor module:
