Research Data Metadata Extraction
The overall code is listed below.
- Closed pull requests for the Zenodo project
- invenio-files-processor module
Here is a zip file containing all patches.
The work was split into three phases.
In the first phase, I got familiar with the structure of the Zenodo project and built a mock UI.
In the second phase, I used Grobid to implement a PDF metadata extractor and moved the code into the invenio-files-processor module.
- metadata-extractor: initial grobid integration
- metadata-extractor: fix mistakes in previous commit.
- Update the invenio-files-processor module and prepare it for release. The commits are in the invenio-files-processor repository.
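Grobid's header service (`/api/processHeaderDocument`) returns the extracted metadata as TEI XML, which then has to be mapped onto deposit fields. The following is a minimal sketch of that mapping step using only the standard library, assuming the usual Grobid TEI layout (title in `titleStmt`, authors under `biblStruct/analytic`, keywords in `textClass`, abstract in `profileDesc`). The function name `parse_grobid_header` and the output field names (`title`, `creators`, `keywords`, `description`) are hypothetical and not taken from the module's actual code.

```python
import xml.etree.ElementTree as ET

# Namespace used by Grobid's TEI output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def _text(elem):
    """Return the whitespace-normalized text of an element, or None if empty."""
    if elem is None:
        return None
    text = " ".join("".join(elem.itertext()).split())
    return text or None


def parse_grobid_header(tei_xml):
    """Map a Grobid TEI header onto hypothetical deposit-style fields."""
    root = ET.fromstring(tei_xml)
    header = root.find("tei:teiHeader", TEI_NS)
    if header is None:
        return {"title": None, "creators": [], "keywords": [], "description": None}

    title = _text(header.find("tei:fileDesc/tei:titleStmt/tei:title", TEI_NS))

    # Authors are stored as persName elements with forename(s) and a surname.
    creators = []
    author_path = ("tei:fileDesc/tei:sourceDesc/tei:biblStruct/"
                   "tei:analytic/tei:author/tei:persName")
    for pers in header.findall(author_path, TEI_NS):
        forenames = " ".join(
            t for t in (_text(f) for f in pers.findall("tei:forename", TEI_NS)) if t
        )
        surname = _text(pers.find("tei:surname", TEI_NS))
        if surname:
            # Assumed "Family, Given" name format for the deposit form.
            creators.append({"name": f"{surname}, {forenames}" if forenames else surname})

    terms = header.findall("tei:profileDesc/tei:textClass/tei:keywords/tei:term", TEI_NS)
    keywords = [t for t in (_text(term) for term in terms) if t]

    description = _text(header.find("tei:profileDesc/tei:abstract", TEI_NS))

    return {"title": title, "creators": creators,
            "keywords": keywords, "description": description}


# A trimmed-down example of the TEI that Grobid returns for a paper header.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title level="a" type="main">Mining Research Data</title></titleStmt>
      <sourceDesc><biblStruct><analytic>
        <author><persName><forename type="first">Ada</forename><surname>Lovelace</surname></persName></author>
      </analytic></biblStruct></sourceDesc>
    </fileDesc>
    <profileDesc>
      <textClass><keywords><term>metadata</term><term>PDF</term></keywords></textClass>
      <abstract><p>An example abstract.</p></abstract>
    </profileDesc>
  </teiHeader>
</TEI>"""

if __name__ == "__main__":
    print(parse_grobid_header(SAMPLE))
```

Fields that Grobid cannot find come back as `None` or an empty list, so the caller can decide which form fields to fill.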
In the third phase, I updated the UI and integrated the OpenAIRE mining service to extract funding information from the PDF files.
For the Zenodo project:
- metadata-extractor: update the files-processor endpoint
- metadata-extractor: extractor metadata modal
- deposit-ui: hide a field if the extracted data is null and allow add a single creator/keyword
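The deposit-UI change hides fields whose extracted value is null and accepts a single creator or keyword where a list is expected. The actual change lives in the front end; the following is only a hypothetical Python analog of the same normalization rule, assuming the extracted metadata arrives as a flat dict and that `creators` and `keywords` are the list-valued fields.

```python
from typing import Any, Dict

# Assumption: the fields the deposit form treats as lists.
LIST_FIELDS = ("creators", "keywords")


def prepare_extracted_metadata(extracted: Dict[str, Any]) -> Dict[str, Any]:
    """Drop empty fields and wrap single creators/keywords into lists."""
    cleaned: Dict[str, Any] = {}
    for field, value in extracted.items():
        # Null or empty values are skipped, mirroring the hidden form field.
        if value is None or value == "" or value == []:
            continue
        # A single creator or keyword is accepted and wrapped in a list.
        if field in LIST_FIELDS and not isinstance(value, list):
            value = [value]
        cleaned[field] = value
    return cleaned


if __name__ == "__main__":
    print(prepare_extracted_metadata({
        "title": "Mining Research Data",
        "description": None,
        "creators": {"name": "Lovelace, Ada"},
        "keywords": "PDF",
    }))
```

Keeping this rule in one place means the form only ever receives list-valued `creators` and `keywords`, so it does not need a separate code path for the single-value case.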
For the invenio-files-processor module: