xiaom/Zenodo-Metadata-Extraction.md

## Zenodo-Metadata-Extraction.md

      
Research Data Metadata Extraction

The code as a whole can be found in the following places.

Closed pull requests for the Zenodo project
invenio-files-processor module

Closed pull requests for the invenio-files-processor module


Here is a zip file containing all patches.
The code is split into three phases.
Phase One

In this phase, I familiarize myself with the structure of the Zenodo project and build a mock UI.

deposit-ui: mockup button for "extraction" of a metadata
create an invenio-files-processor module

Phase Two

In this phase, I use GROBID to implement a PDF metadata extractor and move
the code into the invenio-files-processor module.

metadata-extractor: initial grobid integration
metadata-extractor: fix mistakes in previous commit.
Update and prepare for the release of the invenio-files-processor module.
The commits are here
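
To make the GROBID step more concrete, here is a minimal sketch of how a TEI header document could be turned into deposit-style metadata. GROBID runs as a service and its header extraction (the `/api/processHeaderDocument` endpoint) returns TEI XML. How the invenio-files-processor module actually calls GROBID and maps the result is not shown in this summary, so the function name `parse_tei_header`, the output keys, and the hand-written `SAMPLE_TEI` document are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by GROBID's XML output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written sample in the general shape of a TEI header (illustrative only,
# not real GROBID output).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title level="a" type="main">An Example Title</title>
      </titleStmt>
      <sourceDesc>
        <biblStruct>
          <analytic>
            <author>
              <persName><forename type="first">Ada</forename><surname>Lovelace</surname></persName>
              <affiliation><orgName type="institution">Example University</orgName></affiliation>
            </author>
            <author>
              <persName><forename type="first">Alan</forename><surname>Turing</surname></persName>
            </author>
          </analytic>
        </biblStruct>
      </sourceDesc>
    </fileDesc>
    <profileDesc>
      <textClass>
        <keywords><term>metadata</term><term>extraction</term></keywords>
      </textClass>
      <abstract><p>A short abstract.</p></abstract>
    </profileDesc>
  </teiHeader>
</TEI>"""


def _text(node):
    """Return the whitespace-normalized text of a node, or '' if it is missing."""
    if node is None:
        return ""
    return " ".join("".join(node.itertext()).split())


def parse_tei_header(tei_xml):
    """Map a TEI header to a dict with hypothetical deposit-style keys."""
    root = ET.fromstring(tei_xml)
    creators = []
    for author in root.findall(".//tei:analytic/tei:author", TEI_NS):
        forenames = [_text(f) for f in author.findall("tei:persName/tei:forename", TEI_NS)]
        surname = _text(author.find("tei:persName/tei:surname", TEI_NS))
        # "Family, Given" ordering is an assumption about the target format.
        name = ", ".join(part for part in (surname, " ".join(forenames)) if part)
        affiliation = _text(author.find("tei:affiliation/tei:orgName", TEI_NS))
        creators.append({"name": name, "affiliation": affiliation})
    return {
        "title": _text(root.find(".//tei:titleStmt/tei:title", TEI_NS)),
        "creators": creators,
        "keywords": [_text(t) for t in root.findall(".//tei:keywords/tei:term", TEI_NS)],
        "description": _text(root.find(".//tei:abstract", TEI_NS)),
    }


metadata = parse_tei_header(SAMPLE_TEI)
```

Here `metadata["creators"]` holds one entry per author, and an author without an `<affiliation>` element gets an empty string, which a later step can filter out.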

Phase Three

In this phase, I update the UI and integrate the OpenAIRE mining service to extract funding information from PDF files.
For the Zenodo part,

metadata-extractor: update the files-processor endpoint
metadata-extractor: extractor metadata modal
deposit-ui: hide a field if the extracted data is null and allow add a single creator/keyword

For the invenio-files-processor module,

add unit test
global: integrate OpenAIRE mining service
global: check whether affiliations are empty
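
Two of the commits above deal with missing values: hiding a field when the extracted data is null, and checking whether affiliations are empty. The sketch below shows one way such a cleanup could look on the server side before the data reaches the UI. It is not the project's actual code; the function names `prune_extracted_metadata` and `_is_empty` and the dictionary layout are hypothetical.

```python
def _is_empty(value):
    """Treat None, blank strings, and empty containers as 'no data'."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, dict)):
        return len(value) == 0
    return False


def prune_extracted_metadata(extracted):
    """Return a copy of the extracted metadata without empty fields.

    Creators without a name are dropped, and an empty affiliation is
    removed from a creator rather than passed on as a blank value.
    """
    cleaned = {}
    for key, value in extracted.items():
        if key == "creators" and isinstance(value, list):
            value = [
                {k: v for k, v in creator.items() if not _is_empty(v)}
                for creator in value
                if not _is_empty(creator.get("name"))
            ]
        if _is_empty(value):
            continue  # the UI can hide any field that is absent here
        cleaned[key] = value
    return cleaned


example = prune_extracted_metadata({
    "title": "An Example Title",
    "description": None,
    "keywords": [],
    "creators": [
        {"name": "Lovelace, Ada", "affiliation": ""},
        {"name": "Turing, Alan", "affiliation": "Example University"},
    ],
})
```

With this input, `example` keeps only `title` and `creators`, and the first creator no longer carries an `affiliation` key.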