Final report of the work done for the GG Extraction Project for GFOSS – Open Technologies Alliance as part of GSoC 2018.
The purpose of this project was to identify and extract Government Directorates and Divisions,
together with the responsibilities assigned to them, in a machine-readable format, and to extract related metadata.
To accomplish this, three main sets of functionality were implemented:
- ML RespA Classifiers (https://github.com/eellak/gsoc2018-GG-extraction/wiki/Implementation#respa-classifiers)
- Unit - RespAs Extraction (https://github.com/eellak/gsoc2018-GG-extraction/wiki/Implementation#extraction-methods)
- Data / Metadata Extraction (https://github.com/eellak/gsoc2018-GG-extraction/wiki/Implementation#metadata-extraction)
Repo: (https://github.com/eellak/gsoc2018-GG-extraction)
A general outline: (https://github.com/eellak/gsoc2018-GG-extraction/blob/master/README.md)
A detailed outline regarding Implementation, Usage and more:
(https://github.com/eellak/gsoc2018-GG-extraction/wiki)
My progress can be found under the Projects tab:
(https://github.com/eellak/gsoc2018-GG-extraction/projects)
Improvement ideas, as listed here: (https://github.com/eellak/gsoc2018-GG-extraction/wiki/Improvement-Ideas)
- Resolve metadata extractor issues (mentioned in Issues)
- Add database support
- Debug and fix signee extraction
- Extend RespA section detection in RespA Decision Issues
(Government Gazette (ΦΕΚ) Decision issues that contain assignments of responsibilities/duties)
- Devise a non-manual detection scheme using only the ML classifiers
- Attempt a merge with one or more relevant projects, such as: