Here's a challenge to the W3C Knowledge Graph Construction Community Group (KG Construction CG):
- Take Crunchbase: 10.5M rows across 18 tables, served as CSV and updated daily.
- The data of some nodes comes from multiple tables (e.g. Organization is built from the organizations, org_parents and org_descriptions tables).
- RDFize and store the whole dataset in under 1-2 hours.
  - Using the approach described here, GraphDB 9.11 with OntoRefine takes 76-119 minutes (1.3-2 hours), depending on hardware, to produce and load 138M triples (19-30k triples per second).
- Update the data daily, replacing the data of recently updated rows.
  - Using the approach described here, the daily update of all of Crunchbase takes about 15 minutes.
- Do it with your favorite RDFization toolkit, preferably declaratively.
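To make the multi-table point concrete, here is a minimal sketch of how rows from organizations, org_parents and org_descriptions can be merged into the triples of a single Organization node, keyed by the row uuid. It is not the GraphDB/OntoRefine approach; the example.org vocabulary and the column names (uuid, name, parent_uuid, description) are hypothetical stand-ins, and the output is plain N-Triples built with the standard library only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical vocabulary: example.org IRIs, not Crunchbase's or GraphDB's.
BASE = "http://example.org/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

def org_iri(uuid):
    return f"<{BASE}organization/{uuid}>"

def literal(text):
    """Quote a string as an N-Triples literal, escaping backslashes, quotes and newlines."""
    for raw, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, esc)
    return f'"{text}"'

def org_triples(organizations, org_parents, org_descriptions):
    """Merge rows from three tables into the triples of one Organization node per uuid."""
    triples = defaultdict(list)
    for row in organizations:
        s = org_iri(row["uuid"])
        triples[row["uuid"]] += [
            f"{s} {RDF_TYPE} <{BASE}Organization> .",
            f"{s} <{BASE}name> {literal(row['name'])} .",
        ]
    for row in org_parents:
        triples[row["uuid"]].append(
            f"{org_iri(row['uuid'])} <{BASE}parent> {org_iri(row['parent_uuid'])} ."
        )
    for row in org_descriptions:
        triples[row["uuid"]].append(
            f"{org_iri(row['uuid'])} <{BASE}description> {literal(row['description'])} ."
        )
    return triples

# Tiny in-memory stand-ins for the three CSV files (hypothetical columns).
organizations = csv.DictReader(io.StringIO("uuid,name\no1,Acme\no2,Acme Holdings\n"))
org_parents = csv.DictReader(io.StringIO("uuid,parent_uuid\no1,o2\n"))
org_descriptions = csv.DictReader(io.StringIO('uuid,description\no1,"Makes ""anvils"""\n'))

merged = org_triples(organizations, org_parents, org_descriptions)
for line in merged["o1"]:
    print(line)
```

Grouping the output by uuid also means all triples of one node can later be replaced together, which is what a row-level daily update needs.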
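The throughput range quoted for GraphDB 9.11 follows directly from the other two numbers: 138M triples divided by the 76 and 119 minute run times. The snippet below is only that arithmetic, not a benchmark.

```python
# Average load rate implied by the reported figures: 138M triples in 76-119 minutes.
TRIPLES = 138_000_000

def triples_per_second(triples, minutes):
    """Average rate in triples per second for a run lasting `minutes`."""
    return triples / (minutes * 60)

for minutes in (76, 119):
    rate = triples_per_second(TRIPLES, minutes)
    print(f"{minutes} min ({minutes / 60:.1f} h): {rate:,.0f} triples/s")
```

This prints `76 min (1.3 h): 30,263 triples/s` and `119 min (2.0 h): 19,328 triples/s`, i.e. the 19-30k triples per second stated above.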
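One possible shape for the daily "replace the data of recently updated rows" step is sketched below, under two assumptions that are not taken from the original approach: each CSV row carries an updated_at timestamp, and each entity's triples live in their own named graph (a hypothetical example.org/graph/{uuid} scheme). Changed uuids are selected by timestamp, and each one gets a SPARQL 1.1 Update that drops the old graph and inserts the regenerated triples.

```python
import csv
import io
from datetime import datetime

BASE = "http://example.org/"  # hypothetical namespace

def changed_uuids(rows, since):
    """uuids of rows whose (assumed) updated_at column is later than `since`."""
    return {row["uuid"] for row in rows if datetime.fromisoformat(row["updated_at"]) > since}

def replace_entity(uuid, ntriples):
    """SPARQL 1.1 Update that swaps out one entity, assuming one named graph per entity."""
    graph = f"<{BASE}graph/{uuid}>"
    body = "\n    ".join(ntriples)
    return (
        f"DROP SILENT GRAPH {graph} ;\n"
        f"INSERT DATA {{ GRAPH {graph} {{\n    {body}\n}} }}"
    )

rows = csv.DictReader(io.StringIO(
    "uuid,name,updated_at\n"
    "o1,Acme,2021-03-02 08:00:00\n"
    "o2,Acme Holdings,2021-02-27 12:00:00\n"
))
changed = changed_uuids(rows, since=datetime(2021, 3, 1))
for uuid in sorted(changed):
    # In practice the new triples would be regenerated from that uuid's current rows.
    print(replace_entity(uuid, [f'<{BASE}organization/{uuid}> <{BASE}name> "Acme" .']))
```

Only o1 is selected here. Rows that disappear from the export are not covered by this sketch and would need a separate deletion pass.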