Vladimir Alexiev VladimirAlexiev

Crunchbase Challenge

Here's a challenge to the KG Construction CG:

  • Take Crunchbase: 10.5M rows across 18 tables, served as CSV and updated daily.
  • The data of some nodes comes from multiple tables (e.g. Organization from organizations, org_parents, org_descriptions).
  • RDFize and store the whole dataset in under 1-2 hours.
    • Using the approach described here, GraphDB 9.11 with OntoRefine takes 76-119 minutes (1.3-2 hours), depending on hardware, to produce and load 138M triples (19-30k triples per second).
  • Update the data daily, replacing the data of recently updated rows.
    • Using the approach described here, it takes about 15 minutes to update all of Crunchbase.
  • Do it with your favorite RDFization toolkit, preferably declaratively.
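
To make the two hard parts of the challenge concrete (one node fed by several tables, and daily replacement of updated rows), here is a minimal Python sketch. It is not the OntoRefine approach referred to above: the namespace `http://example.org/cb/`, the column names and the sample rows are all made up for illustration, and every cell is emitted as a plain literal (a real mapping would turn columns such as a parent reference into IRIs).

```python
import csv
import io

# Hypothetical namespace and column names; the real Crunchbase CSV schema may differ.
BASE = "http://example.org/cb/"

def literal(value: str) -> str:
    """Escape a string as an N-Triples literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def rows_to_triples(csv_text: str, key: str = "uuid") -> list[str]:
    """Turn every non-empty cell into one triple about the row's node.

    Assumes the key value is safe to embed in an IRI.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}organization/{row[key]}>"
        for column, value in row.items():
            if column != key and value:
                triples.append(f"{subject} <{BASE}prop/{column}> {literal(value)} .")
    return triples

def replace_node_update(uuid: str, triples: list[str]) -> str:
    """SPARQL update that drops all outgoing triples of one node, then re-inserts them."""
    subject = f"<{BASE}organization/{uuid}>"
    body = "\n  ".join(triples)
    return (f"DELETE WHERE {{ {subject} ?p ?o }};\n"
            f"INSERT DATA {{\n  {body}\n}}")

# Two made-up tables that describe the same organization.
organizations = "uuid,name\no1,Acme\n"
org_descriptions = "uuid,description\no1,Makes anvils\n"

# The same uuid in both tables yields the same subject, so the triples merge into one node.
merged = rows_to_triples(organizations) + rows_to_triples(org_descriptions)
print(replace_node_update("o1", merged))
```

Deleting by subject before re-inserting is the simplest way to replace a row's data; it only works as-is if each node's triples all share that subject.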
VladimirAlexiev / README.md
Last active December 9, 2021 07:10
Count VIAF links; get new N6I identifiers

This is related to https://www.wikidata.org/wiki/Wikidata:Property_proposal/National_Library_of_Ireland_ID:

  • discussion of whether it's worth creating such a WD external-id property (again)

VIAF links per contributing library:

Analyze N6I (National Library of Ireland) in particular:

  • Find long (i.e. recently allocated) VIAF IDs that have a N6I link
curl -o "A BIM-based Building Circularity Assessment tool for the early design stage (LDAC 2021).pdf" https://itc.scix.net/pdfs/w78-2021-paper-049.pdf
curl -o "A Framework for Producing Information Container for linked Document Delivery (ICDD) (LDAC 2021 pres).pdf" https://linkedbuildingdata.net/ldac2021/files/presentations/LDAC2021_ManosArgyris.pdf
curl -o "A Linked Building Data Approach to Site Planning and Managing Temporary Construction Items (LDAC 2021 pres).pdf" https://linkedbuildingdata.net/ldac2021/files/presentations/LDAC2021_Schlachter.pdf
curl -o "A Linked Building Data Approach to Site Planning and Managing Temporary Construction Items (LDAC 2021).pdf" https://linkedbuildingdata.net/ldac2021/files/papers/CIB_W78_2021_pap
VladimirAlexiev / get.sh
Last active October 11, 2021 13:30
get individuals for SPARQL Service Description
curl -LsHaccept:text/turtle http://www.w3.org/ns/formats/Format > format.ttl
curl -LsHaccept:text/turtle http://www.w3.org/ns/formats/JSON-LD >> format.ttl
curl -LsHaccept:text/turtle http://www.w3.org/ns/formats/N3 >> format.ttl
curl -LsHaccept:text/turtle http://www.w3.org/ns/formats/N-Triples >> format.ttl
curl -LsHaccept:text/turtle http://www.w3.org/ns/formats/N-Quads >> format.ttl
curl -LsHaccept:text/turtle http://www.w3.org/ns/formats/LD_Patch >> format.ttl
curl -LsHaccept:text/turtle http://www.w3.org/ns/formats/microdata >> format.ttl
curl -LsHaccept:text/turtle http://www.w3.org/ns/formats/OWL_XML >> format.ttl
curl -LsHaccept:text/turtle http://www.w3.org/ns/formats/OWL_Functional >> format.ttl
curl -LsHaccept:text/turtle http://www.w3.org/ns/formats/OWL_Manchester >> format.ttl
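
The repeated curl calls differ only in the local name, so they can be driven from a list. A rough Python equivalent, assuming only the standard library; the function names are made up for this sketch, and the name list covers just the formats visible in the listing above:

```python
import urllib.request

FORMATS_NS = "http://www.w3.org/ns/formats/"
# Only the local names that appear in the listing above.
NAMES = ["Format", "JSON-LD", "N3", "N-Triples", "N-Quads", "LD_Patch",
         "microdata", "OWL_XML", "OWL_Functional", "OWL_Manchester"]

def format_urls(names=NAMES):
    """Build one URL per format individual."""
    return [FORMATS_NS + name for name in names]

def fetch_turtle(url, timeout=30):
    """Request the resource as Turtle; urllib follows redirects, like curl -L."""
    request = urllib.request.Request(url, headers={"Accept": "text/turtle"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")

def build_format_ttl(path="format.ttl", fetch=fetch_turtle):
    """Concatenate all responses into one file, like the > and >> redirects."""
    with open(path, "w", encoding="utf-8") as out:
        for url in format_urls():
            out.write(fetch(url))
            out.write("\n")
```

Calling `build_format_ttl()` performs the downloads; passing a different `fetch` function makes it possible to try the loop without network access.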
VladimirAlexiev / GAIAX-members.tsv
Last active August 24, 2021 13:19
GAIAX members as a table (from https://www.gaia-x.eu/members, 2021-08-24)
Role	Country	Website	Company
Founding Member	France	https://amadeus.com/fr	AMADEUS SAS
Founding Member	France	https://atos.net/en/	ATOS INTERNATIONAL SAS
Founding Member	France	https://cispe.cloud/	CLOUD INFRASTRUCTURE SERVICES PROVIDERS IN EUROPE (CISPE) ASBL
Founding Member	France	https://www.docaposte.com/	DOCAPOSTE SAS
Founding Member	France	https://www.edf.fr/	ÉLECTRICITÉ DE FRANCE (EDF) SA
Founding Member	France	https://www.imt.fr/imt/presentation/statut-et-decret/	INSTITUT MINES-TÉLÉCOM EPSCP
Founding Member	France	https://www.orange-business.com/en	ORANGE BUSINESS SERVICES SA
Founding Member	France	https://en.outscale.com/	OUTSCALE SASU
Founding Member	France	https://www.ovh.com/world/	OVH SAS
VladimirAlexiev / bibtex-sort-fix-names.el
Last active October 30, 2021 13:15
sort bibtex buffer chronologically; replace , ; & with `and` in author/editor fields
(defun my-bibtex-get-year ()
  (bibtex-autokey-get-field "year"))

(defun my-bibtex-sort-chronologically (ascending)
  "Sort bibtex buffer chronologically.
Use descending order by default; prefix arg ASCENDING specifies ascending order.
Does not work with bibtex-maintain-sorted-entries, that sorts only by key."
  (interactive "P")
  ;; stolen from bibtex-sort-buffer
  (bibtex-beginning-of-first-entry) ; Needed by `sort-subr'
VladimirAlexiev / model.puml.ttl
Last active June 30, 2021 13:44
Energy Transparency Knowledge Graph, comparing Ontotext `rdfpuml` and Zazuko `spex`
# rdfpuml rendering instructions, not part of the real model
tr:assetType puml:arrow puml:lightblue.
tr:biddingZone puml:arrow puml:up-lightblue.
tr:codeList puml:arrow puml:green.
tr:controlArea puml:arrow puml:up-lightblue.
tr:generatingUnit puml:arrow puml:lightblue.
tr:highVoltageLimit puml:arrow puml:lightblue.
tr:nominalP puml:arrow puml:lightblue.
tr:parentResource puml:arrow puml:up-red.
VladimirAlexiev / EUnet4DBPws.bib
Created May 26, 2021 13:40
Bibliographies for LDAC 2012-2020 and EUnet4DBP 2021
@Proceedings{EUnet4DBP-2021,
  title = {EUnet4DBP International workshop on Digital Building Permit},
  year = 2021,
  editor = {TU Delft and EuroSDR and euBIM and buildingSmart RR},
  month = mar,
  url = {https://drive.google.com/file/d/1K27fql7wqaPlupHJp7mf8V227wpbp7xK/view},
  date = {2021-03-25},
}
@InProceedings{EUnet4DBP-2021-pres,
VladimirAlexiev / README.md
Created February 5, 2021 14:16
run pylode ontology doc gen on schema.org

https://github.com/RDFLib/pyLODE

RDFLib/pyLODE#108 : better support for SDO constructs

wget https://schema.org/version/latest/schemaorg-current-http.ttl 
# https://github.com/schemaorg/schemaorg/issues/2831 : add owl:Ontology and rdfs:isDefinedBy
diff -bu0 schemaorg-current-http.ttl schema-with-added-ontology.ttl > schema-add-ontology.diff
pip install pylode
time pylode -i schema-with-added-ontology.ttl -c true -o schema.html