Loading OWL Axioms Into Neo4j

Date: January 15, 2021

Make sure you have the OWL2LPG-Translator library installed on your local system.

$ pwd
owl2lpg-translator
$ ls
LICENSE				neo4j-plugin-text-analyzer/	owl2lpg-translation-cli/	owl2lpg-translation-exporter/
README.md			owl2lpg-client-api/		owl2lpg-translation-core/	pom.xml
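
If you have not built the distribution yet, a standard Maven build from the repository root should produce it (a sketch, assuming the default Maven setup; skipping tests is optional):

$ mvn clean package -DskipTests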

Go to the owl2lpg-translation-cli package and find the application script.

$ cd owl2lpg-translation-cli
$ cd target/owl2lpg-translation-cli-1.0-SNAPSHOT
$ ls
lib/				owl2lpg-translation-cli.jar	run.sh*

If that directory does not exist yet, you need to unpack the distribution ZIP file first.

$ unzip owl2lpg-translation-cli-1.0-SNAPSHOT-bin.zip

Overview of the translate command

$ ./run.sh translate --help
Usage: owl2lpg translate [-h] [-b=<branchId>] [-d=<ontDocId>] [-f=<format>]
                         [-p=<projectId>] IN_FILE OUT_DIR
      IN_FILE             Input OWL ontology file location
      OUT_DIR             Output directory location
  -b, --branchId=<branchId>
                          Branch identifier
  -d, --documentId=<ontDocId>
                          Ontology document identifier
  -f, --format=<format>   Translation format: bulkcsv, csv (default: bulkcsv)
  -h, --help              Display a help message
  -p, --projectId=<projectId>
                          Project identifier

The command requires an input OWL file and an output directory name.

The --format argument sets the output file format:

  • The "bulkcsv" option will create two CSV files, namely, nodes.csv and edges.csv. Select this format if you are going to use the neo4j-admin tool to load the data offline.
  • The "csv" option will create 14 CSV files with each represent the different label of nodes and edges. Select this format if you are going to use the APOC.import.csv() method to load the data online.

If the --projectId, --branchId, or --documentId arguments are specified, their values override the default identifiers.

The program automatically switches to streaming mode if the input file is larger than 600 MB; otherwise it uses the in-memory mode. Note that streaming mode is available for the OBO format only.
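
For example, the following invocation (the input file pizza.owl and output directory /tmp/bulkcsv are placeholders) produces the bulk-import files used in the next section:

$ ./run.sh translate -f bulkcsv pizza.owl /tmp/bulkcsv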

Load the CSV files into Neo4j

Two options are available: offline mode or online mode.

  1. The offline mode uses the neo4j-admin tool. Stop the server before running the import:
$ neo4j stop
$ neo4j-admin import --database=neo4j --nodes=/tmp/bulkcsv/nodes.csv --relationships=/tmp/bulkcsv/edges.csv --ignore-empty-strings=true --multiline-fields=true
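
Once the import finishes, start the server again:

$ neo4j start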
  2. The online mode uses the APOC import library:
CALL apoc.import.csv(
   [{fileName: 'file:/tmp/csv/project.csv', labels: []},
    {fileName: 'file:/tmp/csv/branch.csv', labels: []},
    {fileName: 'file:/tmp/csv/ontology-documents.csv', labels: []},
    {fileName: 'file:/tmp/csv/axioms.csv', labels: []},
    {fileName: 'file:/tmp/csv/cardinality-expressions.csv', labels: []},
    {fileName: 'file:/tmp/csv/entities.csv', labels: []},
    {fileName: 'file:/tmp/csv/anonymous-individuals.csv', labels: []},
    {fileName: 'file:/tmp/csv/literals.csv', labels: []},
    {fileName: 'file:/tmp/csv/iris.csv', labels: []},
    {fileName: 'file:/tmp/csv/other-nodes.csv', labels: []}
   ],
   [{fileName: 'file:/tmp/csv/related-to-edge.csv', type: ''},
    {fileName: 'file:/tmp/csv/next-edges.csv', type: ''},
    {fileName: 'file:/tmp/csv/structural-edges.csv', type: ''},
    {fileName: 'file:/tmp/csv/augmenting-edges.csv', type: ''}
   ],
   {});
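
Note that apoc.import.csv reads local files only when file import is enabled. If the call fails with an import error, add the following standard APOC setting to neo4j.conf and restart the server:

apoc.import.file.enabled=true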

Add constraints and indexes in Neo4j

Run the following commands to create them.

CREATE CONSTRAINT unique_project_id ON (n:Project) ASSERT n.projectId IS UNIQUE;
CREATE CONSTRAINT unique_branch_id ON (n:Branch) ASSERT n.branchId IS UNIQUE;
CREATE CONSTRAINT unique_document_id ON (n:OntologyDocument) ASSERT n.ontologyDocumentId IS UNIQUE;

CREATE CONSTRAINT unique_iri_iri ON (n:IRI) ASSERT n.iri IS UNIQUE;
CREATE CONSTRAINT unique_class_iri ON (n:Class) ASSERT n.iri IS UNIQUE;
CREATE CONSTRAINT unique_data_property_iri ON (n:DataProperty) ASSERT n.iri IS UNIQUE;
CREATE CONSTRAINT unique_object_property_iri ON (n:ObjectProperty) ASSERT n.iri IS UNIQUE;
CREATE CONSTRAINT unique_annotation_property_iri ON (n:AnnotationProperty) ASSERT n.iri IS UNIQUE;
CREATE CONSTRAINT unique_datatype_iri ON (n:Datatype) ASSERT n.iri IS UNIQUE;
CREATE CONSTRAINT unique_individual_iri ON (n:NamedIndividual) ASSERT n.iri IS UNIQUE;

CREATE CONSTRAINT unique_class_oboId ON (n:Class) ASSERT n.oboId IS UNIQUE;
CREATE CONSTRAINT unique_data_property_oboId ON (n:DataProperty) ASSERT n.oboId IS UNIQUE;
CREATE CONSTRAINT unique_object_property_oboId ON (n:ObjectProperty) ASSERT n.oboId IS UNIQUE;
CREATE CONSTRAINT unique_annotation_property_oboId ON (n:AnnotationProperty) ASSERT n.oboId IS UNIQUE;
CREATE CONSTRAINT unique_datatype_oboId ON (n:Datatype) ASSERT n.oboId IS UNIQUE;
CREATE CONSTRAINT unique_individual_oboId ON (n:NamedIndividual) ASSERT n.oboId IS UNIQUE;

CREATE CONSTRAINT unique_axiom_digest ON (n:Axiom) ASSERT n.digest IS UNIQUE;

CREATE INDEX entity_iri_index FOR (n:Entity) ON (n.iri);
CREATE INDEX entity_iriSuffix_index FOR (n:Entity) ON (n.iriSuffix);
CREATE INDEX class_iriSuffix_index FOR (n:Class) ON (n.iriSuffix);
CREATE INDEX data_property_iriSuffix_index FOR (n:DataProperty) ON (n.iriSuffix);
CREATE INDEX object_property_iriSuffix_index FOR (n:ObjectProperty) ON (n.iriSuffix);
CREATE INDEX annotation_property_iriSuffix_index FOR (n:AnnotationProperty) ON (n.iriSuffix);
CREATE INDEX datatype_iriSuffix_index FOR (n:Datatype) ON (n.iriSuffix);
CREATE INDEX individual_iriSuffix_index FOR (n:NamedIndividual) ON (n.iriSuffix);
CREATE INDEX entity_oboId_index FOR (n:Entity) ON (n.oboId);

CREATE INDEX literal_lexicalForm_index FOR (n:Literal) ON (n.lexicalForm);
CREATE INDEX literal_datatype_index FOR (n:Literal) ON (n.datatype);
CREATE INDEX literal_language_index FOR (n:Literal) ON (n.language);
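
To verify that the constraints and indexes are online, list them with the built-in procedures (these match the Neo4j 4.x syntax used above):

CALL db.constraints();
CALL db.indexes();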

Add the full-text indexes

Run the following commands.

CALL db.index.fulltext.createNodeIndex("annotation_assertion_index",["Literal"],["lexicalForm"], { analyzer: "webprotege-analyzer" });
CALL db.index.fulltext.createNodeIndex("local_name_index",["Entity"],["localName"], { analyzer: "webprotege-analyzer" });
CALL db.index.fulltext.createNodeIndex("prefixed_name_index",["Entity"],["prefixedName"], { analyzer: "webprotege-analyzer" });
CALL db.index.fulltext.createNodeIndex("obo_id_index",["Entity"],["oboId"], { analyzer: "webprotege-analyzer" });

If you get an "analyzer not found" error, the custom webprotege-analyzer plugin is not installed. Copy the plugin JAR into the plugins directory of your Neo4j installation ($NEO4J_HOME below), restart the server, and then re-run the index commands above.

$ cd neo4j-plugin-text-analyzer
$ cd target
$ cp neo4j-plugin-text-analyzer-1.0-SNAPSHOT.jar $NEO4J_HOME/plugins
$ neo4j restart