@okabasakal88
Forked from dcinzona/rdbmsImport
Created October 4, 2016 13:10
Neo4j RDBMS Import
= SQL --> Neo4j
The main goal is to import data from SQL (or a similar RDBMS) into Neo4j using Cypher, while preserving foreign keys as relationships. I use Cypher rather than the batch importer because I want a system that supports repeated imports and updates without batch processing.
//console
= Table Structure Requirements:
1. Primary Key column names should use the pattern [TableName]+Id, so for a Contact table, the primary key field should be named ContactId. This is to prevent overlap between Node Labels.
[source,cypher]
----
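// Sample data: 26 Contact rows, each with a random foreign key into both child tables.
// round() returns a float, so toInteger() (toInt() before Neo4j 3.1) keeps the keys integral.
FOREACH (i in RANGE(0,25) |
CREATE (n:Contact{ContactId:i, ContactChild1Id:toInteger(round(rand()*(10-1))), ContactChild2Id: toInteger(round(rand()*(30-1)))})
)
// Child tables: primary keys 0..10 and 0..30
FOREACH (i in RANGE(0,10) |
CREATE (n:ContactChild1 {ContactChild1Id:i}))
FOREACH (i in RANGE(0,30) |
CREATE (n:ContactChild2 {ContactChild2Id:i}))
// Parent table: each ContactParent row references one random Contact
FOREACH (i in RANGE(0,5) |
CREATE (n:ContactParent {ContactParentId:i, ContactId:toInteger(round(rand()*(25-1)))}))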
----
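The block above only generates random sample data. For rows actually exported from an RDBMS, a repeatable import could be sketched with `LOAD CSV` and `MERGE`. This is only a sketch under assumptions: the file name `contacts.csv` and its column headers are hypothetical (a CSV export of the Contact table placed in Neo4j's import directory), and `toInteger()` assumes Neo4j 3.1 or later.

[source,cypher]
----
// Hypothetical file: a CSV export of the Contact table with a header row.
LOAD CSV WITH HEADERS FROM 'file:///contacts.csv' AS row
// MERGE on the primary key so re-running the import updates rows instead of duplicating them
MERGE (c:Contact {ContactId: toInteger(row.ContactId)})
// Foreign key columns are kept as properties so the relationship step can match on them
SET c.ContactChild1Id = toInteger(row.ContactChild1Id),
    c.ContactChild2Id = toInteger(row.ContactChild2Id)
----

Because the node is merged on its primary key alone, the same statement can serve for both the first import and later updates.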
= Import Requirements
Define children as tables related to a parent table by a Foreign Key.
1. Primary Key properties should be set as unique constraints on the nodes
2. Children always relate to the parent
3. The Relationship Type should match the Parent Node Label
4. Store the node Primary Key properties as properties in the Relationship
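Requirement 1 could be set up as follows. This sketch assumes Neo4j 4.4 or later; older versions use the form `CREATE CONSTRAINT ON (n:Contact) ASSERT n.ContactId IS UNIQUE`. The constraint names are illustrative, not part of the original setup.

[source,cypher]
----
// One uniqueness constraint per table, on its [TableName]+Id primary key.
// Schema statements run separately from the data-writing statements.
CREATE CONSTRAINT contact_id IF NOT EXISTS
FOR (n:Contact) REQUIRE n.ContactId IS UNIQUE;

CREATE CONSTRAINT contact_child1_id IF NOT EXISTS
FOR (n:ContactChild1) REQUIRE n.ContactChild1Id IS UNIQUE;

CREATE CONSTRAINT contact_child2_id IF NOT EXISTS
FOR (n:ContactChild2) REQUIRE n.ContactChild2Id IS UNIQUE;

CREATE CONSTRAINT contact_parent_id IF NOT EXISTS
FOR (n:ContactParent) REQUIRE n.ContactParentId IS UNIQUE;
----

A uniqueness constraint also creates an index on the property, which the primary key lookups in the relationship statements can use.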
Match Contact to Child 1
[source,cypher]
----
MATCH (p:Contact), (c:ContactChild1 {ContactChild1Id:p.ContactChild1Id})
MERGE (c)-[:Contact {ContactId:p.ContactId,ContactChild1Id:c.ContactChild1Id}]->(p)
----
Match Contact to Child 2
[source, cypher]
----
MATCH (p2:Contact), (c2:ContactChild2 {ContactChild2Id:p2.ContactChild2Id})
MERGE (c2)-[:Contact {ContactId:p2.ContactId,ContactChild2Id:c2.ContactChild2Id}]->(p2)
----
Match ContactParent to Contact
[source, cypher]
----
MATCH (cp:ContactParent), (cc:Contact {ContactId:cp.ContactId})
MERGE (cc)-[:ContactParent {ContactParentId:cp.ContactParentId,ContactId:cc.ContactId}]->(cp)
----
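Because the three statements above use MERGE, running them again should not create duplicate relationships. One way to check this, and to confirm that each relationship type matches the label of the node it points to (requirement 3), is a count per type, sketched here:

[source,cypher]
----
// Count relationships per type alongside the labels of the node they point to.
// The totals should stay the same after re-running the MERGE statements.
MATCH (child)-[r]->(parent)
RETURN type(r) AS relationshipType, labels(parent) AS parentLabels, count(r) AS total
ORDER BY relationshipType
----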
Return all contacts and their children for ContactParent with Id 1
[source, cypher]
----
MATCH (cc)-->(c:Contact)-->(p:ContactParent { ContactParentId:1 })
RETURN p, collect(DISTINCT c), collect(cc)
LIMIT 5
----
Any thoughts on how to expand this or improve performance when batching are appreciated.