
@Jim-Salmons
Forked from perival/cg-meta-graph.adoc
Last active February 25, 2021 16:13
Sandbox for exploring the transition from Curt's first take on "self-descriptive" Neo4j databases.

A Simple Meta-Data Model for a Graph Database (Ver. 1.1)

Original by: Curt Gardner v1.0, 30-Sep-2013

Forked by: Jim Salmons v1.1, 06-Dec-2013 - Same basic document but leveraging Neo4j 2.x labels and tweaking queries accordingly. The most dramatic result of this is the elimination of the need for the explicit 'Admin' elements and all things related to the 'owns' relationship.

Note
Curt’s explanatory comments and diagrams from the 1.0 version of this GG do NOT sync with the refactoring I’ve implemented here. You can mentally replace Curt’s explicit 'ownership-related sets of nodes' idiom with subsets based on 2.0’s new label feature. That’s what you’ll see in the tweaked queries. The most visible result of using this important new feature in Neo4j is that the queries from Curt’s original GG produce results that are ONLY about the structure of the data to be contained in the "regular" database. There are no more 'Admin' nodes nor 'owns' relationships that are really elements specific to the embedded metamodel subgraph of a 'self-descriptive' Neo4j database.

Setting up a Meta-Data Framework

This GraphGist is a quick exploration of a simple meta-data administration scheme that can be used to store the structure of the nodes and relationships in a graph database such as Neo4j.

The data that is set up here could be used in an application layer to provide tailored UI and validation, or could be used simply with Cypher queries as a kind of meta-data dictionary. One may argue that this is pushing too much structure into a graph database, but I think the concept is worth exploring. How well can you query a graph database if you’re not sure of the structure of the data?

This is a fairly quick stab at this, and it could certainly be taken much further to cover properties, security aspects, and more. Mostly it was a learning exercise for me, and hopefully it will generate some thoughtful criticism. Hopefully I haven’t made too many egregious mistakes!

The basic concept is that each node in the graph will have a NodeType, and that NodeType will be represented itself as an 'admin' node. Likewise each relationship will have a RelType and that RelType will also be represented as an 'admin' Node. Then we can further identify for each RelType what types of nodes it can be used with for both the Start and End. Conceptually the 'admin' model looks like this:

Conceptual diagram
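As a rough illustration outside Neo4j, the concept can be mirrored in a few lines of Python. This is a minimal sketch and not part of Curt’s GraphGist. The class and method names (`MetaModel`, `RelType`, `add_rel_type`) are hypothetical. The check that a RelType’s endpoints refer to already-declared NodeTypes is an assumption about one way to keep the metamodel self-consistent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RelType:
    """A relationship type plus the NodeTypes allowed at each end."""
    name: str
    start_types: set[str] = field(default_factory=set)
    end_types: set[str] = field(default_factory=set)


@dataclass
class MetaModel:
    """Hypothetical in-memory mirror of the 'admin' model (names are illustrative)."""
    node_types: set[str] = field(default_factory=set)
    rel_types: dict[str, RelType] = field(default_factory=dict)

    def add_node_type(self, name: str) -> None:
        self.node_types.add(name)

    def add_rel_type(self, name: str, start_types, end_types) -> None:
        # Assumption: reject endpoints that name undeclared NodeTypes, so the
        # metamodel never describes relationships to types it doesn't know.
        unknown = (set(start_types) | set(end_types)) - self.node_types
        if unknown:
            raise ValueError(f"undeclared NodeTypes: {sorted(unknown)}")
        self.rel_types[name] = RelType(name, set(start_types), set(end_types))


model = MetaModel()
model.add_node_type("Person")
model.add_node_type("Date")
model.add_rel_type("Birthdate", ["Person"], ["Date"])
```

In the graph itself, the same information lives in nodes and in StartNodeType/EndNodeType relationships rather than in Python objects.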

To actually realize the concept, I came up with the following model of nodes and relationships. (Note that in the diagrams, the node name is shown first, with the node’s NodeType below it in square brackets.)

Admin instance diagram

Setting up new Meta-Data for an application

Once the admin infrastructure is in place, any new NodeTypes and RelTypes can be set up. In this example, assume we will have Person and Date nodes, and will need to be able to create relationships to support capturing a Date hierarchy, marriage ties, and birthdates.

The necessary setup will involve the creation of two new AdminNodeTypes (Person and Date), three new AdminRelTypes (DateIn, Spouse, and Birthdate), and the relationships necessary to link them together:

  • The Node Type Owner Owns Person and Date

  • The Rel Type Owner Owns DateIn, Spouse, and Birthdate

  • For DateIn, the StartNodeType is Date, and the EndNodeType is Date

  • For Spouse, the StartNodeType is Person, and the EndNodeType is Person

  • For Birthdate, the StartNodeType is Person, and the EndNodeType is Date

The result looks like this:

New data graph diagram
//All data Admin setup

// CREATE (ntOwner:META:MODEL:DEPRECATED {name:'Node Type Owner', descr:'Owns all Node Types'})
// CREATE (rtOwner:META:MODEL:DEPRECATED {name:'Rel Type Owner', descr:'Owns all Rel Types'})
// CREATE (admin:META:MODEL:DEPRECATED   {name:'Admin'})

// I think these will be unnecessary
// CREATE (adminNT:META:MODEL:NODE {name:'NodeType'})
// CREATE (adminRT:META:MODEL:NODE {name:'RelType'})

// TO BE DEPRECATED BY USING LABELED SETS
// CREATE (owns:META:MODEL:RELATIONSHIP    {name:'Owns', descr:'Owns'})
// CREATE (startNT:META:MODEL:RELATIONSHIP {name:'StartNodeType'})
// CREATE (endNT:META:MODEL:RELATIONSHIP   {name:'EndNodeType'})

// TO BE DEPRECATED BY USING LABELED SETS
// CREATE rtOwner-[:Owns]->owns
// CREATE rtOwner-[:Owns]->startNT
// CREATE rtOwner-[:Owns]->endNT
// CREATE ntOwner-[:Owns]->admin
// CREATE ntOwner-[:Owns]->adminNT
// CREATE ntOwner-[:Owns]->adminRT

// CREATE owns-[:StartNodeType]->admin
// CREATE owns-[:EndNodeType]->admin
// CREATE owns-[:EndNodeType]->adminNT
// CREATE owns-[:EndNodeType]->adminRT

// CREATE startNT-[:StartNodeType]->adminRT
// CREATE startNT-[:EndNodeType]->adminNT
// CREATE endNT-[:StartNodeType]->adminRT
// CREATE endNT-[:EndNodeType]->adminNT

// This is what we're really after...
//
CREATE (person:META:MODEL:NODE {name:'Person'})
CREATE (date:META:MODEL:NODE   {name:'Date'})

// TO BE DEPRECATED BY USING LABELED SETS
// CREATE ntOwner-[:Owns]->person
// CREATE ntOwner-[:Owns]->date

CREATE (spouse:META:MODEL:RELATIONSHIP    {name:'Spouse'})
CREATE (dateIn:META:MODEL:RELATIONSHIP    {name:'DateIn'})
CREATE (birthdate:META:MODEL:RELATIONSHIP {name:'Birthdate'})

// TO BE DEPRECATED BY USING LABELED SETS
// CREATE rtOwner-[:Owns]->spouse
// CREATE rtOwner-[:Owns]->dateIn
// CREATE rtOwner-[:Owns]->birthdate

CREATE (spouse)-[:StartNodeType]->(person)
CREATE (spouse)-[:EndNodeType]->(person)
CREATE (dateIn)-[:StartNodeType]->(date)
CREATE (dateIn)-[:EndNodeType]->(date)
CREATE (birthdate)-[:StartNodeType]->(person)
CREATE (birthdate)-[:EndNodeType]->(date)
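To see how the v1.1 label sets replace the old 'Owns' idiom, here is a rough Python analogue of what a multi-label pattern such as `MATCH (n:META:MODEL:NODE)` selects: a node matches only when it carries every listed label. The `NODES` dictionary mirrors the CREATE statements above. The `alice` entry is a hypothetical "regular" data node, added only to show that it stays outside the metamodel subset.

```python
# Hypothetical mirror of the nodes created above: id -> (labels, name).
# "alice" is an invented "regular" data node, not part of the metamodel.
NODES = {
    "person":    ({"META", "MODEL", "NODE"}, "Person"),
    "date":      ({"META", "MODEL", "NODE"}, "Date"),
    "spouse":    ({"META", "MODEL", "RELATIONSHIP"}, "Spouse"),
    "dateIn":    ({"META", "MODEL", "RELATIONSHIP"}, "DateIn"),
    "birthdate": ({"META", "MODEL", "RELATIONSHIP"}, "Birthdate"),
    "alice":     ({"Person"}, "Alice"),
}


def match_labels(*labels):
    """Names of nodes carrying every requested label, like MATCH (n:A:B:C)."""
    wanted = set(labels)
    return sorted(name for node_labels, name in NODES.values() if wanted <= node_labels)


print(match_labels("META", "MODEL", "NODE"))  # ['Date', 'Person']
```

No ownership nodes are needed. Membership in the metamodel is simply a matter of which labels a node carries.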

Now some sample queries using this data


Get all valid NodeTypes

MATCH (n:META:MODEL:NODE)
RETURN n.name AS NodeType
ORDER BY n.name

Get valid RelTypes for each NodeType

MATCH (r:META:MODEL:RELATIONSHIP)-[:StartNodeType]->(n)
RETURN n.name AS NodeType, collect(r.name) AS RelTypes
ORDER BY n.name

Get valid Start NodeTypes for each RelType

MATCH (r:META:MODEL:RELATIONSHIP)-[:StartNodeType]->(n)
RETURN r.name AS RelType, collect(n.name) AS StartNodeTypes
ORDER BY r.name

Get valid End NodeTypes for each RelType

MATCH (r:META:MODEL:RELATIONSHIP)-[:EndNodeType]->(n)
RETURN r.name AS RelType, collect(n.name) AS EndNodeTypes
ORDER BY r.name
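For readers without a Neo4j instance at hand, the three grouping queries above boil down to "filter the metamodel edges by type, group by one end, collect the other, order by the key." The sketch below is a plain-Python analogue under that reading. `META_EDGES` and `group` are hypothetical names, and the triples mirror the StartNodeType/EndNodeType edges created earlier. The Cypher queries only order the grouping keys, not the contents of each collected list. This sketch simply keeps edges in the order listed.

```python
from collections import defaultdict

# Hypothetical mirror of the metamodel edges created above, as
# (RelType name, edge type, NodeType name) triples.
META_EDGES = [
    ("Spouse", "StartNodeType", "Person"),
    ("Spouse", "EndNodeType", "Person"),
    ("DateIn", "StartNodeType", "Date"),
    ("DateIn", "EndNodeType", "Date"),
    ("Birthdate", "StartNodeType", "Person"),
    ("Birthdate", "EndNodeType", "Date"),
]


def group(edge_type, by_node_type):
    """Rough equivalent of MATCH ... RETURN key, collect(value) ORDER BY key."""
    groups = defaultdict(list)
    for rel, etype, node in META_EDGES:
        if etype != edge_type:
            continue
        key, value = (node, rel) if by_node_type else (rel, node)
        groups[key].append(value)
    return sorted(groups.items())


# "Get valid RelTypes for each NodeType" (start side only, as in the query)
print(group("StartNodeType", by_node_type=True))
```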

I did not explicitly connect each node to its NodeType via a Relationship; rather, it’s just an implicit tie using the 'type' property on the node. I’m not sure whether there would be any benefit to using a relationship…

Variations of these queries can be used in the validation of Nodes and particularly Relationships to ensure that they are playing by the rules! I’ve built a simple version of a generic UI (html/javascript) for nodes and relationships using PHP for all database access and validation.
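Curt’s PHP validation layer isn’t included here, so the following is only a hypothetical Python sketch of the idea, not his implementation. The sketch assumes the StartNodeType/EndNodeType edges have been read into a rule table (`RULES`, a name invented for this example). Before a data relationship is created, its type and the NodeTypes at both ends are checked against that table.

```python
# Hypothetical rule table derived from the metamodel above:
# RelType -> (allowed start NodeTypes, allowed end NodeTypes).
RULES = {
    "Spouse":    ({"Person"}, {"Person"}),
    "DateIn":    ({"Date"}, {"Date"}),
    "Birthdate": ({"Person"}, {"Date"}),
}


def validate(rel_type, start_type, end_type):
    """Return a list of problems; an empty list means the relationship is allowed."""
    if rel_type not in RULES:
        return [f"unknown RelType {rel_type!r}"]
    starts, ends = RULES[rel_type]
    problems = []
    if start_type not in starts:
        problems.append(f"{rel_type} cannot start at {start_type}")
    if end_type not in ends:
        problems.append(f"{rel_type} cannot end at {end_type}")
    return problems


print(validate("Birthdate", "Person", "Date"))  # []
print(validate("Birthdate", "Date", "Person"))
```

Returning a list of problems rather than a single boolean lets a UI report every violation at once.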

End Curt’s Original GG

ADDED: List Relationship Constraints in the Metamodel

Note
Let’s add an AltDate node type so the DateIn relationship can demonstrate more than one node type at its start and end points…
MATCH (d:META:MODEL:RELATIONSHIP  {name:'DateIn'})
CREATE (altDate:META:MODEL:NODE   {name:'AltDate'})
CREATE (d)-[:StartNodeType]->(altDate)
CREATE (d)-[:EndNodeType]->(altDate)
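One consequence of this addition is worth spelling out. StartNodeType and EndNodeType are recorded as two independent sets, not as (start, end) pairs. Adding AltDate to both ends of DateIn therefore permits every combination, including mixed Date-to-AltDate links. The hypothetical Python sketch below just enumerates those combinations. Whether mixed links are intended is a modelling decision. If only specific pairs should be legal, the metamodel would need some way to record pairs.

```python
from itertools import product

# DateIn after the AltDate addition: two allowed start types and two
# allowed end types, stored independently (names mirror the Cypher above).
DATE_IN_STARTS = {"Date", "AltDate"}
DATE_IN_ENDS = {"Date", "AltDate"}


def permitted_pairs(starts, ends):
    """Every (start, end) combination that independent sets admit."""
    return sorted(product(starts, ends))


for start, end in permitted_pairs(DATE_IN_STARTS, DATE_IN_ENDS):
    print(f"{start} -[:DateIn]-> {end}")
```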

And now let’s look at a list of the Relationships defined in our Metamodel and the Nodes each of them can connect…

MATCH (nStart)<-[:StartNodeType]-(r:META:MODEL:RELATIONSHIP)-[:EndNodeType]->(nEnd)
RETURN collect(DISTINCT nStart.name) AS `From Node`, r.name AS Relationship, collect(DISTINCT nEnd.name) AS `To Node`
ORDER BY r.name
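The query above pairs every start edge with every end edge of the same RelType and then de-duplicates with `collect(DISTINCT ...)`. The hypothetical Python analogue below produces the same kind of From/Relationship/To table. `META_EDGES` mirrors the metamodel after the AltDate addition, and `constraint_table` is an invented helper name.

```python
# Hypothetical mirror of the metamodel after the AltDate addition, as
# (RelType name, edge type, NodeType name) triples.
META_EDGES = [
    ("Spouse", "StartNodeType", "Person"),
    ("Spouse", "EndNodeType", "Person"),
    ("DateIn", "StartNodeType", "Date"),
    ("DateIn", "EndNodeType", "Date"),
    ("Birthdate", "StartNodeType", "Person"),
    ("Birthdate", "EndNodeType", "Date"),
    ("DateIn", "StartNodeType", "AltDate"),
    ("DateIn", "EndNodeType", "AltDate"),
]


def constraint_table(edges):
    """Return (From Node, Relationship, To Node) rows ordered by RelType name."""
    starts, ends = {}, {}
    for rel, edge_type, node in edges:
        target = starts if edge_type == "StartNodeType" else ends
        bucket = target.setdefault(rel, [])
        if node not in bucket:  # like collect(DISTINCT ...)
            bucket.append(node)
    # The MATCH pattern needs both a start and an end edge, so only RelTypes
    # present in both maps yield a row.
    return [(starts[r], r, ends[r]) for r in sorted(starts.keys() & ends.keys())]


for from_nodes, rel, to_nodes in constraint_table(META_EDGES):
    print(from_nodes, rel, to_nodes)
```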
Note
The StartNodeType and EndNodeType relationships do not show up here even though they are contained in the overall database. This is because these relationships exist for expressing relationships between nodes WITHIN the metamodel, not within the "regular" data of the self-describing database.
Note
We’ll explore the ideas Curt started exploring here in a follow-up GraphGist to be submitted as part of the Dec-Jan Domain Model GraphGist Challenge. In this follow-up GraphGist we’ll be exploring a use case related to the FactMiners social-game ecosystem which is part of The Softalk Apple Project (www.SoftalkApple.com).