Skip to content

Instantly share code, notes, and snippets.

@brad-jones
Last active September 2, 2015 12:51
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save brad-jones/66eecf428239469545fd to your computer and use it in GitHub Desktop.
Save brad-jones/66eecf428239469545fd to your computer and use it in GitHub Desktop.
Brads Brain Dump on Relationships, Graphs, Objects, etc

Brads Brain Dump on Relationships, Graphs, Objects, etc

How to best store all this info on disk?

I want to build an ORM without the mapping bit. ie: We use objects in our code, we relate objects to one another in our code. So lets store these object as they are and not MAP them into some other form.

NOTE: I use json to describe my data structures but it could be in any format.

One To One Example:

So the following example is great for querying but the relationship info is stored twice. Which means updates would need to touch 2 places.

{
    "/Persons/#1": { "Address": "/Addresses/#1" },
    "/Addresses/#1": { "Person": "/Persons/#1" }
}

What about if we consider the relationship an object is it's own right.

{
    "/Persons/#1": { "Address": "/Relations/#1" },
    "/Addresses/#1": { "Person": "/Relations/#1" },
    "/Relations/#1":
    {
        "Type": "1:1",
        "Person": "/Persons/#1",
        "Address": "/Addresses/#1"
    }
}

Now the relationship info is stored once.

One to Many Example:

{
    "/Cars/#1": { "Wheels": "/Relations/#1" },
    "/Wheels/#1": { "Car": "/Relations/#1" },
    "/Wheels/#2": { "Car": "/Relations/#1" },
    "/Wheels/#3": { "Car": "/Relations/#1" },
    "/Wheels/#4": { "Car": "/Relations/#1" },
    "/Relations/#1":
    {
        "Type": "1:M",
        "Car": "/Cars/#1",
        "Wheels":
        [
            "/Wheels/#1",
            "/Wheels/#2",
            "/Wheels/#3",
            "/Wheels/#4"
        ]
    }
}

Many to Many Example

{
    "/Users/#1": { "Groups": "/Relations/#1" },
    "/Users/#2": { "Groups": "/Relations/#1" },
    "/Groups/#1": { "Users": "/Relations/#1" },
    "/Groups/#2": { "Users": "/Relations/#1" },
    "/Relations/#1":
    {
        "Type": "M:M",
        "Map":
        [
            { "User": "/User/#1", "Group": "/Groups/#1" },
            { "User": "/User/#1", "Group": "/Groups/#2" },
            { "User": "/User/#2", "Group": "/Groups/#2" }
        ]
    }
}

There is no way to store Many to Many relationship (not without expressing the relationship data twice) but like how we traditionally do so in a relational database. We could however create indexes for Many to Many relations like this:

{
    "Type": "M:M",
    "/User/#1":
    [
        "/Groups/#1",
        "/Groups/#2"
    ],
    "/User/#2":
    [
        "/Groups/#2"
    ],
    "/Groups/#1":
    [
        "/User/#1"
    ],
    "/Groups/#2":
    [
        "/User/#1",
        "/User/#2"
    ]
}

Querying Relations

Now that relationships are objects themselves we can perform some interesting queries. For example:

  • Find all 1:1 relations that point to a Person.
  • Find all 1:M relations that have a Wheel of size X.

Remember you can you have multiple relationships that point to the same objects. A Person might have a HomeAddress and a WorkAddress. Because relations are just another object we can add further information to them.

{
    "Type": "1:1",
    "Label": "Home",
    "Person": "/Persons/#1",
    "Address": "/Addresses/#1"
}
{
    "Type": "1:1",
    "Label": "Work",
    "Person": "/Persons/#1",
    "Address": "/Addresses/#2"
}

Now I can ask for all Home Addresses or all Work Addresses.

LIGHT BULB MOMENT

I then realised that these "Relationship" objects that I started describing were staring to look very similar to an "Edge" in a graph database. So lets look at how we might store these relationships using Vertices and Edges.

One to One Example (using graph db structure):

{
    "/Persons/#1": { "Address": "/Edges/#1" },
    "/Addresses/#1": { "LandLord": "/Edges/#2" },
    "/Edges/#1": { "in": "/Persons/#1", "out": "/Addresses/#1" },
    "/Edges/#2": { "in": "/Addresses/#1", "out": "/Persons/#1" }
}

One to Many Example (using graph db structure):

{
    "/Cars/#1": {"Wheels": ["/Edges/#1","/Edges/#2","/Edges/#3","/Edges/#4"] },
    "/Wheels/#1": { "Car": "/Edges/#5" },
    "/Wheels/#2": { "Car": "/Edges/#6" },
    "/Wheels/#3": { "Car": "/Edges/#7" },
    "/Wheels/#4": { "Car": "/Edges/#8" },

    "/Edges/#1": { "in": "/Cars/#1", "out": "/Wheels/#1" },
    "/Edges/#2": { "in": "/Cars/#1", "out": "/Wheels/#2" },
    "/Edges/#3": { "in": "/Cars/#1", "out": "/Wheels/#3" },
    "/Edges/#4": { "in": "/Cars/#1", "out": "/Wheels/#4" },

    "/Edges/#5": { "in": "/Wheels/#1", "out": "/Cars/#1" },
    "/Edges/#6": { "in": "/Wheels/#2", "out": "/Cars/#1" },
    "/Edges/#7": { "in": "/Wheels/#3", "out": "/Cars/#1" },
    "/Edges/#8": { "in": "/Wheels/#4", "out": "/Cars/#1" },
}

Many to Many Example (using graph db structure):

{
    "/Users/#1": {"Groups": ["/Edges/#1", "/Edges/#2"] },
    "/Users/#2": {"Groups": ["/Edges/#3"] },

    "/Groups/#1": {"Users": ["/Edges/#4"] },
    "/Groups/#2": {"Users": ["/Edges/#5", "/Edges/#6"] },

    "/Edges/#1": { "in": "/Users/#1", "out": "/Groups/#1" },
    "/Edges/#2": { "in": "/Users/#1", "out": "/Groups/#2" },
    "/Edges/#3": { "in": "/Users/#2", "out": "/Groups/#2" },

    "/Edges/#4": { "in": "/Groups/#1", "out": "/Users/#1" },
    "/Edges/#5": { "in": "/Groups/#2", "out": "/Users/#1" },
    "/Edges/#6": { "in": "/Groups/#2", "out": "/Users/#2" }
}

Thoughts on Graph Db:

Okay so first point is that if we are not going to add any extra info to an "Edge", like a label, then we may as well just define both side of the relationship on the objects themselves and do away with the middle man.

But again the issue I find with Vertices and Edges (or inlining the relationship links as described above) is that you still end up defining the same info twice.

When you say UserA belongs to the Groups A & B as well as UserB belongs only to Group B. Then a human can easily infer the members of those groups without being told any further information.

9 times out of 10, if I query for a group I am going to want to know it's users and if I query for a user I am going to want to know it's groups. Yes I accept that sometimes we only require a one way relationship. But then in those cases one could argue that perhaps that data could be denormalized into it's parent object and then not have the relationship at all...

So can we make a computer think like a human??? Yes we can, it requires a MAP. But this require the computer to think, just like it requires a human to think I guess.

We either do the thinking at "READ" time or at "WRITE" time. Or we get another person (read: thread / process) to do the thinking (read: indexing) and we are happy to "READ" slightly out of date data.

Neo4j BiDirectional Relations

See: https://dzone.com/articles/modelling-data-neo4j

Since one relationship implies the other, this is wasteful, both in terms of space and traversal time. Neo4j can traverse relationships in both directions. More importantly, thanks to the way Neo4j organizes its data, the speed of traversal does not depend on the direction of the relationships being traversed.

How the hell do they do it?

{
    "/Users/#1": {"Groups": ["/Edges/#1", "/Edges/#2"] },
    "/Users/#2": {"Groups": ["/Edges/#3"] },

    "/Groups/#1": {"Users": ["/Edges/#1"] },
    "/Groups/#2": {"Users": ["/Edges/#2", "/Edges/#3"] },

    "/Edges/#1": { "in": "/Users/#1", "out": "/Groups/#1" },
    "/Edges/#2": { "in": "/Users/#1", "out": "/Groups/#2" },
    "/Edges/#3": { "in": "/Users/#2", "out": "/Groups/#2" }
}

Okay so it's not super hard to see how one might infer the relationship.

Another not so bright Light Bulb Moment

Lets say that we don't store any pointer in the object. For example:

{
    "/Users/#1": {"Groups": ["out"]},
    "/Users/#2": {"Groups": ["out"]},

    "/Groups/#1": {"Users": ["out"]},
    "/Groups/#2": {"Users": ["out"]},

    "/Edges/UsersToGroups":
    [
        { "in": "/Users/#1", "out": "/Groups/#1" },
        { "in": "/Users/#1", "out": "/Groups/#2" },
        { "in": "/Users/#2", "out": "/Groups/#2" }
    ]
}

Firstly ["out"] means that we don't have the data, it's outside somewhere. It is also inside an array, this tells us we are expecting a list of something.

Okay all very well and good but now we are just back to the issue of having to loop through an array to search for our relationship mapping. As the size of database grows, this searching will take longer.

We are just going around in circles now hahaha...

Conclusion

Okay so after much thought I have realised that just like an actual User object in C# (insert other OO language here) has a list of Group objects those groups do not know about the User object, unless you specifically tell them (which my current ORM does automatically).

There is no way to store a bi-directional relationship, maintaining one set of links, without further computation at read time. The relationship is bi-directional it's that simple.

So both sides of the relationship need to be stored somewhere on disk. Yes this kinda sorta means the same data is stored twice. But because each object points directly to it's related objects (okay maybe not directly but via an Edge) it is very efficient to update both sides of the relationship when required.

The final Many to Many example:

{
    "/Users/#1": {"Groups": ["/Groups/#1", "/Groups/#2"] },
    "/Users/#2": {"Groups": ["/Groups/#2"] },

    "/Groups/#1": {"Users": ["/Users/#1"] },
    "/Groups/#2": {"Users": ["/Users/#1", "/Users/#2"] }
}

I don't see much point in having Edge documents:

  • They create an extra layer and make it difficult to read the database as a human.

  • They potentially just add more READ and WRITEs to the filesystem.

  • Any properties stored within an Edge are not easily surfaced in a traditional ORM.

  • What would be an Edges "label" is basically the property name of the navigational properties on the objects anyway.

Next Question: How do we make a NON-traditional ORM that works with a Graph DB

Things to consider:

  • I want to mantain static typing, Intellisense is awesome, not sure how I programmed in PHP before...

  • Although in some situations dynamic typing is still handy.

What would the API look like:

public class User : Graceful.Model<User>
{
    public string Name { get; set; }
    public List<IsAMemberOfEdge> Memberships { get; set; }
    public List<HasAccessEdge> HasAccessTo { get; set; }
}

public class Group : Graceful.Model<Group>
{
    public string Type { get; set; }
    public List<IsAMemberOfEdge> Members { get; set; }
    public List<HasAccessEdge> HasAccessTo { get; set; }
}

public class Access : Graceful.Model<Access>
{
    public int Level { get; set; }
    public List<HasAccessEdge> WhoHasAccess { get; set; }
}

public class IsAMemberOfEdge : Graceful.Edge
{
    // These are like our IN and OUT
    public User User { get; set; }
    public Group Group { get; set; }

    public DateTime DateJoined { get; set; }

    // Setting this value could be like soft deleting the relationship...
    public DateTime DateLeft { get; set; }
}

public class HasAccessEdge : Graceful.Edge
{
    public Access Access { get; set; }
    public User User { get; set; }
    public Group Group { get; set; }

    public DateTime Expires { get; set; }
}

var user = new User();
var AGroupThatIBelongTo = user.Memberships[0].Group;
var AnotherGroupThatIBelongTo = user.Memberships[1].Group;
var MySelf = user.Memberships[0].User;
var MySelf = user.Memberships[1].User;

var group = new Group();
var AUserThatIsInMyGroup = group.Members[0].User;
var AnotherUserThatIsInMyGroup = group.Members[1].User;
var MySelf = group.Members[0].Group;
var MySelf = group.Members[1].Group;

// traversing the graph
var someGroup = user.Memberships[0].Group;
var someEdge = someGroup.Members[5];
var someUser = someEdge.User;

// Get the date that someUser joined someGroup
someEdge.DateJoined;

// LINQ
user.Memberships.Where(edge => edge.Group.Type == "Admin");
user.Memberships.Where(edge => edge.DateLeft < DateTime.Now);

user.HasAccessTo.ShortestPath(edge => edge.Access)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment