Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save simoami/1ed338cd820b11ab2ad1b98e805bd1b6 to your computer and use it in GitHub Desktop.
Save simoami/1ed338cd820b11ab2ad1b98e805bd1b6 to your computer and use it in GitHub Desktop.
SOCIAL ACTIVITY STREAM GRAPH

SOCIAL ACTIVITY STREAM GRAPH

Introduction

In the quest for an efficient representation of an activity stream that is optimal for social media applications, I evaluated multiple graph representations. The goal is to produce a graph model that conforms to the specification in activitystrea.ms.

The following approaches where taken into consideration:

  1. National Geographic: http://natgeo.github.io/activitystreams In their model, each node label is named after the object type. For example a blog node would be labeled Blog, whereas a user node would be labeled User: tfCKKK2

    • In their implementation, entities nodes be cross-linked from many activities. While this helps optimize the number of nodes the system creates, it makes it more difficult to perform a delete operation, since any object (actor, object or target) can be referenced elsewhere, and therefore risk leaving orphan nodes in the system. Also, I’ve not seen a relationship to a parent entity, say a user or a feed group. They are rather isolated activities. I’m not sure how one would query for multiple activities for a given user.

  2. Another common approach is to create a root activity node with actor, object and target as subnodes: fPaNJIi

    • This approach can be useful for cross-referencing objects across multiple activities. But it suffers from the same issue as #1, where each activity can potentially consume up to 4 nodes.

  3. Battlefy: https://blog.battlefy.com/how-battlefy-uses-neo4j-for-activity-feeds-b438d61128ac 0*KdOuW6rvGv4wCdOb

    • This is similar to #2 because activity nodes (called feedItem) have actor and target as subnodes and so it has the same pros/cons. However, what is interesting here is the graph optimisation technique used here. Each activity is linked to the next activity with the latest activity linked to the feed group. As opposed to the conventional approach of linking each activity directly to the user or feed group, this makes scanning and pagination very efficient since all that is needed is to follow the path across [:NEXT] edges. activity insertion or deletion means, breaking the chain to insert the new node to delete an existing one.

  4. GetStream.io Although they don’t expose their graph model, one can conclude that activities are stored as a single node with actor, object and target being reference properties with the format "<ObjectType>:<ObjectId>. For example actor: "User:1" or object: "Activity:24". So an activity node could be entirely represented as the following: (Activity { id: "unique uuid", actor: "User:1", verb: "like", object: "Photo:12", published: 123456789123 }). This makes sense because if an actor was an object with additional metadata, it could quickly become outdated. Stream however has a process called "Enrichement" which it suggests to use in the backend to replace the reference properties with actual objects. You can read more about [Enrichement](https://getstream.io/docs/#enrichment).

The Graph Model

The following graph representation is inspired from getstream.io and https://blog.battlefy.com/how-battlefy-uses-neo4j-for-activity-feeds-b438d61128ac. It uses a single graph node to represent an activity. It also uses the chaining technique to link activities to provide great performance when loading activity feeds without the cost of scanning large sets of nodes.

aONiG3y

Setup

CREATE
  (u1:User {id: 1, name: 'Simo'}),
  (u2:User {id: 2, name: 'Bob'}),
  (u3:User {id: 3, name: 'Carrie'}),
  (u4:User {id: 4, name: 'David'}),
  (u5:User {id: 5, name: 'Emily'}),
  (u1)-[:FOLLOWS]->(u2),
  (u1)-[:FOLLOWS]->(u3),
  (u2)-[:FOLLOWS]->(u4),
  (u2)-[:FOLLOWS]->(u5),
  // Add sample activities
  (a1:Activity {id: 1, title: "Activity 1", verb: "like", actor: "User:1", object: "Comment:1", published: 1482282698056 }),
  (a2:Activity {id: 2, title: "Activity 2", verb: "follow", actor: "User:1", object: "User:2", published: 1482282698054 }),
  (a3:Activity {id: 3, title: "Activity 3", verb: "follow", actor: "User:1", object: "User:3", published: 1482282698027 }),
  (a4:Activity {id: 4, title: "Activity 4", verb: "update-profile", actor: "User:2", object: "Profile:2", published: 1482282698053 }),
  (a5:Activity {id: 5, title: "Activity 5", verb: "follow", actor: "User:2", object: "User:3", published: 1482282698049 }),
  (a6:Activity {id: 6, title: "Activity 6", verb: "follow", actor: "User:2", object: "User:4", published: 1482282698055 }),
  (a7:Activity {id: 7, title: "Activity 7", verb: "add", actor: "User:2", object: "Photo:70", target: "Album:3", published: 1482282698051 }),
  (a8:Activity {id: 8, title: "Activity 8", verb: "comment", actor: "User:3", object: "Comment:7", target: "Post:10", published: 1482282698057 }),
  (a9:Activity {id: 9, title: "Activity 9", verb: "invite", actor: "User:3", object: "User:5", target: "Event:1", published: 1482282698057 }),
  (u1)-[:ACTIVITIES { type: "timeline"}]->(a1)-[:NEXT]->(a2)-[:NEXT]->(a3),
  (u2)-[:ACTIVITIES { type: "timeline"}]->(a4)-[:NEXT]->(a5)-[:NEXT]->(a6)-[:NEXT]->(a7),
  (u3)-[:ACTIVITIES { type: "timeline"}]->(a8)-[:NEXT]->(a9)

RETURN *

Fetch User’s Own Activities (limit of 2 items)

This query demonstrates the use of edge depth parameters for pagination. The [:NEXT*0..1] part can be replaced with dynamic parameters to something like [:NEXT*' + skip + '..' + end + ']. Adjust 0..1 to whatever suits your needs.

MATCH (user:User {id: 1})
MATCH (user)-[:ACTIVITIES]->(latest:Activity)
MATCH (latest)-[:NEXT*0..1]->(item:Activity)
RETURN DISTINCT user.name as Source, item.actor as Actor, item.verb as Verb, item.object as Object, item.target as Target, item.published as Published
ORDER BY Published DESC

Fetch User’s Last 6 Activities from Followings

When fetching data from different users, it is necessary to use the LIMIT clause for pagination since the edge pagination technique only applies to single sources. The trick is to set the edge limits to the same overall page limit.

MATCH (me:User {id: 1})-[:FOLLOWS]->(following:User)
MATCH (user:User)-[:ACTIVITIES {type: "timeline"}]->(latest:Activity)
WHERE user.id = me.id OR user.id = following.id
MATCH (latest)-[:NEXT*..6]->(item:Activity)
RETURN DISTINCT user.name as Source, item.actor as Actor, item.verb as Verb, item.object as Object, item.target as Target, item.published as Published
ORDER BY Published DESC
SKIP 0 LIMIT 6

Insert New Activity

Inserting a new activity can be tricky because you have to create new relashionship and drop old ones not to break the activity chain.

MATCH (me:User {id: 1})
OPTIONAL MATCH (me)-[r:ACTIVITIES {type: "timeline"}]-(secondlatestupdate)
CREATE (me)-[:ACTIVITIES { type: r.type }]->(latest_update:Activity {id: 10, title: "Activity 10", verb: "like", actor: "User:1", object: "Photo:10", published: timestamp() })
DELETE r
WITH latest_update, collect(secondlatestupdate) AS seconds
FOREACH (x IN seconds | CREATE (latest_update)-[:NEXT]->(x))
RETURN latest_update.title AS NewActivity

Inspect the resulting graph:

MATCH (n)
RETURN n

Conclusions

This is by no means the ideal solution but is a good starting point to fulfill the basic requirements for my activity feed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment