@tekiegirl
Last active May 5, 2016 14:23

Import Distinct Data With Relationships

Import Distinct Data from a CSV file, and create relationships

Here is an example of importing distinct data from a CSV file, and creating relationships using that data.

Graph Population

This method is more efficient than using MERGE alone. WITH DISTINCT filters out duplicate rows from the CSV file before any matching happens, so MERGE never runs for a repeated row. MERGE is still needed to avoid creating duplicate nodes: the same person can appear in several rows, and the CSV file could be loaded more than once.

CREATE INDEX ON :Person(id);

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "https://dl.dropboxusercontent.com/u/2900504/people2.csv" AS line

WITH DISTINCT line
MERGE (followed:Person {id: toInt(line.followed_id)})
ON CREATE
  SET followed.status = line.status, followed.created_at = line.created_at
ON MATCH
  SET followed.status = line.status, followed.created_at = line.created_at
MERGE (follower:Person {id: toInt(line.follower_id)})
CREATE UNIQUE (follower)-[:Following]->(followed)
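To make the semantics of the query above easier to follow, here is a minimal Python sketch that mimics it on in-memory structures. It is not how Neo4j executes the query: the `load` function, the `persons` dictionary and the `follows` set are hypothetical stand-ins for the database, and the rows are copied from the CSV content listed at the end of this gist.

```python
import csv
import io

# Sample rows copied from the "Content of the csv file" section.
SAMPLE_CSV = """id,follower_id,followed_id,created_at,status
1,1,2,1462439147,active
2,2,3,1462439148,active
3,2,1,1462439149,active
1,3,4,1462439150,active
2,4,3,1462439151,active
"""


def load(csv_text, persons=None, follows=None):
    """Mimic WITH DISTINCT + MERGE + CREATE UNIQUE on plain Python structures."""
    persons = {} if persons is None else persons     # id -> properties
    follows = set() if follows is None else follows  # (follower_id, followed_id)
    seen_rows = set()
    for line in csv.DictReader(io.StringIO(csv_text)):
        row = tuple(line.items())
        if row in seen_rows:
            continue  # WITH DISTINCT line: identical rows are skipped
        seen_rows.add(row)
        followed_id = int(line["followed_id"])
        follower_id = int(line["follower_id"])
        # MERGE (followed ...) with identical ON CREATE / ON MATCH actions
        followed = persons.setdefault(followed_id, {})
        followed["status"] = line["status"]
        followed["created_at"] = line["created_at"]
        # MERGE (follower ...): created with no extra properties if missing
        persons.setdefault(follower_id, {})
        # CREATE UNIQUE: a set keeps at most one relationship per pair
        follows.add((follower_id, followed_id))
    return persons, follows


persons, follows = load(SAMPLE_CSV)
print(sorted(persons))  # [1, 2, 3, 4]
print(sorted(follows))  # [(1, 2), (2, 1), (2, 3), (3, 4), (4, 3)]
```

Calling `load(SAMPLE_CSV, persons, follows)` a second time leaves both structures unchanged, which is the behaviour MERGE and CREATE UNIQUE provide when the file is loaded more than once.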

Resulting Persons and relationships

MATCH (p1:Person)
OPTIONAL MATCH (p1)-[r]->(p2:Person)
RETURN p1.id AS Person1, type(r) AS Relationship, p2.id AS Person2

Resulting Person nodes

MATCH (p:Person)
RETURN p.id AS Id, p.status AS Status, p.created_at AS CreatedAt

Notes

CREATE INDEX ON :Person(id); speeds up the lookups that MERGE performs when matching Person nodes on id.

USING PERIODIC COMMIT 1000 commits the load after every 1000 rows, so that memory does not fill up before the results are committed to the database.
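The batching idea behind periodic commits can be sketched in a few lines of Python. This only illustrates committing in fixed-size chunks and says nothing about how Neo4j implements it; the `batches` helper and the `committed` list are hypothetical names used for illustration.

```python
from itertools import islice


def batches(rows, size=1000):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


committed = []  # hypothetical stand-in for commits to the database
for batch in batches(range(2500), size=1000):
    # Only one batch is held in memory at a time before it is "committed".
    committed.append(len(batch))

print(committed)  # [1000, 1000, 500]
```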

The ON CREATE and ON MATCH clauses set the same properties. This way, a Person node that was first created from a follower_id (with only its id) still gets its status and created_at values when a later row lists it as the followed_id.
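The effect of the ON MATCH clause can be seen in a small Python sketch that replays two rows of the CSV file, with and without the ON MATCH step. The `merge_followed` and `run` functions are hypothetical helpers written for this illustration, and a plain dictionary stands in for the Person nodes.

```python
def merge_followed(persons, person_id, props, on_match=True):
    """MERGE a followed Person; set props on create, and on match if enabled."""
    if person_id not in persons:
        persons[person_id] = dict(props)   # ON CREATE
    elif on_match:
        persons[person_id].update(props)   # ON MATCH
    return persons[person_id]


def run(on_match):
    persons = {}
    # Row "1,1,2,1462439147,active": person 2 is followed, person 1 follows.
    merge_followed(persons, 2, {"status": "active", "created_at": "1462439147"}, on_match)
    persons.setdefault(1, {})  # follower is created with only its id
    # Row "3,2,1,1462439149,active": person 1 already exists and is now followed.
    merge_followed(persons, 1, {"status": "active", "created_at": "1462439149"}, on_match)
    persons.setdefault(2, {})  # follower 2 already exists, nothing changes
    return persons


with_on_match = run(on_match=True)
without_on_match = run(on_match=False)

print(with_on_match[1])     # {'status': 'active', 'created_at': '1462439149'}
print(without_on_match[1])  # {}
```

Without the ON MATCH step, person 1 keeps no status or created_at, because the node already existed by the time a row listed it as followed.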

Content of the csv file

id,follower_id,followed_id,created_at,status
1,1,2,1462439147,active
2,2,3,1462439148,active
3,2,1,1462439149,active
1,3,4,1462439150,active
2,4,3,1462439151,active
