tekiegirl/ImportDistinctDataWithRelationships.adoc

Import Distinct Data With Relationships


Import distinct data from a CSV file and create relationships


This example imports distinct rows from a CSV file and uses them to create Person nodes and the Following relationships between them.


Graph Population


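This method can be more efficient than using MERGE alone. WITH DISTINCT line drops any row that is identical in every column to an earlier row, so exact duplicates in the CSV file are filtered out before any MERGE runs and are never matched against the graph. MERGE is still needed, though. DISTINCT compares whole rows, and the same Person id can appear in several distinct rows: in the CSV content listed at the end, id 2 is the followed person in one row and the follower in two others. MERGE also prevents duplicate nodes if the CSV file is loaded more than once.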
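A quick way to see what WITH DISTINCT does is to run it over literal maps instead of a file. This sketch needs no file and no data in the database; the three rows are hypothetical. The first two maps are equal in every key, so the query returns two rows. In the CSV content listed at the end of this gist, no two rows are identical, so DISTINCT removes nothing from that particular file.

```cypher
// Three hypothetical rows; the first two are identical in every column
UNWIND [
  {follower_id: "1", followed_id: "2", status: "active"},
  {follower_id: "1", followed_id: "2", status: "active"},
  {follower_id: "1", followed_id: "2", status: "inactive"}
] AS line
// Maps are compared by value, so the repeated row is dropped
WITH DISTINCT line
RETURN line
```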


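// Index Person.id so that the MERGEs below can look nodes up quickly
CREATE INDEX ON :Person(id);

USING PERIODIC COMMIT 1000
// Commit every 1000 rows; each row is a map keyed by the CSV header
LOAD CSV WITH HEADERS FROM "https://dl.dropboxusercontent.com/u/2900504/people2.csv" AS line

// Drop rows that are identical in every column
WITH DISTINCT line
// Find or create the followed person; the id is stored as an integer
MERGE (followed:Person {id: toInt(line.followed_id)})
ON CREATE
  SET followed.status = line.status, followed.created_at = line.created_at
ON MATCH
  SET followed.status = line.status, followed.created_at = line.created_at
// Find or create the follower (a new follower gets no status or created_at here)
MERGE (follower:Person {id: toInt(line.follower_id)})
// Create the relationship only if it does not already exist
CREATE UNIQUE (follower)-[:Following]->(followed)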
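The query above is written for older Neo4j releases: toInt, CREATE UNIQUE and USING PERIODIC COMMIT are not accepted by Neo4j 5. The following is a sketch of the same load for Neo4j 5, assuming the same file URL and column names. toInteger replaces toInt, MERGE on the relationship pattern replaces CREATE UNIQUE, a single SET replaces the identical ON CREATE and ON MATCH clauses, and CALL { ... } IN TRANSACTIONS OF 1000 ROWS takes over the role of the periodic commit.

```cypher
// Sketch for Neo4j 5. Batched subqueries must run in an auto-commit
// transaction, e.g. prefixed with :auto in Neo4j Browser.
LOAD CSV WITH HEADERS FROM "https://dl.dropboxusercontent.com/u/2900504/people2.csv" AS line
// Drop rows that are identical in every column
WITH DISTINCT line
CALL {
  WITH line
  MERGE (followed:Person {id: toInteger(line.followed_id)})
  // SET runs whether the node was just created or already existed
  SET followed.status = line.status, followed.created_at = line.created_at
  MERGE (follower:Person {id: toInteger(line.follower_id)})
  // MERGE on the pattern creates the relationship only if it is missing
  MERGE (follower)-[:Following]->(followed)
} IN TRANSACTIONS OF 1000 ROWS
```

Recent 5.x releases also accept the scoped form CALL (line) { ... } in place of the importing WITH line.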


Resulting Persons and relationships


MATCH (p1:Person)
OPTIONAL MATCH (p1)-[r]->(p2:Person)
RETURN p1.id AS Person1, type(r) AS Relationship, p2.id AS Person2
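Since the load is meant to be safe to repeat, one way to check the result is to look for any pair of people joined by more than one Following relationship. This query is an illustrative sketch, not part of the original gist; an empty result means no duplicate relationships exist, for example after loading the file twice.

```cypher
// Group Following relationships by their two endpoints
MATCH (follower:Person)-[r:Following]->(followed:Person)
WITH follower, followed, count(r) AS rels
// Keep only pairs connected more than once
WHERE rels > 1
RETURN follower.id AS Follower, followed.id AS Followed, rels AS Copies
```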


Resulting Person nodes


MATCH (p:Person)
RETURN p.id AS Id, p.status AS Status, p.created_at AS CreatedAt
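A similar sketch checks the claim that MERGE prevents duplicate Person nodes. It is an illustrative addition; any row it returns is an id that belongs to more than one node.

```cypher
// Count Person nodes per id value
MATCH (p:Person)
WITH p.id AS id, count(*) AS copies
// Report only ids shared by more than one node
WHERE copies > 1
RETURN id AS Id, copies AS Copies
```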


Notes


CREATE INDEX ON :Person(id); creates an index on the id property of Person nodes, which speeds up the lookup that each MERGE performs when matching on id.
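The CREATE INDEX ON :Label(property) form is not accepted by current Neo4j releases. The sketch below shows the newer syntax; the names person_id_index and person_id_unique are hypothetical. Because each id should identify exactly one person, a uniqueness constraint (which is backed by an index) is an alternative that also makes Neo4j reject a second node with the same id. Run one option, not both; the second is commented out so the block runs as written.

```cypher
// Option 1: a plain index on Person.id
CREATE INDEX person_id_index IF NOT EXISTS FOR (p:Person) ON (p.id);

// Option 2 (Neo4j 5 syntax): a uniqueness constraint, backed by an index
// CREATE CONSTRAINT person_id_unique IF NOT EXISTS
// FOR (p:Person) REQUIRE p.id IS UNIQUE;
```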


USING PERIODIC COMMIT 1000 commits the transaction after every 1000 rows, so that the changes from a large file do not accumulate in memory before they are written to the database.


The ON CREATE and ON MATCH clauses set the same properties. A Person node first created from a follower_id has no status or created_at; if a later row lists that person as the followed_id, ON MATCH adds those values. A single SET clause after the MERGE, which runs whether the node was created or matched, would have the same effect.


Content of the CSV file


id,follower_id,followed_id,created_at,status
1,1,2,1462439147,active
2,2,3,1462439148,active
3,2,1,1462439149,active
1,3,4,1462439150,active
2,4,3,1462439151,active
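If the file at the URL above cannot be reached, the same rows can be fed through the load logic without a file. This sketch rewrites the five CSV rows above as a list of maps, keeping every value as a string because LOAD CSV also reads each field as a string. The clauses after WITH DISTINCT line follow the original query, written with toInteger, a plain SET and a relationship MERGE so that they run on current Neo4j releases.

```cypher
// The five rows of the CSV above, as maps of strings
UNWIND [
  {id: "1", follower_id: "1", followed_id: "2", created_at: "1462439147", status: "active"},
  {id: "2", follower_id: "2", followed_id: "3", created_at: "1462439148", status: "active"},
  {id: "3", follower_id: "2", followed_id: "1", created_at: "1462439149", status: "active"},
  {id: "1", follower_id: "3", followed_id: "4", created_at: "1462439150", status: "active"},
  {id: "2", follower_id: "4", followed_id: "3", created_at: "1462439151", status: "active"}
] AS line
WITH DISTINCT line
MERGE (followed:Person {id: toInteger(line.followed_id)})
SET followed.status = line.status, followed.created_at = line.created_at
MERGE (follower:Person {id: toInteger(line.follower_id)})
MERGE (follower)-[:Following]->(followed)
```

On an empty database this produces four Person nodes (ids 1 to 4) and five Following relationships; running it a second time adds nothing.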