Gist: tomasonjo/52d231a7e18c1a24aaa18e81764bda44
Importing the Yelp dataset into Neo4j with apoc.load.json
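// Index on Category(id), matching the MERGE on Category{id:...} in the business import;
// uniqueness constraints on Business(id), User(id) and Review(id).
CALL apoc.schema.assert(
  {Category:['id']},
  {Business:['id'],User:['id'],Review:['id']});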
// Load businesses in batches and link each one to its categories.
CALL apoc.periodic.iterate("
CALL apoc.load.json('file:///home/tomasi/Downloads/dataset/business.json') YIELD value RETURN value
","
MERGE (b:Business{id:value.business_id})
SET b += apoc.map.clean(value, ['attributes','hours','business_id','categories','address','postal_code'],[])
WITH b,value.categories as categories
UNWIND categories as category
MERGE (c:Category{id:category})
MERGE (b)-[:IN_CATEGORY]->(c)
",{batchSize: 10000, iterateList: true});
// Load tips as TIP relationships from users to businesses that already exist.
CALL apoc.periodic.iterate("
CALL apoc.load.json('file:///ssd/yelp/dataset/tip.json') YIELD value RETURN value
","
MATCH (b:Business{id:value.business_id})
MERGE (u:User{id:value.user_id})
MERGE (u)-[:TIP{date:value.date,likes:value.likes}]->(b)
",{batchSize: 20000, iterateList: true});
// Load reviews: (User)-[:WROTE]->(Review)-[:REVIEWS]->(Business).
// The value list uses single quotes so it does not terminate the surrounding double-quoted string.
CALL apoc.periodic.iterate("
CALL apoc.load.json('file:///home/tomasi/Downloads/dataset/review.json') YIELD value RETURN value
","
MATCH (b:Business{id:value.business_id})
MERGE (u:User{id:value.user_id})
MERGE (r:Review{id:value.review_id})
MERGE (u)-[:WROTE]->(r)
MERGE (r)-[:REVIEWS]->(b)
SET r += apoc.map.clean(value, ['business_id','user_id','review_id','text'],['0'])
",{batchSize: 10000, iterateList: true});
// Load users and their FRIEND relationships (small batches, since friend lists can be long).
CALL apoc.periodic.iterate("
CALL apoc.load.json('file:///ssd/yelp/dataset/user.json') YIELD value RETURN value
","
MERGE (u:User{id:value.user_id})
SET u += apoc.map.clean(value, ['friends','user_id'],[])
WITH u,value.friends as friends
UNWIND friends as friend
MERGE (u1:User{id:friend})
MERGE (u)-[:FRIEND]-(u1)
",{batchSize: 100, iterateList: true});
// Connect pairs of users who reviewed the same business, weighted by the similarity of their star ratings.
CALL apoc.periodic.iterate(
"MATCH (p1:User)-->(r1:Review)-->(:Business)<--(r2:Review)<--(p2:User)
 WHERE id(p1) < id(p2)
 RETURN p1,p2,collect(r1.stars) as s1,collect(r2.stars) as s2",
"MERGE (p1)-[s:SIMILAR]-(p2) SET s.weight = apoc.algo.euclideanSimilarity(s1,s2)",
{batchSize:10000, parallel:false, iterateList:true});
// Friend-count statistics per user.
MATCH (b:User)
RETURN avg(apoc.node.degree(b,'FRIEND')) as average_friends,
       stdev(apoc.node.degree(b,'FRIEND')) as stdev_friends,
       max(apoc.node.degree(b,'FRIEND')) as max_friends,
       min(apoc.node.degree(b,'FRIEND')) as min_friends;
// Review-count statistics per business.
MATCH (b:Business)
RETURN avg(apoc.node.degree(b,'REVIEWS')) as average_reviews,
       stdev(apoc.node.degree(b,'REVIEWS')) as stdev_reviews,
       max(apoc.node.degree(b,'REVIEWS')) as max_reviews,
       min(apoc.node.degree(b,'REVIEWS')) as min_reviews;
Hello,
In my opinion there is a mistake in the original code.
On line 45, the direction is missing in the MERGE (u)-[:FRIEND]-(u1) relationship.
Because of this, the loading takes a very long time in version 4.4.x.
The same happens on line 53:
MERGE (p1)-[s:SIMILAR]-(p2) SET s.weight = apoc.algo.euclideanSimilarity(s1,s2)
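For illustration, here is a rough sketch of what the suggested change might look like. The directions chosen, (u)-[:FRIEND]->(u1) and (p1)-[s:SIMILAR]->(p2), are assumptions; the original gist does not say which direction was intended, and the file path is simply copied from the original. Everything else matches the original statements.

```cypher
// Sketch only: FRIEND import with an explicit relationship direction.
// Assumption: the relationship points from the user to each listed friend.
CALL apoc.periodic.iterate("
CALL apoc.load.json('file:///ssd/yelp/dataset/user.json') YIELD value RETURN value
","
MERGE (u:User{id:value.user_id})
SET u += apoc.map.clean(value, ['friends','user_id'],[])
WITH u,value.friends as friends
UNWIND friends as friend
MERGE (u1:User{id:friend})
MERGE (u)-[:FRIEND]->(u1)
",{batchSize: 100, iterateList: true});

// Sketch only: SIMILAR with an explicit direction.
// Assumption: the relationship points from the node with the lower internal id to the higher one.
CALL apoc.periodic.iterate(
"MATCH (p1:User)-->(r1:Review)-->(:Business)<--(r2:Review)<--(p2:User)
 WHERE id(p1) < id(p2)
 RETURN p1,p2,collect(r1.stars) as s1,collect(r2.stars) as s2",
"MERGE (p1)-[s:SIMILAR]->(p2) SET s.weight = apoc.algo.euclideanSimilarity(s1,s2)",
{batchSize:10000, parallel:false, iterateList:true});
```

One consequence to keep in mind: with a directed FRIEND merge, a friendship that appears in both users' friend lists is stored as two relationships, one in each direction, so the friend counts returned by apoc.node.degree(b,'FRIEND') would count such a friendship twice. Queries can still match the relationship without a direction, for example MATCH (a:User)-[:FRIEND]-(b:User).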
Just wanted to say thank you for this. This really clarified how to use apoc.periodic.iterate and unwind for me.