@ikwattro
Last active August 29, 2015 14:04
Importing RATP data into Neo4j

Creating indexes

CREATE INDEX ON :Stop(id);
CREATE INDEX ON :Route(id);
CREATE INDEX ON :Trip(id);

Loading Stops

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:c:/Users/willemsen.c/Downloads/ratp/sourcedata/full/stops.txt" AS csv
MERGE (:Stop {id:toInt(csv.stop_id), name:replace(csv.stop_name, '"',''), description:replace(csv.stop_desc,'"',''),
lat:toFloat(csv.stop_lat),lon:toFloat(csv.stop_lon)})

Loading Routes

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:c:/Users/willemsen.c/Downloads/ratp/sourcedata/full/routes.txt" AS csv
MERGE (:Route {id:toInt(csv.route_id), short_name:replace(csv.route_short_name,'"',''), long_name:replace(csv.route_long_name,'"',''), type:toInt(csv.route_type)})

Sanitizing Route Long Names (removing parentheses)

MATCH (n:Route)
SET n.long_name = replace(replace(n.long_name,'(',''),')','');

Importing Trips and relating them to Routes (Method 1)

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:c:/Users/willemsen.c/Downloads/ratp/sourcedata/full/trips.txt" AS csv
MATCH (r:Route {id:toInt(csv.route_id)})
MERGE (t:Trip {id:toInt(csv.trip_id), short_name:replace(csv.trip_short_name,'"','')})-[:DOING_ROUTE]->(r)

Alternative method in case of Java heap size problems (Method 2)

First, create the Trip nodes with a route_id property

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:c:/Users/willemsen.c/Downloads/ratp/sourcedata/full/trips.txt" AS csv
MERGE (t:Trip {id:toInt(csv.trip_id), short_name:replace(csv.trip_short_name,'"',''),route_id:toInt(csv.route_id)})

Then load the rows in batches and create the relationships between Trip and Route nodes based on the route_id property

Each run of the query processes a batch of up to 50000 rows; SKIP (and LIMIT, if needed) must be adjusted before each run until the whole file has been processed

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:c:/Users/willemsen.c/Downloads/ratp/sourcedata/full/trips.txt" AS csv
WITH csv
SKIP 5
LIMIT 50000
MATCH (n:Route {id:toInt(csv.route_id)})
MERGE (t:Trip {id:toInt(csv.trip_id), short_name:replace(csv.trip_short_name,'"',''),route_id:toInt(csv.route_id)})
MERGE (t)-[:DOING_ROUTE]->(n)
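
Adjusting SKIP by hand for every batch is easy to get wrong, so the batch queries can also be generated. The following is a minimal Python sketch of that idea, not part of the original import: the helper names `batch_offsets` and `batch_queries`, the total row count of 120000, and the choice to start at SKIP 0 (the query above shows SKIP 5) are all assumptions made for illustration. It only builds the query text; running each query against Neo4j is left to whatever client you use.

```python
# Hypothetical helper: generates one Method 2 query per batch of CSV rows.
CSV_URL = "file:c:/Users/willemsen.c/Downloads/ratp/sourcedata/full/trips.txt"

# Same query as above, with SKIP and LIMIT turned into placeholders.
# Literal Cypher braces are doubled so that str.format leaves them intact.
QUERY_TEMPLATE = """USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "{csv_url}" AS csv
WITH csv
SKIP {skip}
LIMIT {limit}
MATCH (n:Route {{id:toInt(csv.route_id)}})
MERGE (t:Trip {{id:toInt(csv.trip_id), short_name:replace(csv.trip_short_name,'"',''),route_id:toInt(csv.route_id)}})
MERGE (t)-[:DOING_ROUTE]->(n)"""


def batch_offsets(total_rows, batch_size):
    """Return the SKIP value of each batch needed to cover total_rows rows."""
    return list(range(0, total_rows, batch_size))


def batch_queries(total_rows, batch_size=50000, csv_url=CSV_URL):
    """Return the query text for every batch, in the order they should run."""
    return [
        QUERY_TEMPLATE.format(csv_url=csv_url, skip=skip, limit=batch_size)
        for skip in batch_offsets(total_rows, batch_size)
    ]


if __name__ == "__main__":
    # 120000 is an assumed row count; replace it with the real size of trips.txt.
    for query in batch_queries(120000):
        print(query)
        print()
```

Because the queries use MERGE, re-running a batch that has already been imported should not create duplicate Trip nodes or relationships, so overlapping batches are safe, only slower.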