Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save AlessioDeAngelis/8a99043c74317b6c33f677c3dbb7038b to your computer and use it in GitHub Desktop.
Save AlessioDeAngelis/8a99043c74317b6c33f677c3dbb7038b to your computer and use it in GitHub Desktop.
APOC: Awesome Pathfinding (in a Street Graph) Of Course

APOC: Awesome Pathfinding (in a Street Graph) Of Course

It was my entry for the Neo4j GraphGist Challenge back in January 2016. I developed an algorithm in pure Cypher for finding the Shortest Weighted Path with a set of constraints, like time and number of nodes to be visited. The scenario was a street map modeled as a graph that Santa Claus had to traverse in order to deliver gifts to all the kids before they went to sleep. While interviewed by Rik Van Bruggen here http://blog.bruggen.com/2016/10/podcast-interview-with-alessio-de.html I promised him to help Santa Claus in his path findings, letting him know the existence of the brand new APOC procedures!

So here we are with this Graph Gist, let’s explore the Awesome Procedures On Cypher! :D

Maps graph model

The street map is modeled as graph in a similar fashion as the old post: the road junctions are nodes, while an edge is created when two or more of them are directly connected by a segment of a lane with no other intersection taking place. RoadJunction nodes have two additional properties: the latitude and longitude of the actual intersection in the map. In fact, in this Graph Gist we will extract real street map information from an Open Data service: OpenStreetMap. It is an open data map of the world, driven by an open source community of volunteers. It collects geocoded information about streets, mountain paths, cafè, restaurants and much more all over the globe.

A cool innovation that APOC brings to Neo4j is the import of data not only as CSV but from several formats, ranging from json to xml, as well as connecting directly to relational and NoSQL databases like MySQL, MongoDB or Cassandra. In this post we are going to query directly OpenStreetMap [https://www.openstreetmap.org] at the endpoint provided from overpass-api.de [http://overpass-api.de/] . Geographical data is returned as an XML. Such response contains a lot of useful information. For the sake of simplicity, in this Graph Gist we are just going to model the graph with the main ones, leaving the reader the possibility to extend it, going further the api and replicate in toto the real map as shown in the Open Street Map web service.

The xml tree of the document has two important elements among all: node and way. Node [http://wiki.openstreetmap.org/wiki/Node] elements represent a geocoded location in the map, together with its unique id, latitude and longitude. They can be used for modelling a RoadJunction vertex in the graph. An example is the following one:

<node id="257489174" lat="41.9157092" lon="12.4890966" version="5" timestamp="2013-06-03T17:42:13Z" changeset="16409930" uid="217070" user="Davio"/>

A way [http://wiki.openstreetmap.org/wiki/Way], instead, is an ordered list of nodes usually used for shaping streets. As a matter of fact, two close node elements within the way vector are actually joined together in the real world by a segment of a lane. You can therefore leverage on a way element for creating the graph relationship CONNECTED_TO between RoadJunction nodes. An example is the following one:

<way id="203479972" version="2" timestamp="2013-03-17T19:26:36Z" changeset="15400055" uid="217070" user="Davio">
  <nd ref="259282253"/>
  <nd ref="1809491662"/>
  <nd ref="259282254"/>
  <nd ref="1809491634"/>
  <nd ref="611831886"/>
  <nd ref="259282255"/>
  <nd ref="259282256"/>
  <tag k="electrified" v="contact_line"/>
  <tag k="frequency" v="0"/>
  <tag k="gauge" v="1435"/>
  <tag k="highway" v="tertiary"/>
  <tag k="lanes" v="3"/>
  <tag k="lanes:backward" v="1"/>
  <tag k="lanes:forward" v="2"/>
  <tag k="lit" v="yes"/>
  <tag k="maxspeed" v="50"/>
  <tag k="name" v="Viale delle Belle Arti"/>
  <tag k="railway" v="tram"/>
  <tag k="source:maxspeed" v="IT:urban"/>
  <tag k="tracks" v="2"/>
  <tag k="voltage" v="600"/>
</way>

Model setup with APOC in Neo4j

Let’s depict our street map graph in Neo4j taking advantage of APOC and the OpenStreetMap Overpass API! We will limit our analysis to a rectangular area of Rome city center, built around the points 12.4677,41.8883 and 12.4960,41.9026.

http://overpass-api.de/api/map?bbox=12.4677,41.8883,12.4960,41.9026

This region will generate 69589 RoadJunction nodes and 77535 relationships among them. You can enlarge or decrease the box boundaries to process more or less locations. I suggest you starting with a smaller dataset and incrementally add more nodes to the graph, repeating the queries with map regions next to the initial one. In fact, as far as I know, APOC load xml and json functions do not provide yet a mechanism like the one in the LOAD FROM CSV statement or in the jdbc counterpart (given by apoc.iterate) to process an entire dataset by dividing it into smaller portions, probably due to the tree-like nature of those formats that returns the response as a whole. Therefore the response obtained by querying a Web Server endpoint it is processed as a whole in a single transaction and the task of reducing the entire problem in smaller portions to be executed separately is left to the developer.

Firstly you should index the RoadJunction id property to speed up Merge and Match operations.

CREATE INDEX ON :RoadJunction(id)

Then you can run the following Cypher statement for creating RoadJunction nodes.

WITH 'http://overpass-api.de/api/map?bbox=12.4677,41.8883,12.4960,41.9026' as url
CALL apoc.load.xmlSimple(url)  YIELD value AS documentRoot
UNWIND documentRoot._node AS line
MERGE (roadJunction:RoadJunction {id: line.id})
SET roadJunction.longitude = toFloat(line.lon),
roadJunction.latitude = toFloat(line.lat)

Let’s explain the query a little bit.

CALL apoc.load.xmlSimple(url :: STRING?) :: (value :: MAP?) YIELD value AS document

This is the APOC procedure for loading xml data. It requires the datasource url and converts the xml into a single map datastructure, made of arrays of the xml root child elements, each one per type. Therefore the <node> elements will be mapped to the node array, while the <way> elements to the way array. Note that future releases may replace apoc.load.xmlSimple into apoc.load.xml(url, true). The UNWIND operation transforms a list into individual rows. Hence UNWIND documentRoot._node AS line will create as many lines as the elements in the initial list. Each of them will be used for merging the RoadJunction nodes by id, setting their latitude and longitude.

The following Cypher query can instead create the relationship CONNECTED TO between road junctions.

WITH 'http://overpass-api.de/api/map?bbox=12.4677,41.8883,12.4960,41.9026' as url
CALL apoc.load.xml(url) YIELD value AS catalog
UNWIND catalog._children AS line
WITH line
WHERE line._type = 'way'
//extract the ref property when it is present
WITH  line.id AS wayId, [junction IN line._children WHERE junction.ref IS NOT NULL | junction.ref] AS junctionsInTheWay
WITH apoc.coll.pairsMin(junctionsInTheWay) AS junctionCouples
UNWIND junctionCouples AS junctionCouple
WITH junctionCouple[0] AS startNodeId , junctionCouple[1] AS endNodeId
MATCH (junction1:RoadJunction {id: startNodeId})
MATCH (junction2:RoadJunction {id: endNodeId})
MERGE (junction1)-[connection:CONNECTED_TO]-(junction2)
SET connection.distance = apoc.math.round(distance(
point({ longitude: junction1.longitude, latitude: junction1.latitude }),
point({ longitude: junction2.longitude, latitude: junction2.latitude })), 2);

The query deals with the way elements. They provide the information about which geocoded points are connected each other by a lane segment, id est the way children that are close to each other within the list. In order to do that we need to extract the ref property, that corresponds to the RoadJunction id already stored. The provided statement helps in achieving this.

WITH  line.id AS wayId, [junction IN line._children WHERE junction.ref IS NOT NULL | junction.ref] AS junctionsInTheWay

For each way, in a single concise notation provided by Cypher, we are iterating over the children, extracting the ref property and filtering out those that do not have it. A sample result at this execution point is reported in the Table 1.

Table 1. Arrays of road junctions belonging to different ways
wayId junctionsInTheWay

4245319

[25388289, 647882190]

4245321

[25388192, 25388290, 2418920678]

4245322

[25388187, 1417674218, 25388186, 1417674282, 25388185, 1277699809, 1277699812]

4245323

[25388184, 1277699807, 1417674354, 755692836, 1417674559, 388457539, 1417674714, 4024643294, 4024643295, 4024643293, 257612287]

4245329

[25388279,25388184]

WITH apoc.coll.pairsMin(junctionsInTheWay) AS junctionCouples

The line above leverages on an util APOC function on collections. It couples array elements that are next to each other, thus transforming an input collection into an array of pairs. Table 2 lets you understand better the pair transformation applied to the Table 1 collections.

Table 2. Pairs of RoadJunction id that are connected by a segment of a lane without any other intersection happening
wayId junctionsInTheWay

4245319

[[25388289, 647882190]]

4245321

[[25388192, 25388290], [25388290, 2418920678]]

4245322

[[25388187, 1417674218], [1417674218, 25388186], [25388186, 1417674282], [1417674282, 25388185], [25388185, 1277699809], [1277699809, 1277699812]]

4245323

[[25388184, 1277699807], [1277699807, 1417674354], [1417674354, 755692836], [755692836, 1417674559], [1417674559, 388457539], [388457539, 1417674714], [1417674714, 4024643294], [4024643294, 4024643295], [4024643295, 4024643293], [4024643293, 257612287]]

4245329

[[25388279, 25388184]]

UNWIND junctionCouples AS junctionCouple

Let’s unwind those pair arrays in order to have a junction couple per line

WITH junctionCouple[0] AS startNodeId , junctionCouple[1] AS endNodeId

The element at index 0 in the pair is source node id, while the one at position 1 is the target node id.

MATCH (junction1:RoadJunction {id: startNodeId})
MATCH (junction2:RoadJunction {id: endNodeId})
MERGE (junction1)-[connection:CONNECTED_TO]-(junction2)
SET connection.distance = apoc.math.round(distance(
point({ longitude: junction1.longitude, latitude: junction1.latitude }),
point({ longitude: junction2.longitude, latitude: junction2.latitude })), 2);

Finally we can create (MERGE) the relationship CONNECTED TO between all RoadJunction couples. Those edges have a property called distance that represents the actual length in meters of the lane segment connecting. You can calculate it by applying the Haversine distance function to the latitude and longitude of each couple. Such method is provided directly from Cypher. The float result is then rounded down to two decimal positions through the APOC round function.

Now your street map is modelled as a graph!

If you wish to replicate the scenario already occuring in previous Santa’s GraphGist, you can have fun placing some kid in the map.

MATCH (roadJunction:RoadJunction)
WITH collect(roadJunction) as allJunctions
WITH apoc.coll.randomItems(allJunctions, 3) as randomJunctions
UNWIND randomJunctions AS kid
SET kid:Kid, kid.bedtime = 10000
RETURN kid

This statement collects all the RoadJunction nodes (if you have a lot of them you could limit the result) into an array. Thanks to apoc.coll.randomItems you can randomly pick three of them and apply the Kid label. Otherwise place those kids wherever you prefer, the random pick may ne too random! :D In this dataset you could for example place kids at the RoadJunction nodes with id 1553123159 and 1377241284, since there are paths connecting them.

WITH   ['1553123159', '1377241284'] AS ids
UNWIND ids as kidId
MATCH (n:RoadJunction {id : kidId}) SET n:Kid, n.bedtime = 10000 return n;

Thanks to a graph visualization tool like Linkurious you could verify the success of your entire modeling, setting down those geocoded nodes in a real map!

street_graph

Walking the map

APOC library provides a bunch of path finding functions, like Dikjstra or A* to be used directly in your Cypher queries. They used to be only available in Neo4j driver and not in cypher query language. Santa Claus can now take advantage of them for his Christmas Gift deliveries! What I really like of them are the easiness of usage and the execution time for finding the shortest weighted path, especially considering the fact that in the comments the developer said they are not fully optimized yet! I am not going deep into the theory of those algorithm, you can find plenty of credited authors in literature and all over the web. I will just show you how to successfully use them in this Santa Claus path finding scenario!

MATCH (from:Kid), (to:Kid)
CALL apoc.algo.dijkstra(from, to, 'CONNECTED_TO', 'distance') YIELD path AS path, weight AS weight
RETURN path, weight

This algorithm takes a Kid as a source and another one as a target and navigates the street map graph trough the CONNECTED_TO relationship in order to calculate the path between them minimizing the sum of distance property. The shortest path function already offered by Cypher, instead, returns the path that needs less hops to connect two nodes. Therefore, the shortest path may not be the lightest one, since those functions are built on two different graph properties. Calculating the shortest (less hops) path through the following query lets you understand better what I am saying.

MATCH path =
shortestPath((from {id: '1553123159'})-[:CONNECTED_TO*]-(to {id:'1377241284'}))
RETURN path

A* is another notorious path finding algorithm. You can find it in a large variety of contexts: for instance, I once used it in while building a mobile videogame for moving non playable characters in the map!

MATCH (from:Kid), (to:Kid)
CALL apoc.algo.aStar(from, to, 'CONNECTED_TO', 'distance','latitude', 'longitude') YIELD path AS path, weight AS weight
RETURN path, weight

You should pass six parameters to the aStar function: the starting node (from Kid), the ending node (to Kid), the relationship types to traverse (CONNECTED_TO), the property to use for the cost function (distance) and the properties for the estimation function (latitude and longitude since a Geo Estimator function is implemented by default in this A* version).

Conclusions and future works

Unfortunately what I didn’t manage to reach with just the plain APOC library and the provided path finding functions is a shortest weighted path that takes into accounts more constraints than just the time/distance weight. For example in Santa’s Gist the best path had to fulfill some additional requirements, like the fact that the gifts had to be delivered to each kid before they fell asleep (bedtime attribute on Kid nodes) and the traverse had to include all the kids! It is a too fixed scenario, probably letting a developer build the aStar estimate function directly in cypher could be a nice introduction to try solving it!

What I really liked about APOC is the wider range of functions provided, giving Cypher a real boost and expressivity, all in a concise and fast way! For instance, you could look at the pair and randomItems functions reported here in this post.

Thanks for reading up to here!

All the best,

Alessio

@jexp
Copy link

jexp commented Apr 27, 2017

@AlessioDeAngelis Wow this is really amazing. I'm super impressed by your knowledge and application of APOC and Cypher.
Really well done.

Good point about weighted dijkstra, Rik had that request too and we wanted to solve it by passing in an arbitrary cypher expression which is evaluated as cost (possibly with compiled runtime if available).

Something that would be cool to show is how to combine apoc.periodic.iterate with load xml in two ways

  1. just load a larger rectangle but then pass the junctions or pairs to iterate to be batched over
  2. have 2 nested unwind range(0,360,1) as lat control the rectangle that you load (with some minor overlap) and be those rectangles the input for iterate and within iterate do the actual loading of a segment

@AlessioDeAngelis
Copy link
Author

@jexp
Thank you for your comment! I will surely give a try to your suggestion! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment