AlessioDeAngelis/apoc_challenge_gist_apoc_challenge.adoc

## apoc_challenge_gist_apoc_challenge.adoc

      
    Raw
  

              apoc_challenge_gist_apoc_challenge.adoc
            
          
    APOC: Awesome Pathfinding (in a Street Graph) Of Course


Hello everybody!
Do you remember this graphgist?
https://neo4j.com/graphgist/92064364-ae62-405b-bfa7-a46e5e85e22b#listing_category=holidays


It was my entry for the Neo4j GraphGist Challenge back in January 2016. I developed an algorithm in pure Cypher for finding
 the Shortest Weighted Path with a set of constraints, like time and number of nodes to be visited. The scenario was
 a street map modeled as a graph that Santa Claus had to traverse in order to deliver gifts to all the kids before they went to sleep.
While interviewed by Rik Van Bruggen here http://blog.bruggen.com/2016/10/podcast-interview-with-alessio-de.html I promised him
to help Santa Claus in his path findings, letting him know the existence of the brand new APOC procedures!


So here we are with this Graph Gist, let’s explore the Awesome Procedures On Cypher! :D


Maps graph model


The street map is modeled as graph in a similar fashion as the old post:
the road junctions are nodes, while an edge is created when two or more of them are directly connected by a segment
of a lane with no other intersection taking place. RoadJunction nodes have two additional properties: the latitude and longitude
of the actual intersection in the map. In fact, in this Graph Gist we will extract real street map information from an Open Data
service: OpenStreetMap. It is an open data map of the world, driven by an open source community of volunteers.
It collects geocoded information about streets, mountain paths, cafè, restaurants and much more all over the globe.


A cool innovation that APOC brings to Neo4j is the import of data not only as CSV but from several formats,
ranging from json to xml, as well as connecting directly to relational and NoSQL databases like MySQL, MongoDB or Cassandra.
In this post we are going to query directly OpenStreetMap [https://www.openstreetmap.org] at the endpoint provided from overpass-api.de
[http://overpass-api.de/] . Geographical data is returned as an XML. Such response contains a lot of useful information.
For the sake of simplicity, in this Graph Gist we are just going to model the graph with the main ones,
leaving the reader the possibility to extend it, going further the api and replicate in toto the real map as shown in the Open Street Map
web service.


The xml tree of the document has two important elements among all: node and way. Node [http://wiki.openstreetmap.org/wiki/Node] elements represent
a geocoded location in the map, together with its unique id, latitude and longitude.
They can be used for modelling a RoadJunction vertex in the graph.
An example is the following one:


<node id="257489174" lat="41.9157092" lon="12.4890966" version="5" timestamp="2013-06-03T17:42:13Z" changeset="16409930" uid="217070" user="Davio"/>


A way [http://wiki.openstreetmap.org/wiki/Way], instead, is an ordered list of nodes usually used for shaping streets.
As a matter of fact, two close node elements within the way vector are actually joined together in the real world by a segment of a lane.
You can therefore leverage on a way element for creating the graph relationship CONNECTED_TO between RoadJunction nodes.
An example is the following one:


<way id="203479972" version="2" timestamp="2013-03-17T19:26:36Z" changeset="15400055" uid="217070" user="Davio">
  <nd ref="259282253"/>
  <nd ref="1809491662"/>
  <nd ref="259282254"/>
  <nd ref="1809491634"/>
  <nd ref="611831886"/>
  <nd ref="259282255"/>
  <nd ref="259282256"/>
  <tag k="electrified" v="contact_line"/>
  <tag k="frequency" v="0"/>
  <tag k="gauge" v="1435"/>
  <tag k="highway" v="tertiary"/>
  <tag k="lanes" v="3"/>
  <tag k="lanes:backward" v="1"/>
  <tag k="lanes:forward" v="2"/>
  <tag k="lit" v="yes"/>
  <tag k="maxspeed" v="50"/>
  <tag k="name" v="Viale delle Belle Arti"/>
  <tag k="railway" v="tram"/>
  <tag k="source:maxspeed" v="IT:urban"/>
  <tag k="tracks" v="2"/>
  <tag k="voltage" v="600"/>
</way>


Model setup with APOC in Neo4j


Let’s depict our street map graph in Neo4j taking advantage of APOC and the OpenStreetMap Overpass API!
We will limit our analysis to a rectangular area of Rome city center, built around the points 12.4677,41.8883 and 12.4960,41.9026.


http://overpass-api.de/api/map?bbox=12.4677,41.8883,12.4960,41.9026


This region will generate 69589 RoadJunction nodes and 77535 relationships among them. You can enlarge or decrease the box
boundaries to process more or less locations. I suggest you starting with a smaller dataset and incrementally add more nodes
to the graph, repeating the queries with map regions next to the initial one.
In fact, as far as I know, APOC load xml and json functions do not provide yet a mechanism like the one in the
 LOAD FROM CSV statement or in the jdbc counterpart (given by apoc.iterate) to process an entire dataset by dividing it into smaller portions,
 probably due to the tree-like nature of those formats that returns the response as a whole.
Therefore the response obtained by querying a Web Server endpoint it is processed as a whole
in a single transaction and the task of reducing the entire problem in smaller portions to be executed separately is left to the developer.


Firstly you should index the RoadJunction id property to speed up Merge and Match operations.


CREATE INDEX ON :RoadJunction(id)


Then you can run the following Cypher statement for creating RoadJunction nodes.


WITH 'http://overpass-api.de/api/map?bbox=12.4677,41.8883,12.4960,41.9026' as url
CALL apoc.load.xmlSimple(url)  YIELD value AS documentRoot
UNWIND documentRoot._node AS line
MERGE (roadJunction:RoadJunction {id: line.id})
SET roadJunction.longitude = toFloat(line.lon),
roadJunction.latitude = toFloat(line.lat)


Let’s explain the query a little bit.


CALL apoc.load.xmlSimple(url :: STRING?) :: (value :: MAP?) YIELD value AS document


This is the APOC procedure for loading xml data. It requires the datasource url and converts the xml into a single map datastructure,
made of arrays of the xml root child elements, each one per type. Therefore the <node> elements
will be mapped to the node array, while the <way> elements to the way array. Note that future releases may replace apoc.load.xmlSimple
into apoc.load.xml(url, true).
The UNWIND operation transforms a list into individual rows. Hence UNWIND documentRoot._node AS line will create as many
lines as the elements in the initial list. Each of them will be used for merging the RoadJunction nodes by id, setting their
latitude and longitude.


The following Cypher query can instead create the relationship CONNECTED TO between road junctions.


WITH 'http://overpass-api.de/api/map?bbox=12.4677,41.8883,12.4960,41.9026' as url
CALL apoc.load.xml(url) YIELD value AS catalog
UNWIND catalog._children AS line
WITH line
WHERE line._type = 'way'
//extract the ref property when it is present
WITH  line.id AS wayId, [junction IN line._children WHERE junction.ref IS NOT NULL | junction.ref] AS junctionsInTheWay
WITH apoc.coll.pairsMin(junctionsInTheWay) AS junctionCouples
UNWIND junctionCouples AS junctionCouple
WITH junctionCouple[0] AS startNodeId , junctionCouple[1] AS endNodeId
MATCH (junction1:RoadJunction {id: startNodeId})
MATCH (junction2:RoadJunction {id: endNodeId})
MERGE (junction1)-[connection:CONNECTED_TO]-(junction2)
SET connection.distance = apoc.math.round(distance(
point({ longitude: junction1.longitude, latitude: junction1.latitude }),
point({ longitude: junction2.longitude, latitude: junction2.latitude })), 2);


The query deals with the way elements. They provide the information about which geocoded points are connected
each other by a lane segment, id est the way children that are close to each other within the list. In order to do that we need to extract
the ref property, that corresponds to the RoadJunction id already stored.
The provided statement helps in achieving this.


WITH  line.id AS wayId, [junction IN line._children WHERE junction.ref IS NOT NULL | junction.ref] AS junctionsInTheWay


For each way, in a single concise notation provided by Cypher, we are iterating over the children, extracting the ref property and filtering
out those that do not have it.
A sample result at this execution point is reported in the Table 1.


Table 1. Arrays of road junctions belonging to different ways


wayId
junctionsInTheWay


4245319
[25388289, 647882190]


4245321
[25388192, 25388290, 2418920678]


4245322
[25388187, 1417674218, 25388186, 1417674282, 25388185, 1277699809, 1277699812]


4245323
[25388184, 1277699807, 1417674354, 755692836, 1417674559, 388457539, 1417674714, 4024643294, 4024643295, 4024643293, 257612287]


4245329
[25388279,25388184]


WITH apoc.coll.pairsMin(junctionsInTheWay) AS junctionCouples


The line above leverages on an util APOC function on collections. It couples array elements that are next to each other,
thus transforming an input collection into an array of pairs. Table 2 lets you understand better the pair transformation applied
 to the Table 1 collections.


Table 2. Pairs of RoadJunction id that are connected by a segment of a lane without any other intersection happening


wayId
junctionsInTheWay


4245319
[[25388289, 647882190]]


4245321
[[25388192, 25388290], [25388290, 2418920678]]


4245322
[[25388187, 1417674218], [1417674218, 25388186], [25388186, 1417674282], [1417674282, 25388185], [25388185, 1277699809], [1277699809, 1277699812]]


4245323
[[25388184, 1277699807], [1277699807, 1417674354], [1417674354, 755692836], [755692836, 1417674559], [1417674559, 388457539], [388457539, 1417674714], [1417674714, 4024643294], [4024643294, 4024643295], [4024643295, 4024643293], [4024643293, 257612287]]


4245329
[[25388279, 25388184]]


UNWIND junctionCouples AS junctionCouple


Let’s unwind those pair arrays in order to have a junction couple per line


WITH junctionCouple[0] AS startNodeId , junctionCouple[1] AS endNodeId


The element at index 0 in the pair is source node id, while the one at position 1 is the target node id.


MATCH (junction1:RoadJunction {id: startNodeId})
MATCH (junction2:RoadJunction {id: endNodeId})
MERGE (junction1)-[connection:CONNECTED_TO]-(junction2)
SET connection.distance = apoc.math.round(distance(
point({ longitude: junction1.longitude, latitude: junction1.latitude }),
point({ longitude: junction2.longitude, latitude: junction2.latitude })), 2);


Finally we can create (MERGE) the relationship CONNECTED TO between all RoadJunction couples.
Those edges have a property called distance that represents the actual length in meters of the lane segment connecting.
You can calculate it by applying the Haversine distance function to the latitude and longitude of each couple. Such method
is provided directly from Cypher. The float result is then rounded down to two decimal positions through the APOC round function.


Now your street map is modelled as a graph!


If you wish to replicate the scenario already occuring in previous Santa’s GraphGist, you can have fun placing some kid in the map.


MATCH (roadJunction:RoadJunction)
WITH collect(roadJunction) as allJunctions
WITH apoc.coll.randomItems(allJunctions, 3) as randomJunctions
UNWIND randomJunctions AS kid
SET kid:Kid, kid.bedtime = 10000
RETURN kid


This statement collects all the RoadJunction nodes (if you have a lot of them you could limit the result) into an array.
Thanks to apoc.coll.randomItems you can randomly pick three of them and apply the Kid label.
Otherwise place those kids wherever you prefer, the random pick may ne too random! :D
In this dataset you could for example place kids at the RoadJunction nodes with id 1553123159 and 1377241284, since there
are paths connecting them.


WITH   ['1553123159', '1377241284'] AS ids
UNWIND ids as kidId
MATCH (n:RoadJunction {id : kidId}) SET n:Kid, n.bedtime = 10000 return n;


Thanks to a graph visualization tool like Linkurious you could verify the success of your entire modeling, setting down those geocoded nodes
in a real map!


Walking the map


APOC library provides a bunch of path finding functions, like Dikjstra or A* to be used directly in your Cypher
queries. They used to be only available in Neo4j driver and not in cypher query language.
Santa Claus can now take advantage of them for his Christmas Gift deliveries!
What I really like of them are the easiness of usage and the execution time for finding the shortest weighted path, especially
considering the fact that in the comments the developer said they are not fully optimized yet!
I am not going deep into the theory of those algorithm, you can find plenty of credited authors in literature and all over the web.
I will just show you how to successfully use them in this Santa Claus path finding scenario!


MATCH (from:Kid), (to:Kid)
CALL apoc.algo.dijkstra(from, to, 'CONNECTED_TO', 'distance') YIELD path AS path, weight AS weight
RETURN path, weight


This algorithm takes a Kid as a source and another one as a target and navigates the street map graph trough the
CONNECTED_TO relationship in order to calculate the path between them minimizing the sum of distance property.
The shortest path function already offered by Cypher, instead, returns the path that needs less hops to connect two nodes.
Therefore, the shortest path may not be the lightest one, since those functions are built on two different graph properties.
Calculating the shortest (less hops) path through the following query lets you understand better what I am saying.


MATCH path =
shortestPath((from {id: '1553123159'})-[:CONNECTED_TO*]-(to {id:'1377241284'}))
RETURN path


A* is another notorious path finding algorithm. You can find it in a large variety of contexts: for instance, I once used it in
 while building a mobile videogame for moving non playable characters in the map!


MATCH (from:Kid), (to:Kid)
CALL apoc.algo.aStar(from, to, 'CONNECTED_TO', 'distance','latitude', 'longitude') YIELD path AS path, weight AS weight
RETURN path, weight


You should pass six parameters to the aStar function: the starting node (from Kid), the ending node (to Kid), the relationship
types to traverse (CONNECTED_TO),
the property to use for the cost function (distance) and  the properties for the estimation function (latitude and longitude
since a Geo Estimator function is implemented by default in this A* version).

Conclusions and future works


Unfortunately what I didn’t manage to reach with just the plain APOC library and the provided path finding functions
 is a shortest weighted path that takes into accounts
more constraints than just the time/distance weight. For example in Santa’s Gist the best path had to fulfill some additional
requirements, like the fact that the gifts had to be delivered to each kid before they fell asleep (bedtime attribute
on Kid nodes) and the traverse had to include all the kids! It is a too fixed scenario, probably letting a developer build the aStar estimate function
directly in cypher could be a nice introduction to try solving it!


What I really liked about APOC is the wider range of functions provided, giving Cypher a real boost and expressivity,
 all in a concise and fast way! For instance, you could look at the pair and randomItems functions reported here in this post.


Thanks for reading up to here!


All the best,


Alessio
wayId	junctionsInTheWay
4245319	[25388289, 647882190]
4245321	[25388192, 25388290, 2418920678]
4245322	[25388187, 1417674218, 25388186, 1417674282, 25388185, 1277699809, 1277699812]
4245323	[25388184, 1277699807, 1417674354, 755692836, 1417674559, 388457539, 1417674714, 4024643294, 4024643295, 4024643293, 257612287]
4245329	[25388279,25388184]
wayId	junctionsInTheWay
4245319	[[25388289, 647882190]]
4245321	[[25388192, 25388290], [25388290, 2418920678]]
4245322	[[25388187, 1417674218], [1417674218, 25388186], [25388186, 1417674282], [1417674282, 25388185], [25388185, 1277699809], [1277699809, 1277699812]]
4245323	[[25388184, 1277699807], [1277699807, 1417674354], [1417674354, 755692836], [755692836, 1417674559], [1417674559, 388457539], [388457539, 1417674714], [1417674714, 4024643294], [4024643294, 4024643295], [4024643295, 4024643293], [4024643293, 257612287]]
4245329	[[25388279, 25388184]]