johan--/neo4j-contact networks

## neo4j-contact networks
:neo4j-version: 2.2
:author: Marcello Dalponte
:twitter: @m_dalp

## Contact networks

A contact network is one of the typologies of dynamic networks described in the famous paper of Holme and Saramäki[1].
A dynamic network is a mathematical representation of the evolution of the relationships between entities through graphs.
It's interesting to catch this dynamicity because describes the transitivity of the order of the interactions: if A meets B and after B meets C then A can transmit something to C but not the opposite; this is useful to study the spreading of informations or infections.
An interesting project that aims to collect this kind of data is Sociopatterns[2], where Cattuto and others developed a framework to collect contacts data from RFID sensors [3].

### Definition of contact
A contact is the interaction between two actors in a specific time lapse (frame).

### Characteristic measures
These networks can be measured in many ways. Holme and Saramäki show us some [1], like

- connectivity and components,
- distances,
- latencies,
- fastest paths,
- and others.

But there are specific measures that really characterize the dynamic of this kind of network.
In the literature Xuan and others present some of them called journeys [5]. Those are paths with innate notion of time,  where every edge connects nodes from adjacent frames.
Three kind of journeys are:

- fastest journey: the journey taken over all journeys between two nodes such that the journey time is the minimal,
- fastest journeys tree: the collection of fastest journeys between a certain node and all the others,
- foremost journey: the first journey between two nodes where the arrival time is the earliest,
- foremost journeys tree: the collection of foremost journeys between a certain node and all the others.

## Previous model
In the literature there is a work of Cattuto and others that explains how to model a contact network with Neo4j [4].
They represent the network with mainly three kind of nodes: ACTOR, INTERACTION, FRAME. Each actor represents the owner of an RFID, each interaction represents a link between two actors and each frame is a time interval. Interactions are linked with relative Actor nodes and, together with frame nodes, are the key of the representation of a contact. Indeed each contact above defined is a relationship between an interaction and the frame where it was active.
They also presented an interesting way to navigate the frames through a tree of human readable temporal units (hours, days, etc.), but here is not the topic.

### Limits
Such representation is pretty complete and interesting, however I encountered some problems when I wanted to describe characteristic patterns in temporal networks with Cypher queries. For instance I have not been able to find any kind of journey.

## My model
My model solves the problems I encoutered with the previous model. I use two kind of nodes: ACTOR and CONTACT, where the first describes the interacting subject and the second is exactly the definition of contact given before (actor1, actor2, timestamp). Each contact is linked with the actors partecipating and with the latest previous contacts of its actors.
With this representation is possible to track the paths through time.

### Initial Data Setup
'''
//setup
[source,cypher]

----

create index on :`Contact`(`a1`);
create index on :`Contact`(`a2`);
create index on :`Contact`(`t`);
create constraint on (n:`Actor`) assert n.`id` is unique;
create (_12:`Actor` {`id`:0, `name`:"actor0"}),
(_13:`Actor` {`id`:2, `name`:"actor2"}),
(_14:`Contact` {`a1`:0, `a2`:2, `t`:0}),
(_15:`Actor` {`id`:4, `name`:"actor4"}),
(_16:`Contact` {`a1`:0, `a2`:4, `t`:1}),
(_17:`Actor` {`id`:1, `name`:"actor1"}),
(_18:`Contact` {`a1`:1, `a2`:2, `t`:1}),
(_19:`Contact` {`a1`:1, `a2`:4, `t`:2}),
(_20:`Actor` {`id`:3, `name`:"actor3"}),
(_21:`Contact` {`a1`:2, `a2`:3, `t`:2}),
(_22:`Contact` {`a1`:1, `a2`:2, `t`:2}),
(_23:`Contact` {`a1`:2, `a2`:3, `t`:3}),
(_24:`Contact` {`a1`:1, `a2`:3, `t`:4}),
(_25:`Contact` {`a1`:0, `a2`:4, `t`:4}),
(_26:`Contact` {`a1`:1, `a2`:4, `t`:4}),
_12-[:`ACTOR`]->_14,
_12-[:`ACTOR`]->_16,
_12-[:`ACTOR`]->_25,
_13-[:`ACTOR`]->_18,
_13-[:`ACTOR`]->_14,
_13-[:`ACTOR`]->_21,
_13-[:`ACTOR`]->_22,
_13-[:`ACTOR`]->_23,
_14-[:`NEXT_CONTACT`]->_18,
_14-[:`NEXT_CONTACT`]->_16,
_15-[:`ACTOR`]->_26,
_15-[:`ACTOR`]->_25,
_15-[:`ACTOR`]->_19,
_15-[:`ACTOR`]->_16,
_16-[:`NEXT_CONTACT`]->_25,
_16-[:`NEXT_CONTACT`]->_19,
_17-[:`ACTOR`]->_18,
_17-[:`ACTOR`]->_22,
_17-[:`ACTOR`]->_19,
_17-[:`ACTOR`]->_26,
_17-[:`ACTOR`]->_24,
_18-[:`NEXT_CONTACT`]->_22,
_18-[:`NEXT_CONTACT`]->_21,
_18-[:`NEXT_CONTACT`]->_19,
_19-[:`NEXT_CONTACT`]->_24,
_19-[:`NEXT_CONTACT`]->_26,
_19-[:`NEXT_CONTACT`]->_25,
_20-[:`ACTOR`]->_24,
_20-[:`ACTOR`]->_23,
_20-[:`ACTOR`]->_21,
_21-[:`NEXT_CONTACT`]->_23,
_22-[:`NEXT_CONTACT`]->_26,
_22-[:`NEXT_CONTACT`]->_24,
_22-[:`NEXT_CONTACT`]->_23,
_23-[:`NEXT_CONTACT`]->_24

----

'''
Graph is shown as the static image below to enlight the shape of data. The label of each contact is the time frame when it took place:

image::https://www.dropbox.com/s/m51sgf92j78w9by/Contact%20Network%20Model%20-%20neo4j.png?dl=1[]

## Set of useful queries
To show the goodness of this model I selected some example queries where some are specific of temporal networks to find journeys and others are just for exercise.

### Temporal network measures

#### Fastest journey
[source,cypher]

----

MATCH (a1 :Actor{id:4})-[:ACTOR]->(c:Contact),(a2 :Actor{id:3})-[:ACTOR]->(c1 :Contact),
p=shortestPath((c)-[:NEXT_CONTACT*0..]->(c1))
WITH collect(p) as ps, min(length(p)) as l
RETURN FILTER(p in ps WHERE LENGTH(p) = l) as PATHS, l AS LENGTH

----

//table

#### Fastest journeys tree
[source,cypher]

----

MATCH (a1 :Actor{id:0})-[:ACTOR]->(c:Contact),(a2 :Actor)-[:ACTOR]->(c1 :Contact),
p=shortestPath((c {t:0})-[:NEXT_CONTACT*0..]->(c1))
WHERE a1 <> a2
WITH DISTINCT a2, collect(p) as ps, min(length(p)) as l
RETURN a2.id AS actorReached, FILTER(p in ps WHERE LENGTH(p) = l) as PATHS, l AS LENGTH
ORDER BY LENGTH

----

//table

#### Foremost journey
[source,cypher]

----

MATCH (a1 :Actor{id:1})-[:ACTOR]->(c:Contact),(a2 :Actor{id:3})-[:ACTOR]->(c1 :Contact),
p=shortestPath((c)-[:NEXT_CONTACT*0..]->(c1))
WITH collect(p) as ps, min(c1.t) as t
RETURN FILTER(p in ps WHERE LAST(NODES(p)).t = t) as PATHS, t AS TIME

----

//table

#### Foremost journeys tree
[source,cypher]

----

MATCH (a1 :Actor{id:0})-[:ACTOR]->(c:Contact),(a2 :Actor)-[:ACTOR]->(c1 :Contact),
p=shortestPath((c {t:0})-[:NEXT_CONTACT*0..]->(c1))
WHERE a1 <> a2
WITH DISTINCT a2, collect(p) as ps, min(c1.t) as t
RETURN a2.id AS actorReached, FILTER(p in ps WHERE LAST(NODES(p)).t = t) as PATHS, t AS TIME
ORDER BY TIME

----

//table

### Others
#### All actors infected by an actor
[source,cypher]

----

MATCH (a1:Actor {id:2})-[:ACTOR]->(c:Contact)-[:NEXT_CONTACT*0..]->(:Contact)<-[:ACTOR]-(a2:Actor)
WHERE a1<>a2
RETURN DISTINCT a2

----

//graph_result

#### All actors infected by an actor starting from a certain frame
[source,cypher]

----

MATCH (a1 :Actor{id:2})-[:ACTOR]->(c:Contact),(a2 :Actor)-[:ACTOR]->(c1 :Contact),
p=shortestPath((c {t:3})-[:NEXT_CONTACT*0..]->(c1))
WHERE a1 <> a2
RETURN DISTINCT a2

----

//graph_result

#### Smallest number of contacts to infect all others starting from a specific actor
Here I show the _id_ of the actor reached and the number (_minFrame_) of the first frame
in which he could be infected.
[source,cypher]

----

MATCH (a1 :Actor{id:1})-[:ACTOR]->(c:Contact),(a2 :Actor)-[:ACTOR]->(c1 :Contact),
p=shortestPath((c)-[:NEXT_CONTACT*0..]->(c1))
WHERE a1 <> a2
RETURN a2.id AS actorReached, min(c1.t) as minFrame
ORDER BY minFrame

----

//table

#### Average number of contacts separing a specific actor from those he can reach, starting from a specific frame
Here is returned the number of actors reached by a specific one and the average length of the paths.
[source,cypher]

----

MATCH (a1 :Actor{id:4})-[:ACTOR]->(c:Contact),(a2 :Actor)-[:ACTOR]->(c1 :Contact),
p=shortestPath((c)-[:NEXT_CONTACT*0..]->(c1))
WHERE a1 <> a2 AND c.t >= 0
WITH DISTINCT a2, collect(p) as ps, min(length(p)) as l
RETURN COUNT(a2) as NumReached, AVG(l) AS AvgLENGTH

----

//table

#### Count of the actors reachable from each actor and the average length of the contact path
[source,cypher]

----

MATCH (a1 :Actor)-[:ACTOR]->(c :Contact),(a2 :Actor)-[:ACTOR]->(c1 :Contact),
p=shortestPath((c)-[:NEXT_CONTACT*0..]->(c1))
WHERE a1 <> a2
WITH DISTINCT a1, a2, MIN(LENGTH(p)) AS minStep
RETURN a1.id AS Actor, COUNT(a2) AS actorReached, AVG(minStep) as avgStep

----

//table


## Extensions

My work is a nice way to represent contact networks with Neo4j. I think that would be interesting to extend my work in many directions.

### Use of INTERACTION nodes like Cattuto and others [4]

My model is really useful to investigate temporal features of networks but I cannot be able to define queries to study the features of aggregated portions of the graph. I think that the only way to do this would be to introduce the use of INTERACTION nodes to represent the relationship between two actors in the whole network. These are linked to all the contacts where the interaction took place.

### Performance analysis

I tryied this model on my old battle laptop and I did not considered performances because it would be meaningless. However I think that a performance study would be great to show its usability.

### Real data examples

To enrich the value of this work would be interesting to perform analysis on real data.

### Extension of the query set

The set of query that I presented is not exhaustive and would be nice to extend the query set with other useful measures from literature.

### Visualization

To show this data I used the Neo4j visualization that is great
to display bidimensional graphs but not precise dealing with
"multidimensional graph" like temporal networks.
It would be interesting to implement a visualization to aggregate graphs
according to some definition to help displaying data. For instance
in my case could be nice to aggregate per frame.

[1]: http://arxiv.org/pdf/1108.1780v2.pdf "Temporal Networks"

[2]: http://www.sociopatterns.org/ "Sociopatterns"

[3]: http://www.plosone.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0011596&representation=PDF "Sociopatterns framework description"

[4]: http://event.cwi.nl/grades2013/11-averbuch.pdf "Time-varying networks in neo4j"

[5]: https://hal.inria.fr/inria-00071996/document "Computing shortest, fastest, and foremost journeys in dynamic networks"
	:neo4j-version: 2.2
	:author: Marcello Dalponte
	:twitter: @m_dalp

	## Contact networks

	A contact network is one of the typologies of dynamic networks described in the famous paper of Holme and Saramäki[1].
	A dynamic network is a mathematical representation of the evolution of the relationships between entities through graphs.
	It's interesting to catch this dynamicity because describes the transitivity of the order of the interactions: if A meets B and after B meets C then A can transmit something to C but not the opposite; this is useful to study the spreading of informations or infections.
	An interesting project that aims to collect this kind of data is Sociopatterns[2], where Cattuto and others developed a framework to collect contacts data from RFID sensors [3].

	### Definition of contact
	A contact is the interaction between two actors in a specific time lapse (frame).

	### Characteristic measures
	These networks can be measured in many ways. Holme and Saramäki show us some [1], like

	- connectivity and components,
	- distances,
	- latencies,
	- fastest paths,
	- and others.

	But there are specific measures that really characterize the dynamic of this kind of network.
	In the literature Xuan and others present some of them called journeys [5]. Those are paths with innate notion of time, where every edge connects nodes from adjacent frames.
	Three kind of journeys are:

	- fastest journey: the journey taken over all journeys between two nodes such that the journey time is the minimal,
	- fastest journeys tree: the collection of fastest journeys between a certain node and all the others,
	- foremost journey: the first journey between two nodes where the arrival time is the earliest,
	- foremost journeys tree: the collection of foremost journeys between a certain node and all the others.

	## Previous model
	In the literature there is a work of Cattuto and others that explains how to model a contact network with Neo4j [4].
	They represent the network with mainly three kind of nodes: ACTOR, INTERACTION, FRAME. Each actor represents the owner of an RFID, each interaction represents a link between two actors and each frame is a time interval. Interactions are linked with relative Actor nodes and, together with frame nodes, are the key of the representation of a contact. Indeed each contact above defined is a relationship between an interaction and the frame where it was active.
	They also presented an interesting way to navigate the frames through a tree of human readable temporal units (hours, days, etc.), but here is not the topic.

	### Limits
	Such representation is pretty complete and interesting, however I encountered some problems when I wanted to describe characteristic patterns in temporal networks with Cypher queries. For instance I have not been able to find any kind of journey.

	## My model
	My model solves the problems I encoutered with the previous model. I use two kind of nodes: ACTOR and CONTACT, where the first describes the interacting subject and the second is exactly the definition of contact given before (actor1, actor2, timestamp). Each contact is linked with the actors partecipating and with the latest previous contacts of its actors.
	With this representation is possible to track the paths through time.

	### Initial Data Setup
	'''
	//setup
	[source,cypher]

	----

	create index on :`Contact`(`a1`);
	create index on :`Contact`(`a2`);
	create index on :`Contact`(`t`);
	create constraint on (n:`Actor`) assert n.`id` is unique;
	create (_12:`Actor` {`id`:0, `name`:"actor0"}),
	(_13:`Actor` {`id`:2, `name`:"actor2"}),
	(_14:`Contact` {`a1`:0, `a2`:2, `t`:0}),
	(_15:`Actor` {`id`:4, `name`:"actor4"}),
	(_16:`Contact` {`a1`:0, `a2`:4, `t`:1}),
	(_17:`Actor` {`id`:1, `name`:"actor1"}),
	(_18:`Contact` {`a1`:1, `a2`:2, `t`:1}),
	(_19:`Contact` {`a1`:1, `a2`:4, `t`:2}),
	(_20:`Actor` {`id`:3, `name`:"actor3"}),
	(_21:`Contact` {`a1`:2, `a2`:3, `t`:2}),
	(_22:`Contact` {`a1`:1, `a2`:2, `t`:2}),
	(_23:`Contact` {`a1`:2, `a2`:3, `t`:3}),
	(_24:`Contact` {`a1`:1, `a2`:3, `t`:4}),
	(_25:`Contact` {`a1`:0, `a2`:4, `t`:4}),
	(_26:`Contact` {`a1`:1, `a2`:4, `t`:4}),
	_12-[:`ACTOR`]->_14,
	_12-[:`ACTOR`]->_16,
	_12-[:`ACTOR`]->_25,
	_13-[:`ACTOR`]->_18,
	_13-[:`ACTOR`]->_14,
	_13-[:`ACTOR`]->_21,
	_13-[:`ACTOR`]->_22,
	_13-[:`ACTOR`]->_23,
	_14-[:`NEXT_CONTACT`]->_18,
	_14-[:`NEXT_CONTACT`]->_16,
	_15-[:`ACTOR`]->_26,
	_15-[:`ACTOR`]->_25,
	_15-[:`ACTOR`]->_19,
	_15-[:`ACTOR`]->_16,
	_16-[:`NEXT_CONTACT`]->_25,
	_16-[:`NEXT_CONTACT`]->_19,
	_17-[:`ACTOR`]->_18,
	_17-[:`ACTOR`]->_22,
	_17-[:`ACTOR`]->_19,
	_17-[:`ACTOR`]->_26,
	_17-[:`ACTOR`]->_24,
	_18-[:`NEXT_CONTACT`]->_22,
	_18-[:`NEXT_CONTACT`]->_21,
	_18-[:`NEXT_CONTACT`]->_19,
	_19-[:`NEXT_CONTACT`]->_24,
	_19-[:`NEXT_CONTACT`]->_26,
	_19-[:`NEXT_CONTACT`]->_25,
	_20-[:`ACTOR`]->_24,
	_20-[:`ACTOR`]->_23,
	_20-[:`ACTOR`]->_21,
	_21-[:`NEXT_CONTACT`]->_23,
	_22-[:`NEXT_CONTACT`]->_26,
	_22-[:`NEXT_CONTACT`]->_24,
	_22-[:`NEXT_CONTACT`]->_23,
	_23-[:`NEXT_CONTACT`]->_24

	----

	'''
	Graph is shown as the static image below to enlight the shape of data. The label of each contact is the time frame when it took place:

	image::https://www.dropbox.com/s/m51sgf92j78w9by/Contact%20Network%20Model%20-%20neo4j.png?dl=1[]

	## Set of useful queries
	To show the goodness of this model I selected some example queries where some are specific of temporal networks to find journeys and others are just for exercise.

	### Temporal network measures

	#### Fastest journey
	[source,cypher]

	----

	MATCH (a1 :Actor{id:4})-[:ACTOR]->(c:Contact),(a2 :Actor{id:3})-[:ACTOR]->(c1 :Contact),
	p=shortestPath((c)-[:NEXT_CONTACT*0..]->(c1))
	WITH collect(p) as ps, min(length(p)) as l
	RETURN FILTER(p in ps WHERE LENGTH(p) = l) as PATHS, l AS LENGTH

	----

	//table

	#### Fastest journeys tree
	[source,cypher]

	----

	MATCH (a1 :Actor{id:0})-[:ACTOR]->(c:Contact),(a2 :Actor)-[:ACTOR]->(c1 :Contact),
	p=shortestPath((c {t:0})-[:NEXT_CONTACT*0..]->(c1))
	WHERE a1 <> a2
	WITH DISTINCT a2, collect(p) as ps, min(length(p)) as l
	RETURN a2.id AS actorReached, FILTER(p in ps WHERE LENGTH(p) = l) as PATHS, l AS LENGTH
	ORDER BY LENGTH

	----

	//table

	#### Foremost journey
	[source,cypher]

	----

	MATCH (a1 :Actor{id:1})-[:ACTOR]->(c:Contact),(a2 :Actor{id:3})-[:ACTOR]->(c1 :Contact),
	p=shortestPath((c)-[:NEXT_CONTACT*0..]->(c1))
	WITH collect(p) as ps, min(c1.t) as t
	RETURN FILTER(p in ps WHERE LAST(NODES(p)).t = t) as PATHS, t AS TIME

	----

	//table

	#### Foremost journeys tree
	[source,cypher]

	----

	MATCH (a1 :Actor{id:0})-[:ACTOR]->(c:Contact),(a2 :Actor)-[:ACTOR]->(c1 :Contact),
	p=shortestPath((c {t:0})-[:NEXT_CONTACT*0..]->(c1))
	WHERE a1 <> a2
	WITH DISTINCT a2, collect(p) as ps, min(c1.t) as t
	RETURN a2.id AS actorReached, FILTER(p in ps WHERE LAST(NODES(p)).t = t) as PATHS, t AS TIME
	ORDER BY TIME

	----

	//table

	### Others
	#### All actors infected by an actor
	[source,cypher]

	----

	MATCH (a1:Actor {id:2})-[:ACTOR]->(c:Contact)-[:NEXT_CONTACT*0..]->(:Contact)<-[:ACTOR]-(a2:Actor)
	WHERE a1<>a2
	RETURN DISTINCT a2

	----

	//graph_result

	#### All actors infected by an actor starting from a certain frame
	[source,cypher]

	----

	MATCH (a1 :Actor{id:2})-[:ACTOR]->(c:Contact),(a2 :Actor)-[:ACTOR]->(c1 :Contact),
	p=shortestPath((c {t:3})-[:NEXT_CONTACT*0..]->(c1))
	WHERE a1 <> a2
	RETURN DISTINCT a2

	----

	//graph_result

	#### Smallest number of contacts to infect all others starting from a specific actor
	Here I show the _id_ of the actor reached and the number (_minFrame_) of the first frame
	in which he could be infected.
	[source,cypher]

	----

	MATCH (a1 :Actor{id:1})-[:ACTOR]->(c:Contact),(a2 :Actor)-[:ACTOR]->(c1 :Contact),
	p=shortestPath((c)-[:NEXT_CONTACT*0..]->(c1))
	WHERE a1 <> a2
	RETURN a2.id AS actorReached, min(c1.t) as minFrame
	ORDER BY minFrame

	----

	//table

	#### Average number of contacts separing a specific actor from those he can reach, starting from a specific frame
	Here is returned the number of actors reached by a specific one and the average length of the paths.
	[source,cypher]

	----

	MATCH (a1 :Actor{id:4})-[:ACTOR]->(c:Contact),(a2 :Actor)-[:ACTOR]->(c1 :Contact),
	p=shortestPath((c)-[:NEXT_CONTACT*0..]->(c1))
	WHERE a1 <> a2 AND c.t >= 0
	WITH DISTINCT a2, collect(p) as ps, min(length(p)) as l
	RETURN COUNT(a2) as NumReached, AVG(l) AS AvgLENGTH

	----

	//table

	#### Count of the actors reachable from each actor and the average length of the contact path
	[source,cypher]

	----

	MATCH (a1 :Actor)-[:ACTOR]->(c :Contact),(a2 :Actor)-[:ACTOR]->(c1 :Contact),
	p=shortestPath((c)-[:NEXT_CONTACT*0..]->(c1))
	WHERE a1 <> a2
	WITH DISTINCT a1, a2, MIN(LENGTH(p)) AS minStep
	RETURN a1.id AS Actor, COUNT(a2) AS actorReached, AVG(minStep) as avgStep

	----

	//table


	## Extensions

	My work is a nice way to represent contact networks with Neo4j. I think that would be interesting to extend my work in many directions.

	### Use of INTERACTION nodes like Cattuto and others [4]

	My model is really useful to investigate temporal features of networks but I cannot be able to define queries to study the features of aggregated portions of the graph. I think that the only way to do this would be to introduce the use of INTERACTION nodes to represent the relationship between two actors in the whole network. These are linked to all the contacts where the interaction took place.

	### Performance analysis

	I tryied this model on my old battle laptop and I did not considered performances because it would be meaningless. However I think that a performance study would be great to show its usability.

	### Real data examples

	To enrich the value of this work would be interesting to perform analysis on real data.

	### Extension of the query set

	The set of query that I presented is not exhaustive and would be nice to extend the query set with other useful measures from literature.

	### Visualization

	To show this data I used the Neo4j visualization that is great
	to display bidimensional graphs but not precise dealing with
	"multidimensional graph" like temporal networks.
	It would be interesting to implement a visualization to aggregate graphs
	according to some definition to help displaying data. For instance
	in my case could be nice to aggregate per frame.

	[1]: http://arxiv.org/pdf/1108.1780v2.pdf "Temporal Networks"

	[2]: http://www.sociopatterns.org/ "Sociopatterns"

	[3]: http://www.plosone.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0011596&representation=PDF "Sociopatterns framework description"

	[4]: http://event.cwi.nl/grades2013/11-averbuch.pdf "Time-varying networks in neo4j"

	[5]: https://hal.inria.fr/inria-00071996/document "Computing shortest, fastest, and foremost journeys in dynamic networks"