dhimmel/cypher-edge-swap.adoc

## cypher-edge-swap.adoc

      
    Raw
  

              cypher-edge-swap.adoc
            
          
    Degree-Preserving Edge-Swap


Background


We designed a hetnet for drug repurposing that contains 50 thousand nodes (of 10 labels) and 3 million relationships (of 27 types). And we’ve chosen neo4j for handling network storage and interaction.


In the past, we’ve used randomized edge swaps to permute the newtork. In other words, we shuffle network relationships in a way that preserves node degree but destroys relationship specificity. Using this method, we can see whether the signals we observe are due to uninteresting degree effects or meaningful relationships.


Randomization by edge swapping is an established operation on homonets. However, we must be careful when extending the concept to networks with directionality and multiple relationship types. When swapping directed relationships, we should maintain directionality, so that post-permutation the in and out-degrees of each node remain unchanged. Next, we should only swap relationships of the same type. In some instances we may want to permute the entire network by going through each rel_type, but in other cases we may only want to distrub a single relationship type at a time.


Previousely, we permuted hetnets using our python module. Now we would like to implement hetnet permutation by randomized edge swaps in cypher.


Create the graph


Below for an example, we create a directed homonet.


CREATE
 // create nodes
 (n1:Node {name: 'n1'}),
 (n2:Node {name: 'n2'}),
 (n3:Node {name: 'n3'}),
 (n4:Node {name: 'n4'}),
 (n5:Node {name: 'n5'}),
 (n6:Node {name: 'n6'}),
 // create relationships
 (n1)-[:REL {id: 0}]->(n2),
 (n3)-[:REL {id: 1}]->(n4),
 (n5)-[:REL {id: 2}]->(n6),
 (n5)-[:REL {id: 3}]->(n2),
 (n6)-[:REL {id: 4}]->(n2)


Node degree table


MATCH (n:Node)
RETURN
 n.name AS name,
 size((n)<-[:REL]-()) AS in_degree,
 size((n)-[:REL]->()) AS out_degree


The Hunger method


Michael Hunger got us starting by suggesting a method (see his tweet and graphgist). Hunger’s example initiates the query by choosing a random node. We modified his solution to initiate with selecting a random relationship, since we’re interested in randomized relationships and thus want each relationship to have a similar chance of being randomized at any given iteration. The Hunger method below performs a single swap (iteration):


// Retrieve a random relationship
MATCH (u)-[r1:REL]->(v)
 WITH * ORDER BY rand() LIMIT 1
// Retrieve a second random relationship that is eligible for swapping
MATCH (x)-[r2:REL]->(y)
 WHERE u <> x AND v <> y
 AND NOT exists((u)-[:REL]->(x))
 AND NOT exists((v)-[:REL]->(y))
 WITH * ORDER BY rand() LIMIT 1
// Execute the swap
DELETE r1, r2
CREATE (u)-[:REL]->(x)
CREATE (v)-[:REL]->(y)


The above method picks a random relationship from all relationships. Then all other relationships are identified that are eligible for a swap, and a random one is chosen. This method works, but will not scale well.


Optimized methods


Searching through all relationships in order to choose a single random relationship is inneficient. Instead we can generate random indexes to choose two random relationships. Once we have the index, we need quick relationship lookup. This can be done using using relationships IDs or a constraint on a relationship property. The ID method is desirable because we don’t have to create any relationship properties. However, since we’re deleting and creating relationships, IDs are subject to change and must therefore be tracked.


Once we have two random indexes, we can do a swap. The query below uses the ID method and swaps relationships with IDs 1 and 3:


MATCH (u)-[r1:REL]-(v)
MATCH (x)-[r2:REL]-(y)
WHERE id(r1) = 1 AND id(r2) = 3
AND u <> x AND v <> y
AND NOT exists((u)-[:REL]->(x))
AND NOT exists((v)-[:REL]->(y))
WITH * LIMIT 1
DELETE r1, r2
CREATE (u)-[nr1:REL]->(x)
CREATE (v)-[nr2:REL]->(y)
RETURN id(nr1), id(nr2)


Questions


Is there a way to transfer the properties of a deleted relationship to a new relationship?


Can relationships have constrained properties (UNIQUE if possible), so we can quickly lookup edges based on random integers? The following command fails CREATE CONSTRAINT ON ()-[r:REL]-() ASSERT exists(r.id) despite a similar example in the documentation.


Is there a way to modify the query above, so that if no swap is performed the IDs of the input relationships are returned?


In the previous query, I added WITH * LIMIT 1 since multiple rows were being returned. Why are multiple rows returned?


Will it be possible to write a cypher loop to repeatedly perform swaps?


Any help answering these questions will be much appreciated! We’ve created a Thinklab discussion in case anyone would like to reply there to recieve credit for their contributions.