{{ message }}

Instantly share code, notes, and snippets.

Last active Dec 28, 2018
Randomized edge swaps in cypher

# Degree-Preserving Edge-Swap

## Background

We designed a hetnet for drug repurposing that contains 50 thousand nodes (of 10 labels) and 3 million relationships (of 27 types). And we’ve chosen neo4j for handling network storage and interaction.

In the past, we’ve used randomized edge swaps to permute the newtork. In other words, we shuffle network relationships in a way that preserves node degree but destroys relationship specificity. Using this method, we can see whether the signals we observe are due to uninteresting degree effects or meaningful relationships.

Randomization by edge swapping is an established operation on homonets. However, we must be careful when extending the concept to networks with directionality and multiple relationship types. When swapping directed relationships, we should maintain directionality, so that post-permutation the in and out-degrees of each node remain unchanged. Next, we should only swap relationships of the same type. In some instances we may want to permute the entire network by going through each rel_type, but in other cases we may only want to distrub a single relationship type at a time.

Previousely, we permuted hetnets using our python module. Now we would like to implement hetnet permutation by randomized edge swaps in cypher.

### Create the graph

Below for an example, we create a directed homonet.

``````CREATE
// create nodes
(n1:Node {name: 'n1'}),
(n2:Node {name: 'n2'}),
(n3:Node {name: 'n3'}),
(n4:Node {name: 'n4'}),
(n5:Node {name: 'n5'}),
(n6:Node {name: 'n6'}),
// create relationships
(n1)-[:REL {id: 0}]->(n2),
(n3)-[:REL {id: 1}]->(n4),
(n5)-[:REL {id: 2}]->(n6),
(n5)-[:REL {id: 3}]->(n2),
(n6)-[:REL {id: 4}]->(n2)``````

### Node degree table

``````MATCH (n:Node)
RETURN
n.name AS name,
size((n)<-[:REL]-()) AS in_degree,
size((n)-[:REL]->()) AS out_degree``````

### The Hunger method

Michael Hunger got us starting by suggesting a method (see his tweet and graphgist). Hunger’s example initiates the query by choosing a random node. We modified his solution to initiate with selecting a random relationship, since we’re interested in randomized relationships and thus want each relationship to have a similar chance of being randomized at any given iteration. The Hunger method below performs a single swap (iteration):

```// Retrieve a random relationship
MATCH (u)-[r1:REL]->(v)
WITH * ORDER BY rand() LIMIT 1
// Retrieve a second random relationship that is eligible for swapping
MATCH (x)-[r2:REL]->(y)
WHERE u <> x AND v <> y
AND NOT exists((u)-[:REL]->(x))
AND NOT exists((v)-[:REL]->(y))
WITH * ORDER BY rand() LIMIT 1
// Execute the swap
DELETE r1, r2
CREATE (u)-[:REL]->(x)
CREATE (v)-[:REL]->(y)```

The above method picks a random relationship from all relationships. Then all other relationships are identified that are eligible for a swap, and a random one is chosen. This method works, but will not scale well.

### Optimized methods

Searching through all relationships in order to choose a single random relationship is inneficient. Instead we can generate random indexes to choose two random relationships. Once we have the index, we need quick relationship lookup. This can be done using using relationships IDs or a constraint on a relationship property. The ID method is desirable because we don’t have to create any relationship properties. However, since we’re deleting and creating relationships, IDs are subject to change and must therefore be tracked.

Once we have two random indexes, we can do a swap. The query below uses the ID method and swaps relationships with IDs `1` and `3`:

``````MATCH (u)-[r1:REL]-(v)
MATCH (x)-[r2:REL]-(y)
WHERE id(r1) = 1 AND id(r2) = 3
AND u <> x AND v <> y
AND NOT exists((u)-[:REL]->(x))
AND NOT exists((v)-[:REL]->(y))
WITH * LIMIT 1
DELETE r1, r2
CREATE (u)-[nr1:REL]->(x)
CREATE (v)-[nr2:REL]->(y)
RETURN id(nr1), id(nr2)``````

## Questions

1. Is there a way to transfer the properties of a deleted relationship to a new relationship?

2. Can relationships have constrained properties (`UNIQUE` if possible), so we can quickly lookup edges based on random integers? The following command fails `CREATE CONSTRAINT ON ()-[r:REL]-() ASSERT exists(r.id)` despite a similar example in the documentation.

3. Is there a way to modify the query above, so that if no swap is performed the IDs of the input relationships are returned?

4. In the previous query, I added `WITH * LIMIT 1` since multiple rows were being returned. Why are multiple rows returned?

5. Will it be possible to write a cypher loop to repeatedly perform swaps?

Any help answering these questions will be much appreciated! We’ve created a Thinklab discussion in case anyone would like to reply there to recieve credit for their contributions.

### jexp commented Dec 22, 2015

 Answers: yes `SET nr1=r1` Property Existence Constraints in Neo4j Enterprise, you should be eligible for a Startup License The id's are changed for newly created relationships, there is currently no way of changing the start-/end-node of a relationship You didn't specify a direction, so one path per direction is returned, and as you have 2 rels this is a combination of 4 paths in total You could pass in a collection of id-pairs and loop over that with UNWIND (this should work for all except your LIMIT 1), so you will have to add a direction

### dhimmel commented Dec 26, 2015

 @jexp, thanks, I think we're a few steps closer. In the query below, I've used `UNWIND` to set the number of iterations (attempted swaps). Random integers corresponding to relationship identifiers (stored here as a relationship property `id`) are generated using `toInt(rand() * { n_rels })`. ``````// generate pairs of random relationship identifiers UNWIND range(1, 10) AS iteration WITH toInt(rand() * 5) AS i1, toInt(rand() * 5) AS i2 WHERE i1 <> i2 // evaluate relationship eligibility for XSwap MATCH (u)-[r1:REL]->(v) MATCH (x)-[r2:REL]->(y) WHERE r1.id = i1 AND r2.id = i2 AND u <> x AND v <> y AND NOT exists((u)-[:REL]-(y)) AND NOT exists((x)-[:REL]-(v)) // perform the swap CREATE (u)-[nr1:REL]->(y) CREATE (x)-[nr2:REL]->(v) SET nr1.id = r1.id SET nr2.id = r2.id DELETE r1, r2 `````` However, we're running into the following error (id can change): `Error: org.neo4j.cypher.CypherExecutionException: Unable to load RELATIONSHIP with id 8.` I suspect we're running into the following snag. Our method is designed for nested query execution where each iteration triggers: 1) extracting the two relationships specified by their identifiers and 2) evaluating swap eligibility and potentially performing the swap. However, rather than executing `1, 2, 1, 2, 1, 2`, what's happening is `1, 1, 1, 2, 2, 2`. First, pairs of relationships are extracted for swapping. Then write operations delete relationships, so some of the previously identified relationship pairs contain deleted relationships. Is there a way to switch from a sequential to nested query? Basically, we just need a way to run a single query many times, with each iteration executed separately.

### jexp commented Jan 4, 2016

 This relates to the Eagerness behavior of Cypher. If you run your query with EXPLAIN prefix you'll see that it uses the EAGER operator to make sure that visibility of changes is guaranteed by executing the first part of the query to it's entirety. As you delete relationships under the hood they will not be accessible later. I think it makes more sense to run the loop externally. Also your relationship by id-property match is quite expensive. What you could try is to filter duplicates/overlaps first. Or to get your eligible relationships into a list of uniques before splitting them out for swapping. Currently the inequality is only checked per pair but not globally.

### dhimmel commented Jan 6, 2016

 @jexp thanks for the continued help. I wrote up our progress on Thinklab. I think I now understand what's possible in cypher and what's not. So I can probably take it solo from here. To respond to some of the specifics: This relates to the Eagerness behavior of Cypher Got it—there is no way to override the eagerness of cypher to perform an iterative query. your relationship by id-property match is quite expensive I was only planning to use this method if we could put a property existence constraint on `rel.id` for efficient lookup. What you could try is to filter duplicates/overlaps first. Or to get your eligible relationships into a list of uniques before splitting them out for swapping. I'm not confident I understand the functional effects such a move would have on the randomization. I'd rather stick with the more traditional iterative method.

### DKroot commented Dec 27, 2018

 Nice write-up! I'm starting from the Hunger method implementation and it turns out that the Cypher code above for it has a bug that messes up degrees. The correct swap is to create (u)->(y) + (x)->(v).

### DKroot commented Dec 28, 2018 • edited

 On #5: yes, you can execute swaps repeatedly using the `APOC` plug-in. I played with `UNWIND`, but it didn't seem to work. You can do a fixed number of swaps or a dynamic number of swaps, e.g.: ``````// The Hunger method: x swaps (x = the number of relationships) // Assumes all relationships of the same type CALL apoc.periodic.iterate( "MATCH ()-[r]->() RETURN r", "// Retrieve a random relationship MATCH (u)-[r1]->(v) WITH * ORDER BY rand() LIMIT 1 // Retrieve a second random relationship that is eligible for swapping MATCH (x)-[r2]->(y) WHERE x <> u AND y <> v AND NOT exists((u)-->(y)) AND NOT exists((x)-->(v)) WITH * ORDER BY rand() LIMIT 1 // Execute the swap DELETE r1, r2 WITH * CALL apoc.create.relationship(u, type(r1), {}, y) YIELD rel WITH x, r1, v, rel AS nr1 CALL apoc.create.relationship(x, type(r1), {}, v) YIELD rel RETURN nr1, rel AS nr2", {} ); ``````

### DKroot commented Dec 28, 2018 • edited

 Implemented the optimized algorithm with the number of swaps = the number of relationships. ``````// Optimized swap using ids: x swaps (x = the number of relationships) // Assumes all relationships are of the same type CALL apoc.periodic.iterate( 'MATCH ()-[r]->() WITH collect(id(r)) AS r_ids UNWIND r_ids AS id RETURN *', '// Retrieve 2 random edges eligible for swapping MATCH (u)-[r1]->(v), (x)-[r2]->(y) WHERE id(r1) = r_ids[toInteger(rand() * size(r_ids))] AND id(r2) = r_ids[toInteger(rand() * size(r_ids))] AND x <> u AND y <> v AND NOT exists((u)-->(y)) AND NOT exists((x)-->(v)) WITH * LIMIT 1 // Execute the swap DELETE r1, r2 WITH * CALL apoc.create.relationship(u, type(r1), {}, y) YIELD rel WITH x, r1, v, rel AS nr1 CALL apoc.create.relationship(x, type(r1), {}, v) YIELD rel RETURN nr1, rel AS nr2', {} ); ``````