Skip to content

Instantly share code, notes, and snippets.

@dhimmel
Last active December 28, 2018 23:38
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save dhimmel/f69730d8bdfb880c15ed to your computer and use it in GitHub Desktop.
Save dhimmel/f69730d8bdfb880c15ed to your computer and use it in GitHub Desktop.
Randomized edge swaps in cypher

Degree-Preserving Edge-Swap

Background

We designed a hetnet for drug repurposing that contains 50 thousand nodes (of 10 labels) and 3 million relationships (of 27 types). And we’ve chosen neo4j for handling network storage and interaction.

In the past, we’ve used randomized edge swaps to permute the newtork. In other words, we shuffle network relationships in a way that preserves node degree but destroys relationship specificity. Using this method, we can see whether the signals we observe are due to uninteresting degree effects or meaningful relationships.

Randomization by edge swapping is an established operation on homonets. However, we must be careful when extending the concept to networks with directionality and multiple relationship types. When swapping directed relationships, we should maintain directionality, so that post-permutation the in and out-degrees of each node remain unchanged. Next, we should only swap relationships of the same type. In some instances we may want to permute the entire network by going through each rel_type, but in other cases we may only want to distrub a single relationship type at a time.

Previousely, we permuted hetnets using our python module. Now we would like to implement hetnet permutation by randomized edge swaps in cypher.

Create the graph

Below for an example, we create a directed homonet.

CREATE
 // create nodes
 (n1:Node {name: 'n1'}),
 (n2:Node {name: 'n2'}),
 (n3:Node {name: 'n3'}),
 (n4:Node {name: 'n4'}),
 (n5:Node {name: 'n5'}),
 (n6:Node {name: 'n6'}),
 // create relationships
 (n1)-[:REL {id: 0}]->(n2),
 (n3)-[:REL {id: 1}]->(n4),
 (n5)-[:REL {id: 2}]->(n6),
 (n5)-[:REL {id: 3}]->(n2),
 (n6)-[:REL {id: 4}]->(n2)

Node degree table

MATCH (n:Node)
RETURN
 n.name AS name,
 size((n)<-[:REL]-()) AS in_degree,
 size((n)-[:REL]->()) AS out_degree

The Hunger method

Michael Hunger got us starting by suggesting a method (see his tweet and graphgist). Hunger’s example initiates the query by choosing a random node. We modified his solution to initiate with selecting a random relationship, since we’re interested in randomized relationships and thus want each relationship to have a similar chance of being randomized at any given iteration. The Hunger method below performs a single swap (iteration):

// Retrieve a random relationship
MATCH (u)-[r1:REL]->(v)
 WITH * ORDER BY rand() LIMIT 1
// Retrieve a second random relationship that is eligible for swapping
MATCH (x)-[r2:REL]->(y)
 WHERE u <> x AND v <> y
 AND NOT exists((u)-[:REL]->(x))
 AND NOT exists((v)-[:REL]->(y))
 WITH * ORDER BY rand() LIMIT 1
// Execute the swap
DELETE r1, r2
CREATE (u)-[:REL]->(x)
CREATE (v)-[:REL]->(y)

The above method picks a random relationship from all relationships. Then all other relationships are identified that are eligible for a swap, and a random one is chosen. This method works, but will not scale well.

Optimized methods

Searching through all relationships in order to choose a single random relationship is inneficient. Instead we can generate random indexes to choose two random relationships. Once we have the index, we need quick relationship lookup. This can be done using using relationships IDs or a constraint on a relationship property. The ID method is desirable because we don’t have to create any relationship properties. However, since we’re deleting and creating relationships, IDs are subject to change and must therefore be tracked.

Once we have two random indexes, we can do a swap. The query below uses the ID method and swaps relationships with IDs 1 and 3:

MATCH (u)-[r1:REL]-(v)
MATCH (x)-[r2:REL]-(y)
WHERE id(r1) = 1 AND id(r2) = 3
AND u <> x AND v <> y
AND NOT exists((u)-[:REL]->(x))
AND NOT exists((v)-[:REL]->(y))
WITH * LIMIT 1
DELETE r1, r2
CREATE (u)-[nr1:REL]->(x)
CREATE (v)-[nr2:REL]->(y)
RETURN id(nr1), id(nr2)

Questions

  1. Is there a way to transfer the properties of a deleted relationship to a new relationship?

  2. Can relationships have constrained properties (UNIQUE if possible), so we can quickly lookup edges based on random integers? The following command fails CREATE CONSTRAINT ON ()-[r:REL]-() ASSERT exists(r.id) despite a similar example in the documentation.

  3. Is there a way to modify the query above, so that if no swap is performed the IDs of the input relationships are returned?

  4. In the previous query, I added WITH * LIMIT 1 since multiple rows were being returned. Why are multiple rows returned?

  5. Will it be possible to write a cypher loop to repeatedly perform swaps?

Any help answering these questions will be much appreciated! We’ve created a Thinklab discussion in case anyone would like to reply there to recieve credit for their contributions.

@dhimmel
Copy link
Author

dhimmel commented Dec 26, 2015

@jexp, thanks, I think we're a few steps closer.

In the query below, I've used UNWIND to set the number of iterations (attempted swaps). Random integers corresponding to relationship identifiers (stored here as a relationship property id) are generated using toInt(rand() * { n_rels }).

// generate pairs of random relationship identifiers
UNWIND range(1, 10) AS iteration
WITH toInt(rand() * 5) AS i1, toInt(rand() * 5) AS i2
WHERE i1 <> i2
// evaluate relationship eligibility for XSwap
MATCH (u)-[r1:REL]->(v)
MATCH (x)-[r2:REL]->(y)
WHERE r1.id = i1 AND r2.id = i2 AND u <> x AND v <> y
AND NOT exists((u)-[:REL]-(y))
AND NOT exists((x)-[:REL]-(v))
// perform the swap
CREATE (u)-[nr1:REL]->(y)
CREATE (x)-[nr2:REL]->(v)
SET nr1.id = r1.id
SET nr2.id = r2.id
DELETE r1, r2

However, we're running into the following error (id can change):

Error: org.neo4j.cypher.CypherExecutionException: Unable to load RELATIONSHIP with id 8.

I suspect we're running into the following snag. Our method is designed for nested query execution where each iteration triggers: 1) extracting the two relationships specified by their identifiers and 2) evaluating swap eligibility and potentially performing the swap. However, rather than executing 1, 2, 1, 2, 1, 2, what's happening is 1, 1, 1, 2, 2, 2. First, pairs of relationships are extracted for swapping. Then write operations delete relationships, so some of the previously identified relationship pairs contain deleted relationships.

Is there a way to switch from a sequential to nested query? Basically, we just need a way to run a single query many times, with each iteration executed separately.

@jexp
Copy link

jexp commented Jan 4, 2016

This relates to the Eagerness behavior of Cypher. If you run your query with EXPLAIN prefix you'll see that it uses the EAGER operator to make sure that visibility of changes is guaranteed by executing the first part of the query to it's entirety. As you delete relationships under the hood they will not be accessible later.

I think it makes more sense to run the loop externally.

Also your relationship by id-property match is quite expensive.

What you could try is to filter duplicates/overlaps first.
Or to get your eligible relationships into a list of uniques before splitting them out for swapping. Currently the inequality is only checked per pair but not globally.

@dhimmel
Copy link
Author

dhimmel commented Jan 6, 2016

@jexp thanks for the continued help. I wrote up our progress on Thinklab. I think I now understand what's possible in cypher and what's not. So I can probably take it solo from here. To respond to some of the specifics:

This relates to the Eagerness behavior of Cypher

Got it—there is no way to override the eagerness of cypher to perform an iterative query.

your relationship by id-property match is quite expensive

I was only planning to use this method if we could put a property existence constraint on rel.id for efficient lookup.

What you could try is to filter duplicates/overlaps first. Or to get your eligible relationships into a list of uniques before splitting them out for swapping.

I'm not confident I understand the functional effects such a move would have on the randomization. I'd rather stick with the more traditional iterative method.

@DKroot
Copy link

DKroot commented Dec 27, 2018

Nice write-up! I'm starting from the Hunger method implementation and it turns out that the Cypher code above for it has a bug that messes up degrees. The correct swap is to create (u)->(y) + (x)->(v).

@DKroot
Copy link

DKroot commented Dec 28, 2018

On #5: yes, you can execute swaps repeatedly using the APOC plug-in. I played with UNWIND, but it didn't seem to work. You can do a fixed number of swaps or a dynamic number of swaps, e.g.:

// The Hunger method: x swaps (x = the number of relationships)
// Assumes all relationships of the same type
CALL apoc.periodic.iterate(
  "MATCH ()-[r]->() RETURN r",
  "// Retrieve a random relationship
  MATCH (u)-[r1]->(v)
  WITH *
  ORDER BY rand()
  LIMIT 1
  // Retrieve a second random relationship that is eligible for swapping
  MATCH (x)-[r2]->(y)
  WHERE x <> u AND y <> v
    AND NOT exists((u)-->(y))
    AND NOT exists((x)-->(v))
  WITH *
  ORDER BY rand()
  LIMIT 1
  // Execute the swap
  DELETE r1, r2
  WITH *
  CALL apoc.create.relationship(u, type(r1), {}, y) YIELD rel
  WITH x, r1, v, rel AS nr1
  CALL apoc.create.relationship(x, type(r1), {}, v) YIELD rel
  RETURN nr1, rel AS nr2",
  {}
);

@DKroot
Copy link

DKroot commented Dec 28, 2018

Implemented the optimized algorithm with the number of swaps = the number of relationships.

// Optimized swap using ids: x swaps (x = the number of relationships)
// Assumes all relationships are of the same type
CALL apoc.periodic.iterate(
'MATCH ()-[r]->()
WITH collect(id(r)) AS r_ids
UNWIND r_ids AS id
RETURN *',
'// Retrieve 2 random edges eligible for swapping
MATCH (u)-[r1]->(v), (x)-[r2]->(y)
  WHERE id(r1) = r_ids[toInteger(rand() * size(r_ids))]
  AND id(r2) = r_ids[toInteger(rand() * size(r_ids))]
  AND x <> u AND y <> v
  AND NOT exists((u)-->(y))
  AND NOT exists((x)-->(v))
WITH *
  LIMIT 1
// Execute the swap
DELETE r1, r2
WITH *
CALL apoc.create.relationship(u, type(r1), {}, y) YIELD rel
WITH x, r1, v, rel AS nr1
CALL apoc.create.relationship(x, type(r1), {}, v) YIELD rel
RETURN nr1, rel AS nr2',
{}
);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment