
@ComFreek
Last active September 24, 2019 11:47
Computations and statistics on weakly connected components (WCCs) in Neo4j and Cypher

Cypher query to count the number of WCCs of size >= 2:

CALL algo.unionFind.stream('User', null, {})
YIELD nodeId, setId
WITH setId, count(*) AS componentSize
WHERE componentSize >= 2
RETURN count(*) AS numberOfComponentsOfSize2OrGreater

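It's important to keep setId in WITH setId, count(*) even though the final RETURN uses neither nodeId nor setId. Cypher treats every non-aggregated expression that appears next to an aggregate such as count(*) as an implicit grouping key, so setId is what makes count(*) count the nodes of each component separately. Without it, all rows would collapse into a single group and the query would miscount. See also Neo4j's article on aggregation and implicit grouping keys.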
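To see why setId matters, here is a minimal pure-Python sketch, assuming the graph is given as a list of node ids plus an edge list. union_find_stream is a hypothetical stand-in for algo.unionFind.stream: a plain union-find, not Neo4j's implementation, that returns one (nodeId, setId) row per node. The two counting functions mirror the query above with and without setId as the grouping key.

```python
from collections import Counter


def union_find_stream(nodes: list, edges: list) -> list:
    """Hypothetical stand-in for algo.unionFind.stream: one (node_id, set_id)
    row per node. Edge direction is ignored, so components are weakly connected."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components
    return [(n, find(n)) for n in nodes]


def count_components_of_size_2_or_greater(rows: list) -> int:
    # WITH setId, count(*) AS componentSize: one group per setId
    component_sizes = Counter(set_id for _node_id, set_id in rows)
    # WHERE componentSize >= 2 RETURN count(*)
    return sum(1 for size in component_sizes.values() if size >= 2)


def count_without_grouping_key(rows: list) -> int:
    # WITH count(*) AS componentSize (no setId): every row lands in one group
    component_size = len(rows)
    return 1 if component_size >= 2 else 0
```

With nodes 1 to 6 and edges 1-2, 2-3 and 4-5, the components are {1, 2, 3}, {4, 5} and {6}. Grouping by setId yields 2, while dropping setId lumps all six rows into a single group and yields 1.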

Cypher query to output a histogram of WCC sizes (a.k.a. cluster sizes):

CALL algo.unionFind.stream('PubKey', 'COOCCURRED_AS_INPUTS', {})
YIELD nodeId, setId
WITH setId, count(*) AS componentSize
RETURN componentSize AS clusterSize, count(*) AS occurrences
ORDER BY clusterSize ASC
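The histogram is a two-level aggregation: first count nodes per setId, then count how many components share each size. A minimal Python sketch of the same steps, assuming the (nodeId, setId) rows are already available as a list of pairs (the sample rows below are made up for illustration):

```python
from collections import Counter


def cluster_size_histogram(rows: list) -> list:
    """rows: (node_id, set_id) pairs, one per node, as yielded by the stream.
    Returns (cluster_size, occurrences) pairs sorted by cluster size."""
    # WITH setId, count(*) AS componentSize
    component_sizes = Counter(set_id for _node_id, set_id in rows)
    # RETURN componentSize AS clusterSize, count(*) AS occurrences
    occurrences = Counter(component_sizes.values())
    # ORDER BY clusterSize ASC
    return sorted(occurrences.items())


if __name__ == "__main__":
    # Made-up rows: components "a" (3 nodes), "b" (2 nodes), "c" and "d" (1 node each)
    rows = [(1, "a"), (2, "a"), (3, "a"), (4, "b"), (5, "b"), (6, "c"), (7, "d")]
    print(f"{'clusterSize':>11}  {'occurrences':>11}")
    for cluster_size, count in cluster_size_histogram(rows):
        print(f"{cluster_size:>11}  {count:>11}")
```

Summing clusterSize × occurrences over all rows gives back the total number of nodes, which is a handy sanity check for the real query as well. Likewise, summing occurrences over the rows with clusterSize >= 2 gives the result of the first query, provided both run on the same graph.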

Sample output

clusterSize    occurrences
         1     1517200
         2     114804
         3     44027
       ...     ...