@Taresin
Last active February 5, 2025 03:50
Write a Python script that creates baseline betweenness centrality scores for the Sydney Trains network

Scripting Story

I need to interact with the Neo4j database from Python.

The package I used is the official neo4j driver library.

Install it (for example with pip install neo4j) and import the graph database driver.

from neo4j import GraphDatabase

Using this driver, I connect to the database with the credentials I set up for my Neo4j instance.

NEO4J_URI = "bolt://localhost:7687"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "password"
NEO4J_DATABASE = "trains"

driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD), database=NEO4J_DATABASE)
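Hardcoded credentials are fine for a local experiment. One possible variation is to read the connection settings from environment variables and fall back to the values above. This is only a sketch: the load_neo4j_settings helper and the environment variable names are assumptions made for illustration, not part of the neo4j driver.

```python
import os


def load_neo4j_settings(env=None):
    """Collect connection settings, falling back to the local defaults.

    The helper and the variable names are hypothetical; rename them to suit
    your own setup.
    """
    env = os.environ if env is None else env
    return {
        "uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "user": env.get("NEO4J_USER", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", "password"),
        "database": env.get("NEO4J_DATABASE", "trains"),
    }


settings = load_neo4j_settings()
print(settings["uri"], settings["database"])
```

The resulting dictionary can then feed the same GraphDatabase.driver call shown above.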

Next, I run queries through a session opened on this driver (the session variable in the snippets below).

In this first query, I create a graph projection for the baseline, named by projection_name.

query = f"""
        MATCH (s1:Stop)-[r:NEXT_STOP]->(s2:Stop)
        WITH gds.graph.project('{projection_name}', s1, s2) AS g
        RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS relationships
    """
session.run(query)
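The projection query returns a one-row summary, which is handy as a sanity check that the expected number of stops and relationships were projected. The sketch below wraps the same query in a hypothetical create_projection helper and reads that row. It assumes the graph name can be passed as a $name query parameter instead of being interpolated with an f-string, and that session behaves like a neo4j driver session.

```python
PROJECT_QUERY = """
MATCH (s1:Stop)-[r:NEXT_STOP]->(s2:Stop)
WITH gds.graph.project($name, s1, s2) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS relationships
"""


def create_projection(session, projection_name):
    """Create the projection and return (node_count, relationship_count).

    Hypothetical helper: `session` is expected to be a neo4j session (or
    anything with a compatible run() method).
    """
    # Keyword arguments to run() are sent as query parameters.
    record = session.run(PROJECT_QUERY, name=projection_name).single()
    return record["nodes"], record["relationships"]
```

Calling create_projection(session, "baselineProjection") would then give the two counts to compare against what is in the database.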

I then use this projection to calculate the baseline betweenness centrality (BC) scores.

# Calculate the betweenness centrality on the baseline projection
query = f"""
    CALL gds.betweenness.stream('{projection_name}')
    YIELD nodeId, score
    RETURN gds.util.asNode(nodeId).name AS station, score
"""
result = session.run(query)
baseline = pd.Series({record['station']: record['score'] for record in result})
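To get a feel for what these scores mean, here is a small pure-Python sketch of betweenness centrality on a made-up four-stop line. It uses the textbook Brandes algorithm for a directed, unweighted graph and is only an illustration, not the GDS implementation. The stop names and the assumption that every link exists in both directions are hypothetical.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm. graph maps each node to a list of successor nodes."""
    scores = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths (sigma).
        order = []
        preds = {node: [] for node in graph}
        sigma = {node: 0 for node in graph}
        dist = {node: -1 for node in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    return scores


# A hypothetical line A - B - C - D with a link in each direction.
line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(betweenness(line))
# {'A': 0.0, 'B': 4.0, 'C': 4.0, 'D': 0.0}
```

The end-of-line stops score 0 because no shortest path passes through them, which is the same pattern as the 0.0 scores for stations such as Cronulla and Bondi Junction in the results.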

A little bit of forward thinking: since I'll be using the pandas library for statistical calculations on these scores, I collect them into a pandas Series keyed by station name and return that from my function.
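As a rough illustration of why a Series is convenient here, the sketch below compares a baseline with a second set of scores. The three baseline values are taken from the results in this gist. The scenario values are made up purely for the example.

```python
import pandas as pd

# Three baseline scores copied from the results file.
baseline = pd.Series({"Central": 2643.5, "Redfern": 1497.0, "Museum": 0.0})

# Hypothetical scores from some later scenario (invented numbers).
scenario = pd.Series({"Central": 2500.0, "Redfern": 1600.0, "Museum": 0.0})

# Series align on the station name, so the subtraction is per station.
change = (scenario - baseline).sort_values(ascending=False)
print(change)

# Summary statistics come for free.
print(baseline.describe())
```

Because both Series are indexed by station name, pandas lines the stations up automatically before subtracting.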

Gotchas

Here are some of the gotchas that I ran into whilst developing the script.

The older procedure for creating Cypher projections (gds.graph.project.cypher) is deprecated.

The replacement is the gds.graph.project aggregation function, called from inside a Cypher query.

It is documented here:

https://neo4j.com/docs/graph-data-science/current/management-ops/graph-creation/graph-project-cypher-projection/

Creating a projection fails with an error if a projection with the same name already exists.

Dropping a projection that does not exist also fails with an error.

This means that whenever I create or drop a projection, I need to wrap the call in an existence check.

CALL gds.graph.exists('{projection_name}') YIELD exists
WHERE exists
CALL gds.graph.drop('{projection_name}') YIELD graphName
RETURN graphName;
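An alternative to the explicit existence check, as far as I understand the GDS documentation, is the optional second argument of gds.graph.drop (failIfMissing). When it is false, dropping a missing graph returns an empty result instead of raising an error. The sketch below wraps that in a hypothetical drop_projection_if_exists helper and passes the name as a query parameter.

```python
DROP_QUERY = "CALL gds.graph.drop($name, false) YIELD graphName RETURN graphName"


def drop_projection_if_exists(session, projection_name):
    """Drop the projection if present; return the names that were dropped.

    Hypothetical helper: `session` is expected to be a neo4j session (or
    anything with a compatible run() method). An empty list means there was
    nothing to drop.
    """
    result = session.run(DROP_QUERY, name=projection_name)
    return [record["graphName"] for record in result]
```

Calling this before creating a projection gives the same "start clean" behaviour as the check above with one procedure call.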
Station Score
Hurstville 1292.0
Allawah 1350.0
Banksia 1562.0
Arncliffe 1610.0
Chatswood 2718.0
Artarmon 1143.0
Croydon 624.0
Ashfield 550.0
Mount Colah 152.0
Asquith 225.0
Lidcombe 1005.0
Auburn 481.0
Rockdale 1512.0
Martin Place 2342.0
Barangaroo 2074.0
Bexley North 1206.0
Bardwell Park 1259.0
Pennant Hills 196.0
Beecroft 216.0
Norwest 376.0
Bella Vista 285.0
Regents Park 245.0
Berala 239.0
Narwee 1035.0
Beverly Hills 1094.0
Kingsgrove 1151.0
Yagoona 18.0
Birrong 34.0
Doonside 49.0
Blacktown 108.0
Marayong 63.0
Edgecliff 83.0
Bondi Junction 0.0
Redfern 1497.0
Burwood 1423.0
Warwick Farm 253.0
Cabramatta 264.0
Macarthur 0.0
Campbelltown 103.0
Canley Vale 123.0
Woolooware 87.0
Caringbah 172.0
Carlton 1406.0
Carramar 182.0
Cherrybrook 637.0
Castle Hill 552.0
Glenfield 784.0
Casula 225.0
Green Square 180.0
Central 2643.5
Town Hall 811.5
Waterloo 1920.0
Crows Nest 2030.0
Roseville 650.0
Cheltenham 234.0
Epping 1555.0
Leightonfield 194.0
Chester Hill 197.0
Wynyard 1011.0
Circular Quay 95.5
East Richmond 15.0
Clarendon 28.0
Clyde 356.0
Jannali 972.0
Como 1040.0
Rhodes 445.0
Concord West 361.0
Victoria Cross 2052.0
Eastwood 761.0
Denistone 685.0
International Airport 138.0
Domestic Airport 154.0
Rooty Hill 48.0
Holsworthy 639.0
East Hills 710.0
Richmond 0.0
Kings Cross 164.0
Leppington 0.0
Edmondson Park 99.0
Heathcote 170.0
Engadine 252.0
Macquarie University 1359.0
St Peters 48.0
Erskineville 20.0
Fairfield 105.0
Flemington 150.0
Gadigal 1998.0
Macquarie Fields 495.0
Pymble 446.0
Gordon 500.0
Granville 229.0
Mascot 168.0
Yennora 85.0
Guildford 63.0
Miranda 255.0
Gymea 336.0
Harris Park 113.0
Merrylands 39.0
Waterfall 86.0
Hills Showground 465.0
Homebush 32.0
Hornsby 296.0
Penshurst 1232.0
Minto 303.0
Ingleburn 400.0
Wolli Creek 1761.0
Sutherland 902.0
Kellyville 192.0
Killara 552.0
Penrith 13.0
Kingswood 24.0
Kirrawee 415.0
Kogarah 1460.0
Villawood 189.0
Leumeah 204.0
Summer Hill 476.0
Lewisham 402.0
Strathfield 924.0
Lindfield 602.0
Liverpool 240.0
Loftus 332.0
Newtown 180.0
Macdonaldtown 106.0
North Ryde 1471.0
Macquarie Park 1416.0
Quakers Hill 64.0
West Ryde 607.0
Meadowbank 527.0
North Sydney 1055.0
Milsons Point 1033.0
Oatley 1106.0
Mortdale 1170.0
Mount Kuring-gai 77.0
St Marys 40.0
Mount Druitt 45.0
Berowra 0.0
Windsor 39.0
Mulgrave 48.0
Museum 0.0
St James 14.5
Riverwood 974.0
Stanmore 254.0
Normanhurst 150.0
North Strathfield 275.0
Waverton 1077.0
Olympic Park 0.0
Revesby 846.0
Padstow 911.0
Panania 779.0
Parramatta 0.0
Westmead 23.0
Toongabbie 80.0
Pendle Hill 63.0
Thornleigh 174.0
Emu Plains 0.0
Petersham 328.0
Turramurra 390.0
Schofields 63.0
Sydenham 2871.0
Sefton 198.0
Vineyard 55.0
Riverstone 60.0
Rouse Hill 97.0
Seven Hills 95.0
St Leonards 1121.0
Werrington 33.0
Tempe 1600.0
Turrella 1310.0
Tallawong 0.0
Warrawee 332.0
Waitara 210.0
Wahroonga 272.0
Helensburgh 0.0
Wollstonecraft 1099.0
Wentworthville 44.0
Cronulla 0.0
Bankstown 0.0
import pandas as pd
from neo4j import GraphDatabase

NEO4J_URI = "bolt://localhost:7687"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "password"
NEO4J_DATABASE = "trains"

BASELINE_PROJECTION = "baselineProjection"
TEMP_PROJECTION = "tempProjection"

driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD), database=NEO4J_DATABASE)


def get_betweenness_scores(session, projection_name):
    # Delete the baseline projection if it exists
    query = f"""
        CALL gds.graph.exists('{projection_name}') YIELD exists
        WHERE exists
        CALL gds.graph.drop('{projection_name}') YIELD graphName
        RETURN graphName;
    """
    session.run(query)

    # Create a new projection with all stops and their existing :NEXT_STOP relationships
    query = f"""
        MATCH (s1:Stop)-[r:NEXT_STOP]->(s2:Stop)
        WITH gds.graph.project('{projection_name}', s1, s2) AS g
        RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS relationships
    """
    session.run(query)

    # Calculate the betweenness centrality on the baseline projection
    query = f"""
        CALL gds.betweenness.stream('{projection_name}')
        YIELD nodeId, score
        RETURN gds.util.asNode(nodeId).name AS station, score
    """
    result = session.run(query)
    baseline = pd.Series({record['station']: record['score'] for record in result})

    # Delete the baseline projection as cleanup
    query = f"""
        CALL gds.graph.drop('{projection_name}') YIELD graphName
    """
    session.run(query)

    return baseline