Taresin/__sydney_trains_baseline_script_story.md

## __sydney_trains_baseline_script_story.md

      
Scripting Story

I need to interact with the Neo4j database from Python, so I use the Python driver.
The package that I used is the neo4j library.
Install it and import the graph database driver.

    from neo4j import GraphDatabase

Using this driver, I connect to the database with the credentials that I set up for my Neo4j instance.

    NEO4J_URI = "bolt://localhost:7687"
    NEO4J_USER = "neo4j"
    NEO4J_PASSWORD = "password"
    NEO4J_DATABASE = "trains"

    driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD), database=NEO4J_DATABASE)
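
The snippets in this story call `session.run`, but the story never shows the session being opened. Here is a minimal sketch of one way to do it. It assumes a hypothetical helper named `run_with_session` (not part of the original script) and passes the database name at session level.

```python
def run_with_session(driver, database, work):
    """Open a session on `database`, call `work(session)`, and return its result.

    `run_with_session` is a hypothetical helper for illustration only.
    The `with` block closes the session even if `work` raises.
    """
    with driver.session(database=database) as session:
        return work(session)


# Possible usage with the driver created above (needs a running Neo4j instance):
#   stop_count = run_with_session(
#       driver,
#       NEO4J_DATABASE,
#       lambda session: session.run("MATCH (s:Stop) RETURN count(s) AS n").single()["n"],
#   )
```

The result is consumed inside the callback because a result should be read before its session is closed.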
Queries run through a session opened from the driver.
This first query creates the graph projection for the baseline.

    query = f"""
        MATCH (s1:Stop)-[r:NEXT_STOP]->(s2:Stop)
        WITH gds.graph.project('{projection_name}', s1, s2) AS g
        RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS relationships
    """
    session.run(query)
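
The projection query can also take the graph name as a query parameter instead of interpolating it with an f-string. The sketch below is one way to do that and to read back the summary row. It assumes the installed GDS version accepts a parameter in that position, and the names `PROJECT_QUERY` and `create_projection` are hypothetical.

```python
# Same Cypher as above, but the projection name is passed as the $name parameter.
PROJECT_QUERY = """
MATCH (s1:Stop)-[:NEXT_STOP]->(s2:Stop)
WITH gds.graph.project($name, s1, s2) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS relationships
"""


def create_projection(session, projection_name):
    """Create the projection and return (graph name, node count, relationship count)."""
    # Keyword arguments to session.run() are sent as query parameters.
    record = session.run(PROJECT_QUERY, name=projection_name).single()
    return record["graph"], record["nodes"], record["relationships"]
```

Returning the counts gives a quick sanity check that the projection covers the expected number of stops.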
I then use this projection to calculate the baseline betweenness centrality (BC) scores.

    # Calculate the betweenness centrality on the baseline projection
    query = f"""
        CALL gds.betweenness.stream('{projection_name}')
        YIELD nodeId, score
        RETURN gds.util.asNode(nodeId).name AS station, score
    """
    result = session.run(query)
    baseline = pd.Series({record['station']: record['score'] for record in result})

This took a little forward thinking. Since I'll be using the pandas library for statistical calculations on these scores, the function builds a pandas Series (a DataFrame would also work) and returns it.
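
To illustrate why a Series is convenient here, the sketch below builds one from four rows copied from baseline_results.csv and runs a few basic calculations. The list of dicts stands in for the records returned by the driver, and the variable names are illustrative only.

```python
import pandas as pd

# Four scores copied from baseline_results.csv, standing in for the full result set.
records = [
    {"station": "Sydenham", "score": 2871.0},
    {"station": "Chatswood", "score": 2718.0},
    {"station": "Central", "score": 2643.5},
    {"station": "Bondi Junction", "score": 0.0},
]

# Same construction as in the script: station name -> score.
baseline = pd.Series({r["station"]: r["score"] for r in records})

top_station = baseline.idxmax()                  # label of the highest score
mean_score = baseline.mean()                     # average over this small sample
ranked = baseline.sort_values(ascending=False)   # highest score first
```

Because the Series is indexed by station name, later results can be aligned with the baseline by label rather than by position.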
Gotchas

Here are some of the gotchas that I ran into while developing the script.

The older way of creating Cypher projections appears to be deprecated.
The new function to call is: gds.graph.project
It is documented here:
https://neo4j.com/docs/graph-data-science/current/management-ops/graph-creation/graph-project-cypher-projection/

Creating a projection with the same name as one that already exists raises an error.
An error also occurs when you try to drop a projection that does not exist.
This means that whenever I create or drop a projection, I need to guard the call with an existence check.

    CALL gds.graph.exists('{projection_name}') YIELD exists
    WHERE exists
    CALL gds.graph.drop('{projection_name}') YIELD graphName
    RETURN graphName;
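
A possible alternative to the explicit existence check is the optional second argument of `gds.graph.drop` (`failIfMissing`). This sketch assumes the installed GDS version supports it: passing `false` should make the call return no rows instead of raising when the projection is missing. The names `DROP_QUERY` and `drop_projection_if_exists` are hypothetical.

```python
# The second argument (failIfMissing = false) asks GDS not to raise
# when no projection with this name exists.
DROP_QUERY = """
CALL gds.graph.drop($name, false) YIELD graphName
RETURN graphName
"""


def drop_projection_if_exists(session, projection_name):
    """Drop the projection if present; return True if a projection was dropped."""
    records = list(session.run(DROP_QUERY, name=projection_name))
    return len(records) > 0
```

Calling this helper before creating a projection covers both gotchas with a single round trip.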

  
## baseline_results.csv

          
,0
Hurstville,1292.0
Allawah,1350.0
Banksia,1562.0
Arncliffe,1610.0
Chatswood,2718.0
Artarmon,1143.0
Croydon,624.0
Ashfield,550.0
Mount Colah,152.0
Asquith,225.0
Lidcombe,1005.0
Auburn,481.0
Rockdale,1512.0
Martin Place,2342.0
Barangaroo,2074.0
Bexley North,1206.0
Bardwell Park,1259.0
Pennant Hills,196.0
Beecroft,216.0
Norwest,376.0
Bella Vista,285.0
Regents Park,245.0
Berala,239.0
Narwee,1035.0
Beverly Hills,1094.0
Kingsgrove,1151.0
Yagoona,18.0
Birrong,34.0
Doonside,49.0
Blacktown,108.0
Marayong,63.0
Edgecliff,83.0
Bondi Junction,0.0
Redfern,1497.0
Burwood,1423.0
Warwick Farm,253.0
Cabramatta,264.0
Macarthur,0.0
Campbelltown,103.0
Canley Vale,123.0
Woolooware,87.0
Caringbah,172.0
Carlton,1406.0
Carramar,182.0
Cherrybrook,637.0
Castle Hill,552.0
Glenfield,784.0
Casula,225.0
Green Square,180.0
Central,2643.5
Town Hall,811.5
Waterloo,1920.0
Crows Nest,2030.0
Roseville,650.0
Cheltenham,234.0
Epping,1555.0
Leightonfield,194.0
Chester Hill,197.0
Wynyard,1011.0
Circular Quay,95.5
East Richmond,15.0
Clarendon,28.0
Clyde,356.0
Jannali,972.0
Como,1040.0
Rhodes,445.0
Concord West,361.0
Victoria Cross,2052.0
Eastwood,761.0
Denistone,685.0
International Airport,138.0
Domestic Airport,154.0
Rooty Hill,48.0
Holsworthy,639.0
East Hills,710.0
Richmond,0.0
Kings Cross,164.0
Leppington,0.0
Edmondson Park,99.0
Heathcote,170.0
Engadine,252.0
Macquarie University,1359.0
St Peters,48.0
Erskineville,20.0
Fairfield,105.0
Flemington,150.0
Gadigal,1998.0
Macquarie Fields,495.0
Pymble,446.0
Gordon,500.0
Granville,229.0
Mascot,168.0
Yennora,85.0
Guildford,63.0
Miranda,255.0
Gymea,336.0
Harris Park,113.0
Merrylands,39.0
Waterfall,86.0
Hills Showground,465.0
Homebush,32.0
Hornsby,296.0
Penshurst,1232.0
Minto,303.0
Ingleburn,400.0
Wolli Creek,1761.0
Sutherland,902.0
Kellyville,192.0
Killara,552.0
Penrith,13.0
Kingswood,24.0
Kirrawee,415.0
Kogarah,1460.0
Villawood,189.0
Leumeah,204.0
Summer Hill,476.0
Lewisham,402.0
Strathfield,924.0
Lindfield,602.0
Liverpool,240.0
Loftus,332.0
Newtown,180.0
Macdonaldtown,106.0
North Ryde,1471.0
Macquarie Park,1416.0
Quakers Hill,64.0
West Ryde,607.0
Meadowbank,527.0
North Sydney,1055.0
Milsons Point,1033.0
Oatley,1106.0
Mortdale,1170.0
Mount Kuring-gai,77.0
St Marys,40.0
Mount Druitt,45.0
Berowra,0.0
Windsor,39.0
Mulgrave,48.0
Museum,0.0
St James,14.5
Riverwood,974.0
Stanmore,254.0
Normanhurst,150.0
North Strathfield,275.0
Waverton,1077.0
Olympic Park,0.0
Revesby,846.0
Padstow,911.0
Panania,779.0
Parramatta,0.0
Westmead,23.0
Toongabbie,80.0
Pendle Hill,63.0
Thornleigh,174.0
Emu Plains,0.0
Petersham,328.0
Turramurra,390.0
Schofields,63.0
Sydenham,2871.0
Sefton,198.0
Vineyard,55.0
Riverstone,60.0
Rouse Hill,97.0
Seven Hills,95.0
St Leonards,1121.0
Werrington,33.0
Tempe,1600.0
Turrella,1310.0
Tallawong,0.0
Warrawee,332.0
Waitara,210.0
Wahroonga,272.0
Helensburgh,0.0
Wollstonecraft,1099.0
Wentworthville,44.0
Cronulla,0.0
Bankstown,0.0

## baselines.py
import pandas as pd
from neo4j import GraphDatabase

NEO4J_URI = "bolt://localhost:7687"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "password"
NEO4J_DATABASE = "trains"

BASELINE_PROJECTION = "baselineProjection"
TEMP_PROJECTION = "tempProjection"

driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD), database=NEO4J_DATABASE)


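def get_betweenness_scores(session, projection_name):
    """Return a pandas Series of betweenness centrality scores indexed by station name.

    Drops any existing projection called `projection_name`, rebuilds it from the
    :Stop nodes and their :NEXT_STOP relationships, streams the scores, and drops
    the projection again as cleanup.
    """
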
    # Delete the baseline projection if it exists
    query = f"""
        CALL gds.graph.exists('{projection_name}') YIELD exists
        WHERE exists
        CALL gds.graph.drop('{projection_name}') YIELD graphName
        RETURN graphName;
    """
    session.run(query)

    # Create a new projection with all stops and their existing :NEXT_STOP relationships
    query = f"""
        MATCH (s1:Stop)-[r:NEXT_STOP]->(s2:Stop)
        WITH gds.graph.project('{projection_name}', s1, s2) AS g
        RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS relationships
    """
    session.run(query)

    # Calculate the betweenness centrality on the baseline projection
    query = f"""
        CALL gds.betweenness.stream('{projection_name}')
        YIELD nodeId, score
        RETURN gds.util.asNode(nodeId).name AS station, score
    """
    result = session.run(query)
    baseline = pd.Series({record['station']: record['score'] for record in result})

    # Delete the baseline projection as cleanup
    # (RETURN closes the query, matching the existence-check query above)
    query = f"""
        CALL gds.graph.drop('{projection_name}') YIELD graphName
        RETURN graphName
    """
    session.run(query)

    return baseline