This is how you might model Premier League managers tenures at different clubs in Neo4j:
The date modeling is based on an approach described in more detail in Return partly shared path ranges.
Let’s use a dummy data set with some made up appointments and dismissals:
CREATE (year2013:Year { year: 2013 })
CREATE (january2013:Month { name: "January" })
CREATE (january012013:Day { name: "1st" })
CREATE (january022013:Day { name: "2nd" })
CREATE (january032013:Day { name: "3rd" })
CREATE (january042013:Day { name: "4th" })
CREATE (january052013:Day { name: "5th" })
CREATE (year2013)-[:`January`]->(january2013)
CREATE (january2013)-[:`01`]->(january012013)
CREATE (january2013)-[:`02`]->(january022013)
CREATE (january2013)-[:`03`]->(january032013)
CREATE (january2013)-[:`04`]->(january042013)
CREATE (january2013)-[:`05`]->(january052013)
CREATE (january012013)-[:NEXT]->(january022013)
CREATE (january022013)-[:NEXT]->(january032013)
CREATE (january032013)-[:NEXT]->(january042013)
CREATE (january042013)-[:NEXT]->(january052013)
CREATE (chelsea:Team { name: "Chelsea" })
CREATE (joseMourinho { name: "Jose Mourinho"})
CREATE (mourinhoChelsea { name: "Mourinho tenure at Chelsea" })
CREATE (manUtd:Team { name: "Manchester United" })
CREATE (davidMoyes { name: "David Moyes"})
CREATE (davidMoyesUnited { name: "Moyes tenure at Manchester United" })
CREATE (mourinhoChelsea)-[:HIRED_ON {date: "January 1st 2013"}]->(january012013)
CREATE (mourinhoChelsea)-[:MANAGER]->(joseMourinho)
CREATE (mourinhoChelsea)-[:TEAM]->(chelsea)
CREATE (mourinhoChelsea)-[:FIRED_ON]->(january032013)
CREATE (davidMoyesUnited)-[:HIRED_ON {date: "January 2nd 2013"}]->(january022013)
CREATE (davidMoyesUnited)-[:MANAGER]->(davidMoyes)
CREATE (davidMoyesUnited)-[:TEAM]->(manUtd)
This is how our data set looks:
Now that we have some data in place, let’s write a query to show us who was the manager for a team on a specific date.
MATCH (team:Team),
(year:Year),
(year)-[:`January`]->()-[:`05`]->(date),
(date)<-[:NEXT*0..]-()<-[hire:HIRED_ON]-(tenure)-[:MANAGER]->(manager),
(tenure)-[:TEAM]->(team)
WHERE year.year = 2013
WITH team, manager, hire, tenure
MATCH (tenure)-[fired?:FIRED_ON]->(dateFired)
RETURN team.name, manager.name, hire.date, dateFired
The query starts from January 5th, then gets all the previous dates and looks for a HIRED_ON
relationship and then follows that to get the manager and the team for which it applies to.
We then traverse an optional FIRED_ON
relationship as well because we don’t want to say a manager is currently at a club if they’ve been fired.
It returns the following:
In this data set Jose Mourinho gets fired on the 3rd January so Chelsea shouldn’t have a manager on the 5th January.
One way to exclude him is to collect all the dates that our NEXT
relationship takes us to and then check if the dateFired
is in that collection. If it is then the manager has been fired and we shouldn’t return them:
MATCH (year:Year),
(year)-[:`January`]->()-[:`05`]->(startDate),
(startDate)<-[:NEXT*0..]-(day)
WHERE year.year = 2013
WITH startDate, day, collect(day) AS dates
MATCH (team:Team),
(day)<-[hire:HIRED_ON]-(tenure)-[:MANAGER]->(manager),
(tenure)-[:TEAM]->(team)
WITH team, manager, hire, tenure, dates
MATCH (tenure)-[?:FIRED_ON]-(dateFired)
WHERE (dateFired) IS NULL OR NOT (dateFired IN dates)
RETURN team.name, manager.name, hire.date, dateFired, dates
That returns the following:
Unfortunately we now don’t get a row for Chelsea because the WHERE
clause filters Mourinho out.
Using Neo4j 2.0 and the CASE
statement makes this very easy.
The solution is the following query which does the job:
MATCH (year:Year),
(year)-[:`January`]->()-[:`02`]->(startDate),
(startDate)<-[:NEXT*0..]-(day)
WHERE year.year = 2013
WITH startDate, collect(day) AS dates
MATCH (team:Team),
(startDate)<-[:NEXT*0..]-(day)<-[hire:HIRED_ON]-(tenure)-[:MANAGER]->(manager),
(tenure)-[:TEAM]->(team)
WITH team, manager, hire, tenure, dates
MATCH (tenure)-[?:FIRED_ON]-(dateFired)
RETURN DISTINCT team.name,
CASE WHEN (dateFired) IS NULL THEN manager.name
WHEN (dateFired) IN dates THEN NULL
ELSE manager.name END AS managerName,
CASE WHEN (dateFired) IS NULL THEN hire.date
WHEN (dateFired) IN dates THEN NULL
ELSE hire.date END AS hireDate
Here we’ve introduced a CASE
clause which works pretty similarly to how the SQL CASE
clause works so it should be somehow familiar. That query returns the following:
This is exactly what we want.