Skip to content

Instantly share code, notes, and snippets.

@rvanbruggen
Last active October 12, 2023 06:49
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save rvanbruggen/5c27e0022801a01638a1a856746f5a3a to your computer and use it in GitHub Desktop.
Save rvanbruggen/5c27e0022801a01638a1a856746f5a3a to your computer and use it in GitHub Desktop.
Analysing a google calendar in Neo4j

How to analyse my crazy 2020 calendar in Neo4j

Alternatively: here’s a sample file to play with!

Local file possible as well for your personal calendar of course!

Creating the events from the CSV file

Importing the basic events into Neo4j.

load csv with headers from "file:/calendarfull.csv" as csv
// load csv with headers from "https://docs.google.com/spreadsheets/d/13nvxbfsQKet3bELgf04_C6SkHz-ABoAGAgnMw-YO3oc/export?format=csv&id=13nvxbfsQKet3bELgf04_C6SkHz-ABoAGAgnMw-YO3oc&gid=0" as csv
create (e:Event)
set e = csv
set e.starttime = datetime(e.starttime)
set e.endtime = datetime(e.endtime)
set e.createdtime = datetime(e.createdtime)
set e.lastmodifiedtime = datetime(e.lastmodifiedtime);

Adding indexes

Indexes to speed up later operations:

create index on :Event(starttime);
create index on :Event(endtime);
create index on :Event(createdtime);
create index on :Event(lastmodifiedtime);
create index on :Event(summary);
create index on :Event(description);
create index on :Company(name);
create constraint on (p:Person)
assert p.email is unique;
create index on :ConferenceProvider(name);

Some data wrangling

Just keep the recent events, 2020 only:

MATCH (e:Event)
WHERE e.starttime.year < 2020 OR e.calendartype = "personal"
DELETE e;

Events that don’t have an organiser, will receive me as organiser:

match (e:Event)
where e.organizer is null
set e.organizer = "mailto:rik@neo4j.com";

Clean up the email addresses in the events:

match (e:Event)
set e.organizer = replace(e.organizer,"mailto:","")
set e.attendee = replace(e.attendee,"mailto:","")
set e.organizer = replace(e.organizer,"@neotechnology.com", "@neo4j.com")
set e.organizer = replace(e.organizer,"@neo4j.org", "@neo4j.com")
set e.attendee = replace(e.attendee,"@neotechnology.com", "@neo4j.com")
set e.attendee = replace(e.attendee,"@neo4j.org", "@neo4j.com");

Start graphifying the dataset

Extracting the relationships from the Event data.

Create the people and link them to the events

match (e:Event)
merge (p:Person {email: e.organizer})
create (p)-[:ORGANIZES]->(e);

Remove the attendees from the events and connect them

match (e:Event)
with e, split(e.attendee,";") as CollectionOfAttendees
unwind CollectionOfAttendees as IndividualAttendee
merge (p:Person {email: IndividualAttendee})
merge (p)-[:ATTENDS]->(e);

Extract the companies from the Person nodes

match (p:Person)
WITH distinct split(p.email,"@")[1] as domain, p
WHERE not domain in ['email.com','163.com','free.fr','gmail.com','group.calendar.google.com','resource.calendar.google.com','qq.com','hotmail.com','mail.com','outlook.com','skynet.be','vanbruggen.be']
merge (c:Company {name: toUpper(domain)})
merge (p)-[:WORKS_FOR]->(c);

Rename the companies, by removing the TLDN:

match (c:Company)
set c.name = replace(c.name,".COM","")
set c.name = replace(c.name,".ORG","")
set c.name = replace(c.name,".BE","")
set c.name = replace(c.name,".NL","")
set c.name = replace(c.name,".FR","")
set c.name = replace(c.name,".CO.UK","")
set c.name = replace(c.name,".GOV.UK","")
set c.name = replace(c.name,".CO","")
set c.name = replace(c.name,".DK","")
set c.name = replace(c.name,".NO","")
set c.name = replace(c.name,".IO","");

Because of previous step, we may have multiple accounts with same name (from different TLDNs). So we need to merge companies with same name ("DELOITTE" as example) with APOC procedure.

match (c1:Company), (c2:Company)
where id(c1) < id(c2)
and c1.name = c2.name
with collect([c1,c2]) as companies
unwind companies as company
call apoc.refactor.mergeNodes(company,{properties:"discard",mergeRels:true}) yield node
return node;

Further graphification: conferencing providers

Take a look at the conference providers mentioned in the events:

match (e:Event)
where e.description contains "meet.google.com"
merge (cp:ConferenceProvider {name:"Google Meet"})
merge (e)-[:USES_CP]->(cp);

match (e:Event)
where e.description contains "neo4j.zoom.us"
merge (cp:ConferenceProvider {name:"Zoom"})
merge (e)-[:USES_CP]->(cp);

match (e:Event)
where e.description contains "teams.microsoft.com"
merge (cp:ConferenceProvider {name:"Microsoft Teams"})
merge (e)-[:USES_CP]->(cp);

match (e:Event)
where toUpper(e.description) contains "GOTOMEETING"
merge (cp:ConferenceProvider {name:"LogMeIn GoToMeeting"})
merge (e)-[:USES_CP]->(cp);

match (e:Event)
where toUpper(e.description) contains "WEBEX"
merge (cp:ConferenceProvider {name:"Cisco WebEx"})
merge (e)-[:USES_CP]->(cp);

Some interesting queries ensue

Meetings per conference provider:

match (e:Event)--(cp:ConferenceProvider)
return cp.name, count(cp);

Meetings without conference providers:

match (e:Event)
where not ((e)--(:ConferenceProvider))
return count(e);

Nr of Confcalls per month:

match (e:Event)--(cp:ConferenceProvider)
return distinct e.starttime.month as month, count(cp)
order by month asc;

Or per week:

match (e:Event)--(cp:ConferenceProvider)
return distinct e.starttime.week as week, count(cp)
order by week asc;

Days with most meetings:

match (e:Event) return distinct date(e.starttime), count(e) order by
count(e) desc limit 10;

Inferring meetings between meeting attendees

Using apoc.iterate as there could be a lot of links:

call apoc.periodic.iterate(
    "match (p1:Person)-[:ATTENDS]->(e:Event)<-[:ATTENDS]-(p2:Person)
        where id(p1)<id(p2)
        return p1,p2,e",
    "merge (p1)-[mwr:MEETS_WITH]->(p2)
        on create set mwr.frequency = 1
        on match set mwr.frequency = mwr.frequency + 1",
        {batchsize:1000, parallel:false});

Then we can easily find the most frequent meeters:

match (p1:Person)-[mw:MEETS_WITH]->(p2:Person) return p1.email,
mw.frequency, p2.email order by mw.frequency desc limit 10;

Please do reach out if you want to discuss more:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment