Skip to content

Instantly share code, notes, and snippets.

@mneedham
Created June 24, 2012 16:05
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save mneedham/2983831 to your computer and use it in GitHub Desktop.
Save mneedham/2983831 to your computer and use it in GitHub Desktop.
neo4j query to find thoughtworks colleagues of colleagues
START n = node(*)
MATCH n-[r:colleagues*1..2]->c, n-[r2:member_of]->office
WHERE n.type? = 'person' and has(n.name) and (not(has(r2.end_date))) and office.name = 'London - UK South' and (not(has(c.thoughtquitter)))
RETURN n.name, count(distinct(c)) AS connections, office.name
ORDER BY connections DESC
@jexp
Copy link

jexp commented Jun 24, 2012

Hmm this is a graph global query, can you transform it to an index lookup for the office:
n.type='person' should be implied or?
You can try to see how many nodes this touches by leaving off the where clause and just returning count(c)
Ideally we are pulling the predicates into the match, I'm not 100% sure if that happens here.

START office=node:offices(name='London - UK South')
MATCH office<-[r2:member_of]-n-[r:colleagues*1..2]->c 
WHERE n.type? = 'person' and has(n.name) and (not(has(r2.end_date))) and (not(has(c.thoughtquitter)))
RETURN n.name, count(distinct(c)) AS connections, office.name 
ORDER BY connections DESC

@mneedham
Copy link
Author

That worked much faster - took 60 seconds to work out the result that time. Obviously the query is faster but I think my result yesterday might have been a bit influenced by my machine being generally crap! I ended up restarting it in the evening to get it at least be a little bit responsive.

@jexp
Copy link

jexp commented Jun 25, 2012

Oh that's sad, get yourself a new retina MBP or just rent a AWS high memory instance :)

@jexp
Copy link

jexp commented Jun 25, 2012

And I'd be interested in the numbers anyway. Shouldn't take that long after all. If you can share the dataset privately I'd love to do some profiling. Btw. are you using neography against neo4j server? Might be interesting do do a raw query in the server console and/or the streaming option for comparison.

@mneedham
Copy link
Author

While we're here, this query isn't working as I expect it to:

START office=node:offices(name='London - UK South')
MATCH office<-[r:member_of]-person, person-[r2?:member_of]->other
WHERE (NOT(HAS(r.end_date)))
RETURN person, COUNT(DISTINCT(other))

Basically people who have only worked in London have a count of 0 where I would expect that it should come back as 1? Am I doing something wrong or is it working incorrectly?

@mneedham
Copy link
Author

And yeh maybe I should just run it on an AWS instance and see how it works - just a bit wary of putting the data on there cause normally it's behind CAS...although you could easily argue that having it on my hard drive is probably equally risky!

@mneedham
Copy link
Author

Those numbers that I gave you were from running the query through the web console actually. I haven't tried the streaming option - will read about it and give it a try

@jexp
Copy link

jexp commented Jun 25, 2012

other is then null and null values are not counted by count()

@jexp
Copy link

jexp commented Jun 25, 2012

I meant the numbers of nodes touched :) ah, interesting, so it should be pretty fast, did you run the query once or multiple times (so you get rid of the parsing costs (and loading scala for the first query) and also have hotter caches.

@mneedham
Copy link
Author

Ah ok - in which case I change my question - why is other null? I would expect that it should come up with the London office?

@mneedham
Copy link
Author

I tried that query a couple more times back to back, 44.7 seconds, 44.1 seconds, 44.9 seconds

@jexp
Copy link

jexp commented Jun 26, 2012

Other is null becase, you have the london office already bound to the office node and there is an optional relationship which means if it cannot be satisified all the nodes dangling from that will be null as well (like an outer JOIN in SQL).

@mneedham
Copy link
Author

Ahhh ok. What I actually want to do is find out how many offices people who are currently working in the London office have worked at.

If I add a relationship called 'current_office' and then have the query:

START office=node:offices(name='London - UK South')
MATCH office<-[r:current_office]-person, person-[r2?:member_of]->other
WHERE (NOT(HAS(r.end_date)))
RETURN person, COUNT(DISTINCT(other))

Will that return a different result or will it be the same because the office & person nodes are still bound?

If that's the case is there any way you can think for me to do this query?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment