Created

Embed URL

HTTPS clone URL

SSH clone URL

You can clone with HTTPS or SSH.

Download Gist

neo4j query to find thoughtworks colleagues of colleagues

View gist:2983831
1 2 3 4 5
START n = node(*)
MATCH n-[r:colleagues*1..2]->c, n-[r2:member_of]->office
WHERE n.type? = 'person' and has(n.name) and (not(has(r2.end_date))) and office.name = 'London - UK South' and (not(has(c.thoughtquitter)))
RETURN n.name, count(distinct(c)) AS connections, office.name
ORDER BY connections DESC
jexp commented

Hmm this is a graph global query, can you transform it to an index lookup for the office:
n.type='person' should be implied or?
You can try to see how many nodes this touches by leaving off the where clause and just returning count(c)
Ideally we are pulling the predicates into the match, I'm not 100% sure if that happens here.

START office=node:offices(name='London - UK South')
MATCH office<-[r2:member_of]-n-[r:colleagues*1..2]->c 
WHERE n.type? = 'person' and has(n.name) and (not(has(r2.end_date))) and (not(has(c.thoughtquitter)))
RETURN n.name, count(distinct(c)) AS connections, office.name 
ORDER BY connections DESC
Owner

That worked much faster - took 60 seconds to work out the result that time. Obviously the query is faster but I think my result yesterday might have been a bit influenced by my machine being generally crap! I ended up restarting it in the evening to get it at least be a little bit responsive.

jexp commented

Oh that's sad, get yourself a new retina MBP or just rent a AWS high memory instance :)

jexp commented

And I'd be interested in the numbers anyway. Shouldn't take that long after all. If you can share the dataset privately I'd love to do some profiling. Btw. are you using neography against neo4j server? Might be interesting do do a raw query in the server console and/or the streaming option for comparison.

Owner

While we're here, this query isn't working as I expect it to:

START office=node:offices(name='London - UK South')
MATCH office<-[r:member_of]-person, person-[r2?:member_of]->other
WHERE (NOT(HAS(r.end_date)))
RETURN person, COUNT(DISTINCT(other))

Basically people who have only worked in London have a count of 0 where I would expect that it should come back as 1? Am I doing something wrong or is it working incorrectly?

Owner

And yeh maybe I should just run it on an AWS instance and see how it works - just a bit wary of putting the data on there cause normally it's behind CAS...although you could easily argue that having it on my hard drive is probably equally risky!

Owner

Those numbers that I gave you were from running the query through the web console actually. I haven't tried the streaming option - will read about it and give it a try

jexp commented

other is then null and null values are not counted by count()

jexp commented

I meant the numbers of nodes touched :) ah, interesting, so it should be pretty fast, did you run the query once or multiple times (so you get rid of the parsing costs (and loading scala for the first query) and also have hotter caches.

Owner

Ah ok - in which case I change my question - why is other null? I would expect that it should come up with the London office?

Owner

I tried that query a couple more times back to back, 44.7 seconds, 44.1 seconds, 44.9 seconds

jexp commented

Other is null becase, you have the london office already bound to the office node and there is an optional relationship which means if it cannot be satisified all the nodes dangling from that will be null as well (like an outer JOIN in SQL).

Owner

Ahhh ok. What I actually want to do is find out how many offices people who are currently working in the London office have worked at.

If I add a relationship called 'current_office' and then have the query:

START office=node:offices(name='London - UK South')
MATCH office<-[r:current_office]-person, person-[r2?:member_of]->other
WHERE (NOT(HAS(r.end_date)))
RETURN person, COUNT(DISTINCT(other))

Will that return a different result or will it be the same because the office & person nodes are still bound?

If that's the case is there any way you can think for me to do this query?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.