Skip to content

@mneedham /gist:2983831
Created

Embed URL

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
neo4j query to find thoughtworks colleagues of colleagues
START n = node(*)
MATCH n-[r:colleagues*1..2]->c, n-[r2:member_of]->office
WHERE n.type? = 'person' and has(n.name) and (not(has(r2.end_date))) and office.name = 'London - UK South' and (not(has(c.thoughtquitter)))
RETURN n.name, count(distinct(c)) AS connections, office.name
ORDER BY connections DESC
@jexp

Hmm this is a graph global query, can you transform it to an index lookup for the office:
n.type='person' should be implied or?
You can try to see how many nodes this touches by leaving off the where clause and just returning count(c)
Ideally we are pulling the predicates into the match, I'm not 100% sure if that happens here.

START office=node:offices(name='London - UK South')
MATCH office<-[r2:member_of]-n-[r:colleagues*1..2]->c 
WHERE n.type? = 'person' and has(n.name) and (not(has(r2.end_date))) and (not(has(c.thoughtquitter)))
RETURN n.name, count(distinct(c)) AS connections, office.name 
ORDER BY connections DESC
@mneedham
Owner

That worked much faster - took 60 seconds to work out the result that time. Obviously the query is faster but I think my result yesterday might have been a bit influenced by my machine being generally crap! I ended up restarting it in the evening to get it at least be a little bit responsive.

@jexp

Oh that's sad, get yourself a new retina MBP or just rent a AWS high memory instance :)

@jexp

And I'd be interested in the numbers anyway. Shouldn't take that long after all. If you can share the dataset privately I'd love to do some profiling. Btw. are you using neography against neo4j server? Might be interesting do do a raw query in the server console and/or the streaming option for comparison.

@mneedham
Owner

While we're here, this query isn't working as I expect it to:

START office=node:offices(name='London - UK South')
MATCH office<-[r:member_of]-person, person-[r2?:member_of]->other
WHERE (NOT(HAS(r.end_date)))
RETURN person, COUNT(DISTINCT(other))

Basically people who have only worked in London have a count of 0 where I would expect that it should come back as 1? Am I doing something wrong or is it working incorrectly?

@mneedham
Owner

And yeh maybe I should just run it on an AWS instance and see how it works - just a bit wary of putting the data on there cause normally it's behind CAS...although you could easily argue that having it on my hard drive is probably equally risky!

@mneedham
Owner

Those numbers that I gave you were from running the query through the web console actually. I haven't tried the streaming option - will read about it and give it a try

@jexp

other is then null and null values are not counted by count()

@jexp

I meant the numbers of nodes touched :) ah, interesting, so it should be pretty fast, did you run the query once or multiple times (so you get rid of the parsing costs (and loading scala for the first query) and also have hotter caches.

@mneedham
Owner

Ah ok - in which case I change my question - why is other null? I would expect that it should come up with the London office?

@mneedham
Owner

I tried that query a couple more times back to back, 44.7 seconds, 44.1 seconds, 44.9 seconds

@jexp

Other is null becase, you have the london office already bound to the office node and there is an optional relationship which means if it cannot be satisified all the nodes dangling from that will be null as well (like an outer JOIN in SQL).

@mneedham
Owner

Ahhh ok. What I actually want to do is find out how many offices people who are currently working in the London office have worked at.

If I add a relationship called 'current_office' and then have the query:

START office=node:offices(name='London - UK South')
MATCH office<-[r:current_office]-person, person-[r2?:member_of]->other
WHERE (NOT(HAS(r.end_date)))
RETURN person, COUNT(DISTINCT(other))

Will that return a different result or will it be the same because the office & person nodes are still bound?

If that's the case is there any way you can think for me to do this query?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.