- When storing a join-table structure in a graph database, you drop the join table and use a direct relationship between the two entities instead
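A minimal Python sketch of the idea, with no database involved: rows from a hypothetical `acted_in` join table become direct `ACTED_IN` relationships, and the extra join column (a role) moves onto the relationship. All table, column, and function names here are made up for illustration.

```python
# Hypothetical relational data: two entity tables and one join table.
people = {1: "Keanu Reeves", 2: "Carrie-Anne Moss"}
movies = {10: "The Matrix", 11: "John Wick"}
acted_in_join = [  # (person_id, movie_id, role)
    (1, 10, "Neo"),
    (2, 10, "Trinity"),
    (1, 11, "John Wick"),
]


def join_table_to_relationships(join_rows):
    """Drop the join table: each row becomes one direct relationship.

    Returns (start_id, type, end_id, properties) tuples; the extra
    join column (the role) becomes a property on the relationship.
    """
    return [
        (person_id, "ACTED_IN", movie_id, {"roles": [role]})
        for person_id, movie_id, role in join_rows
    ]


relationships = join_table_to_relationships(acted_in_join)


def movies_of(person_id, rels):
    """Follow outgoing ACTED_IN relationships; no join step is needed."""
    return sorted(movies[end] for start, rel_type, end, _props in rels
                  if start == person_id and rel_type == "ACTED_IN")


def cast_of(movie_id, rels):
    """Follow incoming ACTED_IN relationships back to the people."""
    return sorted(people[start] for start, rel_type, end, _props in rels
                  if end == movie_id and rel_type == "ACTED_IN")


print(movies_of(1, relationships))  # ['John Wick', 'The Matrix']
print(cast_of(10, relationships))   # ['Carrie-Anne Moss', 'Keanu Reeves']
```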
- Modeling usually starts with drawing the graph on a whiteboard and working from there
- In general, for speed, avoid nodes with millions of relationships; prefer lots of nodes instead (but sometimes this is just necessary, and it might be solved in 2.1)
- Commenting on something goes through an intermediate comment node: `(person) -commented-> (comment) -on-> (thing)`
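The comment sits in the middle as its own node because a relationship can only connect two nodes; as a node, the comment can hold its own properties (such as its text) and take part in further relationships. A minimal Python sketch of matching that two-hop pattern, using made-up node ids and an in-memory list instead of a database:

```python
# Hypothetical graph: node properties plus (start, type, end) relationships.
nodes = {
    "alice": {"kind": "person"},
    "c1": {"kind": "comment", "text": "Great movie!"},
    "matrix": {"kind": "thing", "title": "The Matrix"},
}
rels = [
    ("alice", "commented", "c1"),
    ("c1", "on", "matrix"),
]


def comments_on(thing, rels):
    """Match (person) -commented-> (comment) -on-> (thing).

    Returns (person, comment text) pairs for the given thing.
    """
    results = []
    for comment, rel_type, target in rels:
        if rel_type != "on" or target != thing:
            continue
        for person, rel_type2, commented in rels:
            if rel_type2 == "commented" and commented == comment:
                results.append((person, nodes[comment]["text"]))
    return results


print(comments_on("matrix", rels))  # [('alice', 'Great movie!')]
```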
- Cypher is the SQL-like query language, designed specifically for graphs, with pattern matching on nodes and relationships
- A node is a circle and a relationship is an arrow, so a query pattern is like ASCII art of that drawing: `(a) --> (b)`
START a=node(*)
MATCH (a) --> (b)
RETURN a,b;
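As a mental model only (this is not how Neo4j executes the query), a Python sketch of the rows this query produces on a tiny made-up graph: `a` starts bound to every node, and `b` is bound only when an outgoing relationship from `a` is found, giving one row per match.

```python
# Tiny hypothetical graph: node ids and (start, type, end) relationships.
nodes = ["keanu", "matrix", "lana"]
rels = [("keanu", "ACTED_IN", "matrix"), ("lana", "DIRECTED", "matrix")]


def match_outgoing(nodes, rels):
    """Model of: START a=node(*) MATCH (a) --> (b) RETURN a, b;"""
    rows = []
    for a in nodes:                     # a is bound to every node up front
        for start, _type, end in rels:  # b gets bound only by a matching relationship
            if start == a:
                rows.append((a, end))
    return rows


print(match_outgoing(nodes, rels))  # [('keanu', 'matrix'), ('lana', 'matrix')]
```

Nodes with no outgoing relationship (here `matrix`) produce no rows at all.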
- `a` is bound to all nodes, while `b` is unbound until the match scans through the graph; it then becomes bound to each matched node. So before `MATCH`, `a` has meaning but `b` does not
- Parentheses are optional, except when leaving the node unnamed
- Can name a relationship inside square brackets: `(a) -[r]-> ()`
START a=node(*)
MATCH (a) -[r]-> ()
RETURN a.name, type(r);
- The query above returns `a.name` and the type of the relationship `r`
- Can specify the type of the relationship with `(a) -[:ACTED_IN]-> (m)`
START a=node(*)
MATCH (a) -[:ACTED_IN]-> (m)
RETURN a.name, m.title;
- Above: returns actors and the movies they `ACTED_IN`
- Can add an identifier to a typed relationship by putting it before the `:` (here `r`), which lets you return its properties:
START a=node(*)
MATCH (a) -[r:ACTED_IN]-> (m)
RETURN a.name, r.roles, m.title;
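A Python sketch (hypothetical data, not Neo4j internals) of what the `:ACTED_IN` type and the named relationship `r` do: the type drops every other relationship, and naming `r` is what makes its properties, such as `roles`, available in `RETURN`.

```python
nodes = {
    "keanu": {"name": "Keanu Reeves"},
    "lana": {"name": "Lana Wachowski"},
    "matrix": {"title": "The Matrix"},
}
# (start, type, end, properties)
rels = [
    ("keanu", "ACTED_IN", "matrix", {"roles": ["Neo"]}),
    ("lana", "DIRECTED", "matrix", {}),
]


def actors_roles_movies(nodes, rels):
    """Model of: MATCH (a) -[r:ACTED_IN]-> (m) RETURN a.name, r.roles, m.title;"""
    return [
        (nodes[a]["name"], r_props["roles"], nodes[m]["title"])
        for a, rel_type, m, r_props in rels
        if rel_type == "ACTED_IN"  # the :ACTED_IN type filters out DIRECTED
    ]


print(actors_roles_movies(nodes, rels))  # [('Keanu Reeves', ['Neo'], 'The Matrix')]
```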
- Patterns can chain several hops:
START a=node(*)
MATCH (a)-->(b)-->(c)
RETURN a,b,c;
- Can have paths coming into a node with `(a)-->(b)<--(c)`
- Can be more specific by adding relationship types:
START a=node(*)
MATCH (a) -[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
RETURN a.name, m.title, d.name;
- Can alias the columns (`AS` is required):
START a=node(*)
MATCH (a) -[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
RETURN a.name AS actor, m.title AS movie, d.name AS director;
- Can separate the pattern into comma-separated paths, which becomes useful for more complex patterns:
START a=node(*)
MATCH (a)-[:ACTED_IN]->(m), (d)-[:DIRECTED]->(m)
RETURN a.name AS actor, m.title AS movie, d.name AS director;
- No execution difference between this and the single-path version; it is just broken up to keep complex queries readable
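To illustrate that the two forms describe the same pattern, a Python sketch over made-up data that computes the rows both ways, once by walking the single chain and once by matching the two comma-separated parts and joining them on the shared `m`, then checks that they agree. It models only the result, not Neo4j's execution plan.

```python
# Hypothetical data: node id -> display name, and (start, type, end) relationships.
names = {
    "keanu": "Keanu Reeves",
    "carrie": "Carrie-Anne Moss",
    "lana": "Lana Wachowski",
    "matrix": "The Matrix",
}
rels = [
    ("keanu", "ACTED_IN", "matrix"),
    ("carrie", "ACTED_IN", "matrix"),
    ("lana", "DIRECTED", "matrix"),
]


def chained(rels):
    """(a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(d): walk a -> m, then back along DIRECTED."""
    rows = []
    for a, t1, m in rels:
        if t1 != "ACTED_IN":
            continue
        for d, t2, m2 in rels:
            if t2 == "DIRECTED" and m2 == m:
                rows.append((names[a], names[m], names[d]))
    return rows


def comma_separated(rels):
    """(a)-[:ACTED_IN]->(m), (d)-[:DIRECTED]->(m): match each part, join on m."""
    acted = [(a, m) for a, t, m in rels if t == "ACTED_IN"]
    directed = [(d, m) for d, t, m in rels if t == "DIRECTED"]
    return [(names[a], names[m], names[d])
            for a, m in acted
            for d, m2 in directed
            if m == m2]


assert sorted(chained(rels)) == sorted(comma_separated(rels))
print(chained(rels))
```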
START a=node(*)
MATCH p=(a)-[:ACTED_IN]->(m), (d)-[:DIRECTED]->(m)
RETURN p;
- `p=` gives you the path as a variable you can apply functions to, like `length(p)`, `rels(p)` or `nodes(p)`
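A rough Python model of a path, assuming it is stored as an alternating list of nodes and relationships (a made-up representation, not Neo4j's). The helper names are hypothetical stand-ins for `nodes(p)`, `rels(p)` and `length(p)`; the point to notice is that `length(p)` counts relationships, not nodes.

```python
# Hypothetical path for (keanu)-[:ACTED_IN]->(matrix)<-[:DIRECTED]-(lana):
# node, relationship, node, relationship, node.
path = [
    "keanu",
    ("keanu", "ACTED_IN", "matrix"),
    "matrix",
    ("lana", "DIRECTED", "matrix"),  # relationships keep their own direction
    "lana",
]


def path_nodes(p):
    """Stand-in for nodes(p): the nodes, in path order."""
    return p[0::2]


def path_rels(p):
    """Stand-in for rels(p): the relationships, in path order."""
    return p[1::2]


def path_length(p):
    """Stand-in for length(p): the number of relationships."""
    return len(path_rels(p))


print(path_nodes(path))   # ['keanu', 'matrix', 'lana']
print(path_length(path))  # 2
```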
- can have multiple paths in your match
- Some actors are in multiple movies that were directed by the same director:
START a=node(*)
MATCH (a)-[:ACTED_IN]->(m), (d)-[:DIRECTED]->(m)
RETURN a.name, d.name, count(*);
- Aggregation automatically groups by the other (non-aggregated) fields being returned (good and bad)
- Can also count on an identifier, e.g. `count(m)`, and use `distinct` inside an aggregation, e.g. `count(distinct d)`
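A Python sketch of the implicit grouping, using `collections.Counter` on hypothetical, simplified (actor, movie, director) rows like the ones the `MATCH` above produces: returning `a.name, d.name, count(*)` groups by the two names, while `count(distinct d)` per actor counts unique directors.

```python
from collections import Counter

# Hypothetical, simplified (actor, movie, director) rows.
rows = [
    ("Keanu Reeves", "The Matrix", "Lana Wachowski"),
    ("Keanu Reeves", "The Matrix Reloaded", "Lana Wachowski"),
    ("Keanu Reeves", "John Wick", "Chad Stahelski"),
    ("Carrie-Anne Moss", "The Matrix", "Lana Wachowski"),
]

# RETURN a.name, d.name, count(*): the non-aggregated columns form the group key.
pair_counts = Counter((actor, director) for actor, _movie, director in rows)

# RETURN a.name, count(distinct d): group by actor only, count unique directors.
directors_by_actor = {}
for actor, _movie, director in rows:
    directors_by_actor.setdefault(actor, set()).add(director)
distinct_directors = {actor: len(ds) for actor, ds in directors_by_actor.items()}

print(pair_counts[("Keanu Reeves", "Lana Wachowski")])  # 2
print(distinct_directors)  # {'Keanu Reeves': 2, 'Carrie-Anne Moss': 1}
```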
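- Actors who directed a movie they also acted in (the identifier `a` appears on both ends of the pattern):
START a=node(*)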
MATCH a-[:ACTED_IN]->(m)<-[:DIRECTED]-(a)
RETURN a.name, m.title;
- The same query written from the director's side; it is equivalent (same pattern, different identifier name):
START d=node(*)
MATCH d-[:DIRECTED]->(m)<-[:ACTED_IN]-(d)
RETURN d.name, m.title;
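Why the two queries return the same rows: reusing one identifier forces both relationships to share that node, which amounts to intersecting the (person, movie) pairs of the two relationship types, and an intersection does not care which side you start from. A Python sketch with sample data (an in-memory model, not the Neo4j API):

```python
# Sample (start, type, end) relationships.
rels = [
    ("Clint Eastwood", "ACTED_IN", "Unforgiven"),
    ("Clint Eastwood", "DIRECTED", "Unforgiven"),
    ("Gene Hackman", "ACTED_IN", "Unforgiven"),
]

acted = {(a, m) for a, t, m in rels if t == "ACTED_IN"}
directed = {(d, m) for d, t, m in rels if t == "DIRECTED"}

# a-[:ACTED_IN]->(m)<-[:DIRECTED]-(a): same person on both ends.
actor_side = sorted(acted & directed)
# d-[:DIRECTED]->(m)<-[:ACTED_IN]-(d): written from the director's side.
director_side = sorted(directed & acted)

print(actor_side)  # [('Clint Eastwood', 'Unforgiven')]
```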
- Other aggregation functions: `count(x)`, `min(x)`, `max(x)`, `collect(x)`
- Top five actor/director pairs by number of shared movies:
START a=node(*)
MATCH a-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
RETURN a.name, d.name, count(*) AS count
ORDER BY count DESC
LIMIT 5;
- All nodes is `a=node(*)`, which is not ideal with large data
- We usually want a specific node; filtering with just `WHERE` is still not very efficient, since every node is still scanned
- So we look the node up through an index:
START tom=node:node_auto_index(name="Tom Hanks")
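A Python sketch of the difference, assuming a plain dictionary as the index: it stands in for what an index gives you (a direct lookup by property value) and is not how `node_auto_index` is implemented. The node store and its contents are hypothetical.

```python
# Hypothetical node store: node id -> properties.
nodes = {i: {"name": f"Person {i}"} for i in range(10_000)}
nodes[42] = {"name": "Tom Hanks"}


def find_by_scan(name):
    """Like START n=node(*) WHERE n.name = ...: checks every node."""
    return [nid for nid, props in nodes.items() if props["name"] == name]


# Simple name -> [node ids] index, built once from the store.
name_index = {}
for nid, props in nodes.items():
    name_index.setdefault(props["name"], []).append(nid)


def find_by_index(name):
    """Like START n=node:node_auto_index(name=...): a single lookup."""
    return name_index.get(name, [])


print(find_by_scan("Tom Hanks"), find_by_index("Tom Hanks"))  # [42] [42]
```

Both return the same node; the scan touches all 10,000 entries while the lookup touches one.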
- Trick to get the auto-indexer to pick up existing nodes if the index is created after the data (not necessary in 2.0):
START n=node(*) WHERE has(n.name) SET n.name=n.name;
- `START` can bind multiple nodes
- Precursor to six degrees of Kevin Bacon: movies that both Tom Hanks and Kevin Bacon acted in
START tom=node:node_auto_index(name="Tom Hanks"),
kevin=node:node_auto_index(name="Kevin Bacon")
MATCH tom-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(kevin)
RETURN DISTINCT movie.title;
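With both start nodes fixed, the pattern only has to find movies connected to each of them. A Python sketch over sample data (an in-memory model, not the Neo4j API), where using sets plays the role of `DISTINCT`:

```python
# Sample (actor, type, movie) relationships.
rels = [
    ("Tom Hanks", "ACTED_IN", "Apollo 13"),
    ("Tom Hanks", "ACTED_IN", "Forrest Gump"),
    ("Kevin Bacon", "ACTED_IN", "Apollo 13"),
    ("Kevin Bacon", "ACTED_IN", "Footloose"),
]


def shared_movies(tom, kevin, rels):
    """Model of MATCH tom-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(kevin) RETURN DISTINCT movie."""
    tom_movies = {m for a, t, m in rels if a == tom and t == "ACTED_IN"}
    kevin_movies = {m for a, t, m in rels if a == kevin and t == "ACTED_IN"}
    return sorted(tom_movies & kevin_movies)  # the set intersection is already distinct


print(shared_movies("Tom Hanks", "Kevin Bacon", rels))  # ['Apollo 13']
```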
- Can do a path check in `WHERE`, e.g. `WHERE (n)-[:DIRECTED]->()`
START a=node:node_auto_index(name="Gene Hackman")
MATCH (a)-[:ACTED_IN]->(m)<-[:ACTED_IN]-(n)
WHERE (n)-[:DIRECTED]->()
RETURN n.name;
- Above: people who have acted with Gene Hackman and have also directed a movie (any movie)
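A Python sketch of the query's logic over sample data: the `WHERE` pattern acts as an existence test (is there at least one outgoing `DIRECTED` relationship?), applied to each co-actor row. It is an in-memory model, not how Neo4j evaluates the predicate.

```python
# Sample (start, type, end) relationships.
rels = [
    ("Gene Hackman", "ACTED_IN", "Unforgiven"),
    ("Clint Eastwood", "ACTED_IN", "Unforgiven"),
    ("Clint Eastwood", "DIRECTED", "Unforgiven"),
    ("Morgan Freeman", "ACTED_IN", "Unforgiven"),
]


def has_outgoing(node, rel_type, rels):
    """Like WHERE (n)-[:TYPE]->(): true if at least one such relationship exists."""
    return any(start == node and t == rel_type for start, t, _end in rels)


def coactors_who_directed(actor, rels):
    """Model of MATCH (a)-[:ACTED_IN]->(m)<-[:ACTED_IN]-(n) WHERE (n)-[:DIRECTED]->() RETURN n.name."""
    rows = []
    for a, t1, m in rels:
        if a != actor or t1 != "ACTED_IN":
            continue
        for n, t2, m2 in rels:
            # Cypher won't reuse a's relationship for n, so n != a here.
            if t2 == "ACTED_IN" and m2 == m and n != actor:
                if has_outgoing(n, "DIRECTED", rels):  # the WHERE path check
                    rows.append(n)
    return rows


print(coactors_who_directed("Gene Hackman", rels))  # ['Clint Eastwood']
```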
- Usually, doing something like this in `WHERE` can only be worse for performance than expressing it in `MATCH`
- TIP: prefix a query with `profile` to see what it is doing; when comparing two queries, try to keep `_db_hits` low
- Can use `NOT` in a `WHERE`, e.g. Keanu Reeves' co-actors in movies that Hugo Weaving was not in:
START a=node:node_auto_index(name="Keanu Reeves"),
hugo=node:node_auto_index(name="Hugo Weaving")
MATCH (a)-[:ACTED_IN]->(m)<-[:ACTED_IN]-(n)
WHERE NOT((hugo)-[:ACTED_IN]->(m))
RETURN n.name, m.title;
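A Python sketch of the `WHERE NOT` filter over sample data: any movie the excluded actor appears in is dropped before co-actors are collected. The function name and data are illustrative only; this models the result, not Neo4j's execution.

```python
# Sample ACTED_IN pairs: (actor, movie).
acted_in = [
    ("Keanu Reeves", "The Matrix"),
    ("Hugo Weaving", "The Matrix"),
    ("Carrie-Anne Moss", "The Matrix"),
    ("Keanu Reeves", "Speed"),
    ("Sandra Bullock", "Speed"),
]


def coactors_without(actor, excluded, acted_in):
    """Model of:
    MATCH (a)-[:ACTED_IN]->(m)<-[:ACTED_IN]-(n)
    WHERE NOT((hugo)-[:ACTED_IN]->(m))
    RETURN n.name, m.title
    """
    rows = []
    for a, m in acted_in:
        if a != actor:
            continue
        if (excluded, m) in acted_in:  # WHERE NOT: skip movies the excluded actor was in
            continue
        for n, m2 in acted_in:
            if m2 == m and n != actor:
                rows.append((n, m))
    return rows


print(coactors_without("Keanu Reeves", "Hugo Weaving", acted_in))
# [('Sandra Bullock', 'Speed')]
```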
- Top five actors by number of movies:
START a=node(*)
MATCH (a)-[:ACTED_IN]->(m)
RETURN a.name, count(m) AS count
ORDER BY count DESC
LIMIT 5;
- Three actors who have acted in at least one movie without Keanu Reeves:
START keanu=node:node_auto_index(name="Keanu Reeves"), actor=node(*)
MATCH (actor)-[:ACTED_IN]->(m)
WHERE NOT((m)<-[:ACTED_IN]-(keanu))
RETURN DISTINCT actor.name LIMIT 3;
- The same, ranked by how many such movies each actor has:
START keanu=node:node_auto_index(name="Keanu Reeves"),
actor=node(*)
MATCH (actor)-[:ACTED_IN]->(m)
WHERE NOT((m)<-[:ACTED_IN]-(keanu))
RETURN DISTINCT actor.name, count(m) AS num_movies
ORDER BY num_movies DESC
LIMIT 3;
- Solution: actors who have acted with Keanu's co-actors but never with Keanu himself, ranked by the number of connecting paths
START keanu=node:node_auto_index(name="Keanu Reeves")
MATCH (keanu)-[:ACTED_IN]->()<-[:ACTED_IN]-(c),
(c)-[:ACTED_IN]->()<-[:ACTED_IN]-(coc)
WHERE NOT((keanu)-[:ACTED_IN]->()<-[:ACTED_IN]-(coc))
AND coc <> keanu
RETURN coc.name, count(coc)
ORDER BY count(coc) DESC
LIMIT 3;
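A Python sketch of the recommendation over hypothetical data: it counts paths person -> m1 <- c -> m2 <- coc, the way `count(coc)` counts matched rows, and applies the same two filters. It mirrors Cypher's rule that a relationship is not reused within one `MATCH`, which is why `m2 != m1` and `coc != c`; the names and data are made up.

```python
from collections import Counter

# Hypothetical (actor, movie) ACTED_IN pairs.
acted_in = [
    ("keanu", "m1"), ("carrie", "m1"),
    ("carrie", "m2"), ("joe", "m2"),
    ("keanu", "m3"), ("hugo", "m3"),
    ("hugo", "m4"), ("joe", "m4"),
    ("hugo", "m5"), ("ann", "m5"),
]

movies_of, cast_of = {}, {}
for actor, movie in acted_in:
    movies_of.setdefault(actor, set()).add(movie)
    cast_of.setdefault(movie, []).append(actor)


def recommend(person, limit=3):
    """Count paths person->m1<-c->m2<-coc where coc never acted with person."""
    direct = {c for m in movies_of[person] for c in cast_of[m] if c != person}
    counts = Counter()
    for m1 in movies_of[person]:
        for c in cast_of[m1]:
            if c == person:
                continue
            for m2 in movies_of[c]:
                # No relationship reuse: c's second ACTED_IN must be a different movie.
                if m2 == m1:
                    continue
                for coc in cast_of[m2]:
                    # coc != c for the same reason; then the WHERE NOT and <> filters.
                    if coc not in (c, person) and coc not in direct:
                        counts[coc] += 1
    return counts.most_common(limit)  # ORDER BY count DESC LIMIT n


print(recommend("keanu"))  # [('joe', 2), ('ann', 1)]
```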