@joelash
Last active December 18, 2015 10:18
notes from Neo4j tutorial on 6/12/2013

Neo4j in general

  • When you store a join-table structure in a graph database, you just drop the join table and use a direct relationship instead
  • Usually start with drawing the graph on a whiteboard and then work from there
  • in general, for speed, avoid millions of relationships off a single node; prefer lots of nodes instead (sometimes it is just necessary; might be solved in 2.1)

Cypher

  • Commenting on something has (person) -commented-> (comment) -on-> (thing)
  • Cypher is the SQL-like query language, designed entirely for graphs, with pattern matching for nodes and relationships
  • node is a circle and relationship is an arrow, so query is like ascii art for this (a) --> (b)
START a=node(*)
MATCH (a) --> (b)
RETURN a,b;
  • a is bound to all nodes; b is unbound until the pattern is scanned, at which point it is bound to a matched node. Before the MATCH, a has meaning but b does not

  • parentheses are optional except if leaving the node unnamed

  • can specify a relationship in brackets (a) -[r]-> ()

START a=node(*)
MATCH (a) -[r]-> ()
RETURN a.name, type(r);
  • above will return the type of the relationship r
  • can specify a type of the relationship with (a) -[:ACTED_IN]-> (m)
START a=node(*)
MATCH (a) -[:ACTED_IN]-> (m)
RETURN a.name, m.title;
  • above: return actors and movies that they ACTED_IN
  • can add an identifier to the relationship by putting it before the :
START a=node(*)
MATCH (a) -[r:ACTED_IN]-> (m)
RETURN a.name, r.roles, m.title;
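
As a rough mental model of what the typed pattern above does, here is a minimal Python sketch. It is not how Neo4j evaluates queries; the tuple layout, the `match_typed` helper and the toy data are all assumptions made for illustration.

```python
# Toy graph: each relationship is (start_node, type, properties, end_node).
# The data is illustrative only, not the tutorial's dataset.
keanu = {"name": "Keanu Reeves"}
hugo = {"name": "Hugo Weaving"}
director = {"name": "Some Director"}  # hypothetical node
matrix = {"title": "The Matrix"}

relationships = [
    (keanu, "ACTED_IN", {"roles": ["Neo"]}, matrix),
    (hugo, "ACTED_IN", {"roles": ["Agent Smith"]}, matrix),
    (director, "DIRECTED", {}, matrix),
]


def match_typed(rels, rel_type):
    """Yield (a, r, m) bindings for the pattern (a)-[r:rel_type]->(m)."""
    for start, kind, props, end in rels:
        if kind == rel_type:
            # a, r and m become bound only once a relationship matches
            yield start, props, end


# Equivalent of: RETURN a.name, r.roles, m.title
rows = [(a["name"], r["roles"], m["title"])
        for a, r, m in match_typed(relationships, "ACTED_IN")]
```

The DIRECTED relationship is skipped because its type does not match, which is all that `[:ACTED_IN]` adds over a bare `-->`.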

More than 2 nodes

START a=node(*)
MATCH (a) -->(b)-->(c)
RETURN a,b,c;
  • can have paths coming into a node with (a)-->(b)<--(c)
  • can obviously be more specific
START a=node(*)
MATCH (a) -[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
RETURN a.name, m.title, d.name;
  • can alias the columns (AS is required)
START a=node(*)
MATCH (a) -[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
RETURN a.name AS actor, m.title AS movie, d.name AS director;
  • can separate the paths out, it becomes useful for more complex patterns
START a=node(*)
MATCH (a)-[:ACTED_IN]->(m), (d)-[:DIRECTED]->(m)
RETURN a.name AS actor, m.title AS movie, d.name AS director;
  • No execution difference between this and the other way, just broken up for complex queries

Paths in matches

START a=node(*)
MATCH p=(a)-[:ACTED_IN]->(m), (d)-[:DIRECTED]->(m)
RETURN p;
  • gives you the path as a variable that you can pass to functions like length(p), rels(p) or nodes(p)
  • can have multiple paths in your match

Aggregation

  • some actors are in multiple movies that were directed by the same director
START a=node(*)
MATCH (a)-[:ACTED_IN]->(m), (d)-[:DIRECTED]->(m)
RETURN a.name, d.name, count(*);
  • automatically aggregates based on the other fields that are being returned (good and bad)
  • can also count on an identifier, count(m); can also use distinct in an aggregation, e.g. count(distinct d)
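
The implicit grouping can be pictured with a small Python sketch. The rows below are hypothetical placeholders, and this only mirrors the semantics described in the notes, not Neo4j's implementation.

```python
from collections import Counter

# Hypothetical (actor, director, movie) rows, one per matched pattern.
matches = [
    ("Actor A", "Director X", "Movie 1"),
    ("Actor A", "Director X", "Movie 2"),
    ("Actor A", "Director Y", "Movie 3"),
    ("Actor B", "Director X", "Movie 1"),
]

# RETURN a.name, d.name, count(*):
# the non-aggregated columns act as the grouping key.
counts = Counter((actor, director) for actor, director, _movie in matches)

# RETURN a.name, count(distinct d):
# duplicates are collapsed per group before counting.
directors_per_actor = {}
for actor, director, _movie in matches:
    directors_per_actor.setdefault(actor, set()).add(director)
distinct_directors = {a: len(ds) for a, ds in directors_per_actor.items()}
```

Dropping `d.name` from the RETURN changes the grouping key, which is why the automatic behaviour is both convenient and easy to get wrong.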

LAB: which directors also acted in their movies

START a=node(*)
MATCH a-[:ACTED_IN]->(m)<-[:DIRECTED]-(a)
RETURN a.name, m.title;

or maybe, but I think they're the same

START d=node(*)
MATCH d-[:DIRECTED]->(m)<-[:ACTED_IN]-(d)
RETURN d.name, m.title;
  • other aggregation functions: count(x), min(x), max(x), collect(x)

Sort and Limit

START a=node(*)
MATCH a-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
RETURN a.name, d.name, count(*) AS count
ORDER BY count DESC
LIMIT 5;

Starting somewhere

  • all nodes is a=node(*) which is not ideal with large data

  • we want a specific node; scanning all nodes and filtering with WHERE is still not very efficient

  • so we use index START tom=node:node_auto_index(name="Tom Hanks")

  • trick to get the auto-indexer to pick up existing nodes if the index was created after the data (not necessary in 2.0): START n=node(*) WHERE has(n.name) SET n.name=n.name;

  • start with multiple nodes

  • precursor to 6 degrees of Kevin Bacon

START tom=node:node_auto_index(name="Tom Hanks"),
kevin=node:node_auto_index(name="Kevin Bacon")
MATCH tom-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(kevin)
RETURN DISTINCT movie.title;

Constraints based on patterns

  • can do path check in WHERE: WHERE (n)-[:DIRECTED]->()
START a=node:node_auto_index(name="Gene Hackman")
MATCH (a)-[:ACTED_IN]->(m)<-[:ACTED_IN]-(n)
WHERE (n)-[:DIRECTED]->()
RETURN n.name;
  • above: people who have acted with Gene Hackman and have also directed a movie (any)

  • usually, doing something like that in WHERE can only be worse

  • TIP: put profile in front of a query to see what it is doing (when comparing 2 queries, try to keep _db_hits low)

  • can have a NOT in a WHERE

START a=node:node_auto_index(name="Keanu Reeves"),
hugo=node:node_auto_index(name="Hugo Weaving")
MATCH (a)-[:ACTED_IN]->(m)<-[:ACTED_IN]-(n)
WHERE NOT((hugo)-[:ACTED_IN]->(m))
RETURN n.name, m.title;

LAB: who are the 5 busiest actors

START a=node(*)
MATCH (a)-[:ACTED_IN]->(m)
RETURN a.name, count(m) AS count
ORDER BY count DESC
LIMIT 5;

LAB: recommend 3 actors that Keanu Reeves should work with, but has not.

START keanu=node:node_auto_index(name="Keanu Reeves"), actor=node(*)
MATCH (actor)-[:ACTED_IN]->(m)
WHERE NOT((m)<-[:ACTED_IN]-(keanu))
RETURN DISTINCT actor.name LIMIT 3;
  • second attempt: same, but ranked by number of movies
START keanu=node:node_auto_index(name="Keanu Reeves"),
actor=node(*)
MATCH (actor)-[:ACTED_IN]->(m)
WHERE NOT((m)<-[:ACTED_IN]-(keanu))
RETURN DISTINCT actor.name, count(m) AS num_movies
ORDER BY num_movies DESC
LIMIT 3;
  • solution: actors that have acted with Keanu and actors they've acted with
START keanu=node:node_auto_index(name="Keanu Reeves")
MATCH (keanu)-[:ACTED_IN]->()<-[:ACTED_IN]-(c),
(c)-[:ACTED_IN]->()<-[:ACTED_IN]-(coc)
WHERE NOT((keanu)-[:ACTED_IN]->()<-[:ACTED_IN]-(coc))
AND coc <> keanu
RETURN coc.name, count(coc)
ORDER BY count(coc) DESC
LIMIT 3;
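
The co-actors-of-co-actors idea can be sketched in Python as follows. This is an approximation of the query's logic under stated assumptions: the `casts` data and the `recommend` helper are hypothetical, and the counts are per (co-actor, movie) link rather than exactly per Cypher path.

```python
from collections import Counter

# Hypothetical cast lists keyed by movie; names are placeholders.
casts = {
    "M1": ["keanu", "a"],
    "M2": ["a", "b"],
    "M3": ["a", "b", "c"],
    "M4": ["keanu", "c"],
    "M5": ["c", "d"],
}


def costars(casts, person):
    """Return one (movie, other) pair per shared movie appearance."""
    return [(movie, other)
            for movie, cast in casts.items() if person in cast
            for other in cast if other != person]


def recommend(casts, person, limit=3):
    """Rank co-actors of co-actors that `person` has not worked with yet."""
    direct = {other for _movie, other in costars(casts, person)}
    scores = Counter()
    for _movie, c in costars(casts, person):
        for _movie2, coc in costars(casts, c):
            # mirrors: WHERE NOT (keanu)-...-(coc) AND coc <> keanu
            if coc != person and coc not in direct:
                scores[coc] += 1
    return scores.most_common(limit)


suggestions = recommend(casts, "keanu")
```

A higher score means more connections through existing co-actors, which is the same ranking signal as `count(coc)` in the query above.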

Updating with Cypher

  • create a node CREATE ({title: "Mystic River", release: 2003});
  • update a node START m=node:node_auto_index(title="Mystic River") SET m.tagline = "We bury our sins here, Dave." RETURN m;
  • create unique relationships between two nodes
START movie=node:node_auto_index(title="Mystic River"),
kevin=node:node_auto_index(name="Kevin Bacon")
CREATE UNIQUE (kevin)-[r:ACTED_IN {roles:["Sean"]}]->(movie)
RETURN kevin;

LAB: Change Kevin Bacon's roles in Mystic River from "Sean" to "Sean Devine"

  • easy way, just re-setting the collection
START movie=node:node_auto_index(title="Mystic River"),
kevin=node:node_auto_index(name="Kevin Bacon")
MATCH (kevin)-[r:ACTED_IN]->(movie)
SET r.roles = ["Sean Devine"]
RETURN r;
  • what about doing it without removing the other roles
START movie=node:node_auto_index(title="Mystic River"),
kevin=node:node_auto_index(name="Kevin Bacon")
MATCH (kevin)-[r:ACTED_IN]->(movie)
SET r.roles = filter(n in r.roles : n <> "Sean") + "Sean Devine"
RETURN r;
  • filter returns the roles that match the predicate
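
In Python terms the filter-then-append step is roughly a list comprehension plus concatenation. The `replace_role` helper and the extra "Narrator" role are made up for illustration.

```python
def replace_role(roles, old, new):
    """Drop `old` from the list, keep every other role, then append `new`."""
    return [role for role in roles if role != old] + [new]


# "Narrator" is a hypothetical second role that must survive the update.
updated = replace_role(["Sean", "Narrator"], "Sean", "Sean Devine")
```

Unlike re-setting the whole collection, this keeps any roles other than the one being replaced.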

LAB: Add Clint Eastwood as the Director of Mystic River

START movie=node:node_auto_index(title="Mystic River"),
clint=node:node_auto_index(name="Clint Eastwood")
CREATE UNIQUE (clint)-[r:DIRECTED]->(movie)
RETURN r;

LAB: List all the characters in the movie "The Matrix"

START movie=node:node_auto_index(title="The Matrix")
MATCH (movie)<-[r:ACTED_IN]-()
RETURN r.roles;
  • remove a node
// won't work because the node still has relationships
START emil=node:node_auto_index(name="Emil Eifrem")
DELETE emil;

so delete the relationships along with the node:

START emil=node:node_auto_index(name="Emil Eifrem")
MATCH (emil)-[r?]->()
DELETE r, emil;
  • [r?] matches the relationship optionally, meaning the node doesn't have to have any

Lab: Add KNOWS relationships between all actors who were in the same movie

START a=node(*)
MATCH (a)-[:ACTED_IN]->()<-[:ACTED_IN]-(b)
CREATE UNIQUE (a)-[:KNOWS]->(b);
  • adds 2 relationships per pair, a->b and b->a; if you leave the arrow off, the relationship has no direction, but then you cannot MATCH it with an arrow

  • add directors into the KNOWS relations: START a=node(*) MATCH (a)-[:ACTED_IN|DIRECTED]->()<-[:ACTED_IN|DIRECTED]-(b) CREATE UNIQUE (a)-[:KNOWS]->(b);
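
A small Python sketch of why each pair ends up with two KNOWS relationships and why re-running the query adds nothing. A set stands in for CREATE UNIQUE here; the cast data is hypothetical.

```python
from itertools import permutations

# Hypothetical cast lists keyed by movie title.
casts = {
    "Movie 1": ["A", "B", "C"],
    "Movie 2": ["B", "C"],
}

knows = set()  # the set ignores duplicates, like CREATE UNIQUE


def add_knows(casts, knows):
    """Add a directed (a, b) pair for every two people sharing a movie."""
    for cast in casts.values():
        # the pattern matches with a and b swapped too, so both
        # (a, b) and (b, a) are produced for each pair
        for a, b in permutations(cast, 2):
            knows.add((a, b))


add_knows(casts, knows)
size_after_first_run = len(knows)
add_knows(casts, knows)  # running it again changes nothing
```

B and C share two movies but still get only one relationship in each direction.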

Variable Length paths

  • query for variable length (a)-[*1..3]->(b)
  • Friends of friends
START keanu=node:node_auto_index(name="Keanu Reeves")
MATCH (keanu)-[:KNOWS*2]->(fof)
RETURN DISTINCT fof.name;
  • exactly 2 is *2, * is unlimited

Lab: Return friends-of-friends who are not immediate friends

START a=node:node_auto_index(name="Keanu Reeves")
MATCH (a)-[:KNOWS*2]->(b)
WHERE NOT((a)-[:KNOWS]-(b))
AND a<>b
RETURN DISTINCT b.name;
  • Bacon number for Charlize
START bacon=node:node_auto_index(name="Kevin Bacon"),
charlize=node:node_auto_index(name="Charlize Theron")
MATCH p=shortestPath((charlize)-[:KNOWS*]->(bacon))
RETURN length(p);
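
For intuition, a shortest path over KNOWS can be found with a plain breadth-first search. This is a generic sketch and not a description of how Neo4j implements shortestPath; the adjacency data and intermediate names are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; return the nodes of one shortest path, or None."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical KNOWS adjacency lists.
knows = {
    "charlize": ["x"],
    "x": ["charlize", "y"],
    "y": ["x", "bacon"],
    "bacon": ["y"],
}

path = shortest_path(knows, "charlize", "bacon")
hops = len(path) - 1  # same idea as length(p): relationships, not nodes
```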

Lab: return the names of people joining Charlize to Kevin

  • my solution
START bacon=node:node_auto_index(name="Kevin Bacon"),
other=node:node_auto_index(name="Charlize Theron")
MATCH p=shortestPath((other)-[:KNOWS*]->(bacon))
RETURN extract(n in nodes(p) : n.name);

the actual solution (not really different, just the same solution with movie titles too):

START bacon=node:node_auto_index(name="Kevin Bacon"),
other=node:node_auto_index(name="Charlize Theron")
MATCH p=shortestPath((other)-[:ACTED_IN|DIRECTED*]-(bacon))
RETURN extract(n in nodes(p) : coalesce(n.title?, n.name?));
  • leaving off > in ACTED_IN|DIRECTED allows the query to go either direction over that relationship
  • in WHERE: name? defaults to true when the property doesn't exist, name! defaults to false
  • in RETURN it doesn't matter
  • can also use has(n.name) which is like n.name!
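
The difference between the two suffixes can be sketched in Python as a comparison with a configurable default for missing properties. `property_equals` is a hypothetical helper, and this reflects the pre-2.0 behaviour as described in the notes above.

```python
def property_equals(node, key, value, missing):
    """Compare node[key] to value; return `missing` if the property is absent."""
    if key not in node:
        return missing
    return node[key] == value


# One node with a name, one without (a movie only has a title).
nodes = [{"name": "Kevin Bacon"}, {"title": "Mystic River"}]

# like WHERE n.name? = "Kevin Bacon": a missing property counts as a match
lenient = [n for n in nodes if property_equals(n, "name", "Kevin Bacon", True)]

# like WHERE n.name! = "Kevin Bacon": a missing property counts as no match
strict = [n for n in nodes if property_equals(n, "name", "Kevin Bacon", False)]
```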

More about Neo4j

APIs

  1. Cypher
  2. REST
  3. Plugin API for special cases

Licensing options

  • community
  • advanced
  • partnership

get involved

  • stack overflow
  • google group
  • GitHub issues
  • meetups

2.0 overview

labels

  • can have more than one per node
  • similar to a tag concept
  • can index on labels, so don't need to specify automatic indexes
  • labels must be strings

"Schema" indexes

  • can create on the fly
  • queries take advantage of these automatically

Cypher syntax

query

  • old syntax
START a = node:node_auto_index("name:*")
MATCH a-[:ACTED_IN]->m
RETURN a.name, m.title;
  • new
MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
RETURN a.name, m.title;
  • Actor and Movie are labels on the nodes

index syntax

  • old
START a=node:node_auto_index(name="Andres")
RETURN a;
  • new
MATCH (a:Actor)
WHERE a.name = 'Andres'
RETURN a;
  • can still use "legacy" indexes if you want, but cannot mix with labels
MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Director)
USING INDEX d:Director(name)
WHERE a.name = "" AND d.name = ""
RETURN a,m,d;
  • USING INDEX is a hint telling it where to start because we know it's quicker/smaller index
  • creating an index: CREATE INDEX ON :Actor(name);
  • creates in the background

Adding a label

MATCH a-[r:ACTED_IN]->m
SET a:Actor, m:Movie;