joelash/gist:5767172

## gistfile1.md

      
    Neo4j in general


When you store a join-table structure in a graph database, you just drop the join table and use a direct relationship instead
Usually you start by drawing the graph on a whiteboard and then work from there
In general, for speed you want to avoid having millions of relationships on a single node; prefer more nodes instead (sometimes it is unavoidable; might be solved in 2.1)

Cypher


Commenting on something is modeled as (person) -commented-> (comment) -on-> (thing)
Cypher is the SQL-like query language, designed entirely for graphs, with pattern matching for nodes and relationships
a node is a circle and a relationship is an arrow, so a query is like ASCII art for this: (a) --> (b)

START a=node(*)
MATCH (a) --> (b)
RETURN a,b;


a is bound to all nodes; b is unbound until a relationship is followed, at which point it becomes bound to the matched node. Before the MATCH, a has meaning but b does not
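
To make the binding idea concrete, here is a minimal sketch in plain Python (not Cypher, so it runs without a database) of roughly what `START a=node(*) MATCH (a) --> (b)` does. The `nodes` list and `outgoing` map are made-up sample data, not the course data set, and a real engine plans this very differently.

```python
# Hypothetical sample graph: every node, plus outgoing relationships per node.
nodes = ["Keanu Reeves", "Hugo Weaving", "The Matrix", "Speed"]
outgoing = {
    "Keanu Reeves": ["The Matrix", "Speed"],
    "Hugo Weaving": ["The Matrix"],
}

def match_outgoing(nodes, outgoing):
    """Return one (a, b) row per outgoing relationship."""
    rows = []
    for a in nodes:                    # START a=node(*): a is bound to each node in turn
        for b in outgoing.get(a, []):  # b is bound only once a relationship is followed
            rows.append((a, b))
    return rows

rows = match_outgoing(nodes, outgoing)
print(rows)
```

Nodes with no outgoing relationship (the movies here) produce no rows, because b never gets bound for them.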


parentheses are optional except if leaving the node unnamed


can specify a relationship in brackets (a) -[r]-> ()


START a=node(*)
MATCH (a) -[r]-> ()
RETURN a.name, type(r);


above will return the type of the relationship r
can specify a type of the relationship with (a) -[:ACTED_IN]-> (m)

START a=node(*)
MATCH (a) -[:ACTED_IN]-> (m)
RETURN a.name, m.title;


above: return actors and movies that they ACTED_IN
can add an identifier to the relationship by putting it before the :

START a=node(*)
MATCH (a) -[r:ACTED_IN]-> (m)
RETURN a.name, r.roles, m.title;

More than 2 nodes

START a=node(*)
MATCH (a) -->(b)-->(c)
RETURN a,b,c;


can have paths coming into a node with (a)-->(b)<--(c)
can obviously be more specific

START a=node(*)
MATCH (a) -[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
RETURN a.name, m.title, d.name;


can alias the columns (AS is required)

START a=node(*)
MATCH (a) -[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
RETURN a.name AS actor, m.title AS movie, d.name AS director;


can separate the paths out, it becomes useful for more complex patterns

START a=node(*)
MATCH (a)-[:ACTED_IN]->(m), (d)-[:DIRECTED]->(m)
RETURN a.name AS actor, m.title AS movie, d.name AS director;


No execution difference between this and the other way, just broken up for complex queries

Paths in matches

START a=node(*)
MATCH p=(a)-[:ACTED_IN]->(m), (d)-[:DIRECTED]->(m)
RETURN p;


gives you the path as a variable, which you can pass to functions like length(p), rels(p), or nodes(p)
can have multiple paths in your match

Aggregation


some actors are in multiple movies that were directed by the same director

START a=node(*)
MATCH (a)-[:ACTED_IN]->(m), (d)-[:DIRECTED]->(m)
RETURN a.name, d.name, count(*);


automatically aggregates based on the other fields that are being returned (good and bad)
can also count on an identifier, count(m), and can use distinct in an aggregation, i.e. count(distinct d)
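
The implicit grouping can be pictured with a small Python sketch: the non-aggregated return columns act as the grouping key. The `matches` rows below are hypothetical sample output of the MATCH, one row per matched movie, and the variable names are mine.

```python
from collections import Counter

# Hypothetical (actor, director) rows, one per matched movie.
matches = [
    ("Keanu Reeves", "Lana Wachowski"),
    ("Keanu Reeves", "Lana Wachowski"),
    ("Keanu Reeves", "Jan de Bont"),
    ("Tom Hanks", "Ron Howard"),
]

# RETURN a.name, d.name, count(*): group by the (actor, director) pair.
counts = Counter(matches)

# RETURN a.name, count(distinct d): group by actor, count unique directors.
directors_per_actor = {}
for actor, director in matches:
    directors_per_actor.setdefault(actor, set()).add(director)
distinct_counts = {actor: len(ds) for actor, ds in directors_per_actor.items()}

print(counts)
print(distinct_counts)
```

Dropping d.name from the RETURN changes the grouping key, which is the "good and bad" part: the result shape depends on which columns happen to be returned.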

LAB: which directors also acted in their movies

START a=node(*)
MATCH a-[:ACTED_IN]->(m)<-[:DIRECTED]-(a)
RETURN a.name, m.title;

or maybe this, though I think the two are equivalent:
START d=node(*)
MATCH d-[:DIRECTED]->(m)<-[:ACTED_IN]-(d)
RETURN d.name, m.title;


others: count(x), min(x), max(x), collect(x)

Sort and Limit

START a=node(*)
MATCH a-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
RETURN a.name, d.name, count(*) AS count
ORDER BY count DESC
LIMIT 5;

Starting somewhere


all nodes is a=node(*), which is not ideal with large data


we want to start from a specific node; filtering with WHERE alone is still not very efficient


so we use an index: START tom=node:node_auto_index(name="Tom Hanks")


trick to get the auto-indexer to pick up existing nodes if the index was created after the data (not necessary in 2.0): start n=node(*) where has(n.name) set n.name=n.name;


start with multiple nodes


precursor to 6 degrees of Kevin Bacon


START tom=node:node_auto_index(name="Tom Hanks"),
kevin=node:node_auto_index(name="Kevin Bacon")
MATCH tom-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(kevin)
RETURN DISTINCT movie.title;

constraints based on patterns


can do path check in WHERE: WHERE (n)-[:DIRECTED]->()

START a=node:node_auto_index(name="Gene Hackman")
MATCH (a)-[:ACTED_IN]->(m)<-[:ACTED_IN]-(n)
WHERE (n)-[:DIRECTED]->()
RETURN n.name;


above: people who have acted with Gene Hackman and have also directed a movie (any)


usually, doing something like this in WHERE can only be worse (slower) than expressing it in MATCH


TIP: add profile before a query to see what it is doing (when comparing two queries, try to keep _db_hits low)


can have a NOT in a WHERE


START a=node:node_auto_index(name="Keanu Reeves"),
hugo=node:node_auto_index(name="Hugo Weaving")
MATCH (a)-[:ACTED_IN]->(m)<-[:ACTED_IN]-(n)
WHERE NOT((hugo)-[:ACTED_IN]->(m))
RETURN n.name, m.title;

LAB: who are the 5 busiest actors

START a=node(*) MATCH (a)-[:ACTED_IN]->(m) RETURN a.name, count(m) AS count ORDER BY count DESC LIMIT 5;

LAB: recommend 3 actors that Keanu Reeves should work with, but has not.

START keanu=node:node_auto_index(name="Keanu Reeves"), actor=node(*)
MATCH (actor)-[:ACTED_IN]->(m)
WHERE NOT((m)<-[:ACTED_IN]-(keanu))
RETURN DISTINCT actor.name LIMIT 3;

START keanu=node:node_auto_index(name="Keanu Reeves"),
actor=node(*)
MATCH (actor)-[:ACTED_IN]->(m)
WHERE NOT((m)<-[:ACTED_IN]-(keanu))
RETURN DISTINCT actor.name, count(m) AS num_movies
ORDER BY num_movies DESC
LIMIT 3;


solution: actors that have acted with Keanu and actors they've acted with

START keanu=node:node_auto_index(name="Keanu Reeves")
MATCH (keanu)-[:ACTED_IN]->()<-[:ACTED_IN]-(c),
(c)-[:ACTED_IN]->()<-[:ACTED_IN]-(coc)
WHERE NOT((keanu)-[:ACTED_IN]->()<-[:ACTED_IN]-(coc))
AND coc <> keanu
RETURN coc.name, count(coc)
ORDER BY count(coc) DESC
LIMIT 3;
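
As a rough cross-check of the solution query above, here is a plain-Python sketch of the same idea: walk from the person to co-actors, then to their co-actors, skip anyone already worked with, and rank by the number of connecting paths. The `cast` data and actor names are hypothetical, and the path counting only approximates how Cypher would count rows.

```python
from collections import Counter

# Hypothetical cast lists: movie -> set of actors.
cast = {
    "M1": {"Keanu", "Carrie-Anne"},
    "M2": {"Carrie-Anne", "Guy"},
    "M3": {"Carrie-Anne", "Guy", "Joe"},
    "M4": {"Keanu", "Hugo"},
    "M5": {"Hugo", "Guy"},
}

def recommend(person, limit=3):
    """Rank co-actors of co-actors that person has not acted with."""
    # Everyone who already shares a movie with person.
    direct = {a for actors in cast.values() if person in actors for a in actors}
    direct.discard(person)

    counts = Counter()
    for m1, actors1 in cast.items():
        if person not in actors1:
            continue
        for c in actors1 - {person}:           # co-actor via movie m1
            for m2, actors2 in cast.items():
                if m2 == m1 or c not in actors2:
                    continue                   # c must reach coc through another movie
                for coc in actors2 - {c}:
                    if coc != person and coc not in direct:
                        counts[coc] += 1       # one connecting path found
    return counts.most_common(limit)

recommendations = recommend("Keanu")
print(recommendations)
```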


## gistfile2.md

      
    Updating with Cypher


create a node CREATE ({title: "Mystic River", release: 1993});
update a node START m=node:node_auto_index(title="Mystic River") SET m.tagline = "We bury our sins here, Dave." RETURN m;
create unique relationships between two nodes

START movie=node:node_auto_index(title="Mystic River"),
kevin=node:node_auto_index(name="Kevin Bacon")
CREATE UNIQUE (kevin)-[r:ACTED_IN {roles:["Sean"]}]->(movie)
RETURN kevin;

LAB: Change Kevin Bacon's roles in Mystic River from "Sean" to "Sean Devine"


easy way, just re-setting the collection

START movie=node:node_auto_index(title="Mystic River"),
kevin=node:node_auto_index(name="Kevin Bacon")
MATCH (kevin)-[r:ACTED_IN]->(movie)
SET r.roles = ["Sean Devine"]
RETURN r;


what about replacing only the bad role, without resetting the whole collection

START movie=node:node_auto_index(title="Mystic River"),
kevin=node:node_auto_index(name="Kevin Bacon")
MATCH (kevin)-[r:ACTED_IN]->(movie)
SET r.roles = filter(n in r.roles : n <> "Sean") + "Sean Devine"
RETURN r;


filter will return the roles that match the predicate
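
For readers more used to general-purpose languages, the filter-and-append step corresponds roughly to a list comprehension. This is a Python sketch with a hypothetical `roles` list, not something the database runs.

```python
# Hypothetical current value of r.roles.
roles = ["Sean", "Narrator"]

# filter(n in r.roles : n <> "Sean") + "Sean Devine"
updated_roles = [role for role in roles if role != "Sean"] + ["Sean Devine"]

print(updated_roles)
```

Any other roles in the collection survive, which is the difference from simply overwriting r.roles.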

LAB: Add Clint Eastwood as the Director of Mystic River

START movie=node:node_auto_index(title="Mystic River"),
clint=node:node_auto_index(name="Clint Eastwood")
CREATE UNIQUE (clint)-[r:DIRECTED]->(movie)
RETURN r;

LAB: List all the characters in the movie "The Matrix"

START movie=node:node_auto_index(title="The Matrix")
MATCH (movie)<-[r:ACTED_IN]-()
RETURN r.roles;


remove a node

// won't work because the node still has relationships
START emil=node:node_auto_index(name="Emil Eifrem")
DELETE emil;

so delete the relationships together with the node:
START emil=node:node_auto_index(name="Emil Eifrem")
MATCH (emil)-[r?]->()
DELETE r, emil;


[r?] will get optional relationships, meaning he doesn't have to have any

Lab: Add KNOWS relationships between all actors who were in the same movie

START a=node(*)
MATCH (a)-[:ACTED_IN]->()<-[:ACTED_IN]-(b)
CREATE UNIQUE (a)-[:KNOWS]->(b);


adds 2 relationships per pair, a->b and b->a; if you leave the arrow off it has no direction, but then you cannot MATCH it with an arrow


add directors into the KNOWS relations: START a=node(*) MATCH (a)-[:ACTED_IN|DIRECTED]->()<-[:ACTED_IN|DIRECTED]-(b) CREATE UNIQUE (a)-[:KNOWS]->(b);


Variable Length paths


query for variable length: (a)-[*1..3]->(b)
Friends of friends

START keanu=node:node_auto_index(name="Keanu Reeves")
MATCH (keanu)-[:KNOWS*2]->(fof)
RETURN DISTINCT fof.name;


exactly 2 is *2, * is unlimited
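
A small Python sketch of what a hop range such as `*2` or `*1..3` means. The `knows` map is hypothetical sample data, and the function is an illustration of the semantics rather than how Neo4j traverses.

```python
# Hypothetical directed KNOWS relationships.
knows = {
    "Keanu": ["Carrie-Anne", "Hugo"],
    "Carrie-Anne": ["Guy"],
    "Hugo": ["Guy", "Keanu"],
    "Guy": [],
}

def reachable(start, min_hops, max_hops):
    """End nodes of paths from start that use min_hops..max_hops relationships."""
    found = set()
    frontier = [(start, ())]  # (current node, relationships used so far)
    for hops in range(1, max_hops + 1):
        next_frontier = []
        for node, used in frontier:
            for neighbor in knows.get(node, []):
                rel = (node, neighbor)
                if rel in used:
                    continue  # do not traverse the same relationship twice in one path
                next_frontier.append((neighbor, used + (rel,)))
                if hops >= min_hops:
                    found.add(neighbor)
        frontier = next_frontier
    return found

friends_of_friends = reachable("Keanu", 2, 2)  # like -[:KNOWS*2]->
print(friends_of_friends)
```

In this sample the two-hop result contains the start node itself (Keanu -> Hugo -> Keanu), which is why queries like this often add a check that the two ends differ.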

Lab: Return friends-of-friends who are not immediate friends

START a=node:node_auto_index(name="Keanu Reeves")
MATCH (a)-[:KNOWS*2]->(b)
WHERE NOT((a)-[:KNOWS]-(b))
AND a<>b
RETURN DISTINCT b.name;


bacon number for charlize

START bacon=node:node_auto_index(name="Kevin Bacon"),
charlize=node:node_auto_index(name="Charlize Theron")
MATCH p=shortestPath((charlize)-[:KNOWS*]->(bacon))
RETURN length(p);
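
Conceptually, the Bacon number is the length of a shortest path, which a breadth-first search finds. The sketch below is plain Python over a hypothetical `knows` map with placeholder actor names; Neo4j's own shortestPath implementation may work differently.

```python
from collections import deque

# Hypothetical directed KNOWS relationships.
knows = {
    "Charlize Theron": ["Actor A", "Actor C"],
    "Actor A": ["Actor B"],
    "Actor B": ["Kevin Bacon"],
    "Actor C": ["Kevin Bacon"],
}

def shortest_path(start, goal):
    """Breadth-first search; returns the nodes of one shortest path, or None."""
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in knows.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

path = shortest_path("Charlize Theron", "Kevin Bacon")
bacon_number = len(path) - 1  # length(p) counts relationships, not nodes
print(path, bacon_number)
```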

Lab: return the names of the people joining Charlize to Kevin


my solution

START bacon=node:node_auto_index(name="Kevin Bacon"),
other=node:node_auto_index(name="Charlize Theron")
MATCH p=shortestPath((other)-[:KNOWS*]->(bacon))
RETURN extract(n in nodes(p) : n.name);

the "actual" solution (not really different, just the same solution with movie titles included too):
START bacon=node:node_auto_index(name="Kevin Bacon"),
other=node:node_auto_index(name="Charlize Theron")
MATCH p=shortestPath((other)-[:ACTED_IN|DIRECTED*]-(bacon))
RETURN extract(n in nodes(p) : coalesce(n.title?, n.name?));


leaving off the > after ACTED_IN|DIRECTED allows the query to go in either direction over that relationship
in WHERE: a comparison on name? defaults to true when the property doesn't exist, and on name! defaults to false
in RETURN it doesn't matter
can also use has(n.name), which is like n.name!
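
The difference between the two suffixes, and what coalesce does in the query above, can be sketched in Python. The helper names `prop_equals` and `coalesce` and the sample dicts are mine, used only to illustrate the behavior described in these notes.

```python
def prop_equals(node, key, value, missing):
    """Compare node[key] to value; 'missing' is the result when the property is absent."""
    if key not in node:
        return missing
    return node[key] == value

def coalesce(*values):
    """Return the first value that is not None, or None if there is none."""
    return next((v for v in values if v is not None), None)

# Hypothetical nodes: a movie has a title but no name, a person the reverse.
movie = {"title": "The Matrix"}
person = {"name": "Keanu Reeves"}

lenient = prop_equals(movie, "name", "Keanu Reeves", missing=True)   # like n.name? = "..."
strict = prop_equals(movie, "name", "Keanu Reeves", missing=False)   # like n.name! = "..."

# Like coalesce(n.title?, n.name?): use whichever property the node has.
labels = [coalesce(n.get("title"), n.get("name")) for n in (person, movie)]

print(lenient, strict, labels)
```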


## gistfile3.md

      
    More about Neo4j

APIs


Cypher
REST
Plugin API for special cases

Licensing options


community
advanced
partnership

get involved


stack overflow
google group
GitHub issues
meetups

2.0 overview


http://wes.skeweredrook.com/graphdb-meetup-may-2013.pdf

labels


can have more than one per node
similar to a tag concept
can index on labels, so don't need to specify automatic indexes
labels must be strings

"Schema" indexes


can create on the fly
queries take advantage of these automatically

Cypher syntax

query


old syntax

START a = node:node_auto_index("name:*")
MATCH a-[:ACTED_IN]->m
RETURN a.name, m.title;


new

MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
RETURN a.name, m.title;


Actor and Movie are labels on the nodes

index syntax


old

START a=node:node_auto_index(name="Andres")
RETURN a;


new

MATCH (a:Actor)
WHERE a.name = 'Andres'
RETURN a;


can still use "legacy" indexes if you want, but cannot mix with labels

MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Director)
USING INDEX d:Director(name)
WHERE a.name = "" AND d.name = ""
RETURN a,m,d;


USING INDEX is a hint telling it where to start because we know it's quicker/smaller index
creating an index CREATE INDEX on :Actor(name);
creates in the background

Adding a label

MATCH a-[r:ACTED_IN]->m
SET a:Actor, m:Movie;