jexp/cypher-vs-sql.adoc Secret

## cypher-vs-sql.adoc

      
    Raw
  

              cypher-vs-sql.adoc
            
          
    Cypher vs. SQL


10 Things that are easier with Cypher than SQL


This guide is meant to highlight some of the powers of Cypher, not to belittle/diminish SQL, which is a very powerful and the most widely used database query language.


For specific questions and use-cases specific languages are often a better fit, so pick the right tool for the job.


In the guide we will mostly look at Cypher, showing sometimes the SQL equivalent as comparison.


SQL


Declarative query language for relational databases (no relationships) that is being developed since the 70s to also allow less-technical users to interact with databases.


Originally for single table operations, JOINs later added, has been evolving since with support for geo, json, xml, analytics etc.


SELECT p.product_name, count(*) as cnt
FROM customers c
JOIN orders o ON (c.customer_id = o.customer_id)
JOIN order_details od on (od.order_id = o.order_id)
JOIN products p ON (od.product_id = p.product_id)
WHERE p.product_name LIKE '%G%'
GROUP BY p.product_name
ORDER BY cnt DESC LIMIT 5;


      product_name      | cnt
------------------------+-----
 Guaraná Fantástica     |  51
 Gorgonzola Telino      |  51
 Gnocchi di nonna Alice |  50
 Mozzarella di Giovanni |  38
 Gumbär Gummibärchen    |  32


The relational model is based on tables (relations) as the main building block.
A fixed schema defines columns, their types and additional contraints and indexes.
References to other tables are foreign keys, many-to-many connections as join/link-tables.


Cypher


Declarative query language for graphs, based on graph pattern syntax (ascii-art).
It focuses on readability and efficient syntax.


Here is an example.
Later on you will be able to execute these examples by clicking on them and running them from the command-bar on top.


MATCH (a:Person)-[:ACTED_IN]->(m:Movie)

WITH a, count(m) as movies WHERE movies > 2

MATCH (a)-[:DIRECTED]->(m)

RETURN a.name, collect(m.title) as movies, count(*) as count
ORDER BY count DESC LIMIT 5;


The property graph model is based on representing entities and their connections
Its building blocks are nodes and relationships each of which can hold arbitrary properties.
Graphs are often schema free or schema optional to allow for a more flexible evolution of the data model.


Datasets


We pick two simple datasets, IMDB (Movies) and Northwind (Retail) as they don’t require you to learn a new domain and we can focus on the query langauge specifics.


Note


See how to import them below.


To play along, you can run the :play movies command and execute the create script on slide 2.
And also the :play northwind command and execute all the load commands of the guide.


Even better, you can also click and run this box to import all the data we need:


link:https://raw.githubusercontent.com/neo4j-graph-examples/northwind/main/scripts/northwind.cypher[role=include]
link:https://raw.githubusercontent.com/neo4j-graph-examples/movies/main/scripts/movies.cypher[role=include]


Graph Patterns vs. JOINS


Main Focus on patterns - Ascii-Art


Node: () / (:Person) / (p:Person {name:'Emil'})


Relationship: -- / --> / -[:RATED]-> / -[:RATED {stars:5}]->


(:Person {name:'Emil'})-[:RATED {stars:5}]->(:Movie {title:'The Matrix'})


Graph Patterns used for finding (MATCH), creating (CREATE), updating (MERGE), exploring data


each dash would be one to two join in SQL, e.g. 6 joins here

(:Customer)-->(:Product)<--(:Customer)-->(:Product)


FROM customers c1
JOIN orders o1 ON (c1.customer_id = o1.customer_id)
JOIN order_details od1 ON (od1.order_id = o1.order_id)
JOIN products p1 ON (od1.product_id = p1.product_id)
JOIN orders o2 ON (o2.customer_id = o1.customer_id)
JOIN order_details od2 ON (od2.order_id = o2.order_id)
JOIN products p2 ON (od2.product_id = p2.product_id)
WHERE c1.customer_name = 'Emil'
  AND p1.product_name = 'Philz Coffee'


Other Clauses


The following clauses may follow a MATCH clause. They work with the
properties stored at the nodes and relationships found in the graph
matching that pattern.


match

MATCH (a:Person)-[:ACTED_IN]→(m:Movie)


filter

WHERE m.title CONTAINS 'Matrix'


aggregate

WITH a, count(m) as movies WHERE movies > 2


return

RETURN a.name, movies


order

ORDER BY movies DESC


limit

LIMIT 5;


Or all together:


MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.title CONTAINS 'Matrix'

WITH a, count(m) as movies WHERE movies > 2

RETURN a.name, movies
ORDER BY movies DESC LIMIT 5;


Path Patterns


Follow a chain of relationships to an arbitrary depth and return the full path


:param product=>"Pavlova";


MATCH path = (p:Product {productName:$product})-[:PART_OF*]->(root:Category)
RETURN path


Recursive SQL with Common Table Expression

WITH RECURSIVE subcategories AS (
    SELECT
        category_id,
        parent_category_id,
        category_name
    FROM
        categories
    WHERE
        category_id = 1
    UNION
        SELECT
            category_id,
            parent_category_id,
            category_name
        FROM
            categories c
        INNER JOIN subcategories sc
        ON sc.category_id = c.parent_category_id
) SELECT
    *
FROM
    subcategories;


Finding loops:


MATCH path = (p:Person)-[*1..5]-(p)
RETURN [n in nodes(path) | coalesce(n.name, n.title)] as path, length(path)
LIMIT 10


MATCH path = (p:Person)-[*3..10]-(p)
RETURN path
LIMIT 1


Shortest Path


Find the shortest connection between two entities


:param name=>"Meg Ryan";


MATCH path = shortestPath(
    (:Person {name:$name})-[:ACTED_IN*..10]-(:Person {name:'Kevin Bacon'}) )
RETURN path, length(path);


Aggregation without GROUP BY, Multi Part Statements with WITH


GROUP BY is superfluous, non-aggregated fields form grouping key


"in-between" aggregation with WITH, no HAVING required


any number of times


MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)

// in-between agreggation, only passing on declared fields
WITH a, count(m) as movies
// with filter on the aggregated value
WHERE movies > 2

// next query part
MATCH (a)-[:DIRECTED]->(m)

// aggregation at the end by a.name
RETURN a.name, collect(m.title) as movies, count(*) as count
ORDER BY count DESC LIMIT 5;


Built in data structure support - Lists and Maps


Literal syntax for lists [1,2,3] and range(1,100,2) and


maps {answer: 42}


list slices, index access, list comprehensions, quantors


map keys, dot-access, map projections


// list comprehension
WITH [id IN range(1,100)
        WHERE id%13 = 0 |
        // literal map construction
        {id:id, name:'Joe '+id, age:id%100}] as people

// list quantor predicate on list elements
WHERE ALL(p in people WHERE p.age % 13 = 0)
RETURN people


UNWIND and Collect


First class aggreation functions for creating real lists: collect


very useful for creating nested documents


keeps ordering


Turn a list into rows of values with UNWIND,


esp. helpful for data creation and updates based on lists of data


// turn list into 10k rows
UNWIND range(1,10000) as id

CREATE (p:Person {id:id, name:"Joe "+id, age:id%100})

// aggregate people into age groups, collect per age-group
RETURN p.age, count(*) as ageCount, collect(p.name)[0..5] as ageGroup


Pattern Comprehensions and Map Projections


Nested documents like in a document database


Borrowed from the GraphQL query langauge


MATCH (m:Movie)

// map projection, with property access
RETURN m { .*,
    // pattern comprehension
    actors:
        [(m)<-[r:ACTED_IN]-(a) |
            // property access, nested expressions
            a { .name, roles: r.roles,
                movies: size([()<-[:ACTED_IN]-(a)|a]) }
        ][0..5], // list slice
    // pattern comprehension with filter & expression
    directors: [(m)<-[:DIRECTED]-(d) WHERE d.born < 1975 | d.name]
    } as movieDocument
LIMIT 10


Combining Reads and Writes


e.g. Graph Refactoring, partial updates, idempotent graph construction


:param rating=>({title:'The Matrix',name:'Emil Eifrem', stars:5})


MATCH (m:Movie {title:$rating.title})
// possibly write
MERGE (u:User {name:$rating.name})

// possibly write
MERGE (u)-[r:RATED]->(m)
// write
SET r.stars = $rating.stars
WITH *
MATCH (m)<-[:ACTED_IN]-(a:Person)
// read & return
RETURN u, m, r, collect(a)


Transactional Batching


Transactional batching for loads, updates, refactoring, deletes


:auto MATCH (o:Order) // imagine 100M
CALL { WITH o
    MATCH (o)-[r:ORDERS]->()
    WITH o, sum(r.count) as products, sum(toFloat(r.unitPrice)*r.quantity) as total
    SET o.count = products, o.total = total
} IN TRANSACTIONS OF 100000 ROWS


Note


This is a new feature of Neo4j 4.4.
You can check your installation version in the Neo4j Browser sidebar or by running:


call dbms.components();


Built in Data Loading


Load CSV, JSON and other datatypes from other data sources


WITH "https://data.neo4j.com/importing/ratings.csv" AS url

LOAD CSV WITH HEADERS FROM url AS row

MATCH (m:Movie {movieId:row.movieId})
MERGE (u:User {userId:row.userId})
ON CREATE SET u.name = row.name

MERGE (u)-[r:RATED]->(m)
SET r.rating = toFloat(row.rating)
SET r.timestamp = toInteger(row.timestamp);


Load data from a web API (StackOverflow) using apoc.load.json


WITH "https://api.stackexchange.com/2.2/questions?pagesize=2&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url

// load the json data from the URL
CALL apoc.load.json(url) YIELD value

// turn element list into rows
UNWIND value.items AS item
// deconstruct row data
RETURN item.title, item.owner, item.creation_date, keys(item)
LIMIT 5;


APOC Utility Library


The apoc utility library is already installed on Neo4j AuraDB and Neo4j Sandbox
In Neo4j Desktop you can install it from the database settings.
In other Neo4j installations you can copy the jar from the labs into the plugins folder.
For Docker you can use the --env NEO4J_LABS_PLUGINS=["apoc"] environment variable.


You can check if apoc is installed by running:


call apoc.help();


Documentation for apoc.load.json


Spatial


Spatial datatypes (polar and cartesian, 2d and 3d), distance, bounding box


Dijkstra, A* and other pathfinding


MATCH (loc:Location)-[:HOSTS]->(e:Event)
// events near me
WHERE point.distance(loc.coords, $myLocation) < 1000
RETURN e.title, e.rating, size((e)<-[:RSVPD]-()) as guests
ORDER BY e.rating DESC LIMIT 5


Datetime


Datetime, Timezones, Durations


formatting, parsing, computation


with date('2021-06-01') as start, date('2021-06-30') as end
unwind [days IN range(0,duration.between(start,end).days) | start + duration({days:days})] as day
WITH * WHERE NOT EXISTS { (:Appointment {date:day}) }
RETURN day


Response to: https://twitter.com/craigkerstiens/status/1402688838124802054


Conclusion


Cypher is a really powerful, but still humane query language for Graphs and Data.


It has been adopted by many other graph databases already, including AWS Neptune, RedisGraph, AgensGraph, Memgraph.


It is the basis for GQL the ISO Graph Query language, so it’s a worthwhile investment to learn it.


There are a lot of resources


Cypher Refcard


GraphAcademy Training


Cypher Developer Guides


Cypher Manual


10 Things (actually 12)


✓ graph patterns vs. joins


✓ idempotent graph construction


✓ combining reads and writes


✓ UNWIND / collect


✓ in-line batching for updates / refactorings


✓ data deconstruction / map / list support


✓ list comprehensions / quantors


✓ multi-step-aggregation - WITH (bonus, look ma, no group by)


✓ LOAD CSV / apoc.load.json


✓ deep tree hierarchies


✓ shortest path


✓ map projection / pattern comprehension


✓ spatial, distance


✓ datetime, durations
match	`MATCH (a:Person)-[:ACTED_IN]→(m:Movie)`
filter	`WHERE m.title CONTAINS 'Matrix'`
aggregate	`WITH a, count(m) as movies WHERE movies > 2`
return	`RETURN a.name, movies`
order	`ORDER BY movies DESC`
limit	`LIMIT 5;`