:neo4j-version: 2.0.0
:author: Matthieu Totet
:twitter: @totetmatt
:tags: Oreilly, Books, Media

= O'Reilly book graph

== Dat Model

Data scraped from the O'Reilly website: http://www.oreilly.com/

----
(:Book)-[:AUTHOR]->(:Author)
(:Book)-[:CATEGORY]->(:Category)
(:Category)-[:HAS_SUBJECT]->(:Subject)
(:Book)-[:MEDIA {price}]->(:MediaType)
----
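
To make the model concrete, here is a minimal sketch of one book wired up according to the pattern above. All names and the price are made-up placeholders, not values from the scraped data, and it assumes every node carries a `name` property (as the queries further down expect). The labels are taken from the model sketch; the downloadable database may not use them.

[source,cypher]
----
// Hypothetical sample data: one book with one author, category, subject and media type
CREATE (b:Book {name: 'Example Book'}),
       (a:Author {name: 'Example Author'}),
       (c:Category {name: 'Example Category'}),
       (s:Subject {name: 'Certification'}),
       (t:MediaType {name: 'Ebook'}),
       (b)-[:AUTHOR]->(a),
       (b)-[:CATEGORY]->(c),
       (c)-[:HAS_SUBJECT]->(s),
       // the price lives on the MEDIA relationship, not on the book
       (b)-[:MEDIA {price: 19.99}]->(t)
----

Putting the price on the `MEDIA` relationship lets the same book have a different price for each media type.
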
== Dat Data

++++
<img src="http://cdn.memegenerator.net/instances/500x/45891161.jpg"/>
++++

I didn't manage to make it work on GraphGist...

So you have to download this file: http://matthieu-totet.fr/Neoreilly.zip (a Neo4j database with all the data already loaded).

Unzip it and run the bin/neo4j script to start the DB.

Go to http://localhost:7474/webadmin and copy and paste the queries below to test them. (http://localhost:7474/browser/ won't work because of its 1000-result limit.)

The zip also contains the original oreilly.geoff file that I used to import the data into Neo4j.
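
Once the database is running, a quick sanity check can confirm that the data is there before trying the longer queries. This is only a sketch: it assumes nothing beyond the relationship types listed in the data model, and the counts you get depend on the scraped data.

[source,cypher]
----
// Count relationships per type; AUTHOR, CATEGORY, HAS_SUBJECT and MEDIA should all appear
MATCH ()-[r]->()
RETURN type(r) AS relationship, count(*) AS total
ORDER BY total DESC
----
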
== Query Time!

=== Get the average price & number of media, grouped by Subject

[source,cypher]
----
MATCH (s)<-[:HAS_SUBJECT]-(c)<-[l:CATEGORY]-(n)-[r:MEDIA]->(m)
RETURN avg(r.price), count(*), s.name
ORDER BY avg(r.price) DESC
----

=== Get the average price & number of media, grouped by Category and MediaType

[source,cypher]
----
MATCH (c)<-[l:CATEGORY]-(n)-[r:MEDIA]->(m)
RETURN avg(r.price), count(*), m.name, c.name
ORDER BY c.name, avg(r.price) DESC
----

=== Get the number of authors per media

[source,cypher]
----
MATCH (a1)<-[:AUTHOR]-(m)
RETURN count(*), m.name
ORDER BY count(*) DESC
----

=== Get the number of media per author

[source,cypher]
----
MATCH (a1)<-[:AUTHOR]-(m)
RETURN count(*), a1.name
ORDER BY count(*) DESC
----

=== Get all authors and all their collaborations

[source,cypher]
----
MATCH (a1)<-[:AUTHOR]-()-[:AUTHOR]->(a2)
WHERE a1 <> a2
RETURN a1.name, collect(a2.name)
----

=== Get the people who work with "David Pogue"

[source,cypher]
----
MATCH (a1)<-[:AUTHOR]-()-[:AUTHOR]->(a2)
WHERE a1 <> a2 AND a1.name = "David Pogue"
RETURN a2.name
----

=== ... and see in which categories they work together

[source,cypher]
----
MATCH (a1)<-[:AUTHOR]-(m)-[:AUTHOR]->(a2), (m)-[:CATEGORY]->(c)
WHERE a1 <> a2 AND a1.name = "David Pogue"
RETURN a2.name, c.name
----

=== Get the average price per media type for all authors

[source,cypher]
----
MATCH (a1)<-[:AUTHOR]-(m)-[t:MEDIA]->(ty)
RETURN a1.name, avg(t.price), ty.name
ORDER BY avg(t.price) DESC
----

=== Get books about Certification with at least 2 authors that are either a Video or a non-Ebook media under 50€

[source,cypher]
----
MATCH (a1)<-[:AUTHOR]-(m)-[t:MEDIA]->(ty), (s)<-[:HAS_SUBJECT]-(c)<-[l:CATEGORY]-(m)
// Count the distinct authors of each book before filtering
WITH count(DISTINCT a1) AS nbAuthor, m AS m, t AS t, ty AS ty, c AS c, s AS s
WHERE (ty.name = "Video" OR (t.price < 50 AND ty.name <> "Ebook" AND ty.name <> "PrintandEbook"))
  AND s.name = "Certification"
  AND nbAuthor >= 2
RETURN DISTINCT m.name, t.price, ty.name, nbAuthor
----

=== Do your own query

[source]
----
(you)-[:RELEASE]->(creativity)
----