totetmatt/Oreilly Graphgist

## Oreilly Graphgist
:neo4j-version: 2.0.0
:author: Matthieu Totet
:twitter: @totetmatt
:tags: Oreilly, Books, Media

= O'Reilly Book Graph


== Dat Model

Data scraped from the O'Reilly website: http://www.oreilly.com/

----
(:Book)-[:AUTHOR]->(:Author)
(:Book)-[:CATEGORY]->(:Category)
(:Category)-[:HAS_SUBJECT]->(:Subject)
(:Book)-[:MEDIA {price}]->(:MediaType)
----
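
To make the model concrete, here is a minimal sketch of a single book written as a Cypher `CREATE` statement. It assumes nodes are labelled as in the model above and carry a `name` property, with `price` on the `MEDIA` relationship, since those are the properties the queries on this page read. The book title, co-author, category and prices are invented for illustration and are not taken from the real dataset.

[source,cypher]
----
// Hypothetical sample data: 'Sample Book', 'Sample Coauthor', 'Sample Category'
// and both prices are made up; only the structure mirrors the model.
CREATE (b:Book {name: 'Sample Book'}),
       (a1:Author {name: 'David Pogue'}),
       (a2:Author {name: 'Sample Coauthor'}),
       (c:Category {name: 'Sample Category'}),
       (s:Subject {name: 'Certification'}),
       (v:MediaType {name: 'Video'}),
       (e:MediaType {name: 'Ebook'}),
       (b)-[:AUTHOR]->(a1),
       (b)-[:AUTHOR]->(a2),
       (b)-[:CATEGORY]->(c),
       (c)-[:HAS_SUBJECT]->(s),
       (b)-[:MEDIA {price: 59.99}]->(v),
       (b)-[:MEDIA {price: 29.99}]->(e)
----

A tiny graph like this can be handy for trying a query on an empty database before running it against the full data.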

== Dat Data
++++
<img src="http://cdn.memegenerator.net/instances/500x/45891161.jpg"/>
++++
I didn't manage to make it work on GraphGist...

So you have to download this file: http://matthieu-totet.fr/Neoreilly.zip. It's a Neo4j database with all the data already loaded.
Unzip it and run the bin/neo4j script to start the DB.

Go to http://localhost:7474/webadmin and copy and paste the queries below to test them. (http://localhost:7474/browser/ won't work because of its 1000-result limit.)

The zip also contains the original oreilly.geoff file that I used to import the data into Neo4j.
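
Once the database is running, a quick sanity check can confirm that the data is there. The query below is only a sketch: it assumes nothing beyond the four relationship types listed in the data model and counts how many relationships of each type exist.

[source,cypher]
----
// Count relationships per type; AUTHOR, CATEGORY, HAS_SUBJECT and MEDIA are expected
MATCH ()-[r]->()
RETURN type(r), count(*)
ORDER BY count(*) DESC
----

An empty result would suggest that the server was not started from the unzipped directory.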

== Query Time!
=== Get the average price & number of media, grouped by Subject
[source,cypher]
----
MATCH (s)<-[:HAS_SUBJECT]-(c)<-[l:CATEGORY]-(n)-[r:MEDIA]->(m)
RETURN avg(r.price), count(*), s.name
ORDER BY avg(r.price) DESC
----

=== Get the average price & number of media, grouped by Category and MediaType
[source,cypher]
----
MATCH (c)<-[l:CATEGORY]-(n)-[r:MEDIA]->(m)
RETURN avg(r.price), count(*), m.name, c.name
ORDER BY c.name, avg(r.price) DESC
----

=== Get the number of authors per media
[source,cypher]
----
MATCH (a1)<-[:AUTHOR]-(m)
RETURN count(*), m.name
ORDER BY count(*) DESC
----

=== Get the number of media per author
[source,cypher]
----
MATCH (a1)<-[:AUTHOR]-(m)
RETURN count(*), a1.name
ORDER BY count(*) DESC
----

=== Get all authors and their collaborators
[source,cypher]
----
MATCH (a1)<-[:AUTHOR]-()-[:AUTHOR]->(a2)
WHERE a1 <> a2
RETURN a1.name, collect(a2.name)
----

=== Get people who work with "David Pogue"
[source,cypher]
----
MATCH (a1)<-[:AUTHOR]-()-[:AUTHOR]->(a2)
WHERE a1 <> a2 AND a1.name = "David Pogue"
RETURN a2.name
----

=== ... and see in which categories they work together
[source,cypher]
----
MATCH (a1)<-[:AUTHOR]-(m)-[:AUTHOR]->(a2),(m)-[:CATEGORY]->(c)
WHERE a1 <> a2 AND a1.name = "David Pogue"
RETURN a2.name, c.name
----

=== Get the average price per media type for each author
[source,cypher]
----
MATCH (a1)<-[:AUTHOR]-(m)-[t:MEDIA]->(ty)
RETURN a1.name, avg(t.price), ty.name
ORDER BY avg(t.price) DESC
----

=== Get books about Certification with at least 2 authors that are either a Video or a non-Ebook under 50€
[source,cypher]
----
MATCH (a1)<-[:AUTHOR]-(m)-[t:MEDIA]->(ty), (s)<-[:HAS_SUBJECT]-(c)<-[:CATEGORY]-(m)
WITH count(DISTINCT a1) AS nbAuthor, m, t, ty, c, s
WHERE (ty.name = "Video" OR (t.price < 50 AND ty.name <> "Ebook" AND ty.name <> "PrintandEbook"))
  AND s.name = "Certification"
  AND nbAuthor >= 2
RETURN DISTINCT m.name, t.price, ty.name, nbAuthor
----

=== Do your own query
[source]
----
(you)-[:RELEASE]->(creativity)
----