// GitHub Gist by @maylim17, last active August 29, 2015 14:02
= Gist of Gists
:author: May Lim
:twitter: @aprmayyjun
:tags: domain:graph-database, use-case:content-management
This tutorial acts as a simplified Content Management System (CMS) which models and analyzes the repository of Neo4j GraphGists.
image::http://i.imgur.com/X3uMgkX.png[]
'Typical implementation of a GraphGist'
'''
== Introduction
A web CMS is a bundled or stand-alone application that can create, manage, store and deploy content on web pages. Common platforms that use CMSs include blogs, news sites and e-commerce retailers.
'''
== Problem
Enterprises have largely left graph technologies unexplored. This is due to the prevalence of long-standing technologies (think SQL and relational DBMSs), and probably also to the considerable inertia involved in moving away from legacy systems. However, http://blogs.gartner.com/svetlana-sicular/5-big-data-companies-to-watch/[many] believe graphs have a great future because they can deliver new insights from data. That makes them a game-changer given the recent rise of 'Big Data' (rapidly increasing data quantity and complexity).
'''
== Solution
The https://github.com/neo4j-contrib/graphgist/wiki[Neo4j GraphGist wiki] is a growing repository of use cases that demonstrate how graph databases can uncover the potential of connected data in many forms. These use cases themselves form a valuable data set. If they were better organized and managed, they could yield further insights that help interested end-users understand how Neo4j, and graph databases in general, can be applied.
'''
== Data Model
Core entities are modeled as nodes, and their relationships are mapped as shown in the diagram below. To keep the presentation simple, only 15 GraphGists (and a subset of the Tweets mentioning them) are included in this data set.
image::http://i.imgur.com/lak9QWe.png[]
'Visualization of the data model used'
Each Gist in the repository is also scored on several factors (see the brief rubric below), so that end-users can benefit most from the better-presented Gists.
'Scoring rubric: explanation of the Problem, Solution, Data Model and Queries (15% each); number of queries (15%); number of Tweet mentions (15%); number of Facebook Likes (5%); Gist vote score (5%)'
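Read as a weighted sum, the rubric can be sketched as the formula below. This is an assumption, not the published scoring method. It supposes that each factor is first rated on the same 0 to 10 scale as the scores in the data set, with each r denoting one such hypothetical rating. The weights add up to 4 × 0.15 + 0.15 + 0.15 + 0.05 + 0.05 = 1, so a Gist rated 10 on every factor would score exactly 10.

[latexmath]
++++
\text{Score} = 0.15\,\left(r_{\text{problem}} + r_{\text{solution}} + r_{\text{model}} + r_{\text{queries}}\right) + 0.15\,r_{\text{query count}} + 0.15\,r_{\text{tweets}} + 0.05\,r_{\text{likes}} + 0.05\,r_{\text{votes}}
++++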
'''
== Data Set
//setup
//hide
[source,cypher]
----
//Create :Domain
CREATE (finance:Domain {Name:"Finance"}),
(banking:Domain {Name:"Banking"}),
(competitivesport:Domain {Name:"Competitive Sport"}),
(healthcare:Domain {Name:"Healthcare"}),
(biomedical:Domain {Name:"Biomedical"}),
(lifescience:Domain {Name:"Life Science"}),
(resources:Domain {Name:"Resources"}),
(environment:Domain {Name:"Environment"}),
(retail:Domain {Name:"Retail"}),
(foodandbeverage:Domain {Name:"Food and Beverage"}),
(telecommunication:Domain {Name:"Telecommunication"}),
(cloudcomputing:Domain {Name:"Cloud Computing"}),
(software:Domain {Name:"Software"}),
(infrastructure:Domain {Name:"Infrastructure"}),
(automobile:Domain {Name:"Automobile"}),
(transport:Domain {Name:"Transport"}),
(entertainment:Domain {Name:"Entertainment"}),
(massmedia:Domain {Name:"Mass Media"}),
(datascience:Domain {Name:"Data Science"}),
(gaming:Domain {Name:"Gaming"}),
(socialnetwork:Domain {Name:"Social Network"}),
(it:Domain {Name:"IT"}),
(socialmedia:Domain {Name:"Social Media"}),
(ecology:Domain {Name:"Ecology"}),
//Create :UseCase
(frauddetection:UseCase {Name:"Fraud Detection"}),
(performanceanalysis:UseCase {Name:"Performance Analysis"}),
(playerranking:UseCase {Name:"Player Ranking"}),
(drugportfolio:UseCase {Name:"Drug Portfolio"}),
(bioinformatics:UseCase {Name:"Bioinformatics"}),
(resourcemanagement:UseCase {Name:"Resource Management"}),
(recommendation:UseCase {Name:"Recommendation"}),
(productcomparison:UseCase {Name:"Product Comparison"}),
(costoptimization:UseCase {Name:"Cost Optimization"}),
(satellitenavigation:UseCase {Name:"Satellite Navigation"}),
(georouting:UseCase {Name:"Geo-routing"}),
(informationretrieval:UseCase {Name:"Information Retrieval"}),
(strategy:UseCase {Name:"Strategy"}),
(temporalanalysis:UseCase {Name:"Temporal Analysis"}),
(networkanalysis:UseCase {Name:"Network Analysis"}),
(systemanalysis:UseCase {Name:"System Analysis"}),
//Create :Gist
(gist1:Gist {Title:"Bank Fraud Detection", URL:"http://gist.neo4j.org/?github-neo4j-contrib%2Fgists%2F%2Fother%2FBankFraudDetection.adoc", Score:6.1}),
(gist2:Gist {Title:"FIS Alpine Skiing Seasons", URL:"http://gist.neo4j.org/?8019511", Score:5.7}),
(gist3:Gist {Title:"Pharmaceutical Drugs and their Targets", URL:"http://gist.neo4j.org/?7968633", Score:3.4}),
(gist4:Gist {Title:"Piping Water", URL:"http://gist.neo4j.org/?8141937", Score:4.4}),
(gist5:Gist {Title:"Single Malt Scotch Whisky", URL:"http://gist.neo4j.org/?8139605", Score:5.4}),
(gist6:Gist {Title:"Amazon Web Services Global Infrastructure Graph", URL:"http://gist.neo4j.org/?8526106", Score:4.6}),
(gist7:Gist {Title:"Roads, Nodes and Automobiles", URL:"http://gist.neo4j.org/?8635758", Score:8.6}),
(gist8:Gist {Title:"Movie Recommendations with k-NN and Cosine Similarity", URL:"Movie Recommendations with k-NN and Cosine Similarity ", Score:8.8}),
(gist9:Gist {Title:"Chess Games and Positions", URL:"http://gist.neo4j.org/?6506717", Score:6.5}),
(gist10:Gist {Title:"Credit Card Fraud Detection", URL:"http://gist.neo4j.org/?3ad4cb2e3187ab21416b", Score:6.5}),
(gist11:Gist {Title:"Time Scale Event Meta Model", URL:"http://gist.neo4j.org/?github-kbastani/gists//meta/TimeScaleEventMetaModel.adoc", Score:6.9}),
(gist12:Gist {Title:"Information Flow Through a Network", URL:"http://gist.neo4j.org/?451b776cb3a782965a63", Score:6.5}),
(gist13:Gist {Title:"TV Show Graph", URL:"http://gist.neo4j.org/?github-neo4j-contrib%2Fgists%2F%2Fother%2FTVShowGraph.adoc", Score:3.9}),
(gist14:Gist {Title:"Small Social Networking Website", URL:"http://gist.neo4j.org/?8389170", Score:6.3}),
(gist15:Gist {Title:"Trophic Cascade: A Wolf’s Role in the Ecosystem of Yellowstone", URL:"http://gist.neo4j.org/?0ac320c799ce55089377", Score:6.0}),
//Create :Person
(person1:Person {Name:"Kenny Bastani", Twitter:"@kennybastani", TwitterFollowers:1172}),
(person2:Person {Name:"pac_19", Twitter:"@pac_19", TwitterFollowers:13}),
(person3:Person {Name:"Josh Kunken", Twitter:"@joshkunken", TwitterFollowers:87}),
(person4:Person {Name:"Shaun Daley", Twitter:"@shaundaley1", TwitterFollowers:1055}),
(person5:Person {Name:"Patrick Baumgartner", Twitter:"@patbaumgartner", TwitterFollowers:832}),
(person6:Person {Name:"Aidan Casey", Twitter:"@aidanjcasey", TwitterFollowers:279}),
(person7:Person {Name:"Jacqui Read", Twitter:"@tekiegirl", TwitterFollowers:126}),
(person8:Person {Name:"Nicole White", Twitter:"@_nicolemargaret", TwitterFollowers:283}),
(person9:Person {Name:"Wes Freeman", Twitter:"@wefreema", TwitterFollowers:508}),
(person10:Person {Name:"Jean Villedieu", Twitter:"@jvilledieu", TwitterFollowers:395}),
(person11:Person {Name:"George Lesica", Twitter:"@glesica", TwitterFollowers:141}),
(person12:Person {Name:"William Lyon", Twitter:"@lyonwj", TwitterFollowers:123}),
(person13:Person {Name:"Neo4j", Twitter:"@neo4j", TwitterFollowers:8762}),
(person14:Person {Name:"Raul Estrada", Twitter:"@RaulEstrada", TwitterFollowers:204}),
(person15:Person {Name:"May Lim", Twitter:"@aprmayyjun", TwitterFollowers:14}),
(person16:Person {Name:"//\\", Twitter:"@phongphan", TwitterFollowers:48}),
(person17:Person {Name:"Félix López", Twitter:"@flopezluis", TwitterFollowers:429}),
(person18:Person {Name:"Shreyas Kulkarni", Twitter:"@curlyreggie", TwitterFollowers:366}),
(person19:Person {Name:"NoSQL Weekly", Twitter:"@nosqlweekly", TwitterFollowers:1334}),
(person20:Person {Name:"Damien Francois", Twitter:"@damienfrancois", TwitterFollowers:267}),
(person36:Person {Name:"Mario", Twitter:"@mariogray", TwitterFollowers:55}),
(person37:Person {Name:"Lorenzo Speranzoni", Twitter:"@inserpio", TwitterFollowers:51}),
(person38:Person {Name:"Peter Neubauer", Twitter:"@peterneubauer", TwitterFollowers:3105}),
//Create :Tweet
(tweet1:Tweet {URL:"https://twitter.com/phongphan/status/467280192969261057", Retweets:0, Favorites:0}),
(tweet2:Tweet {URL:"https://twitter.com/flopezluis/status/453539574607015936", Retweets:1, Favorites:2}),
(tweet3:Tweet {URL:"https://twitter.com/curlyreggie/status/451786059672604672", Retweets:0, Favorites:3}),
(tweet4:Tweet {URL:"https://twitter.com/nosqlweekly/status/453215736597839872", Retweets:0, Favorites:1}),
(tweet5:Tweet {URL:"https://twitter.com/damienfrancois/status/451728463561756672", Retweets:0, Favorites:0}),
(tweet21:Tweet {URL:"https://twitter.com/mariogray/status/468984213853577216", Retweets:4, Favorites:6}),
(tweet22:Tweet {URL:"https://twitter.com/inserpio/status/456751398307123200", Retweets:0, Favorites:0}),
(tweet23:Tweet {URL:"https://twitter.com/pac_19/status/413237812452794368", Retweets:4, Favorites:4}),
(tweet24:Tweet {URL:"https://twitter.com/kennybastani/status/413790146996092929", Retweets:3, Favorites:1}),
(tweet25:Tweet {URL:"https://twitter.com/peterneubauer/status/416859050391785473", Retweets:1, Favorites:3}),
//Create (domain)-[:HAS_USECASE]->(usecase)
(finance)-[:HAS_USECASE]->(frauddetection),
(banking)-[:HAS_USECASE]->(frauddetection),
(competitivesport)-[:HAS_USECASE]->(performanceanalysis),
(competitivesport)-[:HAS_USECASE]->(playerranking),
(healthcare)-[:HAS_USECASE]->(drugportfolio),
(healthcare)-[:HAS_USECASE]->(bioinformatics),
(biomedical)-[:HAS_USECASE]->(drugportfolio),
(biomedical)-[:HAS_USECASE]->(bioinformatics),
(resources)-[:HAS_USECASE]->(resourcemanagement),
(environment)-[:HAS_USECASE]->(resourcemanagement),
(retail)-[:HAS_USECASE]->(recommendation),
(retail)-[:HAS_USECASE]->(productcomparison),
(foodandbeverage)-[:HAS_USECASE]->(recommendation),
(foodandbeverage)-[:HAS_USECASE]->(productcomparison),
(telecommunication)-[:HAS_USECASE]->(costoptimization),
(cloudcomputing)-[:HAS_USECASE]->(costoptimization),
(software)-[:HAS_USECASE]->(costoptimization),
(infrastructure)-[:HAS_USECASE]->(satellitenavigation),
(infrastructure)-[:HAS_USECASE]->(georouting),
(automobile)-[:HAS_USECASE]->(satellitenavigation),
(automobile)-[:HAS_USECASE]->(georouting),
(transport)-[:HAS_USECASE]->(satellitenavigation),
(transport)-[:HAS_USECASE]->(georouting),
(entertainment)-[:HAS_USECASE]->(recommendation),
(entertainment)-[:HAS_USECASE]->(informationretrieval),
(massmedia)-[:HAS_USECASE]->(recommendation),
(massmedia)-[:HAS_USECASE]->(informationretrieval),
(datascience)-[:HAS_USECASE]->(recommendation),
(datascience)-[:HAS_USECASE]->(informationretrieval),
(entertainment)-[:HAS_USECASE]->(strategy),
(gaming)-[:HAS_USECASE]->(strategy),
(retail)-[:HAS_USECASE]->(frauddetection),
(socialnetwork)-[:HAS_USECASE]->(recommendation),
(socialnetwork)-[:HAS_USECASE]->(temporalanalysis),
(retail)-[:HAS_USECASE]->(temporalanalysis),
(finance)-[:HAS_USECASE]->(networkanalysis),
(finance)-[:HAS_USECASE]->(georouting),
(it)-[:HAS_USECASE]->(networkanalysis),
(it)-[:HAS_USECASE]->(georouting),
(telecommunication)-[:HAS_USECASE]->(networkanalysis),
(telecommunication)-[:HAS_USECASE]->(georouting),
(socialnetwork)-[:HAS_USECASE]->(networkanalysis),
(socialnetwork)-[:HAS_USECASE]->(informationretrieval),
(socialmedia)-[:HAS_USECASE]->(networkanalysis),
(socialmedia)-[:HAS_USECASE]->(informationretrieval),
(socialmedia)-[:HAS_USECASE]->(recommendation),
(lifescience)-[:HAS_USECASE]->(systemanalysis),
(ecology)-[:HAS_USECASE]->(systemanalysis),
//Create (gist)-[:HAS_DOMAIN]->(domain)
(gist1)-[:HAS_DOMAIN]->(finance),
(gist1)-[:HAS_DOMAIN]->(banking),
(gist2)-[:HAS_DOMAIN]->(competitivesport),
(gist3)-[:HAS_DOMAIN]->(healthcare),
(gist3)-[:HAS_DOMAIN]->(biomedical),
(gist4)-[:HAS_DOMAIN]->(resources),
(gist4)-[:HAS_DOMAIN]->(environment),
(gist5)-[:HAS_DOMAIN]->(retail),
(gist5)-[:HAS_DOMAIN]->(foodandbeverage),
(gist6)-[:HAS_DOMAIN]->(telecommunication),
(gist6)-[:HAS_DOMAIN]->(cloudcomputing),
(gist6)-[:HAS_DOMAIN]->(software),
(gist7)-[:HAS_DOMAIN]->(infrastructure),
(gist7)-[:HAS_DOMAIN]->(automobile),
(gist7)-[:HAS_DOMAIN]->(transport),
(gist8)-[:HAS_DOMAIN]->(entertainment),
(gist8)-[:HAS_DOMAIN]->(massmedia),
(gist8)-[:HAS_DOMAIN]->(datascience),
(gist9)-[:HAS_DOMAIN]->(entertainment),
(gist9)-[:HAS_DOMAIN]->(gaming),
(gist10)-[:HAS_DOMAIN]->(finance),
(gist10)-[:HAS_DOMAIN]->(retail),
(gist10)-[:HAS_DOMAIN]->(banking),
(gist11)-[:HAS_DOMAIN]->(socialnetwork),
(gist11)-[:HAS_DOMAIN]->(retail),
(gist12)-[:HAS_DOMAIN]->(finance),
(gist12)-[:HAS_DOMAIN]->(it),
(gist12)-[:HAS_DOMAIN]->(telecommunication),
(gist13)-[:HAS_DOMAIN]->(entertainment),
(gist13)-[:HAS_DOMAIN]->(massmedia),
(gist14)-[:HAS_DOMAIN]->(socialnetwork),
(gist14)-[:HAS_DOMAIN]->(socialmedia),
(gist15)-[:HAS_DOMAIN]->(lifescience),
(gist15)-[:HAS_DOMAIN]->(ecology),
//Create (gist)-[:HAS_USECASE]->(usecase)
(gist1)-[:HAS_USECASE]->(frauddetection),
(gist2)-[:HAS_USECASE]->(performanceanalysis),
(gist2)-[:HAS_USECASE]->(playerranking),
(gist3)-[:HAS_USECASE]->(drugportfolio),
(gist3)-[:HAS_USECASE]->(bioinformatics),
(gist4)-[:HAS_USECASE]->(resourcemanagement),
(gist5)-[:HAS_USECASE]->(recommendation),
(gist5)-[:HAS_USECASE]->(productcomparison),
(gist6)-[:HAS_USECASE]->(costoptimization),
(gist7)-[:HAS_USECASE]->(satellitenavigation),
(gist7)-[:HAS_USECASE]->(georouting),
(gist8)-[:HAS_USECASE]->(recommendation),
(gist8)-[:HAS_USECASE]->(informationretrieval),
(gist9)-[:HAS_USECASE]->(strategy),
(gist10)-[:HAS_USECASE]->(frauddetection),
(gist11)-[:HAS_USECASE]->(recommendation),
(gist11)-[:HAS_USECASE]->(temporalanalysis),
(gist12)-[:HAS_USECASE]->(networkanalysis),
(gist12)-[:HAS_USECASE]->(georouting),
(gist13)-[:HAS_USECASE]->(informationretrieval),
(gist14)-[:HAS_USECASE]->(networkanalysis),
(gist14)-[:HAS_USECASE]->(informationretrieval),
(gist14)-[:HAS_USECASE]->(recommendation),
(gist15)-[:HAS_USECASE]->(systemanalysis),
//Create (person)-[:AUTHORED]->(gist)
(person1)-[:AUTHORED]->(gist1),
(person1)-[:AUTHORED]->(gist11),
(person2)-[:AUTHORED]->(gist2),
(person3)-[:AUTHORED]->(gist3),
(person4)-[:AUTHORED]->(gist4),
(person5)-[:AUTHORED]->(gist5),
(person6)-[:AUTHORED]->(gist6),
(person7)-[:AUTHORED]->(gist7),
(person8)-[:AUTHORED]->(gist8),
(person9)-[:AUTHORED]->(gist9),
(person10)-[:AUTHORED]->(gist10),
(person11)-[:AUTHORED]->(gist12),
(person12)-[:AUTHORED]->(gist12),
(person13)-[:AUTHORED]->(gist13),
(person14)-[:AUTHORED]->(gist14),
(person15)-[:AUTHORED]->(gist15),
//Create (tweet)-[:MENTIONED]->(gist)
(tweet1)-[:MENTIONED]->(gist8),
(tweet2)-[:MENTIONED]->(gist8),
(tweet3)-[:MENTIONED]->(gist8),
(tweet4)-[:MENTIONED]->(gist8),
(tweet5)-[:MENTIONED]->(gist8),
(tweet21)-[:MENTIONED]->(gist1),
(tweet22)-[:MENTIONED]->(gist1),
(tweet23)-[:MENTIONED]->(gist2),
(tweet24)-[:MENTIONED]->(gist2),
(tweet25)-[:MENTIONED]->(gist5),
//Create (person)-[:POSTED]->(tweet)
(person16)-[:POSTED]->(tweet1),
(person17)-[:POSTED]->(tweet2),
(person18)-[:POSTED]->(tweet3),
(person19)-[:POSTED]->(tweet4),
(person20)-[:POSTED]->(tweet5),
(person36)-[:POSTED]->(tweet21),
(person37)-[:POSTED]->(tweet22),
(person2)-[:POSTED]->(tweet23),
(person1)-[:POSTED]->(tweet24),
(person38)-[:POSTED]->(tweet25)
RETURN *
----
//graph
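As a quick sanity check that the setup above matches the data model, the following query counts the nodes carrying each label. It is an addition to the original Gist. With the data set as written, it should report 24 Domain, 16 UseCase, 15 Gist, 23 Person and 10 Tweet nodes. A row with an empty label list would point to a node created by accident, for example through a misspelled identifier in the `CREATE` statement.

[source,cypher]
----
// Count nodes per label to check the loaded data against the data model
MATCH (n)
RETURN labels(n) AS Node_Labels, count(*) AS Node_Count
ORDER BY Node_Count DESC
----
//output
//table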
'''
== Entity Link Analysis
==== (a) "What are some Gists in the Finance domain?"
A simple query that returns the Gists tagged with the Finance domain, together with each Gist's title, URL and quality score. The list is sorted by decreasing score to highlight the higher-quality Gists.
[source,cypher]
----
MATCH (gist:Gist)-[:HAS_DOMAIN]->(:Domain {Name:"Finance"})
RETURN gist.Title AS Finance_Gist_Title, gist.URL AS Gist_URL, gist.Score AS Gist_Score
ORDER BY gist.Score DESC
----
//output
//table
==== (b) "How are the Gists ranked in terms of quality?"
This query ranks all Gists in the data set by quality score, listing each Gist's authors and domains for context.
[source,cypher]
----
MATCH (author:Person)-[:AUTHORED]->(gist:Gist)-[:HAS_DOMAIN]->(domain:Domain)
RETURN gist.Title AS Gist_Title, collect(DISTINCT author.Name) AS Author, collect(DISTINCT domain.Name) AS Gist_Domain, gist.Score AS Gist_Score
ORDER BY Gist_Score DESC
----
//output
//table
==== (c) "What are some use cases (for graph databases) in the Retail domain?"
Retailers new to graph technology may wish to find out how graph databases can be applied effectively in their industry. The query lists the relevant use cases, each with example Gists.
[source,cypher]
----
MATCH (domain:Domain)<-[:HAS_DOMAIN]-(gist:Gist)-[:HAS_USECASE]->(usecase:UseCase)<-[:HAS_USECASE]-(domain:Domain)
WHERE domain.Name = "Retail"
RETURN usecase.Name AS Use_Case, collect(gist.Title) AS Example_Gist_Titles
----
//output
//table
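Query (c) only reports use cases for which a Retail example Gist already exists. The data model also links each Domain directly to its use cases through `HAS_USECASE`. A variant can therefore start from those links and attach example Gists optionally, so that a use case without any Retail example would still be listed, with an empty list. This is a sketch, not part of the original analysis. On this data set both queries should return the same four use cases.

[source,cypher]
----
// Start from the Domain-level HAS_USECASE links; example Gists are optional
MATCH (retail:Domain {Name:"Retail"})-[:HAS_USECASE]->(usecase:UseCase)
OPTIONAL MATCH (usecase)<-[:HAS_USECASE]-(gist:Gist)-[:HAS_DOMAIN]->(retail)
// collect() skips nulls, so use cases without examples get an empty list
RETURN usecase.Name AS Use_Case, collect(gist.Title) AS Example_Gist_Titles
ORDER BY Use_Case
----
//output
//table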
==== (d) "Which domains can use graph databases for the Recommendation use case?"
Budding developers interested in building recommendation engines with graph databases may want to know which industries are likely to need their skills. Example Gists are also provided.
[source,cypher]
----
MATCH (domain:Domain)<-[:HAS_DOMAIN]-(gist:Gist)-[:HAS_USECASE]->(usecase:UseCase)
WHERE usecase.Name = "Recommendation"
RETURN domain.Name AS Domain, collect(gist.Title) AS Example_Gist_Titles
ORDER BY domain.Name
----
//output
//table
==== (e) "Who are some popular Twitter users that have given any Gist a shout-out?"
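Our Gists have also caught the attention of some well-followed Twitter users ;) The query lists the five most-followed users who have tweeted about a Gist, along with the Gist they mentioned and a link to the Tweet.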
[source,cypher]
----
MATCH (gist:Gist)<-[:MENTIONED]-(tweet:Tweet)<-[:POSTED]-(user:Person)
RETURN user.Twitter AS Twitter_User, user.TwitterFollowers AS Follower_Count, gist.Title AS Gist_Title, tweet.URL AS Tweet_URL
ORDER BY user.TwitterFollowers DESC
LIMIT 5
----
//output
//table
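The "Number of Tweet mentions" factor in the scoring rubric can be read straight off the graph. The query below, a sketch added for illustration, counts the Tweets mentioning each Gist. It uses `OPTIONAL MATCH` so that Gists with no mentions still appear, with a count of zero. In this data set the movie recommendations Gist leads with five mentions. The `Follower_Reach` column is a hypothetical extra measure, not part of the original rubric: it sums the follower counts of the users who posted those Tweets.

[source,cypher]
----
// Count Tweet mentions per Gist; OPTIONAL MATCH keeps Gists with no mentions (count 0)
MATCH (gist:Gist)
OPTIONAL MATCH (gist)<-[:MENTIONED]-(tweet:Tweet)<-[:POSTED]-(user:Person)
// Follower_Reach (hypothetical): total followers of the users who posted those Tweets
RETURN gist.Title AS Gist_Title, count(tweet) AS Tweet_Mentions, sum(user.TwitterFollowers) AS Follower_Reach
ORDER BY Tweet_Mentions DESC, Follower_Reach DESC
----
//output
//table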