Skip to content

Instantly share code, notes, and snippets.

@nicolewhite
Last active April 7, 2019 16:34
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 3 You must be signed in to fork a gist
  • Save nicolewhite/cc178bf2a761d7ac3a20 to your computer and use it in GitHub Desktop.
Save nicolewhite/cc178bf2a761d7ac3a20 to your computer and use it in GitHub Desktop.
Datasets for Graph Hack.

Datasets for Graph Hack 2014.

All stores are Neo4j 2.1.3.

Transportation

What is related, and how?

Flight	ORIGIN        Airport
Flight	DESTINATION   Airport
Flight	CARRIER       Carrier
Flight	AIRCRAFT      Aircraft
Flight	DELAYED_BY    Reason
Flight	CANCELLED_BY  Reason
Airport	IN_CITY	      City
City    IN_STATE      State
Flight	DIVERTED_TO   Airport

Which cities have more than 1 airport?

Query

MATCH (:Airport)-[:IN_CITY]->(c:City)
WITH c, COUNT(*) AS Count
WHERE Count > 1
RETURN c.name AS City, Count
ORDER BY Count

Result

City Count
WASHINGTON, DC 2
HOUSTON, TX 2
NEW YORK, NY 2
CHICAGO, IL 2

Which cities have more than 1 airport, and what are they?

Query

MATCH (a:Airport)-[:IN_CITY]->(c:City)
WITH c.name AS City, COLLECT(a.name) AS Airports
WHERE LENGTH(Airports) > 1
RETURN City, Airports
ORDER BY LENGTH(Airports)

Result

City Airports
HOUSTON, TX GEORGE BUSH INTERCONTINENTAL/HOUSTON, WILLIAM P HOBBY
WASHINGTON, DC RONALD REAGAN WASHINGTON NATIONAL, WASHINGTON DULLES INTERNATIONAL
NEW YORK, NY JOHN F. KENNEDY INTERNATIONAL, LAGUARDIA
CHICAGO, IL CHICAGO O'HARE INTERNATIONAL, CHICAGO MIDWAY INTERNATIONAL

Of the flights that departed from SFO and were diverted, what are the top airports that they were diverted to?

Query

MATCH (f:Flight)-[:ORIGIN]->(:Airport {abbr:'SFO'}),
      (f)-[d:DIVERTED_TO]->(a:Airport)
WHERE d.step = 1
RETURN a.abbr + ' - ' + a.name AS Airport, COUNT(*) AS Count
ORDER BY Count DESC
LIMIT 10

Result

Airport Count
SFO - SAN FRANCISCO INTERNATIONAL 13
TWF - JOSLIN FIELD - MAGIC VALLEY REGIONAL 6
RDM - ROBERTS FIELD 5
GJT - GRAND JUNCTION REGIONAL 3
RDD - REDDING MUNICIPAL 3
EUG - MAHLON SWEET FIELD 2
DTW - DETROIT METRO WAYNE COUNTY 2
LAX - LOS ANGELES INTERNATIONAL 2
BOS - LOGAN INTERNATIONAL 2
LMT - KLAMATH FALLS AIRPORT 2

What is related, and how?

Airline   BASED_IN  Country
Series    MODEL     Model
Aircraft  SERIES    Series
Aircraft  OWNED_BY  Airline

Which airlines own the most active Boeing 747s? How many do they own? In which countries are these airlines based?

Query

MATCH (a:Airline)-[:BASED_IN]->(c:Country),
      (craft:Aircraft)-[:OWNED_BY]->(a),
      (craft)-[:SERIES]->(:Series)-[:MODEL]->(m:Model)
WHERE craft.status = 'Active' AND m.name = 'Boeing 747'
WITH a, COUNT(m) AS count, c
ORDER BY count DESC
RETURN a.name AS Airline, c.name AS Country, count AS `Number of Boeing 747 Owned`
LIMIT 10

Result

Airline Country Number of Boeing 747 Owned
British Airways United Kingdom 48
Korean Air South Korea 36
Cathay Pacific Hong-Kong 33
Lufthansa Germany 32
China Airlines Taiwan 31
KLM Netherlands 26
United Airlines USA 24
Atlas Air USA 22
Transaero Airlines Russia 20
Cargolux Luxembourg 19

Which Embraer models have the most series of planes? What are they?

Query

MATCH (s:Series)-[:MODEL]->(m:Model)
WHERE m.name =~ 'Embraer.*'
WITH m.name AS Model, COLLECT(s.name) AS Series
RETURN Model, Series
ORDER BY LENGTH(Series) DESC

Result

Model Series
Embraer 135/145 145SM, 145RS, 145EW, 145MP, 145BJ, 135ER, 135EJ, 135LR, 135, 145MR, 145LI, 145XR, 145ER, 145EP, 145LR, 145LU, 145SA, 135BK, 145EU, 145EJ, 135BJ, 140LR
Embraer 190/195 195STD, 190SLR, 190ECJ, 190BJ, 195LR, 190IGW, 195AR, 195SR, 190ER, 190, 195IGW, 190AR, 190LR, 190STD, 190SR
Embraer 170/175 175SR, 175SU, 170SL, 170SE, 170STD, 175SD, 175LR, 170LR, 170SU, 175STD, 170ST, 175IGW, 170AR, 170SR
Embraer 120 Brasilia 120ER, 120QC, 120, 120RT

What is related, and how?

Gate   IN_TERMINAL  Terminal
Place  IN_CATEGORY  Category
Place  AT_GATE      Gate

At which gates in terminal A can I find a Starbucks?

Query

MATCH (p:Place)-[:AT_GATE]->(g:Gate),
      (g)-[:IN_TERMINAL]->(t:Terminal)
WHERE p.name = 'Starbucks Coffee' AND t.name = 'A'
RETURN g.gate AS Gate, t.name AS Terminal
ORDER BY Gate

Result

Gate Terminal
29 A
33 A
37 A

Where are places in terminal D in categories 'Bar' or 'Barbecue'?

Query

MATCH (p:Place)-[:IN_CATEGORY]->(c:Category)
WHERE c.name IN ['Bar', 'Barbecue']
WITH p
MATCH (p)-[:AT_GATE]->(g:Gate),
      (g)-[:IN_TERMINAL]->(t:Terminal)
WHERE t.name = 'D'
RETURN p.name AS Place, g.gate AS Gate
ORDER BY Gate

Result

Place Gate
Railhead BBQ 18
Tigin Irish Pub & Restaurant 20
Cousin's Bar-B-Q 28
Gas Monkey Bar & Grill 31

What is related, and how?

Stop  IN        Zone
Stop  NEXT      Stop
Train STOPS_AT  Stop

Which consecutive stops are furthest apart?

Query

MATCH (s1:Stop)-[r:NEXT]->(s2:Stop)
WITH s1, s2, r.distance AS Distance
ORDER BY Distance DESC
LIMIT 1
RETURN s1.name AS Stop1, s2.name AS Stop2, Distance

Result

Stop1 Stop2 Distance
BLOSSOM HILL MORGAN HILL 11.8

Which trains can I take home to San Mateo from San Francisco after the hackathon tonight?

Query

MATCH (t:Train)
WHERE t.direction = 'Southbound' AND t.type = 'Weekday'
WITH t

MATCH (t)-[leave:STOPS_AT]->(begin:Stop),
      (t)-[:STOPS_AT]->(end:Stop)
WHERE begin.name = 'SAN FRANCISCO' AND
      end.name = 'SAN MATEO'
WITH t, leave.hour*100 + leave.minute AS minutes, leave.hour_s AS Hour, leave.minute_s AS Minute
WHERE minutes > 2100 OR minutes < 100
RETURN t.id AS Train, Hour + ':' + Minute AS `Departure Time`

Result

Train Departure Time
194 21:40
196 22:40
198 00:01

Music

What is related, and how?

Scrobble  PLAYED     Track
Scrobble  NEXT       Scrobble
User      SCROBBLED  Scrobble
Artist    MEMBER_OF  Genre
Track     SUNG_BY    Artist

Which artists do users 'nmwhite0131' and 'smooligans' mutually listen to most often?

Query

MATCH (u1:User {username:'nmwhite0131'}), (u2:User {username:'smooligans'})
WITH u1, u2
MATCH (u1)-[:SCROBBLED]->(:Scrobble)-[:PLAYED]->(:Track)-[:SUNG_BY]->(a:Artist),
      (u2)-[:SCROBBLED]->(:Scrobble)-[:PLAYED]->(:Track)-[:SUNG_BY]->(a)
RETURN a.name AS Artist, COUNT(*) AS Count
ORDER BY Count DESC
LIMIT 10

Result

Artist Count
Drake 15019
Beyoncé 6869
Lana Del Rey 3723
Jay-Z 2967
Nicki Minaj 2624
Lil' Wayne 2242
Rihanna 1911
Daft Punk 1560
Explosions in the Sky 1165
Tyga 663

User nmwhite0131's last five scrobbles.

Query

MATCH (:User {username:'nmwhite0131'})-[:SCROBBLED]->(last:Scrobble),
      recent = (:Scrobble)-[:NEXT*4]->(last)
WHERE NOT ((last)-[:NEXT]->(:Scrobble))
WITH NODES(recent) AS scrobbles
UNWIND scrobbles AS s
MATCH (s)-[:PLAYED]->(t:Track)-[:SUNG_BY]->(a:Artist)
RETURN s.date AS Date, t.name AS Track, a.name AS Artist

Result

Date Track Artist
2014-09-24 Easy Saycet
2014-09-24 Bruyère Saycet
2014-09-24 Opal Saycet
2014-09-24 Her Movie Saycet
2014-09-24 We Walk Fast Saycet

Social

What is related, and how?

Tweet    USING     Source
Tweet    CONTAINS  Link
Tweet    MENTIONS  User
Tweet    RETWEETS  Tweet
User     POSTS     Tweet
Tweet    REPLY_TO  Tweet
Hashtag  TAGS      Tweet

How are hashtags 'java' and 'graph' related?

Query

MATCH (java:Hashtag {name:'java'}), (graph:Hashtag {name:'graph'})
WITH java, graph
MATCH p = shortestPath((java)-[:TAGS*]-(graph))
RETURN p

Result

hashtags

What is the most retweeted tweet, who posted it, and what did it say?

Query

MATCH (:Tweet)-[:RETWEETS]->(t:Tweet)
WITH t, COUNT(*) AS count
ORDER BY count DESC
LIMIT 1

MATCH (u:User)-[:POSTS]->(t)
RETURN u.screen_name AS User, t.text AS Tweet, count AS Retweets

Result

User Tweet Retweets
println_ts A twist on how to do login & registration form on #Neo4J's website. #ux #ui http://t.co/CmDIfkSDT3 http://t.co/fjtSik59Qa 5
@jkumar10
Copy link

jkumar10 commented Apr 7, 2019

I wanted to replicate this on my local machine. Where is the dataset? Where are the import queries?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment