Skip to content

Instantly share code, notes, and snippets.

@rvanbruggen
Last active July 2, 2019 17:33
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 2 You must be signed in to fork a gist
  • Save rvanbruggen/7b0e0a05ac4d4f953f2f to your computer and use it in GitHub Desktop.
Save rvanbruggen/7b0e0a05ac4d4f953f2f to your computer and use it in GitHub Desktop.
BeerGraphGuide into Neo4j Browser

Playing the Belgian Beer Graph into the Neo4j Browser!

For the past couple of years, I have been preaching the Neo4j gospel in many different places, meetups, conferences, and what have you. For the most part, I have been using a very specific demo that has been super well-received: The Belgian Beer Graph. It started out as a learning experience for me personally back in the day when Neo4j did not have any proper data import functionality - and I had to load the graph by jumping through all kinds of hoops.

belgian beers

And now: we can play this demo straight into the Neo4j browser. Such JOY!

Sources and History of the Belgian Beer Graph

Here’s some of the material that I base this on:

Sounds easy enough? Let’s explore.

Scraping Wikipedia into a Google spreadsheet

I remember "back in the day" I had to manually copy/paste the tables on the Wikipedia page into a spreadsheet in order to load it into Neo4j. Turns out that’s no longer necessary - at all. In a Google spreadsheet you now have a function called "ImportHTML" that allows you to connect to a URL, select a table, and import that table into the spreadsheet. See the Google Spreadsheet help page for more info.

On the Wikipedia page there are 27 tables like that (one for every letter of the alfabet + one for a couple of beers whose brand does not start with a letter) that we would need to go through. Using a formula, we can get all of these tables loaded quickly, and then use a "Query" functionality to remove the header rows.

Super easy. It will dynamically update the data in the sheet - so effectively we can now scrape Wikipedia and immediately make it available for loading into Neo4j from a CSV export. All we now need to do is make the sheet available to the public so that LOAD CSV can use the URL to download the dataset, and make sure the AlcoholPercentages are loaded correctly as numerical values.

Loading data into a Model

In the following three steps of this Guide, we are going to load the data into the model that we chose. As we know, in Neo4j, the model is still as important as ever - it’s just the Schema that’s optional.

The model that we would be using to import the data into is very simple:

Model

This may seem a bit weird at first: why would you "separate" the AlcoholPercentages from the BeerBrand nodes? Well, the answer is that there is not real good answer to that. The model should be driven by your query patterns - and since in my (future) query patterns the AlcoholPercentages will be quite important, it makes sense to separate them out.

Preparing to load the data: setting up the indexes

Let’s set up the indexes first:

create index on :BeerBrand(name);
create index on :Brewery(name);
create index on :BeerType(name);
create index on :AlcoholPercentage(value);

Using LOAD CSV to load the data

And then we are ready to load the data from the Google Spreadsheet that I mentioned before. We can do that in several different steps, or do it in one larger chained query. Here’s the latter version of the data loading query:

load csv with headers from
"https://docs.google.com/spreadsheets/d/1FwWxlgnOhOtrUELIzLupDFW7euqXfeh8x3BeiEY_sbI/export?format=csv&id=1FwWxlgnOhOtrUELIzLupDFW7euqXfeh8x3BeiEY_sbI&gid=0" as csv
with csv
where csv.BeerType is not null
merge (b:BeerType {name: csv.BeerType})
with csv
where csv.BeerBrand is not null
merge (b:BeerBrand {name: csv.BeerBrand})
with csv
where csv.Brewery is not null
merge (b:Brewery {name: csv.Brewery})
with csv
where csv.AlcoholPercentage is not null
merge (b:AlcoholPercentage {value: tofloat(replace(replace(csv.AlcoholPercentage,'%',''),',','.'))})
with csv
match (ap:AlcoholPercentage {value: tofloat(replace(replace(csv.AlcoholPercentage,'%',''),',','.'))}),
(br:Brewery {name: csv.Brewery}),
(bb:BeerBrand {name: csv.BeerBrand}),
(bt:BeerType {name: csv.BeerType})
merge (bb)-[:HAS_ALCOHOLPERCENTAGE]->(ap)
merge (bb)-[:IS_A]->(bt)
merge (bb)<-[:BREWS]-(br);

Here’s the result of that import:

We then need to add the "alcohol-index" and connect the different AlcoholPercentage nodes to eachother so that they form an AlcoholPercentage-timeline:

MATCH (ap:AlcoholPercentage)
WITH ap
ORDER BY ap.value ASC
WITH collect(ap) as sorted_ap
FOREACH(i in RANGE(0, length(sorted_ap)-2) |
  FOREACH(sorted_ap1 in [sorted_ap[i]] |
    FOREACH(sorted_ap2 in [sorted_ap[i+1]] |
      CREATE UNIQUE (sorted_ap1)-[:PRECEDES]->(sorted_ap2))));

And then we have the entire graph sorted. Let’s now do some querying.

Querying the Belgian Beer Graph

Let’s start simple. We can of course take a sample with 10 BeerBrands and their relationships:

match (b:BeerBrand)
with b
limit 10
match (b)--(n)
return b,n;

Let’s look at the sample:

And then we can do our typical Beer queries

MATCH (orval:BeerBrand {name:"Orval"})
return orval;

And again we look at the result:

But let’s crank it up a notch, and do something a bit more complex.

Some more complex/interesting queries!

Here’s a first one for you: the surroundings over "Orval" and the other Trappist beers:

MATCH (orval:BeerBrand {name:"Orval"})-[r*..2]-(other)
return orval,r,other;

And we look at the result:

And then this is one of my favourites: finding the paths between "Duvel" and "Orval" beers

MATCH (duvel:BeerBrand {name:"Duvel"}), (orval:BeerBrand {name:"Orval"}),
p = AllshortestPaths( (duvel)-[*]-(orval) )
RETURN p;

It’s still a great example of a typical pathfinding operation that is yielding interesting "beer recommendations" - something I can always appreciate :) …​

Conclusion: Loading Beers from Wikipedia into Neo4j is EASY!

That’s my conclusion, for sure. If I think back to how much time I had to spend a few years ago to create this beergraph, and how easy it has now become - it’s just absolutely stunning. Progress, FTW!

I hope this gist was interesting for you, and that we will see each other soon.

This browser guide was created by Rik Van Bruggen

<style type="text/css" media="screen">
/*
.nodes-image {
margin:-100;
}
*/
.imageblock .content img {
max-width: 900px;
max-height: 300px;
}
.deck h3 {
display: block !important;
margin-bottom:8px;
margin-top:5px;
}
.listingblock {
margin:8px;
}
.pull-bottom {
position:relative;
bottom:1em;
}
</style>
<style type="text/css" media="screen">
#editor.maximize-editor .CodeMirror-code {
font-size:40px;
line-height:45px;
}
</style><article class="guide">
<carousel class="deck container-fluid">
<!--slide class="row-fluid">
<div class="col-sm-3">
<h3>Playing the Belgian Beer Graph into the Neo4j Browser!</h3>
<p class="lead">Information</p>
<!dl>
<dt>author"</dt><dd>Rik Van Bruggen</dd>
</dl>
</div>
<div class="col-sm-9">
<figure>
<img style="width:300px" src=""/>
</figure>
</div>
</slide-->
<slide class="row-fluid">
<div class="col-sm-12">
<h3>Playing the Belgian Beer Graph into the Neo4j Browser!</h3>
<br/>
<div>
<div class="paragraph">
<p>For the past couple of years, I have been preaching the <a href="http://www.neo4j.com">Neo4j</a> gospel in many different places, meetups, conferences, and what have you. For the most part, I have been using a very specific demo that has been super well-received: The Belgian Beer Graph. It started out as a learning experience for me personally back in the day when Neo4j did not have any proper data import functionality - and I had to load the graph by jumping through all kinds of hoops.</p>
</div>
<div class="imageblock">
<div class="content">
<img src="http://dev.assets.neo4j.com.s3.amazonaws.com/wp-content/uploads/2013/01/belgian-beers.jpg" alt="belgian beers">
</div>
</div>
<div class="paragraph">
<p>And now: we can play this demo straight into the Neo4j browser. Such JOY!</p>
</div>
</div>
</div>
</slide>
<slide class="row-fluid">
<div class="col-sm-12">
<h3>Sources and History of the Belgian Beer Graph</h3>
<br/>
<div>
<div class="paragraph">
<p>Here&#8217;s some of the material that I base this on:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>a <a href="https://nl.wikipedia.org/wiki/Lijst_van_Belgische_bieren">Wikipedia page</a> that contains the source data</p>
</li>
<li>
<p>a <a href="http://bit.ly/belgianbeergraph-gdoc">google spreadsheet</a> that automatically loads the data from the Wikipedia page</p>
</li>
<li>
<p>a <a href="http://bit.ly/belgianbeergraph-full-csv">CSV export of that spreadsheet</a> that we can automatically feed to Cypher&#8217;s LOAD CSV functionality</p>
</li>
<li>
<p>a <a href="http://bit.ly/belgianbeergraph-limited-csv">subset of that spreadsheet as a csv file</a> that I put together for use in this graphgist.</p>
</li>
<li>
<p>a LOAD CSV statement that would load the data into the Neo4j Graph. Point it to the full CSV on your own machine - this graphgist will refer to the subset CSV file.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Sounds easy enough? Let&#8217;s explore.</p>
</div>
</div>
</div>
</slide>
<slide class="row-fluid">
<div class="col-sm-12">
<h3>Scraping Wikipedia into a Google spreadsheet</h3>
<br/>
<div>
<div class="paragraph">
<p>I remember "back in the day" I had to manually copy/paste the tables on <a href="https://nl.wikipedia.org/wiki/Lijst_van_Belgische_bieren">the Wikipedia page</a> into a spreadsheet in order to load it into Neo4j. Turns out that&#8217;s no longer necessary - at all. In a Google spreadsheet you now have a function called "ImportHTML" that allows you to connect to a URL, select a table, and import that table into the spreadsheet. See <a href="https://support.google.com/docs/answer/3093339?hl=en">the Google Spreadsheet help page</a> for more info.</p>
</div>
<div class="paragraph">
<p>On the <a href="https://nl.wikipedia.org/wiki/Lijst_van_Belgische_bieren">Wikipedia page</a> there are 27 tables like that (one for every letter of the alfabet + one for a couple of beers whose brand does not start with a letter) that we would need to go through. Using a formula, we can get all of these tables loaded quickly, and then use a "Query" functionality to remove the header rows.</p>
</div>
<div class="paragraph">
<p>Super easy. It will dynamically update the data in the sheet - so effectively we can now scrape Wikipedia and immediately make it available for loading into Neo4j from a CSV export. All we now need to do is make the sheet available to the public so that LOAD CSV can use the URL to download the dataset, and make sure the AlcoholPercentages are loaded correctly as numerical values.</p>
</div>
</div>
</div>
</slide>
<slide class="row-fluid">
<div class="col-sm-12">
<h3>Loading data into a Model</h3>
<br/>
<div>
<div class="paragraph">
<p>In the following three steps of this Guide, we are going to load the data into the model that we chose. As we know, in Neo4j, the model is still as important as ever - it&#8217;s just the Schema that&#8217;s optional.</p>
</div>
<div class="paragraph">
<p>The model that we would be using to import the data into is very simple:</p>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="http://dev.assets.neo4j.com.s3.amazonaws.com/wp-content/uploads/2013/05/Screen-Shot-2013-05-17-at-21.40.36.png" alt="Model">
</div>
</div>
<div class="paragraph">
<p>This may seem a bit weird at first: why would you "separate" the AlcoholPercentages from the BeerBrand nodes? Well, the answer is that there is not real good answer to that. The model should be driven by your query patterns - and since in my (future) query patterns the AlcoholPercentages will be quite important, it makes sense to separate them out.</p>
</div>
</div>
</div>
</slide>
<slide class="row-fluid">
<div class="col-sm-12">
<h3>Preparing to load the data: setting up the indexes</h3>
<br/>
<div>
<div class="paragraph">
<p>Let&#8217;s set up the indexes first:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher" class="highlight pre-scrollable code runnable"><code class="cypher language-cypher">create index on :BeerBrand(name);</code></pre>
</div>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher" class="highlight pre-scrollable code runnable"><code class="cypher language-cypher">create index on :Brewery(name);</code></pre>
</div>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher" class="highlight pre-scrollable code runnable"><code class="cypher language-cypher">create index on :BeerType(name);</code></pre>
</div>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher" class="highlight pre-scrollable code runnable"><code class="cypher language-cypher">create index on :AlcoholPercentage(value);</code></pre>
</div>
</div>
</div>
</div>
</slide>
<slide class="row-fluid">
<div class="col-sm-12">
<h3>Using LOAD CSV to load the data</h3>
<br/>
<div>
<div class="paragraph">
<p>And then we are ready to load the data from the Google Spreadsheet that I mentioned before. We can do that in several different steps, or do it in one larger chained query. Here&#8217;s the latter version of the data loading query:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher" class="highlight pre-scrollable code runnable"><code class="cypher language-cypher">load csv with headers from
"https://docs.google.com/spreadsheets/d/1FwWxlgnOhOtrUELIzLupDFW7euqXfeh8x3BeiEY_sbI/export?format=csv&amp;id=1FwWxlgnOhOtrUELIzLupDFW7euqXfeh8x3BeiEY_sbI&amp;gid=0" as csv
with csv
where csv.BeerType is not null
merge (b:BeerType {name: csv.BeerType})
with csv
where csv.BeerBrand is not null
merge (b:BeerBrand {name: csv.BeerBrand})
with csv
where csv.Brewery is not null
merge (b:Brewery {name: csv.Brewery})
with csv
where csv.AlcoholPercentage is not null
merge (b:AlcoholPercentage {value: tofloat(replace(replace(csv.AlcoholPercentage,'%',''),',','.'))})
with csv
match (ap:AlcoholPercentage {value: tofloat(replace(replace(csv.AlcoholPercentage,'%',''),',','.'))}),
(br:Brewery {name: csv.Brewery}),
(bb:BeerBrand {name: csv.BeerBrand}),
(bt:BeerType {name: csv.BeerType})
merge (bb)-[:HAS_ALCOHOLPERCENTAGE]-&gt;(ap)
merge (bb)-[:IS_A]-&gt;(bt)
merge (bb)&lt;-[:BREWS]-(br);</code></pre>
</div>
</div>
<div class="paragraph">
<p>Here&#8217;s the result of that import:</p>
</div>
<div class="paragraph">
<p>We then need to add the "alcohol-index" and connect the different AlcoholPercentage nodes to eachother so that they form an AlcoholPercentage-timeline:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher" class="highlight pre-scrollable code runnable"><code class="cypher language-cypher">MATCH (ap:AlcoholPercentage)
WITH ap
ORDER BY ap.value ASC
WITH collect(ap) as sorted_ap
FOREACH(i in RANGE(0, length(sorted_ap)-2) |
FOREACH(sorted_ap1 in [sorted_ap[i]] |
FOREACH(sorted_ap2 in [sorted_ap[i+1]] |
CREATE UNIQUE (sorted_ap1)-[:PRECEDES]-&gt;(sorted_ap2))));</code></pre>
</div>
</div>
<div class="paragraph">
<p>And then we have the entire graph sorted. Let&#8217;s now do some querying.</p>
</div>
</div>
</div>
</slide>
<slide class="row-fluid">
<div class="col-sm-12">
<h3>Querying the Belgian Beer Graph</h3>
<br/>
<div>
<div class="paragraph">
<p>Let&#8217;s start simple. We can of course take a sample with 10 BeerBrands and their relationships:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher" class="highlight pre-scrollable code runnable"><code class="cypher language-cypher">match (b:BeerBrand)
with b
limit 10
match (b)--(n)
return b,n;</code></pre>
</div>
</div>
<div class="paragraph">
<p>Let&#8217;s look at the sample:</p>
</div>
<div class="paragraph">
<p>And then we can do our typical Beer queries</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher" class="highlight pre-scrollable code runnable"><code class="cypher language-cypher">MATCH (orval:BeerBrand {name:"Orval"})
return orval;</code></pre>
</div>
</div>
<div class="paragraph">
<p>And again we look at the result:</p>
</div>
<div class="paragraph">
<p>But let&#8217;s crank it up a notch, and do something a bit more complex.</p>
</div>
</div>
</div>
</slide>
<slide class="row-fluid">
<div class="col-sm-12">
<h3>Some more complex/interesting queries!</h3>
<br/>
<div>
<div class="paragraph">
<p>Here&#8217;s a first one for you: the surroundings over "Orval" and the other Trappist beers:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher" class="highlight pre-scrollable code runnable"><code class="cypher language-cypher">MATCH (orval:BeerBrand {name:"Orval"})-[r*..2]-(other)
return orval,r,other;</code></pre>
</div>
</div>
<div class="paragraph">
<p>And we look at the result:</p>
</div>
<div class="paragraph">
<p>And then this is one of my favourites: finding the paths between "Duvel" and "Orval" beers</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher" class="highlight pre-scrollable code runnable"><code class="cypher language-cypher">MATCH (duvel:BeerBrand {name:"Duvel"}), (orval:BeerBrand {name:"Orval"}),
p = AllshortestPaths( (duvel)-[*]-(orval) )
RETURN p;</code></pre>
</div>
</div>
<div class="paragraph">
<p>It&#8217;s still a great example of a typical pathfinding operation that is yielding interesting "beer recommendations" - something I can always appreciate :) &#8230;&#8203;</p>
</div>
</div>
</div>
</slide>
<slide class="row-fluid">
<div class="col-sm-12">
<h3>Conclusion: Loading Beers from Wikipedia into Neo4j is EASY!</h3>
<br/>
<div>
<div class="paragraph">
<p>That&#8217;s my conclusion, for sure. If I think back to how much time I had to spend a few years ago to create this beergraph, and how easy it has now become - it&#8217;s just absolutely stunning. Progress, FTW!</p>
</div>
<div class="paragraph">
<p>I hope this gist was interesting for you, and that we will see each other soon.</p>
</div>
<div class="paragraph">
<p>This browser guide was created by <a href="mailto:rik@neotechnology.com">Rik Van Bruggen</a></p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="http://blog.bruggen.com">My Blog</a></p>
</li>
<li>
<p><a href="http://blog.bruggen.com/p/podcast_13.html">the Neo4j Graph Database Podcast</a></p>
</li>
<li>
<p><a href="http://learningneo4j.net">My Book</a></p>
</li>
<li>
<p><a href="http://twitter.com/rvanbruggen">On Twitter</a></p>
</li>
<li>
<p><a href="http://be.linkedin.com/in/rikvanbruggen/">On LinkedIn</a></p>
</li>
</ul>
</div>
</div>
</div>
</slide>
</carousel>
</article>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment