= Open Food Facts
:neo4j-version: 2.3.2
:author: Nicolas Rouyer
:toc: right
:twitter: @rrrouyer
:description: Open Food Facts
:tags: domain:open data, use-case:open food facts

This interactive Neo4j graph tutorial shows how to explore Open Food Facts data, for the good of your health!

'''

:toc: left

'''

[[introduction]]
== Open food facts

image::http://static.openfoodfacts.org/images/misc/openfoodfacts-logo-en-178x150.png[Open Food Facts]

Open Food Facts is a free, open database of food products.
It gathers information and data on food products from around the world.

The database is filled in by individual contributors from many countries, who scan product barcodes and upload pictures of the product labels.

http://fr.openfoodfacts.org/

[[graph_creation]]
=== Creating the Open Food Facts graph
[source,cypher]
----
// OPEN FOOD FACTS - CREATE INDEX ON PRODUCT CODE
CREATE INDEX ON :Product(code);
// OPEN FOOD FACTS - CREATE INDEX ON INGREDIENT FOOD
CREATE INDEX ON :Ingredient(food);

// OPEN FOOD FACTS - LOAD PRODUCT NODES
LOAD CSV WITH HEADERS FROM "https://gist.githubusercontent.com/nrouyer/fdcea6bbb5ea8e3377fb2ec3139b0c17/raw/f93de61779cccd74e9eb94566a6efc3358b00db1/off_products_163.csv" AS csvLine
FIELDTERMINATOR ";"
CREATE (p:Product {
  code: csvLine.code,
  name: coalesce(csvLine.name,"NA"),
  sodiumPer100g: coalesce(csvLine.sodiumPer100g,"NA"),
  fatPer100g: coalesce(csvLine.fatPer100g,"NA"),
  proteinsPer100g: coalesce(csvLine.proteinsPer100g,"NA"),
  nutritionScoreFrPer100g: coalesce(csvLine.nutritionScoreFrPer100g,"NA"),
  energyPer100g: coalesce(csvLine.energyPer100g,"NA"),
  fiberPer100g: coalesce(csvLine.fiberPer100g,"NA"),
  sugarsPer100g: coalesce(csvLine.sugarsPer100g,"NA"),
  saltPer100g: coalesce(csvLine.saltPer100g,"NA"),
  nutritionScoreUkPer100g: coalesce(csvLine.nutritionScoreUkPer100g,"NA")
});

// LOAD INGREDIENTS
LOAD CSV WITH HEADERS FROM "https://gist.githubusercontent.com/nrouyer/40f6b8d87f7f239f5a0f62e7756f8879/raw/1cc542d70a1bc1829d2643eb02d046f733545bb8/off_ingredients_163.csv" AS csvLine
FIELDTERMINATOR ';'
MERGE (i:Ingredient { food: csvLine.Ingredient });

// LOAD COMPOSITION RELATIONSHIPS
LOAD CSV WITH HEADERS FROM "https://gist.githubusercontent.com/nrouyer/8cc54359a569d5df445f8fa1066f2daa/raw/ecc608e6b59c971db448a4fd59c62e14c21dd0cc/off_composition_163.csv" AS csvLine
FIELDTERMINATOR ';'
MATCH (p:Product { code: csvLine.code })
MATCH (i:Ingredient { food: csvLine.food })
MERGE (p)-[:CONTAINS { rank: coalesce(csvLine.rank,"NA") }]->(i);

----
The graph data is now loaded.
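
As a quick sanity check, the following sketch counts what the load produced. It is an addition to the original gist and only reads data; note that it counts only the ingredients actually linked to a product, so an ingredient without any `CONTAINS` relationship would not appear in the total.

[source,cypher]
----
// SANITY CHECK (illustrative addition) - COUNT PRODUCTS, LINKED INGREDIENTS AND RELATIONSHIPS
MATCH (p:Product)
OPTIONAL MATCH (p)-[c:CONTAINS]->(i:Ingredient)
RETURN count(DISTINCT p) AS Products,
       count(DISTINCT i) AS Linked_Ingredients,
       count(c) AS Compositions
----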

'''

[[graph_consultation]]
=== Soda ingredient war: Pepsi vs 7Up

As a warm-up, let us compare the composition of Pepsi and 7Up, two sodas whose tastes are radically different.

[source,cypher]
----
// OPEN FOOD FACTS - GET 7UP INGREDIENTS SHORT NAME
MATCH (p:Product {name:'7Up'})-[:CONTAINS]->(i:Ingredient)
WITH i, SPLIT(i.food, '/') AS Ingredients
RETURN Ingredients[4] AS Ingredient
----

[source,cypher]
----
// OPEN FOOD FACTS - GET PEPSI INGREDIENTS SHORT NAME
MATCH (p:Product {name:'Pepsi, Nouveau goût !'})-[:CONTAINS]->(i:Ingredient)
WITH i, SPLIT(i.food, '/') AS Ingredients
RETURN Ingredients[4] AS Ingredient
----

[source,cypher]
----
// OPEN FOOD FACTS - GET INGREDIENTS COMMON TO PEPSI & 7UP
MATCH (p1:Product {name:'7Up'})-[:CONTAINS]->(i:Ingredient)
MATCH (p2:Product {name:'Pepsi, Nouveau goût !'})-[:CONTAINS]->(i)
RETURN i.food AS Ingredient
----
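
The reverse question can be asked too: which ingredients does 7Up contain that Pepsi does not? The query below is a minimal sketch, assuming both product names match the loaded data exactly as in the queries above. It is an illustrative addition, not part of the original gist.

[source,cypher]
----
// ILLUSTRATIVE ADDITION - INGREDIENTS FOUND IN 7UP BUT NOT IN PEPSI
MATCH (p1:Product {name:'7Up'})-[:CONTAINS]->(i:Ingredient)
MATCH (p2:Product {name:'Pepsi, Nouveau goût !'})
// Keep only the ingredients that Pepsi does not contain
WHERE NOT (p2)-[:CONTAINS]->(i)
RETURN i.food AS Ingredient
----

Swapping the two product names gives the ingredients specific to Pepsi.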

'''

[[graph_food_neighbours]]
=== My neighbourfood

With Cypher we can easily query the food data model and find the closest neighbours of any given product, that is, the products that share the most ingredients with it.

[source,cypher]
----
// OPEN FOOD FACTS - CLOSEST NEIGHBOURS (2)
MATCH (p1:Product {name: 'Chair à saucisse'} )-[c1:CONTAINS]->(i:Ingredient)<-[c2:CONTAINS]-(p2:Product)
RETURN p2.name AS Neighbour, collect(i.food) AS Ingredients_In_Common, count(i.food) AS STRENGTH
ORDER BY STRENGTH DESC
----
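
A raw count of shared ingredients favours products with long ingredient lists. One possible refinement, which is not part of the original gist, is to normalise the count with the Jaccard index: shared ingredients divided by the total number of distinct ingredients across both products. The sketch below assumes exactly one product is named 'Chair à saucisse'; the `LIMIT 10` is an arbitrary choice.

[source,cypher]
----
// ILLUSTRATIVE ADDITION - NEIGHBOURS RANKED BY JACCARD INDEX
MATCH (p1:Product {name: 'Chair à saucisse'})-[:CONTAINS]->(i:Ingredient)<-[:CONTAINS]-(p2:Product)
WITH p1, p2, count(DISTINCT i) AS shared
// Count the ingredients of the reference product
MATCH (p1)-[:CONTAINS]->(i1:Ingredient)
WITH p1, p2, shared, count(DISTINCT i1) AS total1
// Count the ingredients of each neighbour
MATCH (p2)-[:CONTAINS]->(i2:Ingredient)
WITH p2, shared, total1, count(DISTINCT i2) AS total2
// Jaccard index = shared / (total1 + total2 - shared)
RETURN p2.name AS Neighbour, shared AS STRENGTH,
       toFloat(shared) / (total1 + total2 - shared) AS JACCARD
ORDER BY JACCARD DESC
LIMIT 10
----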

[[graph_refactoring]]
=== Refactoring OFF graph

Let us perform a simple cosmetic customization on our Open Food Facts graph, storing a short, readable name on each ingredient:

[source,cypher]
----
// Store the fifth '/'-separated segment of each food identifier as a readable short name
MATCH (i:Ingredient)
WITH i, SPLIT(i.food, '/') AS Ingredients
SET i.shortname = Ingredients[4]
----

Then we query the closest neighbours again, this time with a better-formatted result.

[source,cypher]
----
MATCH (p1:Product {name: 'Chair à saucisse'} )-[c1:CONTAINS]->(i:Ingredient)<-[c2:CONTAINS]-(p2:Product)
RETURN p2.name AS Neighbour, collect(i.shortname) AS Ingredients_In_Common, count(i.food) AS STRENGTH
ORDER BY STRENGTH DESC
----

[[shortest_food_path]]
=== Find shortest path between products

Let us pick two food products at random. Can we discover anything from the shortest path between them?

[source,cypher]
----
// OPEN FOOD FACTS - SHORTEST PATH
MATCH (rollmops:Product {name:"Rollmop Herrings"}),
      (macncheese:Product {code:"00036559"}),
      p =(rollmops)-[:CONTAINS*1..6]-(macncheese)
WHERE ANY(x IN NODES(p) WHERE x:Ingredient)
WITH p ORDER BY LENGTH(p) LIMIT 1
RETURN p
----
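
As an alternative sketch, Cypher's built-in `shortestPath` function can express the same search. Because `CONTAINS` relationships only link products to ingredients, any path between two products necessarily passes through at least one ingredient, so the explicit `WHERE` filter is assumed to be unnecessary here. The upper bound of 6 hops is kept from the query above.

[source,cypher]
----
// ILLUSTRATIVE ADDITION - SHORTEST PATH WITH THE BUILT-IN FUNCTION
MATCH (rollmops:Product {name:"Rollmop Herrings"}),
      (macncheese:Product {code:"00036559"}),
      p = shortestPath((rollmops)-[:CONTAINS*..6]-(macncheese))
RETURN p
----

This form lets the database stop at the first shortest path it finds, rather than enumerating every path of up to 6 hops and sorting them by length.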

'''

[[conclusion]]
=== Let's feed the food graph...
This great open database helps us gain insight into what we eat every day. It was created to bring more transparency and to share knowledge universally.

image::http://static.openfoodfacts.org/images/svg/crowdsourcing-icon.svg[Yes we scan !!!]

Excellent work has been done with the whole database on https://www.kaggle.com/[Kaggle]. +
Please enjoy and post your remarks: +
mailto:rouyer.nicolas@gmail.com[Nicolas ROUYER]