@wvengen
Last active July 14, 2021 07:03
Semantic web of food notes

Try out the OpenFoodFacts data using Apache Jena.

Install

We'll use the stain/jena-fuseki Docker image, which runs Apache Jena's Fuseki SPARQL server.

docker pull stain/jena-fuseki
docker run --name fuseki -p 3030:3030 -e ADMIN_PASSWORD=secret -e JVM_ARGS=-Xmx4g -v fuseki:/fuseki stain/jena-fuseki

Load data

First create a persistent dataset named openfoodfacts-en in the Fuseki web interface (http://localhost:3030 with the port mapping above), then stop the server and load the data.

docker stop fuseki

cd /tmp
wget https://world.openfoodfacts.org/data/en.openfoodfacts.org.products.rdf
# The dump refers to fr.openfoodfacts.org; rewrite those hosts to en.openfoodfacts.org.
sed 's/fr\.openfoodfacts\.org/en.openfoodfacts.org/g' /tmp/en.openfoodfacts.org.products.rdf >/tmp/aa && \
  mv /tmp/aa /tmp/en.openfoodfacts.org.products.rdf

docker run --rm -ti --volume fuseki:/fuseki -v /tmp:/staging stain/jena-fuseki \
  ./load.sh openfoodfacts-en /staging/en.openfoodfacts.org.products.rdf
  
docker start fuseki
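
To see what the host rewrite in the load step does, here is a minimal sketch that applies the same kind of sed substitution to a single made-up line instead of the full dump. The sample line and the ingredient name sugar are assumptions for illustration; the real file's layout may differ.

```shell
# Hypothetical sample line; the real dump may be laid out differently.
sample='<rdf:Description rdf:about="http://fr.openfoodfacts.org/ingredient/en%3Asugar">'

# Rewrite the fr host to the en host, for every occurrence on the line.
rewritten=$(printf '%s\n' "$sample" | sed 's/fr\.openfoodfacts\.org/en.openfoodfacts.org/g')

# Show the rewritten line.
printf '%s\n' "$rewritten"
```

Trying the expression on one line first is a cheap way to catch quoting mistakes before touching the full dump.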

Query

We'll look at food products with declared ingredients. Each result row is one product-ingredient combination, so a product appears once per ingredient.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX food: <http://data.lirmm.fr/ontologies/food#>
PREFIX off-ing: <http://en.openfoodfacts.org/ingredient/>
PREFIX off-product: <http://world-en.openfoodfacts.org/product/>
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT ?product ?code ?name ?ing_name
WHERE {
  ?product a food:FoodProduct;
           food:name ?name;
           food:code ?code;
           food:containsIngredient [food:food [food:name ?ing_name]].
  FILTER (?name != "" && ?code != "")
}
LIMIT 10
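
The queries in these notes can be pasted into the Fuseki web interface. As a command-line alternative, the sketch below sends a query with curl. The function name fuseki_select and the FUSEKI_ENDPOINT variable are made up for this example, and the endpoint path /openfoodfacts-en/sparql is an assumption about how the dataset's query service is exposed; adjust it if your setup differs.

```shell
# Assumed endpoint: the query service of the dataset created earlier.
FUSEKI_ENDPOINT="${FUSEKI_ENDPOINT:-http://localhost:3030/openfoodfacts-en/sparql}"

# Send a SPARQL SELECT query (first argument) and print the result as CSV.
fuseki_select() {
  curl --silent --show-error --get \
    --header 'Accept: text/csv' \
    --data-urlencode "query=$1" \
    "$FUSEKI_ENDPOINT"
}

# Usage (needs the running container, so it is left commented out):
# fuseki_select 'PREFIX food: <http://data.lirmm.fr/ontologies/food#>
#   SELECT (COUNT(?p) AS ?products) WHERE { ?p a food:FoodProduct }'
```

Using --get with --data-urlencode lets curl handle the URL encoding of the query text, so the query can be passed as written.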

Let's see if we can link this to Wikidata. First, just list the Wikidata food ingredients that have a link to their OpenFoodFacts counterpart.

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdprop: <http://www.wikidata.org/prop/direct/>

SELECT ?object ?off_ing
WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    ?object wdprop:P5930 ?off_ing
  }
}
LIMIT 10

Next we combine the earlier query with the Wikidata query. Wikidata stores the bare OpenFoodFacts ingredient id, while the OpenFoodFacts data identifies ingredients by full URI, so we strip the URI prefix (http://en.openfoodfacts.org/ingredient/en%3A) to get the id as ?ing_id and join on that.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX food: <http://data.lirmm.fr/ontologies/food#>
PREFIX off-ing: <http://en.openfoodfacts.org/ingredient/>
PREFIX off-product: <http://world-en.openfoodfacts.org/product/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdprop: <http://www.wikidata.org/prop/direct/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?product ?code ?name ?ing_name ?wdobj
WHERE {
  ?product a food:FoodProduct;
           food:name ?name;
           food:code ?code;
           food:containsIngredient [food:food ?ing].
  ?ing food:name ?ing_name.
  BIND(REPLACE(STR(?ing), STR(off-ing:en%3A), "") AS ?ing_id)
  SERVICE <https://query.wikidata.org/sparql> {
    ?wdobj wdprop:P5930 ?ing_id
  }
}
LIMIT 10
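
The id extraction in the BIND line can be tried outside SPARQL. The sketch below does the equivalent prefix stripping in the shell; it is an approximation, since the query uses a regex-based REPLACE while this removes a literal leading prefix. The function name ing_id and the sample id sugar are made up for illustration.

```shell
# Prefix that the query strips: the off-ing: namespace followed by "en%3A".
OFF_ING_PREFIX='http://en.openfoodfacts.org/ingredient/en%3A'

# Print the ingredient id for a full ingredient URI (first argument).
# A URI that does not start with the prefix is printed unchanged.
ing_id() {
  printf '%s\n' "${1#"$OFF_ING_PREFIX"}"
}

# Example call with a hypothetical ingredient URI.
ing_id 'http://en.openfoodfacts.org/ingredient/en%3Asugar'
```

The value produced this way is what the query matches against the Wikidata property P5930 inside the SERVICE block.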

Now let's do something useful: list products together with those of their ingredients that are food additives. Products without any food additive drop out of the result.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX food: <http://data.lirmm.fr/ontologies/food#>
PREFIX off-ing: <http://en.openfoodfacts.org/ingredient/>
PREFIX off-product: <http://world-en.openfoodfacts.org/product/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdprop: <http://www.wikidata.org/prop/direct/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?product ?code ?name ?ing_name ?wdobj
WHERE {
  ?product a food:FoodProduct;
           food:name ?name;
           food:code ?code;
           food:containsIngredient [food:food ?ing].
  ?ing food:name ?ing_name.
  BIND(REPLACE(STR(?ing), STR(off-ing:en%3A), "") AS ?ing_id)
  SERVICE <https://query.wikidata.org/sparql> {
    ?wdobj wdprop:P5930 ?ing_id; # find openfoodfacts ingredient id
           wdprop:P279 wd:Q189567 # that is a subclass of food additive
  }
}
LIMIT 10

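Wikidata is like Wikipedia, but for structured data. Let's see what we can learn from it about food products. The queries below are written for the Wikidata Query Service (https://query.wikidata.org), which predefines the prefixes they use.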

Companies in the food industry with their brands

We can look at companies in the food industry, like Unilever. These companies own (food) brands, which we're interested in. Let's see:

SELECT ?owner ?ownerLabel ?ownerUrl ?brand ?brandLabel ?brandUrl
WHERE 
{
  ?owner wdt:P452 wd:Q540912; # industry: food industry
         wdt:P1830 ?brand.
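  # Separate OPTIONALs, so one missing value does not blank out the others.
  OPTIONAL { ?owner wdt:P154 ?ownerLogo. }
  OPTIONAL { ?owner wdt:P856 ?ownerUrl. }
  OPTIONAL { ?brand wdt:P856 ?brandUrl. }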
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

This returns interesting combinations like Kellogg's with Eggo, but also Virgin Group, which counts as food industry (cola and vodka, according to the Russian Wikipedia page) although only its non-food brands are listed.

Food brands

It may work better to look for everything that is an instance of food (or of one of its subclasses) and also a brand.

SELECT DISTINCT ?brand ?brandLabel ?brandUrl ?owner ?ownerLabel ?ownerUrl ?ownerLogo
WHERE 
{
  ?brand wdt:P31/wdt:P279* wd:Q2095;   # instance of subclass of: food
         wdt:P31 wd:Q431289.           # instance of: brand
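  # Separate OPTIONALs, so a brand without an owner or website is still listed with what is known.
  OPTIONAL { ?brand wdt:P856 ?brandUrl. }
  OPTIONAL {
    ?brand wdt:P127 ?owner.
    OPTIONAL { ?owner wdt:P154 ?ownerLogo. }
    OPTIONAL { ?owner wdt:P856 ?ownerUrl. }
  }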
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Food ingredients

Let's see what Wikidata knows about food ingredients, including their alternative names in English and Dutch. The result also flags whether each ingredient is a vegetable, fruit, nut, etc.

SELECT
  ?ing ?ingLabel_en ?ingLabel_nl
  ?isVeg ?isFruit ?isNut ?isSpice ?isDairy ?isMeat ?isFish ?isChem
  ?Enumber
  (GROUP_CONCAT(DISTINCT(?ingAltLabel_en); separator = " | ") AS ?ingAltList_en)
  (GROUP_CONCAT(DISTINCT(?ingAltLabel_nl); separator = " | ") AS ?ingAltList_nl)
WHERE 
{
  ?ing wdt:P31 wd:Q25403900.

  BIND (EXISTS { ?ing wdt:P279   wd:Q11004. } AS ?isVeg)
  BIND (EXISTS { ?ing wdt:P279 wd:Q3314483. } AS ?isFruit)
  BIND (EXISTS { ?ing wdt:P279   wd:Q11009. } AS ?isNut)
  BIND (EXISTS { ?ing wdt:P279   wd:Q42527. } AS ?isSpice)
  BIND (EXISTS { ?ing wdt:P279  wd:Q185217. } AS ?isDairy)
  BIND (EXISTS { ?ing wdt:P279   wd:Q10990. } AS ?isMeat)
  BIND (EXISTS { ?ing wdt:P279  wd:Q600396. } AS ?isFish)
  BIND (EXISTS { ?ing wdt:P279   wd:Q11173. } AS ?isChem)
  
  # Separate OPTIONALs, so an ingredient missing one of these still gets the others.
  OPTIONAL { ?ing skos:altLabel ?ingAltLabel_en. FILTER (lang(?ingAltLabel_en) = "en") }
  OPTIONAL { ?ing skos:altLabel ?ingAltLabel_nl. FILTER (lang(?ingAltLabel_nl) = "nl") }
  OPTIONAL { ?ing wdt:P628 ?Enumber. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". ?ing rdfs:label ?ingLabel_en. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl". ?ing rdfs:label ?ingLabel_nl. hint:Prior hint:runLast false. }
}
GROUP BY ?ing ?ingLabel_en ?ingLabel_nl ?isVeg ?isFruit ?isNut ?isSpice ?isDairy ?isMeat ?isFish ?isChem ?Enumber

to be continued
