Try out the OpenFoodFacts data using Apache Jena.
We'll run Apache Jena Fuseki from the stain/jena-fuseki Docker image.
docker pull stain/jena-fuseki
docker run --name fuseki -p 3030:3030 -e ADMIN_PASSWORD=secret -e JVM_ARGS=-Xmx4g -v fuseki:/fuseki stain/jena-fuseki
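First create a persistent dataset named openfoodfacts-en in the web interface
(http://localhost:3030, given the port mapping above). Then stop the server and
load the data; the loader needs exclusive access to the database files.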
docker stop fuseki
cd /tmp
wget https://world.openfoodfacts.org/data/en.openfoodfacts.org.products.rdf
sed 's/fr\.openfoodfacts\.org/en.openfoodfacts.org/g' /tmp/en.openfoodfacts.org.products.rdf >/tmp/aa && \
mv /tmp/aa /tmp/en.openfoodfacts.org.products.rdf
docker run --rm -ti --volume fuseki:/fuseki -v /tmp:/staging stain/jena-fuseki \
./load.sh openfoodfacts-en /staging/en.openfoodfacts.org.products.rdf
docker start fuseki
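To see what the sed step does before running it on the full dump, it can be tried on a single line. This is only a sketch: the sample line is made up for illustration, and the real dump may format its URIs differently.

```shell
# Hypothetical sample line; not copied from the actual dump.
sample='<rdf:Description rdf:about="http://fr.openfoodfacts.org/ingredient/en%3Asugar">'

# Same substitution as in the load step: rewrite the fr host to the en host.
rewritten=$(printf '%s\n' "$sample" | sed 's/fr\.openfoodfacts\.org/en.openfoodfacts.org/g')

printf '%s\n' "$rewritten"
```

After the rewrite, the ingredient URIs fall under the off-ing: prefix used in the queries.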
We'll look at food products with declared ingredients. Each result row is one product-ingredient combination.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX food: <http://data.lirmm.fr/ontologies/food#>
PREFIX off-ing: <http://en.openfoodfacts.org/ingredient/>
PREFIX off-product: <http://world-en.openfoodfacts.org/product/>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT ?product ?code ?name ?ing_name
WHERE {
  ?product a food:FoodProduct;
    food:name ?name;
    food:code ?code;
    food:containsIngredient [food:food [food:name ?ing_name]].
  FILTER (?name != "" && ?code != "")
}
LIMIT 10
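The queries can be pasted into the Fuseki web interface. As an alternative, here is a minimal sketch for running them from the command line with curl. It assumes the dataset is served at the /sparql endpoint under its name (the usual layout for a dataset created in the web interface), and that the query was saved in a hypothetical file products.rq.

```shell
# Assumed endpoint layout: <server>/<dataset>/sparql
FUSEKI_URL=http://localhost:3030
DATASET=openfoodfacts-en
endpoint="$FUSEKI_URL/$DATASET/sparql"

# run_query FILE: send the SPARQL query stored in FILE and print the result as CSV.
# (run_query is a helper defined here, not part of Fuseki or curl.)
run_query() {
  curl -s -G "$endpoint" \
    -H 'Accept: text/csv' \
    --data-urlencode "query@$1"
}

# Usage, with the server running:
#   run_query products.rq
```

The -G option makes curl send the URL-encoded query as a GET parameter.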
Let's see if we can link this to Wikidata.
First, just list the Wikidata food ingredients that have a link to their OpenFoodFacts counterpart.
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdprop: <http://www.wikidata.org/prop/direct/>
SELECT ?object ?off_ing
WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    ?object wdprop:P5930 ?off_ing
  }
}
LIMIT 10
Next we combine the earlier query with the Wikidata query.
Wikidata stores the OpenFoodFacts ingredient id, but the OpenFoodFacts data
uses full URIs for ingredients, so we need to extract the id from the URI (as ?ing_id)
and link using that.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX food: <http://data.lirmm.fr/ontologies/food#>
PREFIX off-ing: <http://en.openfoodfacts.org/ingredient/>
PREFIX off-product: <http://world-en.openfoodfacts.org/product/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdprop: <http://www.wikidata.org/prop/direct/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?product ?code ?name ?ing_name ?wdobj
WHERE {
  ?product a food:FoodProduct;
    food:name ?name;
    food:code ?code;
    food:containsIngredient [food:food ?ing].
  ?ing food:name ?ing_name.
  BIND(REPLACE(STR(?ing), STR(off-ing:en%3A), "") AS ?ing_id)
  SERVICE <https://query.wikidata.org/sparql> {
    ?wdobj wdprop:P5930 ?ing_id
  }
}
LIMIT 10
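The BIND/REPLACE step in this query turns a full ingredient URI into the bare id that is matched against Wikidata. A rough shell analogue, using a hypothetical ingredient URI, shows the intended effect:

```shell
# Prefix that off-ing:en%3A expands to in the query.
prefix='http://en.openfoodfacts.org/ingredient/en%3A'

# Hypothetical ingredient URI, for illustration only.
ing='http://en.openfoodfacts.org/ingredient/en%3Asugar'

# Strip the prefix from the front of the URI, leaving the bare id.
ing_id=${ing#"$prefix"}

printf '%s\n' "$ing_id"
```

The analogy is not exact. SPARQL's REPLACE treats its second argument as a regular expression and replaces every match, whereas the shell expansion removes one literal prefix. For URIs of this shape the result should be the same.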
Now let's do something useful: show products together with those of their ingredients that are food additives.
Products without any food additive won't appear in the results.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX food: <http://data.lirmm.fr/ontologies/food#>
PREFIX off-ing: <http://en.openfoodfacts.org/ingredient/>
PREFIX off-product: <http://world-en.openfoodfacts.org/product/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdprop: <http://www.wikidata.org/prop/direct/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?product ?code ?name ?ing_name ?wdobj
WHERE {
  ?product a food:FoodProduct;
    food:name ?name;
    food:code ?code;
    food:containsIngredient [food:food ?ing].
  ?ing food:name ?ing_name.
  BIND(REPLACE(STR(?ing), STR(off-ing:en%3A), "") AS ?ing_id)
  SERVICE <https://query.wikidata.org/sparql> {
    ?wdobj wdprop:P5930 ?ing_id; # find openfoodfacts ingredient id
      wdprop:P279 wd:Q189567 # that is a subclass of food additive
  }
}
LIMIT 10