{{ message }}

Instantly share code, notes, and snippets.

# pwin/ProductForge_SPARQL_exploration_1.md

Last active May 2, 2019
A script of an introduction to SPARQL and http://statistics.gov.scot for the ProductForge Hack event - 26 Jan 2018

ProductForge 26 January 2018

https://goo.gl/WPeY5r

General resources:

DEMO

1: Go to http://yasgui.org

2: Paste http://statistics.gov.scot/sparql into the endpoint box

3: let's see if there are any triples there (Basic Graph Pattern - BGP)

``````    ask
{?s ?p ?o}
``````

4: is there at least one named graph?

``````ask {
graph ?g {?s ?p ?o}
}
``````

5: How do we know it isn't just returning True all the time?

``````ask {
graph ?g
{?s <http://something.org/text> ?o}
}
``````

6: Lets look at some graphs

``````select distinct ?g
{graph ?g
{?s ?p ?o }
}
limit 200
``````

6: How many graphs are there?

``````select (count(distinct ?g) as ?numberOfGraphs)
{graph ?g
{?s ?p ?o }
}
``````

6a: What Classes of "thing" are there?

``````select distinct ?o
{graph ?g
{?s a ?o }
}
``````

6b: the 'a' in the above is a shorthand for rdf:type

So we could have looked for the Classes of 'thing' in the following way

``````PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?s
{graph ?g
{?s rdf:type rdfs:Class }
}
``````

7: Having spotted what looks like an interesting graph earlier, (just from the URI) let's rummage around

``````SELECT * WHERE {
graph <http://statistics.gov.scot/graph/p1-BMI-clinical>{
?sub ?pred ?obj .
}
}
LIMIT 10
``````

7a: noticing the 'cube' predicates, let's explore http://purl.org/linked-data/cube#structure

7b: let's look to see if there is a qb:DataSet

``````PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?s
{graph ?g
{?s rdf:type qb:DataSet }
}
``````

8: Just speculatively picking a 'structure' URI, we can describe it

``````describe <http://statistics.gov.scot/data/structure/p1-BMI-clinical>
``````

9: ...we can grub around more using 'describe'

``````describe <http://statistics.gov.scot/def/code-list/p1-BMI-clinical/weightCategory>
``````

10: let's explore this 'code list' to see what it is about

``````SELECT * WHERE {
?sub ?pred <http://statistics.gov.scot/def/concept/weight-category/clinical-overweight> .
}
LIMIT 10
``````
``````PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?obj ?objLabel WHERE {
<http://statistics.gov.scot/def/concept-scheme/weight-category> ?pred ?obj ;.
?obj    rdfs:label | dc:title ?objLabel .
}
LIMIT 100
``````

11: So, back to the query earlier

``````SELECT * WHERE {
graph <http://statistics.gov.scot/graph/p1-BMI-clinical>{
?sub ?pred ?obj .
}
}
LIMIT 10
``````

we can see several potentially interesting cube components for http://statistics.gov.scot/data/structure/p1-BMI-clinical

12: Find some observations

``````PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT * WHERE {
graph <http://statistics.gov.scot/graph/p1-BMI-clinical>{
}
}
LIMIT 10
``````

13: describe one

``````describe <http://statistics.gov.scot/data/p1-BMI-clinical/government-year/2007-2008/S08000015/gender/female/weight-category/clinical-underweight/percent-of-p1-children/ratio>
``````

Then pull the description to create a more general query

``````<http://statistics.gov.scot/data/p1-BMI-clinical/government-year/2007-2008/S08000015/gender/female/weight-category/clinical-underweight/percent-of-p1-children/ratio> a <http://purl.org/linked-data/cube#Observation> ;
<http://statistics.gov.scot/def/dimension/gender> <http://statistics.gov.scot/def/concept/gender/female> ;
<http://statistics.gov.scot/def/measure-properties/ratio> 6.3E-1 ;
<http://statistics.gov.scot/def/dimension/weightCategory> <http://statistics.gov.scot/def/concept/weight-category/clinical-underweight> .
``````

Replace the entity URIs for the objects above with variables, and stick a select verb around it to create a query

``````select * {
graph <http://statistics.gov.scot/graph/p1-BMI-clinical>{
<http://statistics.gov.scot/def/dimension/gender> ?gender ;
<http://statistics.gov.scot/def/measure-properties/ratio> ?ratio ;
<http://statistics.gov.scot/def/dimension/weightCategory> ?weightCategory .
}
}
limit 10
``````

Now we need some labels

``````PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select * {
graph <http://statistics.gov.scot/graph/p1-BMI-clinical>{
<http://statistics.gov.scot/def/dimension/gender> ?gender ;
<http://statistics.gov.scot/def/measure-properties/ratio> ?ratio ;
<http://statistics.gov.scot/def/dimension/weightCategory> ?weightCategory .
?refArea rdfs:label ?refAreaLabel.
?refPeriod rdfs:label ?refPeriodLabel .
?gender rdfs:label ?genderLabel.
}
}
limit 10
``````

But this doesn't return anything. Spoiler-alert: The labels are somewhere else - different graph/s

``````PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?g ?g1 ?g2 ?g3 ?g4 ?refArea ?refAreaLabel ?refPeriodLabel ?genderLabel ?weightCategoryLabel ?ratio  WHERE {
graph <http://statistics.gov.scot/graph/p1-BMI-clinical>{
<http://statistics.gov.scot/def/dimension/gender> ?gender ;
<http://statistics.gov.scot/def/measure-properties/ratio> ?ratio ;
<http://statistics.gov.scot/def/dimension/weightCategory> ?weightCategory .

}
graph ?g {
?refArea rdfs:label ?refAreaLabel.
}
graph ?g1 {
?refPeriod rdfs:label ?refPeriodLabel .
}
graph ?g2 {
?refPeriod rdfs:label ?refPeriodLabel .
}
graph ?g3 {
?gender rdfs:label ?genderLabel.
}
graph ?g4 {
?weightCategory  rdfs:label ?weightCategoryLabel .
}
}
LIMIT 10
``````

We can in this case remove the specification for individual named graphs and use the "Union Graph" - (think sets) Also, instead of everything we just return the labels

``````PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?refAreaLabel ?refPeriodLabel ?genderLabel ?weightCategoryLabel ?ratio {
<http://statistics.gov.scot/def/dimension/gender> ?gender ;
<http://statistics.gov.scot/def/measure-properties/ratio> ?ratio ;
<http://statistics.gov.scot/def/dimension/weightCategory> ?weightCategory .
?refArea rdfs:label ?refAreaLabel.
?refPeriod rdfs:label ?refPeriodLabel .
?gender rdfs:label ?genderLabel.
?weightCategory  rdfs:label ?weightCategoryLabel .
}
#limit 10
``````

From this we can use the yasgui embedded pivot table or charts to create some views on the results. We can use FILTER() conditions to focus the query better

``````PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?refAreaLabel ?refPeriodLabel ?genderLabel ?weightCategoryLabel ?ratio {
<http://statistics.gov.scot/def/dimension/gender> ?gender ;
<http://statistics.gov.scot/def/measure-properties/ratio> ?ratio ;
<http://statistics.gov.scot/def/dimension/weightCategory> ?weightCategory .
?refArea rdfs:label ?refAreaLabel.
?refPeriod rdfs:label ?refPeriodLabel .
?gender rdfs:label ?genderLabel.
?weightCategory  rdfs:label ?weightCategoryLabel .
FILTER(lcase(?refAreaLabel) = "scotland" && contains(?weightCategoryLabel,"Clinical - Healthy")  &&  ?genderLabel = "Male")
}
``````

Once we have a suitable query we need to think about how to use this in an application. We can use tools like Postman to help.

Using Postman to test, the "&&" cause problems, so use several FILTER() statements

``````Accept=application/SPARQL-Results+xml
``````
``````query=PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?refAreaLabel ?refPeriodLabel ?genderLabel ?weightCategoryLabel ?ratio {
<http://statistics.gov.scot/def/dimension/gender> ?gender ;
<http://statistics.gov.scot/def/measure-properties/ratio> ?ratio ;
<http://statistics.gov.scot/def/dimension/weightCategory> ?weightCategory .
?refArea rdfs:label ?refAreaLabel.
?refPeriod rdfs:label ?refPeriodLabel .
?gender rdfs:label ?genderLabel.
?weightCategory  rdfs:label ?weightCategoryLabel .
FILTER(lcase(?refAreaLabel) = "scotland")
FILTER(contains(?weightCategoryLabel,"Clinical - Healthy"))
FILTER(?genderLabel = "Male")
}
order by ?refPeriodLabel
``````