@mattrayner
Created July 4, 2017 09:45
Sparql docs
  • SPARQL Basics
    • CONSTRUCT
    • SELECT
  • Basic Syntax and Conventions
    • Variables
    • URIs
    • Indentation
    • Keywords
    • Prefixes
    • rdf:type
    • Full stops and semicolons
    • Types and predicates
  • Keywords
    • Binding variables
    • Filtering
    • Ordering
    • Counts
    • Maximum and minimum
  • Tips and Tricks
    • Always bring back types
    • Custom predicates
    • Using the ontology
    • Using existing queries
  • Advanced Queries
    • Union
    • Filtering with dates (including casting as dateTimes, now() and COALESCE)
    • Blank nodes
  • Updating the API
  • The future
    • listAs, displayAs and fullTitle
    • The houseIncumbency hack
    • Changes to party memberships
    • Additional data
  • Useful links

SPARQL Basics

SPARQL is the query language for the semantic web. It is a recursive acronym, which stands for SPARQL Protocol and RDF Query Language. It has some syntactical similarities to SQL, but querying linked data works quite differently to querying a relational database. RDF stands for Resource Description Framework. Here is a description from the W3C:

The Resource Description Framework (RDF) is a framework for representing information in the Web. The abstract syntax has two key data structures: RDF graphs are sets of subject-predicate-object triples, where the elements may be IRIs, blank nodes, or datatyped literals. They are used to express descriptions of resources. RDF datasets are used to organize collections of RDF graphs, and comprise a default graph and zero or more named graphs.

Queries consist of statements: these can be thought of as patterns you are matching against or (in the case of a CONSTRUCT) a template for the data you wish to be returned. Statements always consist of three parts: subject, predicate, object. The subject, just as in a sentence, is what the statement is about. It can be a variable or URI. The predicate is a relationship connecting the subject and object. It can be a variable or URI. The object is the thing which the subject is related to. It can be a variable, URI or literal (eg. string, integer, date, etc.).

Variables always begin with a question mark, and URIs are written using either a prefix, as in parl:personGivenName, or angle brackets, as in <http://id.ukpds.org/schema/personGivenName>.

Some examples:

  • A person has the name Bob:
?person parl:personGivenName "Bob" .

The variable ?person is connected to the literal "Bob" by the predicate parl:personGivenName.

  • A person has a party membership:
?person parl:partyMemberHasPartyMembership ?partyMembership .

The variable ?person is connected to the variable ?partyMembership by the predicate parl:partyMemberHasPartyMembership.

  • A person has a type parl:Person:
?person a parl:Person .

The variable ?person is connected to the URI parl:Person by the predicate a which is an alias for rdf:type (see the later section on rdf:type for more explanation about this).

There are four main types of query: CONSTRUCT, SELECT, DESCRIBE and ASK. In our parliament queries, we use CONSTRUCT, but SELECT is helpful sometimes to quickly query the triplestore or to verify part of a query. Sometimes, in more advanced queries, we also use them together. We have not had a use for DESCRIBE or ASK so far, so I will not go into more detail about those two, but you can read more about them here and here.

A CONSTRUCT query returns a graph; that is, a set of RDF triples. The format we usually request from the data API is n-triples, but there are many other data formats available for RDF data. Some of the more common are rdf+xml, json-ld and turtle. However, whichever format you request, you will always see the triple pattern in the data.

A SELECT query returns a collection of variables, rather than triples. These can then be serialized into JSON or XML or another data format. It does not return a graph.

CONSTRUCT

A simple CONSTRUCT query consists of two basic parts:

CONSTRUCT {

}
WHERE {

}

The CONSTRUCT part will contain the statements used for the template of the graph you wish to return. This will be shaped as triples.

The WHERE part will contain the statements used to actually query the triplestore to return the information you need.
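As a minimal sketch of how the two parts fit together, the query below reuses the parl:Person type and parl:personGivenName predicate from the earlier examples. The variable names are illustrative only.

```sparql
PREFIX parl: <http://id.ukpds.org/schema/>

CONSTRUCT {
    # Template: the triples to return for each match
    ?person a parl:Person .
    ?person parl:personGivenName ?givenName .
}
WHERE {
    # Pattern: what to look for in the triplestore
    ?person a parl:Person .
    ?person parl:personGivenName ?givenName .
}
```

Here the template and the pattern happen to be identical, but they do not have to be: the CONSTRUCT part can return fewer statements than the WHERE part matches.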

SELECT

A simple SELECT query has only one part:

SELECT * WHERE {
}

The body of the SELECT will contain the statements used to query the triplestore. The * means that you would like to return all variables. You can specify individual variables instead:

SELECT ?person WHERE {
    ?person a parl:Person .
}

You can specify as many or as few variables as you need.
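For instance, a sketch that returns two variables per result, using a predicate from the earlier examples:

```sparql
PREFIX parl: <http://id.ukpds.org/schema/>

SELECT ?person ?givenName WHERE {
    ?person a parl:Person .
    ?person parl:personGivenName ?givenName .
}
```

Each result is a pair of values, one for ?person and one for ?givenName. Any variable used in the body but not listed after SELECT is simply left out of the results.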

Basic Syntax and Conventions

Variables

Variable names always begin with a question mark and are written in camelCase (with the first character always lowercase). Try to be semantic with your variable naming (as you would in any other language), and ensure that you use the same naming for a variable across all parts of a query (sounds obvious, but this is a really easy mistake to make).

Some examples:

?party
?seatIncumbency
?partyMembershipStartDate

URIs

URIs are written inside angle brackets.

For example,

<http://id.ukpds.org/1234>.

They can also be written with a prefix, eg.

parl:personFamilyName.

Indentation

When writing queries, we would usually follow the same sort of indentation rules as when writing Ruby. SPARQL is not sensitive to this though, so it is purely for making the queries easier to read.

Keywords

SPARQL keywords are usually written in capitals to easily distinguish them from variables or other parts of the query. This is not mandatory, however, so your query will still work if you write them in lowercase.

Prefixes

These are used to help simplify queries. They are given at the top of a query:

PREFIX parl: <http://id.ukpds.org/schema/>

CONSTRUCT {

}
WHERE {

}

This then allows you to replace the beginning part of a URI with the prefix.

For example, instead of writing <http://id.ukpds.org/schema/personGivenName> we can just write parl:personGivenName. There are several commonly used prefixes which you might see in the triplestore:

(Although parl is our pre-defined parliamentary prefix in the triplestore, in the data API, they have chosen to shorten this even further by just using a colon. So instead of parl:personGivenName, you would see :personGivenName.)
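As an illustrative sketch, a query header can declare several prefixes at once. The rdf, rdfs and xsd namespaces below are the standard W3C ones; whether any particular query in the API declares all of them is an assumption. The last declaration shows the empty prefix form mentioned above.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
PREFIX parl: <http://id.ukpds.org/schema/>
# The empty prefix, as used in the data API
PREFIX :     <http://id.ukpds.org/schema/>

SELECT ?person WHERE {
    # Both statements expand to the same predicate URI
    ?person parl:personGivenName "Bob" .
    ?person :personGivenName "Bob" .
}
```

Because both prefixes expand to the same namespace, the two statements in the body are the same pattern written twice.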

rdf:type

rdf:type is the shortened version of the URI <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>. It is defined as meaning 'the subject is an instance of a class'. However, it can be shortened even further to just 'a', and you will see this extensively within our queries. Used in this form, we do not need to use the rdf prefix at the beginning of the query.

For example,

?party a parl:Party .

The variable ?party is an instance of parl:Party (<http://id.ukpds.org/schema/Party>)
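To make the equivalence concrete, here is a sketch with the same statement written in all three forms. Any one of them on its own would be enough.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX parl: <http://id.ukpds.org/schema/>

SELECT ?party WHERE {
    # Full URI form
    ?party <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> parl:Party .
    # Prefixed form (needs the rdf prefix declared above)
    ?party rdf:type parl:Party .
    # Keyword form (no rdf prefix needed)
    ?party a parl:Party .
}
```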

Full stops and semicolons

A SPARQL statement ends with a full stop (just like a sentence). Using semicolons, however, can allow us to write simpler queries by collecting together statements with the same subject.

For example, instead of:

?person parl:personGivenName ?givenName .
?person parl:personFamilyName ?familyName .
?person parl:personDateOfBirth ?dateOfBirth .
?party parl:partyName ?partyName .

we can write:

?person parl:personGivenName ?givenName ;
        parl:personFamilyName ?familyName ;
        parl:personDateOfBirth ?dateOfBirth .
?party parl:partyName ?partyName .

We use a semicolon at the end of each statement until we reach the last one. Here we use a full stop as the subject changes in the next statement. It is convention to indent the predicates with a shared subject. This makes the query easier to read, as it makes it clearer that these statements share one subject.

Types and predicates

Just like variables, types and predicates, by convention, are written in camelCase. However, predicates will always begin with a lowercase letter, whereas a type will begin with an uppercase letter. It is really easy to mix this up and make a mistake, so be careful with these.

Some examples:

predicates: parl:seatIncumbencyEndDate, parl:partyMembershipHasMember, parl:houseName

types: parl:ParliamentPeriod, parl:HouseIncumbency, parl:House

Keywords

Binding variables

The SPARQL command BIND can be used to associate a variable with an entity. We most commonly use it to bind a variable to a resource at the beginning of a query in the following way:

BIND(<http://id.ukpds.org/1234> AS ?parliament)

(However, as our queries are dynamic, we would be passing in the id.)

This then allows us to use the variable ?parliament in the query instead of having to repeat the URI throughout. An alternative way to do this is to use FILTER:

FILTER(?parliament = <http://id.ukpds.org/1234>)

However, using FILTER in this way has negative performance implications, so it is preferable to use BIND instead. (However, FILTER has lots of other applications which I will discuss later.)

BIND can also be used within a query to assign the value of one variable to a new variable.

BIND(?incumbency AS ?seatIncumbency)

We use this pattern frequently to separate house incumbencies and seat incumbencies.

BIND can also be used to cast a variable to a different type. For example, in some of the parliament queries, we use it to cast dates to datetimes:

BIND(xsd:dateTime(?startDate) AS ?startDateTime)

I will discuss this further in the later section on advanced queries.

One of the most complex examples of BIND we use is part of the queries which return the letters for the a-z.

BIND(UCASE(SUBSTR(?listAs, 1, 1)) AS ?firstLetter)

This takes the ?listAs variable (a person's sort name) and slices a substring of this variable (using SUBSTR) starting at the 1st character with a length of 1 character (ie. the first letter of their last name). UCASE then converts that letter to upper case, and the result is bound to the variable ?firstLetter.

Optionals

In linked data, there is no concept of nil or null, which is different to how data would be represented in a relational database.

For example, in SQL, you could write the following query to find all people with no middle name:

SELECT * FROM members WHERE middle_names IS NULL;

This would return all rows from the table members where the entry had no middle name. However, as linked data is created from statements, in a triplestore, the statement ?person parl:personOtherNames ?middleNames simply won't exist.

This can make querying tricky, because another quirk of SPARQL is that if one statement in a query does not match, then none of the statements will be returned. This is because the entire query pattern must match for there to be a solution.

For example, if we run the following query:

PREFIX parl: <http://id.ukpds.org/schema/>
SELECT * WHERE {
    BIND(<http://id.ukpds.org/14Eva0nx> AS ?s)
    ?s parl:personGivenName ?givenName ;
       parl:personOtherNames ?otherNames .
}

this will return nothing because this person has no other names.

This is where OPTIONAL comes in handy! It allows solutions to be returned where only some part of the query pattern matches. So we could change the previous query to this:

PREFIX parl: <http://id.ukpds.org/schema/>
SELECT * WHERE {
    BIND(<http://id.ukpds.org/14Eva0nx> AS ?s)
    ?s parl:personGivenName ?givenName .
    OPTIONAL { ?s parl:personOtherNames ?otherNames . }
}

Now we would expect to see the person and their given name.

OPTIONAL can be used to wrap a single statement, as in the above example, or to wrap a whole section of a query. The latter would be used when a statement is OPTIONAL, but then this statement has succeeding statements which depend upon its existence.
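A sketch of the second case, where a whole section is optional and also contains a nested OPTIONAL. The predicate parl:partyMembershipEndDate is an assumed name used for illustration; the other predicates appear elsewhere in this document.

```sparql
PREFIX parl: <http://id.ukpds.org/schema/>

SELECT * WHERE {
    ?person a parl:Person .
    OPTIONAL {
        ?person parl:partyMemberHasPartyMembership ?partyMembership .
        # These depend on ?partyMembership, so they sit inside the same OPTIONAL
        ?partyMembership parl:partyMembershipHasParty ?party .
        ?party parl:partyName ?partyName .
        # Nested OPTIONAL: a current membership has no end date
        # (parl:partyMembershipEndDate is an assumed predicate name)
        OPTIONAL { ?partyMembership parl:partyMembershipEndDate ?partyMembershipEndDate . }
    }
}
```

A person with no party membership is still returned, just without values for the party variables. If the dependent statements were left outside the OPTIONAL, those people would drop out of the results.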

Large parts of our data set require the use of optionals. It is therefore important to consider whether data could possibly be optional when writing a query, and I would suggest erring on the side of caution here. Better to assume data might be missing or incomplete than to find we are missing huge chunks on the frontend because we didn't include an OPTIONAL. Common examples include end dates (anything current will not have an end date), names (due to incomplete data), seat incumbencies and house incumbencies. It is possible to nest optionals and you will see this pattern frequently in some of our more complex queries. Neglecting to use enough OPTIONALs was largely what caused the issues we had when dissolution occurred, as we had not anticipated in many of our queries that a house could have no current members.

Filtering

Often within our queries, it is necessary to filter out irrelevant or unwanted statements, and FILTER NOT EXISTS can be used to do this. We most frequently use this to remove past objects (and hence return only current objects) from our query. All temporal objects have a past subclass (eg. PartyMembership and PastPartyMembership, Incumbency and PastIncumbency, etc.), and it is this subclass we can filter out.

For example, to return only current seat incumbencies:

PREFIX parl: <http://id.ukpds.org/schema/>
SELECT * WHERE {
    ?s a parl:SeatIncumbency .
    FILTER NOT EXISTS { ?s a parl:PastIncumbency . }
}

This query is selecting all objects of type parl:SeatIncumbency but filtering out those of type parl:PastIncumbency, which is a subclass of parl:Incumbency. An alternative command which can be used to achieve similar results is MINUS. However, this was found to have serious performance implications particularly if used more than once, so we avoid using this now. MINUS works by removing solutions from the result set, whereas FILTER NOT EXISTS tests whether a pattern exists in the data as the query is performed.
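For comparison, here is a sketch of the same query written with MINUS. It is shown only to illustrate the syntax; as noted above, we avoid it for performance reasons.

```sparql
PREFIX parl: <http://id.ukpds.org/schema/>

SELECT * WHERE {
    ?s a parl:SeatIncumbency .
    # Remove every solution whose ?s also matches this pattern
    MINUS { ?s a parl:PastIncumbency . }
}
```

MINUS only removes solutions that share a variable with the pattern inside it, which is ?s here.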

Filters can also be used to select on a certain condition. We use this pattern quite frequently for string matching, particularly on our a-z/:letter and lookup by letters routes.

In lookup by letters we use: FILTER CONTAINS(LCASE(?houseName), LCASE("Commons"))

This is filtering for a ?houseName variable which contains the letters "Commons". Used as part of a query, we would expect this to return the House of Commons.

In a-z/:letter we use: FILTER STRSTARTS(LCASE(?listAs), LCASE("B"))

This is filtering for a ?listAs variable which starts with the letter B.

We can also use REGEX to match strings in SPARQL queries:

FILTER REGEX(STR(?name), "^B", "i")

This would return the same result as the previous filter.

Ordering

A graph inherently has no order. This is important to be aware of as you cannot rely on statements being returned in a particular order if you use CONSTRUCT.

It is possible, however, to control which results go into the graph by using ORDER BY together with LIMIT: the solutions are ordered first, and only the first few are used to build the graph. We don't use this very much in our queries as we don't limit them, but if you are only returning a certain number of results, then it is possible to order results in this way.

For example, here is LIMIT and ORDER BY being used to return the last three parliament periods:

PREFIX parl: <http://id.ukpds.org/schema/>
CONSTRUCT {
    ?parliament a parl:PastParliamentPeriod ;
                parl:parliamentPeriodStartDate ?parliamentPeriodStartDate ;
                parl:parliamentPeriodEndDate ?parliamentPeriodEndDate .
}
WHERE {
    ?parliament a parl:PastParliamentPeriod ;
                parl:parliamentPeriodStartDate ?parliamentPeriodStartDate ;
                parl:parliamentPeriodEndDate ?parliamentPeriodEndDate .
}
ORDER BY DESC(?parliamentPeriodStartDate)
LIMIT 3

It is possible to order by ascending or descending, and also to order by more than one variable. If you are writing a SELECT query then you can use ORDER BY without LIMIT since you are returning variables rather than a graph.
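As a sketch of ordering by more than one variable in a SELECT, the query below sorts people by family name and then, within each family name, by given name in reverse. The choice of sort keys is purely illustrative.

```sparql
PREFIX parl: <http://id.ukpds.org/schema/>

SELECT ?person ?familyName ?givenName WHERE {
    ?person parl:personFamilyName ?familyName ;
            parl:personGivenName ?givenName .
}
# First key ascending, second key descending
ORDER BY ASC(?familyName) DESC(?givenName)
```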

Counts

Counting variables can be achieved by using the COUNT keyword. COUNT is used with a SELECT, so queries containing counts look a bit different to our usual queries as we combine CONSTRUCT and SELECT. Within SPARQL, COUNT is an example of an aggregate, so whenever we select other variables alongside the count, we have to use it in conjunction with GROUP BY. This will aggregate the solutions within the query into groups, one per combination of the grouped variables.

For example, here is part of the query used to find the count of current members by party:

    { SELECT ?party ?partyName (COUNT(?member) AS ?memberCount) WHERE {
        ?party a :Party ;
               :partyName ?partyName ;
               :partyHasPartyMembership ?partyMembership .
        FILTER NOT EXISTS { ?partyMembership a :PastPartyMembership . }
        ?partyMembership :partyMembershipHasPartyMember ?member .
        ?member :memberHasIncumbency ?seatIncumbency .
        ?seatIncumbency a :SeatIncumbency .
        FILTER NOT EXISTS { ?seatIncumbency a :PastIncumbency . }
      }
      GROUP BY ?party ?partyName
    }

All the variables we wish to be returned must be included in the SELECT, as well as the variable we wish to count. The count of the ?member variable is bound to the alias ?memberCount which will be used in the CONSTRUCT part of the query. Note that all the non-aggregated variables given in the SELECT also need to be present in the GROUP BY part of the query.

Sometimes, as in the above example, it is necessary to separate a count query from the rest of a query. We can then chain these queries together (see the later section on Advanced Queries). This might be because we are asked to return the count of a variable with different status (eg. return all members, but a count of only the current ones) or to keep the SELECT part of the query to a manageable size.
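A shortened sketch of how a count subquery can sit inside a CONSTRUCT. It uses the custom predicate for counts described under Tips and Tricks, written here with the colon prefix as :count. The filters on current incumbencies are left out to keep the example short, so this is not the query used in the API.

```sparql
PREFIX : <http://id.ukpds.org/schema/>

CONSTRUCT {
    ?party a :Party ;
           :partyName ?partyName ;
           :count ?memberCount .
}
WHERE {
    # The subquery produces one row per party, with its member count
    { SELECT ?party ?partyName (COUNT(?member) AS ?memberCount) WHERE {
        ?party a :Party ;
               :partyName ?partyName ;
               :partyHasPartyMembership ?partyMembership .
        FILTER NOT EXISTS { ?partyMembership a :PastPartyMembership . }
        ?partyMembership :partyMembershipHasPartyMember ?member .
      }
      GROUP BY ?party ?partyName
    }
}
```

Only the variables named in the inner SELECT are visible to the CONSTRUCT, which is why ?party and ?partyName are projected along with the count.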

Maximum and minimum

Similar to COUNT are the aggregate functions MIN and MAX. When the aggregate is the only thing being selected, as here, no GROUP BY is needed. We use MAX in the query to find a previous parliament.

SELECT (MAX(?parliamentPeriodEndDate) AS ?maxEndDate) 
WHERE {
    ?parliament 
        a :ParliamentPeriod ;
        :parliamentPeriodEndDate ?parliamentPeriodEndDate .
}

This finds the maximum (ie. most recent) parliamentPeriodEndDate and aliases it to ?maxEndDate. This variable can then be used later on in the query.
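One way the aliased variable might be used later on is sketched below: the MAX is computed in a subquery, and the outer pattern then matches the parliament period with that end date. This illustrates the technique and is not the query used in the API.

```sparql
PREFIX : <http://id.ukpds.org/schema/>

SELECT ?parliament WHERE {
    # Subquery: find the latest end date across all parliament periods
    { SELECT (MAX(?parliamentPeriodEndDate) AS ?maxEndDate) WHERE {
        ?parliament a :ParliamentPeriod ;
                    :parliamentPeriodEndDate ?parliamentPeriodEndDate .
      }
    }
    # Outer pattern: match the parliament period with that end date
    ?parliament a :ParliamentPeriod ;
                :parliamentPeriodEndDate ?maxEndDate .
}
```

The ?parliament inside the subquery is not projected, so it is independent of the ?parliament in the outer pattern. Only ?maxEndDate is shared between the two.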

Eliminating duplicate results

There are sometimes situations where duplicate results could be returned. For example, with COUNT, we usually want to count unique instances of a variable. To avoid duplicates, we can use the keyword DISTINCT. This needs to be used within a SELECT. The most common place we use this is to return the letters for the a-z. Here is a portion of a query which returns the letters for the a-z for people:

    SELECT DISTINCT ?firstLetter WHERE {
        ?s a :Person .
        ?s <http://example.com/A5EE13ABE03C4D3A8F1A274F57097B6C> ?listAs .
        BIND(ucase(SUBSTR(?listAs, 1, 1)) as ?firstLetter)
    }

This returns us the list of letters which each person's sort name ?listAs begins with. Without the DISTINCT we would get back a list of thousands of letters, with many duplicates.
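DISTINCT can also be placed inside COUNT to count unique instances, as mentioned above. A minimal sketch using predicates from the earlier count example:

```sparql
PREFIX : <http://id.ukpds.org/schema/>

SELECT ?party (COUNT(DISTINCT ?member) AS ?memberCount) WHERE {
    ?party :partyHasPartyMembership ?partyMembership .
    ?partyMembership :partyMembershipHasPartyMember ?member .
}
GROUP BY ?party
```

A member with two memberships of the same party is counted once here, whereas a plain COUNT(?member) would count them twice.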

Tips and Tricks

Always bring back types

Within our CONSTRUCT, we always return the type of each object. We use the types in the front end for various things, such as filtering, and in some of the decorators. To make sure I always remember to include this, I always begin each part of the CONSTRUCT with the type of the variable.

Custom predicates

Using CONSTRUCT rather than SELECT to build our queries enables us to shape the data returned. One of the ways we can do this is by using custom predicates. Clearly, we should adhere to the nomenclature set out in the ontology, but sometimes we will need to return something not included in the ontology. In our queries, examples of this are the letters we return for the a-z and the counts. These both use custom predicates: parl:value and parl:count respectively. These were agreed with the data team. If a scenario arises where other custom predicates are required, you should consult with the data team first.

Using the ontology

Before beginning to write a query, it is a good idea to look at the Physical Ontology to make sure you understand the relationships between the objects you are being asked to return. (After a while the ontology will be in your head and you may even start dreaming about predicates, so you can skip this step!) You can see all the predicates, types and literals here and how the objects connect together. Be aware that some of the objects in the ontology do not exist in the triplestore yet. Also, there are a lot of subclasses in the ontology which we don't frequently use, but are there either due to inferencing or as aggregations of the same type of object (eg. parl:TemporalThing).

Another way to explore the data is using the GraphDB Workbench. You could, for example, start with a simple query to return a particular type of variable, and click around to see what it is connected with.

It is also important to consider how the objects will be used in the front end when designing a query. In the ontology, most predicates have an inverse. For example, you can link a person to an incumbency this way:

?person parl:memberHasIncumbency ?incumbency .

or using the inverse:

?incumbency parl:incumbencyHasMember ?person .

In the WHERE part of your query, it does not matter which way round you choose to use the predicate, but in the CONSTRUCT, you are effectively creating a template for how the n-triple data will be returned. If you chose the first predicate, then on the front end, they would be able to do person.incumbencies, whereas with the second, it would be incumbency.members. Therefore, before beginning a query, it is helpful to look at any designs created and to have a chat with the front end about how they expect to use the data in the view.

It is also important to think about which decorators will be used in the view, and therefore the data you will need to supply in order to use those decorators. For example, if we require a list of a member's parties, we would use person.parties. Behind the scenes in parliament-grom-decorators this uses party memberships. So within your CONSTRUCT you would need to have:

CONSTRUCT {
    ?person a parl:Person ;
            parl:partyMemberHasPartyMembership ?partyMembership .

    ?partyMembership a parl:PartyMembership ;
                     parl:partyMembershipHasParty ?party .
    ?party a parl:Party ;
           parl:partyName ?partyName .
}

This will create n-triples linking a person to their party memberships, which in turn will each link to a party. If you wanted the inverse, ie. party.members, then you would need to reverse these statements.
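A sketch of the reversed form, using the inverse predicates that appear in the count example earlier. That the party.members decorator reads exactly these predicates is an assumption, so check parliament-grom-decorators before relying on it.

```sparql
PREFIX parl: <http://id.ukpds.org/schema/>

CONSTRUCT {
    ?party a parl:Party ;
           parl:partyName ?partyName ;
           parl:partyHasPartyMembership ?partyMembership .
    ?partyMembership a parl:PartyMembership ;
                     parl:partyMembershipHasPartyMember ?person .
    ?person a parl:Person .
}
WHERE {
    ?party a parl:Party ;
           parl:partyName ?partyName ;
           parl:partyHasPartyMembership ?partyMembership .
    ?partyMembership parl:partyMembershipHasPartyMember ?person .
}
```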

Lastly, some of our queries return a huge number of statements. It is therefore important when writing queries that we aim to keep the CONSTRUCT as slim as possible, only returning the statements which are necessary to display the data required in the view.

Using existing queries

I very rarely write a query from scratch as many of our queries return similar data. Additionally, we are trying to achieve more consistency across our views, so it is important to always return the data following the same template. This means that in many of our queries, the CONSTRUCT part of the queries will be identical. You can see this across all our member queries (ie. all members, members of a party, members of a house, etc.). The WHERE part of each query will be different to bring back the correct set of members, but the CONSTRUCTs are all the same.

Advanced Queries

Union

The keyword UNION is used widely throughout our queries. We use it in two different ways: to match alternative patterns (think of it as meaning OR) and to combine queries.

The syntax to follow when using a UNION as an OR is as follows:

{
    first pattern
}
UNION {
    second pattern
}
UNION {
    third pattern
}
etc.

Note, the first pattern is only enclosed with curly braces, and we do not start using the UNION keyword until after it. You can have as many different patterns as you like, but mostly we only use two. The most common example of this in our queries is any query involving both houses, as we have to use different patterns to match seat incumbencies for the House of Commons and house incumbencies for the House of Lords. Consider the following:

{
    ?incumbency parl:houseIncumbencyHasHouse ?house .
    OPTIONAL { ?incumbency parl:incumbencyEndDate ?houseIncumbencyEndDate . }
    BIND(?incumbency AS ?houseIncumbency)        
}
UNION {
    ?incumbency parl:seatIncumbencyHasHouseSeat ?houseSeat .
    OPTIONAL { ?incumbency parl:incumbencyEndDate ?seatIncumbencyEndDate . }
    ?houseSeat parl:houseSeatHasHouse ?house .
    BIND(?incumbency AS ?seatIncumbency)
    OPTIONAL { ?houseSeat parl:houseSeatHasConstituencyGroup ?constituencyGroup .
        ?constituencyGroup parl:constituencyGroupName ?constituencyName .
        FILTER NOT EXISTS { ?constituencyGroup a parl:PastConstituencyGroup . }
    }
}

The first pattern matches incumbencies which are connected to ?house via the predicate parl:houseIncumbencyHasHouse. These will be incumbencies of type parl:HouseIncumbency. The second pattern matches incumbencies which are connected to ?houseSeat, which then connects to a ?house. These incumbencies will be of the type parl:SeatIncumbency. I have then bound the variable ?incumbency in each pattern to distinguish between the two types of incumbency. As members could have either or both types, on the front end we need to be able to distinguish between them to know which house a member belongs to now (or has belonged to in the past). I would then use the bound variable names in the CONSTRUCT part of the query to return both types separately.
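A cut-down sketch of how the bound variable names can then be used in the CONSTRUCT. The optional parts are omitted for brevity, and the type parl:HouseSeat is an assumed name.

```sparql
PREFIX parl: <http://id.ukpds.org/schema/>

CONSTRUCT {
    ?houseIncumbency a parl:HouseIncumbency ;
                     parl:houseIncumbencyHasHouse ?house .
    ?seatIncumbency a parl:SeatIncumbency ;
                    parl:seatIncumbencyHasHouseSeat ?houseSeat .
    # parl:HouseSeat is an assumed type name
    ?houseSeat a parl:HouseSeat ;
               parl:houseSeatHasHouse ?house .
}
WHERE {
    {
        ?incumbency parl:houseIncumbencyHasHouse ?house .
        BIND(?incumbency AS ?houseIncumbency)
    }
    UNION {
        ?incumbency parl:seatIncumbencyHasHouseSeat ?houseSeat .
        ?houseSeat parl:houseSeatHasHouse ?house .
        BIND(?incumbency AS ?seatIncumbency)
    }
}
```

Each solution comes from only one branch of the UNION, so either ?houseIncumbency or ?seatIncumbency is unbound in it. CONSTRUCT skips any template triple containing an unbound variable, which is what keeps the two types separate in the output.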

Sometimes, it is possible to interchange between UNION and OPTIONAL and have the same data set returned. However, UNION is usually faster than OPTIONAL, so if appropriate, it is preferable to use UNION. If used together, UNION and OPTIONAL can return surprising results and not behave as expected, so it is important to carefully check the data returned is what you expect.

The second way of using UNION is used quite widely, particularly to return the a-z letters along with other data in a single combined query. The basic syntax is as follows:

CONSTRUCT {
   ...statements
}
WHERE {
    { SELECT * WHERE {
        first query
        }
    }
    UNION {
        SELECT * WHERE {
            second query
        }
    }
    etc.
}

As long as you are careful about curly braces, this is a straightforward construction to use and is useful when you don't want to or can't combine everything into one query. The results of the different queries will be returned together in one set of n-triple statements. From the front end, we want to avoid making multiple calls to the API over HTTP; using UNION in this way allows us to avoid that.
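A sketch of a combined query along these lines, returning people together with the a-z letters. The letters are attached to a blank node (see the later section on blank nodes). The listAs predicate URI is copied from the DISTINCT example earlier in this document, and the rest is illustrative.

```sparql
PREFIX : <http://id.ukpds.org/schema/>

CONSTRUCT {
    ?person a :Person ;
            :personGivenName ?givenName .
    _:x :value ?firstLetter .
}
WHERE {
    # First query: the people themselves
    { SELECT * WHERE {
        ?person a :Person ;
                :personGivenName ?givenName .
      }
    }
    # Second query: the distinct first letters of their sort names
    UNION {
        SELECT DISTINCT ?firstLetter WHERE {
            ?s a :Person ;
               <http://example.com/A5EE13ABE03C4D3A8F1A274F57097B6C> ?listAs .
            BIND(UCASE(SUBSTR(?listAs, 1, 1)) AS ?firstLetter)
        }
    }
}
```

Solutions from the first subquery have no ?firstLetter and solutions from the second have no ?person, so each one fills in only the matching part of the template.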

Filtering with dates (including casting as dateTimes, now() and COALESCE)

In some of the parliament queries, I have used some complex filters with dates to return the correct parties and party memberships for a parliament period. The general pattern I am using is as follows:

BIND(xsd:dateTime(?partyMembershipEndDate) AS ?pmEndDateTime)
BIND(xsd:dateTime(?seatIncumbencyEndDate) AS ?incEndDateTime)
BIND(xsd:dateTime(?seatIncumbencyStartDate) AS ?incStartDate)
BIND(xsd:dateTime(?partyMembershipStartDate) AS ?pmStartDate)
              
BIND(COALESCE(?pmEndDateTime,now()) AS ?pmEndDate)
BIND(COALESCE(?incEndDateTime,now()) AS ?incEndDate)

FILTER (
    (?pmStartDate <= ?incStartDate && ?pmEndDate > ?incStartDate) ||
    (?pmStartDate >= ?incStartDate && ?pmStartDate < ?incEndDate)
)

Firstly, I am using BIND to cast various dates to the type xsd:dateTime (xsd is the prefix for http://www.w3.org/2001/XMLSchema#). By default, all the dates in the triplestore are of type xsd:date. I then use COALESCE to fall back to now(), which returns the current dateTime, when the party membership end date or incumbency end date is missing. COALESCE returns the variable if it exists, otherwise it will return now(). I then bind the result to a new variable name. As now() returns an xsd:dateTime, I had to cast the variables above to the same type to allow the comparison in the filter to happen. In the filter, I am then comparing the dateTimes. This inequality is a bit easier to understand graphically:

            <-----------------------Incumbency------------------------>
<--------PartyMembership--------->
       <-------------------------PartyMembership----------------------------->
                        <------PartyMembership---------------->
                                <--------PartyMembership----------------------->

As you can see, there are four variations for a party membership to 'belong' to an incumbency. The inequalities in the filter match those which begin at or before the start of the incumbency and end after the incumbency starts (which matches the first two) or those which begin some time between the beginning and end of the incumbency (the last two). This complex filter therefore returns only the party memberships which a member held during their incumbency in a particular parliament.
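Assuming every start date falls before its matching end date, the two-branch filter is equivalent to the standard single-condition test for overlapping intervals. The sketch below shows that alternative in a full query. It is not the query used in the API. The predicates parl:incumbencyStartDate, parl:partyMembershipStartDate and parl:partyMembershipEndDate are assumed names, and the cast from xsd:date to xsd:dateTime is assumed to behave as in the pattern above.

```sparql
PREFIX parl: <http://id.ukpds.org/schema/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

SELECT ?partyMembership ?seatIncumbency WHERE {
    ?member parl:memberHasIncumbency ?seatIncumbency ;
            parl:partyMemberHasPartyMembership ?partyMembership .
    # Date predicate names below are assumed for illustration
    ?seatIncumbency parl:incumbencyStartDate ?seatIncumbencyStartDate .
    ?partyMembership parl:partyMembershipStartDate ?partyMembershipStartDate .
    OPTIONAL { ?seatIncumbency parl:incumbencyEndDate ?seatIncumbencyEndDate . }
    OPTIONAL { ?partyMembership parl:partyMembershipEndDate ?partyMembershipEndDate . }

    BIND(xsd:dateTime(?seatIncumbencyStartDate) AS ?incStartDate)
    BIND(xsd:dateTime(?partyMembershipStartDate) AS ?pmStartDate)
    # A missing end date makes the cast fail, so COALESCE falls back to now()
    BIND(COALESCE(xsd:dateTime(?seatIncumbencyEndDate), now()) AS ?incEndDate)
    BIND(COALESCE(xsd:dateTime(?partyMembershipEndDate), now()) AS ?pmEndDate)

    # Overlap test: each interval starts before the other one ends
    FILTER (?pmStartDate < ?incEndDate && ?pmEndDate > ?incStartDate)
}
```

The two-branch version spells out the cases in the diagram, which some readers may find easier to follow, so which form to use is a matter of taste.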

Blank nodes

Blank nodes can be used in a CONSTRUCT as an anonymous variable. They are not bound to a named variable. We use them in the queries which return the letters for the a-z. There are two formats you can use:

[ parl:value ?firstLetter ]

or

_:x parl:value ?firstLetter .

They are returned in the n-triples in a similar format to the second variation. Another thing to note is that as they are anonymous variables, they do not have a type. You can filter them in the front end using Grom::BLANK. (See the grom gem for more detail on how this works.)

Updating the API

Now the data team has taken over the running of the API, the process for updating has a few more steps than when we maintained it. It is important to check your queries carefully as it is really easy to make mistakes. Chris Alcock from the data team handles deployment of the API, but if he's not available you can ask Wojciech too. Also, Wojciech or Jian Han in the data team know the most about SPARQL, so they are my go to if I get really stuck. Samu is the authority on the triplestore, so if you have a fundamental question about data structure, then he is the person to ask. Also, if you find something that just seems wrong in the data (and it does happen, particularly after an update or addition of new data), then speak to Wojciech as he imports and orchestrates all the data. He is really patient and will always help, but make sure you are clear about what the problem is, and have an example or evidence to illustrate the problem if possible.

These are the steps which I usually follow.

  1. If it's a new query, see if there are any similar ones to use as a basis, and consult the ontology if necessary.
  2. Open up the GraphDB workbench and write my query there (or adapt an existing query).
  3. Open up the Data API. I have Parallels (which allows me to run Windows), so I can open it in Visual Studio, but if you're opening it on a Mac, then I would use Sublime, Atom or Visual Studio Code. Open the solution Parliament.Data.Api.FixedQuery, then open the Controllers folder. The queries all live in the controllers which correspond to our Rails controllers.
  4. If you're just editing a query, then replace the existing one with your new query. Be careful if it has variables in it, such as ids, and make sure that you use the correct C# syntax. If you're writing a new query, then follow the existing pattern for creating an action and query carefully.
    (Note: in most of the queries, the data team have chosen to simplify the parl PREFIX. Instead of using parl, they have simply used a colon.)
  5. Run the tests to check your SPARQL syntax. I'm not sure how you would run these in Sublime or Atom, but in Visual Studio Code you should be able to run the tests from one of the menus at the top.
  6. I would usually also run the app, but I'm not sure how you do that from Visual Studio Code. Chris from the data team might be able to help you with this. You will definitely need to add a copy of the Secrets.config file. This is similar to our .env files in Rails and is gitignored. This step is useful as C# needs to compile, so running the app will check it can compile and build correctly. This is particularly important if you've added a new action or controller or changed anything other than just the query itself.
  7. Open a pull request on GitHub. Be careful that your text editor/IDE hasn't added any extra files (particularly hidden ones) and that you haven't altered the line endings (you might get a warning about this). Message Chris and he will review it, merge it and deploy it to the dev API for you.
  8. Check your changes in devci and make sure everything is working as expected.
  9. Make a cup of tea to celebrate your first successful API update!
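
The note under step 4 about the simplified prefix can be illustrated with a minimal sketch. The namespace URI below is a placeholder, not the real schema URI, and the Person class is used purely for illustration; copy the actual PREFIX line from an existing query in the controllers.

```sparql
# Placeholder namespace: substitute the schema URI used in the existing queries.
PREFIX : <http://example.com/schema/>

# With the longer form (PREFIX parl: <...>) the pattern below would be written
#   ?person a parl:Person .
# With the empty prefix, only the colon remains in front of each term.
SELECT ?person
WHERE {
  ?person a :Person .
}
```

Both forms expand to the same full URIs, so a query written with parl: in the GraphDB workbench only needs its prefixed names adjusting to match the PREFIX declaration used in the controller.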

The future

There are a few future changes which will have a significant impact on the API and/or queries.

listAs, displayAs and fullTitle

These three properties of a person come directly from MNIS. listAs is used as a sort name, displayAs is how members would like their names to be displayed, and fullTitle is used to display the full title for Lords. This was a hack as detailed here. At some point, names will be modelled correctly and the data will be available to fit this model. Currently, listAs and displayAs are used in all lists of members, so all of these queries would need to be updated. fullTitle and displayAs are used on people/:id, so that query would need to be updated too.

The houseIncumbency hack

In order to have Lords on the website, house incumbencies were created. Eventually, house incumbencies will no longer be needed and Lords will be modelled using seat incumbencies and seats. The blocker to this currently is a lack of data to link their seats and seat incumbencies. When this is resolved, this will require a major refactor of queries, and potentially a rethink of some of our decorators and logic in the front end.

The majority of the house queries contain a UNION to cater to each house and the different data models. This will no longer be necessary, so these queries should be simplified. Some of the other queries under parties and members may need to be refactored in a similar way. This will create issues on the front end as everyone will now have seat incumbencies, so we will no longer be able to use the different types of incumbencies to distinguish between members of the House of Commons or Lords. Therefore, you will probably have to modify the queries to return the house for each incumbency, and use this to check which house an incumbency belonged to.
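
As a rough sketch of what returning the house for each incumbency might look like, assuming Lords have been moved onto seat incumbencies: the namespace is a placeholder, and the predicate names (seatIncumbencyHasHouseSeat, houseSeatHasHouse, houseName) are assumptions that should be checked against the physical ontology before use.

```sparql
# Placeholder namespace; predicate names are assumptions, not confirmed here.
PREFIX : <http://example.com/schema/>

CONSTRUCT {
  # Bring back types for everything returned, as elsewhere in these docs.
  ?seatIncumbency a :SeatIncumbency ;
                  :seatIncumbencyHasHouseSeat ?houseSeat .
  ?houseSeat a :HouseSeat ;
             :houseSeatHasHouse ?house .
  ?house a :House ;
         :houseName ?houseName .
}
WHERE {
  ?seatIncumbency a :SeatIncumbency ;
                  :seatIncumbencyHasHouseSeat ?houseSeat .
  ?houseSeat :houseSeatHasHouse ?house .
  ?house :houseName ?houseName .
}
```

The front end could then check the house attached to each incumbency instead of relying on the type of incumbency.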

The other queries which will be affected by this are the parliament queries which return members. As parliamentPeriods are currently only linked to seat incumbencies, I made no provision in these queries for houseIncumbencies. When Lords use seat incumbencies, then these queries will suddenly return Lords too. Therefore, if the aim is still to only return MPs with these queries, then there will need to be some sort of filter applied to these queries to exclude Lords. On the other hand, if it is decided that Lords are required before the hack is resolved, then a similar pattern as that used in house members queries involving UNION can be used to return them.
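
If the parliament queries should continue to return only MPs, one possible shape for the filter is sketched below. This is an illustration under stated assumptions, not the data team's approach: the namespace is a placeholder, and the predicate names and the house name literal are guesses to be verified against the ontology and the data.

```sparql
# Placeholder namespace; predicates and the literal below are assumptions.
PREFIX : <http://example.com/schema/>

SELECT DISTINCT ?member
WHERE {
  ?seatIncumbency a :SeatIncumbency ;
                  :incumbencyHasMember ?member ;
                  :seatIncumbencyHasHouseSeat ?houseSeat .
  ?houseSeat :houseSeatHasHouse ?house .
  ?house :houseName ?houseName .

  # Keep only incumbencies whose seat belongs to the Commons.
  FILTER(?houseName = "House of Commons")
}
```

Filtering on the house URI rather than its name would be more robust if the ids are stable, since it would not depend on the exact wording of the literal.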

Changes to party memberships

Michael has proposed a new model for party affiliations. It is more complex than the current one, so it would require a rewrite of all queries involving party memberships. However, this may be some time away, as I am not sure that we currently have the data needed to fit this model.

Renaming of routes

The data team have spoken about renaming all their routes to be simpler using only the controller and action names. This won't require a refactor of queries, but will require a refactor of our frontend application and the routes we generate to call the API.

Additional data

With the addition of data views, we could potentially think about writing some queries which are for data only. For example, there is additional data on people which we do not return currently (eg. gender, middle names, date of birth, etc.) because we are trying to keep the queries as slim as possible. If we had some data only routes, then there would be the potential to return richer data here.

Useful links

  • SPARQL documentation
  • Data Driven - this was our original prototype. It contains lots of queries, and shows you where we have come from.
  • Physical ontology - GitHub - this is the ontology which the data team actually use to model the data. It is based on Michael's higher level ontologies.
  • Physical ontology - data visualisation - set filter at the bottom to 0 to see all the literals.
  • Resources in the office: Wojciech is my go-to for help on SPARQL queries, Chris will help with the API, and Michael is extremely knowledgeable on linked data and the semantic web (if you ask him nicely, he may even give you his talk about linked data and the semantic web).