@aolieman
Last active August 29, 2015 14:07
Strategy for linking mentioned Dutch government and parliament members

Spotting

Mentions of government and parliament members in a text are found with regular expressions. At least three cases can be distinguished:

  1. A single person is mentioned by name
  2. Multiple persons are mentioned by name
  3. A person is mentioned only by his/her function

In the first two cases, the strategy is to make use of highly regular address styles that are used in parliamentary speech, and thus also in the proceedings. Some examples (in translation) are: sir, madam, member, colleague, minister, and secretary of state. Such an address is immediately followed by a member's last name. The pattern that is used to find these names needs to span as much of the name as possible, without including any other subsequent words.
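
As a rough illustration of the first two cases, the sketch below spots an address style followed by one or more last names. The Dutch address forms (including the plural forms), the particle list, and the name `spot_names` are assumptions made for this example; the pattern that is actually used is not reproduced here.

```python
import re

# Assumed Dutch address styles (singular and plural); not taken from the source.
ADDRESS = r"(?i:de heren|de heer|mevrouw|het lid|de leden|collega|minister|staatssecretaris)"
# Assumed list of surname particles; matched case-insensitively ("Van der", "van der").
PARTICLE = r"(?i:van|de|der|den|ten|ter|te)"
# A capitalised name part, possibly hyphenated. Assumes an ASCII initial letter.
NAME_PART = r"[A-Z][\w'-]*"
# Zero or more particles followed by exactly one capitalised name part, so that
# "Kamp van Economische Zaken" stops after "Kamp".
SURNAME = rf"(?:{PARTICLE}\s+)*{NAME_PART}"
# Several names may be joined by commas or "en" (and).
SEPARATOR = r"(?:,\s+|\s+en\s+)"

MENTION = re.compile(
    rf"\b(?P<address>{ADDRESS})\s+(?P<names>{SURNAME}(?:{SEPARATOR}{SURNAME})*)"
)


def spot_names(text):
    """Return (address style, [last names]) for each mention found in text."""
    results = []
    for match in MENTION.finditer(text):
        names = re.split(r",\s+|\s+en\s+", match.group("names"))
        results.append((match.group("address").lower(), names))
    return results


if __name__ == "__main__":
    print(spot_names("De heren Omtzigt en Van Dijk spraken met minister Kamp."))
```

Restricting each surname to a single capitalised part after its particles is one way to span as much of the name as possible without absorbing a following portfolio; it will miss names that do not fit this shape.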

In the third case, we do not have the benefit of knowing the name of the person who is mentioned. The regular expression therefore only finds the function (e.g. chairperson), which is optionally followed by a portfolio (e.g. the minister of agriculture).
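
The third case could be sketched along the following lines. The Dutch function words, the requirement of a leading article "de", and the shape of a portfolio (capitalised words after "van" or "voor") are assumptions for illustration, as is the name `spot_functions`.

```python
import re

# Assumed Dutch function words; longer alternatives come first so that
# "minister-president" is not cut short to "minister".
FUNCTION = r"minister-president|vicepremier|premier|minister|staatssecretaris|voorzitter"
# A portfolio is assumed to be a run of capitalised words, optionally joined by "en" or commas.
WORD = r"[A-Z][\w-]*"
PORTFOLIO = rf"{WORD}(?:(?:,\s+|\s+en\s+|\s+){WORD})*"

FUNCTION_MENTION = re.compile(
    rf"\b(?i:de)\s+(?P<function>{FUNCTION})\b"
    rf"(?:\s+(?:van|voor)\s+(?P<portfolio>{PORTFOLIO}))?"
)


def spot_functions(text):
    """Return (function, portfolio or None) for each function mention in text."""
    return [
        (match.group("function"), match.group("portfolio"))
        for match in FUNCTION_MENTION.finditer(text)
    ]


if __name__ == "__main__":
    print(spot_functions("Ik vraag de minister van Economische Zaken om een reactie."))
```

Requiring the article keeps this pattern from firing on "minister Kamp", which belongs to the mentions by name.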

Linking

The strings that are found by the regex need to be linked to a unique identifier of the government or parliament member. To do this with any success, several types of contextual information need to be taken into account. In any parliamentary debate, most of the people who are referred to are present in that session. The PoliticalMashup proceedings include a structured speakers list that can be used to resolve such mentions.

To disambiguate mentions of non-speaking members, we also need to use information external to the proceedings. From the proceedings metadata, the date and the house in which the debate took place are used to look up suitable candidate members.

Mentions by name

For each string that the regex matches, we take the address style and a list of names as input. Based on the address style, a distinction is made between government and parliament members (their role). A government member may, however, be mentioned with a common address style (e.g. sir or madam), whereas a parliament member is never mentioned with a government address style (e.g. minister).

First, the speakers list is searched for a member whose role and name match the input. If no such speaker is found, the PoliticalMashup "ID Members" search interface is queried with the member's name, role, and address style, and the proceedings' date and house.

The mention is considered to be successfully disambiguated if exactly one member is found. In the case of zero results, we also look for a previously mentioned government member with this name. If a member is still not found, we also consider a previously mentioned member with a longer or shorter version of this name as a valid candidate. If instead there are multiple results for the initial query, the disambiguation is considered to have failed.

If no candidate members remain at the end of this process, the mention is interpreted as NIL (i.e. not a parliament member). This is done because non-members are sometimes mentioned in a way that is hard to distinguish from a Dutch member (e.g. mevrouw Merkel in Duitsland).
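
The cascade for mentions by name can be summarised in a minimal sketch. Everything named here is hypothetical: `Member`, `link_name`, and in particular `search`, which stands in for the "ID Members" query (the real interface also takes the address style, date, and house, and is not described in this note). Treating multiple hits in a fallback step as a failure, and comparing name variants by tokens, are assumptions of this example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Member:
    member_id: str
    last_name: str
    role: str  # "government" or "parliament"


def _tokens(name: str) -> set:
    return set(re.split(r"[\s-]+", name.lower()))


def is_name_variant(a: str, b: str) -> bool:
    """True if one name is a longer or shorter version of the other."""
    tokens_a, tokens_b = _tokens(a), _tokens(b)
    return tokens_a < tokens_b or tokens_b < tokens_a


def _unique(members: Iterable[Member]) -> List[Member]:
    # The same member may occur several times; keep one entry per identifier.
    return list({m.member_id: m for m in members}.values())


def link_name(
    name: str,
    role: str,
    speakers: List[Member],
    search: Callable[[str, str], List[Member]],  # hypothetical ID Members lookup
    previous: List[Member],  # members linked earlier in this document
) -> Tuple[str, Optional[Member]]:
    # Step 1: the speakers list, then the external search.
    hits = _unique(m for m in speakers if m.role == role and m.last_name == name)
    if not hits:
        hits = _unique(search(name, role))
    if len(hits) > 1:
        return "failed", None
    # Step 2: zero results, so try a previously mentioned government member.
    if not hits:
        hits = _unique(
            m for m in previous if m.role == "government" and m.last_name == name
        )
    # Step 3: still nothing, so accept a longer or shorter version of the name.
    if not hits:
        hits = _unique(m for m in previous if is_name_variant(m.last_name, name))
    if len(hits) == 1:
        return "linked", hits[0]
    # No candidates at all is NIL; several fallback candidates is a failure.
    return ("failed" if hits else "nil"), None
```

Returning a status alongside the member keeps NIL (no candidates) distinct from a failed disambiguation (too many candidates), which the text treats as different outcomes.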

Mentions by function

When no name is mentioned, the input available from the regex is the person's function and, optionally, his/her portfolio. If the portfolio is mentioned (and found), we should be able to disambiguate with high confidence, because on any date there should be only one government member holding this function and portfolio. To find who is referred to, we've built an index of cabinet members from all members' biographies. This index is queried with a two-step search process: first by proceedings date and second by portfolio. Again, exactly one result is considered a successful disambiguation, while multiple results indicate a failed disambiguation.
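
The two-step lookup might look roughly like this. The `CabinetTerm` record, its fields, and `find_by_portfolio` are hypothetical; the structure of the real index is not described here, and the sample identifiers and dates in the usage example are invented placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CabinetTerm:
    member_id: str
    function: str          # e.g. "minister"
    portfolio: str         # e.g. "Landbouw"
    start: date
    end: Optional[date]    # None means the term is still running


def find_by_portfolio(
    index: List[CabinetTerm], function: str, portfolio: str, on: date
) -> Optional[str]:
    """Return the member id if exactly one term matches, otherwise None."""
    # Step 1: keep only the terms that cover the proceedings date.
    in_office = [
        term for term in index
        if term.start <= on and (term.end is None or on <= term.end)
    ]
    # Step 2: narrow down by function and portfolio.
    hits = [
        term for term in in_office
        if term.function == function
        and term.portfolio.lower() == portfolio.lower()
    ]
    return hits[0].member_id if len(hits) == 1 else None


if __name__ == "__main__":
    # Placeholder data, not real cabinet terms.
    index = [
        CabinetTerm("member-a", "minister", "Landbouw", date(2010, 1, 1), date(2012, 6, 30)),
        CabinetTerm("member-b", "minister", "Landbouw", date(2012, 7, 1), None),
    ]
    print(find_by_portfolio(index, "minister", "Landbouw", date(2013, 3, 1)))
```

Filtering by date first keeps the portfolio comparison small and makes the "only one holder on any date" assumption easy to check.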

The second case we consider is when the function in question can be held by only one person at a time. This is the case for the prime minister and possibly for the vice prime minister. A caveat is that there may be multiple vice prime ministers at the same time, in which case the disambiguation currently fails. A possible strategy is to check whether one of the speakers is currently a vice prime minister.

If no candidate has been found at this stage, we assume that the mentioned person is a speaker. The speakers list is searched for any members with the mentioned function (e.g. minister, secretary, chairperson). If there is a single speaker with this function, the disambiguation is considered successful. If there are multiple speakers, we assume that the last-mentioned member with this function is mentioned here.
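
A small sketch of this fallback, assuming a hypothetical `Speaker` record and a `mention_history` list holding the identifiers of previously linked members in document order. Returning nothing when several speakers share the function but none has been mentioned yet is an assumption of this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Speaker:
    member_id: str
    function: str  # e.g. "minister", "staatssecretaris", "voorzitter"


def link_function_to_speaker(
    function: str, speakers: List[Speaker], mention_history: List[str]
) -> Optional[str]:
    """Resolve a function-only mention against the speakers list."""
    candidate_ids = {s.member_id for s in speakers if s.function == function}
    if len(candidate_ids) == 1:
        # A single speaker holds this function: successful disambiguation.
        return next(iter(candidate_ids))
    # Several speakers hold it: take the one mentioned most recently.
    for member_id in reversed(mention_history):
        if member_id in candidate_ids:
            return member_id
    # No speaker with this function, or none of them mentioned before.
    return None
```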

If no member has been disambiguated at the end of this process, this is currently considered an error. It is, however, also possible that a foreign cabinet member is mentioned (e.g. Germany's minister of Economy and Energy). The Dutch proceedings still need to be searched to find out how often this occurs (if at all).
