@aolieman
Last active August 29, 2015 14:07
Strategy for recognizing and linking Dutch political parties

Strategy for linking political parties

Build a name tree and a lookup map

  1. Retrieve all party identifiers
  2. For each party: parse XML and map known names to its identifier
  3. Build an Aho-Corasick tree from the party names
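The three steps above could be sketched as follows. This is a minimal, standard-library-only illustration, not the actual implementation: the XML layout (`<party>` with `<name>` children), the `party-...` identifiers, and the function names `build_lookup`, `build_automaton` and `find_all` are all assumptions made for the example. Names are lowercased before insertion so that the search is case-insensitive.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Hypothetical party records; the real XML schema is not given in this note.
PARTY_XML = {
    "party-vvd": "<party><name>VVD</name>"
                 "<name>Volkspartij voor Vrijheid en Democratie</name></party>",
    "party-pvda": "<party><name>PvdA</name>"
                  "<name>Partij van de Arbeid</name></party>",
}


def build_lookup(party_xml):
    """Map each lowercased known name to its party identifier."""
    lookup = {}
    for party_id, xml_text in party_xml.items():
        for name_el in ET.fromstring(xml_text).iter("name"):
            lookup[name_el.text.strip().lower()] = party_id
    return lookup


def build_automaton(names):
    """Build Aho-Corasick goto/fail/output tables from an iterable of names."""
    goto, fail, out = [{}], [0], [[]]
    # Phase 1: insert every name into a trie (state 0 is the root).
    for name in names:
        state = 0
        for ch in name:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(name)
    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit names that end at the failure target (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return every (start, end, name) hit, including overlapping ones."""
    goto, fail, out = automaton
    state, hits = 0, []
    # Assumes lowercasing does not change the string length (true for Dutch).
    for i, ch in enumerate(text.lower()):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for name in out[state]:
            hits.append((i - len(name) + 1, i + 1, name))
    return hits


lookup = build_lookup(PARTY_XML)
automaton = build_automaton(lookup)
hits = find_all("De VVD en de PvdA stemden voor.", automaton)
```

In practice an existing Aho-Corasick library would probably be preferable to a hand-rolled automaton; the version here only serves to make the data flow explicit.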

String matching and linking

The proceedings input string is searched against the tree (case-insensitive, leftmost-longest match), yielding the names that were matched. The lookup map is then used to find the party identifier that corresponds to each matched name. A name is linked unless it has already been recognized as a member's name. A second reason not to link a matched name, particularly that of a single-member party, is that it is part of a longer name, such as that of a motion or committee.
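One way to sketch the selection and linking step is shown below. It assumes the raw hits are already available as `(start, end, name)` tuples with lowercased names, and that previously recognized member names are given as `(start, end)` character spans; the function names, the sample sentence and the `party-...` identifiers are hypothetical.

```python
def leftmost_longest(hits):
    """Keep non-overlapping hits: earliest start first, longest on ties."""
    chosen, last_end = [], 0
    ordered = sorted(hits, key=lambda h: (h[0], -(h[1] - h[0])))
    for start, end, name in ordered:
        if start >= last_end:
            chosen.append((start, end, name))
            last_end = end
    return chosen


def link_parties(hits, lookup, member_spans):
    """Return (start, end, party_id) for hits that do not overlap a member name."""
    links = []
    for start, end, name in leftmost_longest(hits):
        overlaps_member = any(
            start < m_end and m_start < end for m_start, m_end in member_spans
        )
        if not overlaps_member:
            links.append((start, end, lookup[name]))
    return links


# Illustrative input: "Brinkman" stands in for a single-member party name
# that coincides with the name of a member.
text = "De heer Brinkman sprak na de VVD-fractie."
lookup = {"brinkman": "party-brinkman", "vvd": "party-vvd"}
hits = [(8, 16, "brinkman"), (29, 32, "vvd")]
member_spans = [(8, 16)]  # "Brinkman" was already annotated as a member

links = link_parties(hits, lookup, member_spans)
```

Here only the `VVD` hit survives, because the `Brinkman` hit coincides with a span that was already recognized as a member's name.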

There is also a grey area where a longer name starts with a party name. In Dutch we could say, e.g., (1) VVD-fractie (the VVD parliamentary group) or (2) VVD-partijprogramma (the VVD party programme). Case one is arguably synonymous with the party itself when found in parliamentary proceedings. Case two, however, commonly refers to an artifact, or is used more loosely to denote a certain consensus within the party; it does not refer to the party as a whole.
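A possible way to handle this grey area is to inspect what directly follows a match and consult a small allow-list of compound suffixes. The sketch below is only one option: the allow-list content (`fractie`), the label names, and the function `classify_match` are assumptions for illustration, and deciding which compounds count as synonymous with the party remains an editorial choice.

```python
import re

# Assumed allow-list: compounds treated as synonymous with the party itself.
PARTY_SYNONYM_SUFFIXES = {"fractie"}

# A hyphen directly followed by a word, e.g. "-fractie".
_COMPOUND_TAIL = re.compile(r"-(\w+)")


def classify_match(text, start, end):
    """Return 'party', 'party_compound', or 'other' for text[start:end]."""
    if start > 0 and text[start - 1].isalnum():
        return "other"  # the match is the tail of a longer word
    tail = _COMPOUND_TAIL.match(text, end)
    if tail is None:
        if end < len(text) and text[end].isalnum():
            return "other"  # the match is the head of a longer word
        return "party"
    if tail.group(1).lower() in PARTY_SYNONYM_SUFFIXES:
        return "party_compound"
    return "other"


text = "De VVD-fractie verwijst naar het VVD-partijprogramma van de VVD."
first = text.index("VVD")
second = text.index("VVD", first + 3)
third = text.rindex("VVD")
labels = [classify_match(text, pos, pos + 3) for pos in (first, second, third)]
```

With this policy the first occurrence would be linked as a compound that stands for the party, the second would be left alone, and the third would be linked as a plain party name.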

Finally, there is an issue with party names that also occur as common words. Because the search is case-insensitive, we risk annotating ordinary uses of these words. A simple solution is to never annotate these names in their lowercase forms, although several names and acronyms remain ambiguous even then.

Dutch examples of such party names are:

  • Nieuw Nederland
  • LEF (Lijst 17)
  • Volkspartij
  • EB
  • Mens
  • WO (common usage also as acronym)
  • Vrije Boeren
  • C.D. (common as acronym and initials)
  • BP (also British Petroleum)
  • Ab (common male first name)
  • VAR (also Verklaring arbeidsrelatie)
  • Jong
  • TON
  • LSP (also Landelijk Schakelpunt)
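The simple lowercase rule could be sketched as a filter on the surface form of each match. The set below is a hand-picked subset of the list above, and the function name `keep_match` is hypothetical. As noted, this does not resolve acronyms such as BP or LSP, nor sentence-initial words such as "Mens", which keep their ambiguity when capitalized.

```python
# Subset of the names listed above that double as common Dutch words.
AMBIGUOUS_NAMES = {"mens", "jong", "ton", "ab", "lef", "volkspartij"}


def keep_match(surface, ambiguous=AMBIGUOUS_NAMES):
    """Reject an ambiguous name when it appears entirely in lowercase."""
    is_ambiguous = surface.lower() in ambiguous
    is_lowercase = surface == surface.lower()
    return not (is_ambiguous and is_lowercase)


# Surface forms as they might appear in the proceedings text.
candidates = ["mens", "Mens", "ton", "TON", "vvd"]
kept = [s for s in candidates if keep_match(s)]
```

Unambiguous names such as `vvd` pass the filter regardless of case, so the case-insensitive search still benefits them.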

Other political organisations

Currently, the focus is solely on political parties. The names and acronyms of other political organisations can be found in appendices of government documents, e.g. http://www.rijksbegroting.nl/2013/voorbereiding/begroting,kst173857_26.html
