
@tmtmtmtm
Last active June 30, 2020 08:26
Adding Members of the 6th National Assembly of Azerbaijan to Wikidata

Step 1: Set up the Report

Setting this up first lets us see what information is already in Wikidata, and makes it easy to see progress.

This is mostly a cut'n'paste job from the report for the previous term, with the P2937 (legislative term) qualifier changed.

I can't find an existing Wikidata item for the new term, though, so I need to create that first.

Step 2: Set up Items for new Term

We already have items for the "Members of the 6th National Assembly of Azerbaijan", and the "2020 Azerbaijani parliamentary election", so I created "6th Convocation of the National Assembly of Azerbaijan", modelled after the 5th Convocation item, and linked these all together.

With those in place I can finish Step 1.

Step 3: Check for Existing Memberships

As I've only just created the item for the term, there can't be any memberships currently using it. But there might be some that have dates that intersect with the term, and which we'll need to add term qualifiers to first. Those can be checked via https://w.wiki/VQj:

SELECT ?item ?itemLabel ?start ?end ?term ?termLabel ?district ?districtLabel ?group ?groupLabel WHERE { 
  ?item p:P39 ?ps .
  ?ps ps:P39 [ wdt:P279* wd:Q21269547 ; pq:P580 ?start ] .
  OPTIONAL { ?ps pq:P582 ?end }
  OPTIONAL { ?ps pq:P768 ?district }
  OPTIONAL { ?ps pq:P4100 ?group }
  FILTER NOT EXISTS { ?ps pq:P2937 [] }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,az" . }
}

That gives no results, so nothing to be done here.

Step 4: Gather data from Wikipedia

This requires a few non-standard tweaks.

The first, and simplest, is to explicitly map 'Bitərəf' to the Wikidata ID for an independent politician.

The trickier one is dealing with constituencies. Currently only a very small number of the constituencies listed on that Wikipedia page are links, but it's from those links that we usually derive the Wikidata IDs (using the wikidata_ids_decorator scraper plugin). Without those links, we won't be able to add electoral district qualifiers. But as each constituency name starts with its number, and there are already Wikidata items for all of them, I can build a lookup table from number to item, via a SPARQL query.
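A minimal sketch of what that lookup might look like. The row shape, labels, and QIDs below are illustrative, not real query output; the only assumption is that each district's label begins with its constituency number:

```javascript
// Sketch with hypothetical data: build a lookup table from the leading
// constituency number in each district's label to that district's Wikidata ID.
function buildConstituencyLookup(rows) {
  const lookup = {};
  for (const { item, itemLabel } of rows) {
    const number = (itemLabel.match(/^\d+/) || [])[0];
    if (number) lookup[number] = item;
  }
  return lookup;
}

// Illustrative rows only, not real QIDs or labels.
const rows = [
  { item: 'Q4000001', itemLabel: '1 saylı seçki dairəsi' },
  { item: 'Q4000002', itemLabel: '2 saylı seçki dairəsi' },
];
const lookup = buildConstituencyLookup(rows);

// A constituency string scraped from Wikipedia can then be resolved
// by its leading number, even when it isn't a link:
const scraped = '2 saylı seçki dairəsi';
const districtId = lookup[(scraped.match(/^\d+/) || [])[0]];
```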

Step 5: Sanity-check the Party data

Sometimes Wikipedia links to the wrong place (e.g. a disambiguation page), so our derived IDs aren't quite correct. Before using these IDs, then, it's useful to check that they all seem sensible.

I have a standard query I use for this, filling in the IDs from the scraper output: https://w.wiki/VR9

These all look like the right sort of thing, so nothing needs any additional tidying (e.g. by editing the Wikipedia page to point at the correct article).

Step 5a: Sanity-check the list of Members

Similar to Step 5, but for the people. Again, it's fairly common for Wikipedia articles to link to disambiguation pages, or even to a different person with the same name. So it's important to sanity-check that everyone is human, was alive in the relevant time period, is from the correct country, is a politician, etc. Again, this is a fairly standard query, into which I can slot the IDs from the scraper: https://tinyurl.com/az6members (this query is too long for the built-in URL-shortener on the Query Service!)

None of the results stand out as obviously wrong, but quite a few have very little information, so I need to check each to make sure they're actually the correct person (and add some extra bio info to them in the process).

Step 5b: Bulk add "occupation: politician"

Enough of these don't have 'politician' set as their occupation in Wikidata to make it worthwhile using Petscan to help out with that. If we take all the people in the Category 'Azərbaycan Respublikası Milli Məclisinin VI çağırış deputatları' on AzWiki, we can ensure each of those is marked as a politician in Wikidata, by feeding P106: Q82955 to our Petscan batch (after making sure to untick the 'Members of the 6th National Assembly of Azerbaijan' item).

With those in place (and some time for the Query Service to catch up to our additions) only one person now doesn't have 'occupation: politician' from our original query: Nizami Səfərov. His Wikipedia page says he's a member (even though it doesn't have him in the relevant Category), so it looks like our list of people is fine to work with.

Step 6: Set everyone as a Member

Usually this step involves checking which are not already listed in Wikidata as a member for this term, but as the term item didn't exist before I created it, and Step 3 returned no results, I can just add a new P39 membership for everyone. The easiest way to do this is by passing a dynamic JS function file to wikibase-cli:

module.exports = id => ({
  id,
  claims: {
    P39: {
      value: 'Q21269547',
      qualifiers: {
        P2937: 'Q96738941'
      },
      references: {
        P143: 'Q58251',
        P4656: 'https://az.wikipedia.org/wiki/Azərbaycan_Milli_Məclisinin_VI_çağırışı'
      },
    }
  }
})

Saving that as add_P39.js, I can then run:

awk -F, '{ print $1 }' wikipedia.csv | tail -n +2 | egrep '^Q' | xargs -n 1 wb ee add_P39.js

This adds a (suitably referenced) "Member of the National Assembly of Azerbaijan (parliamentary term: 6th Convocation)" to each item from our scraper (e.g. Anar İsgəndərov).

That statement doesn't yet include information on the constituency or party: that will happen in the next step. Here, where we know there was no pre-existing data, it wouldn't have been too difficult to extend the add_P39.js script to also include those fields and pass them in; but in most cases Wikidata already has some information, so I like to keep these steps separate, and then use a generic approach to fill in all the missing data.

Step 7: Generate JSON of existing Wikidata

With the output from a SPARQL query for existing membership data for the term:

# List of P39 data for this term. Fetch with:
#     wd sparql term-members.sparql > wikidata.json

SELECT DISTINCT ?item ?itemLabel ?statement ?party ?area ?start ?end ?replacedBy ?cause WHERE {
  ?item p:P39 ?statement .
  ?statement ps:P39 wd:Q21269547 ; pq:P2937 wd:Q96738941 .
  OPTIONAL { ?statement pq:P580 ?start }
  OPTIONAL { ?statement pq:P582 ?end }
  OPTIONAL { ?statement pq:P768 ?area }
  OPTIONAL { ?statement pq:P4100 ?party }
  OPTIONAL { ?statement pq:P1366 ?replacedBy }
  OPTIONAL { ?statement pq:P1534 ?cause }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?item

plus the output from the scraper (in wikipedia.csv), I can then use the check-data script to check for additions to make to Wikidata:

bundle exec ruby check-data.rb wikipedia.csv wikidata.json | wd aq --batch --summary "Add missing qualifiers for Members of the 6th Convocation of the National Assembly of Azerbaijan"
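I don't have the check-data.rb source to hand, but its general shape is presumably something like the following sketch (hypothetical data shapes and GUIDs throughout; it assumes each output line is a `<statement GUID> <property> <value>` triple of the sort `wd aq --batch` consumes):

```javascript
// Sketch with hypothetical shapes: compare each scraped member against the
// existing Wikidata statement for this term, and emit one line per missing
// qualifier, suitable for piping into `wd aq --batch`.
function missingQualifierLines(scraped, existing) {
  const lines = [];
  for (const row of scraped) {
    const current = existing.find(e => e.item === row.id);
    if (!current) continue; // no P39 statement yet for this person
    if (row.district && !current.area)
      lines.push(`${current.statement} P768 ${row.district}`);
    if (row.group && !current.party)
      lines.push(`${current.statement} P4100 ${row.group}`);
  }
  return lines;
}

// Illustrative data only (not a real statement GUID):
const scraped = [{ id: 'Q1', district: 'Q10', group: 'Q20' }];
const existing = [{ item: 'Q1', statement: 'Q1$abc-123', area: null, party: 'Q20' }];
const lines = missingQualifierLines(scraped, existing);
// Here the party qualifier already exists, so only the district line is emitted.
```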

All the missing qualifiers are then added as a single batch: https://tools.wmflabs.org/editgroups/b/wikibase-cli/48d6d1985be92/
