Tony Bowden tmtmtmtm

  • Tallinn, Estonia
  • X @tmtm
@tmtmtmtm
tmtmtmtm / wtf_with_wd.md
Last active January 8, 2023 09:24
augmenting wikipedia infobox links with Wikidata IDs

wtf_wikipedia is a wonderful tool for extracting structured data from Wikipedia pages. One of the main ways I use it is to extract information from politicians' infoboxes about the positions they've held, to compare this with what Wikidata knows.

To make processing these a lot simpler, I've often wished that the JSON returned from wtf_wikipedia could be augmented with the Wikidata IDs for any linked item. So, for example, when getting officeholder data for Kaja Kallas, instead of

          "office": {
            "text": "19th Prime Minister of Estonia",
            "links": [
              {
                "type": "internal",
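
As a rough illustration of the kind of augmentation described above, here is a minimal sketch that walks a parsed infobox structure and attaches an ID to each internal link. Everything in it is an assumption for illustration: `augmentLinks` is a hypothetical helper (not part of wtf_wikipedia), the `wikidata` key is an invented name, links are assumed to carry their target title in a `page` field, and the lookup table with its placeholder ID would need to be populated separately.

```javascript
// Hypothetical helper, not part of wtf_wikipedia.
// `lookup` is a Map from Wikipedia page title to Wikidata ID; how it is
// populated is deliberately left out of this sketch.
function augmentLinks(node, lookup) {
  if (Array.isArray(node)) return node.map(child => augmentLinks(child, lookup));
  if (node === null || typeof node !== 'object') return node;

  // Copy rather than mutate, so the original parse result stays untouched.
  const copy = {};
  for (const [key, value] of Object.entries(node)) {
    copy[key] = augmentLinks(value, lookup);
  }
  // Assumption: internal links name their target in a `page` field.
  if (copy.type === 'internal' && typeof copy.page === 'string' && lookup.has(copy.page)) {
    copy.wikidata = lookup.get(copy.page); // `wikidata` is an invented key name
  }
  return copy;
}

// Illustrative input only; 'Q_PLACEHOLDER' stands in for a real Wikidata ID.
const office = {
  text: 'Example office',
  links: [{ type: 'internal', page: 'Example office' }],
};
const lookup = new Map([['Example office', 'Q_PLACEHOLDER']]);
const augmented = augmentLinks(office, lookup);
console.log(JSON.stringify(augmented, null, 2));
```

Returning a copy keeps the unaugmented data available for comparison, which matters when the goal is to diff against what Wikidata already knows.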
@tmtmtmtm
tmtmtmtm / add-armenian-assembly-member.js
Created October 22, 2020 10:22
wikibase-cli template to add a member of the National Assembly of Armenia
// Add a member of the Armenian National Assembly
module.exports = (id, name) => ({
  labels: { "en": name },
  descriptions: { "en": "Armenian politician" },
  claims: {
    P31: { value: 'Q5' },
    P106: { value: 'Q82955' },
    P39: {
      value: 'Q17277248',
      qualifiers: { P2937: 'Q61165268' },
@tmtmtmtm
tmtmtmtm / add-missing-category-P39s.md
Last active October 9, 2020 13:50
Add missing P39s based on Wikipedia Categories

Given two small wikibase-cli scripts:

category-members.js:

module.exports = (positionid) => {
  return `
    SELECT ?item (wd:${positionid} AS ?position) ?categoryPage
    WHERE {
      ?categoryPage schema:isPartOf <https://en.wikipedia.org/>;
@tmtmtmtm
tmtmtmtm / constituency-bodies.md
Last active August 22, 2020 16:49
Ensure UK Parliament Constituencies in Wikidata know which legislature the 'number of representatives' qualifier applies to

All Wikidata items for UK Parliamentary Constituencies (current and historic) should have a P1410 "number of representatives in legislature" claim (currently this is always 1, but historically some had more than this). The main statement generally exists, but in the vast majority of cases (2136) it is missing the required qualifier to say which legislature this applies to, and in 17 of the ones with the qualifier, it's pointing at Parliament rather than the House of Commons.

Step 1: Migrate 'Parliament' to 'House of Commons'

fix-constituency-body-uk.rb:

SELECT DISTINCT ?reps
WHERE {
 ?item wdt:P31 wd:Q27971968 ; p:P1410 ?reps .
@tmtmtmtm
tmtmtmtm / family-name.md
Last active August 6, 2020 08:31
Working with family names in Wikidata using wikibase-cli

Working with surnames in Wikidata

These use wikibase-cli and jq.

Find an existing family name item:

find-surname.js:

module.exports = name => `SELECT ?item WHERE { ?item wdt:P31 wd:Q101352; rdfs:label '${name}'@en }`
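
To see what this template generates, and where it can trip up, here is a small sketch. The template body is copied from above; `escapeSparqlString` is a hypothetical helper that is not part of the gist, added because a surname containing an apostrophe would otherwise end the single-quoted SPARQL literal early.

```javascript
// Same template as find-surname.js, bound to a local name for the demo.
const findSurname = name => `SELECT ?item WHERE { ?item wdt:P31 wd:Q101352; rdfs:label '${name}'@en }`

// Hypothetical helper (not in the gist): escape backslashes and single
// quotes so the name is safe inside a single-quoted SPARQL literal.
const escapeSparqlString = s => s.replace(/\\/g, '\\\\').replace(/'/g, "\\'")

const plainQuery = findSurname('Kallas')
const escapedQuery = findSurname(escapeSparqlString("O'Brien"))

console.log(plainQuery)
console.log(escapedQuery)
```

Escaping in the caller keeps the template itself a one-liner; folding the escape into the template would work just as well.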
@tmtmtmtm
tmtmtmtm / no-p39-p580.md
Last active July 30, 2020 06:24
Migrating Norwegian P39/P580s

A term of the Norwegian Storting always starts on October 1st. When that's a Sunday, the opening is the next day instead (October 2nd), but any memberships of that term do still start on the 1st.

So we want to update any P39/P580s which are listed with that October 2nd date to the October 1st date instead.

(See conversation at https://www.wikidata.org/w/index.php?title=Wikidata%3ARequest_a_query&type=revision&diff=1241548292&oldid=1241336300)

In recent years this affects the 2017-2021 and 1989-1993 terms.
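
As a quick sanity check on those two terms, this sketch lists the term start years on which October 1st falls on a Sunday. The four-year cycle and the 1989 to 2021 range are assumptions inferred from the terms named above; `startsOnSunday` is just an illustrative name.

```javascript
// True when October 1st of the given year is a Sunday.
// Uses UTC throughout so the local timezone cannot shift the date.
const startsOnSunday = year => new Date(Date.UTC(year, 9, 1)).getUTCDay() === 0

// Assumption: terms start every four years, counted from 1989.
const affected = []
for (let year = 1989; year <= 2021; year += 4) {
  if (startsOnSunday(year)) affected.push(year)
}

console.log(affected) // [ 1989, 2017 ]
```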

We can find these via:

@tmtmtmtm
tmtmtmtm / uk-by-elections.md
Last active July 27, 2020 08:27
UK Parliamentary by-election cleanups

Set P31 to "UK Parliamentary by-election"

Look for items that are set as the 'elected in' for a UK MP, but have instance of: 'by-election', rather than 'UK Parliamentary by-election'

SELECT DISTINCT ?item ?itemLabel
WHERE 
{
  ?person p:P39 [ps:P39/wdt:P279 wd:Q16707842; pq:P2715 ?item].
  ?item wdt:P31 wd:Q1057954 .
@tmtmtmtm
tmtmtmtm / election-winner-quantity-migration.md
Last active July 23, 2020 08:21
Migrate 'quantity' of seats won by a party in an election to 'number of representatives'

Per discussion at https://www.wikidata.org/wiki/Property_talk:P1410

election-winner-quantity.rq:

SELECT DISTINCT ?st 
WHERE {
  ?st ps:P991 ?winner ; pq:P1114 ?quantity .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
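
The query returns statement nodes rather than items, and editing tools generally want claim GUIDs. A minimal sketch of that conversion follows, assuming the results arrive as full statement URIs in the usual Wikidata form, where the `$` of the GUID is written as the first hyphen after the entity ID. `statementUriToGuid` is a hypothetical helper and the example GUID is made up for illustration.

```javascript
// Hypothetical helper: turn a Wikidata statement URI into a claim GUID.
function statementUriToGuid(uri) {
  const prefix = 'http://www.wikidata.org/entity/statement/'
  if (!uri.startsWith(prefix)) throw new Error(`Not a statement URI: ${uri}`)

  const local = uri.slice(prefix.length)
  // Only the first hyphen separates the entity ID from the UUID part;
  // the remaining hyphens belong to the UUID and must be kept.
  const split = local.indexOf('-')
  if (split === -1) throw new Error(`Unexpected statement ID: ${local}`)
  return local.slice(0, split) + '$' + local.slice(split + 1)
}

// Made-up URI for illustration; not a real statement.
const guid = statementUriToGuid(
  'http://www.wikidata.org/entity/statement/Q42-12345678-1234-1234-1234-123456789abc'
)
console.log(guid)
```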
@tmtmtmtm
tmtmtmtm / lithuania-11-members.md
Last active July 13, 2020 10:53
Add Members of the 11th Lithuanian Seimas to Wikidata

Step 1: Add a tracking page

The first step for something like this is always to not only see what Wikidata already knows, but to capture that with a Listeria page, so we can track changes over time. Here that's WikiProject every politician/Lithuania/data/Seimas/11th. That initially has no members, which is sometimes a sign that the data has been entered in a different way: e.g. with start/end dates rather than legislative terms. But a check for that approach (https://w.wiki/Wmi) shows no entries either, so we're working from a clean slate, and can continue with the term-based approach already taken for the 12th Seimas.

Step 2: Look for a Wikipedia category

If any of the Wikipedias have a category of "Member of the 11th Seim

A bunch of French municipal elections are showing up on the Recent elections with no Country report. Looking at some of those, it seems there's also a bit of a mismatch of modelling, with some being 'instance of: French municipal elections 2020', and some 'instance of: municipal election'+'part of: French municipal elections 2020'. The latter seems better to me, so I'm going to also migrate all the former to that whilst I'm here.

So, to select those:

SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q60846649 .
  FILTER NOT EXISTS { ?item wdt:P17 [] }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}