André Costa lokal-profil

@lokal-profil
lokal-profil / files_by_month.py
Last active December 21, 2023 15:04
Count the number of files created per month in a Commons category
# List creation times for all files in a category
from collections import Counter
import pywikibot
from tqdm import tqdm
def get_creation_month(fp):
    creation_time = fp.oldest_revision.timestamp
    return creation_time.isoformat().rpartition('-')[0]
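The preview stops at the helper above; a minimal sketch of how the per-month count might be assembled from it (the category name is a placeholder and the wiring is an assumption, not the gist's exact code):
site = pywikibot.Site('commons', 'commons')
cat = pywikibot.Category(site, 'Category:Example')  # placeholder category
counts = Counter()
for page in tqdm(cat.articles(namespaces=['file'])):
    counts[get_creation_month(pywikibot.FilePage(page))] += 1
for month, total in sorted(counts.items()):
    print(month, total)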
@lokal-profil
lokal-profil / deriv_detector.py
Last active December 21, 2023 15:05
Detect links between files in a Commons category, suggesting which might be derivatives
# https://commons.wikimedia.org/w/api.php?action=query&format=json&prop=linkshere&continue=gcmcontinue%7C%7C&generator=categorymembers&formatversion=2&lhprop=pageid%7Ctitle%7Credirect&lhnamespace=6&lhshow=!redirect&lhlimit=max&gcmtitle=Category%3A100%20000%20Bildminnen&gcmtype=file&gcmlimit=max
import json
import pywikibot
from tqdm import tqdm
def get_infiles(fp):
    links = fp.backlinks(filter_redirects=False, namespaces=['file'])
    return [link for link in links]
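The preview ends with the backlink helper; a rough sketch of how in-category links might be collected from it to suggest derivatives (placeholder category name; the gist may instead rely on the combined Action-API query shown in the comment above):
site = pywikibot.Site('commons', 'commons')
cat = pywikibot.Category(site, 'Category:Example')  # placeholder category
members = {fp.title(): fp for fp in cat.articles(namespaces=['file'])}
suggestions = {}
for title, fp in tqdm(members.items()):
    # files in the same category linking to this file are possible derivatives of it
    suggestions[title] = [link.title() for link in get_infiles(fp) if link.title() in members]
print(json.dumps(suggestions, indent=2, ensure_ascii=False))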
@lokal-profil
lokal-profil / get_diff_stats.py
Last active December 19, 2023 16:32
Takes a commonsdiff output file and outputs some statistics.
"""Takes a commonsdiff output file and outputs some statistics."""
import json
def check_changes(d, key, stats_data):
    """Record the net added/removed count for key and report whether anything changed."""
    key_data = d.get(key)
    if key_data.get('added') or key_data.get('removed'):
        stats_data[key] += len(key_data.get('added', [])) - len(key_data.get('removed', []))
        return True
    return False
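A minimal usage sketch for check_changes, assuming the commonsdiff output is a JSON mapping from file name to per-key {'added': [...], 'removed': [...]} dicts and that 'categories' and 'templates' are among the tracked keys (all assumptions; the real format may differ):
with open('commonsdiff_output.json', encoding='utf-8') as f:
    data = json.load(f)
stats = {'categories': 0, 'templates': 0}
changed_files = 0
for filename, diff in data.items():
    changed = [check_changes(diff, key, stats) for key in stats]
    if any(changed):
        changed_files += 1
print('files with changes:', changed_files)
print(stats)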
@lokal-profil
lokal-profil / get_media_views.py
Last active October 25, 2023 07:01
A small CLI for taking a Wikimedia Commons category and retrieving the media views for each file.
"""Get all media-views/mediarequests for files in a category for a time span.
Returns only human views.
Limitations:
* Does a Rest-API call per file (and one to the Action API).
* If the time span includes the current month, the results will likely be partial.
* Assumes a file has always been a member of the category if it is a member of it today.
* The statistics only go back to 2015.
"""
@lokal-profil
lokal-profil / get_caption_pywiki.py
Last active December 18, 2023 22:58
A small CLI for taking a Wikimedia Commons category and, for each file, retrieving the captions from any Wikimedia-project page where it is used.
"""Gets all images in a category and for each gets all global usages and associated captions.
Limitations:
* Not all Wikimedia wikis support returning captions.
* Captions in <gallery>-tags are only returned if retrieve_gallery is set to True.
* It does not filter by namespace (but the namespace is displayed in the results).
This loops over all the images in the category and then over all the pages on which they appear,
so it isn't fast and doesn't make use of e.g. combined Action-API calls.
"""
@lokal-profil
lokal-profil / wcvp_merge.py
Last active November 2, 2023 15:38
Merges the two files in wcvp.zip from Kew Gardens on plant_name_id, splitting the result by family
import csv
from collections import defaultdict
from tqdm import tqdm
distribution_file = "wcvp_distribution.csv"
names_file = "wcvp_names.csv"
merge_file = "output_family/merge_{}.csv"
fieldnames_distribution = None
plant_id = defaultdict(list)
demo = False # only output matches for plant_name_id = 1 or 2
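The preview only shows the setup; a minimal sketch of the merge itself, assuming both CSVs share a plant_name_id column, the names file has a family column, and '|' as delimiter (all assumptions about the Kew dumps):
with open(distribution_file, encoding='utf-8') as f:
    reader = csv.DictReader(f, delimiter='|')
    fieldnames_distribution = reader.fieldnames
    for row in tqdm(reader):
        plant_id[row['plant_name_id']].append(row)

by_family = defaultdict(list)
with open(names_file, encoding='utf-8') as f:
    names_reader = csv.DictReader(f, delimiter='|')
    fieldnames = names_reader.fieldnames + [
        fn for fn in fieldnames_distribution if fn != 'plant_name_id']
    for row in tqdm(names_reader):
        if demo and row['plant_name_id'] not in ('1', '2'):
            continue
        for dist in plant_id.get(row['plant_name_id'], [{}]):
            merged = dict(row)
            merged.update({k: v for k, v in dist.items() if k != 'plant_name_id'})
            by_family[row['family']].append(merged)

for family, rows in by_family.items():
    with open(merge_file.format(family), 'w', encoding='utf-8', newline='') as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)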
<?php
/**
* Created by User:Prolineserver
* Released under GPL per the following statement
* https://sv.wikipedia.org/w/index.php?title=Anv%C3%A4ndardiskussion:Prolineserver&oldid=53364425#Licens_f%C3%B6r_Slumpartikel
* @license GPL-3.0-or-later
*/
if (isset($_REQUEST['source'])) {
    if (intval($_REQUEST['source']) == 1) {
@lokal-profil
lokal-profil / gist:33238fae106db70249f766d4a30f0314
Last active March 11, 2021 08:37
OpenRefine: Matching a column of external identifiers to Wikidata entities
# Allows for reconciliation against Wikidata using _only_ an external identifier.
# This differs from the normal reconciliation which would use such a column together
# with normal matching techniques such as label matching.
# Add column by fetching URLs...
return "https://query.wikidata.org/sparql?format=json&query=SELECT%20DISTINCT%20%3Fq%20%7B%20VALUES%20%3Fvalue%20%7B%20%22{val}%22%20%22{val}%22%20%22{val}%22%20%7D%20.%20%3Fq%20wdt%3AP{prop}%20%3Fvalue%20%7D".format(prop=1260, val=value)
# more legible version of the above
import urllib
query = "SELECT DISTINCT ?q {{ VALUES ?value {{ '{val}' '{val}' '{val}' }} . ?q wdt:P{prop} ?value }}".format(prop=1260, val=value)
@lokal-profil
lokal-profil / litteraturbanken_scrape.py
Created October 12, 2020 21:37
Short script for scraping literaturbanken book images
#!/usr/bin/python
# -*- coding: utf-8 -*-
# short script for scraping literaturbanken book images
import requests
from tqdm import tqdm
def download_single(num, prefix, url):
    num_s = '{0:04}'.format(num)
    full_url = url.format(num_s)
    output_file = '{1}_{0}.jpg'.format(num_s, prefix)
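The preview cuts off inside download_single; a guess at how the download step might finish, reusing the requests import above (the actual gist may differ):
    response = requests.get(full_url)
    response.raise_for_status()
    with open(output_file, 'wb') as f:
        f.write(response.content)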
@lokal-profil
lokal-profil / activate_ast_on_vagrant.md
Last active June 24, 2020 20:41
Instructions for setting up Phan on Vagrant

Install php-ast

Phan needs php-ast to run; unfortunately, this is not included in the MediaWiki-Vagrant installation. To install it, follow these instructions (largely based on notes by Mainframe98). You can also try running the associated shell script to automate this part.

From the Vagrant home directory, run vagrant up to spin up the VM, then run vagrant ssh to enter your Vagrant shell.

Inside Vagrant

Run the following and accept any prompts

wget http://pear.php.net/go-pear.phar
php go-pear.phar