@urbanecm
Last active January 28, 2019 17:39

This script dumps data from the old Extension:EducationProgram onto wiki pages, because only SQL dumps were provided (https://dumps.wikimedia.org/other/educationprogram/).

To run this script, you need: a) Toolforge access; b) a set of templates with syntax equivalent to the one Outreach Dashboard currently uses. If you have editing from the Dashboard enabled, you should already have these templates; if you don't, adapt those from cs.wikipedia. If you want to, you can adapt the script so that it does not require a) or b).

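Before running the script, you must first adapt it to your wiki. This means the following:

  1. Adapt line 7 and line 8 (the pywikibot.Site(...) and toolforge.connect(...) calls) to create the correct site connection and DB connection.
  2. Adapt line 25, line 40, line 53 and line 54 (the course template, the start of the participant table, the table row and the end of the table) to your template names.
  3. Adapt the edit summary on line 55 (the argument of course_page.save(...)).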
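
One way to make these adaptations easier is to gather the wiki-specific values in one place. The following is only a sketch and not part of the script itself: the CONFIG dictionary and the render_student_row helper are hypothetical names, and the values are the cs.wikipedia ones the script currently hardcodes.

```python
# Hypothetical configuration block; the script itself hardcodes these values.
CONFIG = {
    "page_prefix": "Wikipedie:Nástěnka/%s",
    "course_template": "Nástěnka/Kurz",
    "table_start": "Nástěnka/Seznam účastníků/Začátek tabulky",
    "table_row": "Nástěnka/Seznam účastníků/Řádek tabulky",
    "table_end": "Nástěnka/Seznam účastníků/Konec tabulky",
    "summary": "Robot: Dump dat ze starého kurzového rozhraní",
}


def render_student_row(config, username, article_titles):
    """Build one participant-table row, listing the student's articles as wikilinks."""
    links = ", ".join("[[%s]]" % title for title in article_titles)
    # The last template parameter is left empty, as in the script.
    return "{{%s|%s|%s|%s}}\n" % (config["table_row"], username, links, "")
```

With such a block, moving the script to another wiki would mean editing only the dictionary values rather than hunting for individual lines.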

After you have customized the script, you can set up a virtual environment and run it as follows:

$ virtualenv -p python3 venv
Running virtualenv with interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in venv/bin/python3
Also creating executable in venv/bin/python
Installing setuptools, pip...done.
$ source venv/bin/activate
(venv) $ pip install -U pip
Downloading/unpacking pip from https://files.pythonhosted.org/packages/5f/25/e52d3f31441505a5f3af41213346e5b6c221c9e086a166f3703d2ddaf940/pip-18.0-py2.py3-none-any.whl#sha256=070e4bf493c7c2c9f6a08dd797dd3c066d64074c38e9e8a0fb4e6541f266d96c
  Downloading pip-18.0-py2.py3-none-any.whl (1.3MB): 1.3MB downloaded
Installing collected packages: pip
  Found existing installation: pip 1.5.4
    Uninstalling pip:
      Successfully uninstalled pip
Successfully installed pip
Cleaning up...
(venv) $ pip install -U setuptools
Collecting setuptools
  Downloading https://files.pythonhosted.org/packages/96/06/c8ee69628191285ddddffb277bd5abdf769166e7a14b867c2a172f0175b1/setuptools-40.4.3-py2.py3-none-any.whl (569kB)
    100% |████████████████████████████████| 573kB 843kB/s 
Installing collected packages: setuptools
  Found existing installation: setuptools 2.2
    Uninstalling setuptools-2.2:
      Successfully uninstalled setuptools-2.2
Successfully installed setuptools-40.4.3
(venv) $ pip install toolforge
Collecting toolforge
  Downloading https://files.pythonhosted.org/packages/90/d9/789346a997731e54c6fb4155208876ff414e9373c5b3dc57027cafe9a164/toolforge-4.1.0-py2.py3-none-any.whl
Collecting requests (from toolforge)
  Downloading https://files.pythonhosted.org/packages/65/47/7e02164a2a3db50ed6d8a6ab1d6d60b69c4c3fdf57a284257925dfc12bda/requests-2.19.1-py2.py3-none-any.whl (91kB)
    100% |████████████████████████████████| 92kB 2.0MB/s 
Collecting pymysql (from toolforge)
  Downloading https://files.pythonhosted.org/packages/a7/7d/682c4a7da195a678047c8f1c51bb7682aaedee1dca7547883c3993ca9282/PyMySQL-0.9.2-py2.py3-none-any.whl (47kB)
    100% |████████████████████████████████| 51kB 1.6MB/s 
Collecting idna<2.8,>=2.5 (from requests->toolforge)
  Downloading https://files.pythonhosted.org/packages/4b/2a/0276479a4b3caeb8a8c1af2f8e4355746a97fab05a372e4a2c6a6b876165/idna-2.7-py2.py3-none-any.whl (58kB)
    100% |████████████████████████████████| 61kB 1.8MB/s 
Collecting chardet<3.1.0,>=3.0.2 (from requests->toolforge)
  Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)
    100% |████████████████████████████████| 143kB 2.1MB/s 
Collecting urllib3<1.24,>=1.21.1 (from requests->toolforge)
  Downloading https://files.pythonhosted.org/packages/bd/c9/6fdd990019071a4a32a5e7cb78a1d92c53851ef4f56f62a3486e6a7d8ffb/urllib3-1.23-py2.py3-none-any.whl (133kB)
    100% |████████████████████████████████| 143kB 2.5MB/s 
Collecting certifi>=2017.4.17 (from requests->toolforge)
  Downloading https://files.pythonhosted.org/packages/df/f7/04fee6ac349e915b82171f8e23cee63644d83663b34c539f7a09aed18f9e/certifi-2018.8.24-py2.py3-none-any.whl (147kB)
    100% |████████████████████████████████| 153kB 2.1MB/s 
Collecting cryptography (from pymysql->toolforge)
  Downloading https://files.pythonhosted.org/packages/59/32/92cade62c645756a83598edf56289e9b19aae5370642a7ce690cd06bc72f/cryptography-2.3.1-cp34-abi3-manylinux1_x86_64.whl (2.1MB)
    100% |████████████████████████████████| 2.1MB 337kB/s 
Collecting asn1crypto>=0.21.0 (from cryptography->pymysql->toolforge)
  Downloading https://files.pythonhosted.org/packages/ea/cd/35485615f45f30a510576f1a56d1e0a7ad7bd8ab5ed7cdc600ef7cd06222/asn1crypto-0.24.0-py2.py3-none-any.whl (101kB)
    100% |████████████████████████████████| 102kB 2.8MB/s 
Collecting six>=1.4.1 (from cryptography->pymysql->toolforge)
  Downloading https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl
Collecting cffi!=1.11.3,>=1.7 (from cryptography->pymysql->toolforge)
  Downloading https://files.pythonhosted.org/packages/64/3d/684e2f507c61995ee725c44e6f34e7a6a9b7286161ef370575f1bbda3899/cffi-1.11.5-cp34-cp34m-manylinux1_x86_64.whl (421kB)
    100% |████████████████████████████████| 430kB 1.2MB/s 
Collecting pycparser (from cffi!=1.11.3,>=1.7->cryptography->pymysql->toolforge)
  Downloading https://files.pythonhosted.org/packages/68/9e/49196946aee219aead1290e00d1e7fdeab8567783e83e1b9ab5585e6206a/pycparser-2.19.tar.gz (158kB)
    100% |████████████████████████████████| 163kB 2.3MB/s 
Installing collected packages: idna, chardet, urllib3, certifi, requests, asn1crypto, six, pycparser, cffi, cryptography, pymysql, toolforge
  Running setup.py install for pycparser ... done
Successfully installed asn1crypto-0.24.0 certifi-2018.8.24 cffi-1.11.5 chardet-3.0.4 cryptography-2.3.1 idna-2.7 pycparser-2.19 pymysql-0.9.2 requests-2.19.1 six-1.11.0 toolforge-4.1.0 urllib3-1.23
(venv) $ python dumpcourses.py
Sleeping for 9.0 seconds, 2018-09-25 19:53:31
Page [[Wikipedie:Nástěnka/Gymnázium Jana Keplera/Školení (2015)]] saved
[output was redacted]

Licensing

The script is hereby released into the public domain.

#!/usr/bin/env python
#-*- coding: utf-8 -*-
import pywikibot
import toolforge

# Wiki-specific settings: site connection, DB connection and page prefix.
site = pywikibot.Site('cs', 'wikipedia')
conn = toolforge.connect('cswiki')
prefix = "Wikipedie:Nástěnka/%s"

# Fetch every course together with the name of its organization.
with conn.cursor() as cur:
    cur.execute('SELECT course_id, org_name, course_name, course_title, course_start, course_end, course_description FROM ep_courses JOIN ep_orgs ON org_id=course_org_id')
    courses = cur.fetchall()

for course in courses:
    course_page = pywikibot.Page(site, prefix % course[3].decode('utf-8'))

    # Look up one instructor; leave the field empty if the course has none.
    with conn.cursor() as cur:
        cur.execute('SELECT user_name FROM ep_users_per_course JOIN user ON user_id=upc_user_id WHERE upc_course_id=%s AND upc_role=1 /* instructor */ LIMIT 1;', (course[0],))
        try:
            instructor = cur.fetchall()[0][0].decode('utf-8')
        except IndexError:
            instructor = ""

    # Keep only the date part (first 8 digits) of the stored timestamps.
    start_date = "%s-%s-%s 00:00:00 UTC" % (course[4].decode('utf-8')[0:4], course[4].decode('utf-8')[4:6], course[4].decode('utf-8')[6:8])
    end_date = "%s-%s-%s 00:00:00 UTC" % (course[5].decode('utf-8')[0:4], course[5].decode('utf-8')[4:6], course[5].decode('utf-8')[6:8])

    # Course header, course description and the start of the participant table.
    course_page.text = """{{Nástěnka/Kurz\n
| course_name = %s
| instructor_username = %s
| instructor_realname =
| support_staff =
| subject =
| start_date = %s
| end_date = %s
| institution = %s
| expected_students = 0
| assignment_page = %s
| dump_from_EP = yes
}}
%s
{{Nástěnka/Seznam účastníků/Začátek tabulky}}
""" % (course[2].decode('utf-8'), instructor, start_date, end_date, course[1].decode('utf-8'), course_page.title(), course[6].decode('utf-8'))

    # Fetch the students enrolled in this course.
    with conn.cursor() as cur:
        cur.execute('SELECT user_id, user_name FROM ep_users_per_course JOIN user ON user_id=upc_user_id WHERE upc_course_id=%s AND upc_role=0 /* student */;', (course[0],))
        students = cur.fetchall()

    # Add one table row per student, listing their articles as wikilinks.
    for student in students:
        with conn.cursor() as cur:
            cur.execute('SELECT CONCAT("[[", article_page_title, "]]") FROM ep_articles WHERE article_course_id=%s AND article_user_id=%s;', (course[0], student[0]))
            tmp = cur.fetchall()
        articles = ", ".join(article[0].decode('utf-8') for article in tmp)
        course_page.text += "{{Nástěnka/Seznam účastníků/Řádek tabulky|%s|%s|%s}}\n" % (student[1].decode('utf-8'), articles, "")

    course_page.text += "{{Nástěnka/Seznam účastníků/Konec tabulky}}"
    # Edit summary, in English: "Robot: Dump of data from the old course interface".
    course_page.save('Robot: Dump dat ze starého kurzového rozhraní')
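
The start and end dates in the script are built by slicing the first eight characters of a timestamp that the database returns as bytes, apparently in YYYYMMDD... form judging by the slice positions. As a rough sketch, the same conversion could live in one small helper; to_dashboard_date is a hypothetical name that does not appear in the script, and the sketch assumes, as the script does, that the input always starts with an eight-digit date.

```python
def to_dashboard_date(raw):
    """Convert a bytes timestamp such as b'20150301123456' to a date string.

    Only the date part is kept; the time of day is replaced by midnight UTC,
    which mirrors what the script does for course_start and course_end.
    """
    ts = raw.decode("utf-8")
    return "%s-%s-%s 00:00:00 UTC" % (ts[0:4], ts[4:6], ts[6:8])
```

Calling the helper once for course[4] and once for course[5] would replace the two long formatting expressions without changing their result.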