@mdamien
Created March 19, 2019 10:39
Wikipedia: page authorship information with WikiWho
# Limitation: the Special:Export method is capped at 1000 revisions
# (this script does not paginate).
from urllib.parse import quote

from WikiWho.examples.process_xml_dump import process_xml_dump
from WikiWho.utils import iter_rev_tokens

xml_file_path = '/home/ecu/repos/WikiWho/export_radio_meuh.xml'  # export from Special:Export
wikiwho_obj = process_xml_dump(xml_file_path)

print(wikiwho_obj.title)
print(wikiwho_obj.ordered_revisions)

# Walk the tokens of the latest revision. Each output line starts with the token;
# when the originating revision changes, the diff URL and the editor follow.
prev = None
latest_rev = wikiwho_obj.revisions[wikiwho_obj.ordered_revisions[-1]]
for token in iter_rev_tokens(latest_rev):
    print(token.value, end='\t')
    rev = wikiwho_obj.revisions[token.origin_rev_id]
    if token.origin_rev_id != prev:
        # quote() keeps the URL valid when the page title contains spaces or accents
        print("https://fr.wikipedia.org/w/index.php?title=%s&diff=prev&oldid=%s&diffmode=source" % (quote(wikiwho_obj.title), token.origin_rev_id), end=' ')
        print(rev.editor, end='')
    print()
    prev = token.origin_rev_id
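
The script above prints one line per token. A possible next step is to aggregate those tokens into a per-editor count, which gives a rough authorship share for the page. The sketch below is only an illustration: `Token` is a hypothetical stand-in that models just the `value` and `origin_rev_id` attributes used above, `tokens_per_editor` is a helper name introduced here, and the sample data is made up.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a WikiWho token (assumption: only the two
# attributes used by the script above are modelled).
Token = namedtuple("Token", ["value", "origin_rev_id"])


def tokens_per_editor(tokens, editor_by_rev):
    """Count how many of the given tokens each editor originally introduced."""
    counts = Counter()
    for token in tokens:
        counts[editor_by_rev[token.origin_rev_id]] += 1
    return counts


# Made-up sample data: five tokens originating from three revisions.
tokens = [
    Token("radio", 1),
    Token("meuh", 1),
    Token("is", 2),
    Token("a", 2),
    Token("webradio", 3),
]
editor_by_rev = {1: "Alice", 2: "Bob", 3: "Alice"}

counts = tokens_per_editor(tokens, editor_by_rev)
total = sum(counts.values())
for editor, n in counts.most_common():
    # editor, token count, share of the page
    print(f"{editor}\t{n}\t{n / total:.0%}")
```

With a real export, the same helper could presumably be fed `iter_rev_tokens(latest_rev)` and a dict mapping each revision id in `wikiwho_obj.revisions` to its `editor`, assuming those attributes behave as they do in the script above.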