@halfak
Last active December 12, 2015 17:50
Python 3.4.3 (default, Jul 28 2015, 18:20:59)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from revscoring.datasources import revision
>>> from revscoring.extractors import APIExtractor
>>> import mwapi
>>> extractor = APIExtractor(mwapi.Session("https://en.wikipedia.org"))
Sending requests with default User-Agent. Set 'user_agent' on mwapi.Session to quiet this message.
>>> extractor.extract(637398301, revision.tokens)[:10]
[Token('{{', type='dcurly_open'),
Token('bots', type='word'),
Token('|', type='etc'),
Token('deny', type='word'),
Token('=', type='equals'),
Token('DPL', type='word'),
Token(' ', type='whitespace'),
Token('bot', type='word'),
Token('}}', type='dcurly_close'),
Token('\n', type='whitespace')]
>>> extractor.extract(686575075, revision.content_tokens)[:10]
[Token('135', type='number'),
Token('px140px135px140px', type='word'),
Token(' ', type='whitespace'),
Token('Biology', type='word'),
Token(' ', type='whitespace'),
Token('deals', type='word'),
Token(' ', type='whitespace'),
Token('with', type='word'),
Token(' ', type='whitespace'),
Token('the', type='word')]
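
The token lists above pair each piece of text with a type label, so a natural next step is to tally or filter tokens by type. The following is a minimal sketch of that idea, not revscoring's own API: `Token` here is a hypothetical namedtuple stand-in whose field names `text` and `type` are assumptions, and the sample data is copied from the first extraction above so the block runs without network access.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for the Token objects printed above; the real
# objects come from revscoring's tokenizer and may expose a different interface.
Token = namedtuple("Token", ["text", "type"])

# First ten tokens of revision 637398301, copied from the session above.
tokens = [
    Token("{{", "dcurly_open"),
    Token("bots", "word"),
    Token("|", "etc"),
    Token("deny", "word"),
    Token("=", "equals"),
    Token("DPL", "word"),
    Token(" ", "whitespace"),
    Token("bot", "word"),
    Token("}}", "dcurly_close"),
    Token("\n", "whitespace"),
]

# Count how many tokens of each type appear.
type_counts = Counter(token.type for token in tokens)

# Keep only the text of word tokens, dropping markup and whitespace.
words = [token.text for token in tokens if token.type == "word"]

print(type_counts.most_common(2))
print(words)
```

With real extractor output, the same comprehension pattern should apply as long as each token exposes its type as an attribute, which is an assumption worth checking against the installed revscoring version.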