@halfak
Created March 5, 2020 22:03
Python 3.5.1+ (default, Mar 30 2016, 22:46:26)
[GCC 5.3.1 20160330] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mwapi
>>> from revscoring.languages import english
>>> from revscoring.dependencies import solve
>>> doc = mwapi.Session("https://en.wikipedia.org").get(action="query", prop="revisions", titles="Alan Turing", rvprop="content", formatversion=2)
Sending requests with default User-Agent. Set 'user_agent' on mwapi.Session to quiet this message.
The following query raised warnings: {'format': 'json', 'prop': 'revisions', 'rvprop': 'content', 'titles': 'Alan Turing', 'formatversion': 2, 'action': 'query'}
- revisions -- {'warnings': 'Because "rvslots" was not specified, a legacy format has been used for the output. This format is deprecated, and in the future the new format will always be used.'}
- main -- {'warnings': 'Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes. Use [[Special:ApiFeatureUsage]] to see usage of deprecated features by your application.'}
>>> text = doc['query']['pages'][0]['revisions'][0]['content']
>>> text[:100]
'{{Redirect|Turing}}\n{{short description|English mathematician and computer scientist}}\n{{Use British'
>>> len(text)
>>> solve(english.idioms.revision.datasources.matches, cache={"datasource.revision.text": text})
['head and shoulders', "Who's Who", 'of a', 'of a', 'head and shoulders', 'fall between two stools', 'first love', 'of a', 'as if', 'kind of', 'hold on', 'guess what', 'of a', 'of a', 'According to', 'In addition', 'of an', 'end of', 'of a', 'According to', 'by hand', 'of a', 'of an', 'used to', 'of a', 'in detail', 'back into', 'All that', 'end of', 'rule out', 'of a', 'used to', 'that way', 'factor in', 'outside world', 'of a', 'of a', 'of a', 'of a', 'much less', 'according to', 'According to', 'of a', 'According to', 'Wall Street', 'of A', 'by hand', 'piece of work', 'of a', 'of a', 'of A', 'go through with', 'at that', 'used to', 'used to', 'In addition', 'set down', 'Iron Curtain', 'According to', 'According to', 'to go', 'to go', 'put the clock back', 'of course', 'and all', 'as well', 'of an', 'of a', 'if only', 'end of', 'of a', 'of a', 'of a', 'The man', 'as well', 'in the Making']
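
The list returned by solve() repeats many phrases ('of a', 'According to', 'used to'), so a frequency summary may be easier to read than the raw output. The following is a minimal sketch using only the standard library. The `matches` list is a short excerpt copied from the output above, not the full result, and `tally_matches` is a hypothetical helper name, not part of revscoring.

```python
from collections import Counter

# Short excerpt of the matches printed above (assumption: a sample only,
# not the complete list returned by solve()).
matches = ['head and shoulders', "Who's Who", 'of a', 'of a',
           'head and shoulders', 'According to', 'according to',
           'of A', 'used to']


def tally_matches(matches):
    """Count matches case-insensitively, most frequent first.

    Hypothetical helper for illustration; not a revscoring function.
    """
    return Counter(m.lower() for m in matches).most_common()


for phrase, count in tally_matches(matches):
    print(count, phrase)
```

Lowercasing before counting folds variants such as 'According to' and 'according to' into one entry; drop the `.lower()` call if case differences matter.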