Last active
April 7, 2019 14:08
Save macloo/fe5a1c8d0141d239766c95f3f704ded3 to your computer and use it in GitHub Desktop.
Use Python Wikipedia-API to get text summary for any subject in a list of subjects
"""
Requires Wikipedia-API 0.5.1 or greater and Python 3
https://pypi.org/project/Wikipedia-API/
"""
import wikipediaapi
w = wikipediaapi.Wikipedia('en')
p = w.page('N._K._Jemisin')
# print the first 2 sentences of the summary
print(w.extracts(p, exsentences=2))
# print up to 6 sentences of the summary
# note: not all extracts have that many sentences
print(w.extracts(p, exsentences=6))
"""
Requires Wikipedia-API and Python 3
https://pypi.org/project/Wikipedia-API/
"""
import wikipediaapi
w = wikipediaapi.Wikipedia(language='en', extract_format=wikipediaapi.ExtractFormat.WIKI)
subjects = ['N. K. Jemisin', 'Cixin Liu', 'Ann Leckie', 'John Scalzi', 'Red G. Bloo', 'Jo Walton']
for subject in subjects:
    p = w.page(subject)
    if p.exists():
        print(p.summary, '\n')
        print(p.fullurl, '\n')
    else:
        print(subject + ": No information available.\n")
The package was updated April 7 to allow selection of extract length by number of sentences. @martin-majlis is amazing!
The only way to control the length of the summary is with a slice:
p.summary[0:60]
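The slice above counts characters, so it can cut a word in half. A minimal sketch of one workaround, assuming you only need a rough length limit: back up to the last space before the cutoff. The helper name `truncate_at_word` and its `suffix` parameter are made up for this example and are not part of Wikipedia-API.

```python
def truncate_at_word(text, limit=60, suffix="..."):
    """Cut text to at most `limit` characters on a word boundary, then append suffix.

    Hypothetical helper for illustration; not part of Wikipedia-API.
    """
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # If the cutoff landed in the middle of a word, drop the partial word.
    if not text[limit].isspace():
        head, sep, _ = cut.rpartition(" ")
        if sep:  # only back up if there was a space to back up to
            cut = head
    return cut.rstrip() + suffix


print(truncate_at_word("The quick brown fox jumps over the lazy dog", 12))
# → The quick...
```

With a page object this would be called as truncate_at_word(p.summary, 60). It still does not respect sentence boundaries, so it is only a stopgap.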
The regular MediaWiki API can return an extract, which is different from the summary (it is shorter), but I haven't found a way to get this with Wikipedia-API.
Example:
https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exintro&explaintext&exsentences=3&format=json&titles=Arundhati_Roy
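For anyone who wants to call that endpoint directly, here is a rough sketch using only the standard library. It builds the same query as the example URL and reads the extract out of the JSON response. The function names and the User-Agent string are placeholders I made up, and the parsing assumes the default JSON layout, where pages are keyed by page ID under query.pages.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title, sentences=3):
    """Build a query URL for a plain-text intro extract of `title`."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,        # flag: intro section only
        "explaintext": 1,    # flag: plain text instead of HTML
        "exsentences": sentences,
        "format": "json",
        "titles": title,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def fetch_extract(title, sentences=3, user_agent="extract-example/0.1 (placeholder)"):
    """Fetch the extract for one title; returns None if no extract is present.

    The user_agent default is a placeholder; replace it with your own.
    """
    request = Request(build_extract_url(title, sentences),
                      headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        data = json.load(response)
    # Assumed layout: {"query": {"pages": {"<pageid>": {"extract": "..."}}}}
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        return page.get("extract")
    return None
```

Calling fetch_extract('Arundhati_Roy', 3) makes a live network request, so nothing runs until you call it yourself.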