@morkapronczay
Created October 11, 2019 13:16
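import wikipedia as wp
from tqdm import tqdm


def extract_content_pages(files, page_list, languages):
    """Fetch the plain text of every page in page_list for each language code.

    languages is an iterable of Wikipedia language codes (e.g. "en", "de").
    Results go into files[lang][name]; a dict that already holds other
    languages or pages is extended in place. Names that cannot be resolved
    are skipped.
    """
    # iterate over languages
    for lang in languages:
        print(lang)
        wp.set_lang(lang)  # later search/page calls use this language edition
        files.setdefault(lang, {})
        # iterate over page names
        for name in tqdm(page_list):
            try:
                # take the top search hit for the name and download its text
                page_content = wp.page(wp.search(name)[0]).content
                files[lang][name] = page_content
            except Exception:
                # no search hits (IndexError), disambiguation or missing page,
                # or a network error: skip this name
                continue
    return files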
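The function needs live network access, so here is a minimal offline sketch of the same loop with the Wikipedia lookup replaced by an injected fetch function. The names `collect_pages`, `fake_fetch` and the `fetch` parameter are hypothetical and are not part of the `wikipedia` package. The sketch shows the nested `{language: {page name: text}}` structure being built and failed lookups being skipped rather than aborting the run.

```python
from typing import Callable, Dict, Iterable, Sequence

# Hypothetical fetcher type: (language code, page name) -> page text.
# In the gist this role is played by wp.page(wp.search(name)[0]).content.
Fetcher = Callable[[str, str], str]


def collect_pages(
    files: Dict[str, Dict[str, str]],
    page_list: Sequence[str],  # a sequence, so it can be iterated once per language
    languages: Iterable[str],
    fetch: Fetcher,
) -> Dict[str, Dict[str, str]]:
    """Fill files[lang][name] with page text, skipping names that fail."""
    for lang in languages:
        per_lang = files.setdefault(lang, {})  # reuse a dict from earlier runs
        for name in page_list:
            try:
                per_lang[name] = fetch(lang, name)
            except Exception:
                # mirrors the gist: a failed lookup is skipped, not fatal
                continue
    return files


def fake_fetch(lang: str, name: str) -> str:
    """Offline stub: pretends every title exists except 'Missing'."""
    if name == "Missing":
        raise LookupError(name)
    return f"{name} ({lang})"


corpus = collect_pages({}, ["Python", "Missing"], ["en", "de"], fake_fetch)
print(corpus)  # {'en': {'Python': 'Python (en)'}, 'de': {'Python': 'Python (de)'}}
```

Passing the fetcher in keeps the looping and skipping logic testable without the network. A real fetcher would call `wp.set_lang(lang)` and then `wp.page(wp.search(name)[0]).content`, as the function does.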