
@sgraaf
Last active October 26, 2022 04:02
Simple shell script to download the latest Wikipedia dump in the chosen language. Pass the Wikipedia language code (e.g. en or nl) as the first argument. Adapted from: https://github.com/facebookresearch/XLM/blob/master/get-data-wiki.sh
#!/bin/sh
set -e
# Wikipedia language code, e.g. "en" or "nl"
LG=$1
if [ -z "$LG" ]; then
    echo "Usage: $0 <language-code>" >&2
    exit 1
fi
WIKI_DUMP_NAME="${LG}wiki-latest-pages-articles.xml.bz2"
WIKI_DUMP_DOWNLOAD_URL="https://dumps.wikimedia.org/${LG}wiki/latest/$WIKI_DUMP_NAME"
# download latest Wikipedia dump in chosen language (-c resumes a partial download)
echo "Downloading the latest $LG-language Wikipedia dump from $WIKI_DUMP_DOWNLOAD_URL..."
wget -c "$WIKI_DUMP_DOWNLOAD_URL"
echo "Successfully downloaded the latest $LG-language Wikipedia dump to $WIKI_DUMP_NAME"
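The script builds the download URL purely from the language code: the dump file is named <code>wiki-latest-pages-articles.xml.bz2 and lives under https://dumps.wikimedia.org/<code>wiki/latest/. A minimal dry-run sketch, assuming you only want to see which files would be fetched for a few languages before starting the (large) downloads. The helper name wiki_dump_url and the language list are hypothetical, not part of the original gist.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the URL the download script would
# fetch for a given language code, without downloading anything.
wiki_dump_url() {
    lg=$1
    name="${lg}wiki-latest-pages-articles.xml.bz2"
    printf '%s\n' "https://dumps.wikimedia.org/${lg}wiki/latest/$name"
}

# Example language codes (assumed for illustration only)
for lg in en nl de; do
    wiki_dump_url "$lg"
done
```

Once the URLs look right, each language can be downloaded by running the script once per code; because it uses wget -c, re-running it after an interrupted transfer resumes instead of starting over.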