@amacal
Last active November 10, 2020 13:46
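from ftplib import FTP
from re import compile

# Accept names ending in .xml.bz2 or .xml.gz, unless the character just
# before the extension is a digit.
archive = compile(r'[^0-9](\.xml\.bz2|\.xml\.gz)$')

# Anonymous login; list the file names in the 2020-10-20 enwiki dump directory.
names = []
with FTP('ftp.acc.umu.se') as ftp:
    ftp.login()
    ftp.cwd('mirror/wikimedia.org/dumps/enwiki/20201020/')
    ftp.retrlines('NLST', names.append)

datasets = [name for name in names if archive.search(name) is not None]

# Sanity check: this listing is expected to contain exactly 8 such archives.
assert len(datasets) == 8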
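
The filter can also be checked offline, without connecting to the mirror. The sketch below applies the same pattern to a handful of file names. The names in `SAMPLE_NAMES` and the helper `select_datasets` are hypothetical, invented for illustration, and are not taken from the actual directory listing.

```python
import re

# Same pattern as in the listing above, written as a raw string.
ARCHIVE = re.compile(r'[^0-9](\.xml\.bz2|\.xml\.gz)$')

# Hypothetical file names, made up for illustration only.
SAMPLE_NAMES = [
    'enwiki-20201020-abstract.xml.gz',
    'enwiki-20201020-abstract1.xml.gz',
    'enwiki-20201020-pages-articles.xml.bz2',
    'enwiki-20201020-pages-articles1.xml-p1p41242.bz2',
    'enwiki-20201020-md5sums.txt',
]


def select_datasets(names):
    """Return the names accepted by ARCHIVE, preserving input order."""
    return [name for name in names if ARCHIVE.search(name) is not None]


selected = select_datasets(SAMPLE_NAMES)
print(selected)
```

Under these assumptions, a name with a digit directly before `.xml.gz` is rejected, as is a name whose `.bz2` suffix does not follow `.xml` immediately.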