Media URLs scraped from the wikipedia_en_top1m log file with:
grep -F '.webm' 28c070f7906bf9674d93ad36_mwoffliner.log | grep -vF 'api.php' | grep Downloading | sed -e 's/.* \[//' -e 's/\].*//' | grep -vF '.jpg' | sort -u > videofiles.txt
for video and
grep -F '.ogg' 28c070f7906bf9674d93ad36_mwoffliner.log | grep -vF 'api.php' | grep Downloading | sed -e 's/.* \[//' -e 's/\].*//' | grep -vF -e '.jpg' -e '.png' -e 'maps' -e 'load.php' | sort -u > audiofiles.txt
for audio
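
The two pipelines share the same core steps, so they can be sketched as one small function and checked against a throwaway log before running on the real file. This is only a sketch: the function name extract_media_urls and the sample log lines are hypothetical, and the real mwoffliner log format is assumed only to the extent the sed expressions imply (a URL in square brackets, preceded by a space, on a line containing "Downloading"). The extra exclusions (.jpg, .png, maps, load.php) are left out for brevity.

```shell
# Hypothetical helper: print the unique bracketed URLs for one extension.
# $1 = extension without the dot (e.g. webm or ogg), $2 = path to the log.
extract_media_urls() {
    grep -F ".$1" "$2" \
        | grep -vF 'api.php' \
        | grep 'Downloading' \
        | sed -e 's/.* \[//' -e 's/\].*//' \
        | sort -u
}

# Made-up sample lines, NOT taken from a real mwoffliner log.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
Downloading [https://example.org/b.webm]
Downloading [https://example.org/a.webm]
Downloading [https://example.org/a.webm]
Downloading [https://example.org/w/api.php?titles=File:c.webm]
Skipping [https://example.org/d.webm]
EOF

# Duplicates, the api.php request and the non-"Downloading" line are dropped.
extract_media_urls webm "$sample_log"
rm -f "$sample_log"
```

Against the real log this would be called as: extract_media_urls webm 28c070f7906bf9674d93ad36_mwoffliner.log > videofiles.txt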