I can never remember how to download the ETCSL corpus, because it’s hosted on the Oxford Text Archive (handle: 20.500.12024/2518). Here are the instructions to download a local copy to the current directory.
curl -o etcsl-download.zip https://ota.bodleian.ox.ac.uk/repository/xmlui/handle/20.500.12024/2518/allzip
unzip etcsl-download.zip
rm etcsl-download.zip
The texts themselves are in a nested zip archive, etcsl.zip, which we also need to unzip.
unzip etcsl.zip
rm etcsl.zip
The current directory should now have a copy of the ETCSL corpus.
tree
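If you would rather not eyeball the tree output, here is a rough sketch of a check that the expected top-level entries exist. The function name check_etcsl is made up for this note, and the list of entries is an assumption: it is simply the .gitignore list from this post, so adjust it if the archive contents change.

```shell
# Hypothetical helper: report any expected top-level entry that is missing
# from the given directory (default: the current directory).
# Returns 0 if everything is present, 1 otherwise.
check_etcsl() {
  dir="${1:-.}"
  missing=0
  # Assumed list of entries, copied from the .gitignore in this post.
  for entry in etcsl contents.txt corphdr.xml etcsl-extensions.dtd \
               etcsl-extensions.ent etcslfullcat.html etcslmanual.html \
               etcsl-sux.ent etcsl.xml header2518.xml readme.txt; do
    if [ ! -e "$dir/$entry" ]; then
      echo "missing: $entry" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (run in the download directory):
#   check_etcsl && echo "ETCSL corpus looks complete"
```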
You might also want a .gitignore file containing the files we unzipped:
/etcsl/
/contents.txt
/corphdr.xml
/etcsl-extensions.dtd
/etcsl-extensions.ent
/etcslfullcat.html
/etcslmanual.html
/etcsl-sux.ent
/etcsl.xml
/header2518.xml
/readme.txt
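To avoid pasting these by hand, something like the following should append the entries to a .gitignore in the current directory. It is a sketch rather than a tested recipe: it assumes you run it from the repository root, and it skips any entry that is already present as an identical line, so running it twice should not create duplicates.

```shell
# Make sure .gitignore exists so grep has a file to read.
touch .gitignore

# Append each ETCSL path unless an identical line is already there.
# -q: quiet, -x: match whole lines only, -F: treat the entry as a fixed string.
for entry in /etcsl/ /contents.txt /corphdr.xml /etcsl-extensions.dtd \
             /etcsl-extensions.ent /etcslfullcat.html /etcslmanual.html \
             /etcsl-sux.ent /etcsl.xml /header2518.xml /readme.txt; do
  grep -qxF -- "$entry" .gitignore || printf '%s\n' "$entry" >> .gitignore
done
```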