A little bash snippet to download Apache directory listings while circumventing robots.txt rules and simple user-agent blocks. This script also uses Python 3 to adjust the --cut-dirs value to avoid folder clutter.
function apachelisting() {
	# count the URL's non-empty path components so --cut-dirs strips exactly that many leading directories
	local CUT_DIRS
	CUT_DIRS=$(python3 -c "from urllib.parse import urlparse; import sys; print(len([d for d in urlparse(sys.argv[1]).path.split('/') if d]))" "$1")
	# recursive download: ignore robots.txt, skip the generated index pages, spoof a browser user agent, and don't recreate the hostname or parent directories locally
	wget -r --no-parent --reject "index.html*" -e robots=off --restrict-file-names=nocontrol --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" -nH --cut-dirs="$CUT_DIRS" "$1"
}
# usage: apachelisting http://someserver.tld/ftp/a/bunch/of/folders/
# this will recursively download everything from that directory listing and cut unnecessary folders away
# for example, http://someserver.tld/ftp/a/bunch/of/folders/stuff.zip will become ./stuff.zip on your machine
# likewise, http://someserver.tld/ftp/a/bunch/of/folders/evenmorefolders/junk.rar will become ./evenmorefolders/junk.rar, since only the leading directories up to the starting folder are cut
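# to see the value that ends up in --cut-dirs, the Python helper can be run on its own, e.g.:
# python3 -c "from urllib.parse import urlparse; import sys; print(len([d for d in urlparse(sys.argv[1]).path.split('/') if d]))" "http://someserver.tld/ftp/a/bunch/of/folders/"
# for the example URL above this prints 5 (ftp, a, bunch, of, folders), which is how many leading directories wget strips from the saved paths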