Skip to content

Instantly share code, notes, and snippets.

What would you like to do?
wget --spider -o wget.log -e robots=off --wait 3 -r -p -S http://
grep -ri 'http://' wget.log | grep -E -v '(files/|\.jpg|\.jpeg|\.gif|\.css|\.js|\.pdf|\.png|\.xls)' | awk '{print $3}'|sort|uniq|sort > site_map.txt
cat $1 |grep -i -E -v '(\.jpg|\.jpeg|\.gif|\.css|\.js|\.pdf|\.png|\.xls|\.ico|\.txt|\.doc|yandexbot|googlebot|YandexDirect|\/upload\/|" 404 |" 301 |" 302 )'|perl -MURI::Escape -lne 'print uri_unescape($_)'|grep yandsearch|awk '{print $1}'|sort|uniq|wc -l
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment