@shelu16
Forked from pikpikcu/crawling.md
Created August 28, 2020 08:52
debian@pikpikcu~$ cat subdo.txt | hakrawler | grep 'http' | cut -d ' ' -f 2 > crawler.txt
debian@pikpikcu~$ gau -subs domain.com >>  crawler.txt
debian@pikpikcu~$ waybackurls domain.com >> crawler.txt 
debian@pikpikcu~$ cat crawler.txt | grep "?" | unfurl --unique format %s://%d%p > base.txt
debian@pikpikcu~$ cat base.txt | parallel -j50 -q grep {} -m5 crawler.txt | tee -a final.txt
debian@pikpikcu~$ cat final.txt | grep -Eiv '\.(jpg|jpeg|gif|css|tif|tiff|woff|woff2|ico|pdf|svg|txt|js)([?#]|$)' > final_bos.txt
debian@pikpikcu~$ rm -rf base.txt final.txt
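
The last three steps (keep only URLs with a query string, cap the number of URLs per base, drop static assets) can also be approximated without `unfurl` or `parallel`. The sketch below is an assumption-laden alternative, not the original method: `filter_urls` is a hypothetical helper name, and it takes the base to be everything before the first `?` or `#`, so unlike `%s://%d%p` it keeps any port in the base. It also only caps URLs that themselves carry a query string, whereas the `grep` in the transcript matches any line containing the base.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original gist).
# Reads URLs on stdin and prints at most $1 (default 5) URLs per base,
# where the base is the URL with its query string and fragment removed.
filter_urls() {
    max="${1:-5}"
    # 1. Keep only URLs that carry a query string.
    grep -F '?' |
    # 2. Cap the number of URLs emitted for each base.
    awk -v max="$max" '
        {
            base = $0
            sub(/[?#].*/, "", base)   # strip query string and fragment
            if (seen[base]++ < max) print
        }' |
    # 3. Drop static assets (extension followed by ?, # or end of line).
    grep -Eiv '\.(jpg|jpeg|gif|css|tif|tiff|woff|woff2|ico|pdf|svg|txt|js)([?#]|$)'
}
```

Assuming `crawler.txt` exists as produced above, `filter_urls 5 < crawler.txt > final_bos.txt` writes the filtered list in one pass and leaves no `base.txt` or `final.txt` to clean up.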