Using Requests and Beautiful Soup, with the most recent Beautiful Soup 4 docs.
Install our tools (preferably in a new virtualenv):
pip install beautifulsoup4
Using Requests and Beautiful Soup, with the most recent Beautiful Soup 4 docs.
Install our tools (preferably in a new virtualenv):
pip install beautifulsoup4
NOTE: This is a question I found on StackOverflow which I’ve archived here, because the answer is so effing phenomenal.
If you are not into long explanations, see [Paolo Bergantino’s answer][2].
#!/usr/bin/python | |
""" | |
Copyright (C) 2016 Jeffrey S. McAnarney | |
This program is free software; you can redistribute it and/or | |
modify it under the terms of the GNU General Public License | |
as published by the Free Software Foundation; either version 2 | |
of the License, or (at your option) any later version. | |
This program is distributed in the hope that it will be useful, |
import requests | |
import sys | |
import json | |
def waybackurls(host, with_subs): | |
if with_subs: | |
url = 'http://web.archive.org/cdx/search/cdx?url=*.%s/*&output=json&fl=original&collapse=urlkey' % host | |
else: | |
url = 'http://web.archive.org/cdx/search/cdx?url=%s/*&output=json&fl=original&collapse=urlkey' % host |
A combination of my own methodology and the Web Application Hacker's Handbook Task checklist, as a Github-Flavored Markdown file
curl -L -k -s https://www.example.com | tac | sed "s#\\\/#\/#g" | egrep -o "src['\"]?\s*[=:]\s*['\"]?[^'\"]+.js[^'\"> ]*" | awk -F '//' '{if(length($2))print "https://"$2}' | sort -fu | xargs -I '%' sh -c "curl -k -s \"%\" | sed \"s/[;}\)>]/\n/g\" | grep -Po \"(['\\\"](https?:)?[/]{1,2}[^'\\\"> ]{5,})|(\.(get|post|ajax|load)\s*\(\s*['\\\"](https?:)?[/]{1,2}[^'\\\"> ]{5,})\"" | awk -F "['\"]" '{print $2}' | sort -fu | |
# using linkfinder | |
function ejs() { | |
URL=$1; | |
curl -Lks $URL | tac | sed "s#\\\/#\/#g" | egrep -o "src['\"]?\s*[=:]\s*['\"]?[^'\"]+.js[^'\"> ]*" | sed -r "s/^src['\"]?[=:]['\"]//g" | awk -v url=$URL '{if(length($1)) if($1 ~/^http/) print $1; else if($1 ~/^\/\//) print "https:"$1; else print url"/"$1}' | sort -fu | xargs -I '%' sh -c "echo \"\n##### %\";wget --no-check-certificate --quiet \"%\"; basename \"%\" | xargs -I \"#\" sh -c 'linkfinder.py -o cli -i #'" | |
} | |
# with file download (the new best one): | |
# but there is a bug if you don't provide a root url |
Credits to https://lobotuerto.com/blog/how-to-setup-full-disk-encryption-on-a-secondary-hdd-in-linux/ .
lsblk