@mhmdiaa
Last active March 7, 2023 12:41
import json
import sys

import requests


def waybackurls(host, with_subs):
    """Return every archived URL the Wayback Machine CDX API lists for host."""
    if with_subs:
        url = 'http://web.archive.org/cdx/search/cdx?url=*.%s/*&output=json&fl=original&collapse=urlkey' % host
    else:
        url = 'http://web.archive.org/cdx/search/cdx?url=%s/*&output=json&fl=original&collapse=urlkey' % host
    r = requests.get(url)
    results = r.json()
    # The first row is the column header (["original"]), so skip it.
    return results[1:]


if __name__ == '__main__':
    argc = len(sys.argv)
    if argc < 2:
        print('Usage:\n\tpython3 waybackurls.py <url> <include_subdomains:optional>')
        sys.exit()

    host = sys.argv[1]
    # Any second argument turns on the subdomain search.
    with_subs = False
    if argc > 2:
        with_subs = True

    urls = waybackurls(host, with_subs)
    json_urls = json.dumps(urls)
    if urls:
        filename = '%s-waybackurls.json' % host
        with open(filename, 'w') as f:
            f.write(json_urls)
        print('[*] Saved results to %s' % filename)
    else:
        print('[-] Found nothing')
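
The script above builds the CDX query by string formatting and saves each result as a one-element list (the API returns one row per URL, with a header row first). As a rough sketch only, the same query could be assembled with `urllib.parse.urlencode`, and the rows flattened into plain strings. `build_cdx_url` and `flatten_rows` are hypothetical helper names that are not part of the gist, and the sketch assumes the CDX server accepts a percent-encoded `url` parameter.

```python
from urllib.parse import urlencode

# Same endpoint the script above queries.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_url(host, with_subs=False):
    """Return the CDX query URL for host (hypothetical helper, not in the gist)."""
    # "*.host/*" also matches subdomains; "host/*" matches the host only.
    pattern = "*.%s/*" % host if with_subs else "%s/*" % host
    params = {
        "url": pattern,
        "output": "json",
        "fl": "original",
        "collapse": "urlkey",
    }
    # urlencode percent-encodes "*" and "/" in the pattern.
    return CDX_ENDPOINT + "?" + urlencode(params)


def flatten_rows(results):
    """Drop the header row and unwrap each single-field row into a string."""
    return [row[0] for row in results[1:]]
```

With these helpers, `flatten_rows(requests.get(build_cdx_url(host)).json())` would yield a flat list of URLs rather than a list of one-element lists.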
@albvt

albvt commented Dec 25, 2018

bruh how are you

@sunilb77

sunilb77 commented Sep 6, 2020

how to fix this issue?

kali@kali:~/Desktop/tools/waybackurls$ python3 WAYBACKTEST.py evil.com
Traceback (most recent call last):
  File "WAYBACKTEST.py", line 1, in <module>
import requests
ModuleNotFoundError: No module named 'requests'

@p0lr

p0lr commented Sep 8, 2020

pip3 install requests

@rrampage

rrampage commented Sep 30, 2020

A bash function that uses jq (it does not search subdomains, but works for any URL prefix). It prints the full Web Archive URL, which generally has the format https://web.archive.org/web/$TIMESTAMP/$ORIGINAL:

wb ()
{
    # Inside a function, $0 is the shell name, so use FUNCNAME for the usage line.
    if [[ -z "$1" ]]; then
        echo "Usage: ${FUNCNAME[0]} URL";
    else
        curl "http://web.archive.org/cdx/search/cdx?url=$1/*&output=json&fl=original,timestamp" 2> /dev/null | jq '.[1:][] |"https://web.archive.org/web/" +.[1] + "/" + .[0]' 2> /dev/null;
    fi
}

This can be added to the ~/.bashrc or relevant shell profile.

Usage: wb gist.github.com/mhmdiaa
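
For anyone who would rather stay in Python, the jq step can be approximated as below. This is only a sketch: it assumes the rows were requested with `fl=original,timestamp`, as in the curl call above, so each row is `[original, timestamp]` with a header row first. `snapshot_urls` is a hypothetical name.

```python
def snapshot_urls(rows):
    """Build Web Archive replay URLs from CDX rows (fl=original,timestamp)."""
    # rows[0] is the header row; skip it, like jq's .[1:] does.
    return [
        "https://web.archive.org/web/%s/%s" % (timestamp, original)
        for original, timestamp in rows[1:]
    ]
```

The input would be the parsed JSON from the same CDX request, e.g. `snapshot_urls(requests.get(url).json())`.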

@akamhy

akamhy commented Oct 2, 2020

Hi,
Just wanted to tell you that I used your idea in https://github.com/akamhy/waybackpy. [commit]

Usage:

pip3 install waybackpy
waybackpy --url akamhy.github.io --user_agent "my-user-agent" --known_urls

Output:

http://akamhy.github.io
https://akamhy.github.io/favicon.ico
https://akamhy.github.io/robots.txt
https://akamhy.github.io/waybackpy/
https://akamhy.github.io/waybackpy/assets/css/style.css?v=a418a4e4641a1dbaad8f3bfbf293fad21a75ff11
https://akamhy.github.io/waybackpy/assets/css/style.css?v=f881705d00bf47b5bf0c58808efe29eecba2226c
6 URLs found and saved in ./akamhy.github.io-6-urls.txt

Flags:

  1. '--alive' fetches only URLs that are not dead. It is slower for websites with many archived URLs, e.g. google.
  2. '--subdomain' includes URLs from subdomains.
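
To illustrate why an alive check gets slow on large sites, here is a minimal sketch of such a filter. It is not waybackpy's implementation. It assumes "alive" means the URL answers a HEAD request with a status below 400, and `is_alive` and `filter_alive` are hypothetical names.

```python
import urllib.error
import urllib.request


def is_alive(url, timeout=10):
    """Return True if url answers a HEAD request without an error status.

    Assumption: "alive" means a status below 400; waybackpy may define it differently.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # HTTP errors, malformed URLs, and network failures all count as dead.
        return False


def filter_alive(urls, check=is_alive):
    """Keep only the URLs for which check(url) is true."""
    # One request per URL, so the cost grows with the number of archived URLs.
    return [url for url in urls if check(url)]
```

Passing a different `check` function makes the filter usable without network access, for example `filter_alive(urls, check=lambda u: u.startswith("https://"))`.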

See live use @ https://repl.it/@akamhy/Waybackpy-Known-Urls#main.sh
