@zabir-nabil
Created August 5, 2021 10:53
Find all the links on the homepage of each website in a list and check whether certain keywords are present on that homepage.
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Run Chrome headless; --no-sandbox is needed in some container or root setups.
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
# Selenium 4 takes the chromedriver path through a Service object.
driver = webdriver.Chrome(service=Service("/usr/bin/chromedriver"), options=chrome_options)

keywords = ["login", "denied", "username", "password", "Apache2"]

# ip.txt holds one host (IP address or domain name) per line.
with open("ip.txt", "r") as ips:
    for ip in ips:
        url = "http://" + ip.strip()
        try:
            print(url, end=" ")
            driver.get(url)
            page = driver.page_source
            # Case-sensitive substring match against the rendered homepage.
            if any(k in page for k in keywords):
                print(" [ found ]")
            else:
                print(" [ not found ]")
            # Name the parser explicitly to avoid bs4's "no parser specified" warning.
            soup = BeautifulSoup(page, "html.parser")
            for a in soup.find_all("a"):
                href = a.get("href")
                # Only absolute http(s) links are printed.
                if href is not None and href.startswith("http"):
                    print(href)
            print("---------------------------------------------------")
        except Exception:
            print("driver failed")

driver.quit()
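The keyword check and the link extraction do not depend on the browser, so they can be tried on a plain HTML string. The following is only a sketch under assumptions: `LinkCollector` and `scan_page` are hypothetical names that do not appear in the gist, and the standard-library `html.parser` is swapped in for BeautifulSoup so the snippet runs without extra packages. Its parsing may differ from bs4 on malformed markup.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags that start with "http" (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith("http"):
                self.links.append(value)
                break  # one href per tag is enough


def scan_page(html, keywords):
    """Return (found, links) for one page's HTML source.

    found is True if any keyword occurs as a case-sensitive substring,
    mirroring the check in the script above.
    """
    found = any(k in html for k in keywords)
    collector = LinkCollector()
    collector.feed(html)
    return found, collector.links
```

In the script, something like `scan_page(driver.page_source, keywords)` could stand in for the inline loop, which would let the matching logic be checked without launching Chrome.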
@zabir-nabil

http://google.com  [ not found ]
https://about.google/?fg=1&utm_source=google-US&utm_medium=referral&utm_campaign=hp-header
https://store.google.com/US?utm_source=hp_header&utm_medium=google_ooo&utm_campaign=GS100042&hl=en-US
https://mail.google.com/mail/&ogbl
https://www.google.com/imghp?hl=en&ogbl
https://www.google.com/intl/en/about/products
https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://www.google.com/%3Fgws_rd%3Dssl&ec=GAZAmgQ
https://www.google.com/url?q=https://www.google.com/search%3Fq%3DOlympics%26source%3Dsmp.Olympics2021.10&source=hpp&id=19025478&ct=3&usg=AFQjCNHhJ12UEwBKPXMN8EURQbajLmsJtg&sa=X&ved=0ahUKEwih9su61pnyAhU_FVkFHavPDsUQ8IcBCBA
https://www.google.com/intl/en_us/ads/?subid=ww-ww-et-g-awa-a-g_hpafoot1_1!o2&utm_source=google.com&utm_medium=referral&utm_campaign=google_hpafooter&fg=1
https://www.google.com/services/?subid=ww-ww-et-g-awa-a-g_hpbfoot1_1!o2&utm_source=google.com&utm_medium=referral&utm_campaign=google_hpbfooter&fg=1
https://google.com/search/howsearchworks/?fg=1
https://sustainability.google/commitments/?utm_source=googlehpfooter&utm_medium=housepromos&utm_campaign=bottom-footer&utm_content=
https://policies.google.com/privacy?hl=en&fg=1
https://policies.google.com/terms?hl=en&fg=1
https://www.google.com/preferences?hl=en&fg=1
https://support.google.com/websearch/?p=ws_results_help&hl=en&fg=1
---------------------------------------------------
http://facebook.com  [ found ]
https://www.facebook.com/recover/initiate/?ars=facebook_login&privacy_mutation_token=eyJ0eXBlIjowLCJjcmVhdGlvbl90aW1lIjoxNjI4MTU5NDExLCJjYWxsc2l0ZV9pZCI6MzgxMjI5MDc5NTc1OTQ2fQ%3D%3D
https://es-la.facebook.com/
https://fr-fr.facebook.com/
https://zh-cn.facebook.com/
https://ar-ar.facebook.com/
https://pt-br.facebook.com/
https://it-it.facebook.com/
https://ko-kr.facebook.com/
https://de-de.facebook.com/
https://hi-in.facebook.com/
https://ja-jp.facebook.com/
https://messenger.com/
https://www.facebook.com/watch/
https://pay.facebook.com/
https://www.oculus.com/
https://portal.facebook.com/
https://l.facebook.com/l.php?u=https%3A%2F%2Fwww.instagram.com%2F&h=AT1SypqwqPpcRA3YffqEsC1rV3KFsnIHJVTnGoCTdg1Jbo6_HdGXtCzQ9SnQYDYElHxDTU6zgfyKxaygtQNk-gTzodO_zJ-2FQzkVolfBpdjDjpkQVI6IxyesaI3FbI9FBO4oqiDapgpMQ
https://about.facebook.com/
https://developers.facebook.com/?ref=pf
https://www.facebook.com/help/568137493302217
---------------------------------------------------
http://werku.ddns.net  [ found ]
http://httpd.apache.org/docs/2.4/mod/mod_userdir.html
https://bugs.launchpad.net/ubuntu/+source/apache2
---------------------------------------------------
