@Datalators
Last active July 26, 2020 14:59
Datalators - Emailator

Emailator is a Python code block that extracts email addresses from a website or a list of URLs.

Feel free to use it in your program.

www.datalators.com

import re
from collections import deque
from urllib.parse import urlsplit

import requests
from bs4 import BeautifulSoup

original_url = input('Enter The Url You Want To Emailate: ')

# URLs still to visit, URLs already visited, and emails found so far.
unscraped = deque([original_url])
scraped = set()
emails = set()

while len(unscraped):
    url = unscraped.popleft()
    scraped.add(url)

    # Derive the site root and the directory prefix of the current URL.
    parts = urlsplit(url)
    base_url = "{0.scheme}://{0.netloc}".format(parts)
    if '/' in parts.path:
        path = url[:url.rfind('/')+1]
    else:
        path = url

    # print("Datalating %s" % url)
    try:
        response = requests.get(url, timeout=20)
    except (requests.exceptions.MissingSchema,
            requests.exceptions.ConnectionError,
            requests.exceptions.Timeout):
        # Skip URLs that are malformed, unreachable, or too slow to respond.
        continue

    # Note: this pattern only matches addresses ending in ".com".
    new_emails = set(re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.com", response.text, re.I))
    emails.update(new_emails)

    soup = BeautifulSoup(response.text, 'lxml')
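The listing stops right after the page is parsed, so nothing is ever added to the unscraped queue. The following is a minimal sketch of how link-following could work, not the author's code. It assumes the goal is to queue every http(s) link on the page that has not been seen yet. The names LinkCollector and enqueue_links are hypothetical, and it swaps in the standard library's html.parser and urljoin in place of BeautifulSoup and the manual base_url/path handling, so it runs without extra packages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def enqueue_links(html, page_url, unscraped, scraped):
    """Resolve each link against page_url and queue the ones not seen yet."""
    collector = LinkCollector()
    collector.feed(html)
    for href in collector.hrefs:
        link = urljoin(page_url, href)  # handles both "/x" and "x.html" forms
        if not link.startswith(("http://", "https://")):
            continue  # skip mailto:, javascript:, and similar schemes
        if link not in scraped and link not in unscraped:
            unscraped.append(link)
```

Inside the loop above, this would be called as enqueue_links(response.text, url, unscraped, scraped). Note that nothing here limits the crawl to the starting site, so on a real page the queue can grow without bound unless you add a domain check or a page limit.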
@Datalators
Author

A fast but fairly strong email extractor that pulls email addresses from any website you want to scrape.
You can use this block as is, or wrap it in a function to extract from a list of websites/URLs.
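One way the "make a function" suggestion might look is sketched below. The names extract_emails and emailate are hypothetical, and the fetch parameter is an assumption added here so the page download can be swapped out or faked. The regular expression is the one from the block above.

```python
import re

# Same pattern as the block above; it only matches addresses ending in ".com".
EMAIL_RE = re.compile(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.com", re.I)


def extract_emails(text):
    """Return the set of email-like strings found in text."""
    return set(EMAIL_RE.findall(text))


def emailate(urls, fetch):
    """Map each URL to the set of emails found on that page.

    fetch is a hypothetical callable that takes a URL and returns the page
    text, or raises an exception if the page cannot be retrieved.
    """
    results = {}
    for url in urls:
        try:
            results[url] = extract_emails(fetch(url))
        except Exception:
            results[url] = set()  # unreachable page: record no emails
    return results
```

With requests installed, fetch could be lambda u: requests.get(u, timeout=20).text, and emailate(['https://www.datalators.com'], fetch) would return a dict keyed by URL. This version reads only the pages it is given and does not follow links.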
