@eng-rodrigocunha
Last active March 12, 2023 01:59
Performs web scraping to collect all e-mail addresses from a given set of web pages
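# Dependencies: pip install requests beautifulsoup4
# Regex adapted from:
# https://stackoverflow.com/questions/63533115/extract-valid-email-address-using-regular-expression-and-beautifulsoup
import re

import requests
from bs4 import BeautifulSoup

# Match e-mail-like strings. The trailing "{0,}" quantifier was dropped because
# it let the pattern match the empty string, so findall() also returned ''.
email = re.compile(r'[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+')
email_list = set()

# Visit pages 1 through 6 of the listing.
for i in range(1, 7):
    url = f"http://www.eeffto.ufmg.br/eeffto/graduacao/educacao_fasica_graduacao/corpo_docente/lista/{i}"
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(response.content, "html.parser")
    # Search only the text of the page, not the raw HTML markup.
    email_list.update(email.findall(soup.get_text()))

#print(email_list)
# Print one address per line, each terminated by a semicolon.
for mail in sorted(email_list):
    print(f"{mail};")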
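The extraction step can be checked without touching the network by running the pattern over a small hand-written string. The sketch below is only an illustration: the `extract_emails` helper, the sample text, and the example.org addresses are made up for the demonstration and are not part of the gist. It also compares against the pattern with the trailing `{0,}` quantifier, which can match the empty string and therefore adds `''` to the results.

```python
import re

# Pattern with the trailing "{0,}" quantifier dropped.
EMAIL = re.compile(r'[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+')

# Pattern with the "{0,}" quantifier, kept here only for comparison.
EMAIL_WITH_QUANTIFIER = re.compile(
    r'([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+){0,}'
)


def extract_emails(text):
    """Return the set of e-mail-like strings in text (hypothetical helper)."""
    return set(EMAIL.findall(text))


# Made-up sample text standing in for soup.get_text().
sample = "Contact: ana@example.org, joao.silva@dept.example.org. Fax: none"

found = extract_emails(sample)
found_with_quantifier = set(EMAIL_WITH_QUANTIFIER.findall(sample))

print(sorted(found))
# The quantified group may match zero times, so findall() also yields ''.
print('' in found_with_quantifier)
```

Keeping the pattern in a small function like this makes it easy to try other sample strings before pointing the script at real pages.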