@mickeyperlstein
Created January 18, 2024 18:59
blog prompt
Can you write me some Python async code that reads URLs off a Kafka topic called incoming_urls, uses headless Selenium to download each webpage's HTML, parses the links from it, and sends the following JSON to a second Kafka topic called outgoing html:
json{
html: $pagebody,
links: [] $urls_parsed_from_the_html,
headers: $html_headers,
machineip: $private_ip,
log: $/var/logs/crawler.log,
errors: $errorsEncountered
}
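
One way the pipeline described above could be sketched is shown below. This is a minimal illustration under stated assumptions, not a definitive implementation: it assumes the aiokafka and selenium packages, a broker at localhost:9092, a consumer group named "crawler", and an output topic spelled outgoing_html (the prompt writes "outgoing html"). The helper names (parse_links, private_ip, read_log, build_payload, fetch_html, crawl_forever) are hypothetical. Selenium's WebDriver API does not expose HTTP response headers, so the headers field is left empty here; filling it would need a separate HTTP request or a proxy.

```python
import asyncio
import json
import socket
from html.parser import HTMLParser
from urllib.parse import urljoin

IN_TOPIC = "incoming_urls"
OUT_TOPIC = "outgoing_html"         # assumption: the prompt writes "outgoing html"
BOOTSTRAP = "localhost:9092"        # assumption: local broker address
LOG_PATH = "/var/logs/crawler.log"  # path exactly as written in the prompt


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def parse_links(html, base_url):
    """Return absolute URLs for all links found in the page."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]


def private_ip():
    """Best-effort local address; connecting a UDP socket sends no packets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect(("10.255.255.255", 1))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()


def read_log(path=LOG_PATH):
    """Return the crawler log contents, or an empty string if unreadable."""
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            return handle.read()
    except OSError:
        return ""


def build_payload(url, html, headers, errors):
    """Assemble the JSON document requested in the prompt."""
    return {
        "html": html,
        "links": parse_links(html, url) if html else [],
        "headers": headers,
        "machineip": private_ip(),
        "log": read_log(),
        "errors": errors,
    }


def fetch_html(url):
    """Blocking Selenium fetch; call it through asyncio.to_thread."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # assumes a recent Chrome
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


async def crawl_forever():
    """Consume URLs, fetch each page, and publish one payload per URL."""
    from aiokafka import AIOKafkaConsumer, AIOKafkaProducer

    consumer = AIOKafkaConsumer(
        IN_TOPIC, bootstrap_servers=BOOTSTRAP, group_id="crawler"
    )
    producer = AIOKafkaProducer(bootstrap_servers=BOOTSTRAP)
    await consumer.start()
    await producer.start()
    try:
        async for message in consumer:
            url = message.value.decode("utf-8").strip()
            html, errors = "", []
            try:
                # Selenium is synchronous, so keep it off the event loop.
                html = await asyncio.to_thread(fetch_html, url)
            except Exception as exc:
                errors.append(repr(exc))
            payload = build_payload(url, html, {}, errors)
            await producer.send_and_wait(
                OUT_TOPIC, json.dumps(payload).encode("utf-8")
            )
    finally:
        await consumer.stop()
        await producer.stop()
```

To run the sketch, call asyncio.run(crawl_forever()) from a script with a broker and Chrome available. Starting a new browser per URL keeps the example short; a longer-lived driver or a pool of them would likely be faster.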