Mickey Perlstein mickeyperlstein

Can you write me some Python async code that reads URLs off a Kafka topic called incoming_urls, uses headless Selenium to download each webpage's HTML, parses the links from it, and sends the following JSON to a second Kafka topic called outgoing html:
html: $pagebody,
links: [] $urls_parsed_from_the_html,
headers: $html_headers,
machineip: $private_ip,
log: $/var/logs/crawler.log,
errors: $errorsEncountered
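
A minimal sketch of what the request above describes, under several assumptions that are not in the original: the `aiokafka` and `selenium` packages are installed, a broker listens on `localhost:9092`, the output topic is spelled `outgoing_html` (Kafka topic names cannot contain spaces), and the consumer group is named `crawler`. Selenium WebDriver does not expose HTTP response headers, so `headers` is sent as an empty object here; filling it would need a separate HTTP request or a proxy. All function names are hypothetical.

```python
import asyncio
import json
import socket
from html.parser import HTMLParser
from urllib.parse import urljoin

LOG_PATH = "/var/logs/crawler.log"  # path exactly as written in the request
IN_TOPIC = "incoming_urls"
OUT_TOPIC = "outgoing_html"   # assumption: the request says "outgoing html"
BOOTSTRAP = "localhost:9092"  # assumption: broker address


class LinkParser(HTMLParser):
    """Collects href values from <a> tags, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def parse_links(html, base_url):
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


def private_ip():
    """Best-effort local IP; connect() on a UDP socket sends no packets."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.connect(("10.255.255.255", 1))
            return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"


def read_log(path=LOG_PATH):
    """Returns the log file contents, or an empty string if unreadable."""
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            return fh.read()
    except OSError:
        return ""


def build_message(url, html, headers, errors):
    """Assembles the JSON payload with the six fields from the request."""
    return {
        "html": html,
        "links": parse_links(html, url),
        "headers": headers,
        "machineip": private_ip(),
        "log": read_log(),
        "errors": errors,
    }


def fetch_html(url):
    """Blocking headless-Chrome fetch; imported lazily so the helpers above
    can be used without Selenium installed."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


async def crawl():
    from aiokafka import AIOKafkaConsumer, AIOKafkaProducer

    consumer = AIOKafkaConsumer(IN_TOPIC, bootstrap_servers=BOOTSTRAP, group_id="crawler")
    producer = AIOKafkaProducer(bootstrap_servers=BOOTSTRAP)
    await consumer.start()
    await producer.start()
    try:
        async for record in consumer:
            url = record.value.decode("utf-8").strip()
            html, errors = "", []
            try:
                # Selenium is synchronous, so run it in a worker thread.
                html = await asyncio.to_thread(fetch_html, url)
            except Exception as exc:
                errors.append(repr(exc))
            message = build_message(url, html, {}, errors)
            await producer.send_and_wait(OUT_TOPIC, json.dumps(message).encode("utf-8"))
    finally:
        await consumer.stop()
        await producer.stop()
```

To start the loop, call `asyncio.run(crawl())` (Python 3.9+ for `asyncio.to_thread`). Launching a fresh browser per URL keeps the sketch simple; a long-running crawler would more likely reuse one driver or a small pool.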