@tomatosoupcan
Created August 20, 2020 21:15
import re
import requests

text = 'text goes here http://aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.com/AaaaaaaaaAAaAaAaaAAa click on that'

# Pull every http:// or https:// URL out of the message text (raw string avoids invalid-escape warnings).
matches = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', text)

gnomecount = 0
for match in matches:
    # Download each linked page.
    source = requests.get(match)
    # '.' does not match newlines, so this finds each line of the page that mentions "gnome".
    gnomes = re.findall(r'.*gnome.*', source.text)
    gnomecount += len(gnomes)

print("this message has a gnome count of", gnomecount)
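The script above counts lines, not individual occurrences: because `.` stops at a newline, `.*gnome.*` matches a whole line once even if "gnome" appears on it several times. Below is a minimal sketch of the same logic wrapped in a function so it can be checked without touching the network. The names `gnome_count`, `URL_PATTERN`, `GNOME_LINE` and the `fetch` parameter are hypothetical additions for illustration; `fetch` stands in for the `requests.get(...).text` call in the original.

```python
import re
from typing import Callable

# Same URL pattern as the script above, precompiled as a raw string.
URL_PATTERN = re.compile(
    r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
)
# Matches any line containing "gnome"; each matching line counts once,
# no matter how many times "gnome" appears on it.
GNOME_LINE = re.compile(r'.*gnome.*')


def gnome_count(text: str, fetch: Callable[[str], str]) -> int:
    """Sum the gnome-mentioning lines across every page linked in `text`.

    `fetch` is a hypothetical hook: any callable that takes a URL and returns
    the page body as a string, e.g. lambda u: requests.get(u, timeout=10).text
    """
    total = 0
    for url in URL_PATTERN.findall(text):
        total += len(GNOME_LINE.findall(fetch(url)))
    return total


# Offline usage: a dict of fake pages replaces the real HTTP request.
pages = {
    'http://example.com/a': 'garden gnome\nno match here\ngnome gnome gnome',
}
print(gnome_count('see http://example.com/a now', pages.__getitem__))  # 2
```

Passing the fetcher in as an argument keeps the regex logic testable on its own; in real use, supplying a `timeout` to `requests.get` avoids hanging forever on an unresponsive link.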