@twolfe18
Created January 23, 2012 02:32
Command-line tool to get the result URLs for a Google search
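#!/usr/bin/env python3
"""Print the result URLs for a Google search given on the command line."""
import re
import sys
import http.client
import urllib.parse

if len(sys.argv) < 2:
    print('provide a query', file=sys.stderr)
    sys.exit(1)

# Join the arguments into one query and URL-encode it (spaces become '+').
query = urllib.parse.quote_plus(' '.join(sys.argv[1:]))
c = http.client.HTTPConnection('www.google.com')
c.request('GET', '/search?q=%s' % query)
r = c.getresponse()
html = r.read().decode('utf-8', 'replace')
c.close()

# yes, I'm aware that HTML is a context-free, not regular, language... this will do
p1 = re.compile('<li class="g">(.+?)</li>', re.DOTALL)  # one search result item
p2 = re.compile('<h3 class="r"><a href="(.+?)"', re.DOTALL)  # the result's link
for m1 in p1.finditer(html):
    # search, not match, so the heading need not be the first thing in the item
    m2 = p2.search(m1.group(1))
    if m2 is not None and m2.group(1).startswith('http'):
        print(m2.group(1))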
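
The script's own comment concedes that regular expressions are a shortcut for HTML. As a rough alternative, here is a minimal sketch that pulls the same links out with the standard library's `html.parser` instead. It is not part of the original gist. The names `ResultLinkParser`, `extract_result_urls`, and `SAMPLE` are hypothetical, the sample markup is made up to match the shape the patterns above expect, and unlike the regex version it does not require the heading to sit inside an `<li class="g">`.

```python
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collect the href of the first <a> inside each <h3 class="r">."""

    def __init__(self):
        super().__init__()
        self.urls = []
        self._in_heading = False  # True while inside an <h3 class="r">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'h3' and 'r' in (attrs.get('class') or '').split():
            self._in_heading = True
        elif tag == 'a' and self._in_heading:
            href = attrs.get('href') or ''
            # Same filter as the script: keep absolute http(s) links only.
            if href.startswith('http'):
                self.urls.append(href)
            self._in_heading = False  # only the first link per heading

    def handle_endtag(self, tag):
        if tag == 'h3':
            self._in_heading = False


def extract_result_urls(html):
    """Return the result URLs found in a page of search-result markup."""
    parser = ResultLinkParser()
    parser.feed(html)
    parser.close()
    return parser.urls


# Hypothetical markup in the shape the patterns above expect.
SAMPLE = '''
<ol>
<li class="g"><h3 class="r"><a href="http://example.com/a">A</a></h3></li>
<li class="g"><h3 class="r"><a href="/relative">B</a></h3></li>
<li class="g"><div><h3 class="r"><a href="https://example.org/c">C</a></h3></div></li>
</ol>
'''

if __name__ == '__main__':
    for url in extract_result_urls(SAMPLE):
        print(url)
```

To use it in the script, the two patterns and the loop could be replaced by a loop over `extract_result_urls(html)`. The parser tolerates attribute reordering and extra whitespace that would defeat the literal patterns, at the cost of a few more lines.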