@edsu
Created November 13, 2009 16:45
#!/usr/bin/env python
import sys
import urlparse
import robotparser  # yes it's part of the python core!

try:
    # sys.argv[0] is the script name; the user agent and url come after it
    ua, url = sys.argv[1:3]
except ValueError:
    print "usage: crawlable googlebot http://example.com/awesome.html"
    sys.exit(1)

# robots.txt always lives at the root of the url's host
robots_url = 'http://%s/robots.txt' % urlparse.urlparse(url).netloc

p = robotparser.RobotFileParser()
p.set_url(robots_url)
p.read()  # fetch and parse robots.txt; without this can_fetch has nothing to go on

if p.can_fetch(ua, url):
    print "yup"
else:
    print "nope"
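If you want to exercise can_fetch without touching the network, robotparser will also accept robots.txt rules handed to it directly as a list of lines via parse(). A minimal sketch, with made-up rules and example.com as a placeholder domain:

import robotparser

p = robotparser.RobotFileParser()
p.parse("""
User-agent: googlebot
Disallow: /private/
""".splitlines())

print p.can_fetch('googlebot', 'http://example.com/awesome.html')    # True
print p.can_fetch('googlebot', 'http://example.com/private/a.html')  # False

This is handy for checking how a given set of rules will be interpreted before pointing the script at a live site.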