@formido
Created November 17, 2008 18:17
#!/usr/bin/env python
import logging

import ruya


def aftercrawl(caller, eventargs):
    # Handler for the 'aftercrawl' event: report the crawled page's URL and title.
    page = eventargs.document
    print('Url: ' + page.uri.url)
    print('Title: ' + page.title)


if __name__ == '__main__':
    url = 'http://directory.google.com/Top/Computers/Internet/Web_Design_and_Development/Promotion/'
    page = ruya.Document(ruya.Uri(url))
    # Crawl configuration: path-scoped crawl, default redirect handling, root logger.
    c = ruya.Config(
        ruya.Config.CrawlConfig(crawlscope=ruya.CrawlScope.SCOPE_PATH),
        ruya.Config.RedirectConfig(),
        logging.getLogger())
    spider = ruya.SingleDomainDelayCrawler(c)
    # Register the handler before starting the crawl.
    spider.bind('aftercrawl', aftercrawl, None)
    spider.crawl(page)