The following gist is an extract of the article Building a simple crawler. It allows crawling from a URL and for a given number of bounce.
from crawler import Crawler
crawler = Crawler()
crawler.crawl('http://techcrunch.com/')
select a.comment_author_email, max(a.comment_date) as date, b.comment_author from wp_comments a, wp_comments b where a.comment_author_email = b.comment_author_email and a.comment_date = b.comment_date and a.comment_approved=1 and a.comment_author_email <> "" and a.user_id = 0 and a.comment_author_email not like '%YOUREMAILDOMAIN%' group by a.comment_author_email; |
The following gist is an extract of the article Building a simple crawler. It allows crawling from a URL and for a given number of bounce.
from crawler import Crawler
crawler = Crawler()
crawler.crawl('http://techcrunch.com/')