@dangra
Created January 23, 2014 01:10
~$ scrapy shell http://www.jobberman.com/jobs-in-nigeria/3/by-industry/vacancies-in-ict-telecommunications-companies-in-nigeria/
2014-01-22 23:09:26-0200 [scrapy] INFO: Scrapy 0.23.0 started (bot: scrapybot)
2014-01-22 23:09:26-0200 [scrapy] INFO: Optional features available: ssl, http11, boto, django
2014-01-22 23:09:26-0200 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-01-22 23:09:27-0200 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-01-22 23:09:28-0200 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-01-22 23:09:28-0200 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-01-22 23:09:28-0200 [scrapy] INFO: Enabled item pipelines:
2014-01-22 23:09:28-0200 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2014-01-22 23:09:28-0200 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2014-01-22 23:09:28-0200 [default] INFO: Spider opened
2014-01-22 23:09:30-0200 [default] DEBUG: Crawled (200) <GET http://www.jobberman.com/jobs-in-nigeria/3/by-industry/vacancies-in-ict-telecommunications-companies-in-nigeria/> (referer: None)
[s] Available Scrapy objects:
[s] item {}
[s] request <GET http://www.jobberman.com/jobs-in-nigeria/3/by-industry/vacancies-in-ict-telecommunications-companies-in-nigeria/>
[s] response <200 http://www.jobberman.com/jobs-in-nigeria/3/by-industry/vacancies-in-ict-telecommunications-companies-in-nigeria/>
[s] sel <Selector xpath=None data=u'<html xmlns="http://www.w3.org/1999/xhtm'>
[s] settings <CrawlerSettings module=None>
[s] spider <Spider 'default' at 0x2d44a90>
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] fetch(req_or_url) Fetch request (or URL) and update local objects
[s] view(response) View response in a browser
In [1]: sel.xpath('//div[@class="jobpost" and contains(span/span, "Yesterday")]')
Out[1]:
[<Selector xpath='//div[@class="jobpost" and contains(span/span, "Yesterday")]' data=u'<div class="jobpost"><span class="label '>,
<Selector xpath='//div[@class="jobpost" and contains(span/span, "Yesterday")]' data=u'<div class="jobpost"><span class="label '>,
<Selector xpath='//div[@class="jobpost" and contains(span/span, "Yesterday")]' data=u'<div class="jobpost"><span class="label '>,
<Selector xpath='//div[@class="jobpost" and contains(span/span, "Yesterday")]' data=u'<div class="jobpost"><span class="label '>,
<Selector xpath='//div[@class="jobpost" and contains(span/span, "Yesterday")]' data=u'<div class="jobpost"><span class="label '>,
<Selector xpath='//div[@class="jobpost" and contains(span/span, "Yesterday")]' data=u'<div class="jobpost"><span class="label '>,
<Selector xpath='//div[@class="jobpost" and contains(span/span, "Yesterday")]' data=u'<div class="jobpost"><span class="label '>,
<Selector xpath='//div[@class="jobpost" and contains(span/span, "Yesterday")]' data=u'<div class="jobpost"><span class="label '>,
<Selector xpath='//div[@class="jobpost" and contains(span/span, "Yesterday")]' data=u'<div class="jobpost"><span class="label '>,
<Selector xpath='//div[@class="jobpost" and contains(span/span, "Yesterday")]' data=u'<div class="jobpost"><span class="label '>]
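A note on why the predicate above behaves the way it does: in XPath 1.0, `contains(span/span, "Yesterday")` converts the `span/span` node-set to a string by taking the string-value of its first node in document order, so only the first nested span of each `div` is tested. The sketch below mimics that rule with the standard library only. The sample markup, the `matches` helper, and the job titles are hypothetical (the real page's HTML is truncated in the output above), so treat it as an illustration of the semantics rather than of the site's structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, loosely modeled on the truncated `data=` previews above.
SAMPLE = """
<html><body>
  <div class="jobpost"><span class="label"><span>Yesterday</span></span><a>Network Engineer</a></div>
  <div class="jobpost"><span class="label"><span>Today</span></span><a>Support Analyst</a></div>
  <div class="sidebar"><span><span>Yesterday</span></span></div>
  <div class="jobpost"><span class="label"><span>2 days ago</span><span>Yesterday</span></span></div>
</body></html>
"""


def matches(div, needle="Yesterday"):
    """Emulate [@class="jobpost" and contains(span/span, needle)] for one div."""
    # @class="jobpost" is an exact string comparison, not a token match.
    if div.get("class") != "jobpost":
        return False
    # Only the first span/span in document order is converted to a string.
    first = div.find("span/span")
    if first is None:
        return False  # empty node-set converts to "", which contains nothing
    # String-value of an element: all of its descendant text, concatenated.
    return needle in "".join(first.itertext())


root = ET.fromstring(SAMPLE.strip())
hits = [div for div in root.iter("div") if matches(div)]
for div in hits:
    print(ET.tostring(div, encoding="unicode").strip())
```

Here the last `div` is skipped even though it contains "Yesterday", because its first nested span reads "2 days ago". To match on any nested span instead, the predicate would need a form such as `span/span[contains(., "Yesterday")]`.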