@elacuesta
Last active June 26, 2019 16:27
Scrapy - Inject CookieJars into callbacks (requires https://github.com/scrapy/scrapy/pull/3563)
from scrapy import Spider
from scrapy.http.cookies import CookieJar
from scrapy.downloadermiddlewares.cookies import CookiesMiddleware


class InjectCookiesMiddleware(CookiesMiddleware):
    def process_request(self, request, spider):
        result = super().process_request(request, spider)
        if result is None:
            # Fall back to the spider's default callback when none is set.
            callback = request.callback or spider.parse
            annotations = getattr(callback, '__annotations__', {})
            for key, value in annotations.items():
                # Inject the jar into the first parameter annotated as CookieJar.
                if value is CookieJar:
                    # Requests without a 'cookiejar' meta key use the jar stored under None.
                    cookiejarkey = request.meta.get('cookiejar')
                    request.cb_kwargs[key] = self.jars[cookiejarkey]
                    break
        return result


class CookieJarSpider(Spider):
    name = 'cookiejar'
    start_urls = ['https://httpbin.org/cookies/set/foo/bar']
    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {
            # Disable the built-in middleware and register the subclass in its slot.
            'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': None,
            __name__ + '.InjectCookiesMiddleware': 700,
        }
    }

    def parse(self, response, cookiejar: CookieJar):
        print('CookieJar:', cookiejar)
        print('Cookies:\n', cookiejar._cookies)
$ scrapy runspider cookiejar.py
(...)
2019-06-26 13:26:46 [scrapy.core.engine] INFO: Spider opened
2019-06-26 13:26:46 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-06-26 13:26:46 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-06-26 13:26:47 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://httpbin.org/cookies> from <GET https://httpbin.org/cookies/set/foo/bar>
2019-06-26 13:26:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://httpbin.org/cookies> (referer: None)
CookieJar: <scrapy.http.cookies.CookieJar object at 0x7f63413ba978>
Cookies:
{'httpbin.org': {'/': {'foo': Cookie(version=0, name='foo', value='bar', port=None, port_specified=False, domain='httpbin.org', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={}, rfc2109=False)}}}
2019-06-26 13:26:47 [scrapy.core.engine] INFO: Closing spider (finished)
(...)
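
The middleware above decides where to inject the jar by inspecting the callback's type annotations. The following is a minimal sketch of that lookup in isolation, with no Scrapy dependency. The `CookieJar` class here is an empty stand-in for `scrapy.http.cookies.CookieJar`, and `find_annotated_param`, `parse`, and `parse_plain` are hypothetical names introduced only for illustration.

```python
class CookieJar:
    """Empty stand-in for scrapy.http.cookies.CookieJar (illustration only)."""


def find_annotated_param(callback, target_type):
    """Return the name of the first parameter annotated with target_type, or None.

    Hypothetical helper mirroring the loop in InjectCookiesMiddleware.
    """
    annotations = getattr(callback, '__annotations__', {})
    for name, annotation in annotations.items():
        # Identity check, as in the gist: only the exact class matches.
        if annotation is target_type:
            return name
    return None


def parse(response, cookiejar: CookieJar):
    """Callback that asks for a jar through its annotation."""
    return cookiejar


def parse_plain(response):
    """Callback with no annotations: nothing would be injected."""
    return None


print(find_annotated_param(parse, CookieJar))
print(find_annotated_param(parse_plain, CookieJar))
```

Because the comparison uses `is`, the annotation must be the class object itself. If the spider module enables `from __future__ import annotations`, annotations are stored as strings and this check would not match; resolving them first (for example with `typing.get_type_hints`) is one possible workaround, not something the gist does.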