@redapple
Created February 1, 2016 11:42
$ scrapy shell
2016-02-01 12:41:35 [scrapy] INFO: Scrapy 1.1.0dev1 started (bot: scrapybot)
2016-02-01 12:41:35 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2016-02-01 12:41:35 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2016-02-01 12:41:35 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-02-01 12:41:35 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-02-01 12:41:35 [scrapy] INFO: Enabled item pipelines:
[]
2016-02-01 12:41:35 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-02-01 12:41:35 [root] DEBUG: Using default logger
2016-02-01 12:41:35 [root] DEBUG: Using default logger
[s] Available Scrapy objects:
[s] crawler <scrapy.crawler.Crawler object at 0x7f7ab7596b90>
[s] item {}
[s] settings <scrapy.settings.Settings object at 0x7f7ab7596b10>
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] fetch(req_or_url) Fetch request (or URL) and update local objects
[s] view(response) View response in a browser
In [1]: settings
Out[1]:
{'AJAXCRAWL_ENABLED': False,
'AUTOTHROTTLE_DEBUG': False,
'AUTOTHROTTLE_ENABLED': False,
'AUTOTHROTTLE_MAX_DELAY': 60.0,
'AUTOTHROTTLE_START_DELAY': 5.0,
'AUTOTHROTTLE_TARGET_CONCURRENCY': 1.0,
'BOT_NAME': 'scrapybot',
'CLOSESPIDER_ERRORCOUNT': 0,
'CLOSESPIDER_ITEMCOUNT': 0,
'CLOSESPIDER_PAGECOUNT': 0,
'CLOSESPIDER_TIMEOUT': 0,
'COMMANDS_MODULE': '',
'COMPRESSION_ENABLED': True,
'CONCURRENT_ITEMS': 100,
'CONCURRENT_REQUESTS': 16,
'CONCURRENT_REQUESTS_PER_DOMAIN': 8,
'CONCURRENT_REQUESTS_PER_IP': 0,
'COOKIES_DEBUG': False,
'COOKIES_ENABLED': True,
'DEFAULT_ITEM_CLASS': 'scrapy.item.Item',
'DEFAULT_REQUEST_HEADERS': {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en'},
'DEPTH_LIMIT': 0,
'DEPTH_PRIORITY': 0,
'DEPTH_STATS': True,
'DNSCACHE_ENABLED': True,
'DNSCACHE_SIZE': 10000,
'DNS_TIMEOUT': 60,
'DOWNLOADER': 'scrapy.core.downloader.Downloader',
'DOWNLOADER_CLIENTCONTEXTFACTORY': 'scrapy.core.downloader.contextfactory.ScrapyClientContextFactory',
'DOWNLOADER_HTTPCLIENTFACTORY': 'scrapy.core.downloader.webclient.ScrapyHTTPClientFactory',
'DOWNLOADER_MIDDLEWARES': {},
'DOWNLOADER_MIDDLEWARES_BASE': {'scrapy.downloadermiddlewares.ajaxcrawl.AjaxCrawlMiddleware': 560,
'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware': 830,
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700,
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware': 550,
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware': 350,
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware': 300,
'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 900,
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 590,
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware': 580,
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware': 600,
'scrapy.downloadermiddlewares.retry.RetryMiddleware': 500,
'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': 100,
'scrapy.downloadermiddlewares.stats.DownloaderStats': 850,
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': 400},
'DOWNLOADER_STATS': True,
'DOWNLOAD_DELAY': 0,
'DOWNLOAD_HANDLERS': {},
'DOWNLOAD_HANDLERS_BASE': {'file': 'scrapy.core.downloader.handlers.file.FileDownloadHandler',
'ftp': 'scrapy.core.downloader.handlers.ftp.FTPDownloadHandler',
'http': 'scrapy.core.downloader.handlers.http.HTTPDownloadHandler',
'https': 'scrapy.core.downloader.handlers.http.HTTPDownloadHandler',
's3': 'scrapy.core.downloader.handlers.s3.S3DownloadHandler'},
'DOWNLOAD_MAXSIZE': 1073741824,
'DOWNLOAD_TIMEOUT': 180,
'DOWNLOAD_WARNSIZE': 33554432,
'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
'EDITOR': 'vi',
'EXTENSIONS': {},
'EXTENSIONS_BASE': {'scrapy.extensions.closespider.CloseSpider': 0,
'scrapy.extensions.corestats.CoreStats': 0,
'scrapy.extensions.feedexport.FeedExporter': 0,
'scrapy.extensions.logstats.LogStats': 0,
'scrapy.extensions.memdebug.MemoryDebugger': 0,
'scrapy.extensions.memusage.MemoryUsage': 0,
'scrapy.extensions.spiderstate.SpiderState': 0,
'scrapy.extensions.telnet.TelnetConsole': 0,
'scrapy.extensions.throttle.AutoThrottle': 0},
'FEED_EXPORTERS': {},
'FEED_EXPORTERS_BASE': {'csv': 'scrapy.exporters.CsvItemExporter',
'jl': 'scrapy.exporters.JsonLinesItemExporter',
'json': 'scrapy.exporters.JsonItemExporter',
'jsonlines': 'scrapy.exporters.JsonLinesItemExporter',
'marshal': 'scrapy.exporters.MarshalItemExporter',
'pickle': 'scrapy.exporters.PickleItemExporter',
'xml': 'scrapy.exporters.XmlItemExporter'},
'FEED_EXPORT_FIELDS': None,
'FEED_FORMAT': 'jsonlines',
'FEED_STORAGES': {},
'FEED_STORAGES_BASE': {'': 'scrapy.extensions.feedexport.FileFeedStorage',
'file': 'scrapy.extensions.feedexport.FileFeedStorage',
'ftp': 'scrapy.extensions.feedexport.FTPFeedStorage',
's3': 'scrapy.extensions.feedexport.S3FeedStorage',
'stdout': 'scrapy.extensions.feedexport.StdoutFeedStorage'},
'FEED_STORE_EMPTY': False,
'FEED_URI': None,
'FEED_URI_PARAMS': None,
'HTTPCACHE_ALWAYS_STORE': False,
'HTTPCACHE_DBM_MODULE': 'anydbm',
'HTTPCACHE_DIR': 'httpcache',
'HTTPCACHE_ENABLED': False,
'HTTPCACHE_EXPIRATION_SECS': 0,
'HTTPCACHE_GZIP': False,
'HTTPCACHE_IGNORE_HTTP_CODES': [],
'HTTPCACHE_IGNORE_MISSING': False,
'HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS': [],
'HTTPCACHE_IGNORE_SCHEMES': ['file'],
'HTTPCACHE_POLICY': 'scrapy.extensions.httpcache.DummyPolicy',
'HTTPCACHE_STORAGE': 'scrapy.extensions.httpcache.FilesystemCacheStorage',
'HTTPPROXY_AUTH_ENCODING': 'latin-1',
'ITEM_PIPELINES': {},
'ITEM_PIPELINES_BASE': {},
'ITEM_PROCESSOR': 'scrapy.pipelines.ItemPipelineManager',
'KEEP_ALIVE': True,
'LOGSTATS_INTERVAL': 0,
'LOG_DATEFORMAT': '%Y-%m-%d %H:%M:%S',
'LOG_ENABLED': True,
'LOG_ENCODING': 'utf-8',
'LOG_FILE': None,
'LOG_FORMAT': '%(asctime)s [%(name)s] %(levelname)s: %(message)s',
'LOG_FORMATTER': 'scrapy.logformatter.LogFormatter',
'LOG_LEVEL': 'DEBUG',
'LOG_STDOUT': False,
'LOG_UNSERIALIZABLE_REQUESTS': False,
'MAIL_FROM': 'scrapy@localhost',
'MAIL_HOST': 'localhost',
'MAIL_PASS': None,
'MAIL_PORT': 25,
'MAIL_USER': None,
'MEMDEBUG_ENABLED': False,
'MEMDEBUG_NOTIFY': [],
'MEMUSAGE_CHECK_INTERVAL_SECONDS': 60.0,
'MEMUSAGE_ENABLED': False,
'MEMUSAGE_LIMIT_MB': 0,
'MEMUSAGE_NOTIFY_MAIL': [],
'MEMUSAGE_REPORT': False,
'MEMUSAGE_WARNING_MB': 0,
'METAREFRESH_ENABLED': True,
'METAREFRESH_MAXDELAY': 100,
'NEWSPIDER_MODULE': '',
'RANDOMIZE_DOWNLOAD_DELAY': True,
'REACTOR_THREADPOOL_MAXSIZE': 10,
'REDIRECT_ENABLED': True,
'REDIRECT_MAX_TIMES': 20,
'REDIRECT_PRIORITY_ADJUST': 2,
'REFERER_ENABLED': True,
'RETRY_ENABLED': True,
'RETRY_HTTP_CODES': [500, 502, 503, 504, 408],
'RETRY_PRIORITY_ADJUST': -1,
'RETRY_TIMES': 2,
'ROBOTSTXT_OBEY': False,
'SCHEDULER': 'scrapy.core.scheduler.Scheduler',
'SCHEDULER_DISK_QUEUE': 'scrapy.squeues.PickleLifoDiskQueue',
'SCHEDULER_MEMORY_QUEUE': 'scrapy.squeues.LifoMemoryQueue',
'SPIDER_CONTRACTS': {},
'SPIDER_CONTRACTS_BASE': {'scrapy.contracts.default.ReturnsContract': 2,
'scrapy.contracts.default.ScrapesContract': 3,
'scrapy.contracts.default.UrlContract': 1},
'SPIDER_LOADER_CLASS': 'scrapy.spiderloader.SpiderLoader',
'SPIDER_MIDDLEWARES': {},
'SPIDER_MIDDLEWARES_BASE': {'scrapy.spidermiddlewares.depth.DepthMiddleware': 900,
'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': 50,
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware': 500,
'scrapy.spidermiddlewares.referer.RefererMiddleware': 700,
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware': 800},
'SPIDER_MODULES': [],
'STATSMAILER_RCPTS': [],
'STATS_CLASS': 'scrapy.statscollectors.MemoryStatsCollector',
'STATS_DUMP': True,
'TELNETCONSOLE_ENABLED': 1,
'TELNETCONSOLE_HOST': '127.0.0.1',
'TELNETCONSOLE_PORT': [6023, 6073],
'TEMPLATES_DIR': '/home/paul/src/scrapy/scrapy/templates',
'URLLENGTH_LIMIT': 2083,
'USER_AGENT': u'Scrapy/1.1.0dev1 (+http://scrapy.org)'}
In [2]:
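
The "Enabled downloader middlewares" list in the log above appears in ascending order of the numbers given in `DOWNLOADER_MIDDLEWARES_BASE` (300, 350, 400, 500, ...). The following is a minimal plain-Python sketch of that ordering, not Scrapy's own code: `ordered_components` is a hypothetical helper, the dict is a four-entry subset copied from the dump, and the convention that a value of `None` in the override dict disables an entry is treated here as an assumption.

```python
# Subset of DOWNLOADER_MIDDLEWARES_BASE, copied from the settings dump above.
DOWNLOADER_MIDDLEWARES_BASE = {
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 500,
    "scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware": 300,
    "scrapy.downloadermiddlewares.stats.DownloaderStats": 850,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 400,
}


def ordered_components(base, custom=None):
    """Hypothetical helper: merge two {path: order} dicts and sort by order.

    Entries in `custom` override entries in `base`; an entry whose value is
    None is dropped (assumed here to mean "disabled").
    """
    merged = dict(base)
    merged.update(custom or {})
    active = {path: order for path, order in merged.items() if order is not None}
    return sorted(active, key=active.get)


# With no overrides, the order follows the numeric values.
default_order = ordered_components(DOWNLOADER_MIDDLEWARES_BASE)

# Overriding one entry with None removes it from the resulting list.
without_retry = ordered_components(
    DOWNLOADER_MIDDLEWARES_BASE,
    {"scrapy.downloadermiddlewares.retry.RetryMiddleware": None},
)

for path in default_order:
    print(path)
```

Under these assumptions, `default_order` lists `HttpAuthMiddleware`, `UserAgentMiddleware`, `RetryMiddleware`, `DownloaderStats` in that order, matching their relative positions in the log.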