Daniel Graña (dangra) — GitHub gists
$ scrapy shell https://www.ssehl.co.uk/HALO/publicLogon.do -c "response.xpath('//title').extract()"
2014-05-08 16:33:22-0300 [scrapy] INFO: Scrapy 0.23.0 started (bot: scrapybot)
2014-05-08 16:33:22-0300 [scrapy] INFO: Optional features available: ssl, http11
2014-05-08 16:33:22-0300 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-05-08 16:33:22-0300 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-05-08 16:33:22-0300 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-05-08 16:33:22-0300 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-05-08 16:33:22-0300 [scrapy] INFO: Enabled item pipelines:
$ scrapy shell http://scrapy.org/images/logo.png
2014-04-21 23:53:11-0300 [scrapy] INFO: Scrapy 0.23.0 started (bot: scrapybot)
2014-04-21 23:53:11-0300 [scrapy] INFO: Optional features available: ssl, http11
2014-04-21 23:53:11-0300 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-04-21 23:53:12-0300 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-04-21 23:53:12-0300 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-04-21 23:53:12-0300 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-04-21 23:53:12-0300 [scrapy] INFO: Enabled item pipelines:
2014-04-21 23:53:12-0300 [scrapy] DEBUG: Telnet console listen
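The first shell invocation above evaluates `response.xpath('//title').extract()` against the downloaded page. Outside Scrapy, the same title extraction can be sketched with the standard library (a minimal sketch on a made-up document; Scrapy itself uses lxml and supports full XPath, which ElementTree does not):

```python
import xml.etree.ElementTree as ET

# A stand-in document; in the shell session above, the markup would
# come from the HTTP response body instead.
html = "<html><head><title>Secure Login</title></head><body /></html>"

root = ET.fromstring(html)          # works here because the sample is well-formed
title = root.find(".//title").text  # ElementTree supports a limited XPath subset
print(title)                        # → Secure Login
```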
------------------------------------------------------------
/home/daniel/envs/setup3/bin/pip run on Thu Mar 6 12:23:59 2014
Downloading/unpacking cryptography
Getting page https://pypi.python.org/simple/cryptography/
URLs to search for versions for cryptography:
* https://pypi.python.org/simple/cryptography/
Analyzing links from page https://pypi.python.org/simple/cryptography/
Skipping https://pypi.python.org/packages/cp26/c/cryptography/cryptography-0.2-cp26-none-win32.whl#md5=13e5c4b19520e7dc6f07c6502b3f74e2 (from https://pypi.python.org/simple/cryptography/) because it is not compatible with this Python
Skipping https://pypi.python.org/packages/cp26/c/cryptography/cryptography-0.2.1-cp26-none-win32.whl#md5=00e733648ee5cdb9e58876238b1328f8 (from https://pypi.python.org/simple/cryptography/) because it is not compatible with this Python
Skipping https://pypi.python.org/packages/cp26/c/cryptography/cryptography-0.2.2-cp26-none-win32.whl#md5=b52f9b5f5c980ebbe090f945a44be2a5 (from https:/
Downloading cryptography-0.2.2.tar.gz (13.8MB): 13.8MB downloaded
Running setup.py (path:/tmp/pip_build_root/cryptography/setup.py) egg_info for package cryptography
no previously-included directories found matching 'documentation/_build'
zip_safe flag not set; analyzing archive contents...
six: module references __file__
Installed /tmp/pip_build_root/cryptography/six-1.5.2-py2.7.egg
Searching for cffi>=0.8
Reading http://33.33.33.41:3141/vagrant/dev/+simple/cffi/
Best match: cffi 0.8.1
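pip decides wheel compatibility from the tags embedded in the filename (PEP 427): `cryptography-0.2-cp26-none-win32.whl` carries Python tag `cp26`, ABI tag `none`, and platform tag `win32`, none of which match the interpreter running pip in the log above, so the wheel is skipped. A rough sketch of that filename split (`parse_wheel_filename` is an illustrative helper, not pip's API, and it ignores the optional build tag):

```python
def parse_wheel_filename(filename):
    # PEP 427 wheel filenames: name-version(-build)?-pytag-abitag-plattag.whl
    # This sketch assumes no build tag and no hyphens inside the name.
    stem = filename[: -len(".whl")]
    parts = stem.split("-")
    name, version = parts[0], parts[1]
    py_tag, abi_tag, plat_tag = parts[-3], parts[-2], parts[-1]
    return name, version, py_tag, abi_tag, plat_tag

print(parse_wheel_filename("cryptography-0.2-cp26-none-win32.whl"))
# → ('cryptography', '0.2', 'cp26', 'none', 'win32')
```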
# global parameters
global
    # log to syslog on 127.0.0.1, udp port 514 (the default), using the local0 facility
    log 127.0.0.1 local0
    # maximum number of concurrent connections
    maxconn 4096
    # drop privileges after port binding
    user nobody
    group nogroup
    # run in daemon mode
    daemon
import encodings
import lxml.etree

# Probe every known codec alias: which of Python's encodings does
# lxml's HTML parser refuse to accept?
for enc in set(encodings.aliases.aliases.values()):
    try:
        parser = lxml.etree.HTMLParser(recover=True, encoding=enc)
    except LookupError as exc:
        print(str(exc))
~$ scrapy shell http://www.jobberman.com/jobs-in-nigeria/3/by-industry/vacancies-in-ict-telecommunications-companies-in-nigeria/
2014-01-22 23:09:26-0200 [scrapy] INFO: Scrapy 0.23.0 started (bot: scrapybot)
2014-01-22 23:09:26-0200 [scrapy] INFO: Optional features available: ssl, http11, boto, django
2014-01-22 23:09:26-0200 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-01-22 23:09:27-0200 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-01-22 23:09:28-0200 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-01-22 23:09:28-0200 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware

0.22.0 (released 2014-01-16)

Enhancements

  • Backwards incompatible: switched the HTTPCacheMiddleware backend to the filesystem (541). To restore the old backend, set HTTPCACHE_STORAGE to scrapy.contrib.httpcache.DbmCacheStorage
  • Proxy https:// URLs using the CONNECT method (392, 397)
  • Added a middleware to crawl AJAX-crawlable pages as defined by Google (343)
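Of these, the cache backend switch is the one most likely to bite existing projects. Restoring the pre-0.22 behaviour is a one-line settings change (a sketch for a 0.22-era project's settings.py, following the release note above):

```python
# settings.py (Scrapy 0.22-era project) — restore the old DBM cache
# backend that was the default before this release.
HTTPCACHE_STORAGE = 'scrapy.contrib.httpcache.DbmCacheStorage'
```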
----------
ID: app
Function: docker.running
Result: False
Comment: Container 'shipyard' cannot be started
Traceback (most recent call last):
  File "/var/cache/salt/minion/extmods/modules/dockerio.py", line 904, in start
    for k, v in port_bindings.iteritems():
AttributeError: 'list' object has no attribute 'iteritems'
Changes:
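The traceback shows the immediate cause: the module assumes `port_bindings` is a dict and calls `.iteritems()`, but the state passed a list. One hedged workaround is to normalize the value into a dict before iterating (`normalize_port_bindings` is a hypothetical helper, not part of salt's dockerio module):

```python
def normalize_port_bindings(port_bindings):
    # Hypothetical helper: accept either a dict such as
    # {'80/tcp': ('0.0.0.0', 8080)} or a list of 'host:container'
    # strings such as ['8080:80'], and always return a dict so the
    # caller can iterate with .items() safely.
    if isinstance(port_bindings, dict):
        return dict(port_bindings)
    bindings = {}
    for entry in port_bindings:
        host, _, container = str(entry).rpartition(':')
        bindings[container] = host or None
    return bindings

print(normalize_port_bindings(['8080:80']))  # → {'80': '8080'}
```

Note that `iteritems()` is Python-2-only, so even with a dict this code path would break again under Python 3, where the call would have to be `.items()`.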