$ docker run -it --rm scrapinghub/scrapinghub-stack-hworker scrapy fetch https://dk.trustpilot.com/review/www.telia.dk
[sudo] password for daniel:
Unable to find image 'scrapinghub/scrapinghub-stack-hworker:latest' locally
latest: Pulling from scrapinghub/scrapinghub-stack-hworker
4edf76921243: Already exists
044c0d9e0cd9: Already exists
331fbd6c3dec: Already exists
8f76788f1cb3: Already exists
a3ed95caeb02: Already exists
description = "DNS parser"
short_description = "dns packet parser"
category = "misc"
args = {}

function on_init()
    io.stdout:setvbuf 'line'
    sysdig.set_snaplen(512)
    chisel.set_filter("fd.port=53 and evt.dir=< and evt.type=sendmsg")
    chisel.set_event_formatter("")
$ py.test test-json-envvar.py
======================================== test session starts ========================================
platform linux2 -- Python 2.7.6, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /home/daniel, inifile:
plugins: hypothesis-3.4.0
collected 1 items

test-json-envvar.py .

===================================== 1 passed in 5.29 seconds ======================================
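The contents of test-json-envvar.py are not shown; a minimal sketch of what a test with that name might look like, assuming the point is round-tripping JSON through an environment variable (the variable name and payload here are made up):

```python
import json
import os


def test_json_roundtrip_via_envvar():
    # Hypothetical payload; the real test's data is not shown above.
    payload = {"name": "daniel", "langs": ["python", "lua"]}
    os.environ["APP_CONFIG"] = json.dumps(payload)

    # Read it back the way an application under test might.
    loaded = json.loads(os.environ["APP_CONFIG"])
    assert loaded == payload
```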
#!/usr/bin/env python
import sys
import uuid
import time
from argparse import ArgumentParser
from shove import Shove
from loremipsum import Generator


def _generator(count):
lxc-start 1448649277.700 DEBUG    lxc_start - sigchild handler set
lxc-start 1448649277.700 INFO     lxc_start - 'foo' is initialized
lxc-start 1448649277.701 DEBUG    lxc_start - Not dropping cap_sys_boot or watching utmp
lxc-start 1448649277.701 DEBUG    lxc_conf - instanciated veth 'vethei3Ehx/vethcLzLk5', index is '21'
lxc-start 1448649277.701 INFO     lxc_conf - opened /var/lib/lxc/hublxc/rootfs.hold as fd 7
lxc-start 1448649277.702 DEBUG    lxc_cgroup - checking '/' (rootfs)
lxc-start 1448649277.702 DEBUG    lxc_cgroup - checking '/' (aufs)
lxc-start 1448649277.702 DEBUG    lxc_cgroup - checking '/proc' (proc)
from inspect import getmro

class ReplacementClass(object):
    @property
    def selector(self):
        return custom_selector  # placeholder; `custom_selector` is not defined here

def process_response(self, response):
    cls = type(response)
    # type() takes (name, bases, dict); getmro() already returns a tuple of bases
    newclass = type('newclass', (ReplacementClass,) + getmro(cls), {})
    # equivalently, via an explicit class statement:
    class NewClass(ReplacementClass, response.__class__):
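The snippet above is cut off; as a self-contained illustration of the technique it sketches, mixing an extra base into an object's class at runtime via `type()` (the class names here are made up):

```python
class Base(object):
    def greet(self):
        return "hello"


class Mixin(object):
    @property
    def extra(self):
        return "patched"


obj = Base()
# Rebuild the object's class with Mixin prepended, so the new
# property appears without copying any instance state.
obj.__class__ = type('Patched', (Mixin, type(obj)), {})
```

After the swap, `obj.greet()` still works and `obj.extra` resolves through the injected mixin.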
import scrapy
from scrapy.http import safeurl


class Spider(scrapy.Spider):
    name = 'loremipsum'
    start_urls = ('https://www.lipsum.com',)

    def parse(self, response):
import scrapy


class Spider(scrapy.Spider):
    name = 'loremipsum'
    start_urls = ('https://www.lipsum.com',)

    def parse(self, response):
        for lnk in response.links():
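`response.links()` is not part of Scrapy's public response API (it reads like a proposed helper); a stdlib-only sketch of the underlying idea, pulling `href` values out of HTML:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href attributes from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```

In real Scrapy code the equivalent job is usually done with `scrapy.linkextractors.LinkExtractor` or `response.css('a::attr(href)')`.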
/**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
$ scrapy shell https://www.ssehl.co.uk/HALO/publicLogon.do -c "response.xpath('//title').extract()"
2014-05-08 16:33:22-0300 [scrapy] INFO: Scrapy 0.23.0 started (bot: scrapybot)
2014-05-08 16:33:22-0300 [scrapy] INFO: Optional features available: ssl, http11
2014-05-08 16:33:22-0300 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-05-08 16:33:22-0300 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-05-08 16:33:22-0300 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-05-08 16:33:22-0300 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-05-08 16:33:22-0300 [scrapy] INFO: Enabled item pipelines:
2014-05-08
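The `-c` snippet above evaluates `response.xpath('//title').extract()` inside the Scrapy shell; a rough stdlib equivalent of that extraction, without Scrapy and over a hypothetical HTML string:

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Grab the text content of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self.in_title = True

    def handle_data(self, data):
        if self.in_title:
            self.title = data
            self.in_title = False


def page_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title
```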