Daniel Graña dangra

$ docker run -it --rm scrapinghub/scrapinghub-stack-hworker scrapy fetch https://dk.trustpilot.com/review/www.telia.dk
[sudo] password for daniel:
Unable to find image 'scrapinghub/scrapinghub-stack-hworker:latest' locally
latest: Pulling from scrapinghub/scrapinghub-stack-hworker
4edf76921243: Already exists
044c0d9e0cd9: Already exists
331fbd6c3dec: Already exists
8f76788f1cb3: Already exists
a3ed95caeb02: Already exists
description = "DNS parser"
short_description = "dns packet parser"
category = "misc"

args = {}

function on_init()
    io.stdout:setvbuf 'line'
    sysdig.set_snaplen(512)
    chisel.set_filter("fd.port=53 and evt.dir=< and evt.type=sendmsg")
    chisel.set_event_formatter("")
    -- a chisel's on_init must return true, otherwise sysdig stops it
    return true
end
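The chisel above only filters outbound traffic on port 53; the actual DNS parsing happens downstream. As a rough illustration of what that parsing involves (not part of the gist, names are mine), here is a minimal Python sketch that unpacks the fixed 12-byte DNS header:

```python
import struct

def parse_dns_header(data):
    """Unpack the fixed 12-byte DNS header (RFC 1035, section 4.1.1)."""
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack('>6H', data[:12])
    return {
        'id': ident,
        'qr': flags >> 15,             # 0 = query, 1 = response
        'opcode': (flags >> 11) & 0xF,
        'rcode': flags & 0xF,
        'questions': qdcount,
        'answers': ancount,
    }

# a hand-crafted query header: id=0x1234, RD bit set, one question
header = struct.pack('>6H', 0x1234, 0x0100, 1, 0, 0, 0)
print(parse_dns_header(header))
```

Question and answer sections follow the header and use length-prefixed labels, which is where a real parser spends most of its effort.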
$ py.test test-json-envvar.py
======================================== test session starts ========================================
platform linux2 -- Python 2.7.6, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /home/daniel, inifile:
plugins: hypothesis-3.4.0
collected 1 items
test-json-envvar.py .
===================================== 1 passed in 5.29 seconds ======================================
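The test file itself is not shown in the preview; judging by its name, test-json-envvar.py checks that JSON data survives a round trip through an environment variable. A minimal sketch of such a test (the function and variable names are assumptions, not the gist's):

```python
import json
import os

def roundtrip_via_envvar(obj, name='PAYLOAD'):
    # environment variables only hold strings, so serialize to JSON first
    os.environ[name] = json.dumps(obj)
    return json.loads(os.environ[name])

def test_json_envvar():
    payload = {'spider': 'loremipsum', 'args': [1, 2, 3], 'nested': {'ok': True}}
    assert roundtrip_via_envvar(payload) == payload

test_json_envvar()
```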
#!/usr/bin/env python
import sys
import time
import uuid
from argparse import ArgumentParser

from shove import Shove
from loremipsum import Generator


def _generator(count):
    ...  # body truncated in the gist preview
lxc-start 1448649277.700 DEBUG lxc_start - sigchild handler set
lxc-start 1448649277.700 INFO lxc_start - 'foo' is initialized
lxc-start 1448649277.701 DEBUG lxc_start - Not dropping cap_sys_boot or watching utmp
lxc-start 1448649277.701 DEBUG lxc_conf - instanciated veth 'vethei3Ehx/vethcLzLk5', index is '21'
lxc-start 1448649277.701 INFO lxc_conf - opened /var/lib/lxc/hublxc/rootfs.hold as fd 7
lxc-start 1448649277.702 DEBUG lxc_cgroup - checking '/' (rootfs)
lxc-start 1448649277.702 DEBUG lxc_cgroup - checking '/' (aufs)
lxc-start 1448649277.702 DEBUG lxc_cgroup - checking '/proc' (proc)
class ReplacementClass(object):
    @property
    def selector(self):
        # placeholder: build and return a custom Selector here
        return custom_selector


def process_response(self, response):
    cls = type(response)
    # type() takes (name, bases, dict); listing the mixin first makes it
    # win in the MRO over the original response class
    newclass = type('NewClass', (ReplacementClass, cls), {})
    # equivalent, written as a class statement:
    class NewClass(ReplacementClass, response.__class__):
        pass
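The trick above swaps a response's class at runtime by injecting a mixin ahead of the original class in the MRO. A standalone sketch of the same pattern, with stand-in classes in place of Scrapy's:

```python
class ReplacementMixin(object):
    @property
    def selector(self):
        return 'custom selector'  # stand-in for a real Selector object

class HtmlResponse(object):  # stand-in for scrapy.http.HtmlResponse
    def __init__(self, url):
        self.url = url

def upgrade(response):
    # build a subclass on the fly and rebless the instance
    cls = type(response)
    newclass = type('Custom' + cls.__name__, (ReplacementMixin, cls), {})
    response.__class__ = newclass
    return response

r = upgrade(HtmlResponse('https://www.lipsum.com'))
print(r.selector)                    # → custom selector
print(isinstance(r, HtmlResponse))   # → True
```

Because the mixin precedes the original class in the bases tuple, its attributes shadow the original's while isinstance checks against the old class keep working.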
import scrapy
# scrapy.http exposes no `safeurl`; w3lib ships the URL-safety helpers
from w3lib.url import safe_url_string


class Spider(scrapy.Spider):
    name = 'loremipsum'
    start_urls = ('https://www.lipsum.com',)

    def parse(self, response):
        ...  # body truncated in the gist preview
import scrapy


class Spider(scrapy.Spider):
    name = 'loremipsum'
    start_urls = ('https://www.lipsum.com',)

    def parse(self, response):
        # Response objects have no .links(); extract hrefs with a selector
        for href in response.css('a::attr(href)').getall():
            yield response.follow(href, callback=self.parse)
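Under the hood, pulling links out of a page is plain HTML parsing; outside Scrapy the same step can be sketched with only the standard library (this is an illustration, not the gist's code):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href attributes from <a> tags, roughly what a link extractor does."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)

parser = LinkCollector()
parser.feed('<a href="/privacy.pdf">Privacy</a> <a href="https://www.lipsum.com">Lipsum</a>')
print(parser.links)  # → ['/privacy.pdf', 'https://www.lipsum.com']
```

Scrapy's selectors and LinkExtractor add the parts this sketch skips: encoding detection, relative-URL resolution, and domain filtering.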
/**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
$ scrapy shell https://www.ssehl.co.uk/HALO/publicLogon.do -c "response.xpath('//title').extract()"
2014-05-08 16:33:22-0300 [scrapy] INFO: Scrapy 0.23.0 started (bot: scrapybot)
2014-05-08 16:33:22-0300 [scrapy] INFO: Optional features available: ssl, http11
2014-05-08 16:33:22-0300 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-05-08 16:33:22-0300 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-05-08 16:33:22-0300 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-05-08 16:33:22-0300 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-05-08 16:33:22-0300 [scrapy] INFO: Enabled item pipelines: