
Oleksandr Podolskyi podolskyi

  • Ukraine
@podolskyi
podolskyi / delay_repeat_spider.py
Created September 27, 2016 11:20 — forked from nyov/delay_repeat_spider.py
Scrapy spider example using reactor.callLater() for delays and repetition.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# A spider example on using reactor.callLater()
# for delays and repetition.
# scrapy 0.24
import scrapy
from twisted.internet import reactor, defer
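The gist preview stops at its imports. As a hedged, stdlib-only illustration of the same delay-and-repeat pattern, the sketch below swaps in asyncio's `loop.call_later()` for Twisted's `reactor.callLater()`. `schedule_repeating` and `main` are hypothetical names; this is not the gist's Scrapy code.

```python
import asyncio


def schedule_repeating(loop, delay, times, callback):
    """Run callback(i) `times` times, `delay` seconds apart.

    Each run schedules the next one with loop.call_later(), the asyncio
    counterpart of Twisted's reactor.callLater(). Returns a future that
    resolves to the list of callback results once all runs are done.
    """
    results = []
    done = loop.create_future()
    if times <= 0:
        done.set_result(results)
        return done

    def run(i):
        results.append(callback(i))
        if i + 1 < times:
            loop.call_later(delay, run, i + 1)  # schedule the next repetition
        else:
            done.set_result(results)  # all repetitions finished

    loop.call_later(delay, run, 0)  # the first run is delayed too
    return done


async def main():
    loop = asyncio.get_running_loop()
    # Three runs, 10 ms apart, each squaring its run index
    return await schedule_repeating(loop, 0.01, 3, lambda i: i * i)


if __name__ == "__main__":
    print(asyncio.run(main()))  # [0, 1, 4]
```

Rescheduling from inside the callback, rather than queuing all calls up front, keeps only one pending timer at a time and lets each run decide whether another should follow.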
podolskyi / celery-crontab.py
Created July 30, 2016 09:28 — forked from alexanderjulo/celery-crontab.py
Celery crontab example
from celery.schedules import crontab
from flask.ext.celery import Celery
CELERYBEAT_SCHEDULE = {
    # executes every night at 4:20
    'every-night': {
        'task': 'user.checkaccounts',
        'schedule': crontab(hour=4, minute=20),
    }
}
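To make the schedule concrete: `crontab(hour=4, minute=20)` fires once a day at 04:20. The hypothetical helper below uses plain `datetime` (it is not Celery's API, and it ignores time zones and DST) to compute the next such firing time after a given moment.

```python
from datetime import datetime, timedelta


def next_daily_run(now, hour, minute):
    """Return the first datetime at hour:minute strictly after `now`.

    Mirrors the firing times of a daily crontab(hour=..., minute=...)
    entry; naive datetimes only, no time-zone or DST handling.
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow
        candidate += timedelta(days=1)
    return candidate


# A worker checking at 09:28 would next fire the task the following night
print(next_daily_run(datetime(2016, 7, 30, 9, 28), 4, 20))
```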
podolskyi / myspider.py
Created July 26, 2016 15:10 — forked from rmax/myspider.py
An example of a Scrapy spider returning a Twisted deferred.
from scrapy import Spider, Item, Field
from twisted.internet import defer, reactor
class MyItem(Item):
    url = Field()

class MySpider(Spider):
podolskyi / txspider.py
Created July 26, 2016 15:10 — forked from rmax/txspider.py
Using Twisted deferreds in a Scrapy spider!
$ scrapy runspider txspider.py
2016-07-05 23:11:39 [scrapy] INFO: Scrapy 1.1.0 started (bot: scrapybot)
2016-07-05 23:11:39 [scrapy] INFO: Overridden settings: {}
2016-07-05 23:11:40 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.logstats.LogStats']
2016-07-05 23:11:40 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
podolskyi / git command.markdown
Created May 22, 2016 17:09 — forked from nasirkhan/git command.markdown
`git` discard all local changes/commits and pull from upstream

git reset --hard origin/master

git pull origin master

podolskyi / message_queue_pipeline.py
Created May 20, 2016 11:40 — forked from azizmb/message_queue_pipeline.py
Scrapy pipeline to enqueue scraped items to a message queue using carrot
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
from scrapy.exceptions import DropItem
from scrapy.utils.serialize import ScrapyJSONEncoder
from carrot.connection import BrokerConnection
from carrot.messaging import Publisher
from twisted.internet.threads import deferToThread
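The preview shows only the imports. A minimal sketch of the pipeline's shape follows, assuming a stdlib `queue.Queue` stands in for the carrot `Publisher`; `QueuePipeline` is a hypothetical name. It follows Scrapy's item-pipeline contract of `process_item(item, spider)` returning the item. The original also imports `deferToThread`, presumably to keep the blocking publish off the reactor thread; this sketch skips that.

```python
import json
import queue


class QueuePipeline:
    """Hypothetical pipeline that enqueues each scraped item as JSON.

    queue.Queue is a stand-in for a real message-broker publisher.
    """

    def __init__(self, message_queue):
        self.message_queue = message_queue

    def process_item(self, item, spider):
        # Serialize a plain-dict copy of the item and hand it to the queue
        self.message_queue.put(json.dumps(dict(item)))
        # Return the item so later pipelines in the chain still receive it
        return item


q = queue.Queue()
pipeline = QueuePipeline(q)
pipeline.process_item({"url": "http://example.com"}, spider=None)
print(q.get_nowait())
```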

Writing better Python code


Swapping variables

Bad code
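The section breaks off at its "Bad code" heading. Assuming it contrasts the usual temporary-variable swap with tuple unpacking (a common framing in Python style guides, not confirmed by the truncated text), the two versions look like this; both function names are hypothetical.

```python
def swap_with_temp(a, b):
    # The verbose way: an explicit temporary variable
    temp = a
    a = b
    b = temp
    return a, b


def swap_idiomatic(a, b):
    # The right-hand side builds the tuple (b, a) first, then unpacks it
    # into a and b, so no temporary name is needed
    a, b = b, a
    return a, b


print(swap_with_temp(1, 2), swap_idiomatic(1, 2))
```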

podolskyi / gist:60d52fbd92ecc269a3d1
Created February 24, 2016 13:23 — forked from lxneng/gist:555aadfa60e2656df320
Simple Python Parallelism
from multiprocessing import Pool
from functools import partial
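def parallel_function(f):
    # Wrap f so that calling the wrapper with a sequence maps f over it
    # in a pool of worker processes (f must be a picklable, top-level function).
    def parallize(f, seq):
        pool = Pool()
        result = pool.map(f, seq)  # collect the mapped results
        pool.close()
        pool.join()
        return result
    return partial(parallize, f)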
podolskyi / css_resources.md
Last active August 29, 2015 14:23 — forked from jookyboi/css_resources.md
CSS libraries and guides to bring some order to the chaos.

Libraries

  • 960 Grid System - An effort to streamline web development workflow by providing commonly used dimensions, based on a width of 960 pixels. There are two variants: 12 and 16 columns, which can be used separately or in tandem.
  • Compass - Open-source CSS authoring framework.
  • Bootstrap - Sleek, intuitive, and powerful mobile first front-end framework for faster and easier web development.
  • Font Awesome - The iconic font designed for Bootstrap.
  • Zurb Foundation - Framework for writing responsive web sites.
  • Sass - CSS extension language that supports variables, mixins, and nested rules.
  • Skeleton - Boilerplate for responsive, mobile-friendly development.

Guides

podolskyi / python_resources.md
Last active August 29, 2015 14:23 — forked from jookyboi/python_resources.md
Python-related modules and guides.

Packages

  • lxml - Pythonic binding for the C libraries libxml2 and libxslt.
  • boto - Python interface to Amazon Web Services.
  • Django - Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design.
  • Fabric - Library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.
  • PyMongo - Tools for working with MongoDB; the recommended way to work with MongoDB from Python.
  • Celery - Task queue to distribute work across threads or machines.
  • pytz - pytz brings the Olson tz database into Python. This library allows accurate and cross-platform timezone calculations using Python 2.4 or higher.

Guides