Oleksandr Podolskyi (podolskyi)

  • Ukraine
@podolskyi
podolskyi / txspider.py
Created July 26, 2016 15:10 — forked from rmax/txspider.py
Using twisted deferreds in a scrapy spider!
$ scrapy runspider txspider.py
2016-07-05 23:11:39 [scrapy] INFO: Scrapy 1.1.0 started (bot: scrapybot)
2016-07-05 23:11:39 [scrapy] INFO: Overridden settings: {}
2016-07-05 23:11:40 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.logstats.LogStats']
2016-07-05 23:11:40 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
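The spider source is truncated in this preview; below is a minimal sketch of the pattern the gist demonstrates, assuming a callback that hands back a Deferred (the class name TxSpider and the one-second delay are illustrative, not from the original). Because Scrapy chains spider callbacks through Twisted, a callback may return a Deferred and Scrapy waits for it to fire before iterating its result.

import scrapy
from twisted.internet import defer, reactor

class TxSpider(scrapy.Spider):
    name = 'txspider'
    start_urls = ['http://example.com']

    def parse(self, response):
        # Fire the real parsing callback one second later, purely to
        # show that a spider callback may return a Deferred.
        d = defer.Deferred()
        reactor.callLater(1.0, d.callback, response)
        d.addCallback(self.parse_later)
        return d

    def parse_later(self, response):
        yield {'url': response.url}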
@podolskyi
podolskyi / python_start
Created June 26, 2016 21:16
Install base libraries for Python
sudo apt-get install -y python-dev python-pip libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev
pip install virtualenv
@podolskyi
podolskyi / git command.markdown
Created May 22, 2016 17:09 — forked from nasirkhan/git command.markdown
`git` discard all local changes/commits and pull from upstream

git discard all local changes/commits and pull from upstream

git reset --hard origin/master

git pull origin master

@podolskyi
podolskyi / message_queue_pipeline.py
Created May 20, 2016 11:40 — forked from azizmb/message_queue_pipeline.py
Scrapy pipeline to enqueue scraped items to a message queue using carrot
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
from scrapy.exceptions import DropItem
from scrapy.utils.serialize import ScrapyJSONEncoder
from carrot.connection import BrokerConnection
from carrot.messaging import Publisher
from twisted.internet.threads import deferToThread
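The preview cuts off at the imports; continuing from them, a hedged sketch of what the pipeline body likely looks like follows. Broker settings, the exchange, and the routing key are placeholders, and carrot itself is long obsolete (superseded by kombu), so treat this as a historical sketch rather than the gist's exact code.

class MessageQueuePipeline(object):

    def __init__(self):
        self.encoder = ScrapyJSONEncoder()
        self.connection = BrokerConnection(hostname='localhost', port=5672,
                                           userid='guest', password='guest',
                                           virtual_host='/')
        # Exchange and routing key are placeholder names.
        self.publisher = Publisher(connection=self.connection,
                                   exchange='scrapy', routing_key='item')
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def process_item(self, item, spider):
        # Publish off the reactor thread so the crawl is not blocked.
        deferToThread(self.publisher.send, self.encoder.encode(item))
        return item

    def spider_closed(self, spider):
        self.publisher.close()
        self.connection.close()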
@podolskyi
podolskyi / install.sh
Last active April 10, 2016 07:58
Installing components
sudo apt-get update
sudo apt-get install -y python-dev python-pip libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev
pip install Scrapy
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
echo "deb http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list
sudo apt-get update
sudo apt-get install -y mongodb-org
sudo service mongod start
pip install pymongo
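A quick smoke test for the stack above from Python, assuming mongod is running on the default localhost:27017:

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client['test']
db.items.insert_one({'status': 'ok'})       # write one document
print(db.items.find_one({'status': 'ok'}))  # and read it back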
@podolskyi
podolskyi / curl_proxy.sh
Created March 28, 2016 08:01
curl with proxy
# -x, --proxy <[protocol://][user:password@]proxyhost[:port]>
#
# Use the specified HTTP proxy.
# If the port number is not specified, it is assumed to be 1080.
curl -x http://proxy_server:proxy_port --proxy-user username:password -L http://url
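For comparison, the same authenticated-proxy request in Python with requests; host, port, and credentials are placeholders, and requests follows redirects by default, matching curl's -L:

import requests

proxies = {
    'http': 'http://username:password@proxy_server:proxy_port',
    'https': 'http://username:password@proxy_server:proxy_port',
}
response = requests.get('http://url', proxies=proxies)
print(response.status_code)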
@podolskyi
podolskyi / datetime.py
Created March 22, 2016 01:31
Manipulating datetime objects in Python: converting to and from strings.
import datetime
# datetime object to string
datetime.datetime.today().strftime("%m/%d/%Y %H:%M")
# from string to datetime object (the format must match the input string)
datetime.datetime.strptime('Mar 22, 2016 00:00', "%b %d, %Y %H:%M")
# add one day
datetime.datetime.today() + datetime.timedelta(days=1)
@podolskyi
podolskyi / xpath_following.py
Last active March 16, 2016 04:11
Example of the XPath following axis: selecting the next tag
# Select the first table that follows a matching heading
response.xpath('//h2[contains(text(), "Contact information")]/following::table[1]//text()').extract()
# Match elements whose child text contains given substrings:
# //person[contains(firstname, 'Kerr') and contains(lastname, 'och')]
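To try the following:: axis without a live crawl, the expression can be run against inline HTML with scrapy's Selector (the markup below is made up for illustration):

from scrapy.selector import Selector

html = '''
<h2>Contact information</h2>
<table><tr><td>email@example.com</td></tr></table>
'''
sel = Selector(text=html)
print(sel.xpath('//h2[contains(text(), "Contact information")]'
                '/following::table[1]//text()').extract())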
@podolskyi
podolskyi / .gitignore_python
Created March 1, 2016 19:02
.gitignore file focused on PyCharm
# Compiled source #
###################
*.com
*.class
*.dll
*.exe
*.o
*.so
*.pyc
@podolskyi
podolskyi / py_requests_download.py
Created February 24, 2016 14:15
Download a file with Python requests
# http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow
import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter: the body is fetched in chunks
    # instead of being loaded into memory at once
    r = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
    return local_filename
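Example usage (the URL is a placeholder):

print(download_file('http://example.com/archive.zip'))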