def queryset_generator(queryset, chunksize=1000):
    """
    Iterate over a Django Queryset ordered by the primary key
    This method loads a maximum of chunksize (default: 1000) rows in its
    memory at the same time while django normally would load all rows in its
    memory. Using the iterator() method only causes it to not preload all the
    classes.
    Note that the implementation of the generator does not support ordered query sets.
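The preview above is cut off, so the following is only a hedged sketch of the general technique the docstring describes: walking a table in primary-key order while holding at most `chunksize` rows in memory. It uses the standard-library `sqlite3` module rather than Django so that it runs on its own; the `items` table, its columns, and the function name `iterate_in_chunks` are assumptions made for illustration, not the gist's code.

```python
import sqlite3

# Hypothetical table and columns, used only for this illustration.
CHUNK_SQL = """
    SELECT id, name FROM items
    WHERE (? IS NULL OR id > ?)
    ORDER BY id
    LIMIT ?
"""


def iterate_in_chunks(conn, chunksize=1000):
    """Yield rows in primary-key order, fetching at most `chunksize` at a time."""
    last_pk = None
    while True:
        rows = conn.execute(CHUNK_SQL, (last_pk, last_pk, chunksize)).fetchall()
        if not rows:
            return
        for row in rows:
            yield row
        # Remember the highest key seen so the next query resumes after it.
        last_pk = rows[-1][0]


# Small in-memory demo: 7 rows read in chunks of 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, "item-%d" % i) for i in range(1, 8)],
)
rows = list(iterate_in_chunks(conn, chunksize=3))
```

Because each chunk resumes from the last primary key seen, the approach depends on ordering by that key, which is one plausible reason a generator like this cannot honour a different ordering on the incoming query set.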
'''
queryset_generator and queryset_list_generator based on:
https://gist.github.com/897894
'''
#===============================================================================
# imports (in alphabetical order by package, then by name)
#===============================================================================
# python standard libraries
# ========================================
# Testing n-gram analysis in ElasticSearch
# ========================================
curl -X DELETE localhost:9200/ngram_test
curl -X PUT localhost:9200/ngram_test -d '
{
  "settings" : {
    "index" : {
      "analysis" : {
from scrapy import log
from scrapy.item import Item
from scrapy.http import Request
from scrapy.contrib.spiders import XMLFeedSpider
def NextURL():
    """
    Generate a list of URLs to crawl. You can query a database or come up with some other means.
    Note that if you generate URLs to crawl from a scraped URL then you're better off using a
#! /usr/bin/env python
import redis
import random
import pylibmc
import sys
r = redis.Redis(host = 'localhost', port = 6389)
mc = pylibmc.Client(['localhost:11222'])
# -*- coding: utf-8 -*-
def load_dataset():
    "Load the sample dataset."
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
def createC1(dataset):
    "Create a list of candidate item sets of size one."
This is an example of how to perform multi-select faceting in ElasticSearch.
Selecting multiple values from the same facet will result in an OR filter between each of the values:
(facet1.value1 OR facet1.value2)
Faceting on more than one facet will result in an AND filter between the facets:
(facet1.value1 OR facet1.value2) AND (facet2.value1)
I have chosen to update the counts for each facet that the selected value does NOT belong to, since we are performing an AND between the facets. I have included an example that shows how to keep the counts if you don't want to do this (filter0.sh).
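The filter logic described above can be sketched in plain Python, independent of ElasticSearch. This is a hedged, in-memory illustration only: the documents, the facet names `color` and `size`, and the helper names are all assumptions, and it does not reproduce the queries in the gist. Selected values within a facet are combined with OR, facets are combined with AND, and the counts for a facet are computed with every other facet's selection applied but not its own.

```python
from collections import Counter

# Hypothetical documents with two facets, for illustration only.
DOCS = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "M"},
    {"color": "blue", "size": "M"},
    {"color": "green", "size": "L"},
]


def matches(doc, selections, skip=None):
    """True if doc passes every facet's selection (OR within a facet, AND across facets)."""
    return all(
        doc.get(facet) in values
        for facet, values in selections.items()
        if facet != skip and values
    )


def search(docs, selections):
    """Return the documents that satisfy all facet selections."""
    return [doc for doc in docs if matches(doc, selections)]


def facet_counts(docs, selections, facet):
    """Count values of `facet`, filtered by the selections made in the OTHER facets."""
    return Counter(
        doc[facet]
        for doc in docs
        if facet in doc and matches(doc, selections, skip=facet)
    )


# (color.red OR color.blue) AND (size.M)
selections = {"color": {"red", "blue"}, "size": {"M"}}
hits = search(DOCS, selections)
color_counts = facet_counts(DOCS, selections, "color")
size_counts = facet_counts(DOCS, selections, "size")
```

Skipping a facet's own selection when counting it is what lets the unselected values in that facet keep meaningful counts while the other facets narrow the result set.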
from datetime import datetime
from datetime import timedelta
def queryset_generator(queryset, chunksize=1000):
    """
    Iterate over a Django Queryset ordered by the primary key
    This method loads a maximum of chunksize (default: 1000) rows in its
    memory at the same time while django normally would load all rows in its
    memory. Using the iterator() method only causes it to not preload all the
from kombu import Exchange
from kombu import Queue
from kombu import BrokerConnection
class ProduceConsume(object):
    def __init__(self, exchange_name, **options):
        exchange = Exchange(exchange_name, type='fanout', durable=False)
        queue_name = options.get('queue', exchange_name+'_queue')
        self.queue = Queue(queue_name, exchange)
# -*- coding: utf-8 -*-
"""
LICENSE: BSD (same as pandas)
example use of pandas with oracle mysql postgresql sqlite
- updated 9/18/2012 with better column name handling; couple of bug fixes.
- used ~20 times for various ETL jobs. Mostly MySQL, but some Oracle.
to do:
save/restore index (how to check table existence? just do select count(*)?),
finish odbc,