@oddskool
oddskool / gen.engine async fetch testing
Created February 3, 2012 14:35
Sample test case for a Tornado WS that asynchronously calls a third-party WS
from tornado import ioloop, gen
from tornado.httpclient import AsyncHTTPClient, HTTPRequest
from tornado.web import asynchronous, RequestHandler, Application
from tornado.testing import AsyncHTTPTestCase
import sys
from tornado.ioloop import IOLoop
class MainHandler(RequestHandler):
    url = u'http://www.google.com'
    @asynchronous
    @gen.engine
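The preview stops at the decorators. Below is a minimal reconstruction sketch of how such a handler and its test might be completed in Tornado's old callback/gen.engine style; the handler body, the test class and the get_new_ioloop workaround are assumptions, not the gist's actual code.

# Reconstruction sketch (not the gist's code), reusing the imports shown above.
class MainHandler(RequestHandler):
    url = u'http://www.google.com'

    @asynchronous
    @gen.engine
    def get(self):
        client = AsyncHTTPClient()
        # gen.Task suspends the generator until fetch() invokes its callback
        response = yield gen.Task(client.fetch, HTTPRequest(self.url))
        self.write(str(response.code))
        self.finish()

class MainHandlerTest(AsyncHTTPTestCase):
    def get_app(self):
        return Application([(r'/', MainHandler)])

    def get_new_ioloop(self):
        # Run the test on the global IOLoop so the AsyncHTTPClient() created
        # inside the handler shares it (a common workaround in such tests).
        return IOLoop.instance()

    def test_async_fetch(self):
        # self.fetch() spins the IOLoop until the handler calls finish()
        response = self.fetch('/')
        self.assertEqual(response.code, 200)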
@oddskool
oddskool / gist:5249033
Created March 26, 2013 20:39
epsilon greedy algorithm
import random
class EpsilonGreedyBandit(Bandit):
    """
    The best action (as much as the algorithm knows so far) is selected for
    a proportion 1 - \epsilon of the trials, and another action is randomly
    selected (with uniform probability) for a proportion \epsilon.
    Parameters
    ----------
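The preview ends inside the docstring. A small self-contained sketch of the epsilon-greedy selection it describes; the class below, its attribute names and the incremental-mean update are illustrative assumptions, not the gist's code.

import random

class EpsilonGreedy(object):
    """Illustrative epsilon-greedy bandit (names are assumptions)."""

    def __init__(self, n_actions, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_actions
        self.values = [0.0] * n_actions

    def select_action(self):
        # With probability epsilon, explore an action uniformly at random;
        # otherwise exploit the action with the best estimated value so far.
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incremental mean of the rewards observed for the chosen action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]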
from collections import defaultdict
import re
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model.stochastic_gradient import SGDClassifier
from sklearn.externals import joblib
def tokens(doc):
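The preview above (whose gist header did not survive extraction) cuts off at the tokenizer. A minimal sketch of the hashing-based text classification these imports point to; the tokenizer regex and the training steps are assumptions, and the sketch uses the modern sklearn.linear_model import path rather than the old stochastic_gradient module shown above.

import re
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import SGDClassifier

def tokens(doc):
    # Lowercased word tokens; the regex is a guess at the gist's tokenizer.
    return (tok.lower() for tok in re.findall(r'\w+', doc))

# Hash token streams into a fixed-width sparse feature space, then fit a
# linear classifier on the hashed features.
data = fetch_20newsgroups(subset='train')
hasher = FeatureHasher(input_type='string')
X = hasher.transform(tokens(doc) for doc in data.data)
clf = SGDClassifier().fit(X, data.target)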
@oddskool
oddskool / parse_aws_s3_billing.py
Created September 10, 2013 07:00
Simplistic script to parse the detailed AWS billing CSV file. The script displays the cost of S3 operations broken down per region, bucket and usage type (either storage or network), and also sums up the amount of storage used per bucket. Output is filtered with respect to costs below $1. See http://docs.aws.amazon.com/awsaccountbilling/latest/about/programaccess.html
# -*- coding:utf-8 -*-
'''
Simplistic script to parse the detailed AWS billing CSV file.
Script displays cost of S3 operations broken down per region, bucket and usage
type (either storage or network). It also sums up the amount of storage used per bucket.
Output is filtered with respect to costs below $1.
See http://docs.aws.amazon.com/awsaccountbilling/latest/about/programaccess.html for
how to set up programmatic access to your billing.
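A sketch of the kind of aggregation the script performs. The column names used below ('ProductName', 'UsageType', 'ResourceId', 'UsageQuantity', 'Cost') and the 'TimedStorage' check are assumptions about the detailed billing CSV layout, not taken from the gist; check them against your own report header.

# -*- coding:utf-8 -*-
import csv
import sys
from collections import defaultdict

# Assumed column names -- verify against the header of your billing CSV.
costs = defaultdict(float)    # (usage type, bucket) -> dollars
storage = defaultdict(float)  # bucket -> storage usage, in the report's units

with open(sys.argv[1]) as f:
    for row in csv.DictReader(f):
        if row.get('ProductName') != 'Amazon Simple Storage Service':
            continue
        bucket = row.get('ResourceId', '')
        usage_type = row.get('UsageType', '')
        costs[(usage_type, bucket)] += float(row.get('Cost') or 0)
        if 'TimedStorage' in usage_type:
            storage[bucket] += float(row.get('UsageQuantity') or 0)

# Drop entries below the $1 threshold mentioned in the description.
for (usage_type, bucket), cost in sorted(costs.items(), key=lambda kv: -kv[1]):
    if cost >= 1.0:
        print('%-40s %-30s $%.2f' % (bucket, usage_type, cost))
for bucket, qty in storage.items():
    print('%-40s storage usage: %.1f' % (bucket, qty))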
@oddskool
oddskool / gist:6509465
Created September 10, 2013 13:33
Sums size of subdirs in a S3 bucket (and per storage class)
import sys
import boto
from collections import defaultdict
s3 = boto.connect_s3()
bucket = s3.lookup(sys.argv[1])
total_bytes = defaultdict(int)
def process(key):
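The preview stops at the function definition. A reconstruction sketch of how the script likely continues, grouping key sizes by top-level prefix and storage class; the grouping and output format are assumptions.

# Reconstruction sketch, continuing the preview above.
def process(key):
    # Group by the key's top-level "subdirectory" prefix and storage class.
    subdir = key.name.split('/', 1)[0]
    total_bytes[(subdir, key.storage_class)] += key.size

for key in bucket.list():
    process(key)

for (subdir, storage_class), nbytes in sorted(total_bytes.items()):
    print('%s (%s): %.1f GB' % (subdir, storage_class, nbytes / 1e9))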
@oddskool
oddskool / gist:7300982
Last active December 27, 2015 08:59
Benchmark SGD prediction time with dense/sparse coefficients. Invoke with $ kernprof.py -l sparsity_benchmark.py && python -m line_profiler sparsity_benchmark.py.lprof
from scipy.sparse.csr import csr_matrix
import sys
import numpy as np
from scipy.sparse.base import issparse
from sklearn.linear_model.stochastic_gradient import SGDRegressor
from sklearn.metrics import r2_score
np.random.seed(42)
n_samples, n_features = 300, 30
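The preview ends at the problem size. A sketch of the benchmark idea, building on the imports and constants shown above: fit an L1-penalized SGDRegressor, then compare predict() time before and after sparsify() converts coef_ to a scipy.sparse matrix. It uses simple wall-clock timing rather than the gist's line_profiler setup, and the data generation is an assumption.

import time

# Mostly-zero informative features so a sparse coefficient vector makes sense.
X = np.random.randn(n_samples, n_features)
X[:, n_features // 10:] = 0.0
y = X[:, 0] + 0.1 * np.random.randn(n_samples)

clf = SGDRegressor(penalty='l1', alpha=0.01)
clf.fit(X, y)

def bench(model, reps=1000):
    start = time.time()
    for _ in range(reps):
        model.predict(X)
    return time.time() - start

dense_time = bench(clf)
clf.sparsify()  # store coef_ as a scipy.sparse matrix
sparse_time = bench(clf)
print('dense: %.3fs  sparse: %.3fs  r2: %.3f'
      % (dense_time, sparse_time, r2_score(y, clf.predict(X))))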
@oddskool
oddskool / gist:8824062
Created February 5, 2014 13:56
Julia tutorial
{
  "metadata": {
    "language": "Julia",
    "name": ""
  },
  "nbformat": 3,
  "nbformat_minor": 0,
  "worksheets": [
    {
      "cells": [
@oddskool
oddskool / ddm.ipynb
Created July 17, 2014 07:38
Drop Detection Methods
@oddskool
oddskool / gist:27476a1e22df357de798
Last active January 20, 2023 17:07
load CSV data to CSR matrix
import array
import csv
import numpy as np
from scipy.sparse import csr_matrix
def csv_to_csr(f):
    """Read content of CSV file f, return as CSR matrix."""
    data = array.array("f")
    indices = array.array("i")
    indptr = array.array("i", [0])
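The preview cuts off after the index arrays are initialised. A reconstruction sketch of how the function might be completed, reusing the imports shown above; it assumes each CSV line is a dense numeric row, which is a guess about the expected input format.

def csv_to_csr(f):
    """Read content of CSV file f, return as CSR matrix (reconstruction sketch)."""
    data = array.array("f")
    indices = array.array("i")
    indptr = array.array("i", [0])
    n_cols = 0
    for row in csv.reader(f):
        for j, value in enumerate(row):
            value = float(value)
            if value != 0.0:  # keep only non-zero entries
                data.append(value)
                indices.append(j)
        n_cols = max(n_cols, len(row))
        indptr.append(len(indices))  # row boundary in the data/indices arrays
    return csr_matrix(
        (np.asarray(data, dtype=np.float32),
         np.asarray(indices, dtype=np.int32),
         np.asarray(indptr, dtype=np.int32)),
        shape=(len(indptr) - 1, n_cols))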