Mike Sukmanowsky msukmanowsky

from collections import defaultdict
try:
    import cStringIO as StringIO
except ImportError:
    import StringIO
class EscapedLineReader(object):
    """Custom reader for files where we could have escaped new lines.
@msukmanowsky
msukmanowsky / AspectRatio.java
Last active May 11, 2018 12:38
A little Python script and a Java Pig UDF showing how to produce aspect ratios for any arbitrary screen resolution.
package com.parsely.pig.screens;
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
/**
* This UDF takes a tuple with two positive ints specified 0 -> width
* and 1 -> height and returns the aspect ratio for the resolution.
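The reduction the description refers to can be sketched in a few lines of Python. This is a minimal illustration, assuming the ratio is obtained by dividing both dimensions by their greatest common divisor; the function name `aspect_ratio` and the `W:H` output format are assumptions, not taken from the gist.

```python
from math import gcd


def aspect_ratio(width, height):
    """Return the reduced aspect ratio of a resolution as a 'W:H' string."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive integers")
    # Dividing both sides by the greatest common divisor gives the smallest
    # integer pair with the same proportions.
    divisor = gcd(width, height)
    return "{}:{}".format(width // divisor, height // divisor)


print(aspect_ratio(1920, 1080))  # 16:9
print(aspect_ratio(1024, 768))   # 4:3
```

Unusual resolutions reduce to unfamiliar ratios (1366x768 becomes 683:384), so a real implementation may want to snap to the nearest common ratio instead.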
@msukmanowsky
msukmanowsky / image_file_validator.py
Last active November 13, 2017 19:51
An ImageFileRequired validator for a Flask WTForm.
from flask.ext.wtf import Form
from flask.ext.wtf.file import FileField
import imghdr
class ImageFileRequired(object):
    """
    Validates that an uploaded file from a flask_wtf FileField is, in fact, an
    image. Better than checking the file extension, examines the header of
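The header check the docstring describes can be illustrated without `imghdr`, which was removed from the standard library in Python 3.13. The sketch below is a hand-rolled stand-in, not the gist's validator: the `sniff_image_type` name and the short signature table are assumptions, and only three formats are covered.

```python
import io

# Magic-byte prefixes for a few common formats (illustrative, not exhaustive).
IMAGE_SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "gif": b"GIF8",
    "jpeg": b"\xff\xd8\xff",
}


def sniff_image_type(stream):
    """Return a format name if the stream starts with a known image header."""
    position = stream.tell()
    header = stream.read(8)
    stream.seek(position)  # restore the position so the upload can still be saved
    for name, signature in IMAGE_SIGNATURES.items():
        if header.startswith(signature):
            return name
    return None


png_upload = io.BytesIO(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16)
text_upload = io.BytesIO(b"not an image at all")
print(sniff_image_type(png_upload))   # png
print(sniff_image_type(text_upload))  # None
```

A validator built on this would raise the form library's validation error when the function returns `None`.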
@msukmanowsky
msukmanowsky / storm_transactional_mongodb.md
Last active August 29, 2015 13:56
Thoughts on how to achieve strict and opaque transactional Trident topologies with a hypothetical Mongo Trident state.

Storm Transactional Topologies with Mongo DB State

Strict Transactional State

  1. Batches for a given txid are always the same. Replays of a batch for a txid will contain the exact same set of tuples as the first time that batch was emitted for that txid.
  2. There's no overlap between batches of tuples (tuples are in one batch or another, never multiple).
  3. Every tuple is in a batch (no tuples are skipped).
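Under these three guarantees, an update can be made idempotent by storing the txid next to the value and skipping any batch whose txid is already recorded. The sketch below assumes a running count per key and uses a plain `dict` as a stand-in for the hypothetical Mongo collection; `apply_batch` and the record layout are illustrative names only.

```python
def apply_batch(store, key, txid, batch_count):
    """Add batch_count to the stored count unless this txid was already applied."""
    record = store.get(key, {"txid": None, "count": 0})
    if record["txid"] == txid:
        # Replay of a batch we already applied. Guarantee 1 says its contents
        # are identical, so skipping it loses nothing.
        return record["count"]
    store[key] = {"txid": txid, "count": record["count"] + batch_count}
    return store[key]["count"]


store = {}
print(apply_batch(store, "page_views", txid=1, batch_count=5))  # 5
print(apply_batch(store, "page_views", txid=1, batch_count=5))  # 5 (replay ignored)
print(apply_batch(store, "page_views", txid=2, batch_count=3))  # 8
```

In a real store the read and write would need to happen atomically per document; the sketch ignores concurrency.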

Current DB document State

@msukmanowsky
msukmanowsky / save_dict_list.py
Created March 12, 2014 14:42
Handy little decorator to cache a list of dictionaries returned from some long running operation like web queries.
from functools import wraps
import csv
import os.path
def save_dict_list(filename, **kwargs):
    """Decorator to take the results of a function call (assumed to be a
    ``list`` of ``dicts``) and cache them in a local file via
    csv.DictWriter and deserialize them with csv.DictReader"""
    def decorator(f):
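Since the preview stops before the decorator body, here is one way the described behavior could look. It is a sketch under stated assumptions rather than the gist's code: the name `cache_dict_list` is hypothetical, the extra `**kwargs` of the original signature are omitted, and the column names are taken from the first row.

```python
import csv
import os
import tempfile
from functools import wraps


def cache_dict_list(filename):
    """Cache a function's list-of-dicts result in a CSV file (illustrative)."""
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            if os.path.exists(filename):
                # Cache hit: read the rows back instead of calling f.
                with open(filename, newline="") as fp:
                    return [dict(row) for row in csv.DictReader(fp)]
            rows = f(*args, **kwargs)
            if rows:
                with open(filename, "w", newline="") as fp:
                    writer = csv.DictWriter(fp, fieldnames=list(rows[0].keys()))
                    writer.writeheader()
                    writer.writerows(rows)
            return rows
        return wrapper
    return decorator


# Usage: the second call is served from the CSV file, not from the function.
calls = []
cache_path = os.path.join(tempfile.mkdtemp(), "rows.csv")


@cache_dict_list(cache_path)
def slow_query():
    calls.append(1)
    return [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]


first = slow_query()
second = slow_query()
print(len(calls))  # 1
```

One caveat of CSV as a cache format: values read back from the file are always strings, so `second` holds `"1"` where `first` holds `1`.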
@msukmanowsky
msukmanowsky / cbloomfilter.pyx
Last active August 29, 2015 13:59
An implementation of a Bloom Filter using Cython, still has a memory leak to debug.
# ported from https://github.com/jvirkki/libbloom
from cpython cimport bool
from libc.stdlib cimport malloc, calloc, free
from libc.string cimport memset
from libc.stdio cimport printf
from libc.math cimport log, ceil
from cpython.mem cimport PyMem_Malloc, PyMem_Free
DEF LN2_SQUARED = 0.480453013918201 # ln(2)^2
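The `LN2_SQUARED` constant comes from the standard Bloom filter sizing formulas: for `n` expected entries and target false-positive rate `p`, the bits per entry are `-ln(p) / ln(2)^2` and the number of hash functions is `ln(2)` times that. The plain-Python sketch below illustrates the arithmetic only; `bloom_parameters` is a hypothetical helper, and rounding both results up is a choice made here, not necessarily what the Cython port does.

```python
import math

LN2 = math.log(2)
LN2_SQUARED = LN2 ** 2  # about 0.480453013918201, matching the constant above


def bloom_parameters(entries, error_rate):
    """Return (num_bits, num_hashes) for a Bloom filter of the given capacity."""
    if entries < 1 or not 0.0 < error_rate < 1.0:
        raise ValueError("need entries >= 1 and 0 < error_rate < 1")
    bits_per_entry = -math.log(error_rate) / LN2_SQUARED
    num_bits = int(math.ceil(entries * bits_per_entry))
    num_hashes = int(math.ceil(LN2 * bits_per_entry))
    return num_bits, num_hashes


bits, hashes = bloom_parameters(1000, 0.01)
print(bits, hashes)
```

For 1,000 entries at a 1% error rate this works out to roughly 9.6 bits per entry and 7 hash functions.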
{
"metadata": {
"name": "",
"signature": "sha256:d0e242ec0ee3bf0798a38aba54eda99ab710de1890ddab0f0b6fd91939170314"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
@msukmanowsky
msukmanowsky / url_monkey.py
Created June 23, 2014 20:59
Monkey patches needed to fix a bug in how Unicode percent-encoded strings are handled in Python's unquote function.
import urlparse
import urllib
import urllib2
def patch_unquote():
    urllib.unquote = unquote
    urllib2.unquote = unquote
    urlparse.unquote = unquote

Python 2.7 contains a bug when dealing with percent-encoded Unicode strings such as:

>>> import urlparse
>>> url = u"http%3A%2F%2F%C5%A1%C4%BC%C5%AB%C4%8D.org%2F"
>>> print "{!r}".format(urlparse.unquote(url))
u'http://\xc5\xa1\xc4\xbc\xc5\xab\xc4\x8d.org/'
>>> print urlparse.unquote(url)
http://šļū�.org/
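For comparison, the intended result is the percent-escapes decoded to bytes and those bytes then interpreted as UTF-8. The sketch below shows that two-step decoding in Python 3, where `urllib.parse.unquote_to_bytes` is available; it is not the Python 2 patch from this gist, and `unquote_utf8` is a hypothetical helper name.

```python
from urllib.parse import unquote_to_bytes


def unquote_utf8(url):
    """Percent-decode to raw bytes first, then decode those bytes as UTF-8."""
    return unquote_to_bytes(url).decode("utf-8")


url = "http%3A%2F%2F%C5%A1%C4%BC%C5%AB%C4%8D.org%2F"
decoded = unquote_utf8(url)
print(repr(decoded))
```

The buggy output above treats each decoded byte as its own code point (`\xc5`, `\xa1`, ...), whereas decoding the byte sequence as UTF-8 yields four characters for the eight escaped bytes.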
<!DOCTYPE html>
<html lang="en">
<head>
<title>TODO</title>
<!-- CSS -->
<link rel="stylesheet" href="http://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css">
<link rel="stylesheet" href="http://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap-theme.min.css">
<style>
.done {
text-decoration: line-through;