Mike Sukmanowsky msukmanowsky

@msukmanowsky
msukmanowsky / storm_transactional_mongodb.md
Last active August 29, 2015 13:56
Thoughts on how to achieve strict and opaque transactional Trident topologies with a hypothetical Mongo Trident state.

Storm Transactional Topologies with Mongo DB State

Strict Transactional State

  1. Batches for a given txid are always the same. A replay of a batch for a txid will contain the exact same set of tuples as the first time that batch was emitted for that txid.
  2. There's no overlap between batches of tuples (tuples are in one batch or another, never multiple).
  3. Every tuple is in a batch (no tuples are skipped).
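
These three guarantees are what make an exactly-once update possible: if the txid stored next to a value equals the txid of the incoming batch, that batch has already been applied and can be skipped. Below is a minimal sketch in plain Python, with a dict standing in for a Mongo collection. The function name `apply_batch` and the document layout (`value`, `txid`) are assumptions made for illustration, not the design proposed in this gist.

```python
def apply_batch(store, key, txid, batch_count):
    """Add batch_count to store[key] unless this txid was already applied."""
    doc = store.get(key, {"value": 0, "txid": None})
    if doc["txid"] == txid:
        # Replay of a batch that is already reflected in the stored value.
        return doc["value"]
    # New batch: update the value and record the txid together.
    store[key] = {"value": doc["value"] + batch_count, "txid": txid}
    return store[key]["value"]


store = {}  # hypothetical stand-in for a Mongo collection
apply_batch(store, "clicks", txid=1, batch_count=5)
apply_batch(store, "clicks", txid=1, batch_count=5)  # replay, ignored
apply_batch(store, "clicks", txid=2, batch_count=3)
```

The skip is only safe because of guarantee 1: a replayed batch is known to hold the same tuples, so ignoring it loses nothing.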

Current DB document State

@msukmanowsky
msukmanowsky / save_dict_list.py
Created March 12, 2014 14:42
Handy little decorator to cache a list of dictionaries returned from some long running operation like web queries.
from functools import wraps
import csv
import os.path
def save_dict_list(filename, **kwargs):
    """Decorator to take the results of a function call (assumed to be a
    ``list`` of ``dicts``), cache them in a local file via
    csv.DictWriter and load them back with csv.DictReader"""
    def decorator(f):
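
The preview stops at the inner `decorator` function. As a rough illustration of the caching idea described in the docstring, here is a self-contained sketch for Python 3. It is not the gist's implementation: `cache_dict_list`, its `fieldnames` parameter, and `slow_query` are hypothetical names introduced here.

```python
import csv
import os.path
import tempfile
from functools import wraps


def cache_dict_list(filename, fieldnames):
    """Hypothetical decorator: cache a list of dicts in a CSV file."""
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            if os.path.exists(filename):
                # Cache hit: read the rows back (csv returns all values as str).
                with open(filename, newline="") as fp:
                    return [dict(row) for row in csv.DictReader(fp)]
            rows = f(*args, **kwargs)
            # Cache miss: run the function and write its rows to disk.
            with open(filename, "w", newline="") as fp:
                writer = csv.DictWriter(fp, fieldnames=fieldnames)
                writer.writeheader()
                writer.writerows(rows)
            return rows
        return wrapper
    return decorator


calls = []
cache_path = os.path.join(tempfile.mkdtemp(), "rows.csv")


@cache_dict_list(cache_path, fieldnames=["id", "name"])
def slow_query():
    calls.append(1)  # stands in for a long-running web query
    return [{"id": 1, "name": "a"}]


first = slow_query()   # runs the function and writes the cache
second = slow_query()  # served from the CSV file
```

One consequence of caching through CSV is that types are not preserved: the cached rows come back with string values, so `second` holds `"1"` where `first` holds `1`.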
@msukmanowsky
msukmanowsky / cbloomfilter.pyx
Last active August 29, 2015 13:59
An implementation of a Bloom filter using Cython; it still has a memory leak to debug.
# ported from https://github.com/jvirkki/libbloom
from cpython cimport bool
from libc.stdlib cimport malloc, calloc, free
from libc.string cimport memset
from libc.stdio cimport printf
from libc.math cimport log, ceil
from cpython.mem cimport PyMem_Malloc, PyMem_Free
DEF LN2_SQUARED = 0.480453013918201 # ln(2)^2
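
The `LN2_SQUARED` constant appears in the standard Bloom filter sizing formulas: for `n` entries and a target false-positive rate `p`, the optimal bit count is `-n * ln(p) / ln(2)^2` and the optimal number of hash functions is `ln(2)` times the bits per entry. A short pure-Python sketch of that arithmetic follows. The function name `bloom_parameters` is an assumption, and whether the Cython port rounds the same way is not visible in the preview.

```python
import math

LN2_SQUARED = 0.480453013918201  # ln(2)^2, same constant as in the preview
LN2 = 0.693147180559945          # ln(2)


def bloom_parameters(entries, error):
    """Return (bits, hashes) for `entries` items at false-positive rate `error`."""
    if entries < 1 or not 0 < error < 1:
        raise ValueError("entries must be >= 1 and 0 < error < 1")
    bits_per_entry = -math.log(error) / LN2_SQUARED
    bits = int(math.ceil(entries * bits_per_entry))
    hashes = int(math.ceil(LN2 * bits_per_entry))
    return bits, hashes


print(bloom_parameters(1000, 0.01))  # (9586, 7)
```

So 1,000 entries at a 1% error rate need roughly 9.6 bits per entry and 7 hash functions.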
{
"metadata": {
"name": "",
"signature": "sha256:d0e242ec0ee3bf0798a38aba54eda99ab710de1890ddab0f0b6fd91939170314"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
@msukmanowsky
msukmanowsky / url_monkey.py
Created June 23, 2014 20:59
Monkey patches needed to fix a bug in how Unicode percent-encoded strings are handled in Python's unquote function.
import urlparse
import urllib
import urllib2
def patch_unquote():
    urllib.unquote = unquote
    urllib2.unquote = unquote
    urlparse.unquote = unquote

Python 2.7 contains a bug when dealing with percent-encoded Unicode strings such as:

>>> import urlparse
>>> url = u"http%3A%2F%2F%C5%A1%C4%BC%C5%AB%C4%8D.org%2F"
>>> print "{!r}".format(urlparse.unquote(url))
u'http://\xc5\xa1\xc4\xbc\xc5\xab\xc4\x8d.org/'
>>> print urlparse.unquote(url)
http://Å¡Ä¼Å«Ä.org/
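
The output above is wrong because each percent-decoded byte is treated as its own Latin-1 character instead of being combined into UTF-8 sequences. The sketch below shows the idea behind a fix: collect each run of `%XX` escapes into bytes first, then decode the whole string as UTF-8. It is written for Python 3 so it can be run today, and `unquote_utf8` is a hypothetical name, not the patched `unquote` from this gist. Python 3's own `urllib.parse.unquote` already decodes as UTF-8 by default.

```python
import re

# One or more consecutive %XX escapes, matched on bytes.
_PERCENT_RUN = re.compile(rb"(?:%[0-9A-Fa-f]{2})+")


def unquote_utf8(text):
    """Percent-decode `text`, interpreting the decoded bytes as UTF-8."""
    raw = text.encode("utf-8")
    decoded = _PERCENT_RUN.sub(
        # b"%C5%A1" -> "C5A1" -> b"\xc5\xa1"
        lambda m: bytes.fromhex(m.group(0).replace(b"%", b"").decode("ascii")),
        raw,
    )
    # Raises UnicodeDecodeError if the escapes are not valid UTF-8.
    return decoded.decode("utf-8")


url = "http%3A%2F%2F%C5%A1%C4%BC%C5%AB%C4%8D.org%2F"
print(unquote_utf8(url))  # http://šļūč.org/
```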
<!DOCTYPE html>
<html lang="en">
<head>
<title>TODO</title>
<!-- CSS -->
<link rel="stylesheet" href="http://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css">
<link rel="stylesheet" href="http://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap-theme.min.css">
<style>
.done {
text-decoration: line-through;
@msukmanowsky
msukmanowsky / custom_code_bolt.py
Last active August 29, 2015 14:05
A custom code execution bolt, not yet tested.
import logging
from streamparse.bolt import Bolt
log = logging.getLogger("custom_code_bolt")
class CustomCodeBolt(Bolt):
@msukmanowsky
msukmanowsky / storm_version.py
Last active August 29, 2015 14:07
Parse Apache Storm versions in Python and do easy comparisons on them. You could probably even import something from here https://github.com/pypa/pip/blob/19e29fc2e8e57a671e584726655bbb42c6e15eee/pip/_vendor/distlib/version.py and it'd work just fine, but I haven't tested that.
import re
class InvalidVersionException(Exception): pass
class StormVersion(object):
    VERSION_RE = re.compile(r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
                            r"(?P<older_patch>\.\d+)?(?P<other>.*)")
    RC_RE = re.compile(r"-rc(?P<release_candidate>\d+)", re.IGNORECASE)
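
The rest of the class is cut off in this preview. To illustrate how the two regular expressions could support comparisons, the sketch below turns a version string into a tuple that sorts in release order. The function `version_key`, the rule that a release candidate sorts before the final release, and the use of `ValueError` in place of `InvalidVersionException` are all assumptions, not the gist's actual logic.

```python
import re

VERSION_RE = re.compile(
    r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"(?P<older_patch>\.\d+)?(?P<other>.*)"
)
RC_RE = re.compile(r"-rc(?P<release_candidate>\d+)", re.IGNORECASE)


def version_key(version):
    """Turn a version string into a tuple that sorts in release order."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError("not a recognisable version: %r" % version)
    rc = RC_RE.search(match.group("other"))
    # Assumed rule: a release candidate sorts before the final release.
    rc_rank = (0, int(rc.group("release_candidate"))) if rc else (1, 0)
    older = match.group("older_patch")
    return (
        int(match.group("major")),
        int(match.group("minor")),
        int(match.group("patch")),
        int(older[1:]) if older else 0,  # strip the leading "."
        rc_rank,
    )


print(version_key("0.9.3-rc1") < version_key("0.9.3"))  # True
print(version_key("0.8.2") < version_key("0.9.0.1"))    # True
```

Comparing tuples keeps the ordering logic in one place and lets `sorted(versions, key=version_key)` work directly.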
@msukmanowsky
msukmanowsky / CassandraConverters.scala
Last active August 29, 2015 14:08
Custom version of CassandraConverters.scala in the spark/examples/src/main/scala/org/apache/spark/examples/pythonconverters/CassandraConverters.scala. Provides better (though not perfect) serialization of keys and values for CqlOutputFormat.
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*