Tomas Machalek tomachalek

@tomachalek
tomachalek / gist:6122926
Last active December 20, 2015 11:19
URL shortener (based on md5 alg.)
# a URL shortener
from hashlib import md5
chars = (
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
    'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
    'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'
)
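The preview above stops after the alphabet, so the rest of the gist is not visible here. As a minimal sketch of how an MD5-based shortener could work (not the author's implementation), the following treats the digest as a large integer and converts it into a few characters of the same 52-letter alphabet. The names `CHARS` and `shorten` and the default code length of 6 are assumptions made for illustration.

```python
from hashlib import md5
import string

# Assumption: the same 52-letter alphabet as the gist's `chars` tuple (a-z, then A-Z).
CHARS = string.ascii_lowercase + string.ascii_uppercase


def shorten(url, length=6):
    """Derive a short alphabetic code from the MD5 digest of `url` (hypothetical helper)."""
    # Interpret the 128-bit hex digest as one integer.
    number = int(md5(url.encode("utf-8")).hexdigest(), 16)
    code = []
    for _ in range(length):
        # Peel off one base-52 digit per output character.
        number, idx = divmod(number, len(CHARS))
        code.append(CHARS[idx])
    return "".join(code)
```

Because the code keeps only a small part of the digest, two different URLs can map to the same code; a real service would have to store the mapping and handle such collisions.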
@tomachalek
tomachalek / gist:6206674
Last active December 20, 2015 22:39
Run your tasks asynchronously and do some work when all is done.
(function (exports) {
    'use strict';
    /**
     *
     * @constructor
     */
    function DoAll() {
        this.actions = [];
        this.confirms = [];
@tomachalek
tomachalek / gist:699037e4f6237f569649
Last active August 29, 2015 14:02
Processing a 50-million-row benchmark file with my Orzo.js (http://www.orzojs.org/). The task is to calculate count, max, min, and avg. Inspired by an article at http://padak.keboola.com/agregace-v-mongodb-oracle-redshift-bigquery-voltdb-vertica-elasticsearch-a-gooddata. Processing time on Intel i2700 (average of several attempts): 142 s
dataChunks(4, function (idx) {
    return orzo.fileChunkReader(env.inputArgs[0], idx, null, 1);
});
applyItems(function (dataChunk, map) {
    while (dataChunk.hasNext()) {
        map(dataChunk.next());
    }
});
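The preview ends before the aggregation step. The idea of splitting the input into chunks, aggregating each one, and combining the partial results can be sketched in plain Python as follows. This does not use the Orzo.js API; `aggregate`, `merge`, and `average` are hypothetical helper names, and the average is derived from a running sum so that partial results stay mergeable.

```python
def aggregate(values):
    """Compute count, sum, min and max for one chunk in a single pass."""
    count, total = 0, 0.0
    minimum = maximum = None
    for value in values:
        count += 1
        total += value
        if minimum is None or value < minimum:
            minimum = value
        if maximum is None or value > maximum:
            maximum = value
    return {"count": count, "sum": total, "min": minimum, "max": maximum}


def merge(a, b):
    """Combine the partial results of two chunks into one result."""
    mins = [m for m in (a["min"], b["min"]) if m is not None]
    maxs = [m for m in (a["max"], b["max"]) if m is not None]
    return {
        "count": a["count"] + b["count"],
        "sum": a["sum"] + b["sum"],
        "min": min(mins) if mins else None,
        "max": max(maxs) if maxs else None,
    }


def average(stats):
    """Average over everything aggregated so far, or None for empty input."""
    return stats["sum"] / stats["count"] if stats["count"] else None
```

Keeping the sum and count separately, rather than a per-chunk average, is what lets chunks of different sizes be merged without weighting errors.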
@tomachalek
tomachalek / prototype-quiz.js
Created July 30, 2014 07:11
JavaScript prototype inheritance quiz
(function () {
    'use strict';
    var foo, bar1, bar2;
    function Foo() {
        this.counter = { i: 0 };
        this.counter2 = 0;
    }
@tomachalek
tomachalek / gist:2e6fdce35e565ee2dccf
Created May 4, 2015 09:13
KonText vs. Manatee - multi-dimensional frequency distribution
"""
KonText vs. Manatee - multi-dimensional frequency distribution
words = manatee.StrVector()
freqs = manatee.NumVector()
norms = manatee.NumVector()
crit = 'opus.rokvyd 0 opus.genre 0'
corpus.freq_dist(range_stream, crit, limit, words, freqs, norms)
"""
@tomachalek
tomachalek / imgshuffle.py
Created October 30, 2015 13:30
A script to shuffle wallpapers from a source directory to a dest one on our home NAS
import os
import argparse
import random
import hashlib
from functools import partial
import shutil
class TreeWalker(object):
@tomachalek
tomachalek / transform.py
Created January 13, 2016 17:40
Transform the previous version of logs in ES
import sys
import json
import hashlib
def process_item(item):
    rec = item['_source']
    rec['isQuery'] = rec.pop('entryQuery')
    if rec.get('action', None) in ('wsketch', 'thes', 'wsdiff'):
        rec['isQuery'] = True
@tomachalek
tomachalek / quickselect_algorithm.py
Last active January 27, 2016 20:49
An implementation of the 'QuickSelect' algorithm (a.k.a. Hoare's selection algorithm)
def partition(data, left, right, pivot_idx):
    pivot_value = data[pivot_idx]
    real_pivot_idx = left
    data[pivot_idx], data[right] = data[right], data[pivot_idx]
    for i in range(left, right + 1):
        if data[i] < pivot_value:
            data[i], data[real_pivot_idx] = data[real_pivot_idx], data[i]
            real_pivot_idx += 1
    data[right], data[real_pivot_idx] = data[real_pivot_idx], data[right]
    return real_pivot_idx
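The preview shows only the partition step. A possible selection loop built on it might look like the sketch below; `partition` is repeated from the gist so the block runs on its own, while `quickselect`, its 0-based `k`, and the random pivot choice are assumptions rather than the author's code.

```python
import random


def partition(data, left, right, pivot_idx):
    # Lomuto-style partition, as in the gist: items smaller than the pivot
    # end up left of the returned index, the pivot lands on that index.
    pivot_value = data[pivot_idx]
    real_pivot_idx = left
    data[pivot_idx], data[right] = data[right], data[pivot_idx]
    for i in range(left, right + 1):
        if data[i] < pivot_value:
            data[i], data[real_pivot_idx] = data[real_pivot_idx], data[i]
            real_pivot_idx += 1
    data[right], data[real_pivot_idx] = data[real_pivot_idx], data[right]
    return real_pivot_idx


def quickselect(data, k):
    """Return the k-th smallest item (0-based); reorders `data` in place."""
    if not 0 <= k < len(data):
        raise IndexError("k is out of range")
    left, right = 0, len(data) - 1
    while left < right:
        # Assumption: a random pivot, to avoid the worst case on sorted input.
        pivot_idx = partition(data, left, right, random.randint(left, right))
        if k == pivot_idx:
            return data[k]
        if k < pivot_idx:
            right = pivot_idx - 1   # the answer lies in the left part
        else:
            left = pivot_idx + 1    # the answer lies in the right part
    return data[left]
```

Only the side of the partition that contains position `k` is searched further, which is what gives the algorithm linear expected running time instead of the cost of a full sort.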
@tomachalek
tomachalek / ucnkdynfn.cc
Last active June 7, 2016 08:15
Dynamic attribute function(s) for the Manatee corpus engine
#include <string>
/*
How to install:
1) compile the module:
g++ -Wall -fPIC -DPIC -shared -o ucnkdynfn.so ucnkdynfn.cc
function Nothing() {}
function Just(x) {
    this.x = x;
}
function divide(a, b) {
    if (typeof a !== 'number' || typeof b !== 'number' || b === 0) {
        return new Nothing();