Tomas Machalek tomachalek

@tomachalek
tomachalek / fdist_gen.py
Created June 7, 2017 14:55
Generate critical values for F-distribution
import scipy.stats
import json
import sys

df_vals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 40, 60, 120, 1000]

def calc(alpha):
    ans = []
    for df1 in df_vals:
tomachalek / cqlgrammar.txt
Last active October 7, 2016 10:13
CQL grammar (tmp)
/*
 * Copyright (c) 1999-2015 Pavel Rychly, Milos Jakubicek
 * Copyright (c) 2016 Tomas Machalek
 *
 * This is a PEG version of an original CQL grammar distributed with manatee-open
 * corpus search engine.
 */
Query =
    Sequence _ (BINAND GlobPart)? (NOT? (KW_WITHIN / KW_CONTAINING) _ WithinContainingPart)* SEMI
function Nothing() {}
function Just(x) {
    this.x = x;
}
function divide(a, b) {
    if (typeof a !== 'number' || typeof b !== 'number' || b === 0) {
        return new Nothing();
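The JavaScript preview above is cut off, but it appears to start a Maybe-style pattern: `divide` returns a `Nothing` for invalid input instead of throwing. A minimal Python sketch of that idea follows. The success branch (`Just(a / b)`) and the `bind` method are assumptions added for illustration; they are not visible in the gist preview.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Nothing:
    """Marks a computation that produced no value."""

    def bind(self, fn: Callable[[float], "Maybe"]) -> "Maybe":
        # Assumed helper: a failed step short-circuits the rest of the chain.
        return self


@dataclass(frozen=True)
class Just:
    """Wraps a successfully computed value."""
    x: float

    def bind(self, fn: Callable[[float], "Maybe"]) -> "Maybe":
        # Assumed helper: pass the wrapped value to the next step.
        return fn(self.x)


Maybe = Union[Nothing, Just]


def _is_number(value: object) -> bool:
    # Mirror JavaScript's typeof check: booleans do not count as numbers.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def divide(a: object, b: object) -> Maybe:
    """Return Just(a / b), or Nothing() for non-numeric input or b == 0."""
    if not _is_number(a) or not _is_number(b) or b == 0:
        return Nothing()
    return Just(a / b)
```

With this sketch, `divide(8, 2).bind(lambda v: divide(v, 2))` yields `Just(2.0)`, while any chain that starts from `divide(1, 0)` stays `Nothing()`.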
tomachalek / ucnkdynfn.cc
Last active June 7, 2016 08:15
Dynamic attribute function(s) for the Manatee corpus engine
#include <string>
/*
How to install:
1) compile the module:
g++ -Wall -fPIC -DPIC -shared -o ucnkdynfn.so ucnkdynfn.cc
tomachalek / quickselect_algorithm.py
Last active January 27, 2016 20:49
An implementation of the 'QuickSelect' algorithm (a.k.a. Hoare's selection algorithm)
def partition(data, left, right, pivot_idx):
    pivot_value = data[pivot_idx]
    real_pivot_idx = left
    data[pivot_idx], data[right] = data[right], data[pivot_idx]
    for i in range(left, right + 1):
        if data[i] < pivot_value:
            data[i], data[real_pivot_idx] = data[real_pivot_idx], data[i]
            real_pivot_idx += 1
    data[right], data[real_pivot_idx] = data[real_pivot_idx], data[right]
    return real_pivot_idx
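The preview shows only the `partition` step. As a rough sketch of how a selection routine could sit on top of it, the block below repeats `partition` unchanged and adds a hypothetical `quickselect` driver. The driver's name, its 0-based `k`, the random pivot choice, and the decision to work on a copy are all assumptions; the gist's actual driver is not visible here.

```python
import random


def partition(data, left, right, pivot_idx):
    """Partition data[left..right] around the pivot; return its final index."""
    pivot_value = data[pivot_idx]
    real_pivot_idx = left
    # Park the pivot at the right end while the rest is rearranged.
    data[pivot_idx], data[right] = data[right], data[pivot_idx]
    for i in range(left, right + 1):
        if data[i] < pivot_value:
            data[i], data[real_pivot_idx] = data[real_pivot_idx], data[i]
            real_pivot_idx += 1
    # Move the pivot into its final sorted position.
    data[right], data[real_pivot_idx] = data[real_pivot_idx], data[right]
    return real_pivot_idx


def quickselect(data, k):
    """Return the k-th smallest element (0-based) without sorting everything.

    Hypothetical driver: works on a copy so the caller's list is untouched.
    """
    if not 0 <= k < len(data):
        raise IndexError("k is out of range")
    items = list(data)
    left, right = 0, len(items) - 1
    while left < right:
        pivot_idx = partition(items, left, right, random.randint(left, right))
        if k == pivot_idx:
            return items[k]
        if k < pivot_idx:
            right = pivot_idx - 1  # target lies in the left part
        else:
            left = pivot_idx + 1   # target lies in the right part
    return items[left]
```

For example, `quickselect([7, 2, 9, 4, 1], 2)` returns the median, `4`. Narrowing to one side per round is what gives the expected linear running time; a random pivot makes the quadratic worst case unlikely but not impossible.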
tomachalek / transform.py
Created January 13, 2016 17:40
Transform the previous version of logs in ES
import sys
import json
import hashlib
def process_item(item):
    rec = item['_source']
    rec['isQuery'] = rec.pop('entryQuery')
    if rec.get('action', None) in ('wsketch', 'thes', 'wsdiff'):
        rec['isQuery'] = True
tomachalek / imgshuffle.py
Created October 30, 2015 13:30
A script to shuffle wallpapers from a source directory to a dest one on our home NAS
import os
import argparse
import random
import hashlib
from functools import partial
import shutil
class TreeWalker(object):
tomachalek / gist:2e6fdce35e565ee2dccf
Created May 4, 2015 09:13
KonText vs. Manatee - multi-dimensional frequency distribution
"""
KonText vs. Manatee - multi-dimensional frequency distribution
words = manatee.StrVector()
freqs = manatee.NumVector()
norms = manatee.NumVector()
crit = 'opus.rokvyd 0 opus.genre 0'
corpus.freq_dist(range_stream, crit, limit, words, freqs, norms)
"""
tomachalek / prototype-quiz.js
Created July 30, 2014 07:11
JavaScript prototype inheritance quiz
(function () {
    'use strict';
    var foo, bar1, bar2;
    function Foo () {
        this.counter = { i : 0};
        this.counter2 = 0;
    }
tomachalek / gist:699037e4f6237f569649
Last active August 29, 2015 14:02
Processing a 50-million-row benchmark file using my Orzo.js (http://www.orzojs.org/). The task is to calculate count, max, min, and avg. Inspired by an article at http://padak.keboola.com/agregace-v-mongodb-oracle-redshift-bigquery-voltdb-vertica-elasticsearch-a-gooddata. Processing time on Intel i2700 (average of several attempts): 142 sec
dataChunks(4, function (idx) {
    return orzo.fileChunkReader(env.inputArgs[0], idx, null, 1);
});
applyItems(function (dataChunk, map) {
    while (dataChunk.hasNext()) {
        map(dataChunk.next());
    }
});
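The preview stops before the aggregation step. To illustrate the general shape of the task (split the input into chunks, compute partial statistics per chunk, then merge them), here is a plain Python sketch. It is not the Orzo.js API: `Stats`, `split_chunks`, and `aggregate` are hypothetical names, the input is an in-memory list rather than a file, and the chunks are processed sequentially rather than in parallel.

```python
import math
from dataclasses import dataclass


@dataclass
class Stats:
    """Running count, sum, min and max; cheap to merge across chunks."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, value):
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other):
        self.count += other.count
        self.total += other.total
        self.minimum = min(self.minimum, other.minimum)
        self.maximum = max(self.maximum, other.maximum)

    @property
    def avg(self):
        # The average is derived from sum and count, so partials merge exactly.
        return self.total / self.count if self.count else math.nan


def split_chunks(values, num_chunks):
    """Split values into at most num_chunks contiguous slices."""
    size = max(1, math.ceil(len(values) / num_chunks))
    return [values[i:i + size] for i in range(0, len(values), size)]


def aggregate(values, num_chunks=4):
    """Compute partial Stats per chunk, then merge them into one result."""
    partials = []
    for chunk in split_chunks(values, num_chunks):
        partial = Stats()
        for value in chunk:
            partial.add(value)
        partials.append(partial)
    result = Stats()
    for partial in partials:
        result.merge(partial)
    return result
```

Carrying the sum and count instead of a per-chunk average is the design choice that matters here: averaging the chunk averages would give a wrong result whenever chunk sizes differ.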