Avi Bryant avibryant

  • Galiano Island, BC
class File
  def seek_to(str)
    until eof?
      start = pos
      buf = read(10000)
      if (offset = buf.index(str))
        seek(start + offset + str.size)
        return true
      else
        seek(start + 5000)
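The preview above is cut off before the method ends. As a minimal, self-contained sketch of the same overlapping-chunk search, the helper below works on any IO-like object. The name `seek_past`, the `chunk` parameter, the length guard, and the `false` return value are assumptions made for illustration, not part of the gist. Advancing by half a chunk means a match that straddles one read boundary falls entirely inside the next read, provided the pattern is no longer than half a chunk.

```ruby
require 'stringio'

# Hypothetical helper (not from the gist): scan `io` in overlapping chunks and
# leave the position just past the first occurrence of `str`.
def seek_past(io, str, chunk = 10_000)
  step = chunk / 2 # advance half a chunk, so consecutive reads overlap by half
  raise ArgumentError, "pattern too long for chunk size" if str.bytesize > step

  until io.eof?
    start = io.pos
    buf = io.read(chunk)
    if (offset = buf.index(str))
      io.seek(start + offset + str.bytesize)
      return true
    end
    io.seek(start + step)
  end
  false # assumed result when the pattern never appears
end

# The needle straddles the first 10,000-byte read and is found in the second.
io = StringIO.new(("x" * 9_998) + "NEEDLE" + "tail")
found = seek_past(io, "NEEDLE")
rest = io.read
```

After the call, `rest` holds whatever follows the match, which is the behaviour the `seek(start + offset + str.size)` line in the original suggests.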
require 'stringio'
require 'base64'
def read_varint(io)
  value = index = 0
  begin
    # readbyte returns an Integer; readchar returns a String on Ruby 1.9+
    byte = io.readbyte
    value |= (byte & 0x7f) << (7 * index)
    index += 1
  end while (byte & 0x80).nonzero?
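The preview stops before the method returns. The sketch below is a self-contained version of the same base-128 varint decoding, assuming the method simply returns the accumulated value (the ending is not visible in the preview, so that part is a guess). It uses `readbyte` so that each byte is an Integer.

```ruby
require 'stringio'

# Decode one base-128 varint: 7 payload bits per byte, least significant group
# first; the high bit (0x80) signals that more bytes follow.
def read_varint(io)
  value = 0
  shift = 0
  loop do
    byte = io.readbyte # Integer in 0..255; raises EOFError at end of input
    value |= (byte & 0x7f) << shift
    shift += 7
    break if (byte & 0x80).zero?
  end
  value # assumed return value
end

io = StringIO.new([0xAC, 0x02].pack("C*"))
decoded = read_varint(io) # 0x2C | (0x02 << 7) = 44 + 256 = 300
```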
@avibryant
avibryant / loess.js
Created August 17, 2011 15:45
Loess smoothing
//adapted from the LoessInterpolator in org.apache.commons.math
function loess_pairs(pairs, bandwidth)
{
    var xval = pairs.map(function(pair){return pair[0]});
    var yval = pairs.map(function(pair){return pair[1]});
    console.log(xval);
    console.log(yval);
    var res = loess(xval, yval, bandwidth);
    console.log(res);
    return xval.map(function(x,i){return [x, res[i]]});
@avibryant
avibryant / preg.rb
Last active September 29, 2015 15:47
require 'date'
DUE_DATE = "2013-05-19"
#data taken from http://spacefem.com/pregnant/charts/duedate2.php
#starts at day 222
DATA = [
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 2, 1, 4, 2, 2, 1, 4, 6, 7, 5, 1, 5, 8, 7, 9, 10, 11, 13, 13, 18,
14, 13, 9, 27, 29, 27, 31, 27, 26, 36, 43, 43, 51, 67, 74, 60, 47,
@avibryant
avibryant / Main.java
Created July 17, 2012 21:50, forked from ceteri/Main.java
Cascading for the Impatient, part 3
class WordCount(args : Args) extends Job(args) {
  Tsv(args("input"), ('doc_id, 'text))
    .flatMapTo('text -> 'token){line : String => line.split("[ \\[\\]\\(\\),.]")}
    .map('token -> 'token){token : String => token.trim.toLowerCase}
    .filter('token){token : String => token.length > 0}
    .groupBy('token){g => g.size}
    .write(Tsv(args("output")))
}
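To see what the pipeline above computes without a Hadoop job, here is a rough in-memory sketch in Ruby of the same four steps: split on the same delimiter class, trim and lowercase, drop empty tokens, count per token. The method name `word_count` and the sample lines are made up for illustration, and Ruby's `strip` is only an approximation of Java's `trim`.

```ruby
# In-memory sketch of the same steps: split, normalize, drop empties, count.
def word_count(lines)
  lines
    .flat_map { |line| line.split(/[ \[\]\(\),.]/) }
    .map { |token| token.strip.downcase }
    .reject(&:empty?)
    .each_with_object(Hash.new(0)) { |token, counts| counts[token] += 1 }
end

counts = word_count(["A rain shadow (dry area).", "Rain, rain."])
```

Here "Rain" and "rain" land in the same bucket because of the lowercasing step, and the empty tokens produced by adjacent delimiters such as ", " are filtered out before counting.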
@avibryant
avibryant / gist:3200554
Created July 29, 2012 17:47
The Regex Game
The Regex Game
Avi Bryant
This is a game for two programmers, although variations for more players are easy to imagine.
It can be played over email, Twitter, or IM. A custom web app would also suit it, and I encourage someone to build one.
Each player starts by thinking of a regular expression. The players should agree beforehand on a dialect and a length limit (e.g., it has to be JavaScript-compatible and under 20 characters).
They don't reveal the regex, but if playing over email or a similar channel, they should send each other a hard-to-brute-force hash (e.g., bcrypt) of the regex for later verification.
They do reveal two strings: one that the regex matches, and one that it does not.
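The setup described so far (commit to a regex, then publish one matching and one non-matching string) could be sketched as follows. This is only an illustration under stated assumptions: the helper names are invented, the regex uses Ruby's dialect rather than JavaScript's, and a salted SHA-256 digest from the standard library stands in for the bcrypt hash the text suggests. The random salt stays secret until the reveal, which is what keeps a short regex from being brute-forced out of the digest.

```ruby
require 'digest'
require 'securerandom'

# Hypothetical helpers for the opening of the game (names are not from the gist).
# Salted SHA-256 is a stand-in here; the text suggests bcrypt.
def commit(regex_source, salt = SecureRandom.hex(16))
  { salt: salt, digest: Digest::SHA256.hexdigest("#{salt}:#{regex_source}") }
end

# At the end of the game the player reveals regex and salt; the opponent rechecks.
def verify(regex_source, salt, digest)
  Digest::SHA256.hexdigest("#{salt}:#{regex_source}") == digest
end

# The two revealed strings must really be a match and a non-match.
def valid_opening?(regex_source, matching, non_matching)
  re = Regexp.new(regex_source)
  re.match?(matching) && !re.match?(non_matching)
end

secret = '^a+b?$'
sealed = commit(secret)                       # send sealed[:digest] to the opponent
ok_opening = valid_opening?(secret, "aab", "ba")
ok_reveal = verify(secret, sealed[:salt], sealed[:digest])
```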
@avibryant
avibryant / swimlines.rb
Created September 5, 2012 21:25
Simple visualization of hadoop job history files
# visualize the output with gg.js
# gg({layers: [{ geometry: 'line', mapping: { x: 'minutes', y: 'task', group: 'stage', color: 'type'}}]});
def parse(line)
  output = {}
  parts = line.split(/[ "]/)
  output["TYPE"] = parts.shift
  while(parts.size > 0)
    next_part = parts.shift
    if next_part =~ /^(\w+)=$/
@avibryant
avibryant / gist:3802616
Created September 28, 2012 23:36
Likelihood ratio test for binomials
def likelihoodRatio(k1 : Int, n1 : Int, k2 : Int, n2 : Int) = {
  def kLogP(k : Int, p : Double) = if(k == 0) 0 else k * math.log(p)
  def logL(p : Double, k : Int, n : Int) = kLogP(k, p) + kLogP(n - k, 1 - p)
  val p1 = k1.toDouble / n1.toDouble
  val p2 = k2.toDouble / n2.toDouble
  val p = (k1 + k2).toDouble / (n1 + n2).toDouble
  logL(p1, k1, n1) + logL(p2, k2, n2) - logL(p, k1, n1) - logL(p, k2, n2)
}
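For readers who want to try the calculation interactively, the following is a straightforward Ruby port of the function above, with two made-up inputs. The snake_case names are mine. The result is the log of the ratio between the likelihood with separate rates `p1`, `p2` and the likelihood with one pooled rate; the `k.zero?` guard avoids evaluating `0 * log(0)` when a count is zero.

```ruby
# Ruby port of the Scala function above; returns the log-likelihood ratio.
def likelihood_ratio(k1, n1, k2, n2)
  # k * log(p), defined as 0 when k is 0 so that p = 0 does not produce NaN
  k_log_p = ->(k, prob) { k.zero? ? 0.0 : k * Math.log(prob) }
  log_l = ->(prob, k, n) { k_log_p.(k, prob) + k_log_p.(n - k, 1 - prob) }

  p1 = k1.to_f / n1
  p2 = k2.to_f / n2
  pooled = (k1 + k2).to_f / (n1 + n2)

  log_l.(p1, k1, n1) + log_l.(p2, k2, n2) - log_l.(pooled, k1, n1) - log_l.(pooled, k2, n2)
end

same = likelihood_ratio(10, 100, 10, 100) # identical rates
diff = likelihood_ratio(10, 100, 30, 100) # 10% versus 30%
```

With identical rates the pooled and separate models coincide, so the ratio is zero; the further apart the two rates are, the larger the value. Doubling it gives the usual G² statistic, which is commonly compared against a chi-squared distribution with one degree of freedom.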
=begin
Adaptation of streaming DISCO algorithm to parallel streams/monoids.
Each instance of Disco maintains @counts which is just #(x)
as well as @h which stores (q,n) for each (x,y) pair.
n corresponds conceptually to the number of emitted values for that pair, q corresponds (as in the paper)
to the probability with which they were emitted.
A separate instance is created for each dimension; these are then merged in any order.
Initialize with a single dimension by setting counts(w) to 1 for each w
in the dimension, and h(q,n) to (1,1) for each (w1,w2).
package com.twitter.algebird
case class DyadicRange(maxValue : Long = Long.MaxValue) {
  val levels = math.ceil(math.log(maxValue) / math.log(2)).toInt
  def indicesForPoint(v : Long) = (1 to levels).map{level => (level, indexForPoint(v, level))}
  def indicesForRange(start : Long, end : Long) : List[(Int,Long)] = indicesForRange(start, end, levels)
  def indexForPoint(v : Long, level : Int) = v >> (level - 1)
  def rangeForIndex(i : Long, level : Int) = (i << (level-1), ((i+1) << (level-1)) - 1)
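The index arithmetic above is easier to follow with concrete numbers. The Ruby sketch below ports only the three point-related methods visible in the preview and passes the level count in explicitly (an assumption, to avoid reproducing the `levels` computation). At level 1 each bucket holds a single value, and each higher level doubles the bucket width.

```ruby
# Level 1 holds single points; each higher level doubles the bucket width.
def index_for_point(v, level)
  v >> (level - 1)
end

# Inclusive [low, high] range of values covered by bucket `i` at `level`.
def range_for_index(i, level)
  [i << (level - 1), ((i + 1) << (level - 1)) - 1]
end

# One [level, index] pair per level, as in indicesForPoint above.
def indices_for_point(v, levels)
  (1..levels).map { |level| [level, index_for_point(v, level)] }
end

point = 13
buckets = indices_for_point(point, 4) # [[1, 13], [2, 6], [3, 3], [4, 1]]
ranges = buckets.map { |level, i| range_for_index(i, level) }
```

Each range in `ranges` contains 13, and the ranges widen from `[13, 13]` at level 1 to `[8, 15]` at level 4.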