Dan Brown dbro

@dbro
dbro / partition
Created Apr 24, 2014
partition lines of incoming data into separate files
View partition
#!/bin/bash
# write the incoming data on stdin to separate files depending on their contents
# for example, take a file that has different dates in it:
# 2014-02-15+12334567 hello there this is the first line
# 2014-02-16+23345678 hello there this is the second line
# this script can be used to send the first line to a file called /tmp/session.log-20140215-randomnumber
# and the second line to another file called /tmp/session.log-20140216-randomnumber
# it takes the first N characters from the line for use in the output filename
USAGE="usage: $0 -p \"/tmp/session-logs-ready-to-merge-\" [-s \"-ready-for-merge\" -r -c 10 -d'-'] [input_filename] [another_input_filename]
\tp\tprefix path
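The preview above is cut off, so the following is only a rough Python sketch of the idea described in the comments: take the first N characters of each line, drop a separator character, and use the result to pick an output file. The names `partition_lines`, `key_chars`, and `strip_char` are hypothetical, the random-number part of the filename is omitted, and the function returns a mapping instead of writing files so that it is easy to inspect.

```python
from collections import defaultdict


def partition_lines(lines, prefix, key_chars=10, strip_char="-", suffix=""):
    """Group lines by an output filename built from each line's leading characters.

    Assumption: the key is the first `key_chars` characters with `strip_char`
    removed, as in the 2014-02-15 -> 20140215 example in the comments.
    """
    groups = defaultdict(list)
    for line in lines:
        key = line[:key_chars].replace(strip_char, "")
        groups[f"{prefix}{key}{suffix}"].append(line)
    return dict(groups)


sample = [
    "2014-02-15+12334567 hello there this is the first line\n",
    "2014-02-16+23345678 hello there this is the second line\n",
]
result = partition_lines(sample, "/tmp/session.log-")
for filename, members in sorted(result.items()):
    print(filename, len(members))
```

To write the files, one could loop over the returned mapping and append each group to its filename.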
View correlation
#!/usr/bin/awk -f
# input should be parallel sets of numbers, one set on each line, tab-separated.
# the input does not need to be sorted
# non-numeric input anywhere on the line will cause the entire line to be ignored
# this uses a naive algorithm that may lose precision in some situations.
# see http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance for an alternate algorithm
BEGIN {
FS="\t";
OFS=FS;
n=0; # count of rows (which contain a full set of data)
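The comments mention that the naive algorithm may lose precision and point to an alternative. As an illustration only, here is a Python sketch of a one-pass, Welford-style update for the Pearson correlation of two columns. It is not the gist's awk code: the function names are hypothetical, and it assumes exactly two tab-separated numeric fields per line, skipping any other line.

```python
import math


def parse_pairs(lines):
    """Yield (x, y) from tab-separated lines; skip lines that are not two numbers."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue
        try:
            yield float(fields[0]), float(fields[1])
        except ValueError:
            continue  # non-numeric input: ignore the entire line


def pearson_online(pairs):
    """One-pass Pearson correlation using running means and co-moments."""
    n = 0
    mean_x = mean_y = 0.0
    m2_x = m2_y = co = 0.0
    for x, y in pairs:
        n += 1
        dx = x - mean_x          # deviation from the old mean of x
        mean_x += dx / n
        dy = y - mean_y          # deviation from the old mean of y
        mean_y += dy / n
        m2_x += dx * (x - mean_x)
        m2_y += dy * (y - mean_y)
        co += dx * (y - mean_y)  # old x deviation times new y deviation
    if n < 2 or m2_x == 0 or m2_y == 0:
        return None              # correlation is undefined
    return co / math.sqrt(m2_x * m2_y)


rows = ["1\t2\n", "2\t4\n", "oops\t5\n", "3\t6\n"]
print(pearson_online(parse_pairs(rows)))
```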
@dbro
dbro / summary-stats
Created Apr 24, 2014
summary statistics
View summary-stats
#!/usr/bin/awk -f
# input should be a set of numbers, one on each line. can be unsorted.
# non-numeric input will be ignored
# 2-pass algorithm, stores a copy of each number in an array in memory
# this could be changed to assume the input is sorted, but would still
# need to know in advance how many numbers to expect in the full set
# in order to calculate percentiles and the trimmed mean.
BEGIN {
FS="\t";
OFS=FS;
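The comments describe a two-pass approach that keeps every number in memory so that percentiles and a trimmed mean can be computed. A minimal Python sketch of that idea follows. The function name, the nearest-rank percentile definition, and the default trim fraction are assumptions made for illustration, not details taken from the gist.

```python
import math


def summary_stats(lines, trim=0.05):
    """Summarise one number per line; non-numeric lines are ignored.

    Assumptions: nearest-rank percentiles, and a trimmed mean that drops
    int(n * trim) values from each end of the sorted data.
    """
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    data = []
    for line in lines:                 # pass 1: collect the numeric values
        try:
            data.append(float(line.strip()))
        except ValueError:
            continue
    if not data:
        return None
    data.sort()                        # pass 2 works on the sorted copy
    n = len(data)

    def percentile(p):
        rank = max(1, math.ceil(p * n / 100))
        return data[rank - 1]

    k = int(n * trim)
    trimmed = data[k:n - k]
    return {
        "n": n,
        "min": data[0],
        "max": data[-1],
        "mean": sum(data) / n,
        "median": percentile(50),
        "p90": percentile(90),
        "trimmed_mean": sum(trimmed) / len(trimmed),
    }


print(summary_stats(["3", "1", "abc", "2"]))
```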
@dbro
dbro / urldecode.awk
Created Apr 24, 2014
command line url decoder
View urldecode.awk
#!/usr/bin/awk -f
BEGIN {
hextab["0"] = 0; hextab["8"] = 8;
hextab["1"] = 1; hextab["9"] = 9;
hextab["2"] = 2; hextab["A"] = 10; hextab["a"] = 10;
hextab["3"] = 3; hextab["B"] = 11; hextab["b"] = 11;
hextab["4"] = 4; hextab["C"] = 12; hextab["c"] = 12;
hextab["5"] = 5; hextab["D"] = 13; hextab["d"] = 13;
hextab["6"] = 6; hextab["E"] = 14; hextab["e"] = 14;
hextab["7"] = 7; hextab["F"] = 15; hextab["f"] = 15;
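The preview shows only the hex lookup table, so the decoding loop below is a guess at how such a table might be used, written in Python for illustration. The function name is hypothetical, and treating `+` as a space is an assumption (form-style encoding) that the original may not share.

```python
# Lookup table for hex digits, mirroring the hextab idea in the preview.
HEXTAB = {c: i for i, c in enumerate("0123456789ABCDEF")}
HEXTAB.update({c: i for i, c in enumerate("0123456789abcdef")})


def urldecode(text):
    """Decode %XX escapes; malformed escapes are passed through unchanged."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if (ch == "%" and i + 2 < len(text)
                and text[i + 1] in HEXTAB and text[i + 2] in HEXTAB):
            out.append(16 * HEXTAB[text[i + 1]] + HEXTAB[text[i + 2]])
            i += 3
        elif ch == "+":
            out.append(0x20)  # assumption: '+' stands for a space
            i += 1
        else:
            out.extend(ch.encode("utf-8"))
            i += 1
    # Decoded bytes may form multi-byte UTF-8 sequences, so decode at the end.
    return out.decode("utf-8", errors="replace")


print(urldecode("hello%20world%21"))
```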
@dbro
dbro / Redis-lua-hyperloglog.py
Created Apr 1, 2014
Implementation of Hyper Log-Log probabilistic counting methods in lua inside redis, via python
View Redis-lua-hyperloglog.py
# Lua routines for use inside the Redis datastore
# Hyperloglog cardinality estimation
# ported from http://stackoverflow.com/questions/5990713/loglog-algorithm-for-counting-of-large-cardinalities
#
# Dan Brown, 2012. https://github.com/dbro
#
# note that lua needs to have the bitlib and murmur3 modules built in, and loaded by redis
#
# suitable for counting unique items from 0 to billions
# choose a k value to balance storage and precision objectives
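To make the role of `k` concrete, here is a small pure-Python sketch of HyperLogLog. It is not the Lua/Redis code from the gist: it swaps murmur3 for the first 64 bits of SHA-1 from `hashlib`, keeps the registers in a Python list, and the class and method names are hypothetical. With `k` index bits there are 2^k registers, and the typical relative error is roughly 1.04 / sqrt(2^k).

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog counter with 2**k registers."""

    def __init__(self, k=10):
        if not 7 <= k <= 16:
            raise ValueError("this sketch supports 7 <= k <= 16")
        self.k = k
        self.m = 1 << k
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash value
        index = h >> (64 - self.k)                   # top k bits pick a register
        rest = h & ((1 << (64 - self.k)) - 1)        # remaining 64 - k bits
        # Position of the leftmost 1 bit in `rest`, counting from 1.
        rank = (64 - self.k) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)             # bias constant for m >= 128
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:
            estimate = m * math.log(m / zeros)       # small-range correction
        return int(round(estimate))


hll = HyperLogLog(k=10)
for i in range(10000):
    hll.add(f"user-{i}")
print(hll.count())
```

A larger `k` uses more registers and gives a tighter estimate; a smaller `k` saves storage at the cost of precision.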
@dbro
dbro / weeklyupdate.js
Created Apr 26, 2013
This is a Google Apps Script (https://script.google.com) that replicates the "Snippets" messaging process as used by Google internally. See the notes at the bottom of this page for more info.
View weeklyupdate.js
/* **************************************
Weekly Update Scripts
by Dan Brown, March 2013
For automatic collection of weekly
update messages from employees.
* sends reminder messages
* posts to public sites pages
@dbro
dbro / csvcut
Last active Aug 1, 2019 — forked from JoeGermuska/csvcut
Command line 'cut' utility that can handle csv quoting. This allows proper handling of fields that contain delimiters, both field and record delimiters like commas and newlines. Thanks to github.com/JoeGermuska for the initial version of the code.
View csvcut
#!/usr/bin/env python
"""
from https://gist.github.com/JoeGermuska/561347
Like cut, but for CSVs. To be used from a shell command line.
Note that fields are 1-based, similar to the UNIX 'cut' command.
Should use something better than getopt, but this works...
Usage:
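The usage text is cut off in this preview. As a rough sketch of the technique the description names (selecting 1-based fields while respecting CSV quoting), something like the following could work with Python's standard `csv` module. The function name and signature are hypothetical and do not reflect the gist's actual command-line options.

```python
import csv
import io


def csv_cut(infile, outfile, fields, delimiter=","):
    """Write the selected 1-based fields of each CSV record to outfile.

    Quoted fields may contain delimiters and newlines; missing fields
    are written as empty strings.
    """
    reader = csv.reader(infile, delimiter=delimiter)
    writer = csv.writer(outfile, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        writer.writerow([row[i - 1] if 0 < i <= len(row) else "" for i in fields])


source = io.StringIO('name,note,id\n"Smith, J","line one\nline two",7\n')
result = io.StringIO()
csv_cut(source, result, [3, 1])
print(result.getvalue(), end="")
```

When reading from a real file, open it with `newline=""` so that the `csv` module handles embedded newlines itself.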