Philip (flip) Kromer mrflip

@mrflip
mrflip / gist:3886038
Created October 13, 2012 20:29
gitconfig
[core]
    excludesfile = ~/.gitignore
    editor = nano
    askpass = /Users/flip/bin/git-password
    # pager = less -FRSX
    # whitespace = fix,-indent-with-non-tab,trailing-space,cr-at-eol
[credential]
    helper = osxkeychain
[color]
    diff = auto
require 'configliere' ; Settings.use :commandline
require 'gorillib'
require 'gorillib/data_munging'
require 'pry'
Settings.define :data_root, default: 's3n://bigdata.chimpy.us', description: "directory root for data to process"
Settings.resolve!
Pathname.register_paths(
mrflip / wu-lign-1.0
Created August 17, 2012 16:14
wu-lign makes your .tsv line up pretty
#!/usr/bin/env ruby
warn [ARGV.inspect, $0]
WULIGN_VERSION = "1.0"
USAGE = %Q{
# h1. wulign -- format a tab-separated file as aligned columns
#
# wulign will intelligently reformat a tab-separated file into a tab-separated,
{
"fancy_chars.rb": {},
"hello.txt|jinja": { "allinputs": true }
}
mrflip / datasets.md
Created August 9, 2012 20:01
Overview of Datasets

== Overview of Datasets ==

The examples in this book use the "Chimpmark" datasets: a set of freely redistributable datasets, converted to simple standard formats, with traceable provenance and documented schemas. They are the same datasets used in the upcoming Chimpmark Challenge big-data benchmark. The datasets are:

  • Wikipedia English-language Article Corpus (wikipedia_corpus; 38 GB, 619 million records, 4 billion tokens): the full text of every English-language Wikipedia article, in

  • Wikipedia Pagelink Graph (wikipedia_pagelinks; ) --

  • Wikipedia Pageview Stats (wikipedia_pageviews; 2.3 TB, about 250 billion records (FIXME: verify num records)) -- hour-by-hour pageview

mrflip / kindlegen.rb
Created August 8, 2012 08:12
Updated kindlegen recipe for homebrew
require 'formula'
class Kindlegen < Formula
  url 'http://s3.amazonaws.com/kindlegen/KindleGen_Mac_i386_v2_5.zip'
  homepage 'http://www.amazon.com/gp/feature.html?docId=1000234621'
  md5 '8daf6956d54df8030b12ec9116945482'
  version '2.5'
  skip_clean 'bin'
# lib/silverware
require 'configliere'
require 'gorillib'
require 'gorillib/model'
require 'gorillib/collection'
require 'gorillib/collection/model_collection'
# bin/iron_cuke
mrflip / d3_graph_viewer.js
Created May 10, 2012 19:45
D3 Graph Visualizer
var cJSON_FILE = "miserables.json";
var width = 960,
    height = 500;
var color = d3.scale.category20();
var force = d3.layout.force()
    .charge(-120)
    .linkDistance(30)
    .size([width, height]);

"If you, as curators and archivists and generally anyone involved in the preservation or promotion of cultural heritage, think that the authority record is the pinnacle of your careers – that is, the most important thing you will leave behind – then you are about to be eaten by robots."

Aaron Straup-Cope, (Authority Records, Future Computers and Other) Unfinished Histories

mrflip / notes_and_links.md
Created March 21, 2012 16:38
Notes for ISchool class