Philip (flip) Kromer mrflip

javascript:(function(){var hasInnerText=(document.getElementsByTagName("body")[0].innerText != undefined) ? true : false; str = (hasInnerText ? document.body.innerText : document.body.textContent ) ;location.href='http://dilbot.heroku.com/projects/2/coverage/new?url='+encodeURIComponent(window.location.href)+'&title='+encodeURIComponent(document.title)+'&v=1&snippet='+encodeURIComponent(str.substring(0,1900))})()
#!/usr/bin/env ruby
# run like so:
# $> ruby normalize.rb --run=local data/sizes.tsv data/normalized_sizes.tsv
require 'rubygems'
require 'wukong'
require 'active_support/core_ext/enumerable' # for Array#sum
module Normalize
  class Mapper < Wukong::Streamer::RecordStreamer
    def process(country, *sizes)
ruby -rfileutils -e 'Dir["*.bz2"].each do |f| nf = f.dup; nf.gsub!(/(\d{4})-(\d\d)-(\d\d)[\.\-]/, "\\1\\2\\3-") ; nf.gsub!(/(koko_\w+)-(\d+)((?:-[^k]|.).*)/, "koko-\\2-\\1\\3") ; nf = "koko-"+nf if (nf =~ /^[0-9]{8}/) ; if nf != f ; then puts "%-80s\t%s"%[f,nf] ; FileUtils.mv(f,nf) ; end ; end'
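The one-liner above is dense, so here is a hedged sketch of the same renaming rules as a plain function that only computes the new name and never touches the disk. The name `normalized_name` is hypothetical (it is not in the original), and the sample filenames in the comments are made up for illustration.

```ruby
# Hypothetical helper: returns the name the one-liner would rename a file to.
# It uses the same three rules, in the same order, but moves nothing.
def normalized_name(name)
  # Rule 1: collapse a dashed date followed by "." or "-",
  # e.g. "2010-02-17." becomes "20100217-".
  nf = name.gsub(/(\d{4})-(\d\d)-(\d\d)[.\-]/, '\1\2\3-')
  # Rule 2: move the number ahead of the koko_<label> part,
  # e.g. "koko_label-123-rest" becomes "koko-123-koko_label-rest".
  nf = nf.gsub(/(koko_\w+)-(\d+)((?:-[^k]|.).*)/, 'koko-\2-\1\3')
  # Rule 3: names that now start with an 8-digit date get a "koko-" prefix.
  nf = "koko-#{nf}" if nf =~ /^[0-9]{8}/
  nf
end
```

Printing `normalized_name(f)` next to `f` for each entry of `Dir["*.bz2"]` gives a dry run of the rename before any `FileUtils.mv` is issued.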
# unscrewup the hostname every way I can think of
sudo hostname chef.infinitemonkeys.info ;
sudo bash -c 'echo "184.72.52.22 chef.infinitemonkeys.info " >> /etc/hosts' ;
sudo bash -c 'echo "chef.infinitemonkeys.info" > /etc/hostname' ;
sudo sysctl -w kernel.hostname=chef.infinitemonkeys.info ;
# Broaden the apt universe
sudo sed -i 's/universe/multiverse universe/' /etc/apt/sources.list
sudo bash -c 'echo "deb http://us.archive.ubuntu.com/ubuntu karmic main restricted" >> /etc/apt/sources.list ';
sudo bash -c 'echo "deb http://us.archive.ubuntu.com/ubuntu karmic universe multiverse" >> /etc/apt/sources.list ';
file_cache_path "/tmp/chef-solo"
cookbook_path "/tmp/chef-solo/cookbooks"
recipe_url "http://s3.amazonaws.com/chef-solo/bootstrap-latest.tar.gz"
11867 Texas Assessment of Knowedge and Skills (TAKS Exams) 2003-2007 (2003-2007 Test Data) data/11867/15355/LEGACY/infochimps_dataset_11867_download_15355-csv.zip data/11867/15355/LEGACY/infochimps_dataset_11867_download_15355-csv.tar.bz2
13382 Texas Assessment of Knowedge and Skills (TAKS Exams) 2003-2007 (2003-2007 Test Data) data/11867/16679/20100217162425/infochimps_dataset_11867_download_16679-csv.tar.bz2 data/11867/16679/20100217162425/infochimps_dataset_11867_download_16679-csv.zip
13380 Texas Assessment of Knowedge and Skills (TAKS Exams) 2003-2007 (Single Year Data (2003)) data/11867/16146/20100211021551/infochimps_dataset_11867_download_16146-csv.tar.bz2 data/11867/16146/20100211021551/infochimps_dataset_11867_download_16146-csv.zip
13381 Texas Assessment of Knowedge and Skills (TAKS Exams) 2003-2007 (Single Year Data (2004)) data/11867/16147/20100217155746/infochimps_dataset_11867_download_16147-csv.tar.bz2 data/11867/16147/20100217155746/infochimps_dataset_1
id title title id id fmt archive pkg_size path
83 Teenagers -- Births and Birth Rates, by Age, Race, and Hispanic Origin: 1990 to 2005 Original Data 247 493 csv zip 7052 ics/data/pkgd/demographics/us/statisticalabstract/statab2008_0083_Teenagers_BirthsAndBirthRatesByAgeR/statab2008_0083_Teenagers_BirthsAndBirthRatesByAgeR-csv.zip
83 Teenagers -- Births and Birth Rates, by Age, Race, and Hispanic Origin: 1990 to 2005 Original Data 247 494 csv tar.bz2 7052 ics/data/pkgd/demographics/us/statisticalabstract/statab2008_0083_Teenagers_BirthsAndBirthRatesByAgeR/statab2008_0083_Teenagers_BirthsAndBirthRatesByAgeR-csv.
#
# For a stream process that sees a significant number of duplicated heavyweight
# objects, it may be better to deduplicate them midflight (rather than, say,
# using a reducer to effectively `cat | sort | uniq` the data).
#
# This uses a cassandra key-value store to track unique IDs and prevent output
# of any record already present in the database
#
# Things you have to do:
#
def should_emit? record
  key = conditional_output_key(record)
  if record.class.mutable?
    cached = key_cache.get(key)
    (cached['t'] < record.timestamp)
  else
    super(record)
  end
end
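To make the midflight deduplication idea concrete, here is a minimal, self-contained sketch. It is not the original implementation: an in-memory Hash stands in for the Cassandra key-value store, and `MemoryKeyCache`, `Record`, and `Deduplicator` are hypothetical names. It also simplifies the logic above by treating every record as mutable, so a record is emitted when its key is unseen or when it is newer than the cached copy.

```ruby
# Hypothetical in-memory stand-in for the key-value store
# (the original tracks keys in Cassandra).
class MemoryKeyCache
  def initialize
    @store = {}
  end

  def get(key)
    @store[key]
  end

  def set(key, value)
    @store[key] = value
  end
end

# Hypothetical record type: an ID plus a timestamp.
Record = Struct.new(:id, :timestamp)

class Deduplicator
  def initialize(key_cache = MemoryKeyCache.new)
    @key_cache = key_cache
  end

  # Emit a record only if its key is unseen, or if it is strictly newer
  # than the copy already recorded in the cache.
  def should_emit?(record)
    cached = @key_cache.get(record.id)
    return false if cached && cached['t'] >= record.timestamp
    @key_cache.set(record.id, { 't' => record.timestamp })
    true
  end
end
```

The trade-off against a reducer-side `sort | uniq` is one cache lookup per record in exchange for never shipping the duplicate downstream.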
mrflip / unscrewup_hostname_ec2.sh
Created April 5, 2010 19:47
Un-screwup the hostname on EC2 -- otherwise RabbitMQ will kick your dog and Chef will stop returning your phone calls
# unscrewup the hostname every way I can think of. RabbitMQ is a real buttmunch
# about the hostname -- it will hang forever on bootstrap if `hostname -s`
# doesn't resolve back to this host. One of the following fixes this, not sure which.
export HOSTNAME=chef.YOURDOMAIN.COM ;
PUBLIC_IP=XXX.XXX.XX.XX
sudo kill `cat /var/run/dhclient.eth0.pid` # kill dhclient
sudo bash -c "echo '$HOSTNAME' > /etc/hostname" ;
sudo hostname -F /etc/hostname ;
sudo sysctl -w kernel.hostname=$HOSTNAME ;
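Since it is unclear which of the steps above actually fixes the problem, a quick check can help narrow it down. The sketch below assumes name resolution goes through `/etc/hosts` (it ignores DNS entirely) and tests whether a given name is listed there. `hosts_entry?` is a hypothetical helper, not part of the original script.

```ruby
# Hypothetical helper: true when `name` appears as a hostname or alias
# on any non-comment line of hosts-file text.
def hosts_entry?(hosts_text, name)
  hosts_text.each_line.any? do |line|
    fields = line.sub(/#.*/, '').split  # drop comments, split on whitespace
    fields.drop(1).include?(name)       # first field is the IP address
  end
end
```

On the machine itself this could be called as `hosts_entry?(File.read('/etc/hosts'), short_name)`, where `short_name` is whatever `hostname -s` prints.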