This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
javascript:(function(){var hasInnerText=(document.getElementsByTagName("body")[0].innerText != undefined) ? true : false; str = (hasInnerText ? document.body.innerText : document.body.textContent ) ;location.href='http://dilbot.heroku.com/projects/2/coverage/new?url='+encodeURIComponent(window.location.href)+'&title='+encodeURIComponent(document.title)+'&v=1&snippet='+encodeURIComponent(str.substring(0,1900))})()
#!/usr/bin/env ruby
# run like so:
# $> ruby normalize.rb --run=local data/sizes.tsv data/normalized_sizes.tsv
require 'rubygems'
require 'wukong'
require 'active_support/core_ext/enumerable' # for Array#sum
module Normalize
  class Mapper < Wukong::Streamer::RecordStreamer
    def process(country, *sizes)
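The mapper above is cut off after the `process` signature, so its body is not known. As a minimal sketch, assuming "normalize" means dividing each size by the row total (an assumption suggested only by the `sum` require), the per-row step might look like this in plain Ruby without Wukong. `normalize_row` is a hypothetical helper, not part of the original script.

```ruby
# Hypothetical helper: scale a row of sizes so that they sum to 1.0.
# Assumption: "normalize" means dividing by the row total; the original body is not shown.
def normalize_row(country, *sizes)
  values = sizes.map { |s| Float(s) }        # TSV fields arrive as strings
  total  = values.sum
  return [country, *values] if total.zero?   # avoid dividing by zero
  [country, *values.map { |v| v / total }]
end
```

Inside a streamer, the returned array would presumably be yielded as the output record, but that wiring is left out here because the original does not show it.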
ruby -rfileutils -e 'Dir["*.bz2"].each do |f| nf = f.dup; nf.gsub!(/(\d{4})-(\d\d)-(\d\d)[\.\-]/, "\\1\\2\\3-") ; nf.gsub!(/(koko_\w+)-(\d+)((?:-[^k]|.).*)/, "koko-\\2-\\1\\3") ; nf = "koko-"+nf if (nf =~ /^[0-9]{8}/) ; if nf != f ; then puts "%-80s\t%s"%[f,nf] ; FileUtils.mv(f,nf) ; end ; end'
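The one-liner above renames files in place, which makes its rules hard to check before running it. A minimal sketch of the same three rewrite rules as a side-effect-free function follows; `normalized_name` is a hypothetical name, and the sample filenames in the usage note are invented for illustration.

```ruby
# Pure version of the renaming rules, so they can be checked without touching files.
def normalized_name(name)
  nf = name.dup
  # 2010-02-17. or 2010-02-17-  =>  20100217-
  nf = nf.gsub(/(\d{4})-(\d\d)-(\d\d)[\.\-]/, '\1\2\3-')
  # koko_xxx-<digits><rest>  =>  koko-<digits>-koko_xxx<rest>
  nf = nf.gsub(/(koko_\w+)-(\d+)((?:-[^k]|.).*)/, 'koko-\2-\1\3')
  # names that now start with an 8-digit date get a koko- prefix
  nf = "koko-" + nf if nf =~ /^[0-9]{8}/
  nf
end
```

For example, `normalized_name("koko_tweets-2010-02-17.tsv.bz2")` gives `"koko-20100217-koko_tweets-tsv.bz2"`. Printing `normalized_name(f)` for each entry of `Dir["*.bz2"]` gives a dry run before any file is moved.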
# unscrewup the hostname every way I can think of
sudo hostname chef.infinitemonkeys.info ;
sudo bash -c 'echo "184.72.52.22 chef.infinitemonkeys.info " >> /etc/hosts' ;
sudo bash -c 'echo "chef.infinitemonkeys.info" > /etc/hostname' ;
sudo sysctl -w kernel.hostname=chef.infinitemonkeys.info ;
# Broaden the apt universe
sudo sed -i 's/universe/multiverse universe/' /etc/apt/sources.list
sudo bash -c 'echo "deb http://us.archive.ubuntu.com/ubuntu karmic main restricted" >> /etc/apt/sources.list ';
sudo bash -c 'echo "deb http://us.archive.ubuntu.com/ubuntu karmic universe multiverse" >> /etc/apt/sources.list ';
file_cache_path "/tmp/chef-solo"
cookbook_path "/tmp/chef-solo/cookbooks"
recipe_url "http://s3.amazonaws.com/chef-solo/bootstrap-latest.tar.gz" |
11867 Texas Assessment of Knowedge and Skills (TAKS Exams) 2003-2007 (2003-2007 Test Data) data/11867/15355/LEGACY/infochimps_dataset_11867_download_15355-csv.zip data/11867/15355/LEGACY/infochimps_dataset_11867_download_15355-csv.tar.bz2
13382 Texas Assessment of Knowedge and Skills (TAKS Exams) 2003-2007 (2003-2007 Test Data) data/11867/16679/20100217162425/infochimps_dataset_11867_download_16679-csv.tar.bz2 data/11867/16679/20100217162425/infochimps_dataset_11867_download_16679-csv.zip
13380 Texas Assessment of Knowedge and Skills (TAKS Exams) 2003-2007 (Single Year Data (2003)) data/11867/16146/20100211021551/infochimps_dataset_11867_download_16146-csv.tar.bz2 data/11867/16146/20100211021551/infochimps_dataset_11867_download_16146-csv.zip
13381 Texas Assessment of Knowedge and Skills (TAKS Exams) 2003-2007 (Single Year Data (2004)) data/11867/16147/20100217155746/infochimps_dataset_11867_download_16147-csv.tar.bz2 data/11867/16147/20100217155746/infochimps_dataset_1 |
id | title | title | id | id | fmt | archive | pkg_size | path
---|---|---|---|---|---|---|---|---
83 | Teenagers -- Births and Birth Rates, by Age, Race, and Hispanic Origin: 1990 to 2005 | Original Data | 247 | 493 | csv | zip | 7052 | ics/data/pkgd/demographics/us/statisticalabstract/statab2008_0083_Teenagers_BirthsAndBirthRatesByAgeR/statab2008_0083_Teenagers_BirthsAndBirthRatesByAgeR-csv.zip
83 | Teenagers -- Births and Birth Rates, by Age, Race, and Hispanic Origin: 1990 to 2005 | Original Data | 247 | 494 | csv | tar.bz2 | 7052 | ics/data/pkgd/demographics/us/statisticalabstract/statab2008_0083_Teenagers_BirthsAndBirthRatesByAgeR/statab2008_0083_Teenagers_BirthsAndBirthRatesByAgeR-csv. |
#
# For a stream process that sees a significant number of duplicated heavyweight
# objects, it may be better to deduplicate them midflight (rather than, say,
# using a reducer to effectively `cat | sort | uniq` the data).
#
# This uses a Cassandra key-value store to track unique IDs and prevent output
# of any record already present in the database.
#
# Things you have to do:
#
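The midflight deduplication described in the comment above can be sketched in plain Ruby. This is a minimal sketch, assuming an in-memory `Set` stands in for the Cassandra key-value store and that the caller supplies a block that extracts each record's unique ID. `SeenFilter` and `dedupe_stream` are hypothetical names, not taken from the original file.

```ruby
require 'set'

# Hypothetical stand-in for the Cassandra-backed tracker: an in-memory Set of seen keys.
class SeenFilter
  def initialize
    @seen = Set.new
  end

  # True the first time a key is offered, false on every repeat.
  def first_sighting?(key)
    !@seen.add?(key).nil?
  end
end

# Pass through only the first record for each key, preserving stream order.
# The block given to dedupe_stream maps a record to its unique ID.
def dedupe_stream(records, filter = SeenFilter.new)
  records.select { |rec| filter.first_sighting?(yield(rec)) }
end
```

Unlike the `sort | uniq` approach, this keeps records in arrival order and needs no reduce step, at the cost of one lookup per record against whatever store holds the seen keys.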
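# Decide whether a record should be written to the output.
def should_emit?(record)
  key = conditional_output_key(record)
  if record.class.mutable?
    # Mutable records are re-emitted only when newer than the cached copy.
    cached = key_cache.get(key)
    last_seen = cached && cached['t']
    # Assumption: a key with no cached timestamp has not been emitted yet.
    last_seen.nil? || last_seen < record.timestamp
  else
    super(record)
  end
end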
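The timestamp rule in `should_emit?` can be exercised on its own. This is a minimal sketch, assuming a plain Hash stands in for `key_cache`, that entries store the last emitted timestamp under `'t'` as the snippet suggests, and that an unseen key should always be emitted (the original does not show that case). `TimestampCache`, `newer_than_cached?`, and `remember!` are hypothetical names.

```ruby
# Hypothetical stand-in for key_cache: key => { 't' => last emitted timestamp }.
class TimestampCache
  def initialize
    @store = {}
  end

  # Emit when the key is unseen (assumption) or the record is strictly newer.
  def newer_than_cached?(key, timestamp)
    cached = @store[key]
    cached.nil? || cached['t'] < timestamp
  end

  # Record the timestamp that was just emitted for this key.
  def remember!(key, timestamp)
    @store[key] = { 't' => timestamp }
  end
end
```

The strict `<` comparison means a record with the same timestamp as the cached one is treated as a duplicate and dropped.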
# unscrewup the hostname every way I can think of. RabbitMQ is a real buttmunch
# about the hostname -- it will hang forever on bootstrap if `hostname -s`
# doesn't resolve back to this host. One of the following fixes this, not sure which.
export HOSTNAME=chef.YOURDOMAIN.COM ;
PUBLIC_IP=XXX.XXX.XX.XX
sudo kill "$(cat /var/run/dhclient.eth0.pid)" # kill dhclient
sudo bash -c "echo '$HOSTNAME' > /etc/hostname" ;
sudo hostname -F /etc/hostname ;
sudo sysctl -w kernel.hostname="$HOSTNAME" ;
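Since the comment above is unsure which step fixes resolution, it may help to check the condition directly before bootstrapping. This is a minimal sketch, assuming it is enough to confirm that the short hostname resolves to at least one address; it does not verify that the address belongs to this host. `short_hostname_resolves?` and `SYSTEM_RESOLVER` are hypothetical names, and the resolver is injectable so the check can be tried without touching DNS.

```ruby
require 'socket'

# Default resolver: ask the system resolver for addresses; empty list if the name is unknown.
SYSTEM_RESOLVER = lambda do |name|
  begin
    Addrinfo.getaddrinfo(name, nil).map(&:ip_address).uniq
  rescue SocketError
    []
  end
end

# True when the short hostname (the part before the first dot) resolves to
# at least one address. Assumption: this approximates what `hostname -s` must satisfy.
def short_hostname_resolves?(hostname = Socket.gethostname, resolver = SYSTEM_RESOLVER)
  short = hostname.split('.').first.to_s
  !resolver.call(short).empty?
end
```

Calling `short_hostname_resolves?` with no arguments checks the current machine; passing a hostname and a lambda checks a hypothetical configuration.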