@mrflip
Forked from fredrik/normalize.rb
Created January 15, 2010 15:40
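normalize.rb

#!/usr/bin/env ruby
# Converts each country's per-size counts into percentages of that country's total.
# Run like so:
#   $> ruby normalize.rb --run=local data/sizes.tsv data/normalized_sizes.tsv
require 'rubygems'
require 'wukong'
require 'active_support/core_ext/enumerable' # for Array#sum (built into Ruby >= 2.4)

module Normalize
  class Mapper < Wukong::Streamer::RecordStreamer
    # country is the record's first field; sizes are its per-size counts (strings)
    def process(country, *sizes)
      sizes = sizes.map(&:to_i)
      sum   = sizes.sum.to_f
      # Percentage of the row total; an all-zero row yields zeros instead of NaN
      normalized = sum.zero? ? sizes.map { 0.0 } : sizes.map { |x| 100 * x / sum }
      yield [country, normalized.join(",")]
    end
  end
end

# No reducer: this is a map-only job
Wukong::Script.new(Normalize::Mapper, nil).run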
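To check the normalization arithmetic outside Hadoop, here is a minimal Wukong-free sketch of the same per-row computation. It assumes one tab-separated record per line, as the `.tsv` file names suggest. `normalize_row` is a hypothetical helper, not part of the gist, and mapping an all-zero row to all-zero percentages (rather than NaN) is an assumption.

```ruby
# Hypothetical helper mirroring Normalize::Mapper#process without Wukong.
def normalize_row(country, *sizes)
  counts = sizes.map(&:to_i)
  total  = counts.sum.to_f   # Array#sum is built in since Ruby 2.4
  # Assumption: an all-zero row becomes all-zero percentages, not NaN.
  pct = total.zero? ? counts.map { 0.0 } : counts.map { |x| 100 * x / total }
  [country, pct.join(",")]
end

# Assumes one tab-separated record per line, as the .tsv names suggest.
line = "SE\t1\t3\t0\t4\n"
country, *sizes = line.chomp.split("\t")
p normalize_row(country, *sizes)
```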
sizes.rb

#!/usr/bin/env ruby
# Sums the per-size counts of every order, grouped by country.
# Run like so:
#   $> ruby sizes.rb --run=local data/orders.tsv data/sizes
require 'rubygems'
require 'wukong'

module JeanSizes
  # Number of size columns expected at the end of each order record
  NUM_SIZES = 13

  class Mapper < Wukong::Streamer::RecordStreamer
    # Each order record has 13 leading fields (only country is used),
    # followed by the per-size counts.
    def process(code, model, time, country, j1, j2, j3, n1, n2, c1, venue, n3, n4, *sizes)
      yield [country, *sizes] if sizes.length == NUM_SIZES
    end
  end

  #
  # This uses a ListReducer. It's nice and simple, but it holds every
  # record for the current key in memory until finalize runs.
  #
  class JeansListReducer < Wukong::Streamer::ListReducer
    def finalize
      return if values.empty?
      sums = Array.new(NUM_SIZES, 0)
      values.each do |_country, *sizes|
        sums = sums.zip(sizes.map(&:to_i)).map { |sum, val| sum + val }
      end
      yield [key, *sums]
    end
  end

  #
  # This uses an AccumulatingReducer directly. It keeps only a running
  # NUM_SIZES-element sum for the current key, so its memory footprint
  # does not grow with the number of records per key.
  #
  class JeansAccumulatingReducer < Wukong::Streamer::AccumulatingReducer
    attr_accessor :sums

    # Start each new key with a sum of 0 for every size
    def start!(*_)
      self.sums = Array.new(NUM_SIZES, 0)
    end

    # Add this record's size counts into sums
    def accumulate(_country, *sizes)
      self.sums = sums.zip(sizes.map(&:to_i)).map { |sum, val| sum + val }
    end

    # Emit [country, size_0_sum, size_1_sum, ...]
    def finalize
      yield [key, *sums]
    end
  end
end

# Pass JeanSizes::JeansAccumulatingReducer instead to use the streaming reducer.
Wukong::Script.new(JeanSizes::Mapper, JeanSizes::JeansListReducer).run
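The sizes.rb data flow can also be traced without Wukong or Hadoop. This sketch assumes a streaming-style run (map every line, sort the mapped records by key, then reduce each key's run of records) and checks that the list-style and accumulating-style sums agree. Every name in it (`map_record`, `list_reduce`, `RunningSum`, `stream_reduce`, `fake_line`) is hypothetical, and the sample rows are made up.

```ruby
# Hypothetical, Wukong-free trace of the sizes.rb data flow.
NUM_SIZES = 13     # size columns per record, as in the gist
LEADING   = 13     # fields before the sizes; country is the 4th

# Mapper: keep [country, *sizes] only when exactly NUM_SIZES sizes follow.
def map_record(fields)
  sizes = fields.drop(LEADING)
  sizes.length == NUM_SIZES ? [fields[3], *sizes] : nil
end

# List-style reducer: needs all of one key's records at once.
def list_reduce(key, records)
  sums = Array.new(NUM_SIZES, 0)
  records.each do |_country, *sizes|
    sums = sums.zip(sizes.map(&:to_i)).map { |s, v| s + v }
  end
  [key, *sums]
end

# Accumulating-style reducer: keeps only a running sum for one key.
class RunningSum
  attr_reader :key

  def initialize(key)
    @key  = key
    @sums = Array.new(NUM_SIZES, 0)
  end

  def accumulate(_country, *sizes)
    sizes.each_with_index { |v, i| @sums[i] += v.to_i }
  end

  def finalize
    [@key, *@sums]
  end
end

# Walk records already sorted by key; emit each key's total when the key changes.
def stream_reduce(sorted)
  out = []
  current = nil
  sorted.each do |rec|
    if current.nil? || current.key != rec.first
      out << current.finalize if current
      current = RunningSum.new(rec.first)
    end
    current.accumulate(*rec)
  end
  out << current.finalize if current
  out
end

# Made-up order record: 13 leading fields, then the size counts as strings.
def fake_line(country, counts)
  ["code", "model", "time", country] + Array.new(9, "x") + counts.map(&:to_s)
end

lines = [
  fake_line("SE", [1] * 13),
  fake_line("DK", (1..13).to_a),
  fake_line("SE", [2] * 13),
  fake_line("SE", [1] * 12)   # only 12 sizes, so the mapper drops it
]

mapped  = lines.map { |f| map_record(f) }.compact
sorted  = mapped.sort_by(&:first)   # stands in for the sort phase
grouped = sorted.chunk_while { |a, b| a.first == b.first }
                .map { |group| list_reduce(group.first.first, group) }
streamed = stream_reduce(sorted)

p streamed == grouped
p streamed
```

The trade-off mirrors the gist's two reducers: `list_reduce` needs a key's whole group in memory at once, while `stream_reduce` holds a single `RunningSum` at a time and relies on sorted input so that each key's records are adjacent.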