@retospect
Created June 2, 2011 07:34
Amazon Turk data extraction hack
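#!/opt/local/bin/ruby
# Tally the Answer.* columns of an Amazon Mechanical Turk results CSV.
# require "ruby-debug"  # uncomment only for interactive debugging; nothing below uses it

filename = 'ex_dataset.csv'
col_headers = {}        # column index => answer name (without the "Answer." prefix)
results = Hash.new(0)   # "category:value:answer" => count

File.foreach(filename).with_index do |row, line_count|
  # Naive CSV parsing: strip line endings, split on commas, drop double quotes.
  # Quoted fields that contain commas are split too.
  columns = row.gsub("\n", '').gsub("\r", '').split(',').map { |c| c.gsub('"', '') }

  if line_count < 1
    # Header row. Indices start at 3 here but at 0 for data rows below, so each
    # header is paired with the data field three positions to its right. The
    # offset is kept from the original hack and is specific to its dataset.
    col_i = 3
    columns.each do |col|
      if col =~ /^Answer/
        puts "#{col_i} #{col}"
        col_headers[col_i] = col.gsub('Answer.', '')
      end
      col_i += 1
    end
    next
  end

  # Data row: collect the fields that line up with an Answer column.
  datafields = {}
  columns.each_with_index do |col, col_i|
    next unless col_headers[col_i]
    puts "#{col_headers[col_i]} #{col}"
    datafields[col_headers[col_i]] = col
  end

  # Count each category/value pair together with the row's 'answer' field.
  datafields.each do |category, value|
    results["#{category}:#{value}:#{datafields['answer']}"] += 1
  end
end

# Print the tallies sorted by key.
results.sort.each do |name, value|
  puts "#{name}= #{value}"
end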
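
The script above splits each line on commas by hand, so a quoted field that contains a comma shifts every column after it. Below is a minimal sketch of the same tally built on Ruby's standard `csv` library, which handles quoting itself. It assumes the header row has `Answer.`-prefixed columns, one of them named `Answer.answer`. The helper name `tally_answers` and the sample data are hypothetical, and the sketch does not reproduce the original's offset of 3 between header and data indices.

```ruby
require "csv"

# Hypothetical helper, not part of the original script: tally Answer.* values
# from MTurk-style CSV text, using the same "category:value:answer" keys.
def tally_answers(csv_text)
  results = Hash.new(0)
  CSV.parse(csv_text, headers: true) do |row|
    # Keep only the Answer.* columns, with the prefix stripped.
    datafields = {}
    row.each do |header, value|
      next unless header&.start_with?("Answer.")
      datafields[header.sub("Answer.", "")] = value
    end
    datafields.each do |category, value|
      results["#{category}:#{value}:#{datafields['answer']}"] += 1
    end
  end
  results
end

# Made-up sample rows; the quoted Keywords field contains a comma.
sample = <<~CSV
  HITId,Keywords,Answer.answer,Answer.color
  1,"a, b",yes,red
  2,"c, d",yes,red
  3,e,no,blue
CSV

tally_answers(sample).sort.each { |name, count| puts "#{name}= #{count}" }
```

Matching columns by header name rather than by position means the tally does not depend on how many commas appear inside quoted fields.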