@parrish
Created June 11, 2015 17:22
Combines chimp discussion data with chimp classification data
#!/usr/bin/env ruby
date = ARGV[0]
unless date
  puts "usage: ruby #{ __FILE__ } <date>"
  puts " eg: ruby #{ __FILE__ } 2015-06-07"
  exit
end
# Input and output files all share the date prefix.
classifications_filename = "#{ date }_chimp_classifications.csv"
discussions_filename = "#{ date }_chimp_discussions.csv"
combined_filename = "#{ date }_chimp_combined.csv"

# Index discussion rows by subject id, skipping the header row.
# The plain split assumes no field contains a comma.
discussion_data = { }
IO.foreach(discussions_filename).with_index do |line, index|
  next if index == 0
  subject_id, tags, mentioned_in = line.chomp.split ','
  discussion_data[subject_id] = { 'tags' => tags, 'mentioned_in' => mentioned_in }
end
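# Append the discussion columns to every classification row.
File.open(combined_filename, 'w') do |out|
  IO.foreach(classifications_filename).with_index do |line, index|
    line.chomp!
    if index == 0
      out.puts "#{ line },tags,mentioned_in"
      next
    end
    # The subject id is the third column; strip any surrounding quotes.
    subject_id = line.split(',')[2].gsub '"', ''
    # Subjects with no discussion data get empty columns instead of raising on nil.
    talk_data = discussion_data[subject_id] || { }
    out.puts "#{ line },#{ talk_data['tags'] },#{ talk_data['mentioned_in'] }"
  end
end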
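
The script above splits each line on commas, so a quoted field that itself contains a comma would shift the columns. The following is a minimal sketch of the same join using Ruby's standard CSV library instead. The helper name combine_chimp_csv is hypothetical, and it assumes the same layout as the script: subject id in the third column of the classifications file and in the first column of the discussions file.

```ruby
require 'csv'

# Hypothetical helper, not part of the original gist.
# Joins classification rows with discussion rows on subject id and
# returns the combined CSV as a string.
# Assumption: subject id is column 3 of the classifications CSV and
# column 1 of the discussions CSV, as in the script above.
def combine_chimp_csv(classifications_csv, discussions_csv)
  # Index discussion rows by subject id; headers: true skips the header row.
  discussion_data = {}
  CSV.parse(discussions_csv, headers: true).each do |row|
    subject_id, tags, mentioned_in = row.fields
    discussion_data[subject_id] = [tags, mentioned_in]
  end

  rows = CSV.parse(classifications_csv)
  return '' if rows.empty?
  header = rows.shift

  CSV.generate do |out|
    out << header + %w[tags mentioned_in]
    rows.each do |row|
      # Subjects without discussion data get two empty columns.
      tags, mentioned_in = discussion_data.fetch(row[2], [nil, nil])
      out << row + [tags, mentioned_in]
    end
  end
end
```

To use it with the filenames from the script, something like File.write(combined_filename, combine_chimp_csv(File.read(classifications_filename), File.read(discussions_filename))) should work. Note that the CSV parser removes surrounding quotes itself, so the gsub step from the script is not needed here.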