@pgwillia
Created September 23, 2020 05:46
conference speakers by frequency
require "csv"

# Column layout shared by every per-conference CSV in csv/
headers = ["year", "date", "time", "speakers", "title", "abstract", "type", "tags", "slides", "video", "audio"]

# Create all.csv, write the header row once, then append the data rows of every
# csv/*.csv file. Rows are copied by position, so each source file must list its
# columns in the same order as `headers`.
CSV.open("all.csv", "wb", write_headers: true, headers: headers) do |csv|
  Dir["csv/*.csv"].sort.each do |path|
    # headers: true consumes each file's own header line, so only data rows are copied
    CSV.foreach(path, headers: true) do |row|
      csv << row # append the row to all.csv
    end
  end
end
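Because `csv << row` copies fields by position, a source file whose columns appear in a different order would be written misaligned. Below is a minimal sketch of a header-aware merge, assuming every source file has a header line whose names match `headers`. The method name `merge_by_header` is hypothetical and not part of the gist; it would be called as `merge_by_header("csv/*.csv", "all.csv", headers)`.

```ruby
require "csv"

# Hypothetical helper (not in the original gist): merge CSV files by column
# name rather than by position. Columns missing from a source file are
# written as empty cells; columns not listed in `headers` are dropped.
def merge_by_header(src_glob, out_path, headers)
  # Collect the input paths before the output file is created or truncated.
  paths = Dir[src_glob].sort
  CSV.open(out_path, "wb", write_headers: true, headers: headers) do |out|
    paths.each do |path|
      CSV.foreach(path, headers: true) do |row|
        out << headers.map { |h| row[h] } # look up each output column by name
      end
    end
  end
end
```

Looking fields up by name costs one hash lookup per cell, which is negligible for files of this size and removes the same-column-order requirement.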
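
# Read the merged file back and collect the "speakers" column.
table = CSV.read("all.csv", headers: true)

# A cell may name several speakers separated by "|", each optionally followed by
# a parenthesized note. Drop empty cells, split on "|", remove the parenthesized
# text, trim whitespace, and discard any names that end up empty.
speakers = table["speakers"]
  .compact
  .flat_map { |cell| cell.split("|") }
  .map { |name| name.gsub(/\(.*?\)/, "").strip }
  .reject(&:empty?)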
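The cleaning chain can be pulled into a small method so it can be checked against sample cells without reading any files. This is a sketch under the same assumptions as the script (`|` separates names, parenthesized text is discarded); `speaker_names` is a hypothetical name and the sample cells are invented.

```ruby
# Hypothetical helper: turn one raw "speakers" cell into a list of names,
# following the script's rules: split on "|", drop "(...)" text, trim spaces,
# and skip anything that ends up empty.
def speaker_names(cell)
  return [] if cell.nil?

  cell.split("|")
      .map { |name| name.gsub(/\(.*?\)/, "").strip }
      .reject(&:empty?)
end

p speaker_names("Jane Doe (Org A) | John Smith") # prints ["Jane Doe", "John Smith"]
p speaker_names("(TBA)|")                        # prints []
p speaker_names(nil)                             # prints []
```

Applying it with `flat_map` over the column gives the same flat list of names that the script builds.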
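
# Count how many times each name occurs, returning [name, count] pairs.
def frequency(a)
  a.group_by { |e| e }.map { |key, values| [key, values.size] }
end

# Print the pairs in ascending order of count (most frequent speakers last).
p frequency(speakers).sort_by(&:last)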
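On Ruby 2.7 or later, `Enumerable#tally` does the same counting as `frequency` and returns a Hash of name to count. The sketch below also ranks the most frequent speakers first, breaks ties alphabetically, and keeps the top `n`. The method name `top_speakers` and the default `n = 10` are illustrative choices, not part of the gist, and the sample names are invented.

```ruby
# Hypothetical helper: count names with Enumerable#tally (Ruby 2.7+), then
# sort by descending count, breaking ties by name, and keep the first n pairs.
def top_speakers(names, n = 10)
  names.tally
       .sort_by { |name, count| [-count, name] }
       .first(n)
end

top_speakers(%w[Ada Bob Ada Cy Bob Ada], 2).each do |name, count|
  puts format("%3d  %s", count, name)
end
# prints:
#   3  Ada
#   2  Bob
```

Unlike `sort_by(&:last)`, which leaves the most frequent speakers at the end of the output, this puts them first and gives a stable order among speakers with equal counts.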