@sixtyfive
Last active July 6, 2018 14:31
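Tiny Ruby/Nokogiri script that extracts the event listings for Halle's Long Night of the Sciences from saved copies of the website's HTML pages (html/*.html) and writes them to LNDW.csv. This is a more usable form than the listing on the website itself.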
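Most of the work is string clean-up on each event's description: the venue is taken to be the description's last line, and whitespace is tidied. The following is a stdlib-only sketch of that step that runs without Nokogiri or any downloaded pages. The helper name `split_description` and the sample text are invented for illustration. Unlike the original one-liner, this sketch collapses newlines to spaces instead of deleting them, and it returns an empty venue when the description has no line break.

```ruby
# Hypothetical helper mirroring the script's description handling.
# Assumption (as in the script): the venue is the last line of the text.
def split_description(raw)
  info = raw.strip
  # Last line that follows a line break, or nil if there is none.
  last_line = info.scan(/\n(.*)/).last
  place = last_line ? last_line.first.strip.gsub(/\s+/, ' ') : ''
  # Collapse every whitespace run (newlines included) to a single space.
  [place, info.gsub(/\s+/, ' ')]
end

# Invented sample description, not taken from the LNDW website.
raw = "\n  A talk about\n  stars and planets.\n  Lecture Hall A, Example Street 1\n"
place, info = split_description(raw)
puts place # prints "Lecture Hall A, Example Street 1"
puts info  # prints "A talk about stars and planets. Lecture Hall A, Example Street 1"
```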
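```ruby
#!/usr/bin/env ruby
# Reads saved event pages from html/*.html and appends one CSV row per
# event to LNDW.csv, with the columns: time, place, host, title, info.
require 'nokogiri'
require 'csv'

# 'a+' opens LNDW.csv for appending: rows from earlier runs are kept,
# so rerunning the script adds them a second time.
CSV.open('LNDW.csv', 'a+') do |csv|
  Dir.glob('html/*.html') do |filename|
    puts "Processing #{filename}..."
    doc = Nokogiri::HTML(File.read(filename))

    # Each event appears as one div.the-artist-horizontal block.
    doc.css('div.the-artist-horizontal').each do |event|
      host  = event.search('div.host span').text
      # Collapse whitespace runs so words split across lines stay separated.
      title = event.search('.text-slider3 h3').text.strip.gsub(/\s+/, ' ')
      time  = event.search('.text-slider3 h5 strong').text

      # Remove <span> elements from the description before reading its text.
      info = event.search('.text-slider3 p')
      info.search('span').remove
      info = info.text.strip

      # The venue is the last line of the description; fall back to an
      # empty string instead of crashing when there is no line break.
      last_line = info.scan(/\n(.*)/).last
      place = last_line ? last_line.first.strip.gsub(/\s+/, ' ') : ''

      info = info.gsub(/\s+/, ' ')
      csv << [time, place, host, title, info]
    end
  end
end
```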
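Once LNDW.csv exists, it can be read back with Ruby's standard `csv` library, for example to browse the programme venue by venue. The following is a minimal sketch. The helper names `load_events` and `events_by_place` and the sample rows are hypothetical. It drops exact duplicate rows, which the script's append mode produces on reruns, and it sorts by the raw time text. That sort is only meaningful if every time uses the same format, such as "18:00", which the page format does not guarantee.

```ruby
require 'csv'

# Column order written by the script above.
COLUMNS = %i[time place host title info].freeze

# Parses the script's CSV output into hashes, dropping exact duplicate
# rows such as those left by rerunning the script in 'a+' mode.
def load_events(csv_text)
  CSV.parse(csv_text).uniq.map { |row| COLUMNS.zip(row).to_h }
end

# Groups events by venue; each group is sorted by its raw time string.
def events_by_place(events)
  events.group_by { |e| e[:place] }
        .transform_values { |list| list.sort_by { |e| e[:time].to_s } }
end

# Invented sample rows in the script's column order.
sample = <<~TEXT
  19:00,Lecture Hall A,Institute X,Stars,A talk about stars.
  18:00,Lecture Hall A,Institute X,Planets,A talk about planets.
  18:00,Lecture Hall A,Institute X,Planets,A talk about planets.
  20:00,Lab B,Institute Y,Crystals,Growing crystals.
TEXT

events_by_place(load_events(sample)).each do |place, list|
  puts place
  list.each { |e| puts "  #{e[:time]}  #{e[:title]} (#{e[:host]})" }
end
```

For the real output, pass `File.read('LNDW.csv')` to `load_events` instead of the sample string.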