@mattb
Created September 22, 2012 10:44
Screenscraping London Open House venues into GeoJSON
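# Usage:
#
# 1. Fetch the venue listing pages:
#      curl -O 'http://events.londonopenhouse.org/Venues?q=&Page=[1-80]'
# 2. Extract the building URLs from them:
#      grep 'href.*/building/' * | cut -d\" -f 2 | sort -u | sed -e 's/.*/http:\/\/events.londonopenhouse.org\/&/' > buildings.txt
# 3. Download every page listed in buildings.txt into the current directory,
#    one file per building, named by its id. (This step is not shown in the
#    original; the script below assumes those files exist.)
# 4. Install the HTML parser:
#      gem install nokogiri
# 5. Run this script to produce GeoJSON:
#      ruby this_script.rb > loh.json
# 6. Optionally convert the GeoJSON to KML:
#      ogr2ogr -f KML loh.kml loh.json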
require 'nokogiri'
require 'json'

DETAILED = false # Google Maps complains if the KML gets too big

features = []

# Each saved building page is a file whose name is the building id.
Dir.glob("[0-9-]*").each do |f|
  doc = Nokogiri::HTML(File.read(f))

  # Drop a leading "-" from the file name to recover the building id.
  id = f.sub(/\A-/, '')
  url = "http://events.londonopenhouse.org/building/#{id}"

  # The page embeds its coordinates in a Bing Maps call:
  #   new Microsoft.Maps.Location(lat, lng)
  # If there is no match, lat and lng are nil and become 0.0 below.
  _, lat, lng = doc.to_s.match(/new Microsoft\.Maps\.Location\(([^,]*),([^)]*)\)/).to_a
  lat = lat.to_f
  lng = lng.to_f

  description =
    if DETAILED
      doc.css("#body").children.to_s.strip
    else
      doc.at_css("address").children.to_s.strip + "<br><a href='#{url}'>#{url}</a>"
    end

  properties = {
    'name' => doc.title,
    'description' => description,
    'href' => url
  }

  if DETAILED
    # Copy every label/value row of the venue table into the properties.
    doc.css("fieldset.venueView > table > tr").each do |kv|
      label_node = kv.at_css("td label")
      next unless label_node # skip rows without a label

      label = label_node.children.to_s.strip
      value = kv.css("td")[1].children.to_s.strip
      if label == "Event Types"
        # Event types are links; keep only the link text.
        value = kv.css("td")[1].children.css("a").children.to_s.strip
      end
      properties[label] = value
    end
  end

  feature = {
    'type' => "Feature",
    'id' => url,
    'properties' => properties,
    'geometry' => {
      'type' => "Point",
      'coordinates' => [lng, lat] # GeoJSON order is [longitude, latitude]
    }
  }

  # Skip pages whose coordinates are missing or could not be parsed.
  features << feature unless lat.zero? || lng.zero?
end

data = {
  'type' => "FeatureCollection",
  'features' => features
}

puts JSON.generate(data)
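
The coordinate scraping and feature building above can also be tried in isolation, without any downloaded pages. The following is only a minimal sketch: `extract_location`, `point_feature`, the stricter number-only pattern, and the sample HTML string are all hypothetical and not part of the original script.

```ruby
require 'json'

# Hypothetical, stricter variant of the pattern used by the script:
# it only accepts plain decimal numbers for latitude and longitude.
LOCATION_RE = /new Microsoft\.Maps\.Location\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)/

# Returns [lat, lng] as floats, or nil when the page has no map call.
def extract_location(html)
  m = LOCATION_RE.match(html)
  m && [m[1].to_f, m[2].to_f]
end

# Builds a GeoJSON Point feature; positions are [longitude, latitude].
def point_feature(url, name, lat, lng)
  {
    'type' => 'Feature',
    'id' => url,
    'properties' => { 'name' => name, 'href' => url },
    'geometry' => { 'type' => 'Point', 'coordinates' => [lng, lat] }
  }
end

# Made-up snippet standing in for a saved building page.
sample = 'var loc = new Microsoft.Maps.Location(51.5074, -0.1278);'
lat, lng = extract_location(sample)
puts JSON.generate(point_feature('http://example.org/building/1', 'Example venue', lat, lng))
```

Returning `nil` for a page without a map call makes the "no coordinates" case explicit, instead of relying on `nil.to_f` turning into `0.0` and filtering on zero afterwards.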