@mduvall
Created May 14, 2012 20:57
Scrape a bunch of idioms, or something.
require "nokogiri"
require "open-uri"

# Scrape idioms and their definitions, grouped by first letter.
def scrape_all_the_idioms
  url = "http://www.learnenglishfeelgood.com/americanidioms/lefgidioms"
  idioms = {}
  # The range starts at "b", as in the original; the "a" page is not fetched here.
  ("b".."z").each do |letter|
    idioms[letter] = {}
    doc = Nokogiri::HTML(URI.open("#{url}_#{letter}.html"))
    # Definitions are bare text nodes under #content; the first one that
    # belongs to an idiom is assumed to be text node 3.
    idiom_definition = 3
    doc.css("#content .blue").each do |node|
      # strip (not strip!) so an already-trimmed string does not become nil
      definition = doc.xpath("//*[@id='content']/text()[#{idiom_definition}]").text.strip
      idioms[letter][node.content] = definition
      puts "#{node.content} #{definition}"
      idiom_definition += 1
    end
  end
  idioms
end
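
A note on cleaning scraped text with `strip` versus `strip!`: `String#strip!` trims in place and returns `nil` when there was nothing to trim, so using its return value as a hash value or in string concatenation can fail on definitions that happen to have no surrounding whitespace. The sketch below uses a hypothetical helper, `clean_definition` (not part of the gist), and made-up sample strings to show the difference.

```ruby
# Hypothetical helper: always returns a trimmed String, never nil.
def clean_definition(text)
  text.to_s.strip
end

padded = "  to be very easy \n"  # sample text with surrounding whitespace
tidy   = "to be very easy"       # sample text that is already trimmed

clean_definition(padded)  # returns a trimmed copy
clean_definition(tidy)    # returns the same text, still a String
tidy.dup.strip!           # returns nil, because nothing was stripped
```

With the non-bang form, every scraped definition ends up as a string, which keeps `puts` and string concatenation safe.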