@dalibor
Created December 29, 2011 12:53
Ruby simple HTML parser
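require 'nokogiri'
require 'open-uri'

# Fetches a page and extracts the text of all elements matching a selector.
class HtmlParser
  attr_accessor :url, :selector

  def initialize(url, selector)
    @url = url
    @selector = selector
  end

  # Returns the text of every matching element, cleaned and joined with spaces.
  def content
    # URI.open comes from open-uri; bare Kernel#open no longer opens URLs in Ruby 3.0+.
    doc = Nokogiri::HTML(URI.open(url))
    html_elements = doc.search(selector)
    html_elements.map { |element| clean_whitespace(element.text) }.join(' ')
  end

  private

  # Replaces whitespace runs, tabs, and newlines with single spaces, then trims the ends.
  def clean_whitespace(text)
    text.gsub(/\s{2,}|\t|\n/, ' ').strip
  end
end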
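
The whitespace cleanup is the one step of this class that can be tried without Nokogiri or network access. The sketch below copies the `clean_whitespace` regex into a standalone method and runs it on a few sample strings. The sample strings and the `samples` and `cleaned` names are illustrative assumptions, not part of the gist.

```ruby
# Standalone copy of the gist's whitespace-cleaning step (same regex),
# defined at top level here only so it can run without the HtmlParser class.
def clean_whitespace(text)
  text.gsub(/\s{2,}|\t|\n/, ' ').strip
end

# Hypothetical inputs resembling text pulled from indented HTML.
samples = [
  "  Hello,\n   world!  ",
  "one\ttwo",
  "a  b   c"
]

cleaned = samples.map { |s| clean_whitespace(s) }
puts cleaned.inspect

# HtmlParser#content joins the cleaned element texts with single spaces.
puts cleaned.join(' ')
```

Each run of two or more whitespace characters, and each single tab or newline, becomes one space before the ends are trimmed, so the joined result has no stray indentation from the page markup.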