@rklemme
Created October 19, 2010 20:11
Extract piece of an HTML fragment with Nokogiri
#!/usr/bin/env ruby
require 'nokogiri'
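# Minimal entity table; entities not listed here are left untouched.
# Nokogiri should have this as well
REPL = {
  '&lt;' => '<',
  '&le;' => '<=',
  '&gt;' => '>',
  '&ge;' => '>=',
  '&amp;' => '&',
}

def REPL.unescape(s)
  # Single pass over the string: replace each known entity, keep unknown ones as-is.
  # The parentheses avoid Ruby's "ambiguous first argument" warning for the regexp.
  s.gsub(/&\w+;/) do |m|
    self[m] || m
  end
end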
str = "&lt;p&gt;\n &lt;strong&gt;Location:&lt;/strong&gt; Columbus, Ohio\n&lt;/p&gt;\n\n&lt;"
s2 = REPL.unescape(str)
# p s2
doc = Nokogiri::HTML(s2)
puts doc
# p doc
puts 'elements'
# Select the text nodes that are direct children of a <p> containing a <strong>.
doc.xpath('//p[strong]/text()').each do |elm|
  p elm
end
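
As a possible alternative to the hand-written REPL table, Ruby's standard library already ships a decoder for the basic entities. The sketch below is an illustration, not part of the original gist: it assumes `CGI.unescapeHTML` is acceptable for the input, and the variable names `escaped` and `decoded` are made up for the example. It decodes `&lt;`, `&gt;`, `&amp;`, `&quot;` and numeric references, but leaves other named entities such as `&le;` untouched, so the custom table may still be needed for those.

```ruby
require 'cgi'

# Same kind of escaped fragment as in the gist, shortened for illustration.
escaped = "&lt;p&gt;&lt;strong&gt;Location:&lt;/strong&gt; Columbus, Ohio&lt;/p&gt;"

# Decode the basic HTML entities in one call.
decoded = CGI.unescapeHTML(escaped)
puts decoded

# Named entities outside the basic set (here &le;) pass through unchanged.
puts CGI.unescapeHTML("a &le; b &amp; c")
```

The decoded string can then be handed to Nokogiri in the same way as `s2` above.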