Extract a piece of an HTML fragment with Nokogiri
#!/usr/bin/env ruby
require 'nokogiri'
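# Minimal entity table. Nokogiri should arguably offer an unescape helper itself.
# Note: '&le;' and '&ge;' are mapped to ASCII approximations, not to the
# actual characters they name.
REPL = {
  '&lt;'  => '<',
  '&le;'  => '<=',
  '&gt;'  => '>',
  '&ge;'  => '>=',
  '&amp;' => '&',
}

# Replace every known entity in s in a single pass;
# entities missing from the table are left untouched.
def REPL.unescape(s)
  s.gsub(/&\w+;/) do |m|
    self[m] || m
  end
end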
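# Sample input: an HTML fragment whose markup has been entity-escaped.
str = "&lt;p&gt;\n &lt;strong&gt;Location:&lt;/strong&gt; Columbus, Ohio\n&lt;/p&gt;\n\n&lt;"
s2 = REPL.unescape(str)
# p s2
doc = Nokogiri::HTML(s2)
puts doc
# p doc
puts 'elements'
# Select the text nodes that are direct children of any <p> containing a <strong>.
doc.xpath('//p[strong]/text()').each do |elm|
  p elm
end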
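
The hand-rolled REPL table above can probably be replaced by a helper that ships with Ruby. The sketch below assumes `CGI.unescapeHTML` from the standard library is an acceptable substitute; it is not part of the original gist, and the `escaped` and `html` names are made up for illustration. It decodes `&lt;`, `&gt;`, `&amp;`, `&quot;` and numeric references, but leaves other named entities such as `&le;` untouched, so it is not a drop-in match for the table.

```ruby
require 'cgi'

# Same shape of input as in the gist: HTML whose markup has been entity-escaped.
escaped = "&lt;p&gt;\n &lt;strong&gt;Location:&lt;/strong&gt; Columbus, Ohio\n&lt;/p&gt;"

# Assumption: the standard-library helper stands in for REPL.unescape.
html = CGI.unescapeHTML(escaped)
puts html

# Named entities outside the small built-in set are passed through unchanged.
puts CGI.unescapeHTML('a &le; b &amp; c')   # prints: a &le; b & c
```

The resulting `html` string can then be handed to Nokogiri exactly as `s2` is in the script above.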