@singpolyma
Created November 20, 2015 19:52
Example algorithm to textify HTML of tweetish posts
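```ruby
require 'nokogiri'

# Flatten a list of Nokogiri nodes into plain text, keeping URLs visible.
def content_text(nodes)
  nodes.map do |el|
    if el.text? ||
       el['class'].to_s.match?(/\b(?:h-card|vcard|h-x-username)\b/) ||
       el['rel'].to_s.match?(/\btag\b/)
      # Plain text, mentions (h-card / vcard / h-x-username) and rel="tag"
      # links are reduced to their visible text, with no URL appended.
      el.text
    elsif el.name == 'a'
      href = el['href'].to_s
      if el.text.strip.empty?
        # Links with no visible text are dropped.
        ''
      elsif el.text.match?(/[\/.]/) &&
            href.gsub(/[^\w\-\/]/, '').include?(el.text.gsub(/[^\w\-\/]/, ''))
        # The link text looks like a (possibly shortened) form of the href,
        # so print the full URL instead of the display text.
        href
      else
        # Ordinary link text: keep it and append the target URL.
        "#{el.text} #{href}"
      end
    elsif el.name == 'img'
      # Images become their source URL.
      el['src'].to_s
    else
      # Any other element: recurse into its children.
      content_text(el.children)
    end
  end.join
end

# `item` comes from the surrounding code (e.g. a feed entry) and is assumed
# to hold the post's HTML under :content.
content_text(Nokogiri::HTML.fragment(item[:content].to_s).children)
```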
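The least obvious step is how `<a>` elements are handled. The sketch below pulls that rule out into a standalone method so it can be tried without Nokogiri. The method name `anchor_to_text`, the constant `NON_URL_CHARS` and the sample URLs are hypothetical; the method only mirrors the `<a>` branches of `content_text` and is not part of the original gist.

```ruby
# Hypothetical helper mirroring the <a> branch of content_text.
# Characters dropped before comparing link text with the href: anything
# that is not a word character, "-" or "/" (Ruby's \w already covers "_").
NON_URL_CHARS = /[^\w\-\/]/

def anchor_to_text(text, href)
  if text.strip.empty?
    # Links with no visible text contribute nothing.
    ''
  elsif text.match?(/[\/.]/) &&
        href.gsub(NON_URL_CHARS, '').include?(text.gsub(NON_URL_CHARS, ''))
    # Visible text looks like a URL contained in the real href
    # (e.g. a truncated display URL), so emit the full href.
    href
  else
    # Ordinary anchor text: keep it and append the target URL.
    "#{text} #{href}"
  end
end

puts anchor_to_text("example.com/foo...", "https://example.com/foo/bar")
puts anchor_to_text("my post", "https://example.com/p/1")
```

With these inputs, the shortened display text `example.com/foo...` is replaced by the full `https://example.com/foo/bar`, while ordinary anchor text such as `my post` is kept and followed by its URL. Stripping punctuation before the comparison is what lets `example.com/foo...` match inside `https://example.com/foo/bar` despite the scheme, dots and trailing ellipsis.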