@kch
Created April 21, 2009 09:10
my super duper html stripper regexp
class String
  # Removes HTML/XML-style tags from the receiver in place and returns self.
  def strip_html!
    gsub!(/
      <
      \/?                       # optional slash marking an end tag
      ([\w:-]+)                 # tag name (capturing)
      (?:                       # optional attribute set (allowed even for end tags)
        (?:                     # group for attribute repetition
          \s+                   # mandatory space before each attribute
          [\w:-]+               # attribute name
          (?:                   # optional attribute value
            \s*=\s*             # optionally space-wrapped equal sign
            (?:                 # attribute value group (for |)
              '[^']*' |         # either a single quoted attribute '#happy color coding
              "[^"]*" |         # or a double quoted attribute "#happy color coding
              [^\s>]+           # or an unquoted value with no spaces and no tag end
            )                   # end attribute value group
          )?                    # attr value is optional
        )*                      # can have zero or more attributes
      )?                        # may not have attributes at all
      \s*                       # optional trailing spaces
      \/?                       # optional slash of a self-closing empty tag
      >                         # closing bracket of the tag
    /ix, '')
    self                        # gsub! returns nil when nothing matched, so return self explicitly
  end

  # Non-destructive variant: returns a stripped copy and leaves the receiver untouched.
  def strip_html
    dup.strip_html!
  end
end
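
A quick way to see what the pattern does and does not match is to run it on a few inputs. The sketch below is not part of the gist: `TAG_PATTERN` and `strip_tags` are hypothetical names, and the pattern is a compact, non-capturing rewrite of the regexp above that is assumed to be equivalent for these inputs. It uses a plain method instead of reopening `String` so that it runs on its own.

```ruby
# Hypothetical names (not from the gist): TAG_PATTERN and strip_tags.
# Compact form of the gist's pattern, without the capture group or the /ix flags.
TAG_PATTERN = /<\/?[\w:-]+(?:\s+[\w:-]+(?:\s*=\s*(?:'[^']*'|"[^"]*"|[^\s>]+))?)*\s*\/?>/

# Returns a copy of text with every tag-shaped match removed.
def strip_tags(text)
  text.gsub(TAG_PATTERN, '')
end

# Ordinary nested tags with a quoted attribute.
puts strip_tags('<p class="intro">Hello, <b>world</b>!</p>')   # prints: Hello, world!

# A ">" inside a quoted attribute does not end the tag; unquoted values
# and self-closing tags are handled as well.
puts strip_tags('<a title="a > b" href=x.html>link</a><br />') # prints: link

# Bare comparison operators are not tag-shaped, so the text is left alone.
puts strip_tags('1 < 2 and 3 > 2')                             # prints: 1 < 2 and 3 > 2

# Only tags are removed: comments and the text inside <script> stay behind.
puts strip_tags('<!-- note --><script>alert(1)</script>')      # prints: <!-- note -->alert(1)
```

As the last case shows, the pattern strips markup but does not sanitize content: comments, doctype declarations and the bodies of `script` or `style` elements pass through unchanged, so it should not be relied on as a security filter.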