The following benchmark output was generated by the code at http://github.com/flavorjones/loofah/tree/master/benchmark
These results show the performance of Loofah scrubbing methods against comparable methods from other common open-source libraries:
- ActionView sanitize() and strip_tags()
- Sanitize sanitize()
- HTML5lib sanitize()
- HtmlFilter filter()
HTML of various sizes is tested:
- a large document (~98 KB)
- a sizable fragment (~3 KB)
- a small snippet (58 bytes)
Against ActionView sanitize(), Loofah wins by about 20% on large documents and fragments, but loses on small snippets.
Loofah is comparatively slow on small snippets because Nokogiri uses libxml2, which incurs a constant "startup overhead" before parsing HTML of any size. ActionPack's regular expressions have no such startup overhead.
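To see why a constant startup cost only matters for small inputs, here is a toy cost model in Ruby. The constants are invented for illustration and are not measurements of Loofah, Nokogiri, or ActionView; the method names are hypothetical. It only shows how a fixed overhead can dominate at 58 bytes and become negligible at ~98 KB.

```ruby
# Toy cost model: total time = fixed startup cost + per-byte cost * input size.
# All numbers below are made up for illustration; they are NOT benchmark results.
def modeled_time(bytes, startup:, per_byte:)
  startup + per_byte * bytes
end

# Compares a hypothetical parser-based scrubber (fixed startup cost, cheap
# per byte) with a hypothetical regex-based one (no startup cost, costlier
# per byte) and reports which is faster under this model.
def faster_approach(bytes)
  parser = modeled_time(bytes, startup: 50.0, per_byte: 0.01)
  regex  = modeled_time(bytes, startup: 0.0,  per_byte: 0.05)
  parser < regex ? :parser : :regex
end

# Sizes roughly matching the three inputs described above.
{ snippet: 58, fragment: 3_000, document: 98_000 }.each do |name, bytes|
  puts "#{name} (#{bytes} bytes): #{faster_approach(bytes)} is faster in this model"
end
```

With these invented constants the break-even point sits at 1,250 bytes: below it the regex approach is ahead, above it the parser is.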
The win for ActionView on small snippets comes at a cost, though. From the ActionView comments:
Please note that sanitizing user-provided text [with ActionView]
does not guarantee that the resulting markup is valid (conforming
to a document type) or even well-formed. The output may still
contain e.g. unescaped '<', '>', '&' characters and confuse
browsers.
Loofah will always generate well-formed and valid HTML with proper encoding and escaping. Something to keep in mind when choosing a sanitizing library. Just sayin'.
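A minimal Ruby sketch of the problem the ActionView comment describes. The `naive_strip_tags` helper is a hypothetical stand-in for regex-based tag removal, not ActionView's actual implementation; it uses only the standard library.

```ruby
require "cgi"

# Hypothetical regex-based tag stripper: removes anything that looks like a
# tag, but does nothing about stray '<' or '&' characters in the text.
def naive_strip_tags(html)
  html.gsub(/<[^>]+>/, "")
end

input    = "<b>AT&T</b> says 1 < 2"
stripped = naive_strip_tags(input)

puts stripped                  # tags are gone, but '&' and '<' are unescaped
puts CGI.escapeHTML(stripped)  # what properly escaped text would look like
```

The stripped string still contains a bare `&` and `<`, so it is not safe to treat as well-formed markup without a separate escaping step.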
Against ActionView strip_tags(), Loofah wins by between 60% and 100% on large documents and fragments, but loses again on small snippets.
See previous section for explanation and commentary.
Against Sanitize sanitize(), Loofah wins on HTML of all sizes, by between 13% and 280%.
Against HTML5lib sanitize(), Loofah wins on HTML of all sizes, by between 300% and 1450%.
Yes. Not a typo. REXML is that slow.
Against HtmlFilter filter(), Loofah wins by a factor of two on large and medium documents, but loses on small snippets.
HtmlFilter also uses regular expressions and hence cannot guarantee that the output markup is well-formed or valid.
Here's a more up-to-date comparison between Loofah, Sanitize, and HtmlFilter: https://github.com/rgrove/sanitize/blob/master/COMPARISON.md#performance-comparison