@flavorjones
Created August 19, 2009 05:46
Comparison of Loofah against other Ruby HTML sanitization libraries

Overview of the Benchmark

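The following benchmark output was generated from the code at http://github.com/flavorjones/loofah/tree/master/benchmark. In each table, "total" is the time for all iterations, "single" is the time for one iteration (total divided by the iteration count shown as xN), and "rel" is the competitor's total divided by Loofah's total, so values above 1.00x mean Loofah was faster.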

These results show the performance of Loofah scrubbing methods against comparable methods from other common open-source libraries:

  • ActionView sanitize() and strip_tags()
  • Sanitize sanitize()
  • HTML5lib sanitize()
  • HtmlFilter filter()

HTML of various sizes is tested:

  • a large document (~98 KB)
  • a sizable fragment (~3 KB)
  • a small snippet (58 bytes)
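Each head-to-head table below reports, per input size, the total time for N iterations, the time per iteration, and a relative factor. A minimal sketch of such a harness using Ruby's standard `Benchmark` module might look like the following. It is not the code from the linked benchmark directory: the `head_to_head` helper is hypothetical, and because Loofah and its competitors are gems, two stdlib-only transformations stand in for them so the sketch runs without extra dependencies.

```ruby
require "benchmark"
require "erb"

# Hypothetical stand-ins for the libraries under test. The first entry is the
# baseline (Loofah's role in the real benchmark); the rest are competitors.
CANDIDATES = {
  "html_escape (baseline)" => ->(html) { ERB::Util.html_escape(html) },
  "regex strip"            => ->(html) { html.gsub(/<[^>]*>/, "") }
}.freeze

# Runs every candidate `iterations` times on `html` and prints the total time,
# the time per call, and each competitor's total divided by the baseline's.
def head_to_head(label, html, iterations)
  rows = CANDIDATES.map do |name, fn|
    total = Benchmark.realtime { iterations.times { fn.call(html) } }
    { name: name, total: total, single: total / iterations }
  end

  baseline = rows.first[:total]
  rows.drop(1).each { |row| row[:rel] = row[:total] / baseline }

  puts "#{label}, #{html.bytesize} bytes (x#{iterations})"
  rows.each do |row|
    rel = row[:rel] ? format("%.2fx", row[:rel]) : "-"
    printf("  %-24s %8.3f (%.6f) %s\n", row[:name], row[:total], row[:single], rel)
  end
  rows
end

snippet = "<p>Hello <b>world</b> & <i>friends</i></p>"
results = head_to_head("Text snippet", snippet, 10_000)
```

Replacing the lambdas in `CANDIDATES` with real library calls (a Loofah scrub as the baseline, a competitor's sanitize method after it) would reproduce the layout of the tables below.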

Head to Head against ActionView sanitize()

Loofah wins on large documents (1.26x) and edges ahead on fragments (1.02x), but loses on small snippets, where ActionView needs only about a quarter of Loofah's time (0.27x).

Loofah is comparatively slow on small snippets because Nokogiri uses libxml2, which incurs a constant "startup overhead" before parsing HTML of any size. ActionPack's regular expressions have no such overhead.
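One way to see the startup-overhead argument in the numbers is to fit a straight line, time = overhead + per_byte * size, through each library's per-call times for the 58-byte snippet and the 98,282-byte document in the benchmark output below. This is a rough two-point sketch that assumes time grows linearly with input size and that the "single" column is in seconds; `fit_linear` is a hypothetical helper, and the result is an illustration, not a measurement.

```ruby
# Sizes (bytes) and per-call times ("single" column, presumably seconds)
# copied from the benchmark output for the snippet and the large document.
SIZES = { snippet: 58, large: 98_282 }.freeze

TIMINGS = {
  "Loofah::Helpers.sanitize" => { snippet: 0.000427, large: 0.170191 },
  "ActionView sanitize"      => { snippet: 0.000117, large: 0.215252 }
}.freeze

# Two-point linear fit: time(n) = overhead + per_byte * n.
# Returns [overhead, per_byte]; a crude model assumed for illustration.
def fit_linear(timing)
  per_byte = (timing[:large] - timing[:snippet]) / (SIZES[:large] - SIZES[:snippet])
  overhead = timing[:snippet] - per_byte * SIZES[:snippet]
  [overhead, per_byte]
end

fits = TIMINGS.transform_values { |timing| fit_linear(timing) }

fits.each do |name, (overhead, per_byte)|
  printf("%-26s overhead %10.6f s  per-byte %.3e s\n", name, overhead, per_byte)
end

# Input size at which the two fitted lines cross: below it the library with
# the smaller overhead is faster, above it the one with the smaller per-byte cost.
loofah_overhead, loofah_rate = fits["Loofah::Helpers.sanitize"]
av_overhead, av_rate = fits["ActionView sanitize"]
crossover = (loofah_overhead - av_overhead) / (av_rate - loofah_rate)
puts "estimated crossover: #{crossover.round} bytes"
```

With these numbers Loofah's fitted overhead comes out at a few tenths of a millisecond while ActionView's is essentially zero (it even dips slightly below zero, a sign of how crude a two-point model is), and the two lines cross well below the 3 KB fragment size.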

The win for ActionView on small snippets comes at a cost, though. From the ActionView comments:

Please note that sanitizing user-provided text [with ActionView]
does not guarantee that the resulting markup is valid (conforming
to a document type) or even well-formed.  The output may still
contain e.g. unescaped '<', '>', '&' characters and confuse
browsers.

Loofah will always generate well-formed and valid HTML with proper encoding and escaping. Something to keep in mind when choosing a sanitizing library. Just sayin'.
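The problem the ActionView comment describes is easy to reproduce. The sketch below uses a hypothetical `naive_strip_tags` helper (a single regular expression, not ActionView's or HtmlFilter's actual code) and shows that removing tags alone leaves bare `&` and `>` characters in the output. Escaping the remaining text, here with Ruby's standard `ERB::Util.html_escape` as a stand-in for what an HTML serializer does to text nodes, yields properly encoded output.

```ruby
require "erb"

# Hypothetical regex-based tag stripper, for illustration only: it deletes
# anything shaped like <...> and leaves every other character untouched.
def naive_strip_tags(html)
  html.gsub(/<[^>]*>/, "")
end

input    = "Tom & Jerry <i>say</i> 3 > 2"
stripped = naive_strip_tags(input)         # still contains a bare "&" and ">"
escaped  = ERB::Util.html_escape(stripped) # "&" -> "&amp;", ">" -> "&gt;"

puts stripped # => Tom & Jerry say 3 > 2
puts escaped  # => Tom &amp; Jerry say 3 &gt; 2
```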

Head to Head against ActionView strip_tags()

Loofah wins by a factor of 1.82x to 1.92x on large documents and fragments, but loses again on small snippets (0.45x).

See previous section for explanation and commentary.

Head to Head against Sanitize sanitize()

Loofah wins on HTML of all sizes, from 1.02x on small snippets to 2.79x on large documents.

Head to Head against HTML5lib sanitize()

Loofah wins on HTML of all sizes, from 3.00x on small snippets to 14.50x on large documents.

Yes. Not a typo. REXML is that slow.

Head to Head against HtmlFilter filter()

Loofah wins by a factor of about two on large documents and fragments (2.10x and 2.36x), but loses on small snippets (0.49x).

HtmlFilter also uses regular expressions and hence cannot guarantee that the output markup is well-formed or valid.

Nokogiri version: {"warnings"=>[], "libxml"=>{"loaded"=>"2.7.5", "binding"=>"extension", "compiled"=>"2.7.5"}, "nokogiri"=>"1.4.0"}
Loofah version: "0.4.1"
---------- rehearsal ----------
(... omitted for brevity ...)
---------- realsies ----------
HeadToHeadRailsSanitize
  Large document, 98282 bytes (x100)
                                   total    single       rel
    Loofah::Helpers.sanitize      17.019  (0.170191)       -
    ActionView sanitize           21.525  (0.215252)   1.26x
  Small fragment, 3178 bytes (x1000)
                                   total    single       rel
    Loofah::Helpers.sanitize       5.559  (0.005559)       -
    ActionView sanitize            5.653  (0.005653)   1.02x
  Text snippet, 58 bytes (x10000)
                                   total    single       rel
    Loofah::Helpers.sanitize       4.272  (0.000427)       -
    ActionView sanitize            1.170  (0.000117)   0.27x
HeadToHeadRailsStripTags
  Large document, 98282 bytes (x100)
                                   total    single       rel
    Loofah::Helpers.strip_tags     8.019  (0.080195)       -
    ActionView strip_tags         14.615  (0.146151)   1.82x
  Small fragment, 3178 bytes (x1000)
                                   total    single       rel
    Loofah::Helpers.strip_tags     2.197  (0.002197)       -
    ActionView strip_tags          4.220  (0.004220)   1.92x
  Text snippet, 58 bytes (x10000)
                                   total    single       rel
    Loofah::Helpers.strip_tags     2.070  (0.000207)       -
    ActionView strip_tags          0.931  (0.000093)   0.45x
HeadToHeadSanitizerSanitize
  Large document, 98282 bytes (x100)
                                   total    single       rel
    Loofah :strip                  9.919  (0.099188)       -
    Sanitize.clean                27.625  (0.276255)   2.79x
  Small fragment, 3178 bytes (x1000)
                                   total    single       rel
    Loofah :strip                  5.317  (0.005317)       -
    Sanitize.clean                 5.811  (0.005811)   1.09x
  Text snippet, 58 bytes (x10000)
                                   total    single       rel
    Loofah :strip                  4.156  (0.000416)       -
    Sanitize.clean                 4.235  (0.000423)   1.02x
HeadToHeadHtml5LibSanitize
  Large document, 98282 bytes (x100)
                                   total    single       rel
    Loofah :escape                 8.643  (0.086426)       -
    HTML5lib.sanitize            125.315  (1.253149)  14.50x
  Small fragment, 3178 bytes (x1000)
                                   total    single       rel
    Loofah :escape                 4.715  (0.004715)       -
    HTML5lib.sanitize             36.438  (0.036438)   7.73x
  Text snippet, 58 bytes (x10000)
                                   total    single       rel
    Loofah :escape                 3.881  (0.000388)       -
    HTML5lib.sanitize             11.641  (0.001164)   3.00x
HeadToHeadHTMLFilter
  Large document, 98282 bytes (x100)
                                   total    single       rel
    Loofah::Helpers.sanitize      15.579  (0.155785)       -
    HTMLFilter.filter             32.654  (0.326540)   2.10x
  Small fragment, 3178 bytes (x1000)
                                   total    single       rel
    Loofah::Helpers.sanitize       5.097  (0.005097)       -
    HTMLFilter.filter             12.034  (0.012034)   2.36x
  Text snippet, 58 bytes (x10000)
                                   total    single       rel
    Loofah::Helpers.sanitize       3.822  (0.000382)       -
    HTMLFilter.filter              1.876  (0.000188)   0.49x
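The "rel" column is the competitor's total time divided by Loofah's total time for the same input. The short script below recomputes it from the totals in the output above and also converts each factor into "percent faster", (rel - 1) x 100, so that 1.26x reads as 26% faster and 14.50x as 1350% faster. It is a sketch: the `ROWS` array is transcribed by hand from the output, and the labels are shortened section and size names.

```ruby
# Total times for all iterations (presumably seconds), copied from the output:
# [comparison, size, Loofah total, competitor total]
ROWS = [
  ["ActionView sanitize",   "large",    17.019,  21.525],
  ["ActionView sanitize",   "fragment",  5.559,   5.653],
  ["ActionView sanitize",   "snippet",   4.272,   1.170],
  ["ActionView strip_tags", "large",     8.019,  14.615],
  ["ActionView strip_tags", "fragment",  2.197,   4.220],
  ["ActionView strip_tags", "snippet",   2.070,   0.931],
  ["Sanitize.clean",        "large",     9.919,  27.625],
  ["Sanitize.clean",        "fragment",  5.317,   5.811],
  ["Sanitize.clean",        "snippet",   4.156,   4.235],
  ["HTML5lib.sanitize",     "large",     8.643, 125.315],
  ["HTML5lib.sanitize",     "fragment",  4.715,  36.438],
  ["HTML5lib.sanitize",     "snippet",   3.881,  11.641],
  ["HTMLFilter.filter",     "large",    15.579,  32.654],
  ["HTMLFilter.filter",     "fragment",  5.097,  12.034],
  ["HTMLFilter.filter",     "snippet",   3.822,   1.876]
].freeze

# rel > 1.0 means Loofah finished first; (rel - 1) * 100 is "percent faster".
RESULTS = ROWS.map do |name, size, loofah_total, other_total|
  rel = other_total / loofah_total
  { name: name, size: size, rel: rel.round(2), pct: ((rel - 1) * 100).round }
end

RESULTS.each do |r|
  verdict = r[:rel] >= 1 ? "Loofah faster" : "Loofah slower"
  printf("%-22s %-9s %6.2fx %6d%%  %s\n", r[:name], r[:size], r[:rel], r[:pct], verdict)
end
```

Read this way, 1.02x on the Sanitize snippet is only a 2% margin, while 0.27x on the ActionView snippet means ActionView needed about a quarter of Loofah's time.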
rgrove commented Jun 23, 2014

Here's a more up-to-date comparison between Loofah, Sanitize, and HTMLFilter: https://github.com/rgrove/sanitize/blob/master/COMPARISON.md#performance-comparison
