flavorjones/README.markdown

## README.markdown

      
### Overview of the Benchmark

The following benchmark output was generated from the code at
http://github.com/flavorjones/loofah/tree/master/benchmark
These results show the performance of Loofah scrubbing methods against
comparable methods from other common open-source libraries:

- ActionView sanitize() and strip_tags()
- Sanitize sanitize()
- HTML5lib sanitize()
- HtmlFilter filter()

HTML of various sizes is tested:

- a large document (~98 KB)
- a sizable fragment (~3 KB)
- a small snippet (58 bytes)
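
For readers who want a feel for how a head-to-head timing like this can be structured, here is a minimal sketch using Ruby's standard `Benchmark` module. It is not the code from the Loofah benchmark directory: the `head_to_head` helper and the two stand-in tag strippers are hypothetical, chosen only so the example runs without any gems installed.

```ruby
require "benchmark"

# Hypothetical helper (not taken from the Loofah benchmark suite): run each
# callable `iterations` times against `html` and collect the same three
# figures the results report: total seconds, seconds per call, and time
# relative to the first contender.
def head_to_head(html, iterations, contenders)
  rows = contenders.map do |name, callable|
    total = Benchmark.realtime { iterations.times { callable.call(html) } }
    { name: name, total: total, single: total / iterations }
  end
  baseline = rows.first[:total]
  rows.each { |row| row[:rel] = row[:total] / baseline }
end

# Stand-in tag strippers so the sketch runs with the standard library alone.
contenders = {
  "gsub strip"  => ->(html) { html.gsub(/<[^>]*>/, "") },
  "split strip" => ->(html) { html.split(/<[^>]*>/).join },
}

snippet = "<p>Hello, <b>world</b>!</p>"
head_to_head(snippet, 10_000, contenders).each do |row|
  puts format("%-12s %8.3f (%.6f) %5.2fx", row[:name], row[:total], row[:single], row[:rel])
end
```

Swapping the stand-ins for real sanitizer calls, and repeating the run for each input size, would give output in roughly the shape of the results listed in results.txt.
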

### Head to Head against ActionView sanitize()

Loofah wins by about 26% on large documents and is roughly even on
fragments (about 2% faster), but loses on small snippets, where
ActionView takes roughly a quarter of the time.

Loofah's comparative slowness for small snippets is because Nokogiri
uses libxml2, which has a constant "startup overhead" that is incurred
before parsing HTML regardless of size. ActionPack's regular
expressions have no such startup overhead.
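
The size of that overhead can be roughly estimated from the per-call timings in results.txt. The sketch below assumes a deliberately crude linear model (time = overhead + per-byte cost × size) fitted through the snippet and fragment measurements for `Loofah::Helpers.sanitize`; the model and the `linear_fit` helper are assumptions made for illustration, not part of the benchmark.

```ruby
# Fit time = overhead + per_byte * bytes through two (bytes, seconds) points.
# Hypothetical helper; the linear model itself is an assumption.
def linear_fit(small, large)
  per_byte = (large[:seconds] - small[:seconds]) / (large[:bytes] - small[:bytes])
  overhead = small[:seconds] - per_byte * small[:bytes]
  [per_byte, overhead]
end

# Per-call timings for Loofah::Helpers.sanitize, copied from results.txt.
snippet  = { bytes: 58,   seconds: 0.000427 }
fragment = { bytes: 3178, seconds: 0.005559 }

per_byte, overhead = linear_fit(snippet, fragment)

puts format("per-byte cost:  %.2f microseconds", per_byte * 1e6)
puts format("fixed overhead: %.2f milliseconds", overhead * 1e3)
```

Under this model the fixed overhead comes out at about 0.33 ms per call, which on its own is larger than the 0.117 ms ActionView needs to sanitize the whole 58-byte snippet.
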

The win for ActionView on small snippets comes at a cost,
though. From the ActionView comments:

> Please note that sanitizing user-provided text [with ActionView]
> does not guarantee that the resulting markup is valid (conforming
> to a document type) or even well-formed.  The output may still
> contain e.g. unescaped '<', '>', '&' characters and confuse
> browsers.

Loofah will always generate well-formed and valid HTML with proper
encoding and escaping. Something to keep in mind when choosing a
sanitizing library. Just sayin'.

### Head to Head against ActionView strip_tags()

Loofah wins by roughly 80% to 90% on large documents and fragments, but loses
again on small snippets, where ActionView takes a bit under half the time.
See the previous section for explanation and commentary.

### Head to Head against Sanitize sanitize()

Loofah wins on HTML of all sizes, by margins ranging from about 2% on
small snippets to nearly a factor of three on large documents.

### Head to Head against HTML5lib sanitize()

Loofah wins on HTML of all sizes, by a factor of between 3 and 14.5.
Yes. Not a typo. REXML is that slow.

### Head to Head against HtmlFilter filter()

Loofah wins by a factor of two or more on large documents and fragments, but
loses on small snippets, where HtmlFilter takes about half the time.

HtmlFilter also uses regular expressions and hence cannot guarantee
that the output markup is well-formed or valid.
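
A note on reading the numbers: the `rel` column in results.txt is the other library's time divided by Loofah's time, so it is a ratio rather than a percentage. The small sketch below converts a few of those ratios into plain-language statements; the `describe_rel` helper is hypothetical and exists only for this illustration.

```ruby
# Turn a rel ratio (other library's time / Loofah's time) into a sentence.
# A rel of 1.26 means the other library took 26% longer than Loofah;
# a rel below 1 means Loofah was the slower one.
def describe_rel(rel)
  if rel >= 1
    format("Loofah faster: other library took %.0f%% longer", (rel - 1) * 100)
  else
    format("Loofah slower: took %.1fx as long as the other library", 1 / rel)
  end
end

# rel values copied from results.txt.
samples = {
  "ActionView sanitize, large document" => 1.26,
  "ActionView sanitize, text snippet"   => 0.27,
  "HTML5lib.sanitize, large document"   => 14.50,
}

samples.each { |label, rel| puts "#{label}: #{describe_rel(rel)}" }
```

Note that a ratio of 14.50x corresponds to the other library taking 1350% longer, not 1450%, which is why the factors are the safer figures to quote.
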

  
## results.txt
Nokogiri version: {"warnings"=>[], "libxml"=>{"loaded"=>"2.7.5", "binding"=>"extension", "compiled"=>"2.7.5"}, "nokogiri"=>"1.4.0"}
Loofah version: "0.4.1"
---------- rehearsal ----------
(... omitted for brevity ...)

---------- realsies ----------
HeadToHeadRailsSanitize
  Large document, 98282 bytes (x100)
                                   total    single    rel
        Loofah::Helpers.sanitize  17.019 (0.170191)     -
             ActionView sanitize  21.525 (0.215252)  1.26x

  Small fragment, 3178 bytes (x1000)
                                   total    single    rel
        Loofah::Helpers.sanitize   5.559 (0.005559)     -
             ActionView sanitize   5.653 (0.005653)  1.02x

  Text snippet, 58 bytes (x10000)
                                   total    single    rel
        Loofah::Helpers.sanitize   4.272 (0.000427)     -
             ActionView sanitize   1.170 (0.000117)  0.27x

HeadToHeadRailsStripTags
  Large document, 98282 bytes (x100)
                                   total    single    rel
      Loofah::Helpers.strip_tags   8.019 (0.080195)     -
           ActionView strip_tags  14.615 (0.146151)  1.82x

  Small fragment, 3178 bytes (x1000)
                                   total    single    rel
      Loofah::Helpers.strip_tags   2.197 (0.002197)     -
           ActionView strip_tags   4.220 (0.004220)  1.92x

  Text snippet, 58 bytes (x10000)
                                   total    single    rel
      Loofah::Helpers.strip_tags   2.070 (0.000207)     -
           ActionView strip_tags   0.931 (0.000093)  0.45x

HeadToHeadSanitizerSanitize
  Large document, 98282 bytes (x100)
                                   total    single    rel
                   Loofah :strip   9.919 (0.099188)     -
                  Sanitize.clean  27.625 (0.276255)  2.79x

  Small fragment, 3178 bytes (x1000)
                                   total    single    rel
                   Loofah :strip   5.317 (0.005317)     -
                  Sanitize.clean   5.811 (0.005811)  1.09x

  Text snippet, 58 bytes (x10000)
                                   total    single    rel
                   Loofah :strip   4.156 (0.000416)     -
                  Sanitize.clean   4.235 (0.000423)  1.02x

HeadToHeadHtml5LibSanitize
  Large document, 98282 bytes (x100)
                                   total    single    rel
                  Loofah :escape   8.643 (0.086426)     -
               HTML5lib.sanitize 125.315 (1.253149) 14.50x

  Small fragment, 3178 bytes (x1000)
                                   total    single    rel
                  Loofah :escape   4.715 (0.004715)     -
               HTML5lib.sanitize  36.438 (0.036438)  7.73x

  Text snippet, 58 bytes (x10000)
                                   total    single    rel
                  Loofah :escape   3.881 (0.000388)     -
               HTML5lib.sanitize  11.641 (0.001164)  3.00x

HeadToHeadHTMLFilter
  Large document, 98282 bytes (x100)
                                   total    single    rel
        Loofah::Helpers.sanitize  15.579 (0.155785)     -
               HTMLFilter.filter  32.654 (0.326540)  2.10x

  Small fragment, 3178 bytes (x1000)
                                   total    single    rel
        Loofah::Helpers.sanitize   5.097 (0.005097)     -
               HTMLFilter.filter  12.034 (0.012034)  2.36x

  Text snippet, 58 bytes (x10000)
                                   total    single    rel
        Loofah::Helpers.sanitize   3.822 (0.000382)     -
               HTMLFilter.filter   1.876 (0.000188)  0.49x