@flavorjones
Created November 17, 2008 18:25
For an html snippet 2374 bytes long ...
                      user     system      total        real
regex * 1000      0.160000   0.010000   0.170000 (  0.182207)
nokogiri * 1000   1.440000   0.060000   1.500000 (  1.537546)
hpricot * 1000    5.740000   0.650000   6.390000 (  6.401207)
it took an average of 0.0015 seconds for Nokogiri to parse and operate on an HTML snippet 2374 bytes long
it took an average of 0.0064 seconds for Hpricot to parse and operate on an HTML snippet 2374 bytes long
For an html snippet 97517 bytes long ...
                      user     system      total        real
regex * 10        0.100000   0.020000   0.120000 (  0.122117)
nokogiri * 10     0.310000   0.020000   0.330000 (  0.322290)
hpricot * 10      3.190000   0.300000   3.490000 (  3.502819)
it took an average of 0.0322 seconds for Nokogiri to parse and operate on an HTML snippet 97517 bytes long
it took an average of 0.3503 seconds for Hpricot to parse and operate on an HTML snippet 97517 bytes long
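
The reports above have the shape of Ruby's standard `Benchmark.bm` output, and each "average" line is the real time divided by the iteration count (for example, 1.537546 / 1000 ≈ 0.0015). The script that produced these numbers is not shown here, so the following is only a minimal sketch of how such a harness might look. The helper names `strip_tags_with_regex` and `average_seconds`, the tag-stripping regex, the tiny sample string, and the label width of 16 are all assumptions made for illustration, and only the regex case is included so that the sketch runs on the standard library alone.

```ruby
require "benchmark"

# Hypothetical stand-in for the "regex" operation: remove anything that
# looks like a tag. The real operation in the benchmark is not shown.
def strip_tags_with_regex(html)
  html.gsub(/<[^>]*>/, "")
end

# Run the given block `iterations` times and return the average
# wall-clock seconds per run (total real time / iterations).
def average_seconds(iterations)
  elapsed = Benchmark.realtime do
    iterations.times { yield }
  end
  elapsed / iterations
end

# Tiny sample input, not the 2374-byte snippet measured above.
html = "<p>Use it in <b>good</b> health! Carry on.</p>"
iterations = 1000

# Prints a user/system/total/real report like the ones above.
Benchmark.bm(16) do |report|
  report.report("regex * #{iterations}") do
    iterations.times { strip_tags_with_regex(html) }
  end
end

average = average_seconds(iterations) { strip_tags_with_regex(html) }
puts format(
  "it took an average of %.4f seconds for the regex to operate on an HTML snippet %d bytes long",
  average, html.bytesize
)
```

A parser-based case would be added as another `report.report` block inside the same `Benchmark.bm` call, so that all rows share one header and one label width.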
<p>Yesterday was a big day, and I nearly missed it, since I spent nearly all of the sunlight hours at the wheel of a car. Nine hours sitting on your butt is no way to ... oh wait, that's actually how I spend every day. Just usually not in a rental Hyundai. Never mind, I digress.
</p>
<p>It was a big day because <a href='http://nokogiri.rubyforge.org/nokogiri/'>Nokogiri</a> was released. I've spent quite a bit of time over the last couple of months working with <a href='http://tenderlovemaking.com/'>Aaron Patterson</a> (of <a href='http://rubyforge.org/projects/mechanize/'>Mechanize</a> fame) on this excellent library, and so I'm walking around, feeling satisfied.
</p>
<p>"What's Nokogiri?" Good question, I'm glad I asked it.
</p>
<p>Nokogiri is the best damn XML/HTML parsing library out there in Rubyland. What makes it so good? You can search by XPath. You can search by CSS. You can search by both XPath <i>and</i> CSS. Plus, it uses <a href='http://xmlsoft.org/'>libxml2</a> as the parsing engine, <a href='http://www.xml.com/pub/a/2007/05/09/xml-parser-benchmarks-part-1.html'>so it's fast</a>. But the best part is, it's got a dead-simple interface that we shamelessly lifted from <a href='http://code.whytheluckystiff.net/hpricot/'>Hpricot</a>, everyone's favorite delightful parser.
</p>
<p>I had big plans to do a series of posts with examples and benchmarks, but right now I'm in <a href='http://www.google.com/search?q=dst+hell'>DST Hell</a> and don't have the quality time to invest.
</p>
<p>So, as I am wont to do, I'm punting. Thankfully, Aaron was his usual prolific self, and has kindly provided lots of documentation and examples:
<ul>
<li><a href='http://tenderlovemaking.com/2008/10/30/nokogiri-is-released/'>Aaron's blog post</a>
<li><a href='http://nokogiri.rubyforge.org/nokogiri/'>Documentation (RDoc)</a>
<li><a href='http://github.com/tenderlove/nokogiri/wikis'>Nokogiri-the-Wiki</a>
<li><a href='http://rubyforge.org/projects/nokogiri'>Nokogiri on Rubyforge</a>
<li><a href='http://gist.github.com/18533'>Benchmarks</a>
<li><a href='http://github.com/tenderlove/nokogiri/'>Git repository</a>
</ul>
</p>
<p>Use it in good health! Carry on.</p>
<p>P.S. Please start following Aaron on <a href='http://twitter.com/tenderlove'>Twitter</a>. :)</p>
<object>dumb-object</object>
<embed>dumb-embed</embed>