@freshtonic
Last active December 25, 2015 12:09
Nokogiri <pre>, serialize() and whitespace failing spec
require 'spec_helper'
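def serialize(thing, format=:as_html)
  case thing
  when Nokogiri::HTML::Document
    thing.serialize(:encoding => 'UTF-8') do |config|
      # Set the FORMAT flag, then select the output mode (:as_html or :as_xml).
      config.format.send format
    end
  when Nokogiri::HTML::DocumentFragment
    # Fragments are serialized with their default options.
    thing.serialize
  end
end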
describe "Nokogiri significant whitespace preservation in <pre> and <code> blocks" do
  let(:html) do
"
<div>
<pre>
<code>
<img src='about:blank'>
<img src='about:blank'>
</code>
</pre>
</div>
"
  end

  let(:document) { Nokogiri::HTML html }
  let(:fragment) { Nokogiri::HTML::fragment html }

  # Note: despite the example descriptions, each expectation is that the two
  # <img> tags stay on separate lines (two lines of output matching /img/).
  context 'use a fragment (serialize with default args)' do
    it "should output the image tags on the same line" do
      serialize(fragment).lines.count{|l| l =~ /img/}.should == 2
    end
  end

  context 'use a document (serialize as XML)' do
    it "should output the image tags on the same line" do
      serialize(document, :as_xml).lines.count{|l| l =~ /img/}.should == 2
    end
  end

  context 'use a document (serialize as HTML)' do
    it "should output the image tags on the same line" do
      serialize(document, :as_html).lines.count{|l| l =~ /img/}.should == 2
    end
  end
end
FFF
Failures:
1) Nokogiri significant whitespace preservation in <pre> and <code> blocks use a document (serialize as XML) should output the image tags on the same line
Failure/Error: serialize(fragment, :as_xml).lines.count{|l| l =~ /img/}.should == 2
expected: 2
got: 1 (using ==)
# ./spec/ce/whitespace_spec.rb:41:in `block (3 levels) in <top (required)>'
2) Nokogiri significant whitespace preservation in <pre> and <code> blocks use a document (serialize as HTML) should output the image tags on the same line
Failure/Error: serialize(fragment, :as_html).lines.count{|l| l =~ /img/}.should == 2
expected: 2
got: 1 (using ==)
# ./spec/ce/whitespace_spec.rb:47:in `block (3 levels) in <top (required)>'
3) Nokogiri significant whitespace preservation in <pre> and <code> blocks use a fragment (serialize with default args) should output the image tags on the same line
Failure/Error: serialize(fragment).lines.count{|l| l =~ /img/}.should == 2
expected: 2
got: 1 (using ==)
# ./spec/ce/whitespace_spec.rb:35:in `block (3 levels) in <top (required)>'
Finished in 0.0541 seconds
3 examples, 3 failures
Failed examples:
rspec ./spec/ce/whitespace_spec.rb:40 # Nokogiri significant whitespace preservation in <pre> and <code> blocks use a document (serialize as XML) should output the image tags on the same line
rspec ./spec/ce/whitespace_spec.rb:46 # Nokogiri significant whitespace preservation in <pre> and <code> blocks use a document (serialize as HTML) should output the image tags on the same line
rspec ./spec/ce/whitespace_spec.rb:34 # Nokogiri significant whitespace preservation in <pre> and <code> blocks use a fragment (serialize with default args) should output the image tags on the same line
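Every failure above reports `got: 1` where 2 was expected. The assertion counts the lines of serialized output that contain `img`, so a count of 1 means both tags ended up on a single line. A minimal pure-Ruby sketch of that check follows. The helper name `img_line_count` and both sample strings are made up for illustration; they are not captured Nokogiri output.

```ruby
# Hypothetical helper mirroring the spec's assertion:
# count the lines of a string that mention an img tag.
def img_line_count(markup)
  markup.lines.count { |line| line =~ /img/ }
end

# Illustrative strings only; these are NOT captured Nokogiri output.
preserved = "<pre><code>\n<img src=\"about:blank\">\n<img src=\"about:blank\">\n</code></pre>\n"
collapsed = "<pre><code><img src=\"about:blank\"><img src=\"about:blank\"></code></pre>\n"

puts img_line_count(preserved) # => 2, each tag sits on its own line
puts img_line_count(collapsed) # => 1, both tags share one line
```

The `collapsed` case is the shape the spec is guarding against: the newline between the two tags is significant inside `<pre>`, and losing it changes how the block renders.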
@freshtonic
Author

It's weird how #serialize takes args and a block for a document but no args at all for a fragment. Stinky API.

@juliocesar

Maybe change the required libs so it says:

require 'rspec'
require 'nokogiri'

Needing Rails for this sucks :) I'll see about other people taking a look at this gist.

@juliocesar

Let me see if I can organise my thoughts here.

The immediate problem we're having is that Nokogiri adds newlines around tags, namely <span>, when outputting HTML, which fucks with <pre> tags. Once that's solved we'll be good, and I believe everything else will be OK too, since I distinctly recall the content extractor doing a good job bookmarking code from sites like css-tricks.com.

So am I correct in assuming that finding out how to turn off pretty-printing when outputting with Nokogiri would solve this?

@freshtonic
Author

No. The problem is tags with nothing but whitespace between them get put on one line.

@freshtonic
Author

Well, maybe it's a mixture of pretty-printing and stripping newlines. Not sure. I will continue on this tonight.
