Created
October 13, 2014 14:43
JRuby 1.7.15 & Nokogiri 1.6.3.1 (java) encoding problem?
[~/work/hypermicrodata]$ ruby -v
jruby 1.7.15 (1.9.3p392) 2014-09-03 82b5cc3 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_11-b12 +jit [darwin-x86_64]
[~/work/hypermicrodata]$ bundle exec gem list | grep nokogiri
nokogiri (1.6.3.1 java)
[~/work/hypermicrodata]$ cat test/data/example.html
<!doctype html>
<html>
<!-- shameless -->
<head>
<title>Jason Ronallo</title>
</head>
<body>
<span itemscope itemtype="http://schema.org/Person"
itemid="http://ronallo.com#me">
<a itemprop="url" href="http://twitter.com/ronallo">
<span itemprop="name">Jason Ronallo</span>
</a> is the
<span itemprop="jobTitle">Associate Head of Digital Library Initiatives</span> at
<span itemprop="affiliation" itemscope itemtype="http://schema.org/Library" itemid="http://lib.ncsu.edu">
<span itemprop="name">
<a itemprop="url" href="http://www.lib.ncsu.edu">NCSU Libraries</a>
</span>
</span>.
</span>
</body>
</html>
[~/work/hypermicrodata]$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.9.5
BuildVersion: 13F34
[~/work/hypermicrodata]$
irb(main):001:0> require 'nokogiri'
=> true
irb(main):002:0> Nokogiri.HTML(open('test/data/example.html'))
=> #<Nokogiri::HTML::Document:0xca0 name="document" children=[#<Nokogiri::XML::Element:0xc9e name="html" children=[#<Nokogiri::XML::Element:0xc9a name="head">, #<Nokogiri::XML::Element:0xc9c name="body">]>]>
irb(main):003:0> Nokogiri.HTML(open('test/data/example.html'), nil, 'UTF-8')
=> #<Nokogiri::HTML::Document:0xd02 name="document" children=[#<Nokogiri::XML::DTD:0xca2 name="html">, #<Nokogiri::XML::Element:0xd00 name="html" children=[#<Nokogiri::XML::Text:0xca4 "\n ">, #<Nokogiri::XML::Comment:0xca6 " shameless ">, #<Nokogiri::XML::Text:0xca8 "\n ">, #<Nokogiri::XML::Element:0xcb2 name="head" children=[#<Nokogiri::XML::Text:0xcaa "\n ">, #<Nokogiri::XML::Element:0xcae name="title" children=[#<Nokogiri::XML::Text:0xcac "Jason Ronallo">]>, #<Nokogiri::XML::Text:0xcb0 "\n ">]>, #<Nokogiri::XML::Text:0xcb4 "\n\n ">, #<Nokogiri::XML::Element:0xcfe name="body" children=[#<Nokogiri::XML::Text:0xcb6 "\n ">, #<Nokogiri::XML::Element:0xcfa name="span" attributes=[#<Nokogiri::XML::Attr:0xcb8 name="itemid" value="http://ronallo.com#me">, #<Nokogiri::XML::Attr:0xcba name="itemscope">, #<Nokogiri::XML::Attr:0xcbc name="itemtype" value="http://schema.org/Person">] children=[#<Nokogiri::XML::Text:0xcbe "\n ">, #<Nokogiri::XML::Element:0xcce name="a" attributes=[#<Nokogiri::XML::Attr:0xcc0 name="href" value="http://twitter.com/ronallo">, #<Nokogiri::XML::Attr:0xcc2 name="itemprop" value="url">] children=[#<Nokogiri::XML::Text:0xcc4 "\n ">, #<Nokogiri::XML::Element:0xcca name="span" attributes=[#<Nokogiri::XML::Attr:0xcc6 name="itemprop" value="name">] children=[#<Nokogiri::XML::Text:0xcc8 "Jason Ronallo">]>, #<Nokogiri::XML::Text:0xccc "\n ">]>, #<Nokogiri::XML::Text:0xcd0 " is the \n ">, #<Nokogiri::XML::Element:0xcd6 name="span" attributes=[#<Nokogiri::XML::Attr:0xcd2 name="itemprop" value="jobTitle">] children=[#<Nokogiri::XML::Text:0xcd4 "Associate Head of Digital Library Initiatives">]>, #<Nokogiri::XML::Text:0xcd8 " at \n ">, #<Nokogiri::XML::Element:0xcf6 name="span" attributes=[#<Nokogiri::XML::Attr:0xcda name="itemid" value="http://lib.ncsu.edu">, #<Nokogiri::XML::Attr:0xcdc name="itemprop" value="affiliation">, #<Nokogiri::XML::Attr:0xcde name="itemscope">, #<Nokogiri::XML::Attr:0xce0 name="itemtype" value="http://schema.org/Library">] 
children=[#<Nokogiri::XML::Text:0xce2 "\n ">, #<Nokogiri::XML::Element:0xcf2 name="span" attributes=[#<Nokogiri::XML::Attr:0xce4 name="itemprop" value="name">] children=[#<Nokogiri::XML::Text:0xce6 "\n ">, #<Nokogiri::XML::Element:0xcee name="a" attributes=[#<Nokogiri::XML::Attr:0xce8 name="href" value="http://www.lib.ncsu.edu">, #<Nokogiri::XML::Attr:0xcea name="itemprop" value="url">] children=[#<Nokogiri::XML::Text:0xcec "NCSU Libraries">]>, #<Nokogiri::XML::Text:0xcf0 "\n ">]>, #<Nokogiri::XML::Text:0xcf4 "\n ">]>, #<Nokogiri::XML::Text:0xcf8 ".\n ">]>, #<Nokogiri::XML::Text:0xcfc "\n \n">]>]>]>
irb(main):004:0>