Mojo::DOM and UTF-8 on bleadperl
use Mojo::DOM;
use 5.010;
use strict;
use warnings;
binmode STDOUT, ':encoding(UTF-8)';
my $filename = 'test.xml';
open my $h, '<', $filename or die $!;
my $contents = do { local $/; <$h> };
my $d = Mojo::DOM->new(xml => 1, charset => 'UTF-8')->parse($contents);
say $d->at('title')->text; # mojibake
# versions: This is perl 5, version 15, subversion 8 (v5.15.8-90-ga752ff7) built for x86_64-linux
# Mojolicious (2.57, Leaf Fluttering In Wind)
# output:
# LÃ¥t den rÃ¤tte komma in : [skrÃ¤ckroman]
# should be:
# Låt den rätte komma in : [skräckroman]
# and it works that way on older perls (like, 5.14.1), and on bleadperl if I remove the "binmode STDOUT, ':encoding(UTF-8)';"
<rdf:rdf xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:libris="http://libris.kb.se/vocabulary/experimental#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rda="http://RDVocab.info/Elements/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:description rdf:about="http://libris.kb.se/resource/bib/9604288">
    <libris:held_by rdf:resource="http://libris.kb.se/resource/library/Q" />
    <dc:title xml:lang="sv">Låt den rätte komma in : [skräckroman]</dc:title>
  </rdf:description>
</rdf:rdf>
# output of Dump $d->at('title')->text;
SV = PV(0x95b2c0) at 0x7739d0
  REFCNT = 1
  FLAGS = (TEMP,POK,pPOK)
  PV = 0xb0bf50 "L\303\245t den r\303\244tte komma in : [skr\303\244ckroman]"\0
  CUR = 41
  LEN = 48
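
The dump shows no UTF8 flag on the scalar, so the title text appears to be raw UTF-8 bytes rather than decoded characters. The sketch below is not part of the original report and does not use Mojo::DOM; it only uses the core Encode module to illustrate, under that assumption, why printing such a byte string through an `:encoding(UTF-8)` layer would produce mojibake. The variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Part of the byte string seen in the Dump output: UTF-8 encoded octets,
# not yet decoded into characters.
my $bytes = "L\303\245t den r\303\244tte komma in";

# Roughly what an :encoding(UTF-8) output layer does with an undecoded
# string: every octet is treated as a Latin-1 character and encoded again,
# so each non-ASCII octet grows into two octets.
my $double_encoded = encode('UTF-8', $bytes);

# Decoding first gives a character string; encoding that for output
# reproduces the original octets.
my $chars      = decode('UTF-8', $bytes);
my $round_trip = encode('UTF-8', $chars);

printf "octets in: %d, characters: %d, double-encoded octets: %d\n",
    length $bytes, length $chars, length $double_encoded;
print "round trip matches input\n" if $round_trip eq $bytes;
```

If the text returned by `->text` were a decoded character string, as the older perls seem to produce, the output layer would encode it exactly once and the title would print correctly.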