@moritz
Created March 8, 2012 12:55
Mojo::DOM and UTF-8 on bleadperl
use Mojo::DOM;
use 5.010;
use strict;
use warnings;
binmode STDOUT, ':encoding(UTF-8)';
my $filename = 'test.xml';
open my $h, '<', $filename or die $!;
my $contents = do { local $/; <$h> };
my $d = Mojo::DOM->new(xml => 1, charset => 'UTF-8')->parse($contents);
say $d->at('title')->text; # mojibake
# versions: This is perl 5, version 15, subversion 8 (v5.15.8-90-ga752ff7) built for x86_64-linux
# Mojolicious (2.57, Leaf Fluttering In Wind)
# output:
# LÃ¥t den rÃ¤tte komma in : [skrÃ¤ckroman]
# should be:
# Låt den rätte komma in : [skräckroman]
# It prints correctly on older perls (e.g. 5.14.1), and also on bleadperl if I remove the "binmode STDOUT, ':encoding(UTF-8)';" line.
<rdf:rdf xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:libris="http://libris.kb.se/vocabulary/experimental#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rda="http://RDVocab.info/Elements/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdf:description rdf:about="http://libris.kb.se/resource/bib/9604288">
<libris:held_by rdf:resource="http://libris.kb.se/resource/library/Q" />
<dc:title xml:lang="sv">Låt den rätte komma in : [skräckroman]</dc:title>
</rdf:description>
</rdf:rdf>
# output of Dump $d->at('title')->text;
SV = PV(0x95b2c0) at 0x7739d0
REFCNT = 1
FLAGS = (TEMP,POK,pPOK)
PV = 0xb0bf50 "L\303\245t den r\303\244tte komma in : [skr\303\244ckroman]"\0
CUR = 41
LEN = 48
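# A minimal sketch of what the Dump output above suggests, using only the core Encode module (no Mojo::DOM involved). The FLAGS line shows no UTF8 flag and CUR = 41, so the text appears to still be UTF-8 encoded bytes rather than decoded characters. Sending such bytes through an :encoding(UTF-8) output layer would encode them a second time. This only illustrates the symptom; it is not a confirmed diagnosis of the bleadperl or Mojolicious behaviour, and the variable names are made up for the illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The same byte string as in the PV line of the Dump output:
# UTF-8 encoded text, UTF8 flag off.
my $bytes = "L\303\245t den r\303\244tte komma in : [skr\303\244ckroman]";

# What an :encoding(UTF-8) layer effectively does to an undecoded string:
# every byte is treated as a Latin-1 character and encoded again.
my $double_encoded = encode('UTF-8', $bytes);

# Decoding first yields characters, which the output layer then
# encodes exactly once.
my $chars = decode('UTF-8', $bytes);

binmode STDOUT, ':encoding(UTF-8)';
print length($bytes), "\n";           # 41 bytes, matching CUR = 41
print length($double_encoded), "\n";  # 47 bytes after the second encoding
print length($chars), "\n";           # 38 characters once decoded
print $chars, "\n";
```

# If the parser really does hand back undecoded bytes, calling decode('UTF-8', ...) on the result of ->text before printing would be one possible workaround; whether that is appropriate depends on where the decoding is supposed to happen.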