Created Dec 26, 2011
Decodes UTF-8, interpreting ill-formed UTF-8 sequences as CP1252, posted in reply to <>
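use strict;
use warnings;

use Encode        qw[find_encoding];
use Unicode::UTF8 qw[decode_utf8];

my $encoding = find_encoding('Windows-1252')
  or die q/Couldn't find Windows-1252 encoding/;

# Fallback invoked by decode_utf8() for each sequence it rejects.
# $is_usv is true when the octets encode a Unicode scalar value; those become
# U+FFFD. Anything else is ill-formed UTF-8 and is decoded as Windows-1252.
my $fallback = sub {
    my ($octets, $is_usv) = @_;
    return $is_usv ? "\x{FFFD}" : $encoding->decode($octets);
};

sub fix_latin {
    @_ == 1 || die q/Usage: fix_latin($octets)/;
    no warnings 'utf8';
    return decode_utf8($_[0], $fallback);
}

# CP1252 curly quotes (0x91, 0x92) around a well-formed UTF-8 U+263A
my $octets = "\x91 Foo \xE2\x98\xBA \x92";

printf "<%s>\n",
  join ' ', map { sprintf 'U+%.4X', ord $_ } split //, fix_latin($octets);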