@hinnerk-a
Created August 31, 2011 14:41
Perl: handle malformed UTF-8 strings with Encode::encode
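```perl
use strict;
use warnings;
use Encode ();

# Encode a Perl character string as strict UTF-8 octets. If some characters
# cannot be encoded (e.g. lone surrogates), drop only those characters.
sub encode_utf_8 {
    my ($string) = @_;    # list assignment: the argument itself, not the count
    my $utf8_encoded = '';

    # LEAVE_SRC keeps Encode from modifying its input in place when CHECK is set.
    my $check = Encode::FB_CROAK | Encode::LEAVE_SRC;

    my $ok = eval {
        $utf8_encoded = Encode::encode('UTF-8', $string, $check);
        1;
    };
    if (!$ok) {
        # sanitize: encode character by character, skipping the ones that croak
        $utf8_encoded = '';
        foreach my $char (split //, $string) {
            my $utf_8_char = eval { Encode::encode('UTF-8', $char, $check) };
            # test definedness, not truth, so a "0" character is not dropped
            next unless defined $utf_8_char;
            $utf8_encoded .= $utf_8_char;
        }
    }
    return $utf8_encoded;
}
```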
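For comparison, a one-call sketch, assuming Encode 2.12 or later (where the CHECK argument may be a code reference). Encode calls the coderef with the ordinal of each character that strict 'UTF-8' cannot represent and emits the returned octets in its place, so returning an empty string drops that character without splitting the string by hand. `encode_utf8_lossy` is a hypothetical name used only for illustration; it is not part of Encode.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not an Encode function). Assumes Encode >= 2.12,
# where CHECK may be a coderef: it receives the ordinal of each character
# that cannot be encoded and returns the octets to emit instead.
sub encode_utf8_lossy {
    my ($string) = @_;
    return Encode::encode('UTF-8', $string, sub { '' });    # '' drops the character
}

# Usage: U+D800 is a lone surrogate, which strict 'UTF-8' refuses to encode.
my $octets = encode_utf8_lossy("a\x{D800}b\x{E9}0");
printf "%vX\n", $octets;    # prints the resulting bytes as dot-separated hex
```

Returning `"\xEF\xBF\xBD"` (the UTF-8 octets of U+FFFD) from the coderef instead of `''` would mark each dropped character with the Unicode replacement character rather than removing it silently.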