@yakovsh
Last active January 17, 2016 15:59
Cleaning Up Bad HTML in Perl
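#
# Here is a short way to clean up bad HTML input and convert it to XML with Perl:
#
use strict;
use warnings;
use HTML::TreeBuilder;
use XML::LibXML;

my $html_code = '';    # the bad HTML input goes here

# Parse the HTML leniently; eof() tells the builder the input is complete
my $builder = HTML::TreeBuilder->new();
$builder->parse($html_code);
$builder->eof();

# Turn the builder into a plain HTML::Element tree and serialize it as XML
my $xml_source = $builder->elementify();
my $xml_source1 = $xml_source->as_XML();

# Re-parse with libxml2 in recover mode, then serialize the cleaned document
my $parser = XML::LibXML->new();
$parser->recover(1);
my $doc = $parser->parse_string($xml_source1);
my $xml_source2 = $doc->toString();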