Cleaning Up Bad HTML in Perl
#
# Here is a short way to clean up bad HTML input and convert it to XML with Perl:
#
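use strict;
use warnings;
use HTML::TreeBuilder;
use XML::LibXML;

my $html_code = '';    # the HTML to clean up goes here

# Parse the (possibly malformed) HTML into a tree.
my $builder = HTML::TreeBuilder->new();
$builder->parse($html_code);
$builder->eof();       # signal end of input so the tree is complete

# Downgrade the builder to a plain HTML::Element tree and serialize it as XML.
$builder->elementify();
my $xml_source1 = $builder->as_XML();

# Re-parse with XML::LibXML in recover mode, which tolerates remaining errors.
my $parser = XML::LibXML->new();
$parser->recover(1);
my $doc = $parser->parse_string($xml_source1);
my $xml_source2 = $doc->toString();    # the cleaned-up XML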
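
#
# One possible way to reuse the steps above: a minimal sketch that wraps them
# in a subroutine. The name html_to_xml and the sample input are hypothetical,
# chosen only for illustration; they are not part of the original snippet.
#
use strict;
use warnings;
use HTML::TreeBuilder;
use XML::LibXML;

sub html_to_xml {
    my ($html) = @_;

    # Build a tree from the (possibly malformed) HTML and serialize it as XML.
    my $tree = HTML::TreeBuilder->new();
    $tree->parse($html);
    $tree->eof();
    my $xml = $tree->as_XML();
    $tree->delete();    # release the tree once it has been serialized

    # Re-parse in recover mode and return the serialized document.
    my $libxml = XML::LibXML->new();
    $libxml->recover(1);
    return $libxml->parse_string($xml)->toString();
}

# Sample input (an assumption) with unclosed <p> and <b> tags.
print html_to_xml('<p>Unclosed paragraph<p>Second <b>bold');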