@knbknb
Last active January 19, 2024 08:02
perl: HTML::Parser from command line
#!/usr/bin/env bash
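# HTML::Parser can strip declarations (such as <!DOCTYPE ...>) from a document:
# setting the handler for "declaration" events to an empty string makes the
# parser ignore them, while the default handler prints everything else unchanged.
# (from Stack Overflow question 16358962)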
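perl -MHTML::Parser -we '
# Default handler: print the original source text of every event.
my $p = HTML::Parser->new(default_h => [sub { print @_ }, "text"]);
# An empty-string handler means declaration events are ignored.
$p->handler(declaration => "");
# parse_file returns undef if the file cannot be opened; $! holds the reason.
$p->parse_file(shift) or die "Cannot open input file: $!"; ' infile.html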
# The HTML::TreeBuilder module is also useful for removing elements and attributes.
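As a rough sketch of the HTML::TreeBuilder approach mentioned above, the snippet below removes elements and attributes from a parsed tree. It assumes HTML::TreeBuilder (from the HTML-Tree distribution) is installed. The function name `strip_html_parts` and the choice of `<script>` elements and `style` attributes as targets are illustrative assumptions, not part of the original gist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print FILE with <script> elements and style attributes removed.
strip_html_parts() {
  perl -MHTML::TreeBuilder -we '
    # Build a full element tree from the file given as first argument.
    my $tree = HTML::TreeBuilder->new_from_file(shift);
    # Detach and destroy every <script> element.
    $_->delete for $tree->look_down(_tag => "script");
    # Setting an attribute to undef removes it from the element.
    $_->attr("style", undef)
      for $tree->look_down(sub { defined $_[0]->attr("style") });
    print $tree->as_HTML, "\n";
    # Free the tree explicitly, as the HTML::Tree documentation recommends.
    $tree->delete;
  ' "$1"
}
```

A possible invocation would be `strip_html_parts infile.html > cleaned.html`. Unlike the HTML::Parser one-liner, which passes the source text through untouched, this re-serializes the whole document, so whitespace and tag layout in the output may differ from the input.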