@kyanny
Created May 19, 2009 15:15
content_extract.pl: Example of HTML::ExtractContent
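#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';    # core since perl 5.10; replaces Perl6::Say
use HTML::ExtractContent;
use Encode;
use Encode::Detect;
use Getopt::Long;
use Pod::Usage;

my $help;
GetOptions(
    'help' => \$help,
) or pod2usage(2);
pod2usage(1) if $help;

# Slurp the whole HTML document from the files named on the command line or STDIN.
my $html = do { local $/; <> };

# Guess the byte encoding with Encode::Detect and decode to a Perl character string.
my $decoded_html = decode('Detect', $html);

# Extract the main content block of the page.
my $extractor = HTML::ExtractContent->new;
$extractor->extract($decoded_html);

# as_text returns characters; encode STDOUT as UTF-8 to avoid "Wide character" warnings.
binmode STDOUT, ':encoding(UTF-8)';
say $extractor->as_text;

__END__

=head1 NAME

content_extract.pl - print the main content of an HTML page as plain text

=head1 SYNOPSIS

  $ curl http://blog.livedoor.jp/tabbata/archives/50684381.html | ./content_extract.pl

=head1 DESCRIPTION

Reads an HTML document from the files named on the command line or from
standard input, guesses its encoding with Encode::Detect, and prints the
main content found by HTML::ExtractContent as UTF-8 text.

=cut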
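The script relies on Encode::Detect's heuristics to guess the page's encoding. Below is a minimal sketch of an alternative decode step that uses only the core Encode module, assuming the page declares its charset in a `<meta>` tag. `decode_html` is a hypothetical helper, not part of HTML::ExtractContent or Encode::Detect, and the regex is a rough heuristic: it ignores HTTP `Content-Type` headers and byte-order marks.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Hypothetical helper: decode raw HTML bytes using the charset declared in a
# <meta> tag, falling back to UTF-8 when no charset is declared or when
# Encode does not recognise the declared name.
sub decode_html {
    my ($bytes) = @_;
    my $charset = 'UTF-8';

    # Matches both <meta charset="..."> and
    # <meta http-equiv="Content-Type" content="text/html; charset=...">.
    if ($bytes =~ /<meta[^>]+charset\s*=\s*["']?([\w.:-]+)/i) {
        my $declared = $1;
        # find_encoding returns undef for names Encode does not know.
        $charset = $declared if find_encoding($declared);
    }

    # With the default CHECK, malformed bytes become U+FFFD instead of dying.
    return decode($charset, $bytes);
}

# Example: a Latin-1 page that declares its charset in a <meta> tag.
my $page = qq{<meta charset="iso-8859-1"><p>caf\xE9</p>};
binmode STDOUT, ':encoding(UTF-8)';
print decode_html($page), "\n";   # prints <meta charset="iso-8859-1"><p>café</p>
```

In content_extract.pl it could stand in for `decode('Detect', $html)`, e.g. `my $decoded_html = decode_html($html);`. The trade-off is that pages which send their charset only in the HTTP header fall back to UTF-8.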
@standin000

This method does not work for http://cn.wsj.com/gb/20150127/XIH195812.asp?source=mostpopular. Could you check it again? Thanks!
