content_extract.pl: Example of HTML::ExtractContent
#!/usr/bin/perl
use strict;
use warnings;
use HTML::ExtractContent;
use Encode;
use Encode::Detect;
use feature 'say';
use Getopt::Long;
use Pod::Usage;
my $help;
GetOptions(
    'help' => \$help,
) or pod2usage(2);
pod2usage(1) if $help;
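# Slurp the whole document from the files named on the command line, or STDIN.
my $html = do { local $/; <> };
# Encode::Detect registers a 'Detect' pseudo-encoding that guesses the charset.
my $decoded_html = decode('Detect', $html);
my $extractor = HTML::ExtractContent->new;
$extractor->extract($decoded_html);
# Print as UTF-8 so decoded wide characters don't trigger "Wide character" warnings.
binmode STDOUT, ':encoding(UTF-8)';
say $extractor->as_text;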
__END__
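=head1 NAME

content_extract.pl - extract the main content of an HTML page as plain text

=head1 SYNOPSIS

  $ curl http://blog.livedoor.jp/tabbata/archives/50684381.html | ./content_extract.pl

=head1 DESCRIPTION

Reads an HTML document from the files named on the command line (or from
standard input), guesses its character encoding with Encode::Detect, extracts
the main content with HTML::ExtractContent, and prints it as plain text.

=cut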
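One possible hardening of the decoding step, not part of the original gist: many pages declare their charset in a `<meta>` tag, so the script could trust that declaration when Encode recognizes it and fall back to a guess (such as the `Detect` encoding) otherwise. This is a minimal sketch under that assumption; the helper names `declared_charset` and `decode_html` are hypothetical, and the regex is a rough heuristic, not the HTML spec's full encoding-sniffing algorithm.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Hypothetical helper (not from the gist): return the charset named in a
# <meta charset="..."> or <meta http-equiv="Content-Type" content="...; charset=...">
# declaration near the top of the raw bytes, or undef if none is found.
sub declared_charset {
    my ($bytes) = @_;
    # Only scan the start of the document, where <meta> tags normally live.
    my $head = substr($bytes, 0, 4096);
    if ($head =~ /<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)/i) {
        return $1;
    }
    return;
}

# Hypothetical helper: decode raw HTML bytes, preferring the declared charset
# when Encode knows it, otherwise using the fallback encoding name
# (e.g. 'Detect' when Encode::Detect is loaded).
sub decode_html {
    my ($bytes, $fallback) = @_;
    my $name = declared_charset($bytes);
    my $enc  = defined $name ? find_encoding($name) : undef;
    return $enc ? $enc->decode($bytes) : decode($fallback, $bytes);
}
```

In content_extract.pl this would amount to replacing `decode('Detect', $html)` with `decode_html($html, 'Detect')`, keeping `use Encode::Detect;` so the `Detect` fallback stays registered.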
standin000 commented on Mar 17, 2015:

This method does not work for http://cn.wsj.com/gb/20150127/XIH195812.asp?source=mostpopular. Could you check it again? Thanks!
