@suzusime
Created May 7, 2019 18:49
A simple scraping example
#!/usr/bin/env perl
use strict;
use warnings;
use 5.014;
use utf8;
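# Treat file I/O as UTF-8 by default.
use open ':encoding(utf8)';
# Use the console's own encoding for the standard streams and for @ARGV.
use Encode::Locale;
binmode(STDIN, ":encoding(console_in)");
binmode(STDOUT, ":encoding(console_out)");
binmode(STDERR, ":encoding(console_out)");
Encode::Locale::decode_argv();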
use LWP::UserAgent;
use HTML::TreeBuilder;
my $ua = LWP::UserAgent->new(
    timeout => 10,    # seconds
);
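use URI::Escape qw(uri_escape_utf8);
# Article title from the first argument (default: 延喜式).
my $word = $ARGV[0] // "延喜式";
# Percent-encode the title so characters such as "?" or "#" cannot alter the URL.
my $url = "https://ja.wikipedia.org/wiki/" . uri_escape_utf8($word);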
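# Fetch the page; on failure, stop and report the HTTP status line.
my $res = $ua->get($url);
die $res->status_line if !$res->is_success;
# decoded_content undoes any Content-Encoding and decodes the charset.
my $html = $res->decoded_content;
# Parse the HTML into an element tree.
my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;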
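# Print the text of every <p> element.
foreach my $p ($tree->look_down(_tag => 'p')) {
    say $p->as_text;
}
# Release the tree explicitly; older HTML::Element versions keep circular references.
$tree->delete;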