@xaicron
Created April 16, 2009 14:47
#!/usr/bin/perl
use strict;
use warnings;
use Web::Scraper;
use YAML;
use URI;
binmode STDOUT => ':utf8';
my @url_list = qw{
    http://www.comic1.jp/C3_circle_lista.htm
    http://www.comic1.jp/C3_circle_listb.htm
    http://www.comic1.jp/C3_circle_listc.htm
    http://www.comic1.jp/C3_circle_listd.htm
    http://www.comic1.jp/C3_circle_liste.htm
};
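# Each matching <td> becomes one entry in the 'tables' array.
my $scraper = scraper {
    process '/html/body/div/div[3]/table/tr/td', 'tables[]' => scraper {
        # The cell text is stored under 'circle'.
        process 'td', circle => 'TEXT';
        # The cell's link, if any, is stringified so YAML dumps a plain string.
        process 'td > a', link => [ '@href', sub { $_->as_string } ];
    };
};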
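# Scrape every page and collect the entries into a single list.
my $list = [];
for my $url (@url_list) {
    my $res = $scraper->scrape(URI->new($url));
    # Fall back to an empty list when a page yields no matching cells.
    push @$list, @{ $res->{tables} || [] };
    warn "OK: $url\n";
}
print YAML::Dump $list;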