@jeffreykegler
Created March 3, 2014 00:13
Scrape HTML tables
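use 5.010;
use strict;
use warnings;
use Marpa::R2 2.082000;
use Marpa::R2::HTML qw(html);

# Handlers that keep only the tables and drop the surrounding content.
my %handlers_to_keep_only_tables = (
    # For each table element, return its original text.
    table => sub { return Marpa::R2::HTML::original() },
    # At the top level, join the values collected from the descendants
    # (here, the tables) into a single string and return a reference to it.
    ':TOP' => sub { return \( join q{}, @{ Marpa::R2::HTML::values() } ) },
);

# The second input is defective on purpose: it has no <table> or <td> start tag.
my @input = (
    'Text<table><tr><td>I am a cell</table> More Text',
    'Text<tr>I am a cell</table> More Text',
);

for my $input (@input) {
    say "HTML with table: $input";
    my $value_ref = html( \$input, \%handlers_to_keep_only_tables );
    die "HTML parse failed" if not defined $value_ref;
    say "scraped tables: ", ${$value_ref};
} ## end for my $input (@input)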