@jberger
Created October 29, 2012 20:12
data from imdb
#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;
use Mojo::DOM;
my $dom = Mojo::DOM->new( <<'HTML' );
<table border="0" cellpadding="2">
<tbody><tr><th class="xxxx">Country</th><th class="xxxx">Date</th></tr>
<tr><td><b><a href="/calendar/?region=US">USA</a></b></td>
<td align="right"><a href="/date/01-18/">18 January</a> <a href="/year/2009/">2009</a></td>
<td> (Sundance Film Festival)</td></tr>
<tr><td><b><a href="/calendar/?region=DE">Germany</a></b></td>
<td align="right"><a href="/date/02-06/">6 February</a> <a href="/year/2009/">2009</a></td>
<td> (European Film Market)</td></tr>
<tr><td><b><a href="/calendar/?region=US">USA</a></b></td>
<td align="right"><a href="/date/03-20/">20 March</a> <a href="/year/2009/">2009</a></td>
<td> (limited)</td></tr>
</tbody></table>
HTML
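# Callback for Mojo::Collection's map: the current <tr> element is in $_.
# Returns an array ref of the non-empty cell and link texts, or an empty
# list for rows without <td> cells (such as the header row).
sub get_row_data {
  my $col = $_->find('td, a');
  return () unless $col->size;
  # pluck() was deprecated and later removed from Mojo::Collection;
  # map('text') calls the text method on each element instead.
  my $data = $col->map('text');
  return [grep { length } $data->each];
}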
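# Build one array ref per data row.
my @rows = $dom->find('tr')->map( \&get_row_data )->each;
# DDP is the short alias for Data::Printer; p() dumps the structure.
use DDP;
p @rows;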