@kraih
Created September 26, 2012 21:20
use 5.16.1;
use Mojo::UserAgent;

# Fetch web site
my $ua = Mojo::UserAgent->new;
my $tx = $ua->get('mojolicio.us/perldoc');

# Extract title
say 'Title: ', $tx->res->dom->at('head > title')->text;

# Extract headings
$tx->res->dom('h1, h2, h3')->each(sub {
  say 'Heading: ', shift->all_text;
});

# Extract all text without trimming, including alt text for images
sub extract {
  my $elements = shift;
  for my $e ($elements->each) {
    print $e->text_before(0);
    print $e->{alt} if $e->type eq 'img';
    my $children = $e->children;
    @$children ? extract($children) : print $e->text(0);
  }
  say $elements->[-1]->text_after(0);
}
extract($tx->res->dom->children);