@kberov
Created November 9, 2012 21:32
Get only what you want from html using HTML::TreeBuilder and HTML::Element
#!/usr/bin/env perl
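# Mojo::DOM and Mojo::UserAgent just rock, but sometimes you are not allowed to use them.
# Note: new_from_url needs LWP::UserAgent installed and a recent HTML::TreeBuilder (5.x, as far as I know).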
use 5.14.2;
use HTML::TreeBuilder;
my $url = 'http://contao.org/en/extension-list/view/i18nl10n.html';
my $root = HTML::TreeBuilder->new_from_url($url);
$root->eof(); # done parsing for this tree
my $h1 = $root->find_by_tag_name('h1');
# find_by_tag_name returns undef when the page has no <h1>, so check first
say $h1->as_trimmed_text if defined $h1;
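# Get only what we want as a well-formatted, pretty XHTML chunk.
# Both lookups return undef when nothing matches, so check before chaining.
my $extension = $root->find_by_attribute('class', 'extension');
my $cell = defined $extension
    ? $extension->look_down(_tag => 'td', colspan => 4)
    : undef;
# as_HTML arguments: entities to escape, indent string, and an empty hash of
# optional end tags, so every end tag is written out.
say $cell->as_HTML('<>&', ' ', {}) if defined $cell;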
$root->destroy();
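
# A minimal offline sketch of the same lookup, for trying the technique
# without network access. Assumptions: extract_detail() and $sample_html are
# hypothetical names and the HTML snippet is made up for illustration; they
# are not taken from the page fetched above. It parses a string with
# new_from_content and returns the trimmed text of the first
# <td colspan="4"> inside the element with class "extension", or undef.
use 5.014;
use HTML::TreeBuilder;

sub extract_detail {
    my ($html) = @_;
    my $tree  = HTML::TreeBuilder->new_from_content($html);
    my $table = $tree->find_by_attribute('class', 'extension');
    my $found = defined $table
        ? $table->look_down(_tag => 'td', colspan => 4)
        : undef;
    my $text = defined $found ? $found->as_trimmed_text : undef;
    $tree->destroy();    # free the tree before returning
    return $text;
}

my $sample_html = <<'HTML';
<html><body><h1>Demo</h1>
<table class="extension"><tr><td colspan="4"> Some details </td></tr></table>
</body></html>
HTML

say extract_detail($sample_html) // 'no matching cell';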