@zoul
Created January 5, 2010 18:48
Simple web crawler in Perl
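#!/usr/bin/perl
use Modern::Perl;
use WWW::Mechanize;

# Starting page, and the prefix a URL must have to count as internal.
my $root   = 'http://naima:3000/cs/';
my $domain = 'http://naima';
# autocheck => 0: a failed request sets an error status instead of dying.
my $mech   = WWW::Mechanize->new(autocheck => 0);

# Print the link tree reachable from $url, depth-first.
sub visit {
    my $url     = shift;
    my $indent  = shift || 0;
    my $visited = shift || {};
    my $tab     = ' ' x $indent;

    # Already seen that.
    return if $visited->{$url}++;

    # Leaves domain: report the link but do not follow it.
    # \Q...\E keeps the dots in $domain from acting as regex wildcards.
    if ($url !~ /^\Q$domain\E/) {
        say $tab, "-> $url";
        return;
    }

    # Not seen yet.
    say $tab, "- $url";
    $mech->get($url);
    # Skip pages that failed to load or are not HTML.
    return unless $mech->success && $mech->is_html;

    # The link list is built in full before the loop starts, so the
    # recursive get() calls cannot change what this page iterates over.
    visit($_, $indent + 2, $visited)
        for map { $_->url_abs } $mech->links;
}

visit($root);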