@cincodenada
Created March 19, 2011 05:50
A basic script to pull down xkcd comics and create a TSV index of them.
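#!/usr/local/bin/perl
use strict;
use warnings;
use LWP::Simple;

my $baseurl  = 'http://www.xkcd.com/';
my $comicnum = 1;

# Append to the index; fields are tab-separated despite the .csv extension.
open(my $tsv, '>>', 'index.csv') or die "Cannot open index.csv: $!";

# Fetch pages in sequence; the loop ends at the first page that fails to load.
while (my $pagecontent = get("$baseurl$comicnum/")) {
    $comicnum++;

    # Capture the comic image's src, title, and alt attributes.
    if ($pagecontent =~ /<br\/>[\r\n]+<br\/>[\r\n]+<img src="([^"]+)" title="([^"]+)" alt="([^"]+)" \/>/) {
        my ($imgurl, $imgtitle, $imgalt) = ($1, $2, $3);

        # Use the last path segment of the image URL as the local filename.
        next unless $imgurl =~ /\/([^\/]+)$/;
        my $filename = $1;

        print "Fetching $imgurl to $filename...\n";
        getstore($imgurl, $filename);
        print $tsv "$filename\t$imgtitle\t$imgalt\n";
    }
}
close($tsv);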