@sergeyromanov
Last active October 12, 2015 10:27
Scrape St. Petersburg ZIP codes by district
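#!/usr/bin/env perl
# Scrape St. Petersburg postal codes from spbindex.ru and write them,
# grouped by district, to a YAML file.
use 5.014;
use Encode qw(decode);
use Mojo::UserAgent;
use List::MoreUtils qw(pairwise);
use YAML qw(DumpFile);

my $SOURCE_URL = 'http://www.spbindex.ru/listindex.html';
my $OUT_FILE   = 'sp_indices.yml';

my (@districts, @indices);

# Fetch the page once and reuse the DOM for both queries.
my $dom = Mojo::UserAgent->new->get($SOURCE_URL)->res->dom;

# District names are the <b> elements at positions 2..19. A plain array
# slice is used here because Mojo::Collection->slice is not available in
# newer Mojolicious releases.
my @bold = $dom->find('div.main_txt b')->each;
for my $el (@bold[2 .. 19]) {
    # Stringify the element and decode its cp1251 bytes into characters.
    my $html = decode('cp1251', "$el");
    # Capture in list context so a failed match cannot reuse a stale $1.
    my ($name) = $html =~ />(\w+)/;
    push @districts, $name;
}

# Each <ul> holds one district's codes, embedded as "index_NNNNNN".
for my $el ($dom->find('div.main_txt ul')->each) {
    my $html = decode('cp1251', "$el");
    my @idx  = $html =~ /index_([0-9]{6})/g;
    push @indices, \@idx;
}

# Pair the lists positionally: district name => array ref of its codes.
my %result;
pairwise { $result{$a} = $b } @districts, @indices;
DumpFile($OUT_FILE, \%result);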