@nanis
Last active August 29, 2015 14:05
Extract information from CPANTesters' Platforms By YEAR/MONTH
#!/usr/bin/env perl
# This is a very quick and dirty script to parse information from the page
#
# http://stats.cpantesters.org/mplatforms.html
#
# This version works on a locally downloaded copy (mplatforms.html in the
# current directory); the regex patterns used are the first ones that popped
# into my head.
#
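# To refresh the local copy first, something along these lines should work
# with the core HTTP::Tiny module (a sketch, assuming the page is still
# served at the URL above; mirror() only re-downloads if the page changed):
#
#   perl -MHTTP::Tiny -E 'say HTTP::Tiny->new->mirror(
#     "http://stats.cpantesters.org/mplatforms.html",
#     "mplatforms.html")->{status}'
#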
# I hereby release this code snippet to the public domain.
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
# 2014/08/25 A. Sinan Unur <nanis@cpan.org>
use 5.020; # gratuitous, but why not?!
use strict;
use warnings;
use HTML::TableExtract;
use YAML::XS;
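# Select the table by its summary attribute.
my $te = HTML::TableExtract->new(
    attribs => { summary => 'Platforms By YEAR/MONTH' },
);
$te->parse_file('mplatforms.html');
my ($table) = $te->tables
    or die "No table with summary 'Platforms By YEAR/MONTH' found\n";
my $rows = $table->rows;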
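# Start at 1 to skip the header row; each data row is
# (YYYYMM, count, platform list).
for my $i (1 .. $#$rows) {
    my $row = $rows->[$i];

    # First column: year and month as YYYYMM, e.g. 201408.
    my ($year, $month) = ($row->[0] =~ /\A([0-9]{4})([0-9]{2})\z/);
    unless (($year) && ($month)) {
        die Dump $row;
    }
    unless ( ($year >= 1999) && ($month >= 1) && ($month <= 12) ) {
        die Dump $row;
    }
    $month =~ s/\A0//;    # "08" -> "8"

    my $count = $row->[1];

    # Third column: comma-separated cells, each split at "] " into a
    # bracketed number and a platform name.
    my @data = split ',', $row->[2] // '';
    for my $cell (@data) {
        my ($n, $plat) = split /\]\s/, $cell, 2;
        ($n) = ($n =~ /([0-9]+)/);
        # One tab-separated line per platform:
        # year, month, count, platform, number.
        say join(
            "\t",
            $year,
            $month,
            $count,
            $plat,
            $n
        );
    }
}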
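To check the row-parsing logic without a downloaded page, the two steps in the loop above (the YYYYMM check and the per-cell split) can be pulled into small functions. This is a sketch, not the original author's code: `parse_yyyymm`, `parse_platform_cell` and `parse_platform_list` are hypothetical names, the sample row is made up, and the `[N] platform` cell shape is an assumption inferred from the script's `split /]\s/` pattern rather than checked against the live page.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: validate a YYYYMM string and return (year, month)
# with the month as a plain number, or an empty list if it is malformed.
# The 1999 lower bound follows the script; month <= 12 is an extra check.
sub parse_yyyymm {
    my ($s) = @_;
    return unless defined $s;
    my ($year, $month) = $s =~ /\A([0-9]{4})([0-9]{2})\z/
        or return;
    return unless $year >= 1999 && $month >= 1 && $month <= 12;
    return ($year, $month + 0);    # "08" -> 8
}

# Hypothetical helper: split one "[N] platform" cell into (N, platform).
# Returns an empty list when the cell does not have that assumed shape.
sub parse_platform_cell {
    my ($cell) = @_;
    return unless defined $cell;
    my ($n, $plat) = $cell =~ /\A\s*\[([0-9]+)\]\s+(.+?)\s*\z/
        or return;
    return ($n, $plat);
}

# Split a comma-separated platform list and keep only the cells that parse.
sub parse_platform_list {
    my ($list) = @_;
    my @pairs;
    for my $cell (split /,/, $list // '') {
        my @pair = parse_platform_cell($cell) or next;
        push @pairs, \@pair;
    }
    return @pairs;
}

# Made-up sample row, shaped like (YYYYMM, count, platform list).
my @row = ('201408', '3', '[1234] linux, [56] MSWin32, [7] darwin');

my ($year, $month) = parse_yyyymm($row[0])
    or die "Bad YYYYMM: $row[0]\n";
for my $pair (parse_platform_list($row[2])) {
    my ($n, $plat) = @$pair;
    print join("\t", $year, $month, $row[1], $plat, $n), "\n";
}
```

Run as a standalone file, it prints one tab-separated line per platform in the sample row, in the same column order as the script's output. Matching each cell with a single anchored regex, instead of splitting and then extracting digits, means a malformed cell is skipped rather than printed with an undefined platform (and a warning).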