Get the IANA Language Tag registry, parse it, and write it to a tab-delimited file
#!perl -w
# Fetch and parse http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
# and write it to a tab-delimited file.
# Format described at http://tools.ietf.org/html/bcp47#section-3.1.2
# TODO: use some Excel writer to split into sheets by Type
use strict;
use LWP::Simple;
$_ = get("http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry") or die "can't fetch language-subtag-registry\n";
s{\n }{}g; # unfold continuation lines (long values such as Comments wrap onto indented lines)
open STDOUT,">iana-lang-tags.txt" or die "can't open STDOUT: $!\n";
binmode STDOUT, ':encoding(UTF-8)' or die "can't set binmode UTF-8: $!\n";
# http://perldoc.perl.org/perlunifaq.html#Is-there-a-way-to-automatically-decode-or-encode%3f
my @cols = qw(Type Scope Prefix Tag Subtag Suppress-Script Description Macrolanguage Added Deprecated Preferred-Value Comments);
print join("\t",@cols), "\n";
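foreach (split /\n%%\n/) { # records are separated by %% lines
    next if /File-Date:/; # first block is date stamp, not data
    my %hash;
    foreach (split /\n/) {
        my ($key, $val) = split(/: /, $_, 2);
        next unless defined $val; # skip anything that is not a "Key: value" field
        # 'Description', 'Comments', and 'Prefix' are multivalued
        if (exists $hash{$key}) { $hash{$key} .= ", $val" }
        else                    { $hash{$key} = $val }
    }
    # join the columns so data rows have no trailing tab, matching the header row
    print join("\t", map { defined $hash{$_} ? $hash{$_} : "" } @cols), "\n";
}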
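
The parsing steps above (unfold continuation lines, split on %% lines, merge repeated fields) can be tried without network access. The following is a minimal sketch, not part of the original script: the helper name `parse_registry` and the sample records are made up for illustration, and it unfolds continuation lines by replacing the newline plus any leading whitespace with a single space, which is an assumption about how wrapped lines should be joined.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): turn registry text
# into a list of hash references, one per record.
sub parse_registry {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;       # normalize line endings
    $text =~ s/\n[ \t]+/ /g;    # unfold continuation lines into one space
    my @records;
    for my $block (split /\n%%\n/, $text) {
        next if $block =~ /^File-Date:/;    # skip the date-stamp block
        my %rec;
        for my $line (split /\n/, $block) {
            my ($key, $val) = split /: /, $line, 2;
            next unless defined $val;       # not a "Key: value" field
            # repeated fields (Description, Comments, Prefix) are merged
            $rec{$key} = exists $rec{$key} ? "$rec{$key}, $val" : $val;
        }
        push @records, \%rec if %rec;
    }
    return @records;
}

# Made-up sample in the registry's layout, for illustration only;
# these are not real registry entries.
my $sample = join "\n",
    'File-Date: 2000-01-01',
    '%%',
    'Type: language',
    'Subtag: xx',
    'Description: First name',
    'Description: Second name',
    'Comments: a long comment that',
    '  wraps onto a second line',
    '%%',
    'Type: script',
    'Subtag: Yyyy',
    'Description: Example script',
    '';

my @records = parse_registry($sample);
printf "%s\t%s\t%s\n", $_->{Type}, $_->{Subtag}, $_->{Description}
    for @records;
```

Returning hash references rather than printing directly keeps the parsing separate from the output step, which may make it easier to group records by Type later, as the TODO at the top of the script suggests.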