Skip to content

Instantly share code, notes, and snippets.

@kulp
Last active December 17, 2015 03:19
Show Gist options
  • Save kulp/de366e469e5f3c938682 to your computer and use it in GitHub Desktop.
Save kulp/de366e469e5f3c938682 to your computer and use it in GitHub Desktop.
Huffman encoder for files with lines like `symbol = freq`
#!/usr/bin/env perl
use strict;
use warnings;
my @queue = map { [ /(\S+)\s*=\s*(\d+)/, undef ] } <>;
my $total = 0;
my $bitlen = 0;
my $keys = 0;
while ((@queue = sort { $a->[1] <=> $b->[1] } @queue) > 1) {
my ($one, $two) = splice @queue, 0, 2;
my $count = $one->[1] + $two->[1];
my $new = [ undef, $count, [ $one, $two ] ];
# only add to total count if we're consuming a terminal
$total += $one->[1] if defined $one->[0];
$total += $two->[1] if defined $two->[0];
push @queue, $new;
}
sub walk
{
my ($node, $pos) = @_;
if (defined $node->[0]) {
my $len = length $pos;
printf "$node->[0] = $len\'b$pos, weight %4.2f\n", $node->[1];
my $prob = (length $pos) * $node->[1] / $total;
$bitlen += $prob;
$keys++;
} else {
walk($node->[2][0], $pos . "0");
walk($node->[2][1], $pos . "1");
}
}
walk($queue[0], "");
printf "Bit length = %6.6f (%d bits / %d entries) ; %d keys\n", $bitlen, int($bitlen * $total + .5), $total, $keys;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment