Skip to content

Instantly share code, notes, and snippets.

@jasonrdsouza
Last active July 13, 2022 00:33
Show Gist options
  • Star 8 You must be signed in to star a gist
  • Fork 3 You must be signed in to fork a gist
  • Save jasonrdsouza/1c9c895f43497d15eb2e to your computer and use it in GitHub Desktop.
Save jasonrdsouza/1c9c895f43497d15eb2e to your computer and use it in GitHub Desktop.
Simple Huffman coding implementation
from queue import PriorityQueue
from collections import Counter
from dataclasses import dataclass, field
from typing import Any
@dataclass(order=True)
class Node:
count: int
value: Any=field(compare=False)
def huffman(symbol_list):
"""
This function generates the huffman tree for the given input.
The input is a list of "symbols".
"""
# figure out the frequency of each symbol
counts = Counter(symbol_list).most_common()
total = len(symbol_list)
if len(counts) < 2:
# 0 or 1 unique symbols, so no sense in performing huffman coding
return
queue = PriorityQueue()
for (val,count) in counts:
queue.put(Node(count=count, value=val))
# Create the huffman tree
largest_node_count = 0
while total != largest_node_count:
node1 = queue.get(False)
node2 = queue.get(False)
new_count = node1.count + node2.count
largest_node_count = new_count if new_count > largest_node_count else largest_node_count
queue.put(Node(count=new_count, value=(node1,node2)))
huffman_tree_root = queue.get(False)
# generate the symbol to huffman code mapping
lookup_table = huffman_tree_to_table(huffman_tree_root, "", {})
return lookup_table
def huffman_tree_to_table(root, prefix, lookup_table):
"""Converts the Huffman tree rooted at "root" to a lookup table"""
if type(root.value) != tuple:
# leaf node
lookup_table[root.value] = prefix
else:
huffman_tree_to_table(root.value[0], prefix + "0", lookup_table)
huffman_tree_to_table(root.value[1], prefix + "1", lookup_table)
return lookup_table
def text_to_huffman_code(input_text):
"""Helper function to convert an input string into its huffman symbol table"""
return huffman([c for c in input_text])
@canonicalchris
Copy link

The code as presented will fail if there are two items that have the same number of counts; the underlying priority queue code will then attempt to compare the symbols or sub-trees, which will fail in the merge with
TypeError: '<' not supported between instances of 'tuple' and 'str'

The easiest way to fix that is to mix in the arrival time (a monotonically increasing counter) as a tie-breaker (beware, Python indent is lost).
queue = PriorityQueue()
arrival = 0
total = 0
for (val,count) in counts:
queue.put((count, arrival, val))
arrival += 1
total += count

That requires edits to huffman_tree_to_table and to the merging code. The main disadvantage of that strategy is that the table created is now dependent on the input data. A more principled solution would hash off the symbols or subtrees to get a stable tie breaker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment