Skip to content

Instantly share code, notes, and snippets.

@kirelagin
Created October 13, 2012 21:44
Show Gist options
  • Star 15 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save kirelagin/3886243 to your computer and use it in GitHub Desktop.
Save kirelagin/3886243 to your computer and use it in GitHub Desktop.
Very simple implementation of Huffman coding in Haskell
> module Huffman where
> import Control.Arrow
> import Data.List
> import qualified Data.Map as M
> import Data.Function
This typeclass is supposed to make life _a bit_ easier.
> class Eq a => Bits a where
> zer :: a
> one :: a
>
> instance Bits Int where
> zer = 0
> one = 1
>
> instance Bits Bool where
> zer = False
> one = True
Codemap is generated from a Huffman tree. It is used for fast encoding.
> type Codemap a = M.Map Char [a]
Huffman tree is a simple binary tree. Each leaf contains a Char and its weight.
Fork (node with children) also has weight = sum of weights of its children.
> data HTree = Leaf Char Int
> | Fork HTree HTree Int
> deriving (Show)
>
> weight :: HTree -> Int
> weight (Leaf _ w) = w
> weight (Fork _ _ w) = w
The only useful operation on Huffman trees is merging, that is we take
two trees and make them children of a new Fork-node.
> merge t1 t2 = Fork t1 t2 (weight t1 + weight t2)
`freqList` is an utility function. It takes a string and produces a list
of pairs (character, number of occurences of this character in the string).
> freqList :: String -> [(Char, Int)]
> freqList = M.toList . M.fromListWith (+) . map (flip (,) 1)
`buildTree` builds a Huffman tree from a list of character frequencies
(obtained, for example, from `freqList` or elsewhere).
It sorts the list in ascending order by frequency, turns each (char, freq) pair
into a one-leaf tree and keeps merging two trees with the smallest frequencies
until only one tree is remaining.
> buildTree :: [(Char, Int)] -> HTree
> buildTree = bld . map (uncurry Leaf) . sortBy (compare `on` snd)
> where bld (t:[]) = t
> bld (a:b:cs) = bld $ insertBy (compare `on` weight) (merge a b) cs
The next function traverses a Huffman tree to obtain a list of codes for
all characters and converts this list into a `Map`.
> buildCodemap :: Bits a => HTree -> Codemap a
> buildCodemap = M.fromList . buildCodelist
> where buildCodelist (Leaf c w) = [(c, [])]
> buildCodelist (Fork l r w) = map (addBit zer) (buildCodelist l) ++ map (addBit one) (buildCodelist r)
> where addBit b = second (b :)
Simple functions to get a Huffman tree or a `Codemap` from a `String`.
> stringTree :: String -> HTree
> stringTree = buildTree . freqList
>
> stringCodemap :: Bits a => String -> Codemap a
> stringCodemap = buildCodemap . stringTree
Time to do the real encoding and decoding!
Encoding function just represents each character of a string by corresponding
sequence of `Bit`s.
> encode :: Bits a => Codemap a -> String -> [a]
> encode m = concat . map (m M.!)
>
> encode' :: Bits a => HTree -> String -> [a]
> encode' t = encode $ buildCodemap t
Decoding is a little trickier. We have to traverse the tree until
we reach a leaf which means we've just finished reading a sequence
of `Bit`s corresponding to a single character.
We keep doing this to process the whole list of `Bit`s.
> decode :: Bits a => HTree -> [a] -> String
> decode tree = dcd tree
> where dcd (Leaf c _) [] = [c]
> dcd (Leaf c _) bs = c : dcd tree bs
> dcd (Fork l r _) (b:bs) = dcd (if b == zer then l else r) bs
@jumetaj
Copy link

jumetaj commented Jun 21, 2015

Hey, it is a great tutorial.
But I do not understand how you give then the command to the console.
Can you gimme a tip?
Thanx

@Nabster404
Copy link

Nabster404 commented Jul 3, 2020

How does decode work?

Copy link

ghost commented Aug 27, 2021

How do you use encode and decode?

@0x86f
Copy link

0x86f commented Feb 17, 2023

In case someone will stumble accross this nice gist, you can use ghci (interactive prompt) to work with the encode and decode functions:

*> let greeting = "Hello World"
*> let tree = stringTree greeting
*> (encode' tree greeting) :: [Int] -- or you can use the Bool-type
[1,1,0,1,0,0,0,1,0,1,0,1,1,1,1,1,0,0,0,1,0,1,1,1,0,0,1,1,0,0,1,1]

If you want to compile the file, you have to add a main-function and rename the module to "Main", e.g.

> module Main where
.....
> main :: IO ()
> main = putStrLn . concatMap show $ ((encode' tree greeting) :: [Int])
>   where greeting = "Hello World"
>         tree = stringTree greeting

and compile the file into an executable ghc <filename>, so you can execute it afterwards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment