Skip to content

Instantly share code, notes, and snippets.

@willamesoares
Last active January 17, 2018 15:55
Show Gist options
  • Save willamesoares/6a94e8ea2546b0e0534a to your computer and use it in GitHub Desktop.
Save willamesoares/6a94e8ea2546b0e0534a to your computer and use it in GitHub Desktop.
Hufman's Algorithm

HUFFMAN'S CODE IMPLEMENTATION - PYTHON 2.7

HOW DOES IT WORK?

  • In this specific implementation, the correctness of the code generated can be prooved manually by following the workflow of the algorithm, described below:
  1. A dictionary in Python was used as the primary data structure. In order to deal with all the characters and calculate their frequencies, each character given in the message is added to this dictionary and the values for each key (character) represents its frequency. Due to that, the caracters within the dictionary are automatically sorted in alphabetical order.
  2. In every step, an auxiliar array is generated with sorted items from the dictionary, but this time they are sorted by the values for each key.
  3. With that, the first two items are summed in order to get the frequency for a new node, that represents those characters union.
  4. At this point, an array of nodes is created, so we can go over these nodes and their children recursively.
  5. The key for the the Huffman's code calculation is that nodes with smaller frequencies receive '0' and get it added to its current value, and nodes with bigger frequencies receive '1' instead.
  6. The 'createHuffman' function does exactly that, by recursively iterating through the array of combined nodes.
  7. At the end, the codes generated for each character are reversed so we can get the real code for that character. This is due to the fact that we are starting to generate the code from the leaves to the root, so obviously we would get the reversed code.
  8. The function returns a string object that represents a Huffman code for that message.

IMPROVEMENTS

  • This is still a simple code and does not represent the best capacity for a Python code. The techniques used in this implementation are limited to the programmer's skills. Therefore, it is certain that there are some pieces of code that can be improved and/or added to the file.
  • One example would be to implement an opposite function that receives a Huffman code and generates the message that it represents. Currently, the process is not bidirecional.

CREDITS

  • This snippet was implemented by Willame Soares as a simple practice for a Computer Science class and to improve/learn Python skills.
#Codigo de Huffman
import string, operator
alphabet = string.ascii_lowercase
#adicionar 0 ou 1 para cada caracter
def createHuffman(letter, b):
if not letter.startswith('union'):
dic_cdg[letter] += `b`
else:
for i in range(len(combinedArray)):
if combinedArray[i][0] == letter:
createHuffman(combinedArray[i][1], b)
createHuffman(combinedArray[i][2], b)
def getHuffman(message):
message = list(string.lower(message))
huffmanCode = ''
global dic_caracter
dic_caracter = {}
global combinedArray
combinedArray = []
global dic_cdg
dic_cdg = {}
#inicializar dicionario
for letter in alphabet:
dic_caracter[letter] = 0
#calcular frequencia de cada letra
for letter in message:
if letter != ' ':
dic_caracter[letter] += 1
dic_cdg[letter] = ''
#deletar letras com frequencia = 0
for item in dic_caracter.items():
if item[1] == 0:
del dic_caracter[item[0]]
#calcular codigo de Huffman
for i in range(len(dic_caracter)-1):
#ordenar e adicionar os dois primeiros caracateres
aux = sorted(dic_caracter.items(), key=operator.itemgetter(1))
dic_caracter['union'+`i`] = aux[0][1] + aux[1][1]
combinedArray.append(['union'+`i`,aux[0][0],aux[1][0]])
if aux[0][1] <= aux[1][1]:
createHuffman(aux[0][0],0)
createHuffman(aux[1][0],1)
else:
createHuffman(aux[1][0],0)
createHuffman(aux[0][0],1)
del dic_caracter[aux[0][0]]
del dic_caracter[aux[1][0]]
#reverter codigo obtido para cada caracter
for i in dic_cdg.items():
dic_cdg[i[0]] = dic_cdg[i[0]][::-1]
#definir codigo com base na mensagem
for letter in message:
if letter != ' ':
huffmanCode += dic_cdg[letter]
return huffmanCode
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment