- This is a simple implementation of Huffman's algorithm, generally used in data compression.
- https://en.wikipedia.org/wiki/Huffman_coding
- In this specific implementation, the correctness of the code generated can be prooved manually by following the workflow of the algorithm, described below:
- A dictionary in Python was used as the primary data structure. In order to deal with all the characters and calculate their frequencies, each character given in the message is added to this dictionary and the values for each key (character) represents its frequency. Due to that, the caracters within the dictionary are automatically sorted in alphabetical order.
- In every step, an auxiliar array is generated with sorted items from the dictionary, but this time they are sorted by the values for each key.
- With that, the first two items are summed in order to get the frequency for a new node, that represents those characters union.
- At this point, an array of nodes is created, so we can go over these nodes and their children recursively.
- The key for the the Huffman's code calculation is that nodes with smaller frequencies receive '0' and get it added to its current value, and nodes with bigger frequencies receive '1' instead.
- The 'createHuffman' function does exactly that, by recursively iterating through the array of combined nodes.
- At the end, the codes generated for each character are reversed so we can get the real code for that character. This is due to the fact that we are starting to generate the code from the leaves to the root, so obviously we would get the reversed code.
- The function returns a string object that represents a Huffman code for that message.
- This is still a simple code and does not represent the best capacity for a Python code. The techniques used in this implementation are limited to the programmer's skills. Therefore, it is certain that there are some pieces of code that can be improved and/or added to the file.
- One example would be to implement an opposite function that receives a Huffman code and generates the message that it represents. Currently, the process is not bidirecional.
- This snippet was implemented by Willame Soares as a simple practice for a Computer Science class and to improve/learn Python skills.