Huffman coding is a statistical technique which attempts to reduce the amount of bits required to represent a string of symbols. The algorithm accomplishes its goals by allowing symbols to vary in length. Shorter codes are assigned to the most frequently used symbols, and longer codes to the symbols which appear less frequently in the string.
So, when we’ll translate the input we want to encode the most frequent symbols will take less space than they used to and the least frequent symbols will take more space but because they’re less frequent.
Building a Huffman Tree
The Huffman code for an alphabet (set of symbols) may be generated by constructing a binary tree with nodes containing the symbols to be encoded and their probabilities of occurrence. The tree may be constructed as follows:Step 1. Create a parentless node for each symbol. Each node should include the symbol and its probability.
Step 2. Select the two parentless nodes with the lowest probabilities.
Step 3. Create a new node which is the parent of the two lowest probability nodes.
Step 4. Assign the new node a probability equal to the sum of its children's probabilities.
Step 5. Repeat from Step 2 until there is only one parentless node left.
Huffman Code: Example
The following example bases on a data source using a set of five different symbols. The symbol's frequencies are:
Symbol Frequency
A 24
B 12
C 10
D 8
E 8
----> total 186 bit
(with 3 bit per code word)
The two rarest symbols 'E' and 'D' are connected first, followed by 'C' and 'D'. The new parent nodes have the frequency 16 and 22 respectively and are brought together in the next step. The resulting node and the remaining symbol 'A' are subordinated to the root node that is created in a final step.
Code Tree According To Huffman.
Symbol Frequency Code Code total
Length Length
A 24 0 1 24
B 12 100 3 36
C 10 101 3 30
D 8 110 3 24
E 8 111 3 24
---------------------------------------
ges. 186 bit tot. 138 bit
(3 bit code)
To decode a string of bits we just need to traverse the tree for each bit, if the bit is 0 we take a left step and if the bit is 1 we take a right step until we hit a leaf (which is the symbol we are looking for). For example, if we have the string “100 111 110 ″ and our tree, decoding it we’ll get the string “bed”.
It’s very important to observe that not one code is a prefix of another code for another symbol. In our example, if 0 is the code for ‘a’, 00 cannot be a code for any other symbol because there’s going to be a conflict. We’ll never reach that symbol because after taking steps for the first two bits we’ll get ‘a’, we’re never going the find the symbol for 00.
Here is the python code for Huffman coding.
Reference book?
ReplyDelete