We have discussed Huffman Encoding in a previous post. In this post, decoding is discussed.
Examples:
Input Data: AAAAAABCCCCCCDDEEEEE
Frequencies: A: 6, B: 1, C: 6, D: 2, E: 5
Encoded Data: 0000000000001100101010101011111111010101010
Huffman Tree: '#' is the special character used for internal nodes, since the character field is not needed for internal nodes.
            #(20)
           /     \
       #(12)      #(8)
       /   \      /   \
    A(6)  C(6)  E(5)  #(3)
                      /  \
                   B(1)  D(2)
Code of 'A' is '00', code of 'C' is '01', and so on.
Decoded Data: AAAAAABCCCCCCDDEEEEE
Input Data: geeksforgeeks
Characters with their Huffman codes:
e 10, f 1100, g 011, k 00, o 010, r 1101, s 111
Encoded Huffman data: 01110100011111000101101011101000111
Decoded Huffman Data: geeksforgeeks
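To decode the encoded data, we need the Huffman tree. We iterate through the binary encoded data, and to find the character corresponding to the current bits, we follow these simple steps:
1. Start from the root and repeat the following steps until a leaf node is found.
2. If the current bit is 0, move to the left child of the current node.
3. If the current bit is 1, move to the right child of the current node.
4. When a leaf node is reached, output the character stored in it, then return to the root and continue reading the encoded data from step 1.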
The code below takes a string as input, encodes it, and stores the result in a variable holding the encoded string. It then decodes that string and prints the original text.
Below is the implementation of the above approach:
Output:
Characters with their Huffman codes:
e 10
f 1100
g 011
k 00
o 010
r 1101
s 111
Encoded Huffman data:
01110100011111000101101011101000111
Decoded Huffman data:
geeksforgeeks
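Time and space complexity:
The time complexity of the Huffman coding algorithm is O(n log n), where n is the number of characters in the input string. In more detail, counting the frequencies takes O(n) time, building the tree with a priority queue takes O(k log k) time for k distinct characters (k ≤ n), and decoding takes time linear in the length of the encoded bit string, because each bit moves exactly one step down the tree. The auxiliary space complexity is also O(n), where n is the number of characters in the input string.
In the given C++ implementation, the time complexity is dominated by the creation of the Huffman tree using the priority queue, which takes O(n log n) time in the worst case (when all characters are distinct). The space complexity is dominated by the maps used to store the frequencies and codes of the characters, which take O(n) space. The recursive functions used to print and store the codes also use stack space proportional to the height of the tree.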
Let's compare the input file size with the size of the Huffman-encoded output. The size of the output data can be calculated in a simple way. Suppose our input is the string "geeksforgeeks", stored in a file input.txt.
Input File Size:
Input: "geeksforgeeks"
Total number of characters, i.e. input length: 13
Size: 13 character occurrences * 8 bits = 104 bits or 13 bytes.
Output File Size:
Input: "geeksforgeeks"
------------------------------------------------
Character | Frequency | Binary Huffman Value |
------------------------------------------------
e         | 4         | 10                   |
f         | 1         | 1100                 |
g         | 2         | 011                  |
k         | 2         | 00                   |
o         | 1         | 010                  |
r         | 1         | 1101                 |
s         | 2         | 111                  |
------------------------------------------------
To calculate the output size, multiply each character's frequency by the length of its code:
e: 4 occurrences * 2 bits = 8 bits
f: 1 occurrence * 4 bits = 4 bits
g: 2 occurrences * 3 bits = 6 bits
k: 2 occurrences * 2 bits = 4 bits
o: 1 occurrence * 3 bits = 3 bits
r: 1 occurrence * 4 bits = 4 bits
s: 2 occurrences * 3 bits = 6 bits
Total sum: 35 bits, i.e. 35 / 8 = 4.375 bytes, which needs 5 bytes when rounded up to whole bytes.
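Hence, encoding reduced the data from 104 bits to 35 bits, roughly a third of the original size. This count does not include the space needed to store the Huffman tree (or code table), which the decoder also requires. The above method can also help us determine the value of N, i.e. the length of the encoded data in bits.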