Huffman Decoding

Last Updated : 7 Apr, 2023

We have discussed Huffman Encoding in a previous post. In this post, decoding is discussed.

Examples:

Input Data: AAAAAABCCCCCCDDEEEEE
Frequencies: A: 6, B: 1, C: 6, D: 2, E: 5

Encoded Data: 0000000000001100101010101011111111010101010

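Huffman Tree: '#' is the special character used for internal nodes, since the
character field is not needed for internal nodes.

                 #(20)
               /       \
          #(12)         #(8)
         /    \        /    \
      A(6)    C(6)  E(5)    #(3)
                           /    \
                        B(1)    D(2)

Taking a left edge as 0 and a right edge as 1, the code of 'A' is '00', the code of 'C' is '01', the code of 'E' is '10', the code of 'B' is '110' and the code of 'D' is '111'.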

Decoded Data: AAAAAABCCCCCCDDEEEEE
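
To make the relation between the tree and the codes concrete, here is a minimal C++ sketch that builds the example tree by hand and reads the codes off it. The names Node, leaf, join, collectCodes, buildExampleTree and encode are hypothetical helpers chosen for this illustration, and the sketch assumes a left edge means 0 and a right edge means 1, as in the example.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node type for this sketch: leaves carry a symbol,
// internal nodes use '#' as in the diagram.
struct Node {
    char symbol;
    int weight;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
};

std::unique_ptr<Node> leaf(char symbol, int weight) {
    return std::unique_ptr<Node>(new Node{symbol, weight, nullptr, nullptr});
}

// Joins two subtrees under a new internal node whose weight is their sum.
std::unique_ptr<Node> join(std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    int w = l->weight + r->weight;
    return std::unique_ptr<Node>(new Node{'#', w, std::move(l), std::move(r)});
}

// Walks the tree: a left edge appends '0', a right edge appends '1'.
// The path from the root to a leaf is that leaf's code.
void collectCodes(const Node* node, const std::string& prefix,
                  std::map<char, std::string>& codes) {
    if (!node) return;
    if (!node->left && !node->right) {
        codes[node->symbol] = prefix;
        return;
    }
    collectCodes(node->left.get(), prefix + '0', codes);
    collectCodes(node->right.get(), prefix + '1', codes);
}

// Builds the tree of the first example by hand (not from frequencies).
std::unique_ptr<Node> buildExampleTree() {
    auto left = join(leaf('A', 6), leaf('C', 6));                       // #(12)
    auto right = join(leaf('E', 5), join(leaf('B', 1), leaf('D', 2)));  // #(8)
    return join(std::move(left), std::move(right));                     // #(20)
}

// Concatenates the code of every character in the text.
std::string encode(const std::string& text,
                   const std::map<char, std::string>& codes) {
    std::string bits;
    for (char c : text) bits += codes.at(c);
    return bits;
}
```

Calling collectCodes on the tree returned by buildExampleTree with an empty prefix fills the map with the five codes of the example, and encode then reproduces the encoded data shown for AAAAAABCCCCCCDDEEEEE.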

Input Data: geeksforgeeks

Characters with their Huffman codes:
e 10, f 1100, g 011, k 00, o 010, r 1101, s 111

Encoded Huffman data: 01110100011111000101101011101000111
Decoded Huffman Data: geeksforgeeks

Follow the steps below to decode the data:

Note: Decoding requires the same Huffman tree that was used for encoding. We iterate through the bits of the encoded data and, to find the character corresponding to the current bits, apply the following steps:

1. Start from the root and repeat the steps below until a leaf is reached.
2. If the current bit is 0, move to the left child.
3. If the current bit is 1, move to the right child.
4. On reaching a leaf node, output the character stored in that leaf, then go back to step 1 and continue with the next bit of the encoded data.
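
The traversal described above can be sketched in C++ as follows. This is an illustration under stated assumptions, not the article's own listing: TrieNode, buildTree and decode are hypothetical names, the decoding tree is rebuilt from a symbol-to-code table instead of being passed over from the encoder, and the alphabet is assumed to contain at least two distinct symbols.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical decoding-tree node: child[0] is the left child (bit 0),
// child[1] is the right child (bit 1).
struct TrieNode {
    char symbol = '\0';
    bool isLeaf = false;
    std::unique_ptr<TrieNode> child[2];
};

// Rebuilds a decoding tree from a symbol -> code table (assumed prefix-free).
std::unique_ptr<TrieNode> buildTree(const std::map<char, std::string>& codes) {
    std::unique_ptr<TrieNode> root(new TrieNode());
    for (const auto& entry : codes) {
        TrieNode* node = root.get();
        for (char bit : entry.second) {
            if (bit != '0' && bit != '1')
                throw std::invalid_argument("code must contain only 0 and 1");
            int idx = bit - '0';
            if (!node->child[idx]) node->child[idx].reset(new TrieNode());
            node = node->child[idx].get();
        }
        node->symbol = entry.first;
        node->isLeaf = true;
    }
    return root;
}

// Decodes a string of '0'/'1' characters by walking the tree bit by bit.
std::string decode(const TrieNode* root, const std::string& bits) {
    std::string out;
    const TrieNode* node = root;               // start from the root
    for (char bit : bits) {
        if (bit != '0' && bit != '1')
            throw std::invalid_argument("encoded data must contain only 0 and 1");
        node = node->child[bit - '0'].get();   // 0 goes left, 1 goes right
        if (!node)
            throw std::invalid_argument("bit sequence does not match any code");
        if (node->isLeaf) {                    // leaf reached: emit and restart
            out += node->symbol;
            node = root;
        }
    }
    if (node != root)
        throw std::invalid_argument("encoded data ends in the middle of a code");
    return out;
}
```

With the code table of the second example (e 10, f 1100, g 011, k 00, o 010, r 1101, s 111), decode maps 01110100011111000101101011101000111 back to geeksforgeeks. Each bit moves exactly one edge down the tree, so the work is proportional to the number of encoded bits.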

The code below takes a string as input, encodes it, and stores the result in an encoded string. It then decodes that string and prints the original text.

Below is the implementation of the above approach:
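
As a stand-in for that flow, here is a minimal C++ sketch of the whole round trip: count frequencies, build the tree with a priority queue, derive the codes, encode, and decode. It is an illustration under stated assumptions and not necessarily the article's own code. HuffNode, NodePtr, ByFreq, buildHuffmanTree, storeCodes, encode and decode are names chosen for this sketch, internal nodes are marked with '#', and a single-symbol input is given the one-bit code "0".

```cpp
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <vector>

struct HuffNode {
    char symbol;
    int freq;
    std::shared_ptr<HuffNode> left, right;
};
using NodePtr = std::shared_ptr<HuffNode>;

// Orders the priority queue so that the lowest frequency is on top.
struct ByFreq {
    bool operator()(const NodePtr& a, const NodePtr& b) const {
        return a->freq > b->freq;
    }
};

NodePtr buildHuffmanTree(const std::string& text) {
    std::map<char, int> freq;
    for (char c : text) ++freq[c];

    std::priority_queue<NodePtr, std::vector<NodePtr>, ByFreq> heap;
    for (const auto& kv : freq)
        heap.push(std::make_shared<HuffNode>(HuffNode{kv.first, kv.second, nullptr, nullptr}));
    if (heap.empty()) return nullptr;

    // Repeatedly merge the two least frequent subtrees.
    while (heap.size() > 1) {
        NodePtr l = heap.top(); heap.pop();
        NodePtr r = heap.top(); heap.pop();
        heap.push(std::make_shared<HuffNode>(HuffNode{'#', l->freq + r->freq, l, r}));
    }
    return heap.top();
}

// Left edge appends '0', right edge appends '1'.
void storeCodes(const NodePtr& node, const std::string& prefix,
                std::map<char, std::string>& codes) {
    if (!node) return;
    if (!node->left && !node->right) {
        // A lone root leaf would get an empty code, so give it one bit.
        codes[node->symbol] = prefix.empty() ? "0" : prefix;
        return;
    }
    storeCodes(node->left, prefix + '0', codes);
    storeCodes(node->right, prefix + '1', codes);
}

std::string encode(const std::string& text,
                   const std::map<char, std::string>& codes) {
    std::string bits;
    for (char c : text) bits += codes.at(c);
    return bits;
}

// Assumes bits contains only '0' and '1' and was produced with this tree.
std::string decode(const NodePtr& root, const std::string& bits) {
    std::string out;
    if (!root) return out;
    if (!root->left && !root->right)  // single-symbol tree: one bit per symbol
        return std::string(bits.size(), root->symbol);
    const HuffNode* node = root.get();
    for (char bit : bits) {
        node = (bit == '0') ? node->left.get() : node->right.get();
        if (!node->left && !node->right) {
            out += node->symbol;
            node = root.get();
        }
    }
    return out;
}
```

Ties between equal frequencies can be broken in different ways, so the individual codes this sketch produces may differ from the ones listed in the output. The total encoded length for "geeksforgeeks" is 35 bits either way, and decoding the result returns the original string.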

Output

Character With there Frequencies:
e 10
f 1100
g 011
k 00
o 010
r 1101
s 111

Encoded Huffman data:
01110100011111000101101011101000111

Decoded Huffman Data:
geeksforgeeks

Time complexity:

Building the Huffman tree with a priority queue takes O(k log k) time, where k is the number of distinct characters. Because k is at most n, the number of characters in the input string, this stays within the O(n log n) bound usually quoted for Huffman coding. Counting the frequencies takes one pass over the input, and decoding follows one tree edge per encoded bit, so it is linear in the length of the encoded data. The auxiliary space is O(k) for the tree and the maps, which is at most O(n).

In the C++ implementation, the tree-building cost comes from the priority queue operations, and the space is dominated by the maps that store the frequency and the code of each character. The recursive functions that print and store the codes add stack space proportional to the height of the tree.

Comparing Input file size and Output file size:

We can compare the size of the input file with the size of the Huffman encoded output in a simple way. Let's say our input is the string "geeksforgeeks", stored in a file input.txt.

Input File Size:

Input: "geeksforgeeks"
Total number of characters, i.e. input length: 13
Size: 13 character occurrences * 8 bits = 104 bits or 13 bytes.

Output File Size:

Input: "geeksforgeeks"

------------------------------------------------
Character | Frequency | Binary Huffman Value
------------------------------------------------
e         | 4         | 10
f         | 1         | 1100
g         | 2         | 011
k         | 2         | 00
o         | 1         | 010
r         | 1         | 1101
s         | 2         | 111
------------------------------------------------

So to calculate output size:

e: 4 occurrences * 2 bits = 8 bits
f: 1 occurrence * 4 bits = 4 bits
g: 2 occurrences * 3 bits = 6 bits
k: 2 occurrences * 2 bits = 4 bits
o: 1 occurrence * 3 bits = 3 bits
r: 1 occurrence * 4 bits = 4 bits
s: 2 occurrences * 3 bits = 6 bits

Total: 35 bits, which rounds up to 5 bytes
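
The same arithmetic can be written as a short C++ sketch. encodedBits and bytesFor are hypothetical helper names for this illustration; the calculation is just the sum of frequency times code length over all characters, rounded up to whole bytes.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Total encoded length in bits: sum of frequency * code length per character.
std::size_t encodedBits(const std::map<char, int>& freq,
                        const std::map<char, std::string>& codes) {
    std::size_t total = 0;
    for (const auto& kv : freq)
        total += static_cast<std::size_t>(kv.second) * codes.at(kv.first).size();
    return total;
}

// Bytes needed to hold that many bits, rounding up to a whole byte.
std::size_t bytesFor(std::size_t bits) {
    return (bits + 7) / 8;
}
```

With the frequencies and codes from the table, encodedBits gives 35 and bytesFor(35) gives 5, compared with 13 bytes for the unencoded input.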

Encoding therefore shrinks the data from 104 bits to 35 bits, roughly a third of the original size, not counting the space needed to store the Huffman tree or code table alongside it. The same calculation also gives the value of N, i.e. the length of the encoded data.


URL: https://www.geeksforgeeks.org/dsa/huffman-decoding/
