
Huffman Tree | Huffman Coding | Data Structure & Algorithm




A binary code encodes each character as a binary string, or codeword. We would like to find a binary code that encodes the file using as few bits as possible, i.e., compresses it as much as possible.
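To make "as few bits as possible" concrete, here is a minimal sketch that encodes and decodes a short message with a sample prefix code (the same table used in the example below); the message "abacada" is chosen purely for illustration:

```python
# Sample prefix code: no codeword is a prefix of another.
code = {"a": "0", "b": "110", "c": "10", "d": "111"}
message = "abacada"  # hypothetical message for illustration

# Encoding: simply replace each character by its codeword.
encoded = "".join(code[ch] for ch in message)

def decode(bits, table):
    """Decode a bit string by greedily peeling off codewords.

    Prefix-freeness guarantees each match is the unique first codeword.
    """
    inverse = {v: k for k, v in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

print(encoded)                  # 011001001110 (12 bits)
print(decode(encoded, code))    # abacada
```

A fixed-length code for a 4-symbol alphabet needs 2 bits per character, so the same message would cost 14 bits instead of 12; the gap widens as frequent characters get shorter codewords.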

In a fixed-length code each codeword has the same length. In a variable-length code codewords may have different lengths.

Encoding
Given a code (corresponding to some alphabet) and a message, encoding is easy: just replace each character by its codeword.

Decoding
Given an encoded message, decoding is the process of turning it back into the original message. A message is uniquely decodable if it can be decoded in only one way. This unique decipherability property is needed for a code to be useful.

Prefix Codes
Fixed-length codes are always uniquely decipherable, but they do not always give the best compression, so we prefer variable-length codes. A code is called a prefix (free) code if no codeword is a prefix of another one. For example, {a = 0, b = 110, c = 10, d = 111} is a prefix code. Every message encoded by a prefix-free code is uniquely decipherable: since no codeword is a prefix of any other, we can always find the first codeword in a message, peel it off, and continue decoding.

Fixed-Length vs. Variable-Length Codes
Problem: Suppose we want to store messages made up of the 4 characters a, b, c, d with frequencies 60, 5, 30, 5 (percent) respectively. What are the fixed-length and prefix-free codes that use the least space? To store 100 of these characters:
(1) the fixed-length code requires 100 × 2 = 200 bits;
(2) the prefix code {a = 0, b = 110, c = 10, d = 111} uses only 60 × 1 + 5 × 3 + 30 × 2 + 5 × 3 = 150 bits, a 25% saving.

Huffman Coding
Step 1: Pick the two letters x, y from alphabet A with the smallest frequencies and create a subtree that has these two characters as leaves (the greedy idea). Label the root of this subtree z.
Step 2: Set the frequency f(z) = f(x) + f(y). Remove x and y and add z, creating a new alphabet.
Repeat this procedure, called a merge, with the new alphabet until only one symbol is left. The resulting tree is the Huffman tree.

Time Complexity
The time complexity of the Huffman algorithm is O(n log n). Using a heap to store the weight of each tree, each iteration requires O(log n) time to determine the cheapest weight and to insert the new weight. There are O(n) iterations, one for each item.

For more details, click here: https://rb.gy/lj3byc
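The merge procedure and the heap described above can be sketched as follows; this is a minimal illustration using Python's heapq (the function name huffman_codes and the tree representation are choices made here, not part of the original text), applied to the 60/5/30/5 frequencies from the example:

```python
import heapq

def huffman_codes(freq):
    """Build a Huffman code from a {symbol: frequency} map using a min-heap."""
    # Each heap entry is (frequency, tie-breaker, tree). A tree is either a
    # symbol (leaf) or a (left, right) pair (internal node); the unique
    # tie-breaker keeps tuples comparable when frequencies are equal.
    heap = [(f, i, s) for i, (s, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Greedy merge: pop the two cheapest subtrees x and y, push a new
        # subtree z with f(z) = f(x) + f(y). Each pop/push costs O(log n).
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))
        counter += 1
    # Walk the finished tree: left edge contributes '0', right edge '1'.
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"  # single-symbol alphabet edge case
    _, _, root = heap[0]
    walk(root, "")
    return codes

freq = {"a": 60, "b": 5, "c": 30, "d": 5}
codes = huffman_codes(freq)
cost = sum(freq[s] * len(codes[s]) for s in freq)
print(codes)
print(cost)  # 150 bits per 100 characters, matching the example above
```

There are O(n) merges and each heap operation is O(log n), giving the O(n log n) bound stated above. Note that the exact codewords depend on tie-breaking, but the codeword lengths, and hence the 150-bit total, are the same for any optimal Huffman tree on these frequencies.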

