Printable Version

2-6 Wavelet Compression

02/04/2006

Atlantis Graphics, a software development company, was prototyping a new TV set-top box, which required an efficient video codec. To meet this criteria, I took the responsibility of implementing a video codec based on wavelet transforms. Wavelet compression is basically an encoding scheme founded on frequency sub-band coding. Transforming images into frequency sub-bands has the advantage of eliminating image block artifacts as seen in JPEG and MPEG compression algorithms. Furthermore, wavelet transforms can be very low cost in terms of computation and complexity, hence they are suitable for real-time encoding applications.

1 Introduction

1.1 Project Background

Atlantis Graphics was a start-up development house, funded by a parent organisation based in the United States. The idea behind Atlantis Graphics was to build a prototype TV set-top box that had PVR (Personal Video Recorder) and VOD (Video on Demand) capabilities built in. The set-top box was intended to be used on existing cable TV networks. Each client would rent a unit and would access the TV content through a peer-to-peer networking system. In essence, the client would share, transfer, record and view TV programs at any time on the network.

Similar set-top box technologies are already in place today, although peer-to-peer PVR and VOD schemes were still a relatively new concept in 2002, and to some extent it still is today. The hope was to get a foothold in the cable TV market before competing firms would do the same. Atlantis Graphics had an advantage in that respect, because the parent company was connected with running cable TV networks. Unfortunately, these plans were never realised. Atlantis Graphics folded due to the extensive financial restructuring in the parent company.

1.2 Technology Overview

The system was supposed to operate on a proprietary network (owned by the pay-TV company), hence the actual sharing and transferring mechanism would be hidden from the user. The idea was to allow clients to save (or stream) content directly to (or from) other set-top boxes transparently. Therefore, even though each set-top box would be equipped with a storage device, a recorded TV program may not reside on the client's set-top box at all, it could be distributed and/or stored on other set-top boxes on the network. The client would only see a big collection of TV programs on the network. Any program recorded by a client will be visible to other clients on the network.

A distributed networking system of this magnitude would need a centralised database to keep track of the all recorded TV programs. On the client side, the set-top box would need an appropriate menu system that would allow the user to navigate through the list of programs obtained from the server.

The actual video content was streamed at either full NTSC or PAL resolutions. Obviously, this would need an efficient video compression algorithm to save bandwidth and to reduce data storage. Additionally, the video codec was required to encode video streams in real-time while playing back another stream simultaneously.

The whole project was prototyped on a couple of PCs, complete with a functional networking code, menu interface, a fast video codec and a TV program database server on the side. The grand plan was to eventually migrate the system onto an embedded architecture, and perhaps even adopt an ASIC implementation further down the track.

2 Video Codec

2.1 Choice of Compression Scheme

The video stream was processed by a technique known as the fast lifting wavelet transform [1]. The wavelet transform converted image data from the spatial domain into a series of frequency sub-bands. Data compression was achieved by combining the wavelet transform with a quantizing filter, and then by passing the resulting wavelet coefficients to a zero-tree entropy coder [3, 4].

The method we used in this project was based on the 2-6 wavelet transform [2], which was an optimised technique intended for ASIC applications. The performance of 2-6 wavelet compression was comparable to MPEG-2 technology, it achieved similar peak signal-to-noise and data compression ratios. However, the most attractive feature of 2-6 wavelet compression was its low complexity, making it suitable for fast software applications. Our implementation was capable of encoding and decoding video streams in real-time. The typical image quality that was achievable by this wavelet compression scheme is illustrated in Figures 1, 2 and 3.

Figure 1 Figure 2 Figure 3
Figure 1: Original input frame. Figure 2: The same frame with 2-6 wavelet compression applied. Figure 3: Error difference between original and compressed frames.

2.2 Colour Space Conversion

The initial step for video compression was to convert the incoming RGB frames into three separate colour planes, Y (luminance), U (chrominance blue) and V (chrominance red) respectively (see Figures 4 to 6). The frames were represented in the YUV9 format, essentially assigning every 4×4 block of Y pixels with a single pair of U and V pixels. In this form, each chrominance plane was reduced to 116th of the luminance bandwidth before compression. The only drawback with the YUV9 format was the chroma bleeding — colours would leak at the boundaries of the objects depicted, see Figure 3.

Figure 4 Figure 5 Figure 6
Figure 4, 5 and 6: The original RGB image is converted into Y (luminance), U (chrominance blue) and V (chrominance red) colour planes respectively. The bandwidth used by the U and V components is quite small, in this example they are shown in their actual size (160 x 128 pixels).

2.3 The Wavelet Transform

The 2-6 wavelet transform accepted a series of YUV9 colour planes, like the ones shown in Figures 4 to 6. It operated in multiple passes on each colour plane, successively dividing them into series of sub-bands as shown in Figure 7. The algorithm extracted both the low and high frequency data in each pass, separating them into two regions. The process was repeated successively on the low frequency half, dividing it further into smaller sub-bands, and stopped when the smallest sub-band was no longer divisible. For most situations, six passes was quite adequate (hence the term "2-6 transform" — splitting sub-bands in just six passes).

Figure 7
Figure 7: Image divided up into wavelet sub-bands.

As shown in Figure 7, the first horizontal division would extract the low and high frequency bands into L (left) and R (right) regions respectively. In the next pass, the low frequency data was vertically divided into L(T) and L(B) regions, forming new low and high frequency sub-bands respectively. The algorithm eventually formed a sub-band pyramid, essentially accumulating the low frequency data towards the lop-left, leaving the DC coefficients in the corner. The final transform was represented using 16-bit integers.

The transform algorithm was quite easy to adopt for SIMD processing. A large portion of the transform code was implemented in MMX and SSE assembly on the Intel platform, which could operate on four pixels at a time. This increased the efficiency by four-fold over the generic C++ implementation.

Figure 8 Figure 9 Figure 10
Figure 8, 9 and 10: The wavelet pyramids for the Y, U and V colour planes respectively. The low frequency information is accumulated in the top-left corner. The remaining areas consist of higher frequencies which get quantized.
Figure 11 Figure 12 Figure 13
Figure 11, 12 and 13: Bit-size map for the Y, U and V wavelets respectively.
Figure 14 Figure 15 Figure 16
Figure 14, 15 and 16: "Zero map" for the Y, U and V wavelets respectively. Black areas represent zero values.

Figures 8 to 10 illustrate how the YUV9 colour planes (in Figures 1 to 3) were transformed into wavelet sub-bands. It was immediately apparent that most of the sub-bands contained very little information, especially in the high frequency regions. This was exploited by applying a quantizing filter on these regions, practically making most of the high frequency coefficients zero. The redundant zero data would favour the zero-tree encoder for better data compression.

Figures 11 to 13 show the same wavelet data represented with a colour coded map. It highlights the bit-size of each wavelet coefficient. The black areas correspond to 0-bit information, while blue gradients represent bits of increasing size. The redundant zero coefficients are further highlighted in Figures 14 to 16. The wavelet coefficients were actually 16-bit, but most of those only had values set in the 5 least significant bits. About 1% of the coefficients had values greater than 8-bits - these corresponded to the low frequency and the DC coefficients.

2.4 Entropy Encoding

If the collection of wavelet coefficients were split into sixteen individual bit planes, one would notice that the plane corresponding to the most significant bit would be sparse and largely zero; whereas the plane corresponding to the least significant bit would be the most populated. Furthermore, each bit plane would become more sparse towards in higher frequency regions. To exploit this property, a zero-tree compression algorithm was implemented and operated on these bit planes independently.

Figure 17
Figure 17: The zero-tree structure in a wavelet bit plane (left) and its equivalent hierarchical tree structure (right).

Figure 17 shows how the zero-tree was organised in a single bit plane. The diagram on the right illustrates its equivalent quad-tree structure. The root node coincided with the location of the DC coefficients. Its descendants expanded outwards, spreading out to cover the high frequency area. The tree was built by linking each parent node to four children that contained one or more non-zero bits. Parent nodes whose children contained only zero bits were turned into a leaf node.

In other words, entire branches would be truncated at a node whose descendants were all zero. Since high frequency coefficients contained mostly zero data, the truncation technique made tree increasingly sparse for branch levels spreading further out, effectively making data compression feasible. This effect is illustrated Figures 18 to 20, which show the distribution of all the parent nodes in every bit plane. For example, one could observe that the sky region in Figure 1 had very little high frequency information to begin with, thus most of the corresponding wavelet coefficients were quantized out to zero. Figures 18 to 20 show the absence of parent nodes that correspond to the sky regions, because all of their descendant branches were zero.

Once the zero-tree was constructed, a series of bit codes were generated to represent the traversal of the entire tree. These traversal bit codes are saved in a file container as compressed data, which would allow a decoder to reconstruct the zero-tree during playback.

Figure 18 Figure 19 Figure 20
Figure 18, 19 and 20: These are the parent node maps for the Y, U and V zero-trees respectively. All bit planes are visualised together. Black regions indicate there are no parent nodes in any of the bit planes.

2.5 Summary

The whole encoding process could be summed up as follows:

  1. Convert each RGB frame into individual YUV9 colour planes;
  2. Apply the 2-6 wavelet transform on each of the YUV9 input colour planes;
  3. Quantise the wavelet coefficients to zero-out some of the high frequency data;
  4. Decompose the wavelet space into sixteen bit planes;
  5. For each bit plane, generate a parent node map for building a tree of non-zero bit coefficients;
  6. Traverse the entire tree while ignoring (or truncating) branches containing only zero bits;
  7. Store the traversal order in a series of bit codes.

3 Menu Interface

The set-top box user interface consisted of what can be described as a TV guide with a continuous time line. The user was able to scroll the TV guide seamlessly into the past, or peek into the future to reveal upcoming TV programs. Scrolling into the past would allow the user to access recorded programs already present on the PVR network, meanwhile upcoming TV programs in the future were available to be marked for recording. The scope of this TV guide would span several months into the past and a couple of weeks into future. If a user decided to watch a past TV program, the recorded video would be transferred in real-time from other clients' set-top boxes.

Figure 21 Figure 22
Figure 21: Main Menu. Figure 22: TV Guide.

Figure 21 illustrates the main menu of the set-top box. The "Movie Guide" and the "TV Guide" sub-menus were technically the same in functionality. They only differed in program contents. The "Setup" and "Test" sub-menus were inactive, because they were still under development.

Figure 22 shows an example of the TV guide. Each row displayed a list of programs slotted in their appropriate time for each TV channel. The user would navigate and highlight a particular program (light blue box). The TV program's title and information was displayed immediately below.

The menu system used various colour codes to signify the availability of the TV programs. TV programs with a yellow text were future programs available for recording. A program that was selected for recording were marked with a yellow circle. Items with a white text and a green arrow were the broadcasted programs. The user can view broadcasted programs immediately. Items that had a grey text with a red cross were past programs, which were not available for playback. Past programs that were available for playback had a green text with a green arrow (not shown in this example).

References

  1. C. Valens, The Fast Lifting Wavelet Transform (PDF, 291kB), 1999
  2. W. C. Lynch, K. Kolarov, W. Arrighi, R. Hoover, WZD - A Very Low Complexity Wavelet Video Codec for an ASIC Macro (PDF, 74kB), Interval Research Corporation, 1999
  3. J. Shapiro, Embedded Image Coding Using Zerotrees of Wavelet Coefficients, IEEE Trans. Signal Processing, vol. 41, pp. 3445 - 3463, 1993.
  4. C. Valens, Embedded Zerotree Wavelet Encoding (PDF, 89kB), 1999