Delivering MPEG-4
Aug 1, 2009 12:00 PM, Are Olafsen
A dedicated DSP chip means more onboard processing in a familiar environment.
There is general agreement that the future of digital television transmission, over any platform, will use MPEG-4 Part 10 as a codec. Also known as Advanced Video Coding (AVC), or as the ITU-T standard H.264 in the telecom industry, it was first published as a standard in 2003. This article looks at the complexities of encoding MPEG-4 AVC and the resulting requirement for carefully designed hardware.
The first important point to bear in mind is that, like MPEG-2 — still commonly used for standard definition transmissions — MPEG-4 is an asymmetric codec. It uses processor-intensive algorithms to create the compressed stream but is relatively simple to decode, allowing it to be implemented in an inexpensive chip embedded in millions of set-top boxes and television receivers.
One fundamental economic consideration: Consumer goods manufacturers would be extremely reluctant to tolerate any increase in their manufacturing costs, so when moving from MPEG-2 to MPEG-4, the decoding process has to be broadly comparable.
The team working on the AVC project was tasked with achieving comparable video quality at half the bit rate of MPEG-2, or lower. Given that there was no option to increase decoder complexity significantly, this put even greater pressure on the encoding algorithms.
The solution was to build on the fundamental building blocks of MPEG-2 — discrete cosine transforms within macroblocks on individual frames and the use of reference frames for additional temporal compression — but add considerably more depth of processing in each area. This took advantage of the vast increases in processing power available since the original MPEG-2 standard was established a decade or more earlier. But it remained highly challenging, which is why many of the elements are regarded as options and few, if any, commercial encoders attempt to use the full toolkit.
Coding efficiency
There is not enough space here to list all the additional techniques AVC uses to improve coding efficiency, and many can only be fully appreciated with a detailed understanding of the underlying mathematics. However, it is worth looking at some of them in outline.
Perhaps the most important step change is the move to multiple reference pictures. (See Figure 1 on page 10.) Whereas MPEG-2 uses one or at most two reference pictures in inter-frame coding, AVC permits the use of up to 16 frames (32 fields in interlaced television). In some cases — rapid back-and-forth cuts between two camera angles, or scenes with a relatively large expanse of background (a golf course, for example) — this can produce dramatic savings in the bit rate.
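The back-and-forth cut case can be illustrated with a toy example. The following is a minimal sketch, not an encoder: the 2x2 "blocks", the sum-of-absolute-differences cost and the function names are assumptions made for illustration, and a real AVC encoder searches with motion compensation across far larger pictures.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks (lists of rows)."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_reference(current, references):
    """Return (index, cost) of the stored reference block closest to `current`."""
    costs = [sad(current, ref) for ref in references]
    index = min(range(len(costs)), key=costs.__getitem__)
    return index, costs[index]


# Toy 2x2 blocks: a shot from camera A, a cut to camera B, then a cut back to A.
camera_a = [[10, 10], [10, 10]]
camera_b = [[200, 200], [200, 200]]
current = [[11, 10], [10, 9]]  # first picture after cutting back to camera A

# references[0] is the most recent picture (camera B); references[1] is older (camera A).
references = [camera_b, camera_a]

# With a single reference, only the most recent picture is available.
single_ref_cost = sad(current, references[0])

# With multiple references, the encoder may reach back to the older camera A picture.
multi_index, multi_cost = best_reference(current, references)

print(single_ref_cost, multi_index, multi_cost)
```

In this sketch the older picture predicts the new one almost perfectly, so the residual left to code is tiny compared with predicting from the most recent picture.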
Motion vectors also play a big part in providing a good prediction of how a scene will develop across a group of pictures. Again, AVC allows multiple motion vectors, as well as a greater range of both horizontal and vertical values, to significantly improve the accuracy of the predictions and thus improve the compression. By weighting the predictions, a well-designed AVC encoder can also perform much better on transitions that are traditionally tricky for encoders, such as crossfades and fade to black.
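The benefit of weighting on a fade can be shown with a few numbers. This is a simplified sketch under stated assumptions: the sample values, the floating-point weight and the helper names are illustrative only, whereas the standard itself specifies integer weights and offsets signaled in the bit stream.

```python
def residual_energy(current, prediction):
    """Sum of squared differences left over after prediction."""
    return sum((c - p) ** 2 for c, p in zip(current, prediction))


def weighted_prediction(reference, weight, offset=0):
    """Scale and shift reference samples before using them as the prediction."""
    return [round(weight * r + offset) for r in reference]


reference = [100, 120, 140, 160]  # luma samples from the reference picture
current = [50, 60, 70, 80]        # same content, halfway through a fade to black

# Plain prediction copies the reference and leaves a large residual.
plain = residual_energy(current, reference)

# Weighted prediction models the fade itself, so little or nothing is left to code.
weighted = residual_energy(current, weighted_prediction(reference, 0.5))

print(plain, weighted)
```

Because the fade is captured by the weight rather than by the residual, the encoder spends its bits on picture detail instead of on the brightness change.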
Entropy coding is at the heart of the MPEG algorithms. In information theory, entropy measures the unpredictability of a source, and it sets the lower limit on the number of bits needed to represent that source without loss. Entropy coding approaches that limit by giving shorter codes to more probable symbols. AVC provides several entropy coding methods, including variable length coding (in which code words are not confined to fixed 8-bit bytes), context-adaptive variable length coding (CAVLC) and context-adaptive binary arithmetic coding (CABAC).
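To make the idea of variable length coding concrete, the sketch below compares a fixed-length code with a simple variable-length code on a skewed set of symbols. The symbol values, their frequencies and the code table are invented for illustration and are not taken from the AVC tables.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Shannon entropy of the observed symbol distribution, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical quantized values: small values are far more common than large ones.
symbols = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2

# A fixed-length code needs 2 bits for each of the four possible symbols.
fixed_bits = 2 * len(symbols)

# An illustrative prefix-free variable-length code: short words for common symbols.
vlc_table = {0: "0", 1: "10", 2: "110", 3: "111"}
vlc_bits = sum(len(vlc_table[s]) for s in symbols)

entropy = entropy_bits_per_symbol(symbols)

print(fixed_bits, vlc_bits, entropy * len(symbols))
```

For this particular distribution the variable-length code reaches the entropy limit exactly; real picture data is rarely so tidy, which is where the context-adaptive methods come in.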
CABAC is a self-adapting algorithm that losslessly compresses syntax elements in the video stream by tracking the probabilities of those syntax elements in a given context. It is widely used in AVC encoding, and it requires considerable statistical analysis.
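The adaptive principle can be sketched in a few lines. This is not the CABAC algorithm itself, which uses a table-driven probability state machine and an arithmetic coding engine; it is an assumed, count-based model that only shows how learning the probabilities of binary decisions lowers the ideal cost of coding them.

```python
import math


def adaptive_cost_bits(bins):
    """Ideal cost in bits of coding `bins` (0/1 values) with a count-based adaptive model."""
    counts = [1, 1]  # smoothed counts of zeros and ones seen so far in this context
    total_bits = 0.0
    for b in bins:
        p = counts[b] / (counts[0] + counts[1])  # current estimate for this value
        total_bits += -math.log2(p)              # ideal arithmetic-coding cost
        counts[b] += 1                           # adapt the model after coding
    return total_bits


# A hypothetical context in which one outcome is far more likely than the other.
skewed = [0] * 90 + [1] * 10

cost = adaptive_cost_bits(skewed)
print(f"{len(skewed)} binary decisions coded in about {cost:.1f} bits")
```

Without a model each binary decision would cost one bit; once the model has learned that zeros dominate, each zero costs a small fraction of a bit, which is something a variable-length code with whole-bit code words cannot achieve.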
The standard also includes advances in resilience, including a network abstraction layer. By decoupling information relevant to more than one slice from the media stream, AVC can eliminate header duplication. This makes the data more compact and more robust: key information such as picture size, the coding modes employed and the macroblock map is carried in self-contained packets in the network layer.
Other resilience measures include flexible macroblock ordering and data partitioning, which allows the separation of more and less important syntax elements into different packets of data. This in turn enables the application of unequal error protection, using the resilience overhead where it can be of greatest benefit.
The network abstraction layer is also the foundation for an important extension to the standard, codified in November 2007, which introduces scalable video coding. (See figure 2 below.) This creates a subset bit stream from the overall transmission by dropping packets. In turn, this means operators can be offered a reduced size or reduced frame rate service (perhaps for mobile devices) alongside a high-quality service, and within the same bit budget.
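Extracting a subset bit stream by dropping packets might look like the following sketch. The `Packet` class, the packet sizes and the two-layer arrangement are assumptions for illustration; the field names echo the layer identifiers used by scalable video coding, but the real packet headers carry more than is modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    frame: int
    dependency_id: int  # 0 = base picture size, 1 = enhancement layer (illustrative)
    temporal_id: int    # 0 = base frame rate, higher values add in-between frames
    size_bytes: int


def extract_substream(packets, max_dependency_id, max_temporal_id):
    """Keep only packets at or below the requested layers; drop everything else."""
    return [p for p in packets
            if p.dependency_id <= max_dependency_id
            and p.temporal_id <= max_temporal_id]


# Hypothetical transmission: four frames, each with a base and an enhancement packet.
full_stream = [
    Packet(0, 0, 0, 400), Packet(0, 1, 0, 1200),
    Packet(1, 0, 1, 150), Packet(1, 1, 1, 500),
    Packet(2, 0, 0, 380), Packet(2, 1, 0, 1100),
    Packet(3, 0, 1, 160), Packet(3, 1, 1, 520),
]

# A mobile service takes the base picture size at half the frame rate.
mobile = extract_substream(full_stream, max_dependency_id=0, max_temporal_id=0)

full_bytes = sum(p.size_bytes for p in full_stream)
mobile_bytes = sum(p.size_bytes for p in mobile)
print(full_bytes, mobile_bytes, [p.frame for p in mobile])
```

No re-encoding takes place in this sketch: the reduced service is simply the full stream with the higher-layer packets removed, which is what allows both services to share one bit budget.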
Finally, the standard allows for a number of profiles, which give the operator another way in which the delivered quality of experience can be fine-tuned. While initial interest in AVC was around the main profile, network operators are turning to the high profile to get better quality on-screen. Other profiles allow AVC to be used as a high-quality contribution or distribution algorithm, incorporating 4:2:2 color sampling and 10-bit video streams.