Video encoding technology
May 1, 2009 12:00 PM, By John Watkinson
Most TV viewing takes place with some ambient lighting, and as a result, the contrast ratio of television is much less than can be obtained in the cinema. This makes 8-bit resolution perfectly adequate. Broadcast television faces two bandwidth restrictions — one external and one self-made. First, the electromagnetic spectrum is needed for other purposes, and the spectacular growth of cellular telephones has made spectrum more valuable. Second, television broadcasters have decided that viewers want more channels, even though the constant amount of talent is thereby diluted. As a result, the compression factors used in digital broadcasting are high, and the level of artifacts is nothing to be proud of. For TV production purposes, intracoding gives editing freedom. Most videotape formats use intracoding for that reason.
Moving pictures viewed over the Internet tend to be downsampled and heavily compressed. This is a consequence of immediate and free access to an extremely wide range of material. Nevertheless, as the bandwidth available to Internet subscribers increases, the quality will improve.
One of the requirements for Internet use is a codec that allows the same material to be available in a range of qualities dependent on the bit rate available to the individual subscriber. Wavelet-based compression is usually superior in this respect.
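The following minimal sketch illustrates why a wavelet-style split suits a range of bit rates. It is not taken from any particular codec: the function names `haar_split` and `haar_merge` and the sample values are assumptions for illustration, and the unnormalized averaging and differencing is only the simplest Haar-style case. The coarse half alone is a usable half-resolution signal, and adding the detail half restores the original.

```python
def haar_split(signal):
    """One level of a Haar-style split: pairwise averages and differences."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    coarse = [(a + b) / 2 for a, b in pairs]   # half-resolution version
    detail = [(a - b) / 2 for a, b in pairs]   # what the coarse version lacks
    return coarse, detail


def haar_merge(coarse, detail):
    """Rebuild the full-resolution signal from both halves."""
    out = []
    for c, d in zip(coarse, detail):
        out.extend([c + d, c - d])
    return out


# Hypothetical row of pixel values (even length assumed).
row = [10, 12, 20, 24, 30, 30, 8, 4]
coarse, detail = haar_split(row)

low_rate_picture = coarse                        # subscriber with little bandwidth
high_rate_picture = haar_merge(coarse, detail)   # subscriber with full bandwidth
```

A subscriber on a slow connection could be sent only the coarse values, while one on a fast connection receives the detail values as well, so the same encoded material serves both.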
Moving pictures by educated guesswork
There is no one ideal compression codec. The difficulty is figuring out how to make compression available to a wide range of applications and how to allow future developments to enhance the system without causing obsolescence. At one extreme, an electronic cinema compression system designed to work on a giant screen will need more powerful hardware and more memory than a system designed for a security camera. The way around this is to define levels and profiles in the system. Levels set limits on the amounts of processing power and memory needed to decode the signal. Profiles set limits on the complexity of the encoding and decoding. Obsolescence is avoided by adopting two steps. The first is to define the signal between the encoder and the decoder and not the encoder itself. The second is to make improvements in a way that is backward compatible.
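As a rough sketch of the idea, a profile can be modeled as the set of coding tools a decoder implements and a level as a ceiling on the load it can sustain. Everything named here is hypothetical: the classes, the tool names and the pixel-rate figures are assumptions for illustration and are not values from any MPEG specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    tools: frozenset          # coding tools the bitstream actually uses
    pixels_per_second: int    # rough stand-in for processing and memory load


@dataclass(frozen=True)
class Decoder:
    profile: frozenset        # tools the decoder implements (its profile)
    level: int                # highest pixel rate it can sustain (its level)


def can_decode(decoder, stream):
    """A stream is decodable if it stays within both the profile and the level."""
    within_profile = stream.tools <= decoder.profile
    within_level = stream.pixels_per_second <= decoder.level
    return within_profile and within_level


# Hypothetical devices and streams at the two extremes.
simple_tools = frozenset({"intra", "forward_prediction"})
full_tools = simple_tools | {"bidirectional_prediction"}

camera_decoder = Decoder(profile=simple_tools, level=352 * 288 * 25)
cinema_decoder = Decoder(profile=full_tools, level=2048 * 1080 * 24)

camera_stream = Stream(tools=frozenset({"intra"}), pixels_per_second=352 * 288 * 25)
cinema_stream = Stream(tools=full_tools, pixels_per_second=2048 * 1080 * 24)
```

In this model the cinema decoder can play the security camera stream, but the camera decoder must reject the cinema stream on both counts.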
Figure 1. A) In MPEG-1 and MPEG-2, computer graphic images must be rendered to video before coding. B) In contrast, MPEG-4 may move the rendering process to the decoder, reducing the bit rate needed with the penalty of increased decoder complexity.
A good way of visualizing compression is to consider that the decoder is equipped with tools that allow it to make an educated guess about what is coming next based on what came before. If the encoder contains a decoder, it knows what the decoder can predict and then sends only what could not be predicted. In that sense, MPEG (officially the Moving Picture Experts Group) might as well stand for Moving Pictures by Educated Guesswork.
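The principle can be sketched in a few lines. This is a deliberately crude illustration and not how any MPEG coder is actually built: the predictor simply assumes the next sample equals the previous one, and the function names and sample values are assumptions. What matters is that the encoder runs the same prediction as the decoder and transmits only the error.

```python
def predict(history):
    """Educated guess: assume the next sample repeats the last decoded one."""
    return history[-1] if history else 0


def encode(samples):
    """Send only the part of each sample that the decoder could not predict."""
    decoded = []      # the encoder's local copy of the decoder's state
    residuals = []
    for sample in samples:
        guess = predict(decoded)
        residual = sample - guess
        residuals.append(residual)
        decoded.append(guess + residual)   # exactly what the decoder will hold
    return residuals


def decode(residuals):
    """Make the same guess as the encoder and correct it with the residual."""
    decoded = []
    for residual in residuals:
        decoded.append(predict(decoded) + residual)
    return decoded


# Hypothetical pixel values along a scan line.
samples = [100, 101, 103, 103, 104, 110]
residuals = encode(samples)
restored = decode(residuals)
```

After the first value, the residuals are small numbers that take fewer bits to send than the samples themselves. A better predictor makes them smaller still.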
Clearly, if the decoder is equipped with more tools or those tools are more highly refined, the guesswork will be better, and the amount of unpredictable content decreases. So the development path from MPEG-1, through MPEG-2 to MPEG-4 represents the process of increasing and refining the toolkit. Because MPEG-4 contains additional tools and refinements of what went before, an MPEG-4 decoder automatically contains MPEG-2 and MPEG-1 decoders, and backward compatibility is achieved. If we compare like with like and look at the performance of MPEG-2 and MPEG-4 on conventional video inputs, we find that the extra predictive ability of MPEG-4 allows the same picture quality at a significantly reduced bit rate. H.264, also known as Advanced Video Coding (AVC), is the part of MPEG-4 that relates to conventional video inputs. This is likely to be a popular codec for delivery of HD.
Whereas MPEG-1 and MPEG-2 work with entire pictures, MPEG-4 goes far beyond that. (See Figure 1.) In Figure 1A, a video coder expects as an input a complete picture repeating at the frame rate. Imagine that such a picture was the output of a graphics engine that was rendering images in real time. The graphics engine would compute the appearance of any virtual objects from the selected viewpoint using ray tracing. If the viewpoint or one of the objects moves, each video frame will be different, and the MPEG-2 coder will use its coding tools to encode the image differences. However, the motion of a virtual object could be fully described by vectors. In Figure 1B, an MPEG-4 encoder can handle the graphic instructions directly so that the rendering engine is actually in the MPEG-4 decoder. Once the appearance of objects is established in the decoder, animating them requires little more than the transmission of a few vectors.
MPEG-4 works with four types of objects. (See Figure 2.) Objects may be encoded as 2-D or 3-D data. 2-D objects are divided into video and still. A video object is a textured area of arbitrary shape that changes with time, whereas a still texture object does not change with time. Typically, a still texture object may be a background. Although it does not change with time, it may give the illusion of doing so. For example, if the background pixel array is much larger than the display, the display can pan across the background to give the impression of motion.
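The panning example can be sketched as follows. The function name `pan_window`, the tiny array and the window size are all assumptions for illustration. The point is that the still texture is held once in the decoder, and each displayed frame is defined by nothing more than a horizontal offset.

```python
def pan_window(background, x, width):
    """Return the visible part of each background row, starting at column x."""
    return [row[x:x + width] for row in background]


# Hypothetical still texture: 2 rows by 10 columns, wider than the display.
background = [list(range(r * 10, r * 10 + 10)) for r in range(2)]

# The display is only 4 columns wide; one offset per frame creates the pan.
offsets = [0, 1, 2]
frames = [pan_window(background, x, 4) for x in offsets]
```

Each successive frame shows a shifted view of pixels that have not changed at all, which gives the impression of motion.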
Figure 2 further shows that MPEG-4 standardizes ways of transmitting the 3-D shape of a virtual object, known as a mesh object, along with the means to map its surface appearance, or texture, onto that object. Generally, any shape of object can be handled. The decoder will recreate each object and render each one from the selected viewpoint. In parts of the picture where there is no object, the background will be keyed in. It should be clear that if the decoder is aware of the shape and texture of all relevant objects, the viewpoint does not need to be chosen at the encoder. The viewpoint might be chosen by the viewer in an interactive system such as a video game or a simulator. For applications such as video phones and video conferencing, MPEG-4 supports a specific type of mesh object that may be a human face or a human face and body.
Unlike the DCT-based transforms of MPEG-2, the Dirac codec uses wavelets and so inherently works well in multiresolution applications. Dirac is available in intracoding versions for production purposes, as well as a temporally coded version for delivery. Developed by the BBC, it has the advantage of being royalty-free.
John Watkinson is a consultant in advanced technology. His most recent books are “The Art of Digital Video,” “The Art of the Helicopter” and “The MPEG Handbook” available from Focal Press/Elsevier.