Raising the standard
Sep 1, 2007 12:00 PM, BY TIM SHEPPARD
MPEG-4 AVC allows for a new generation of content delivery platforms.
Dynamic GOP
This technique dynamically decides the GOP size and structure, that is, how the frame types used for encoding are arranged. Typically, a GOP starts with an I-frame, which is followed by B-frames and P-frames before the GOP starts over with the next I-frame. The I-frame does not reference any other frame and is not based on prediction, while the B-frames and P-frames reference other frames and use motion estimation across the frames. When dynamic GOP is used, the encoder selects among three hierarchical GOP structures: for example, one with three B-frames, one with seven B-frames and a third with 15 B-frames.
The selection is content-dependent, meaning it uses motion among other characteristics, and the GOP length must be a multiple of 16. When dynamic GOP is not used, either a hierarchical GOP structure (with three B-frames) or an MPEG-2-like GOP structure with two B-frames is selected. In the case of the hierarchical GOP structure, the GOP length must be a multiple of four. In the case of the MPEG-2-like GOP structure, the GOP length must be a multiple of three.
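The GOP length rules above can be written down as a small check. This is a minimal sketch: the mode names and function names are hypothetical, and only the multiples (16, four and three) come from the text.

```python
# Required GOP length multiple per GOP mode.
# The mode names are illustrative; the multiples are those stated in the text.
GOP_LENGTH_MULTIPLE = {
    "dynamic": 16,       # dynamic GOP
    "hierarchical": 4,   # fixed hierarchical GOP with three B-frames
    "mpeg2_like": 3,     # MPEG-2-like GOP with two B-frames
}


def is_valid_gop_length(mode: str, gop_length: int) -> bool:
    """Return True if gop_length is a positive multiple of the mode's step."""
    multiple = GOP_LENGTH_MULTIPLE[mode]
    return gop_length > 0 and gop_length % multiple == 0


def next_valid_gop_length(mode: str, requested: int) -> int:
    """Return the smallest valid GOP length that is not shorter than requested."""
    multiple = GOP_LENGTH_MULTIPLE[mode]
    steps = max(1, -(-requested // multiple))  # ceiling division, at least one step
    return steps * multiple


if __name__ == "__main__":
    print(is_valid_gop_length("dynamic", 24))      # False: 24 is not a multiple of 16
    print(next_valid_gop_length("dynamic", 30))    # 32
    print(next_valid_gop_length("mpeg2_like", 1))  # 3
```

A length of 24, for instance, would be accepted for both fixed structures but rejected when dynamic GOP is enabled.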
Hierarchical GOP
In MPEG-2, a typical GOP consists of one I-frame and several P- and B-frames, where each B-frame uses the nearest past I- or P-frame and the future P-frame as references for prediction. B-frames are not used as references. (See Figure 2.)
H.264 removes this major restriction, giving the encoder the flexibility to choose whether to use B-frames as references for prediction. It is now possible to use only one I-frame and multiple B-frames in a GOP, where some B-frames act as reference frames. This arrangement is a hierarchical GOP.
Figure 3 and Figure 4 illustrate a hierarchical GOP with seven B-frames. Figure 3 illustrates the GOP in the typical display order and decoding order as we normally illustrate an MPEG-2 GOP. Figure 4 more clearly shows the hierarchy of the GOP, but it's actually the same GOP as in Figure 3.
The B-frame of the level one (B
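The relationship between display order and decoding order in a hierarchical GOP can be sketched in a few lines. The figures are not reproduced here, so this example rests on stated assumptions: the hierarchy is dyadic (the middle B-frame is level one, the quarter-point frames are level two, and so on), position 0 and position N+1 are the surrounding anchor frames, and each B-frame references its two nearest neighbours from a higher level. The function names are hypothetical, and a real encoder may choose a different but equally valid decoding order.

```python
def hierarchical_levels(num_b: int) -> dict:
    """Map each B-frame display position (1..num_b) to its hierarchy level.

    Assumes a dyadic hierarchy, so num_b + 1 must be a power of two
    (for example 3, 7 or 15 B-frames).
    """
    span = num_b + 1
    if span < 2 or span & (span - 1):
        raise ValueError("num_b + 1 must be a power of two")
    levels = {}
    level, step = 1, span // 2
    while step >= 1:
        # Frames at odd multiples of `step` belong to the current level.
        for pos in range(step, span, 2 * step):
            levels[pos] = level
        level += 1
        step //= 2
    return levels


def references(pos: int, num_b: int) -> tuple:
    """Return the (past, future) display positions that B-frame `pos` predicts from."""
    step = (num_b + 1) >> hierarchical_levels(num_b)[pos]
    return (pos - step, pos + step)


def decoding_order(num_b: int) -> list:
    """One valid decoding order: the future anchor first, then B-frames level by level."""
    levels = hierarchical_levels(num_b)
    b_frames = sorted(levels, key=lambda pos: (levels[pos], pos))
    return [num_b + 1] + b_frames


if __name__ == "__main__":
    print(decoding_order(7))   # [8, 4, 2, 6, 1, 3, 5, 7]
    print(references(6, 7))    # (4, 8)
```

With seven B-frames, the frames at positions 2, 4 and 6 serve as references for other B-frames, while the odd-numbered frames at the lowest level are not referenced by anything.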
Support for high profile
High profile is an extension of the original MPEG-4 Part 10 standard that allows solution providers and developers to employ more sophisticated tools to improve encoding. One example is the use of different block sizes, selected dynamically within the frame. This benefits HD because larger block sizes preserve more detail in an HD picture. The key to a good encoder is to dynamically select the right block sizes and the right encoding tools for the specific content being encoded.
Single-slice architecture
Single-slice encoding architecture yields significantly better video quality than that achieved by multiple-slice architectures for several reasons.
First, rate control is inherently more effective because it has easy access to the statistics of the whole picture.
Second, not having to reset statistics at the beginning of each slice within a picture maintains high context-based adaptive binary arithmetic coding (CABAC) efficiency. In addition, loop filtering, which cannot be applied across multiple slice boundaries, is applied to the full picture, significantly reducing blocking artifacts.
Third, the encoder does not have to perform motion compensation across slice boundaries. In a multiple-slice architecture, that step is needed to avoid the blocking artifacts that arise when two parts of an object, one in each slice, are motion-compensated differently, and it adds complexity and memory transfer overhead.
While MPEG-2 and MPEG-4 are similar in some regards and share some common management and handling procedures, techniques for the two standards are not fully interchangeable. One area of similarity is the ability to use statistical multiplexing to dynamically share bit rate between channels. Conversely, splicing solutions designed for MPEG-2 are not directly usable for MPEG-4. The two splicing techniques available today for MPEG-4 are decode/recode (which can be packaged as a single product) and preconditioning the incoming stream.
In an environment where there is the requirement to undertake a splice, for example for ad insertion into an existing MPEG-4 stream, the decode/recode option has drawbacks, including cost and potential quality reduction. Therefore, the best solution is to precondition the incoming feed at the point at which the splice is required.
The splicing device then sets up various parameters, such as buffers and frame types suitable to switch in another channel. This ensures that the splice point occurs at the end of a GOP to coincide with an I-frame and negates the possibility of an unclean splice through trying to link to a motion-estimated frame or broken reference. The local server can then play out the ad, ensuring that parameters such as buffers and frame types are set up in such a way as to minimize visual disturbance for the viewer.
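The idea of aligning a splice with the start of a GOP can be illustrated with a short sketch. It assumes the stream is described as a string of frame types in decoding order and that GOPs are closed, so every I-frame is a safe entry point; both are simplifying assumptions, and the function name is hypothetical rather than part of any splicer's interface.

```python
from typing import Optional


def next_splice_point(frame_types: str, requested_index: int) -> Optional[int]:
    """Return the index of the first I-frame at or after requested_index.

    frame_types is a string such as "IBBPBBPBB..." in decoding order.
    Splicing at the returned index means the outgoing stream ends on a
    complete GOP and nothing after the splice depends on a dropped frame.
    Returns None if no I-frame remains in the described window.
    """
    for index in range(max(0, requested_index), len(frame_types)):
        if frame_types[index] == "I":
            return index
    return None


if __name__ == "__main__":
    stream = "IBBPBBPBB" * 2 + "IBBP"     # I-frames at indices 0, 9 and 18
    print(next_splice_point(stream, 5))   # 9: wait for the next GOP boundary
    print(next_splice_point(stream, 19))  # None: no clean point left in this window
```

A request that lands mid-GOP is deferred to the next I-frame; preconditioning, as described above, is what guarantees that such a frame is available where the splice is required.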
It is important to recognize that despite its superior performance, MPEG-4 will have to coexist with legacy MPEG-2 deployments. In reality, this means using MPEG-2 and MPEG-4 in both constant and variable bit rate streams while still managing the bandwidth to maximize the number of services available. When working with constant bit rate services or variable bit rate services from a single multiplex, it's easy to calculate the maximum number of services in a stream and manage the network to maximize use of the available bandwidth. (See Figure 5.)
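For the straightforward case, the calculation is simple arithmetic. The sketch below assumes that a statistically multiplexed group from a single mux occupies a fixed, known pool, so it can be budgeted like a constant bit rate service. The function names and all bit rates are illustrative and are not taken from the article's figures.

```python
def max_cbr_services(capacity_kbps: int, service_kbps: int) -> int:
    """Return how many constant bit rate services of one size fit in the stream."""
    return capacity_kbps // service_kbps


def headroom_kbps(capacity_kbps: int, cbr_rates_kbps: list, statmux_pools_kbps: list) -> int:
    """Return the spare capacity, or a negative value if the plan does not fit.

    Each statmux pool is the fixed total bit rate of a group of variable
    bit rate services produced by a single multiplexer.
    """
    return capacity_kbps - sum(cbr_rates_kbps) - sum(statmux_pools_kbps)


if __name__ == "__main__":
    print(max_cbr_services(38_000, 4_000))               # 9
    print(headroom_kbps(38_000, [4_000] * 5, [16_000]))  # 2000
```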
However, mixing variable bit rate streams from different multiplexers can cause problems, because the maximum combined bit rate cannot be quantified. This leads to overflow and lost packets, which cause blocking and other artifacts. (See Figure 6.) This situation could occur when an operator wants to offer a subset of all the services available on the market in order to supply specific content to its client base. The operator may then have to mix variable bit rate streams from different muxes, with the inherent problems, and for HDTV this type of visual degradation is wholly unacceptable.
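A small numerical sketch shows why the combined stream can overflow even when the average rates fit comfortably. The per-interval bit rates are invented for illustration, and the function name is hypothetical.

```python
def overflow_intervals(capacity_kbps: int, streams: list) -> list:
    """Return (interval, excess_kbps) for every interval that exceeds capacity.

    streams is a list of equal-length lists, each holding one service's
    bit rate per measurement interval.
    """
    totals = [sum(rates) for rates in zip(*streams)]
    return [
        (interval, total - capacity_kbps)
        for interval, total in enumerate(totals)
        if total > capacity_kbps
    ]


if __name__ == "__main__":
    # Two variable bit rate services taken from different multiplexers.
    service_a = [6_000, 9_000, 7_000, 12_000]   # average 8,500 kb/s
    service_b = [8_000, 7_000, 13_000, 11_000]  # average 9,750 kb/s
    print(overflow_intervals(20_000, [service_a, service_b]))  # [(3, 3000)]
```

The two averages add up to 18,250 kb/s, below the 20,000 kb/s capacity, but because the services were rate-controlled independently their peaks coincide in the last interval and 3,000 kb/s has nowhere to go.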
A proven solution for MPEG-2 involves using high-quality transraters. However, as we've already established, this is not commercially practical for HD MPEG-4 at the present time. Nevertheless, the problem can be resolved by providing a flexible solution using technology available today. An untransrated variable bit rate HD MPEG-4 service can be combined with fully transrated MPEG-2 SD services, producing high-quality video on both MPEG-2 SD and MPEG-4 HD while meeting the bandwidth criteria. (See Figure 7.)
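One way to picture this combined approach is as a per-interval budget: the HD MPEG-4 service passes through at whatever rate it arrives, and the transrater fits the MPEG-2 SD services into what is left. The sketch below is an assumption about how such a budget could be divided. The proportional weighting, the function name and the numbers are all hypothetical and do not describe any particular product.

```python
def sd_transrate_targets(capacity_kbps: int, hd_rate_kbps: int, sd_weights: list) -> list:
    """Split the capacity left over by the untouched HD service among SD services.

    sd_weights gives each SD service's relative share (an assumed policy).
    Integer division rounds down, so the targets never exceed the remainder.
    """
    remaining = capacity_kbps - hd_rate_kbps
    if remaining < 0:
        raise ValueError("HD service alone exceeds the stream capacity")
    total_weight = sum(sd_weights)
    return [remaining * weight // total_weight for weight in sd_weights]


if __name__ == "__main__":
    # HD service currently at 14,000 kb/s in a 38,000 kb/s stream.
    print(sd_transrate_targets(38_000, 14_000, [1, 1, 2]))  # [6000, 6000, 12000]
```

As the HD rate rises and falls, the SD targets move in the opposite direction, so the total stays within the stream capacity without the HD service being re-encoded.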
Conclusion
The importance of the MPEG-4 family of audio and video coding standards cannot be overestimated. Not only is it extremely successful in its own right as a cutting-edge technology, but it has also proved to be a powerful enabler for the new generation of content delivery platforms. The success of IPTV and new HD services would not have been possible without the high-quality, low bit rate attributes of MPEG-4 encoding and the enhanced quality of experience that it can help provide for the viewer. MPEG-4 technology is still being developed and will continue to evolve for many years to come. Arguably, the best is yet to come.
Tim Sheppard is senior business development manager, broadcast, at Scientific Atlanta, a Cisco company.