The MPEG-2 compression standard has been widely deployed in video distribution infrastructures, such as cable and satellite networks, as well as in several consumer applications, such as DVDs and DVRs. End-to-end MPEG-2 systems have existed for more than 10 years, with several million interoperable encoders, multiplexers and set-top boxes deployed. As newer compression standards enter these systems, a need has arisen to transcode to and from the MPEG-2 format.
The need for transcoding
Three scenarios drive the need for transcoding, and a fourth development makes it practical. First, improvements in compression techniques have resulted in mature new standards that offer significant bit rate gains of 30 percent to 50 percent over MPEG-2. Investments made in legacy devices that can only handle the MPEG-2 format, together with the large amount of content already created in MPEG-2, mean that multiple standards will coexist for several years before the eventual conversion to a single standard.
Second, emerging IPTV deployments of video over a bandwidth-constrained last mile will result in part of the content distribution chain migrating to H.264, thereby creating a need for efficient transcoders. (See Figure 1.)
Third, new applications, such as high-definition video and real-time broadcast video to handheld mobile devices, require that the same content be made available in several spatial resolutions and frame rates. This can be achieved by creating the same content separately in each format. It is more efficient, however, to create the content once and transcode it to different formats and resolutions as needed. (See Figure 2.)
Finally, improvements in programmable processor technologies, such as DSPs and FPGAs, have made it possible for video processing vendors to field products that handle multiple existing formats and can be upgraded in the field for future standards.
Types of transcoding
Several forms of transcoding are possible, depending on the specific parameters of the compressed bit stream that are modified during the transcoding process. They include:
- Bit rate transcoding
This process changes the bit rate of the compressed bit stream while keeping the resolution, frame rate and encoding format the same. MPEG-2 bit rate transcoders, also called rateshapers, are widely deployed today. They achieve efficient, high-density rateshaping by operating primarily in the discrete cosine transform (DCT) domain.
- Format transcoding
This entails converting the compression format — for example, converting an MPEG-2 bit stream to an H.264 bit stream.
- Resolution transcoding
This involves converting the coded spatial resolution, for example, converting a standard-definition bit stream to common intermediate format (CIF) resolution for a mobile application.
This article primarily focuses on the issues and challenges associated with format transcoding from MPEG-2 to H.264. Although MPEG-2 and H.264 use similar techniques of motion compensation, transformation, quantization and entropy coding, there are several basic differences between the two standards that make the transcoding operation challenging.
Several new features available in H.264, such as multiple reference frames, smaller block shapes and spatial intra prediction, have no corresponding information in the MPEG-2 bit stream. The use of spatial prediction in H.264 I-slices makes transcoding an MPEG-2 I-frame substantially more complex than the simple re-quantization techniques used by MPEG-2 rate transcoders. MPEG-2 to H.264 transcoding is expected to progress through three approaches, which are presented below.
Decode and re-encode
The simplest approach to transcoding is to completely decode the MPEG-2 bit stream and then re-encode it with an H.264 encoder. The decode operation can be performed either externally or as a part of the H.264 encoder. System issues, such as handling SCTE-35 digital program insertion (DPI) messages, will require that the decode and encode operations be tightly coupled.
The quality of transcoding with this simple approach will not be high. Figure 3 shows a comparison between direct encoding and transcoding. The figure shows peak signal-to-noise ratio (PSNR) values, a measure derived from the mean square error between the input and the decoded output, computed at different bit rates. The PSNR numbers are obtained by averaging the results over 18 different sequences of varying content type and complexity. The top plot shows the performance of direct encoding using an H.264 encoder. The bottom plot shows the performance of transcoding, where the video is originally coded with MPEG-2 at 4 Mb/s, decoded and then re-encoded with the same encoder used for direct encoding. Transcoding can result in up to 20 percent loss in compression efficiency.
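As an illustration of the quality measure used above, here is a minimal Python sketch of PSNR for 8-bit video samples. The function names `psnr` and `mean_psnr` and the flat-list input format are assumptions made for this example; the article does not describe the tooling used to produce Figure 3.

```python
import math

def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equal-length sequences of sample values.

    peak is the maximum sample value (255 for 8-bit video).
    """
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean square error between the input and the decoded output.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)

def mean_psnr(pairs, peak=255):
    """Average PSNR over several (reference, decoded) pairs."""
    values = [psnr(ref, dec, peak) for ref, dec in pairs]
    return sum(values) / len(values)
```

A lower PSNR at the same bit rate, or a higher bit rate needed to reach the same PSNR, is what a loss in compression efficiency looks like on such a plot.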
Decode and information reuse
Similar to the previous approach, the incoming MPEG-2 stream is decoded and then re-encoded using an H.264 encoder. However, here the relevant information available from the MPEG-2 bit stream is reused.
Although there are significant differences between MPEG-2 and H.264, including block shapes for motion compensation, block sizes for transformation and motion search ranges, there is still useful information available in the input MPEG-2 bit stream that can be exploited by the H.264 encoder to improve transcoding quality and reduce computational complexity.
Reusing the picture type (I, P or B) information from the MPEG-2 bit stream can provide substantial improvement in transcoding quality. Because MPEG-2 encoders code I- and P-pictures at a higher quality than B-pictures, better transcoding efficiency can be achieved if the H.264 encoder can align the picture type with that of the input stream.
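Picture type alignment can be sketched as a simple mapping. The numeric codes below are the MPEG-2 `picture_coding_type` values (1 = I, 2 = P, 3 = B) and the H.264 `slice_type` values (0 = P, 1 = B, 2 = I). The function name `align_picture_types` is hypothetical, and a real transcoder would also have to handle details such as field pictures and IDR placement, which this sketch ignores.

```python
# MPEG-2 picture_coding_type -> picture type letter.
MPEG2_PICTURE_CODING_TYPE = {1: "I", 2: "P", 3: "B"}

# Picture type letter -> H.264 slice_type.
H264_SLICE_TYPE = {"P": 0, "B": 1, "I": 2}

def align_picture_types(mpeg2_types):
    """Return the H.264 slice_type for each MPEG-2 picture, in the same order.

    Keeping the types aligned means pictures that the MPEG-2 encoder coded
    at higher quality (I and P) are again coded as I and P.
    """
    return [H264_SLICE_TYPE[MPEG2_PICTURE_CODING_TYPE[t]] for t in mpeg2_types]
```

For example, an input sequence I, B, B, P (codes 1, 3, 3, 2) would be re-encoded with slice types 2, 1, 1, 0.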
Other information, such as motion vector values and coding mode decisions, can be reused to reduce the complexity of transcoding. For bit allocation and rate control decisions, the H.264 encoder can use the quantizer values and the number of bits used to encode each picture, both obtained from the input MPEG-2 stream. Reusing information in this way is similar to two-pass encoding, where the results of the first pass of encoding drive the decisions in the second pass.
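The two-pass analogy can be sketched as follows. Each picture's bits and average quantizer from the MPEG-2 stream are treated as first-pass statistics, and the output bit budget is split in proportion to a complexity estimate. The estimate used here (bits multiplied by quantizer) is a common heuristic, similar in spirit to the complexity measure in MPEG-2 Test Model 5 rate control. It is an assumption of this example, not a method the article specifies, and `allocate_bits` is a hypothetical name.

```python
def allocate_bits(first_pass, target_bits):
    """Split target_bits across pictures using MPEG-2 stream statistics.

    first_pass: list of (bits_used, average_quantizer) per input picture.
    Returns one bit target per picture for the H.264 encoder.
    """
    # Assumed complexity model: pictures that needed many bits even at a
    # coarse quantizer are treated as harder to code.
    complexities = [bits * quantizer for bits, quantizer in first_pass]
    total = sum(complexities)
    if total == 0:
        # No usable statistics: fall back to an even split.
        return [target_bits / len(first_pass)] * len(first_pass)
    return [target_bits * c / total for c in complexities]
```

A production rate controller would add buffer constraints and per-picture-type weighting on top of a proportional split like this one.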
Transform domain processing
Transform domain processing is commonly used in MPEG-2 bit rate transcoding applications, mainly to reduce computational complexity and to avoid the loss of accuracy caused by repeated DCT and inverse DCT operations.
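A minimal sketch of transform domain re-quantization, the core operation of a rateshaper, is shown below. It assumes a plain uniform quantizer with integer step sizes. Real MPEG-2 quantization also involves weighting matrices and special handling of intra DC coefficients, which are omitted here, and `requantize` is a hypothetical name.

```python
def requantize(levels, step_in, step_out):
    """Re-quantize DCT coefficient levels from step_in to step_out.

    The coefficients never leave the DCT domain: each level is scaled back
    to a coefficient value and quantized again with the new step size.
    """
    out = []
    for level in levels:
        coeff = level * step_in  # inverse quantization
        # Round to nearest, with halves rounded away from zero.
        magnitude = (abs(coeff) + step_out // 2) // step_out
        out.append(magnitude if coeff >= 0 else -magnitude)
    return out
```

With a coarser output step, small coefficients collapse toward zero and cost fewer bits after entropy coding, which is how the bit rate is reduced without an inverse DCT.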
Because H.264 uses integer transforms, repeated forward and inverse transform operations incur no loss of accuracy. Performing complete transcoding in the transform domain may be unrealistic because of the substantial differences between MPEG-2 and H.264. However, computational complexity can be reduced in certain operations, such as I-slice transcoding, by combining the inverse DCT operation of MPEG-2 with the forward integer transform of H.264 in the transform domain.
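One way to picture the combined operation is as a single precomputed matrix. If D is the 8x8 DCT matrix and K applies the H.264 4x4 core transform to each half of an 8-sample vector, then the kernel S = K·Dᵀ maps an 8x8 block of DCT coefficients X directly to four 4x4 core-transform blocks as S·X·Sᵀ. The sketch below is illustrative only: it uses a floating-point orthonormal DCT, leaves out the scaling that H.264 folds into quantization, and ignores intra prediction. All names are hypothetical.

```python
import math

# H.264 4x4 forward core transform matrix.
H4 = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct8_matrix():
    """Orthonormal 8-point DCT-II matrix (row k is the k-th basis function)."""
    rows = []
    for k in range(8):
        scale = math.sqrt(1 / 8) if k == 0 else math.sqrt(2 / 8)
        rows.append([scale * math.cos((2 * n + 1) * k * math.pi / 16)
                     for n in range(8)])
    return rows

def block_diag_h4():
    """8x8 matrix that applies H4 to samples 0-3 and to samples 4-7."""
    k = [[0] * 8 for _ in range(8)]
    for i in range(4):
        for j in range(4):
            k[i][j] = H4[i][j]
            k[i + 4][j + 4] = H4[i][j]
    return k

D8 = dct8_matrix()
K8 = block_diag_h4()
# Combined kernel: inverse DCT followed by the forward core transform.
S = matmul(K8, transpose(D8))

def dct_to_h264(dct_coeffs):
    """Map an 8x8 DCT coefficient block to four 4x4 core-transform blocks.

    The result is an 8x8 array whose four quadrants are the 4x4 blocks.
    """
    return matmul(matmul(S, dct_coeffs), transpose(S))
```

Because S is computed once, the per-block cost is two 8x8 matrix products instead of a full inverse DCT followed by four separate forward transforms. Further savings would come from exploiting the structure of S, which this sketch does not attempt.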
The coexistence of various coding standards, and the requirement for multiple resolutions and frame rates in emerging applications, will drive the need for efficient, high-density transcoding. Transcoders are expected to progress from simple decode/re-encode devices to more complex integrated systems that reuse information in the input bit stream and achieve higher density by employing selective transform domain processing techniques.
Santhana Krishnamachari is vice president of advanced engineering and Kyeong Ho Yang is technical manager of video algorithms group for EGT.