Format converters have been with us since the 1950s, when the NTSC, PAL and SECAM video formats emerged and pervaded the industrialized world. With the recent advent of DTV, the number of video formats has increased greatly, and so has the need for format converters. These devices perform many functions: upconversion (from SD to HD), downconversion (from HD to SD) or crossconversion (SD to SD or HD to HD). There are several processes common to all format conversions: de-interlacing (for interlaced inputs), rate conversion, image re-scaling, color-space conversion and metadata handling. But different conversions emphasize different processes. Let's look at the different types of program material that a converter might see.
Program material can be categorized according to the scan mode it employs. There are three main types of scan modes: interlaced, segmented frame and progressive.
Figure 1. In this representation of the interlace raster, an object comprising two adjacent lines in a field moves vertically from a position in Field #1 (shown in yellow) to a different position in Field #2 (shown in red).
Two things identify interlaced (I) material. First, the raster has an interlaced structure. Second, objects can move at the field rate. Figure 1 is a representation of the interlaced raster showing an object moving at the field rate.
Segmented frame (sF) material also has an interlaced raster, but objects can move only at the frame rate. sF is also known as progressive segmented frame (pSF) and, in 50Hz areas, as 2:2 film material. The film industry has used the sF concept for many years: the film is played at 25fps and interlace scanned at 50 fields per second by a telecine device. Figure 2 is a representation of the segmented-frame raster showing an object moving at the frame rate.
Progressive (P) material is contained in a true progressive raster. Examples include 720p/59.94 and 1080p/24. Figure 3 is a representation of the progressive raster showing an object moving at the frame rate.
Another common format is 2:3 motion profile, but it is considered to be a derivative of the sF format because it is frame-based material with bonus repeat fields thrown in.
A keen sense of algorithm
To convert incoming video signals properly, the converter must apply the optimum conversion algorithms. Therefore, it must properly identify the incoming material. Format converters usually make strenuous efforts to pair up sF or 2:3 input fields correctly into frames before applying the format-conversion filters. There are several reasons for this. First, customers normally want the output to keep the integrity of the input frames as close as possible to the original. Second, treating the input as a progressive image allows the converter to use different types of vertical filters. These filters help maintain resolution while reducing judder and aliasing at the output. Finally, since a correctly paired frame contains no movement between its two fields, it is free of the motion artifacts that arise when the fields of interlaced video are combined.
Figure 2. This diagram represents the rasters of two consecutive segmented frames. The moving object comprises three lines within a frame — two from one field and one from the next field in the frame. The object moves vertically from a position in Frame 1 to a different position in Frame 2.
To properly identify the incoming signals, the converter must analyze the motion profile of objects within the program. For example, a 1080i signal at 59.94Hz might be 1080/59.94 interlaced, 1080/29.97 sF or 1080/23.98 with a 2:3 motion profile. Typically, the converter analyzes the signal by looking at differences between incoming fields and frames, and trying to identify movement. The converter can use any available motion vectors in this process.
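As a rough illustration of this kind of analysis, the sketch below classifies a run of field-to-field difference values by the pattern of "motion" and "no motion" transitions they contain. It is a simplified sketch, not any manufacturer's detector: the function names, the threshold and the assumption that a clean difference measure is already available for each pair of consecutive fields are all hypothetical, and a real analyzer must also cope with noise, compression artifacts and the vertical offset between fields.

```python
def _matches_cycle(flags, cycle):
    """True if flags repeats the given cycle at some phase (needs two full cycles)."""
    n = len(cycle)
    if len(flags) < 2 * n:
        return False
    for phase in range(n):
        if all(flags[i] == cycle[(i + phase) % n] for i in range(len(flags))):
            return True
    return False


def classify_motion_profile(field_diffs, threshold=1.0):
    """Guess the motion profile from differences between consecutive fields.

    field_diffs[i] is a hypothetical difference measure between field i and
    field i + 1; values above threshold are treated as motion.
    """
    moving = [d > threshold for d in field_diffs]
    if not any(moving):
        return "unknown"       # static scene: no evidence either way
    if all(moving):
        return "interlaced"    # motion at every field transition
    if _matches_cycle(moving, [False, True]):
        return "sF"            # motion only at frame boundaries (2:2)
    if _matches_cycle(moving, [False, True, False, False, True]):
        return "2:3"           # five-field cycle containing one repeat field
    return "unknown"


print(classify_motion_profile([0, 5, 0, 0, 5, 0, 5, 0, 0, 5]))
```

Returning "unknown" for a static or irregular pattern reflects the practical problem that a scene with little motion gives the analyzer nothing to work with.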
Once the converter has analyzed the incoming signal and identified its motion profile, it selects the required conversion algorithm. This selection also depends on the desired output format. For example, if the user calls for an sF format output, the converter must strictly ensure that all paired fields for its output constitute a single frame and no objects move within the frame.
Coping with motion
If the input signal is interlaced, you must use a converter with special de-interlacing filters, regardless of the desired output format. For the conversion process itself, you can choose a linear, motion-adaptive or motion-compensated converter, depending on your requirements and your budget. Linear converters are the least sophisticated and the least expensive. Motion-adaptive converters are more sophisticated and more expensive. Motion-compensated converters are the most sophisticated and the most expensive. If the conversion requires little temporal interpolation, linear conversion can provide acceptable results. Conversions involving a significant amount of temporal conversion (for example, 50Hz to 60Hz) are best performed with a fully motion-compensated converter. Note that the design of the converter's vertical-temporal filter can be complex, and it is critical to picture quality. Many converters use a single-field filter, especially for downconversion. For manufacturers, this is relatively simple to implement. The downside is that, to avoid aliasing, it limits the available vertical resolution. A small modification to the temporal processing allows the converter to perform video-to-sF conversion without difficulty.
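To see why 50Hz-to-60Hz conversion is so demanding, it helps to look at the temporal weights that the simplest possible linear converter would use. The sketch below is an illustration only: it assumes plain two-field linear interpolation in time (real linear converters use longer vertical-temporal filters), and the function name and parameters are hypothetical.

```python
from fractions import Fraction


def temporal_weights(out_index, in_rate=50, out_rate=60):
    """Return (earlier_field, later_field, weight_earlier, weight_later).

    The output field at time out_index / out_rate is blended from the two
    input fields that straddle it, weighted by temporal proximity.
    """
    pos = Fraction(out_index * in_rate, out_rate)   # position in input fields
    earlier = pos.numerator // pos.denominator       # floor
    frac = pos - earlier
    return earlier, earlier + 1, 1 - frac, frac


for k in range(7):
    print(k, temporal_weights(k))
```

Only one output field in six coincides exactly with an input field. The other five are blends of two input fields taken at different instants, so moving objects appear blurred or doubled, which is the defect that motion compensation is designed to avoid.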
Figure 3. This diagram represents the rasters of three consecutive progressive frames overlaid on top of one another. The moving object (comprising two adjacent lines) moves vertically from one frame to the next.
As mentioned above, a single-frame filter is sufficient for sF material if the converter can correctly identify the incoming motion profile. These filters are simpler to design than interlace filters, yet they give better results on frame-based material. But there are two main sources of difficulty for converters that use them. First, identifying the motion profile can be tricky if, for example, there is little motion in the scene or there is noise or a compression signature that masks the underlying motion. Incorrectly paired fields will produce outputs with conversion artifacts. Typically these take the form of a high-vertical-frequency banding sometimes known as “Venetian blinds.” (See Figure 4.) Second, some conversions are inherently difficult. For example, converting between 24sF and 25sF (in either direction) can give unsatisfactory results. If the material is treated as sF, then it's necessary to repeat a frame (or, going from 25sF to 24sF, drop one) once per second, which is disturbing for the viewer.
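The once-per-second disturbance is easy to demonstrate with a little arithmetic. The sketch below assumes the simplest possible frame mapping, in which each output frame shows the most recent input frame; the function name is hypothetical, and a real converter may choose a different frame to repeat or drop.

```python
def frame_map(in_fps=24, out_fps=25, seconds=1):
    """Index of the input frame shown in each output frame (no interpolation)."""
    return [k * in_fps // out_fps for k in range(out_fps * seconds)]


up = frame_map(24, 25)      # 24sF to 25sF: one input frame is shown twice
down = frame_map(25, 24)    # 25sF to 24sF: one input frame is never shown
print(up)
print(down)
```

In the first case, 25 output frames carry only 24 distinct input frames. In the second, 24 output frames have to represent 25 input frames, so one is discarded each second.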
Figure 4. A converter with a single frame filter can have difficulty pairing frames, resulting in conversion artifacts that typically take the form of a high-vertical frequency banding sometimes known as “Venetian blinds.” This effect can be seen here on horizontally scrolling SMPTE bars.
The alternative is to treat the material as interlace and temporally interpolate it. The low beat frequency between the input and output rates (1Hz in this case) makes linear conversion difficult. A better solution in terms of video quality is to play the 24fps source material back at 25fps and live with the change in program duration. However, the audio may then require pitch shifting to compensate for the off-speed playback.
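The size of the duration change and of the pitch error follows directly from the ratio of the two rates. The short calculation below is a sketch with a hypothetical function name; it assumes that the audio is simply sped up along with the picture, so that the pitch rises by 12 times the base-2 logarithm of the speed ratio, measured in semitones.

```python
import math


def off_speed_playback(duration_s, source_fps=24.0, playback_fps=25.0):
    """Return (speed ratio, new duration in seconds, pitch rise in semitones)."""
    speed = playback_fps / source_fps
    new_duration = duration_s / speed
    pitch_semitones = 12 * math.log2(speed)
    return speed, new_duration, pitch_semitones


speed, duration, pitch = off_speed_playback(3600)
print(f"speed x{speed:.4f}, duration {duration:.0f}s, pitch +{pitch:.2f} semitones")
```

A one-hour program therefore runs about two and a half minutes short, and the audio rises by roughly 0.7 of a semitone unless it is pitch-corrected.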
The format converter also needs to handle 2:3 motion, either removing the repeat fields (for 60Hz-to-24Hz conversions) or inserting them (for 24Hz-to-60Hz conversions). Sophisticated converters also can repair material with a broken 2:3 sequence. This can be important if the program is to be compressed downstream because the encoder can exploit the redundancy of a clean sequence. Repair requires the motion-profile analyzer to react instantaneously to sequence changes because any internal sequence flywheel will cause fields to be incorrectly paired until it is reset.
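The 2:3 sequence itself can be sketched in a few lines. The example below is illustrative only: it represents each field as a hypothetical tuple of (source frame label, field parity, repeat flag), whereas a real converter has no such labels and must infer the repeat fields from motion analysis. It assumes the conventional pattern in which frames alternately contribute two and three fields.

```python
def insert_2_3(frames):
    """Expand frames into fields, alternating 2 and 3 fields per frame."""
    fields = []
    parity = 0                                   # 0 = top field, 1 = bottom field
    for i, frame in enumerate(frames):
        count = 2 if i % 2 == 0 else 3
        for n in range(count):
            is_repeat = n == 2                   # third field duplicates the first
            fields.append((frame, "tb"[parity], is_repeat))
            parity ^= 1                          # output parity always alternates
    return fields


def remove_2_3(fields):
    """Drop flagged repeat fields and pair the remainder back into frames."""
    kept = [f for f in fields if not f[2]]
    frames = []
    for first, second in zip(kept[0::2], kept[1::2]):
        if first[0] != second[0]:
            raise ValueError("broken 2:3 sequence: fields from different frames")
        frames.append(first[0])
    return frames


fields = insert_2_3(["A", "B", "C", "D"])
print(fields)
print(remove_2_3(fields))
```

Four film frames become 10 video fields, which is how 24 frames per second map onto 60 fields per second. The error raised in `remove_2_3` corresponds to the incorrect pairing that occurs when the sequence is broken and the analyzer has not yet caught up.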
Up, down and sideways
Conversions between 50Hz and 60Hz dominate SD crossconversions. These conversions require a great deal of temporal interpolation, and a fully motion-compensated format converter produces the best results. Some engineers argue that aspect-ratio conversion is actually a format conversion because it involves changing the number of active lines or pixels. Since this process doesn't require a change of frame rate, a linear converter usually gives perfectly acceptable results.
During upconversion, it is important that the converter maintain as much resolution as possible. It should also provide the user with controls to enhance the picture to make the signals appear subjectively as sharp as possible. The user should also be able to reduce any defects in the incoming signal (such as noise) because upconversion tends to make noise more visible.
During downconversion, it is important that the converter reduce the resolution to prevent aliasing, but do so in a way that minimizes the artifacts caused by filtering. Filtering in the downconverter can be more closely controlled than filtering in an SD-originating chain (the camera lens, CCD processing, etc.). This means that downconversion can produce images that are sharper than SD-originated material while still suppressing alias signals. So, to blend SD-originated and downconverted material seamlessly, the converter should offer enhancement controls to soften the downconverted image.
Since there are many HD standards, HD-to-HD crossconversion can take many different forms. For example, converting 1080i at 59.94Hz to 720p at 59.94Hz does not require any temporal interpolation, whereas converting 1080i at 59.94Hz to 1080i at 50Hz requires substantial temporal interpolation. Thus, the most appropriate conversion method will vary with the conversion and, as described above, with the type of video being processed.
Audio and metadata
Of course, video is only one element of a complete program. There also is associated audio and metadata, such as time code and closed captioning. Considering audio first, the DTV converter must be able to extract embedded audio from an incoming signal, or to accept a separate audio feed (e.g., AES/EBU digital audio) and synchronize it to the video output. The audio usually will be automatically delayed to match the processing delay of the format converter and then re-embedded in the output video signal and sent to separate audio outputs. It is convenient if the converter has audio delay controls to allow the user to compensate for any other disparities in the audio/video paths. Usually, the synchronization requires audio-rate conversion, but if the incoming audio is compressed (e.g., Dolby E), then it cannot be rate-converted because this will corrupt the compressed data. Therefore, the converter must allow the user to turn off the audio-rate conversion. Provided the input and output audio clocks are locked together, the user can still pass the compressed audio through the format converter. This is easily arranged if you lock the audio source and the format converter to a common reference.
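The matching audio delay is a matter of simple arithmetic once the video processing delay is known. The sketch below assumes a 48kHz audio sample rate and, by default, a 29.97Hz frame rate; both values, and the function name, are assumptions for illustration rather than properties of any particular converter.

```python
from fractions import Fraction


def audio_delay_samples(delay_frames, frame_rate=Fraction(30000, 1001),
                        sample_rate=48000):
    """Audio samples needed to match a video delay given in frames."""
    return round(Fraction(delay_frames) * sample_rate / frame_rate)


print(audio_delay_samples(5))                   # five frames at 29.97Hz
print(audio_delay_samples(1, frame_rate=25))    # one frame at 25Hz
```

At 29.97Hz a single frame does not correspond to a whole number of samples, so the result is rounded; the relationship is exact only over multiples of five frames.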
Time code is metadata that must pass through the format converter. The way the converter handles time code varies with the type of format conversion. If the input and output frame rates are the same, then the converter can delay the incoming time code to match its delay and re-insert it at the output. But, if the input and output frame rates differ, the converter must employ more complex methods involving internal time-code generation and synchronization.
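For the simple case in which the input and output frame rates match, the time-code path behaves like a delay line of the same length as the video processing delay. The following is a minimal sketch of that idea; the class name is hypothetical, and it models the time code as an opaque string per frame.

```python
from collections import deque


class TimecodeDelay:
    """Delay time-code values by a fixed number of frames."""

    def __init__(self, delay_frames):
        # Pre-fill with None: no time code is available until the delay fills.
        self._fifo = deque([None] * delay_frames)

    def push(self, timecode):
        """Accept the time code of the incoming frame, return the delayed one."""
        self._fifo.append(timecode)
        return self._fifo.popleft()


delay = TimecodeDelay(2)
for tc in ["10:00:00:00", "10:00:00:01", "10:00:00:02", "10:00:00:03"]:
    print(tc, "->", delay.push(tc))
```

The time-code values themselves are not altered; each one simply emerges alongside the picture it originally accompanied.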
Some converters link the video and time-code processing to provide a powerful tool. For example, if a format conversion involves outputting a continuous 2:3 sequence, the converter can lock the 2:3 sequence at the output to the input time code. Thus, any input frame can be assigned to be, say, the A-frame output. The same input time code can be linked to a reset of the output time code. Some converters can arrange for the first frame of program to emerge as an A-frame with on-the-hour time code preceded by a minute of continuous time code and 2:3 sequence.
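One way to picture this link is to compute the 2:3 phase of every input frame from its distance, in frames, from the frame nominated as the A-frame. The sketch below assumes 24fps non-drop-frame input time code in hh:mm:ss:ff form and the conventional A, B, C, D frame naming; the function names are hypothetical.

```python
def tc_to_frames(timecode, fps=24):
    """Convert non-drop-frame hh:mm:ss:ff time code to a frame count."""
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def pulldown_phase(timecode, a_frame_timecode, fps=24):
    """Return which frame of the four-frame 2:3 cycle this input frame becomes."""
    offset = tc_to_frames(timecode, fps) - tc_to_frames(a_frame_timecode, fps)
    return "ABCD"[offset % 4]


for tc in ["00:59:59:23", "01:00:00:00", "01:00:00:01", "01:00:00:04"]:
    print(tc, pulldown_phase(tc, "01:00:00:00"))
```

Because the phase depends only on the input time code, the same material converted twice will emerge with the same 2:3 sequence, including during any run-up before the first frame of program.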
Flexibility for the future
At a time when digital and HD television are growing in popularity and DVD is leaving VHS in the dust, the importance of the DTV format converter is more evident than ever. Also, broadcasters and production houses are finding that converters that perform only one type of upconversion, downconversion or crossconversion are not necessarily the answer anymore. Manufacturers are addressing this by introducing a more flexible format converter to perform a combination of conversions. For many users, the ideal format converter is a universal unit that can convert SD or HD, in any combination. Universal converters with further capabilities for aspect-ratio conversion, comprehensive audio processing and time-code conversion are finding their way into the market. Armed with such devices, the broadcast industry is well equipped to face the challenges of digital television headed in its direction.
Steve Dabner is a design engineer at Snell & Wilcox.