Editing long-GOP video
Nov 1, 2008 12:00 PM, By Steve Mullen
Perform MPEG export using smart GOP splicing.
Figure 1. In this example, the four frames shown in red have been trimmed from a GOP. This causes the I-frames (in orange) to be closer together. When I-frames occur too frequently, the stream data rate can become too great.
When HDV was introduced, editors quickly discovered that when they exported a production back to HDV tape, the process often took hours. Many saw long export times as a reason to avoid long-GOP formats. At the same time, some marketing campaigns promoted the idea that long-GOP MPEG-2 was difficult to decode, making editing cumbersome. Also, so-called experts advised editors new to HD that MPEG-2 should be decoded in the VTR, sent via HD-SDI and stored as uncompressed video. Or, if that weren't possible, it should be transcoded during capture to a better 4:2:2 codec.
Folks were also warned that if they edited long-GOP natively, the use of FX would result in poor quality because renders would be re-encoded to MPEG-2. And, worse, repeated re-encodings would further destroy quality.
The result of these warnings was a fear of native long-GOP editing. Thankfully, as NLEs were enhanced to support various types of MPEG-2 and more editors came to understand what really went on inside their NLE, this fear has lessened. (See MPEG-2 editing myths.)
There remains, however, the issue of long export times even with a cuts-only MPEG-2 timeline. The reason for the delay is that when a clip is trimmed during editing, the beginning and/or end of each clip likely has its GOP structure broken.
To obtain a perfect GOP series during MPEG-2 output, the NLE starts at the first frame in the timeline and decodes every GOP to a sequence of YCrCb frames. Then, each series of six, 12 or 15 YCrCb frames is encoded into a new GOP. Decoding and re-encoding the entire timeline in this way is what makes the export so slow.
Smart GOP splicing
When you watch ATSC HD MPEG-2, transport stream sources are real-time spliced with frame accuracy. MPEG-2 data streams are spliced by shortening the GOP of the last clip (called the outgoing GOP), shortening the GOP of the next clip (called the incoming GOP) or shortening both GOPs.
Because MPEG-2 allows shorter than six-, 12- or 15-frame GOPs, it would seem the task is a simple one. Figure 1 shows an example where four frames (in red) have been trimmed from a GOP. While this appears simple, look again at the generated series of three GOPs. You will note the I-frames (in orange) have moved closer together.
Relative to P- and B-frames, I-frames add a huge quantity of data to the data stream. Normally, these I-frame peaks are smoothed by the P- and B-frames that naturally occur before the next I-frame. However, if I-frames occur too frequently, the stream data rate can become too great.
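To illustrate the point, here is a minimal Python sketch that compares the average data per frame in an untrimmed GOP with that of a shortened one. The relative frame sizes (P = I/2, B = I/4), the six-frame IBBPBB pattern, the two-frame trimmed GOP and the function name are all assumptions chosen for illustration, not measurements from a real stream.

```python
# Relative frame sizes in units of one I-frame (assumed for illustration).
SIZE = {"I": 1.0, "P": 0.5, "B": 0.25}

def average_frame_size(frames):
    """Mean data per frame for a string of frame types such as 'IBBPBB'."""
    return sum(SIZE[f] for f in frames) / len(frames)

full_gop = "IBBPBB"   # assumed untrimmed six-frame GOP
trimmed_gop = "IB"    # hypothetical GOP left after four frames are trimmed

print("untrimmed GOP:", average_frame_size(full_gop))
print("trimmed GOP:  ", average_frame_size(trimmed_gop))
print("spliced run:  ", average_frame_size(full_gop + trimmed_gop + full_gop))
```

Under these assumptions, the trimmed GOP carries more data per frame than the untrimmed one, because its I-frame is followed by fewer small frames to average it out.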
One way to prevent this error, as described in a Sarnoff patent on a broadcast MPEG-2 splicer, is to adjust the levels between the from-stream and to-stream such that the resulting spliced transport stream will not suffer overflow, underflow or other undesirable decoder buffer memory behavior. The goal is to decode and encode only GOPs that lie on a splice boundary. The re-encode is controlled by a feedback loop that adjusts encoder compression based upon the current data rate.
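The idea of a feedback loop that adjusts compression based on the current data rate can be sketched with a toy model. This is not the method described in the Sarnoff patent. It is a simplified illustration in which a virtual buffer fills with encoded bits and drains at the target rate, and the quantizer gets coarser as the buffer fills. The function name, the "complexity" inputs, the factor of 4.0 and the bits-equal-complexity-divided-by-quantizer model are all hypothetical.

```python
def reencode_with_feedback(complexities, bits_per_frame, buffer_size):
    """Toy rate control for re-encoding the frames around a splice.

    complexities   -- bits each frame would need at the finest quantizer (assumed)
    bits_per_frame -- bits the channel drains per frame period
    buffer_size    -- size of the virtual buffer, in the same units
    Returns the list of encoded frame sizes and the final buffer fullness.
    """
    fullness = 0.0
    sizes = []
    for complexity in complexities:
        # Feedback: the fuller the virtual buffer, the coarser the quantizer.
        quantizer = 1.0 + 4.0 * max(0.0, fullness) / buffer_size
        bits = complexity / quantizer
        sizes.append(bits)
        # Bits produced by this frame minus bits drained during one frame period.
        fullness += bits - bits_per_frame
    return sizes, fullness

# Two I-frames close together, as can happen at a splice (hypothetical values).
frames = [1.44, 0.36, 1.44, 0.36, 0.36, 0.72]
sizes, fullness = reencode_with_feedback(frames, bits_per_frame=0.6, buffer_size=2.0)
print([round(s, 3) for s in sizes], round(fullness, 3))
```

In this sketch, the second I-frame is compressed harder than the first because the buffer is already partly full when it arrives. A real splicer would also have to model the decoder buffer, which this toy does not attempt.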
Feedback control when splicing GOPs is important not only for HDV, which uses CBR encoding, but also for XDCAM HD, XDCAM EX and XDCAM HD 422, as well as AVCHD, all of which use VBR encoding. The feedback-controlled process that limits decoding and encoding to only those GOPs that lie on splice boundaries is often called smart GOP splicing.
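As a rough sketch of how an exporter might decide which GOPs to re-encode, the Python function below classifies the source GOPs touched by one trimmed clip. It assumes fixed-length closed GOPs and in/out points measured in source frames. The function name and parameters are hypothetical and do not describe any particular NLE.

```python
def classify_gops(in_frame, out_frame, gop_length):
    """Split the GOPs used by one trimmed clip into (reencode, copy) lists.

    in_frame is inclusive and out_frame is exclusive, both in source frames.
    A GOP that lies entirely inside the clip can be copied untouched; a GOP
    that the trim cuts through must be decoded and re-encoded.
    """
    if out_frame <= in_frame:
        raise ValueError("out_frame must be greater than in_frame")
    first = in_frame // gop_length
    last = (out_frame - 1) // gop_length
    reencode, copy = [], []
    for index in range(first, last + 1):
        gop_start = index * gop_length
        gop_end = gop_start + gop_length
        if in_frame <= gop_start and gop_end <= out_frame:
            copy.append(index)
        else:
            reencode.append(index)
    return reencode, copy

# A clip trimmed to source frames 4..39 with six-frame GOPs (example values).
print(classify_gops(4, 40, 6))
```

With these example values, only the first and last GOPs of the clip are cut by the trim. The five GOPs between them could be copied to the output without decoding.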
Simulation of smart GOP splicing
Table 2. A simulation of HD-1 (720p30 HDV) GOP splicing after removing two frames
To learn more about smart GOP splicing, I created a simple simulation of HD-1 (720p30 HDV) GOP splicing. In Table 1, the dark and pale blue cells represent two untrimmed, six-frame, closed GOPs. The left-most yellow cells represent the last B-frame from the GOP preceding the outgoing GOP. The right-most yellow cells represent the first I-frame of the GOP following the incoming GOP.
Notation within cells indicates that each I-frame is 1.44Mb; each P-frame is one-half (I/2) of an I-frame (0.72Mb); and each B-frame is one-fourth (I/4) of an I-frame (0.36Mb). Therefore, the total data in one GOP (one I-frame, one P-frame and four B-frames) is 3.6Mb. The bit rate for five GOPs (1 second) is 18Mb/s, the video data rate used by 720p30.
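The arithmetic can be checked with a few lines of Python. The sketch treats the frame sizes as megabits, which is what makes five GOPs per second come to 18Mb/s, and it assumes an IBBPBB frame pattern, the only mix of one I-frame with P- and B-frames in a six-frame GOP that totals 3.6. The function names are hypothetical.

```python
I_FRAME_MBIT = 1.44  # size of one I-frame in megabits

# P-frames are I/2 and B-frames are I/4, as in the simulation.
FRAME_MBIT = {"I": I_FRAME_MBIT, "P": I_FRAME_MBIT / 2, "B": I_FRAME_MBIT / 4}

def gop_size_mbit(pattern):
    """Total data in one GOP, given its frame types as a string."""
    return sum(FRAME_MBIT[frame] for frame in pattern)

def bit_rate_mbps(pattern, frames_per_second):
    """Video bit rate when the same GOP pattern repeats continuously."""
    gops_per_second = frames_per_second / len(pattern)
    return gop_size_mbit(pattern) * gops_per_second

pattern = "IBBPBB"  # assumed six-frame GOP: one I, one P, four B
print(round(gop_size_mbit(pattern), 2), "Mb per GOP")
print(round(bit_rate_mbps(pattern, 30), 2), "Mb/s")
```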