Managing lip sync

Mar 1, 2009 12:00 PM, By Aldo Cugnini

             

This is not the first time that the subject of A/V sync, or lip sync, has been covered in this column, nor will it be the last. While some industry organizations continue to study the issue, and a handful of products exist that either measure or control A/V sync, progress is slow in combating the problem. This month, we'll look at some of the lesser-understood technical factors contributing to the problem.

To recap the issue, correct A/V sync is necessary for program delivery so that the presentation retains a natural appearance. Studies have shown that a mismatch is detectable when the sound leads the video by more than 45ms or lags the video by more than 125ms. Various recommendations exist that put tighter bounds on acceptable performance. The ATSC, for example, recommends that the sound program should never lead the video program by more than 15ms and should never lag the video program by more than 45ms (±15). But state-of-the-art systems and products are not yet at the point where this recommendation is always met.
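As a rough illustration of these bounds, the following Python sketch classifies a measured offset against the detectability thresholds and the ATSC limits quoted above. The function name and the sign convention (positive means the sound lags the video) are assumptions made for this example, not part of any standard API.

```python
# Thresholds in milliseconds, taken from the figures quoted in the text.
DETECT_LEAD_MS = 45    # sound leading by more than this is detectable
DETECT_LAG_MS = 125    # sound lagging by more than this is detectable
ATSC_LEAD_MS = 15      # ATSC recommended maximum lead
ATSC_LAG_MS = 45       # ATSC recommended maximum lag


def classify_av_offset(audio_offset_ms):
    """Classify an A/V offset.

    Assumed convention: a positive value means the sound lags the
    video, a negative value means the sound leads the video.
    """
    lead = max(0.0, -audio_offset_ms)
    lag = max(0.0, audio_offset_ms)
    return {
        "within_atsc": lead <= ATSC_LEAD_MS and lag <= ATSC_LAG_MS,
        "detectable": lead > DETECT_LEAD_MS or lag > DETECT_LAG_MS,
    }


if __name__ == "__main__":
    for offset in (-60, -20, 0, 30, 100, 130):
        print(offset, classify_av_offset(offset))
```

Note the gap between the two sets of limits: an offset such as 100ms of lag is outside the ATSC recommendation but still under the quoted detectability threshold.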

Compression complicates the problem

Audio and video will be differentially delayed when passing through different equipment (or improperly designed equipment). These differences in routing audio and video signals can create an A/V sync problem, especially when the delays change over time.

Figure 1. Video and audio streams have different frame-size characteristics.


In addition to the problem of independent signal paths and processing, compression adds another variable to A/V sync mismatch. Not only are video and audio signals compressed using different algorithms, but more importantly, the differential delay between the compression paths is not constant in parts of the system. This is illustrated in Figure 1, together with the program clock reference (PCR) synchronizing element.

MPEG video compression, like most compression systems, uses different types of frames, resulting in different amounts of data for each frame in the coded bit stream. While the overall bit rate for such a system is constant (when using constant bit rate encoding), the number of coded bits per frame varies around a target value, and the variation is smoothed by a buffer.

However, the compressed audio does have a constant number of bits per frame in most transmission systems. This means that the video and audio frames never exactly line up, so the system must rely on a time stamping mechanism in order to reproduce the correct A/V sync. MPEG provides a PCR to accomplish this, which is a sample of the master clock that is used in the compression system. By generating the video and audio clocks from this master clock, and then transmitting the PCR at frequent intervals, the decoder can correctly resynthesize the clocks necessary to maintain synchronization.
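Apart from the variable size of coded video frames, the frame durations themselves usually differ between video and audio. The sketch below works this out for one assumed pairing that is not stated in the article: 30000/1001 fps video (3003 ticks of a 90kHz clock per frame) and audio frames of 1536 samples at 48kHz (2880 ticks per frame). Under these assumptions, the frame boundaries coincide only at long intervals, which is why time stamps rather than frame counting must carry the sync information.

```python
from math import gcd

TICKS_PER_SECOND = 90_000   # resolution of MPEG-2 presentation time stamps
VIDEO_FRAME_TICKS = 3003    # assumed: 30000/1001 fps video
AUDIO_FRAME_TICKS = 2880    # assumed: 1536 samples per frame at 48 kHz


def realign_interval_ticks(a, b):
    """Least common multiple: ticks until both frame grids coincide again."""
    return a * b // gcd(a, b)


ticks = realign_interval_ticks(VIDEO_FRAME_TICKS, AUDIO_FRAME_TICKS)
video_frames = ticks // VIDEO_FRAME_TICKS
audio_frames = ticks // AUDIO_FRAME_TICKS
seconds = ticks / TICKS_PER_SECOND

if __name__ == "__main__":
    print(f"{video_frames} video frames = {audio_frames} audio frames "
          f"= {seconds:.3f} s")
```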

The video and audio streams each contain a recurring presentation time stamp (PTS) that indicates when each video and audio “presentation unit” should be presented to the decoder. With a fixed decoding time for each process, this then establishes the correct presentation time of video and audio to the viewer/listener.
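To make the role of the PTS concrete, here is a minimal Python sketch of how a monitoring tool might estimate A/V skew from time stamps. It relies on two facts of MPEG-2 systems: the PTS counts ticks of a 90kHz clock, and it is a 33-bit field that wraps around. The function names, and the idea of sampling the PTS of the audio and video units being output at the same instant, are assumptions for illustration only.

```python
PTS_CLOCK_HZ = 90_000     # PTS values count ticks of a 90 kHz clock
PTS_MODULUS = 1 << 33     # PTS is a 33-bit field, so it wraps around


def pts_diff_ticks(a, b):
    """Signed difference a - b in ticks, tolerant of 33-bit wraparound."""
    d = (a - b) % PTS_MODULUS
    if d >= PTS_MODULUS // 2:
        d -= PTS_MODULUS
    return d


def audio_lag_ms(audio_pts, video_pts):
    """Estimate how far the sound lags the picture, in milliseconds.

    Assumes audio_pts and video_pts belong to the presentation units
    being output at the same instant. A positive result means the sound
    being heard is stamped earlier than the picture on screen (sound
    lags); a negative result means the sound leads.
    """
    return pts_diff_ticks(video_pts, audio_pts) * 1000 / PTS_CLOCK_HZ


if __name__ == "__main__":
    print(audio_lag_ms(4500, 9000))   # audio stamped 4500 ticks earlier
```

A correctly behaving decoder would keep this value near zero; a persistent nonzero result suggests the time stamps are being generated or honored incorrectly somewhere in the chain.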

However, there exists the possibility that receivers (decoders) do not process these time stamps correctly, depending on how the video decoder buffer is managed. As we saw previously, the bit stream data rate varies from frame to frame. This requires a buffer in order to properly decode the video, and an appropriate algorithm to manage the buffer. In MPEG, this is known as the video buffering verifier (VBV), a model that is used in the encoder to ensure that there is never an overflow or underflow condition.

Figure 2. The Video Buffer allows the decoder to process variable-sized frames.


This is shown in Figure 2 for a fictitious seven-frame stream, with the fullness of the decoding video buffer as a function of time. Bits enter the buffer and then are removed (decoded) starting at frame #0 in the graph. From that point forward, bits must be removed at the correct frame rate to ensure proper video display. (The model assumes that all bits from each frame are removed instantaneously. This is valid for the sake of buffer management, given actual hardware architectures and the fact that any practical delay is inconsequential to the action of the buffer.) If the buffer should overflow or underflow, the video would either freeze or jump ahead, causing a noticeable disruption.
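The buffer behavior described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the normative VBV algorithm: bits arrive at a constant rate, each frame's bits are removed instantaneously once per frame period, and all frame sizes, the bit rate, the buffer size and the startup delay are hypothetical values chosen for the example.

```python
def simulate_vbv(frame_bits, bit_rate, frame_rate, buffer_size, startup_delay):
    """Track decoder buffer fullness for a constant-bit-rate stream.

    Returns a list of (fullness_before_removal, fullness_after_removal)
    tuples, one per frame. Raises ValueError on overflow or underflow.
    """
    fullness = bit_rate * startup_delay      # bits received before frame 0
    bits_per_period = bit_rate / frame_rate  # bits arriving between frames
    trace = []
    for n, bits in enumerate(frame_bits):
        if n > 0:
            fullness += bits_per_period
        if fullness > buffer_size:
            raise ValueError(f"overflow before frame {n}")
        if bits > fullness:
            raise ValueError(f"underflow at frame {n}")
        before = fullness
        fullness -= bits                     # instantaneous removal
        trace.append((before, fullness))
    return trace


# Hypothetical seven-frame stream: one large frame, then smaller ones.
FRAMES = [400_000, 100_000, 100_000, 250_000, 100_000, 100_000, 250_000]

if __name__ == "__main__":
    for n, (before, after) in enumerate(
        simulate_vbv(FRAMES, bit_rate=6_000_000, frame_rate=30,
                     buffer_size=1_800_000, startup_delay=0.1)
    ):
        print(f"frame {n}: {before:>9.0f} -> {after:>9.0f} bits")
```

Shortening the startup delay in this example (for instance to 0.05 seconds) makes the first frame underflow, which illustrates why the initial fill time matters.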

The parameter VBV delay specifies the duration of time that the first byte of coded video data remains in the video buffer (to the left of zero in this example), to start the filling process. While this parameter can be specified in the bit stream, most decoders ignore it and regenerate the buffer timing from the PCR and PTS data, and herein lies the potential for problems.
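The two ways of obtaining this delay can be compared with a short Python sketch. It uses facts of the MPEG-2 syntax: the vbv_delay field is a 16-bit count of 90kHz clock periods, with the value 0xFFFF reserved to mean that no delay is signaled, and the PCR is carried as a 33-bit base in 90kHz units plus a 9-bit extension in 27MHz units. The function names are hypothetical, time stamp wraparound is ignored for brevity, and the decode time is assumed to equal the PTS (true only when the picture is not reordered).

```python
VBV_DELAY_UNSPECIFIED = 0xFFFF   # reserved value: no delay signaled


def vbv_delay_ms(vbv_delay):
    """Convert a vbv_delay field (90 kHz ticks) to ms, or None if unset."""
    if vbv_delay == VBV_DELAY_UNSPECIFIED:
        return None
    return vbv_delay / 90.0


def pcr_to_27mhz(pcr_base, pcr_ext):
    """Combine the PCR base (90 kHz) and extension into 27 MHz ticks."""
    return pcr_base * 300 + pcr_ext


def residency_ms_from_timestamps(pts_90k, pcr_base, pcr_ext):
    """Estimate buffer residency of a picture from time stamps alone.

    pts_90k: time stamp of the picture (assumed equal to its decode time).
    pcr_base, pcr_ext: PCR value when the picture's first byte arrived.
    """
    return (pts_90k * 300 - pcr_to_27mhz(pcr_base, pcr_ext)) / 27_000.0


if __name__ == "__main__":
    print(vbv_delay_ms(9000))
    print(residency_ms_from_timestamps(18_000, 9_000, 0))
```

When the stream is well formed, both methods should give a similar figure; a decoder that relies only on the time stamps depends entirely on those stamps being accurate.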



