Monitoring with media fingerprinting
Oct 1, 2010 12:00 PM, By Marco Lopez
The technique can detect many potential errors.
Figure 1. A typical media fingerprint generator creates separate data for video and individual audio channels.
Lowering the operating costs of television playout has become something of a mantra across the television industry, and it has been especially evident in playout monitoring. Because monitoring has traditionally been labor-intensive and also prone to human error, it has been a natural focus for many streamlining and automation projects.
The widespread adoption of multiviewers and IP-based facility monitoring with signal probing has opened up new, more efficient workflows. A single operator can now manage and monitor complex systems efficiently, with a consequent large improvement in productivity.
However, the targets for cost efficiency are getting tougher, and new approaches are being sought to take monitoring to the next level of efficiency. Looking ahead a few years, it seems that a typical target cost per channel is around $60,000 for channel ingest, preparation and monitoring. Following current trends, this indicates a requirement for channels with almost fully automated monitoring. This movement away from human eyeballs and ears for monitoring the key points of a playout channel is a big step for television, but there are pioneering technologies that offer real potential.
A key technology is media fingerprinting, which can be used with advanced electronic facility monitoring to detect and report playout problems that have been difficult to monitor effectively, such as lip-sync errors and content mismatching in multichannel environments. Before looking at how this new technology can be deployed in facilities, it's worth clarifying what is meant by media fingerprinting as it sometimes gets confused with technologies such as watermarking.
Figure 2. A simple convolution is used to look for matching patterns between two media fingerprint streams at different points in the playout process.
Media fingerprinting is a way of digitally identifying video and audio content so that it can be recognized, along with any playout defects, later in time or downstream in the facility. The difference between a fingerprint and a watermark is that a fingerprint extracts properties from the content, and those properties are stored or transported separately. A fingerprint does not affect the original media content or its corresponding metadata. It's like a traditional fingerprint, which does not leave a mark on the finger after the ink is wiped off. In contrast, a watermark must alter the source content permanently, and in a way that devices along the signal chain will not change, as this would render the watermark useless. Hence, a watermark is more like a tattoo than a fingerprint, though you cannot really see or hear it.
Since a fingerprint is an additional piece of data that must be stored or transported, it must have a small data size. Other key media fingerprint attributes include resilience to typical playout processes such as downconversion to SD or proc amp adjustments to video and audio. Furthermore, fingerprint generation and comparison must place only a light processing load on purpose-built video devices and on the PCs that search and analyze databases with thousands of fingerprints.
Fingerprint generation
Naturally, the value and strength of a media fingerprint system are dependent on the underlying creation algorithm. Currently, there are multiple proprietary technologies used for the generation of fingerprints, and these are usually level- and/or motion-based. Some use luminance characteristics, while others use transitions, edges, peaks, frequency and color characteristics.
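To make the idea of level- and motion-based features concrete, here is a minimal sketch in Python. It is not any vendor's algorithm, which the article notes are proprietary. The frame layout (rows of 8-bit luma samples) and the function names mean_luma, motion_energy and video_fingerprint are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple

# Assumed layout: a frame is a list of rows, each row a list of 8-bit luma samples.
Frame = Sequence[Sequence[int]]


def mean_luma(frame: Frame) -> float:
    """Level-based feature: average luma of one frame."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count


def motion_energy(prev: Frame, cur: Frame) -> float:
    """Motion-based feature: mean absolute luma change between two frames."""
    diff = sum(
        abs(a - b)
        for row_prev, row_cur in zip(prev, cur)
        for a, b in zip(row_prev, row_cur)
    )
    count = sum(len(row) for row in cur)
    return diff / count


def video_fingerprint(frames: List[Frame]) -> List[Tuple[float, float]]:
    """Return one (level, motion) pair per frame; the first frame gets motion 0.0."""
    fingerprint = []
    prev = None
    for frame in frames:
        motion = 0.0 if prev is None else motion_energy(prev, frame)
        fingerprint.append((mean_luma(frame), motion))
        prev = frame
    return fingerprint
```

Two numbers per frame is far smaller than the frame itself, which is the property that lets fingerprint data travel separately from the media. A production system would use more robust features than this toy example.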
Figure 3. Multipoint media fingerprint monitoring can be deployed across a facility to detect playout errors.
A typical media fingerprint generator creates separate fingerprint data for the video and the individual audio channels to allow effective multichannel monitoring. (See Figure 1.) This data is multiplexed and can be streamed for storage or live comparison. This fingerprint generation requires minimal hardware and can be implemented on a simple monitoring-grade DA module or more complex interface cards. The fingerprint data stream is very small and represents just 0.0004 percent of a 1080i60 HD signal. This means that the fingerprint data from many HD channels can be transferred quite easily over standard IP networks.
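As a rough check on what 0.0004 percent means in practice, the arithmetic can be sketched as follows. This assumes the percentage is taken against the nominal 1.485Gb/s HD-SDI serial rate, which the article does not state explicitly. The function names are illustrative, and the link calculation ignores IP and protocol overhead.

```python
# Assumption: the "1080i60 HD signal" is the nominal 1.485 Gb/s HD-SDI serial rate.
HD_SDI_BITS_PER_SECOND = 1.485e9

# 0.0004 percent, the figure quoted in the article, expressed as a fraction.
FINGERPRINT_FRACTION = 0.0004 / 100


def fingerprint_bitrate(signal_bps: float = HD_SDI_BITS_PER_SECOND,
                        fraction: float = FINGERPRINT_FRACTION) -> float:
    """Approximate fingerprint data rate in bits per second for one channel."""
    return signal_bps * fraction


def channels_per_link(link_bps: float, per_channel_bps: float) -> int:
    """Whole number of fingerprint streams that fit in a link, ignoring overhead."""
    return int(link_bps // per_channel_bps)


if __name__ == "__main__":
    rate = fingerprint_bitrate()
    print(f"Fingerprint rate per channel: about {rate / 1000:.1f} kb/s")
    print(f"Streams on a 100 Mb/s link (no overhead): {channels_per_link(100e6, rate)}")
```

Under that assumption, each channel's fingerprint stream works out to roughly 6kb/s, which is why many channels fit comfortably on an ordinary network link.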
This type of media fingerprint is highly resilient to normal television playout processes, with insensitivity to typical video level adjustments, up/downconversion and video compression. Similarly, the process is insensitive to audio bit rate reduction (compression), audio loudness control performed by gain or dynamic range adjustment, and sample rate conversion.
Media fingerprints are weakened, but not disabled, by aspect ratio changes and by the insertion of small graphics. The video processes that can cause problems are standards conversion, such as going from 50Hz to 60Hz, and prolonged periods of frozen video. On the audio side, an obvious problem is when new audio content is mixed into the original content, such as voice-overs and stings.
Figure 4. Media fingerprinting can be used with IP-based facility monitoring to watch for lip-sync errors.
During fingerprint monitoring, a simple convolution engine is used to look for matching patterns between two media fingerprint streams at different points in the playout process. (See Figure 2.) At the simplest level, this comparison process can be performed by a single interfacing module. However, the process is entirely scalable, and multiple streams of fingerprint data can be correlated and analyzed using a standard PC-based platform to allow end-to-end deployment in a large facility. (See Figure 3.) It can even be used across multiple remote facilities because the fingerprint data is so small that transfers over a WAN are possible.
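The pattern-matching step can be illustrated with a small sliding-correlation search. This is a sketch of the general technique, not the comparison engine of any particular product. It assumes each fingerprint stream is a list of one numeric value per frame, that the downstream point is never earlier than the upstream point, and that the names find_delay, max_lag and min_overlap are hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple


def _normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences; 0.0 if either is flat."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    # A flat sequence (for example, frozen video) has no pattern to match.
    return num / den if den else 0.0


def find_delay(upstream: Sequence[float], downstream: Sequence[float],
               max_lag: int, min_overlap: int = 8) -> Tuple[int, float]:
    """Return (lag, score) such that downstream[i + lag] best matches upstream[i].

    The lag is in fingerprint samples; the score is the correlation at that lag.
    """
    best_lag, best_score = 0, -1.0
    for lag in range(max_lag + 1):
        overlap = min(len(upstream), len(downstream) - lag)
        if overlap < min_overlap:
            break  # too few samples left to compare reliably
        score = _normalized_correlation(upstream[:overlap],
                                        downstream[lag:lag + overlap])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score
```

Because the correlation is normalized, a change of gain or offset between the two measurement points does not move the peak, while a completely flat fingerprint returns a score of zero because there is nothing to match.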
Once a match is found between the fingerprint streams, two timing measurements can be performed. The first is the program-to-program delay, which is the normal delay through the signal distribution path. The second is the difference between the video delay and the audio delay. If the difference is zero, then there is no problem. However, if the difference is not zero, then the audio has shifted with reference to the video. This offset is also known as a lip-sync error. This comparison process allows any video or audio differences between two or more signals to be quickly identified. The delay between two streams can be measured with a resolution of just ±1ms.
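The two timing measurements reduce to simple arithmetic once the video and audio delays are known. The sketch below assumes both delays are already available in milliseconds. The sign convention (positive means audio arrives late), the TimingReport name and the default tolerance, set equal to the 1ms measurement resolution, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TimingReport:
    program_delay_ms: float   # delay through the distribution path, video as reference
    lip_sync_error_ms: float  # audio delay minus video delay; positive = audio late
    in_sync: bool             # True when the error is within the tolerance


def timing_report(video_delay_ms: float, audio_delay_ms: float,
                  tolerance_ms: float = 1.0) -> TimingReport:
    """Combine the two measured delays into a single report."""
    error = audio_delay_ms - video_delay_ms
    return TimingReport(
        program_delay_ms=video_delay_ms,
        lip_sync_error_ms=error,
        in_sync=abs(error) <= tolerance_ms,
    )


if __name__ == "__main__":
    # Example: video arrives 120 ms later downstream, audio 165 ms later.
    print(timing_report(120.0, 165.0))
```

In this example, the audio lags the video by 45ms, so the report flags the channel as out of sync even though the overall program delay is normal.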
Fingerprint data streams and alarms can be analyzed by the latest generation of facility monitoring systems, and operators can be immediately alerted whenever problems emerge to promote rapid fault resolution.
Content verification and lip-sync detection
One of the key applications of fingerprinting for television playout is automated content verification while a channel is played out across its distribution path, from the server to the uplink, and back via the return feeds.
Fingerprint technology is well-suited to addressing many typical playout issues, such as a backup channel not having exactly the same content as the primary channel. This kind of problem can be intercepted quickly and accurately by using fingerprint capture and monitoring along the primary and backup playout paths. For instance, by using fingerprint detection at the server and at the backup change-over, the facility monitoring system can quickly identify any content errors. Similarly, fingerprint detection at the distribution encoder and cable return can spot problems that have been encountered at a cable operator. Even subtle differences in content can be identified, such as branding graphics that are missing on the main playout chain but present on the backup channel.
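A primary-versus-backup check of this kind could be sketched as follows. The example assumes the two fingerprint streams have already been time-aligned and hold one numeric value per frame. The function name mismatch_runs, the threshold and the minimum run length are hypothetical, and real systems would tune or replace them.

```python
from typing import List, Sequence, Tuple


def mismatch_runs(primary: Sequence[float], backup: Sequence[float],
                  threshold: float, min_run: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, where the streams differ.

    A sample counts as different when the absolute gap exceeds `threshold`.
    Runs shorter than `min_run` samples are ignored as momentary noise.
    """
    runs = []
    start = None
    n = min(len(primary), len(backup))
    for i in range(n):
        differs = abs(primary[i] - backup[i]) > threshold
        if differs and start is None:
            start = i  # a new run of mismatching samples begins
        elif not differs and start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    # Close a run that is still open at the end of the data.
    if start is not None and n - start >= min_run:
        runs.append((start, n))
    return runs


if __name__ == "__main__":
    primary = [10, 10, 10, 10, 10, 10, 10, 10]
    backup = [10, 10, 14, 14, 14, 14, 10, 13]
    print(mismatch_runs(primary, backup, threshold=2))
```

Each reported run marks a stretch where the backup content departs from the primary, which a facility monitoring system could raise as an alarm.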