Predicting subjective video quality

Feb 1, 2008 12:00 PM, By Kevin Ferguson and Winfried Shultz

Difference mean opinion score: A comprehensive approach to emulating the human vision system to obtain accurate and repeatable results when predicting subjective video quality ratings.

             
Tools, such as the Tektronix PQA500, can be helpful when
measuring picture impairment in a compression system.


For a number of years, work has been under way to replace the manual process of human viewer trials with machines that deliver results correlating closely with those obtained from ITU BT.500 trials. ITU J.144 attempted to save time and tedium by replacing the BT.500 human-based procedure with an algorithm. Although some success was achieved, more general applications remained unaddressed. The key challenges have been the complexity of the human vision system and its proper representation when feeding an electrical video data stream into the analysis engine. In recent years, a proliferation of encoding schemes, formats, resolutions and frame rates has increased the complexity of this task.

The human vision system

Objects in our surroundings usually reflect light or, as in the case of most video displays, produce light directly. (See Figure 1.) This light enters the human eye through the pupil and lens and triggers the receptors on the retina to produce a stimulus. The stimulus is passed through the optic nerve to the brain, where the visual cortex turns the stimuli of the entire retina into a picture. The information channel to the brain (the optic nerve) has a limited capacity, which further limits our ability to resolve detail and process the movement of the objects we are observing. Once the objects begin to move, acuity degrades, and the human vision system is limited to processing about 30 pictures per second. Research indicates, however, that we can resolve light stimuli of a much higher temporal frequency, given suitable settings for other aspects of the light stimulus, such as a sufficiently high average luminance level.

Figure 1. Simplified human vision system.

The charging process of the receptors depends on the duration and intensity of the light stimulus. Charging and discharging cannot occur infinitely fast. Therefore, the human vision system produces visual illusions such as the dark spots in the Hermann grid. (See Figure 2.) However, in an area of the retina where such a dramatic difference in the charge profile exists, a drain from low-charge areas occurs. This can be seen in the gray scales in Figure 3, where the horizontal gray bar in the combined picture seems to have brighter and darker areas where the contrast is high. In reality, the gray horizontal bars are identical.

In an amazing experiment, George Malcolm Stratton (1865-1957), a psychologist at the University of California, Berkeley, showed that the visual cortex can even compensate for reversing optics that make objects appear upside down. This illustrates the importance of adaptation and, as a consequence, the relativity of the human vision system. Recently, there have been breakthroughs in human vision models specifically because the importance of adaptation has been appreciated. One example is the international standard for color appearance models, CIECAM02, which was ratified in 2004. Its primary purpose is to mimic adaptation effects for color in nonvideo (static) applications.

When modelling the human vision system mathematically, it is clear that a simple linear system will have unacceptable inaccuracies. Instead, it is essential to accurately depict the inherently nonlinear behavior of adaptation.

Modelling the human vision

One way to specify the desired model behavior is to build stimulus-response pairs taken from experiments in the vision science literature. (See “References” on the next page.) These can then be modelled with filter functions and other processing to account for adaptation, masking, and combined perceptual and cognitive effects. We can consider four classes of stimulus responses: transparent, linear, fixed (or stationary) nonlinear, and adaptive (or dynamically nonlinear).

Transparent response class

In a transparent response system, usually no preprocessing occurs, and the measurements consist of operations such as pixel-by-pixel subtraction. This process does not apply any weighting or filtering that emulates the physical limitations of the human vision system. PSNR measurements are typical of this class and usually correlate only weakly with results obtained from testing with human viewers. Only if the video contains stimuli that closely correspond to the transparent class will the correlation be meaningful.
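As an illustration of the transparent class, the following is a minimal Python sketch of a PSNR calculation. It assumes frames are supplied as lists of rows of 8-bit luma values; the function name `psnr`, the `peak` parameter and the sample frames are hypothetical and are not taken from any particular measurement tool.

```python
import math


def psnr(reference, test, peak=255.0):
    """Return the peak signal-to-noise ratio in dB between two frames.

    Frames are lists of rows of pixel values. No perceptual weighting
    or filtering is applied: every pixel difference counts equally.
    """
    if len(reference) != len(test) or any(
        len(ref_row) != len(test_row)
        for ref_row, test_row in zip(reference, test)
    ):
        raise ValueError("frames must have the same dimensions")

    squared_error = 0.0
    pixel_count = 0
    for ref_row, test_row in zip(reference, test):
        for ref_px, test_px in zip(ref_row, test_row):
            diff = ref_px - test_px  # plain pixel-by-pixel subtraction
            squared_error += diff * diff
            pixel_count += 1

    if pixel_count == 0:
        raise ValueError("frames must not be empty")

    mse = squared_error / pixel_count
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)


# Hypothetical 2x2 luma frames, for illustration only.
reference_frame = [[100, 100], [100, 100]]
impaired_frame = [[101, 99], [100, 102]]
print(f"PSNR: {psnr(reference_frame, impaired_frame):.2f} dB")
```

Because every pixel error is weighted identically, two impairments with the same mean squared error yield the same PSNR even if a viewer would judge them very differently.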

Linear response class

A linear response corresponds to cases where fixed or stationary linear filters are applied to the stimulus. These filters can mimic the human vision response to the degree required. Again, stimuli that strongly correlate with linear behavior can be approximated well with this approach. However, linear filtering does not take into account the human vision system's ability to discern detail in light or dark patches when the ambient light is changing. Linear filtering usually does not take Weber's law into account either. Weber's law states that the just-noticeable visual difference depends on the magnitude of the original stimulus, and that the noticeable difference divided by the magnitude of the original stimulus is a constant.
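Weber's law as stated above can be sketched in a few lines of Python. The Weber fraction of 0.02 used here is an illustrative assumption, not a value given in this article, and the function names are hypothetical.

```python
def just_noticeable_difference(background, weber_fraction=0.02):
    """Smallest detectable change at a given background intensity.

    Weber's law: delta / background == constant (the Weber fraction).
    The default fraction is an assumed, illustrative value.
    """
    if background < 0:
        raise ValueError("background intensity must be non-negative")
    return weber_fraction * background


def is_noticeable(background, delta, weber_fraction=0.02):
    """True if a change of `delta` reaches the Weber threshold."""
    return abs(delta) >= just_noticeable_difference(background, weber_fraction)


# The same absolute change of 5 units is tested against backgrounds
# of increasing intensity.
for luminance in (10.0, 100.0, 1000.0):
    threshold = just_noticeable_difference(luminance)
    visible = is_noticeable(luminance, 5.0)
    print(f"background={luminance}: threshold={threshold:.1f}, visible={visible}")
```

The same absolute error is above threshold on the dark backgrounds and below it on the bright one, which is the behavior a fixed linear filter alone does not reproduce.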

Fixed or stationary nonlinear response class

Figure 2. Hermann’s grid, which produces dark-spot afterimages at the crossing points of the white lines between the black squares, leads to the Mach band effect (both retinal and neural adaptive processes account for this).

The fixed or stationary nonlinear response class corresponds to cases where the response to two overlaid (or, more precisely, superimposed) images is not equal to the sum of the responses to the individual stimuli. This means that the human vision system does not properly superimpose certain stimuli in the response it passes to the optic nerve and the cortex. When two sources of light are added, the human vision response at each point (or pixel) is not equal to the sum of the responses to each image alone.
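The failure of superposition can be checked numerically. The Python sketch below compares a linear response with a fixed compressive one; the cube-root exponent, the gain and the sample stimuli are illustrative assumptions rather than a vision model described in this article.

```python
def linear_response(stimulus, gain=0.5):
    """A fixed linear response: each output is a scaled input."""
    return [gain * s for s in stimulus]


def compressive_response(stimulus, exponent=1 / 3):
    """A fixed nonlinear response (assumed power law, for illustration)."""
    return [s ** exponent for s in stimulus]


def superposition_error(response, image_a, image_b):
    """Largest per-pixel gap between response(a + b) and response(a) + response(b)."""
    combined = response([a + b for a, b in zip(image_a, image_b)])
    summed = [
        ra + rb for ra, rb in zip(response(image_a), response(image_b))
    ]
    return max(abs(c - s) for c, s in zip(combined, summed))


# Two hypothetical one-row "images" of light intensities.
image_a = [8.0, 27.0]
image_b = [8.0, 27.0]

print("linear:", superposition_error(linear_response, image_a, image_b))
print("nonlinear:", superposition_error(compressive_response, image_a, image_b))
```

The linear response shows no gap, while the compressive response to the superimposed images differs clearly from the sum of the individual responses.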

Most advanced measurement techniques for picture quality combine linear and stationary nonlinear filtering to predict the responses of the human vision system. Yet even this combination of filters does not account for phenomena like flicker versus brightness, where light of a given intensity appears brighter when it is turned off and on rapidly. This class also cannot capture the effect of adaptation to varied luminance levels on perceived brightness, or other temporal aspects of visual illusions, such as the phantom third pulse seen when two pulses are used as a stimulus.



