Why is H.264 becoming so pervasive?
Apr 1, 2010 12:00 PM, By Mark Hershey
The H.264 codec is a major influence in the streaming media industry.
Optimizing H.264 for broadcast applications
With flexibility comes complexity, and H.264 is no exception. Setting up an H.264 encoder to fit your application can be as simple as selecting the default template your media encoder offers and pressing the “stream” button. Going further and knowing what to tweak is more challenging, because H.264 is not a one-size-fits-all technology. One popular H.264 codec library has more than 200 configurable settings.
Fortunately, most H.264 encoding products offer a set of templates. But to get the best possible video playback experience, you will eventually need to tweak the settings under an “advanced” button in the user interface.
Here are a few of the critical configuration options that will become familiar as you embrace H.264:
- Constant bit rate (CBR) vs. variable bit rate (VBR)
With CBR, a specified bit rate is held more or less constant, no matter the scene complexity or other factors that would otherwise spike bandwidth upward. This is pretty much required for streaming to handheld mobile devices because they lack the bandwidth headroom and the additional CPU to receive and decode anything more complex. CBR is also helpful in live Internet streaming applications that use adaptive streaming. Because the player automatically switches back and forth between different streams, CBR helps keep the streams synchronized so the player can switch seamlessly at the same point in the video.
CBR is not optimal for quality because the bit budget cannot grow with the degree of motion in the video, so complex scenes must be compressed harder. Conversely, VBR targets a specified average bit rate but presumes additional bandwidth is available to handle spikes. Essentially, more bits are allocated to fast-moving scenes and fewer to static scenes. More bits mean more bandwidth, which can be a problem for live Internet streaming but makes VBR a good choice for downloaded video.
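To make the trade-off concrete, here is a minimal Python sketch of how the two modes might distribute bits across scenes. It is a toy model, not how any real encoder's rate control works: the function names, the per-scene complexity scores, and the 800 kbps target with a 1,600 kbps ceiling are all assumptions chosen for illustration.

```python
def allocate_cbr(complexities, target_kbps):
    """CBR: every scene gets the same bit rate, whatever its complexity."""
    return [target_kbps for _ in complexities]


def allocate_vbr(complexities, target_kbps, max_kbps):
    """VBR (toy model): share bits in proportion to complexity, capped at a peak.

    Without the cap, the average equals target_kbps. If the cap truncates
    a scene, the real average falls below the target.
    """
    mean = sum(complexities) / len(complexities)
    return [min(max_kbps, target_kbps * c / mean) for c in complexities]


# Hypothetical complexity scores: two static scenes, one fast scene, one moderate.
scenes = [1.0, 1.0, 4.0, 2.0]

cbr = allocate_cbr(scenes, target_kbps=800)
vbr = allocate_vbr(scenes, target_kbps=800, max_kbps=1600)

print("CBR kbps per scene:", cbr)
print("VBR kbps per scene:", vbr)
print("Peak bandwidth needed: CBR", max(cbr), "kbps, VBR", max(vbr), "kbps")
```

In this example, VBR averages the same 800 kbps as CBR but peaks at twice that rate during the fast scene, which is exactly the headroom a live stream may not have.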
- Macroblock size
Like other codecs, H.264 breaks down a captured video frame into individual rectangles called macroblocks. The motion estimation and compensation techniques that make up the bulk of the magic of compression act on each macroblock, ultimately computing or predicting frames based on differences between the target macroblock and neighboring macroblocks. Older codecs had fixed-size blocks (usually 16 × 16 pixels). H.264 keeps the 16 × 16 macroblock but allows it to be split into smaller blocks, down to 4 × 4 pixels, and many encoders let you select which sizes are used for your application.
Smaller blocks mean more blocks per frame, which offers better overall picture quality at a significant cost in the computing horsepower needed to sustain real-time (live) encoding. Constant encoding speed doesn't matter where there is no expectation to compress in real time (as when creating files for later playback). For live encoding applications, use the smallest size that does not result in dropped frames or other impairments caused by the inability of the compressor to keep up. Increase the block size if needed to sustain high-motion content, at a cost of smoothness and faithful rendering of subtle color differences. Or, accept some degree of blockiness in the resulting video during high-motion periods. For streaming to handheld devices, downscale the video ahead of compression, and specify small blocks. Most commercial media encoders automatically downscale for you when you select the output frame size.
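The processing cost of smaller blocks follows from simple arithmetic: halving the block edge quadruples the number of blocks the encoder must handle per frame. The Python sketch below counts blocks for a given frame size and block size. The function names, the 1280 × 720 and 320 × 240 frame sizes, and the 30 fps rate are assumptions for illustration; real encoders mix block sizes within a frame, so treat the numbers as rough estimates only.

```python
import math


def blocks_per_frame(width, height, block_size):
    """Number of square blocks needed to cover a frame, rounding up at the edges."""
    return math.ceil(width / block_size) * math.ceil(height / block_size)


def blocks_per_second(width, height, block_size, fps):
    """Blocks a live encoder must process each second to keep up."""
    return blocks_per_frame(width, height, block_size) * fps


# Compare block counts for an assumed 1280x720 source at 30 fps.
for size in (16, 8, 4):
    print(f"1280x720, {size}x{size} blocks:",
          blocks_per_frame(1280, 720, size), "per frame,",
          blocks_per_second(1280, 720, size, 30), "per second at 30 fps")

# A downscaled handheld-sized frame with the smallest blocks.
print("320x240, 4x4 blocks:", blocks_per_frame(320, 240, 4), "per frame")
```

At 1280 × 720, moving from 16 × 16 to 4 × 4 blocks raises the count from 3,600 to 57,600 per frame, while a 320 × 240 handheld stream needs only 4,800 blocks even at 4 × 4. That is why downscaling first makes small blocks affordable.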
- GOP structure
Group of Pictures (GOP) structure usually refers to how often the encoder is required to insert a full frame into the stream rather than continuing a series of predicted frames. Your choice can significantly impact encoder processing overhead. Most encoders have an automatic setting that inserts a new full frame when it detects a scene change. However, some content, such as news desk content, has relatively few scene changes, and the automatic setting may stretch the interval between full frames to several seconds. That may be OK, but remember the player device will not start rendering a picture until it gets its first full frame of video, so it may be several seconds before a user sees your program. Forcing the structure to have a full frame at least every one-and-a-half to two seconds can be important in live streaming applications where the viewer may connect at any moment.
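Many encoders express the full-frame interval in frames rather than seconds, so the guideline above has to be converted using the frame rate. The following Python sketch does that conversion and estimates an upper bound on the wait for a viewer who joins just after a full frame has passed. The function names and the 30 fps and two-second figures are assumptions for illustration, and the wait estimate ignores network and player buffering.

```python
import math


def max_keyframe_interval(fps, max_seconds):
    """Largest interval, in frames, that keeps full frames within max_seconds."""
    return max(1, math.floor(fps * max_seconds))


def worst_case_wait(fps, interval_frames):
    """Upper bound, in seconds, on the wait for the next full frame."""
    return interval_frames / fps


fps = 30  # assumed frame rate
interval = max_keyframe_interval(fps, max_seconds=2)

print("Full frame at least every", interval, "frames")
print("Worst-case wait at that setting:", worst_case_wait(fps, interval), "seconds")
print("Worst-case wait at 300 frames:", worst_case_wait(fps, 300), "seconds")
```

Under these assumptions, an automatic setting that lets the interval drift to 300 frames could leave a new viewer waiting up to 10 seconds, while capping it at 60 frames holds the wait to about two seconds.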
Mark Hershey is vice president of engineering at ViewCast.