Get the maximum performance and quality from your digital audio interfaces.
When Sony and Philips introduced the compact disc digital audio medium in the early '80s, it was a significant successor to the vinyl record and Philips compact cassette. However, industry players were in no hurry to provide a digital interface for the medium, as that would have enabled direct access to the digital data and would have too easily allowed illegal copying and pirating.
Nonetheless, once digital storage of audio became feasible, an efficient means was needed to convey the signal between devices. Sony developed SDIF-2 (Sony digital interface), which used three coaxial cables carrying the left and right channels and a Word Clock. However, SDIF-2 was cumbersome and limited in cable length and data speed.
The AES formed a working group to design a better interface. This group created the first professional digital audio interface standard, AES3, later designated AES3-1985, and subsequently updated. The standard was also ratified by ANSI, EBU and EIA-J, with one result being that the interface is sometimes referred to as AES/EBU.
AES3 uses either 20- or 24-bit sample words, accommodating audio quantized anywhere from 16 to 24 bits per sample. While sampling frequencies from 22.05kHz to 192kHz can be specified, the most frequently used rates are 44.1kHz for CD audio and 48kHz for professional audio. An alignment level can also be specified.
While one or two channels can be carried in the interface, up to 16 channel numbers can be defined, aiding in the identification of multichannel bundles of signals. The samples are organized into alternating left- and right-channel subframes, with each pair forming a frame; 192 such frames compose a block. A 32-bit subframe with its various components and bit positions is shown in Figure 1.
The preamble identifies the start of a block, the start of the left-channel subframe or the start of the right-channel subframe. The audio sample word is carried next, followed by four data bits. When 20-bit or shorter audio is used, the auxiliary sample bits can be used for other applications, such as carrying talkback or cueing audio. The other parts of the subframe include the:
Validity bit, which indicates whether the audio sample word is suitable for conversion to analog.
User data bit, which can carry user-defined information.
Channel status bit, which contributes to the 192-bit channel status block.
Parity bit, which provides even parity over the subframe for error detection.
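The subframe layout in Figure 1 can be made concrete with a short sketch. The bit positions below follow the standard layout (time slots 0-3 preamble, 4-7 auxiliary, 8-27 audio, then validity, user, channel status and parity); the helper name is illustrative, not a real API.

```python
# A minimal sketch of assembling one 32-bit AES3 subframe.
# LSB of the returned integer corresponds to time slot 0.

def make_subframe(audio_20bit, aux=0, v=0, u=0, c=0):
    bits = 0
    # Time slots 0-3 carry the preamble, which deliberately violates
    # biphase-mark coding, so it is left as a zero placeholder here.
    bits |= (aux & 0xF) << 4              # slots 4-7: auxiliary sample bits
    bits |= (audio_20bit & 0xFFFFF) << 8  # slots 8-27: audio sample word
    bits |= (v & 1) << 28                 # validity flag
    bits |= (u & 1) << 29                 # user data bit
    bits |= (c & 1) << 30                 # channel status bit
    # Slot 31: even parity over slots 4-30, so that slots 4-31 always
    # contain an even number of ones.
    parity = bin(bits >> 4).count("1") & 1
    bits |= parity << 31
    return bits
```

Packing 192 left/right pairs of such subframes then yields one complete block.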
The channel status information is carried as a serial string of channel status bits from each subframe. With 192 pairs of subframes per block, the channel status information is organized as pairs of 192-bit blocks, subdivided into 24 bytes. Information that can be conveyed uniquely for each channel includes sampling frequency, channel numbers, reference signals, pre-emphasis and options such as the use of nonlinear pulse code modulation (PCM). SMPTE 340M, for instance, specifies a method for transmitting AC-3 compressed audio over the interface. A cyclic redundancy check character (CRCC) can also be transmitted to test valid reception of the entire channel status data block.
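The CRCC itself is a conventional 8-bit cyclic redundancy check. As a sketch, assuming the commonly cited AES3 parameters (generator polynomial x⁸ + x⁴ + x³ + x² + 1, all-ones initial value, CRC carried in the last of the 24 bytes), a receiver's check might look like this; verify these parameters against the current standard before relying on them:

```python
# Bitwise CRC-8 over a sequence of channel status bytes, MSB first.
# Polynomial 0x1D encodes x^8 + x^4 + x^3 + x^2 + 1 (the x^8 term
# is implicit); the initial value is all ones.

def crcc(data, poly=0x1D, init=0xFF):
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

block = bytes(23)       # a hypothetical all-zero channel status payload
check = crcc(block)     # transmitted as byte 23 of the block
# A receiver recomputes the CRC over all 24 bytes; appending the
# correct CRCC drives the result to zero.
assert crcc(block + bytes([check])) == 0
```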
There can be advantages to locking the audio and video clocks, such as for editing, especially when the audio and video programs are related. Although digital audio equipment may provide an analog video input, it is usually better to synchronize both the audio and the video to a single higher-frequency source, such as a 10MHz master reference. Locking directly to video requires a synchronization circuit that will introduce some jitter into the signal, especially because the video itself may already have some jitter. To accommodate possible clock differences between 59.94Hz and 60Hz video processing, AES3 also provides a flag to indicate whether the audio sample rate indicated in the channel status should be multiplied by 1000/1001.
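The effect of that flag is easy to see with a little arithmetic on the two common rates:

```python
# "Pulled-down" sample rates signaled by the 1000/1001 flag,
# as used alongside 59.94Hz video.
for nominal in (44_100, 48_000):
    pulled_down = nominal * 1000 / 1001
    print(f"{nominal} Hz -> {pulled_down:.3f} Hz")
# 44100 Hz -> 44055.944 Hz
# 48000 Hz -> 47952.048 Hz
```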
Because the AES3 signal has a bandwidth of several megahertz, some users may be tempted to use video distribution amplifiers to fan out a signal. While this may work in some applications, a large proportion of video signal processing equipment uses clamping or synchronization circuits that rely on the black level, sync and other components within video for routine operation. So, user beware!
From a physical standpoint, AES3 uses balanced 110Ω lines and XLR connectors, with a nominal signal voltage between 2V and 7V peak-to-peak. Coaxial 75Ω lines are also sometimes used, as well as cable bundles (or ribbons) carrying up to 16 lines, terminating in 50-pin subminiature-D connectors. The modulation used is Biphase Mark Code, which provides various features, including:
Clock recovery is achieved easily.
The DC component (and hence the power transmitted) is minimized.
The interface is insensitive to polarity reversals.
To provide synchronization within one audio sample, the preambles present a unique sequence (which violates the Biphase Mark Code) but nonetheless are DC-free and provide clock recovery.
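The three properties listed above fall directly out of how Biphase Mark Code works: the line level toggles at every bit boundary (guaranteeing transitions for clock recovery), an extra mid-cell toggle encodes a 1, and the decoder looks only at transitions, never at absolute levels. A minimal sketch:

```python
# Biphase Mark Code: each data bit occupies two half-bit cells.

def bmc_encode(bits, level=0):
    cells = []
    for b in bits:
        level ^= 1          # transition at every bit boundary
        cells.append(level)
        if b:
            level ^= 1      # extra mid-cell transition encodes a 1
        cells.append(level)
    return cells

def bmc_decode(cells):
    # A bit is 1 iff its two half-cells differ; absolute polarity
    # plays no role, so an inverted line decodes identically.
    return [int(cells[i] != cells[i + 1]) for i in range(0, len(cells), 2)]

data = [1, 0, 1, 1, 0, 0, 1]
line = bmc_encode(data)
assert bmc_decode(line) == data
assert bmc_decode([c ^ 1 for c in line]) == data   # polarity reversal
```

Because the level never stays constant for more than two half-cells, the DC content of the line signal stays bounded regardless of the data.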
A consumer version of AES3 — called S/PDIF, for Sony/Philips Digital Interface Format (more formally known as IEC 958 type II, part of IEC-60958) — is also widely used. Essentially identical to AES3 at the protocol level, the interface uses consumer-friendly RCA jacks and coaxial cable. Despite the primary intent of consumer use, the interface also appears on some professional equipment.
Although specified for use with 75Ω cables, many consumers use “plain vanilla” audio cables for this purpose, which, over short distances, generally work fine. Optical TOSLINK connectors are also sometimes used.
One key difference between the AES3 and S/PDIF protocols is the channel status information. The format (location) of some information is different, and AES3 carries much more information. Thus, an interconnection between the two different interfaces could cause problems, if certain equipment needs specific status information to function properly.
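In fact, the very first channel status bit is what tells the two formats apart: in IEC 60958, bit 0 of byte 0 is set for professional (AES3) use and clear for consumer (S/PDIF) use, and equipment interprets the rest of the block accordingly. A one-line check, with the function name being illustrative:

```python
# Byte 0, bit 0 of the channel status block distinguishes the
# professional (1) and consumer (0) interpretations of the block.

def is_professional(channel_status: bytes) -> bool:
    return bool(channel_status[0] & 0x01)

assert is_professional(bytes([0x01]))      # professional (AES3)
assert not is_professional(bytes([0x00]))  # consumer (S/PDIF)
```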
One flag that is carried within S/PDIF but not AES3 is the Serial Copy Management System (SCMS) copy-protection flag. An early attempt at digital rights management, SCMS was not reliably set by all sources. Worse still, the system could be defeated by surreptitiously rewriting the bit stream information.
When the availability of multichannel digital audio recorders made multiple circuit cables impractical, the proprietary Alesis Digital Audio Tape (ADAT) format was introduced in 1991 to allow the transfer of up to eight channels of audio over a fiber-optic interface. This has since been superseded by AES10 (or MADI, Multichannel Audio Digital Interface), which supports serial digital transmission of 28, 56, or 64 channels over coaxial cable or fiber-optic lines, with sampling rates of up to 96kHz and resolution of up to 24 bits per channel. The link to the IT world has also been established with AES47, which specifies a method for packing AES3 streams over Asynchronous Transfer Mode (ATM) networks.
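Some rough arithmetic, using the figures cited above (64 channels of 32-bit subframes at 48kHz), shows why MADI needs a link capacity on the order of 100Mbit/s:

```python
# Payload arithmetic for MADI at its maximum channel count, assuming
# 32-bit subframes carried for every channel at a 48kHz sampling rate.
channels, bits_per_subframe, fs = 64, 32, 48_000
payload = channels * bits_per_subframe * fs
print(payload)  # 98304000 bits/s of audio payload
```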
It's also worth mentioning Musical Instrument Digital Interface (MIDI) for broadcast operations. MIDI is essentially a protocol that allows electronic musical instruments to exchange performance information. (For example, a MIDI command could be “Start_Playing Clarinet_Sound, B-flat_above_Middle_C, 50%_Volume.”) Thus, MIDI is described more appropriately as a control interface, rather than an audio interface.
However, two subsets of MIDI are useful to know about. One is MIDI Machine Control, which provides transport commands for controlling recording devices such as multitrack tape recorders. The other is MIDI Time Code, which embeds SMPTE time code information in a MIDI stream. While the MIDI specification calls for the use of 5-pin DIN connectors, nonstandard connectors are occasionally used on equipment.
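The spoken-language command imagined above maps onto just a few bytes on the wire. A sketch, assuming General MIDI assignments: program 72 is Clarinet (sent zero-based as 71), B-flat above middle C is note number 70, and 50% of the 0-127 velocity range is roughly 64.

```python
# "Start_Playing Clarinet_Sound, B-flat_above_Middle_C, 50%_Volume"
# expressed as raw MIDI message bytes.

CHANNEL = 0  # MIDI channel 1, zero-based on the wire

program_change = bytes([0xC0 | CHANNEL, 71])      # select Clarinet
note_on        = bytes([0x90 | CHANNEL, 70, 64])  # B-flat4, half velocity
note_off       = bytes([0x80 | CHANNEL, 70, 0])   # release the note

print(program_change.hex(), note_on.hex(), note_off.hex())
# c047 904640 804600
```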
The idiosyncrasies of audio interfaces, while somewhat more constrained than those of video, still merit attention to maintain the highest quality plant. Knowing the way these interfaces work can help ensure a trouble-free installation, as well as provide ideas for tracking down an audio problem — and perhaps, one might even implement some useful novel features.
Aldo Cugnini is a consultant in the digital television industry.
Send questions and comments to: email@example.com