Digital Audio Testing

May 1, 2007

Having worked with analog audio and equipment for many years, we have gained an expertise and comfort level that is admirable. We can hear noise and distortion and link these audible symptoms to likely causes, making the needed changes. We have our trusty tone generator and oscilloscope for isolating those more difficult challenges. We know how to measure levels, frequency response and distortion to gauge the audio performance.

Figure 1. Structure of AES digital audio.

Unfortunately, digital audio changes all this. The old cause-and-effect relationship we are familiar with is gone. Likewise, our trusty tone generator and oscilloscope offer little help in diagnosing a digital signal or equipment defect.

Transitioning from analog to digital audio analysis and troubleshooting is not as hard as you may think. First, you need to understand what is in the digital bits. Second, you need new test methods and instruments that can analyze those bits.

Digital audio bit by bit

An analog-to-digital converter changes the audio signal to digital values by sampling the audio level at fixed intervals of time. Sampling is like taking snapshots of the analog audio signal level over time. Sampling happens at equally separated intervals measured in the number of samples taken every second, expressed in hertz (Hz) or in thousands of hertz (kHz). Digital audio is commonly sampled at 44.1kHz or 48kHz or at doubled rates of 88.2kHz or 96kHz for professional recording.

Assigning a digital value to the audio level at each sample interval is called quantization. This requires that the amplitude range of the audio waveform be divided into level steps. A quantized binary value encoding system, Pulse Code Modulation (PCM), has been adopted for overall improved system performance. PCM quantizes linearly: every quantizing interval is the same size, set by a fixed scale over the signal amplitude range. PCM makes use of a two's complement system to distinguish positive and negative binary coded values. (See Figure 3.)
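As a rough illustration of linear quantization and two's complement coding, the sketch below maps an audio level to a signed PCM code and shows its bit pattern. The function names and the normalized level range of -1.0 to +1.0 are assumptions made for this example, not part of any standard.

```python
def quantize(level, bits=16):
    """Map a level in the range -1.0..+1.0 to a signed code for a `bits`-bit word."""
    full_scale = 1 << (bits - 1)            # 32768 steps each side for 16 bits
    code = round(level * full_scale)        # linear: every step is the same size
    # Clamp to the codes the word can hold (the positive side stops one step short).
    return max(-full_scale, min(full_scale - 1, code))


def twos_complement_bits(code, bits=16):
    """Return the two's complement bit pattern of a signed code as a string."""
    return format(code & ((1 << bits) - 1), f"0{bits}b")


print(quantize(0.5))                     # 16384
print(twos_complement_bits(1, 8))        # 00000001
print(twos_complement_bits(-1, 8))       # 11111111
```

Note that the most significant bit acts as the sign: it is 0 for positive codes and 1 for negative codes.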

Figure 2. Clock/data/sample rate relationships in a 48kHz digital audio signal. A typical clock is 256 times the audio sample rate and the data bit-rate is 1/4 the clock frequency.

The number of bits used to form the PCM digital words that represent each of the sampled audio levels can vary from eight to 24 bits. The word length determines the number of quantizing level steps (resolution) and the dynamic range. Each bit provides about 6dB of range. An eight-bit digital audio word length provides 48dB of dynamic range (quiet to loud audio range), while 16-bit provides 96dB and 24-bit provides 144dB.
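The 6dB-per-bit rule follows from the fact that each added bit doubles the number of level steps, and a doubling in amplitude is about 6.02dB. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
import math


def dynamic_range_db(bits):
    """Ratio of the largest to the smallest level step, in dB, for a given word length."""
    return 20 * math.log10(2 ** bits)


for bits in (8, 16, 24):
    print(bits, round(dynamic_range_db(bits), 1))   # 48.2, 96.3 and 144.5 dB
```

The round figures of 48dB, 96dB and 144dB quoted above are these values with the per-bit figure rounded down to 6dB.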

In a digital audio system, the maximum audio level corresponds to 0dBFS (dB full scale), which is assigned the largest digital code word. Manufacturers have adopted the familiar zero VU level, equal to +4dBu, as a standard operating level (SOL). This level corresponds to -20dBFS, at which the digital values are well below the largest digital code word value. This provides 20dB of headroom for audio peaks to go above zero VU before digital clipping occurs.
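To get a feel for how far below the largest code word the operating level sits, the sketch below converts a dBFS level to an amplitude ratio and to the peak code a sine wave at that level would reach. The helper names are hypothetical, and the example assumes the peak of the sine wave is what is being referenced to full scale.

```python
def dbfs_to_ratio(dbfs):
    """Amplitude relative to full scale (1.0 = 0dBFS)."""
    return 10 ** (dbfs / 20)


def peak_code(dbfs, bits):
    """Largest positive code reached by a sine wave at the given dBFS level."""
    full_scale = (1 << (bits - 1)) - 1      # 32767 for 16-bit audio
    return round(full_scale * dbfs_to_ratio(dbfs))


print(peak_code(0, 16))      # 32767, the largest 16-bit code
print(peak_code(-20, 16))    # 3277, one tenth of full scale
```

At -20dBFS the signal uses only one tenth of the available code range, which is the 20dB of headroom left for peaks.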

PCM digital data is encoded using a second scheme called bi-phase mark coding (BPM). Bi-phase coding ensures a DC-balanced data line, as each bit begins with a transition and ends with a transition. If the data bit is a “1,” a transition also occurs in the middle of the time slot. A data “0” has only the transitions at the beginning and end of the time slot and does not have a transition in the middle. Bi-phase coding doubles the data rate or frequency, as each data bit has two time intervals (clock cycles). A DC-balanced line enables the receiver to properly detect logic high and low levels and the transition between them.
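The coding rule described above can be sketched in a few lines. Each data bit becomes two half-bit cells; the line level always toggles at the start of a bit and toggles again in the middle only for a "1". The function names and the choice of a starting line level are assumptions for illustration.

```python
def biphase_mark_encode(bits, start_level=0):
    """Return two line-level cells (0 or 1) per data bit."""
    level = start_level
    cells = []
    for bit in bits:
        level ^= 1              # every bit begins with a transition
        cells.append(level)
        if bit:
            level ^= 1          # a "1" adds a transition in mid-slot
        cells.append(level)
    return cells


def biphase_mark_decode(cells):
    """A bit is 1 when its two half-cells differ, 0 when they match."""
    return [int(cells[i] != cells[i + 1]) for i in range(0, len(cells), 2)]


encoded = biphase_mark_encode([1, 0, 1])
print(encoded)                          # [1, 0, 1, 1, 0, 1]
print(biphase_mark_decode(encoded))     # [1, 0, 1]
```

Because the decoder looks only at whether the level changes, not at its polarity, the data survives if the two signal wires are swapped.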

Getting your bits in a row

Some form of organization is needed so the receiver can reassemble and identify the assorted bits of information contained in a digital audio data stream. Organization involves assembling the data into blocks. Each block consists of 192 frames of audio. Each of the 192 frames can be divided into two sub-frames for two-channel audio. Each frame is produced at the digital audio sampling rate, so each frame contains one sample value per channel. At a 48kHz audio sampling rate, each frame is 20.833µs (microseconds) long, with each block lasting 4ms (milliseconds).
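The frame and block timing follows directly from the sample rate, since one frame is sent per sample period and a block is 192 frames. A small sketch of the arithmetic, with hypothetical function names:

```python
FRAMES_PER_BLOCK = 192


def frame_period_s(sample_rate_hz):
    """One frame is transmitted per audio sample period."""
    return 1.0 / sample_rate_hz


def block_duration_s(sample_rate_hz):
    """A block is 192 consecutive frames."""
    return FRAMES_PER_BLOCK / sample_rate_hz


print(round(frame_period_s(48_000) * 1e6, 3))     # 20.833 (microseconds)
print(round(block_duration_s(48_000) * 1e3, 3))   # 4.0 (milliseconds)
```

At 44.1kHz the same calculation gives a frame of about 22.7µs and a block of about 4.35ms.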

Figure 3. Pulse code modulation uses a two's complement system to distinguish positive and negative binary coded values with word lengths from eight to 24 bits.
Figure 4. Jitter is the variation in the transition times of the clock waveform.
Figure 5. Jitter causes timing errors when the audio signal is reconstructed by the receiver. The receiver locks and regenerates a clock from the incoming digital audio signal.
Figure 6. Receiver-generated clock jitter is related to transmitter (sampling) jitter, interface transition variations and transmission line noise. These effects can be cumulative.

Each frame can carry two audio channels. In a two-channel mode, the samples from both channels are transmitted in consecutive sub-frames. Channel 1 is in sub-frame A and channel 2 is in sub-frame B.

In addition to the digital audio word data bits, each sub-frame contains additional data. Each sub-frame consists of 32 bits: up to 24 bits of audio word data and eight bits of additional data. When the audio word is 20 bits, the four remaining bits are available as auxiliary data. Each sub-frame includes bits for preamble or sync data, auxiliary data, audio data word bits, and validity (V), user (U), channel status (C) and parity (P) data bits. Considering that each frame consists of 32 × 2 = 64 bits, occurring in 20.833µs (Fs = 48kHz), the data rate is 64 × 48,000 = 3,072,000 bits per second.
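The sub-frame budget and the resulting data rate can be checked with a short sketch. The field widths below follow the AES3 layout for a 20-bit audio word and are supplied here as an assumption for the example; the names are hypothetical. Using the exact frame period of 1/48,000 second, rather than the rounded 20.833µs, gives exactly 3,072,000 bits per second.

```python
# (field name, width in bits) for one sub-frame carrying a 20-bit audio word.
# With a 24-bit word, the 4 auxiliary bits carry the extra audio bits instead.
SUBFRAME_FIELDS = [
    ("preamble", 4),
    ("auxiliary", 4),
    ("audio", 20),
    ("validity", 1),
    ("user", 1),
    ("channel_status", 1),
    ("parity", 1),
]

BITS_PER_SUBFRAME = sum(width for _, width in SUBFRAME_FIELDS)
SUBFRAMES_PER_FRAME = 2


def data_rate_bps(sample_rate_hz):
    """Bits per second before bi-phase mark coding."""
    return BITS_PER_SUBFRAME * SUBFRAMES_PER_FRAME * sample_rate_hz


print(BITS_PER_SUBFRAME)         # 32
print(data_rate_bps(48_000))     # 3072000
```

This agrees with the relationship in Figure 2: 3,072,000 is one quarter of the 12,288,000Hz (256 × Fs) clock.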

The first four bits of each sub-frame are preamble, or sync, bits. These bits identify the start of a new audio block and of each sub-frame. A “Z” sync word marks the start of the first frame in the 192-frame block. The sync word “Y” indicates the start of every B sub-frame. The sync word “X” indicates the start of the A sub-frame in all remaining frames. A digital audio receiver uses these sync words to identify the start of the audio blocks and sub-frames.
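The rule for which sync word a transmitter sends can be written as a short function. This is only a sketch of the selection logic described above; the function name and the use of "A" and "B" strings to label sub-frames are assumptions for the example.

```python
FRAMES_PER_BLOCK = 192


def preamble_for(frame_index, subframe):
    """Return the sync word ('X', 'Y' or 'Z') for a sub-frame ('A' or 'B')."""
    if subframe == "B":
        return "Y"                               # every B sub-frame
    if frame_index % FRAMES_PER_BLOCK == 0:
        return "Z"                               # first frame of a block
    return "X"                                   # A sub-frame of other frames


print([preamble_for(n, s) for n in (0, 1) for s in ("A", "B")])
# ['Z', 'Y', 'X', 'Y']
```

A receiver can therefore find block boundaries by watching for "Z" and can tell the two channels apart by whether a sub-frame starts with "Y".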

Analyzing frequency accuracy

At the heart of any digital system is a clock. This is a crystal oscillator or voltage-controlled crystal oscillator circuit. The oscillator output determines the resulting audio sample rate and audio data rate. A perfect circuit would be exactly the desired frequency and each cycle of the clock waveform would be identical in duration or time.

The clock isn't a perfect circuit, as the crystal is not perfectly accurate. Crystal accuracy is described by a parts-per-million (PPM) rating. This indicates the maximum number of hertz the frequency may deviate for every one million hertz of nominal frequency. A typical crystal rating is ±20 PPM. If the crystal frequency was 1,000,000Hz (1MHz), the generated frequency would be within ±20Hz (999,980 to 1,000,020). The rating is proportional, so the allowed deviation in hertz scales with frequency. A crystal of 2,000,000Hz could deviate ±40Hz, while a 3,000,000Hz crystal could deviate ±60Hz and so on.

In digital audio terms, a crystal frequency of 12,288,000Hz is commonly selected. This is 256x the ideal sample rate of 48,000Hz. A 20 PPM error at this frequency calculates to an error in frequency of ±246Hz. Because this is a maximum error, one would expect typical operational errors in PPM or Hz to be much less.
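The PPM arithmetic works in both directions: a rating can be turned into an allowed deviation in hertz, and a measured frequency can be turned into an error in PPM for comparison with the rating. A minimal sketch with hypothetical function names:

```python
def ppm_to_hz(nominal_hz, ppm):
    """Maximum deviation in hertz for a given PPM rating."""
    return nominal_hz * ppm / 1_000_000


def error_ppm(measured_hz, nominal_hz):
    """Error of a measured frequency, in PPM of the nominal frequency."""
    return (measured_hz - nominal_hz) / nominal_hz * 1_000_000


print(ppm_to_hz(12_288_000, 20))                      # 245.76
print(round(error_ppm(12_288_100, 12_288_000), 2))    # 8.14
```

In the second line, a clock that measures 100Hz high at 12.288MHz is about 8 PPM off, comfortably inside a ±20 PPM rating. The same 20 PPM applied to the 48,000Hz sample rate is a deviation of only 0.96Hz.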

In digital audio systems, some frequency error is tolerable because the clock frequency is embedded in the audio data stream and used by subsequent digital audio equipment to recreate a matching clock frequency. However, good maintenance and troubleshooting practices should include a frequency measurement of the digital audio signal, including the sample rate frequency (Fs) and clock frequency (256x Fs). Periodic measurement ensures that when trouble strikes, you know good from bad.

When multiple AES digital audio signals are created by separate clocks, differences in clock frequencies and sync timing exist. These differences present challenges to digital audio equipment designed to switch between or process multiple inputs. To produce multiple AES digital audio signals at the same frequency and timing, master clocks or digital audio reference signals (DARS) can be used to synchronize oscillators and sync timing.

Analyzing jitter

A discussion of clock frequency and timing errors would not be complete without talking about jitter. With a perfect clock square wave, each subsequent clock cycle would be identical in time, with the positive and negative parts of each cycle the same duration. The clock would be a symmetrical square wave with each of its transitions occurring at an exact time increment from the previous transition.

Again, the clock is not perfect. Clock cycles may fluctuate in time with cycles being slightly shorter or longer than previous cycles. Clock positive and negative times may be slightly longer or shorter causing transitions to occur at slightly different intervals in time. These variations are called jitter.

In a digital system, it's all about timing. Consider how these timing variations can cause audio signal degradation. For example, consider a perfect jitter-free clock digitally sampling a linear rising waveform during the analog-to-digital conversion, as shown in Figure 5. If the waveform is reconstructed by a digital-to-analog converter whose clock contains some jitter, the linear rising voltage is no longer linear. The digital values correctly indicate the audio level as it was sampled, but because the levels are incorrectly placed in time, the resulting waveform is distorted by the jitter component.
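The size of the level error can be estimated from the slope of the waveform: a sample placed slightly early or late is wrong by roughly the slope multiplied by the timing error. The sketch below applies that to a linear ramp and to the steepest part of a sine wave. It is a first-order approximation under that assumption, and the function names are hypothetical.

```python
import math


def ramp_error(slope_per_s, timing_error_s):
    """Magnitude of the level error for a linear ramp output at the wrong time."""
    return abs(slope_per_s * timing_error_s)


def worst_case_sine_error(freq_hz, peak_amplitude, timing_error_s):
    """Approximate largest error for a sine wave, which is steepest at zero crossings."""
    max_slope = 2 * math.pi * freq_hz * peak_amplitude
    return abs(max_slope * timing_error_s)


# A full-scale (amplitude 1.0) 20kHz tone with a 1ns timing error:
err = worst_case_sine_error(20_000, 1.0, 1e-9)
print(err)                                   # about 1.26e-4 of full scale
print(round(20 * math.log10(err), 1))        # about -78 (dB relative to full scale)
```

The error grows with both signal frequency and level, which is why jitter is most damaging to loud, high-frequency content.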

Jitter occurs in a digital audio system at the transmitter from the non-perfect clock or crystal oscillator circuit. This is commonly called transmitter or sampling jitter. The digital audio signal is also adversely affected on the interface transmission line, which contributes to jitter. This is commonly called interface jitter. These jitter elements are cumulative as the digital audio is transmitted and moves through a transmission line to a receiver.

Digital audio embeds the clock signal and sync transitions within the serial digital audio data stream. It is up to the receiver to regenerate an oscillator locked to the incoming digital audio. As in any digital system, the data transitions from high to low have crossover points. These transition points are used to lock and correct the oscillator frequency in the receiver. Influences on these transition points contribute to jitter within the receiver's clock.

One contributor to jitter is the data transmission line, better known as the connecting cable. The cable's capacitance and frequency response characteristics can cause waveform shaping and slight DC balance shifts in the digital audio waveform. This slightly delays or advances the transition points along the digital waveform at the receiver input, which the receiver interprets as jitter. Noise can also be induced into the transmission line, which can further shift the crossover points.

Measuring jitter is an important step in ensuring a quality digital audio signal. Jitter may be measured by an AES digital audio analyzer. Typical jitter measurements are displayed as small time errors and expressed in nanoseconds or picoseconds. Jitter errors are commonly expressed as an average RMS value to reduce the measurement effects of randomly occurring peak jitter errors. The AES/EBU standard specifies that jitter be less than ±20ns. However, it is desirable to minimize jitter to much lower levels to optimize digital sound reproduction.
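To show why an RMS figure is less sensitive to occasional peaks than a peak-to-peak figure, the sketch below computes both from a list of measured transition times. It compares each transition with an ideal, evenly spaced grid anchored at the first transition. This is an illustrative calculation only, not the method any particular analyzer uses, and the names are hypothetical.

```python
import math


def jitter_stats(edge_times_s, nominal_period_s):
    """Return (rms, peak_to_peak) timing error of edges against an ideal grid."""
    t0 = edge_times_s[0]
    # Timing error of each edge relative to where a perfect clock would put it.
    errors = [t - (t0 + i * nominal_period_s) for i, t in enumerate(edge_times_s)]
    mean = sum(errors) / len(errors)
    rms = math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))
    peak_to_peak = max(errors) - min(errors)
    return rms, peak_to_peak


# Four edges of a nominal 1-second clock, with the middle two off by +/-0.1s:
rms, p2p = jitter_stats([0.0, 1.1, 1.9, 3.0], 1.0)
print(round(rms, 4), round(p2p, 4))     # 0.0707 0.2
```

A single large excursion raises the peak-to-peak value directly but is averaged down in the RMS value.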

Kropuenske is an application engineer with Sencore.