Predictive audio encoding

March 1, 2003

A popular non-perceptual method for reducing the bit rate of linear pulse code modulation (PCM) encoded audio in real time is Subband Adaptive Differential Pulse Code Modulation (ADPCM). Three commercial equipment options are available when adopting this method of compression. The full-bandwidth proprietary Standard apt-X compresses 16-bit PCM, while Enhanced apt-X can additionally compress 20-bit and 24-bit PCM. There is also the ITU-T standard G.722 algorithm. All three methods offer modest compression ratios of 4:1.

Enhanced and Standard apt-X are packaged together on the high-performance Motorola DSP56362 device, which provides four audio channels and can operate in simultaneous dual-channel, full-duplex stereo. The Standard version of the algorithm is also available as Soft apt-X, a multi-threaded DLL software package. The G.722 algorithm is also software-based and is designed for 7kHz mono speech.

The Subband ADPCM process, executed in the time domain, is often referred to as predictive coding. It examines the history of the incoming PCM sample levels and then predicts the levels of subsequent audio samples. Subband filtering, commonly used in all forms of audio data compression, divides the incoming audio into a specific number of equal-width frequency bands. Adaptive means responding to level variations in the incoming audio by dynamically adapting the step size in the encoding quantiser to compensate, thus ensuring adequate signal headroom. Differential means that only small difference signals are encoded to transfer the audio information to the decoder.

A linear PCM signal is highly repetitive, and a major part of it can be readily identified, measured, removed during encoding and replaced again in the decoder. This redundancy removal is achieved by comparing the actual level of a sample with a predicted level for the same sample; subtracting the two generates a difference signal. The rest of the signal is deemed to be redundant. The decoder is the complete inverse: the decoded difference signal is fed into a matching prediction loop, where the predicted level is added back to produce an accurate reconstruction of the original PCM signal.
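The encode and decode loops described above can be sketched in a few lines of Python. This is a minimal illustration only, not the apt-X or G.722 algorithm: it assumes the simplest possible predictor (the previous reconstructed sample), and the 4-bit code width, starting step size and adaptation factors are arbitrary values chosen for the example. The names adpcm_encode, adpcm_decode and _update are hypothetical.

```python
def _update(predicted, step, code, max_code):
    """State update shared by encoder and decoder so both stay in lockstep."""
    reconstructed = predicted + code * step
    # Adaptive part: widen the step when codes sit near the limits,
    # narrow it when codes are small (factors are illustrative assumptions).
    if abs(code) >= max_code - 1:
        step *= 1.5
    elif abs(code) <= 1:
        step *= 0.9
    step = min(max(step, 1.0), 4096.0)
    return reconstructed, step


def adpcm_encode(samples, bits=4, step=16.0):
    """Turn PCM samples into small signed difference codes."""
    max_code = (1 << (bits - 1)) - 1   # +7 for 4 bits
    min_code = -(1 << (bits - 1))      # -8 for 4 bits
    predicted = 0.0
    codes = []
    for x in samples:
        diff = x - predicted                        # differential part
        code = max(min_code, min(max_code, round(diff / step)))
        codes.append(code)
        # The encoder tracks what the decoder will reconstruct.
        predicted, step = _update(predicted, step, code, max_code)
    return codes


def adpcm_decode(codes, bits=4, step=16.0):
    """Rebuild PCM samples by adding each decoded difference to the prediction."""
    max_code = (1 << (bits - 1)) - 1
    predicted = 0.0
    out = []
    for code in codes:
        predicted, step = _update(predicted, step, code, max_code)
        out.append(predicted)
    return out


if __name__ == "__main__":
    pcm = [1000] * 200
    decoded = adpcm_decode(adpcm_encode(pcm))
    print(decoded[-1])
```

Only the codes cross the link. Because the decoder repeats the encoder's state updates exactly, no step-size information needs to be transmitted alongside them.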

This redundancy removal is applied in both versions of apt-X, where sequential time blocks of four PCM samples (64 bits for 16-bit PCM) are continuously filtered and encoded in four separate, equal-width frequency subbands.
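To picture how a block of four samples becomes one value in each of four equal-width subbands, the sketch below cascades a two-band split twice. It uses the trivial Haar sum/difference pair as a stand-in for a real filter bank; production codecs use much longer filters, and nothing here is taken from the apt-X implementation. The function names are hypothetical.

```python
def split(x):
    """Split an even-length sequence into half-rate low and high bands."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high


def merge(low, high):
    """Inverse of split: rebuild the full-rate sequence."""
    out = []
    for l, h in zip(low, high):
        out.extend((l + h, l - h))
    return out


def analyse_four_bands(block):
    """Two-stage tree: returns four subbands ordered from lowest to highest."""
    low, high = split(block)
    band0, band1 = split(low)
    # Decimating the upper branch mirrors its spectrum, so its "low"
    # output carries the highest frequencies.
    band3, band2 = split(high)
    return [band0, band1, band2, band3]


def synthesise_four_bands(bands):
    """Rebuild the original block from the four subbands."""
    band0, band1, band2, band3 = bands
    low = merge(band0, band1)
    high = merge(band3, band2)
    return merge(low, high)


if __name__ == "__main__":
    bands = analyse_four_bands([1, 2, 3, 4])
    print(bands)
    print(synthesise_four_bands(bands))
```

A constant block lands entirely in the lowest band and an alternating block entirely in the highest, while any block is recovered exactly when the four bands are recombined.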

The next stage in the apt-X bit-rate reduction process takes advantage of two naturally occurring perceptual phenomena. First, any complex sound is made up of a fundamental frequency and a number of harmonics of increasing frequency and diminishing level. The fundamental frequency content is invariably around 4kHz, in the lowest frequency subband, with the 2nd, 3rd and 4th harmonics distributed across the other three subbands. Second, the sensitivity of a normal human ear is non-linear: it is greatest around 4kHz, decreasing at lower frequencies and again as the frequency increases to an HF maximum at about 18kHz, and that is for a good ear.

To reflect these phenomena in 16-bit apt-X, the LF subband quantiser is set to reduce the 16-bit differential sample to 7 bits. The quantisers in the three higher-frequency subbands reduce their 16-bit samples to 4, 3 and 2 bits respectively. The four reduced subband outputs, when multiplexed, produce a 16-bit apt-X encoded sample that represents the 64 bits at the input, a reduction of 4:1. By comparison, G.722 takes two PCM samples (32 bits), filters them into two subbands and then allocates 6 bits and 2 bits respectively, producing an output sample of 8 bits, again a reduction of 4:1.
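The bit allocation arithmetic can be checked with a short sketch that multiplexes four subband codes of 7, 4, 3 and 2 bits into one 16-bit word and splits them apart again. The ordering of the fields within the word (LF subband in the most significant bits) is an assumption made for illustration; the actual apt-X bitstream layout is not described here. The names pack_codeword and unpack_codeword are hypothetical.

```python
SUBBAND_BITS = (7, 4, 3, 2)   # LF subband first, per the allocation above


def pack_codeword(codes, widths=SUBBAND_BITS):
    """Multiplex one unsigned code per subband into a single integer word."""
    word = 0
    for code, width in zip(codes, widths):
        if not 0 <= code < (1 << width):
            raise ValueError(f"code {code} does not fit in {width} bits")
        word = (word << width) | code
    return word


def unpack_codeword(word, widths=SUBBAND_BITS):
    """Demultiplex a word back into its per-subband codes."""
    codes = []
    for width in reversed(widths):
        codes.append(word & ((1 << width) - 1))
        word >>= width
    return tuple(reversed(codes))


if __name__ == "__main__":
    input_bits = 4 * 16                  # four 16-bit PCM samples
    output_bits = sum(SUBBAND_BITS)      # 7 + 4 + 3 + 2
    print(input_bits // output_bits)     # compression ratio
    word = pack_codeword((5, 9, 2, 1))
    print(hex(word), unpack_codeword(word))
```

The same arithmetic applies to the G.722 figures quoted above: widths of (6, 2) give an 8-bit word for 32 input bits.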

Predictive coding has a number of advantages. All of the reduced bit pool is used by the audio data; ADPCM has no requirement for additional housekeeping data. In Enhanced apt-X the complete encode/decode cycle inserts 2ms of delay with 48kHz sampling, an important parameter should full duplex working over any length of telecom circuit be a requirement. Audio post-production may require the audio to pass through repeated compression stages, and it has been shown that apt-X outperforms other codecs in tandem and multiple coding environments. Apt-X is also robust in the presence of network data errors, because the errors affect only the differential signal and because errors that appear in one subband have no influence on the operation of the other subbands. Instead of muting the audio, the algorithm will continue to deliver acceptable speech with bit error rates as high as 1 in 1,000.
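The claim that an error in one subband leaves the others untouched can be illustrated at the level of the multiplexed word. The sketch below assumes a 16-bit word holding fields of 7, 4, 3 and 2 bits with the LF subband in the most significant bits, which is a hypothetical layout, and flips each bit in turn to see which subband codes change. It says nothing about how quickly a real decoder's predictor recovers after an error.

```python
WIDTHS = (7, 4, 3, 2)   # assumed field widths, LF subband first


def split_fields(word, widths=WIDTHS):
    """Return the per-subband codes held in a multiplexed word."""
    fields = []
    shift = sum(widths)
    for width in widths:
        shift -= width
        fields.append((word >> shift) & ((1 << width) - 1))
    return fields


def corrupted_subbands(word, bit_index, widths=WIDTHS):
    """List the subbands whose code changes when one bit is flipped."""
    damaged = word ^ (1 << bit_index)
    before = split_fields(word, widths)
    after = split_fields(damaged, widths)
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


if __name__ == "__main__":
    for bit in range(16):
        print(bit, corrupted_subbands(0xABCD, bit))
```

Every single-bit error touches exactly one of the four codes, so the three remaining subbands decode as if nothing had happened.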

ADPCM processes the entire audio signal, even those elements that would be inaudible to the human ear. Perceptual masking is left entirely to each individual listener's ear, the principle being that a loud sound will mask any quieter sound or noise element in close frequency or temporal proximity.

Wylie is technical consultant for Audio Processing Technology, Belfast.