An MPEG of a different color
Apr 1, 2004 12:00 PM, By Oliver Kunz
Low bit-rate audio coding is an enabling technology for a number of applications like digital radio, mobile multimedia applications and Internet streaming (Web radio).
The limited overall bandwidth available for a digital transmission system makes it desirable to use a low bit rate per channel to distribute the audio in the given transmission data rate. Therefore, system designers have to use highly efficient perceptual audio codecs, such as MP3 or AAC, at low bit-rates.
In Internet streaming applications, the bandwidth that can be established between the Web radio server and the listener's client application depends on the listener's connection to the Internet. In many cases today, people use POTS modems or ISDN lines with a fairly limited data rate, lower than the rate needed to produce appealing audio quality with conventional perceptual audio codecs. Even for listeners with high-bandwidth connections such as DSL, the ever-present congestion on the Internet limits the bit rate that can be sustained in a stable manner over a longer time period.
In mobile communications, the situation is similar to the digital radio scenario. Because the overall bandwidth available for all services in a certain geographic area (a network cell) is limited, the system operator has to take measures to allow as many users as possible in that network cell to access mobile communication services in parallel. For commercial reasons, the network operators have to ensure that they use their available spectrum as efficiently as possible by means of speech and audio codecs. Considering the effect that the advent of multimedia services has on the data rate demands in mobile communication systems, it becomes apparent that even with 3G phone technology, cellular networks will have to use perceptual codecs at a fairly low data rate.
The technical challenge
Using perceptual codecs at low bit rates, however, is not without its downside. State-of-the-art perceptual audio codecs achieve CD-quality or transparent audio quality at a bit-rate of about 128kb/s (about 12:1 compression). Below 128kb/s, the perceived audio quality of most of these codecs begins to degrade significantly. The codecs either start to reduce the audio bandwidth and to modify the stereo image, or they introduce annoying coding artifacts resulting from a shortage of bits in the attempt to represent the complete audio bandwidth. Both ways of modifying the perceived sound can be considered unacceptable above a certain level. At 64kb/s for instance, MP3 would offer an audio bandwidth of about 10kHz or introduce a fair amount of coding artifacts. Each of these factors severely affects the listening experience.
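The compression figure quoted above can be checked with simple arithmetic. The short Python sketch below is an illustration only: it assumes uncompressed 16-bit stereo PCM as the reference and compares it with a few coded bit rates. The function names are hypothetical. With the CD sampling rate of 44.1kHz, 128kb/s works out to roughly 11:1; with a 48kHz source it is exactly 12:1.

```python
# Worked arithmetic: uncompressed PCM bit rate versus a coded bit rate.
# Assumption: 16-bit stereo PCM is the uncompressed reference.

def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample=16, channels=2):
    """Bit rate of uncompressed PCM audio in kb/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

def compression_ratio(coded_kbps, sample_rate_hz=44_100):
    """How many times smaller the coded stream is than the PCM source."""
    return pcm_bit_rate_kbps(sample_rate_hz) / coded_kbps

if __name__ == "__main__":
    for sample_rate in (44_100, 48_000):
        for coded in (128, 64, 32):
            ratio = compression_ratio(coded, sample_rate)
            print(f"{sample_rate} Hz source at {coded} kb/s: about {ratio:.1f}:1")
```

The same calculation shows why lower rates are so demanding: halving the bit rate to 64kb/s doubles the required compression ratio.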
The technical solution
Spectral Band Replication (SBR) is a newer audio coding enhancement tool. It offers the possibility to improve the performance of low bit-rate audio and speech codecs by increasing the audio bandwidth at a given bit-rate or by improving coding efficiency at a given quality level.
SBR can increase the limited audio bandwidth that a conventional perceptual codec offers at low bit-rates, so that it equals or exceeds analog FM audio bandwidth (15kHz). SBR can also improve the performance of narrow-band speech codecs, offering the broadcaster speech-only channels with 12kHz audio bandwidth used, for example, in multilingual broadcasting. As most speech codecs are bandwidth limited, SBR is important not only for improving speech quality, but also for improving speech intelligibility and speech comprehension. SBR is mainly a post-process, although some pre-processing is performed in the encoder to guide the decoding process.
From a technical point of view, SBR is a method for highly efficient coding of high frequencies in audio compression algorithms. When used in conjunction with SBR, the underlying coder is only responsible for transmitting the lower part of the spectrum. The higher frequencies are generated by the SBR decoder, which is mainly a post-process following the conventional waveform decoder. Instead of transmitting the spectrum, SBR reconstructs the higher frequencies in the decoder based on an analysis of the lower frequencies transmitted in the underlying coder. To ensure an accurate reconstruction, some guidance information is transmitted in the encoded bit stream at a low data rate.
The reconstruction is efficient for harmonic as well as for noise-like components, and allows for proper shaping in the time domain as well as in the frequency domain. As a result, SBR allows full bandwidth audio coding at very low data rates, thus offering a significantly increased compression efficiency compared to the core coder.
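The idea of rebuilding the high frequencies from the transmitted low band plus a small amount of guidance data can be illustrated with a toy Python sketch. This is not the actual SBR algorithm, which operates on a filter bank and carries considerably more control data. The sketch assumes the guidance information is simply one average-energy value per high-frequency band, that the spectra are plain lists of magnitudes, and that the low and high bands have equal length. All function names are hypothetical.

```python
# Toy illustration of band replication; NOT the real SBR algorithm.
# Assumptions: spectra are lists of magnitudes, low and high band have the
# same length, and that length is divisible by the number of envelope bands.

def encode_guidance(high_band, num_bands):
    """Encoder side: summarize the high band as a few average-energy values."""
    size = len(high_band) // num_bands
    return [
        sum(x * x for x in high_band[i * size:(i + 1) * size]) / size
        for i in range(num_bands)
    ]

def reconstruct_high_band(low_band, envelope):
    """Decoder side: copy the low band upward and rescale each band so its
    average energy matches the transmitted envelope value."""
    size = len(low_band) // len(envelope)
    rebuilt = []
    for i, target_energy in enumerate(envelope):
        patch = low_band[i * size:(i + 1) * size]
        patch_energy = sum(x * x for x in patch) / size
        gain = (target_energy / patch_energy) ** 0.5 if patch_energy > 0 else 0.0
        rebuilt.extend(gain * x for x in patch)
    return rebuilt

if __name__ == "__main__":
    low = [8.0, 6.0, 4.0, 2.0, 4.0, 3.0, 2.0, 1.0]   # sent by the core coder
    high = [2.0, 1.5, 1.0, 0.5, 0.4, 0.3, 0.2, 0.1]  # not sent as a waveform
    envelope = encode_guidance(high, num_bands=2)     # two numbers instead of eight
    print(envelope)
    print(reconstruct_high_band(low, envelope))
```

In this toy case only two envelope values stand in for eight high-band values, which mirrors the point of the text: the guidance data costs far fewer bits than coding the upper spectrum directly.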
SBR can enhance the efficiency of perceptual audio codecs by about 30 percent in the medium to low bit-rate range. The exact level of improvement that SBR can offer also depends on the underlying codec. For instance, using SBR in conjunction with MP3 can achieve a quality at 64kb/s stereo that compares to conventional MP3 at a bit rate of more than 100kb/s stereo. SBR can be used with mono and stereo as well as with multichannel audio. SBR offers maximum efficiency in the bit-rate range where the underlying codec itself is able to encode audio signals with an acceptable level of coding artifacts at a limited audio bandwidth.
Currently in the process of standardization, Enhanced AAC Plus will further reduce bit rates and increase audio quality for bandwidth-constrained channels, at bit rates as low as 20kb/s to 32kb/s.
The Digital Radio Mondiale (DRM) consortium has defined a global standard for digital radio in the short- and medium-wave frequencies. These frequencies are currently used for low-quality, wide-range radio transmissions, mostly by large global broadcasters like BBC World Service, Radio France Internationale, Voice of America and Deutsche Welle. The transmission channel characteristics and the current channel spacing, which will be maintained in the digital system for reasons of co-existence in the transition period, do not allow a high data rate, making this system a good candidate for the use of SBR. Within the DRM system, SBR is used in connection with AAC.
XM Satellite Radio began using a customized AAC Plus audio encoding algorithm with neural audio optimization in April 2002. AAC Plus combines the AAC algorithm with SBR technology. AAC Plus is commercially available for Internet streaming applications, and it is used by Telos, Orban and RealNetworks.
The audio encoding algorithm used by Ibiquity for IBOC is called HDC, which combines Ibiquity's proprietary encoder with SBR.
At the end of 2003, mobile network operators launched the first services to download songs to mobile phones, using AAC Plus. These providers include mmO2, Vodafone, and SK Telecom.
MP3 Pro, which combines MP3 and SBR in a backwards-compatible way, has been integrated into the existing MP3 market. Conventional MP3 players can still render a useful output from an MP3 Pro bit stream, while MP3 Pro players can decode the added information. The performance of MP3 Pro is significantly higher than that of MP3. MP3 Pro at 64kb/s performs better than MP3 at 96kb/s, offering users a convenient way to improve the storage efficiency of a portable player. MP3 Pro will also be able to improve the fidelity at 128kb/s, allowing true CD-quality storage and replay. At the lower bit rates used for streaming applications today, MP3 Pro will help to increase the audio bandwidth of the compressed signal, giving it a substantial subjective quality boost over current streaming formats.
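The storage benefit of the lower bit rate can be estimated with a short calculation. The Python sketch below is an illustration under stated assumptions: the 256-megabyte player capacity is a hypothetical figure, megabytes are counted as 1,000,000 bytes, and file-format overhead is ignored.

```python
# Rough playing time on a portable player at a given bit rate.
# Assumptions: decimal megabytes, constant bit rate, no file-format overhead.

def minutes_of_audio(capacity_megabytes, bit_rate_kbps):
    """Playing time in minutes that fits into the given storage capacity."""
    capacity_bits = capacity_megabytes * 1_000_000 * 8
    seconds = capacity_bits / (bit_rate_kbps * 1000)
    return seconds / 60

if __name__ == "__main__":
    capacity = 256  # hypothetical player capacity in megabytes
    for rate in (96, 64):
        print(f"{rate} kb/s: about {minutes_of_audio(capacity, rate):.0f} minutes")
```

Whatever the capacity, moving from 96kb/s to 64kb/s stores 1.5 times as much music in the same space.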
Kunz is the VP of strategic marketing for Coding Technologies, Nürnberg, Germany.