Next Steps in Surround

January 9, 2012

There are many facets to broadcasting surround content. Besides choosing a technology, the station's technical staff has to become appropriately educated, suitable monitoring is needed to listen to the content, and the related infrastructure has to be suitably in place. Although many areas of the broadcasting chain can be discussed, I will cover two key areas on either end of the surround-sound broadcasting spectrum. At one end, I'll look at several important aspects of capturing/creating surround content that may not be well known. On the other end, I will touch on several important elements to ensure that surround content can be broadcast successfully (while most certainly improving the stereo and mono content).

Let's start at the tail end of the broadcast/audio chain. A critical step in enabling broadcast surround is ensuring that the facility infrastructure can support it. Do not assume that just because it is seemingly working in stereo that it is ready for surround. It is imperative that the entire broadcast facility (including digital storage systems, analog and digital audio distribution, studio-to-transmitter links (STLs), etc.) can ensure the delivery of audio content with minimal degradation.

Unfortunately, with various digital audio systems integrated with various analog components alongside prosumer equipment, this is an area that is frequently overlooked. With storage systems, STLs and other systems that may each utilize a digital audio codec of one type or another, maintaining audio integrity in a facility has become more complex. Without proper care in the integration of all these systems, audible errors in surround can and will be heard, and increased troubleshooting may result.

Don't assume that the facility is surround ready without careful scrutiny over the entire audio path that the content will travel. This becomes critical with surround content because the amount of auditory masking is far less than with stereo content. Masking is a property of the human auditory system, where some sounds (or certain aspects of sound) can simply disappear in the presence of other sound(s) with certain characteristics.

With surround content, the sound field is expanded beyond two speakers to five speakers or more placed around the listening position. It becomes much easier for the listener to distinguish auditory information about each sound source in a surround playback system. Sounds that would previously be masked in stereo become easier to discern. Likewise, distortion, codec artifacts, poorly implemented and/or aggressive audio processing and other shortfalls in a broadcast audio chain become more apparent in surround than they would with stereo content.

The first steps

To minimize the aforementioned effects, there are several key fundamentals to check for and maintain. Some of the chief areas of concern are:

No. 1 — Implement and maintain optimum system levels, including a standard reference level, headroom and signal-to-noise ratios.

Whether a station has an all-digital infrastructure or one comprised of analog and digital components, implementing and maintaining optimum system levels greatly reduces the possibilities of audible errors. There should be an established system-wide reference level, headroom, signal-to-noise ratio and unified clip level. One standard practice for professional digital and analog equipment is shown in Table 1.

With the proliferation of less expensive professional equipment (e.g. prosumer), it is becoming more common to have equipment that is unable to achieve a +24dBu analog output. Therefore, some accommodations to the audio level standards may be necessary. Regardless, if you do not have established reference levels, not only will the broadcast of surround content be in question, but the quality of stereo/mono content will be reduced as well.
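As a sketch of the alignment in Table 1, the arithmetic can be expressed in a few lines of Python. The function name is hypothetical and for illustration only: with a -20dBFS reference equal to +4dBu, every digital level maps to an analog level 24dB higher under this particular alignment.

```python
# Alignment from Table 1 (one common standard practice, not the only one):
REFERENCE_DBFS = -20.0                      # digital reference level
REFERENCE_DBU = 4.0                         # corresponding analog level
OFFSET_DB = REFERENCE_DBU - REFERENCE_DBFS  # +24dB under this alignment

def dbfs_to_dbu(dbfs: float) -> float:
    """Map a digital level (dBFS) to the analog level (dBu) it implies
    under the Table 1 alignment. Hypothetical helper for illustration."""
    return dbfs + OFFSET_DB

print(dbfs_to_dbu(-20.0))  # 4.0  -> +4dBu at reference
print(dbfs_to_dbu(0.0))    # 24.0 -> +24dBu at the unified clip level
```

Note that gear unable to reach +24dBu output forces a different offset, which is exactly the accommodation described above.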

No. 2 — Minimize the use of digital audio codecs and prevent the use of multiple audio codecs.

Most digital audio codecs in use, such as AAC or MP3, are lossy. Passing audio through more than one codec creates generation loss, degrading the audio quality. If one or more codecs must be used, set the data rate of each codec to the highest setting feasible to minimize the effects that codec will have on the audio. If possible, eliminate any unnecessary codec stage.

No. 3 — Select appropriate HD Radio data bit-rates.

Currently, the maximum data bit-rate for HD Radio is 96kb/s. When choosing the data bit-rate for surround content on HD Radio, choose carefully: the integrity of the surround playback field is reduced as the codec data rate is lowered. Using 64kb/s or more is suggested, and no less than 48kb/s.
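The bit-rate guidance above can be captured in a small, hypothetical helper; the function name and category labels are mine for illustration and are not part of any HD Radio API.

```python
def classify_surround_bitrate(kbps: int) -> str:
    """Classify an HD Radio audio bit-rate for surround content
    against the thresholds suggested in the text."""
    if kbps > 96:
        return "exceeds HD Radio maximum"  # above the current 96kb/s ceiling
    if kbps >= 64:
        return "suggested"                 # 64kb/s or more is recommended
    if kbps >= 48:
        return "minimum"                   # usable floor, below the suggestion
    return "not recommended"               # under 48kb/s degrades the field

print(classify_surround_bitrate(64))  # suggested
print(classify_surround_bitrate(32))  # not recommended
```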

No. 4 — Minimize distortion, including inter-modulation distortion (IMD).

Distortion can occur in a variety of ways. Among others, inadequate headroom in a system component can result in the clipping of audio, producing distortion. One simple method to minimize distortion is to avoid reaching the last 2 or 3dB before full scale of an A-to-D converter. With the increased quality of storage and transport protocols, reaching digital zero to maximize audio quality is no longer necessary.
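As a rough illustration of leaving the last 2 or 3dB untouched, the Python sketch below (hypothetical helper names) computes the peak level of a block of normalized samples and flags anything inside a chosen clip margin.

```python
import math

def peak_dbfs(samples) -> float:
    """Peak level of a block of normalized samples (full scale = 1.0), in dBFS."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(peak)

def inside_clip_margin(samples, margin_db: float = 3.0) -> bool:
    """True if the block's peak intrudes into the last `margin_db` before
    full scale -- the region the text suggests avoiding."""
    return peak_dbfs(samples) > -margin_db

print(inside_clip_margin([0.5, -0.25]))  # False: peak is about -6dBFS
print(inside_clip_margin([0.9, -0.2]))   # True: peak is about -0.9dBFS
```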

Increased distortion also occurs with heavy/aggressive use of broadcast audio processors. The more aggressive the approach, the more distortion that was previously masked in stereo becomes apparent in surround. A few rules of thumb will assist in reducing IMD and processing artifacts:

  • Choose a less aggressive limiter/clipper type.

  • Minimize the use of the limiting/clipping section of the processor.

  • Use look-ahead whenever possible.

  • Utilize slower release times of the compressor/limiters as much as possible.

No. 5 — Maintain equal balance of all audio channels.

Maintaining equal balance of the audio channels is required for surround. Maintaining phase relationships and a unified frequency response across the audio channels is also imperative. An imbalance, improper phase relationship and/or frequency loss in even one channel can result in a dramatic shift in the integrity of surround content.
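A minimal sketch of how channel balance and polarity might be spot-checked on blocks of normalized samples follows. The helper names are hypothetical, and a real meter would measure over time and per frequency band; this only illustrates the idea.

```python
import math

def rms_db(channel) -> float:
    """RMS level of a block of normalized samples, in dB."""
    rms = math.sqrt(sum(x * x for x in channel) / len(channel))
    return 20.0 * math.log10(rms)

def balance_offset_db(ch_a, ch_b) -> float:
    """Level difference between two channels; 0dB means equal balance."""
    return rms_db(ch_a) - rms_db(ch_b)

def polarity_inverted(ch_a, ch_b) -> bool:
    """A strongly negative correlation between channels suggests one
    channel's polarity (absolute phase) is flipped."""
    return sum(x * y for x, y in zip(ch_a, ch_b)) < 0

left = [1.0, -1.0, 1.0, -1.0]
print(balance_offset_db(left, [0.5 * x for x in left]))  # about +6dB imbalance
print(polarity_inverted(left, [-x for x in left]))       # True: flipped channel
```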

Now that we have addressed these basic infrastructure requirements, we can move to the other end of the surround broadcast spectrum, choosing and creating the surround content itself.

There are many sources of prerecorded surround content available. It is possible to pass the surround content through one of the surround sound broadcast technologies and likely have a pleasant result. However, it is unlikely the result will be pleasant at all times. This is where one of the advantages of using prerecorded content comes into play.

With each selection, either the surround content and/or the technology employed can be optimized for the best playback performance. Much like content provided to many radio stations today, it is possible for prerecorded content to be pre-encoded appropriately in surround in an ideal production environment. This content can then be shipped and placed on digital audio delivery systems for playback. This is the case for those who are successfully broadcasting surround content heard today on satellite and on several FM radio stations. Other advantages of utilizing prerecorded content include adding metadata for various automation systems, audio processing and/or to provide content information to listeners.

With live content, there are several other variables involved, including how the live content will be captured and delivered to the broadcast station from remote locations while maintaining quality. Standard ISDN connections of 128kb/s are barely enough, simply due to the limited data bandwidth available as well as the addition of another digital audio codec to the broadcast chain. Luckily, new technologies and connectivity are under development to assist in this process, providing higher data bandwidth and advanced audio codecs to make the most of the expanded data throughput. Data rates of 256kb/s begin to permit enough throughput for surround content, with some systems reaching near-linear digital audio data throughput, which is the ideal scenario.

Creating surround content

With monitoring and appropriate infrastructure in place, and the challenges of delivering surround content overcome, the next challenge is creating the content itself.

There are two primary approaches in creating surround content. The first approach is placing microphones on each instrument and mixing that instrument into the surround sound stage. This method is useful for popular music. An engineer can create a mix that has instrumentation all around the listener, essentially putting the listener “in the band.” The second approach to creating surround content is capturing the event so that it sounds like you are in the natural acoustic space in which the performance is occurring, or said another way, from an audience perspective.

There are challenges in creating such content, as all radio broadcast surround technology systems use some version of a downmix in their structure. (Note: A downmix is the process where multiple channels of surround audio content are combined into a reduced number of audio channels, typically stereo, labeled Lt/Rt. Upmix is the reverse, where multiple channels of surround audio content are derived from a reduced number of audio channels.) Many of these challenges can occur in the in-the-band scenario as well; however, since they are more prevalent from the audience perspective, they will be discussed primarily in that context.
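As an illustration of the channel-combining step of a downmix, here is a simplified Lo/Ro-style sketch in Python, assuming the common -3dB (0.707) mix coefficient for the center and surround channels. Real Lt/Rt matrix encoders also apply phase shifts to the surrounds so an upmixer can steer them back out; this sketch omits that step and is not any particular vendor's algorithm.

```python
import math

MIX = 1.0 / math.sqrt(2)  # -3dB coefficient, a common downmix convention

def downmix_lo_ro(l, r, c, ls, rs):
    """Combine five channels of sample blocks (LFE omitted) into stereo.
    Illustrative only: shows how center and surrounds fold into L/R."""
    lo = [x + MIX * y + MIX * z for x, y, z in zip(l, c, ls)]
    ro = [x + MIX * y + MIX * z for x, y, z in zip(r, c, rs)]
    return lo, ro

# A center-only source folds equally (at -3dB) into both output channels:
lo, ro = downmix_lo_ro([0.0], [0.0], [1.0], [0.0], [0.0])
print(lo[0], ro[0])  # about 0.707 in each
```

Because every input channel lands in Lo/Ro with fixed gains, phase relationships between the source channels directly determine how they sum, which is why interchannel crosstalk challenges these systems, as discussed below.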

Conventional spaced omnis, ORTF or other standard stereo recording techniques may be utilized for creating surround content. However, they have one drawback: They do not utilize the center front channel of the surround field. Center images would become phantom center images just as they are in standard stereo. With most downmix algorithms, the original phantom center image may be recreated in the center channel only, sometimes with different intensity. Therefore, standard stereo microphone techniques should not be the primary means of capturing surround content, and surround microphone methods are recommended instead.

Multiple methods

There are now several surround microphone arrays, methods and systems available. When choosing, pay particular attention to one detail that is often overlooked: interchannel crosstalk. To understand interchannel crosstalk, you must first understand how most surround microphone arrays and methods work.

With almost all the microphone arrays and methods available, three microphones are assigned (left, center, right) to the front-left, center and front-right of the surround sound playback system. In some of these systems, the placement of these microphones is relatively close. Interchannel crosstalk arises when multiple phantom images (just like the phantom center that occurs with stereo microphone techniques) are created by the front three or more microphones.

Figure 1. Phantom images created in surround.

For example, if three microphones were placed across the front of a stage and assigned to front-left, center and front-right, a phantom image could be generated by each pairing of microphones for a single sound source on the stage. A phantom image would occur in the playback system from the front-left and center microphones, another from the front-left and front-right microphones, and another from the center and front-right microphones. These three phantom images compete aurally. This interchannel crosstalk smears the location of the sound source, reducing its localization in a surround playback system. The stronger the phantom images from each pairing that compete with one another in the surround playback system, the greater the interchannel crosstalk.
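The pairing arithmetic above can be sketched directly: every two-microphone pairing among the front array can produce a competing phantom image, so three front microphones yield three pairings (a hypothetical illustration, not a measurement of actual crosstalk).

```python
from itertools import combinations

# Front array from the example: three microphones across the stage.
front_mics = ["front-left", "center", "front-right"]

# Each unordered pair of microphones can generate its own phantom image
# of a single sound source in the playback system.
phantom_pairings = list(combinations(front_mics, 2))

for pair in phantom_pairings:
    print(pair)
print(len(phantom_pairings))  # 3 competing phantom images
```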

Interchannel crosstalk may not seem to be much of an issue in discrete-channel recording; however, just as a human listener has to deal with the multiple phantom images emerging from the speakers, a surround sound broadcasting technology has to deal with them as well. For example, when high amounts of interchannel crosstalk are fed to a downmix algorithm, it challenges the system, often producing comb-filtering (unless the system has adaptive filters to address this issue) and loss of fidelity, and can produce localization errors in both the downmix and the upmix.

Furthermore, high amounts of interchannel crosstalk can produce undesirable phase relationships in downmix algorithms that can increase L-R. With some technologies, unpleasant IMD can occur in the recreation of the surround material. Some technologies can deal with these issues much better than others, but regardless of the surround broadcast technology employed, interchannel crosstalk challenges the systems, sometimes producing undesirable effects.

As seen in Figure 1, a single sound source would result in three different phantom images in the surround playback system.

Figure 2. Various surround mic techniques: A. Decca Tree; B. Fukada Tree; C. OCT with a Hamasaki Square.

Figure 3. A Hamasaki Square. Figure-eight mics are placed at each corner with their nulls facing the sound source.

Reducing phantoms

Interchannel crosstalk can occur in in-the-band recording scenarios as well, when multiple microphones pick up one sound source and are mixed into the surround field in different locations. However, this is far rarer, as the differences in intensity and time of arrival of the sound at each microphone are far greater than in an audience-perspective recording situation.

To reduce interchannel crosstalk, carefully choose the surround microphone technique. The front three microphones should be positioned such that the intensity of sound reaching each microphone or pairing of microphones is far greater than that reaching the adjacent microphone or pairing of microphones. Test results have indicated that microphone techniques such as the Decca Tree, Fukada Tree and Optimized Cardioid Triangle (OCT) (see Figure 2) have lower amounts of interchannel crosstalk and are also more desirable to listen to.

Taking this information one step further, the microphone techniques used to capture the surround channels are important as well to reduce crosstalk between the front array of microphones and the rears. Microphones that are assigned to the rear surround channels should pick up as little of the direct sound from the stage as possible.

Therefore, it is suggested that directional microphones be used for the rear surround channels. These can be directional microphones with their null pointed at the primary sound source, such that the primary pickup is pointed elsewhere. Omnidirectional microphones can be used, but it is important that they are immersed in the acoustic reflections of the space and pick up very little direct sound from the sound source.

There are other techniques to use as well, and careful study should be given to the amount of crosstalk that occurs between microphones, whether recording surround in an in-the-band or an audience-perspective scenario.

Examples include the Hamasaki Square (invented by Kimio Hamasaki of the NHK Science and Research Lab), which utilizes four figure-eight microphones arranged in a square about six feet apart, placed anywhere from 12 to 20 feet behind the main front microphone array. The figure-eight microphones face the side walls of the acoustic space and have their point of least sensitivity (null) toward the sound source, usually on a stage, as shown in Figure 3.

Quite a bit of time and attention will need to be spent in learning how to capture and produce surround content. It is not nearly as easy as throwing up a stereo pair of microphones on a stand and hitting record. Multiple microphones will have to be placed with care. Live audience-perspective recording situations should also address concerns such as visual aesthetics. Using all-in-one solutions may not produce the most useful and desirable result, so buyer beware. There are plenty of other key factors in producing quality surround content. It is highly suggested that anyone wishing to produce surround content listen to a wide variety of commercially available surround recordings and discern the quality differences. Becoming educated on many of the other basics of surround production is crucial.

Kosiorek is the director of recording services at the Cleveland Institute of Music.

Table 1. Standard practice for audio levels.
Reference Level = -20dBfs digital = +4dBu analog
Headroom = 20dB
Unified Clip Level = 0dBfs digital = +24dBu analog
Resource Guide

Broadcast surround system providers


MPEG Surround

Neural Audio

