To retain the audio as a digital signal, each digital sample word (quantized value) is output in sequence and sent along the transmission line or cable. This produces a stream of digital bits, word by word, sample by sample. For each sample interval, a digital word of 8 to 24 bits is created. These words are assembled serially and output from an A/D converter. The number of bits per second (b/s) equals the sample frequency multiplied by the digital word length.
Digital audio creates a large amount of data. Sixteen bits per sample at 44.1kHz creates 705,600 bits per second. To convert bits to bytes, divide by eight (8 bits = 1 byte). A 24-bit digital audio sample with a 96kHz sample rate produces a bit-rate of 2,304,000 bits per second. If two-channel audio is produced and assembled onto a common transmission line, the data rate doubles.
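The arithmetic above can be checked with a short sketch. The function names `pcm_bit_rate` and `bits_to_bytes` are illustrative only and do not come from any standard or library.

```python
def pcm_bit_rate(sample_rate_hz: int, word_length_bits: int, channels: int = 1) -> int:
    """Raw PCM bit rate in b/s: sample frequency x word length x channel count."""
    return sample_rate_hz * word_length_bits * channels


def bits_to_bytes(bits: int) -> float:
    """Convert a bit count (or bit rate) to bytes (8 bits = 1 byte)."""
    return bits / 8


if __name__ == "__main__":
    print(pcm_bit_rate(44_100, 16))              # 705600 b/s, one channel
    print(pcm_bit_rate(96_000, 24))              # 2304000 b/s, one channel
    print(pcm_bit_rate(96_000, 24, channels=2))  # 4608000 b/s, two channels
    print(bits_to_bytes(pcm_bit_rate(44_100, 16)))  # 88200.0 bytes per second
```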
Without modification, a PCM digital audio stream can be difficult to receive reliably. If all the bits are set to ones or zeros for a period of time, the signal is essentially a dc voltage. A dc signal cannot be passed reliably by some circuits, and a prolonged dc level causes a shift or offset on the digital transmission line. A lack of digital transitions on the line would also make locking a receive clock or recognizing sync transitions difficult for the receiver.
To resolve this potential problem, the PCM digital data is encoded using another scheme called bi-phase mark coding (BPM). Bi-phase coding ensures that a dc shift doesn't occur on the line, by maintaining a dc balance. A dc balance means the on-time (highs) must equal the off-time (lows) resulting in a mean or average of zero volts. Bi-phase coding ensures balance by producing continuous and balanced transitions on the data line.
Fig. 4. Bi-phase coding ensures balance by producing continuous and balanced transitions on the data line.
With bi-phase coding, each bit occupies a time slot that begins with a transition and ends with a transition. If the data bit is a 1, a transition also occurs in the middle of the time slot, in addition to the transitions at the beginning and end. A data 0 has only the transitions at the beginning and end of the time slot, with no transition in the middle. This ensures regular transitions and voltage balance whether the data is a string of zeros or a string of ones. Fig. 4 illustrates this. With regular transitions, the signal is a balanced ac signal from which a receiver can easily recover the clock rate.
Note that with bi-phase coding, the clock frequency is two times the audio data bit rate. Each audio bit is divided into two time intervals, or cells, so every audio bit is represented as two logical states when bi-phase coded.
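The coding rule described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names and the `start_level` parameter (the assumed line level before the first bit) are hypothetical. Each data bit becomes two cells, the level always toggles at the start of a time slot, and it toggles again mid-slot only for a 1.

```python
def biphase_mark_encode(bits, start_level=0):
    """Encode data bits as bi-phase mark cells (two cells per data bit)."""
    level = start_level
    cells = []
    for bit in bits:
        level ^= 1           # transition at the start of every time slot
        cells.append(level)  # first half-cell
        if bit:
            level ^= 1       # extra mid-slot transition for a data 1
        cells.append(level)  # second half-cell
    return cells


def biphase_mark_decode(cells):
    """Recover data bits: a 1 is any slot whose two half-cells differ."""
    return [int(cells[i] != cells[i + 1]) for i in range(0, len(cells), 2)]


if __name__ == "__main__":
    print(biphase_mark_encode([0, 0, 0, 0]))  # [1, 1, 0, 0, 1, 1, 0, 0]
    print(biphase_mark_encode([1, 1, 1, 1]))  # [1, 0, 1, 0, 1, 0, 1, 0]
```

Even a long run of identical data bits yields a line signal that keeps toggling. Decoding depends only on whether the two half-cells differ, so it is unaffected by signal polarity.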
The audio data words are assembled and transmitted serially. Some form of organization is needed so the receiver can identify and reassemble the assorted bits of information in the data stream. This organization involves assembling the data into blocks. Each block consists of 192 frames of audio. Each of the 192 frames is divided into two sub-frames for two-channel audio. Frames are produced at the digital audio sampling rate. At a 48kHz audio sampling rate, each frame lasts 20.833µs and each 192-frame block lasts 4ms.
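As a quick check of the timing, one frame is sent per sample period and a block is 192 frames, so a block at 48kHz lasts 192 × 20.833µs = 4ms. The sketch below computes both durations; the function names are illustrative only.

```python
FRAMES_PER_BLOCK = 192  # frames in one AES-3 block, as stated in the text


def frame_period_s(sample_rate_hz: float) -> float:
    """Duration of one frame: one frame is sent per sample period."""
    return 1.0 / sample_rate_hz


def block_period_s(sample_rate_hz: float) -> float:
    """Duration of one 192-frame block."""
    return FRAMES_PER_BLOCK / sample_rate_hz


if __name__ == "__main__":
    fs = 48_000
    print(f"frame: {frame_period_s(fs) * 1e6:.3f} us")  # frame: 20.833 us
    print(f"block: {block_period_s(fs) * 1e3:.3f} ms")  # block: 4.000 ms
```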
Fig. 5. The AES-3 data structure.
Each frame can carry two audio channels. In a two-channel mode, the samples from both channels are transmitted in consecutive sub-frames. Channel 1 is in sub-frame A and channel 2 is in sub-frame B. In stereo mode, the interface is used to transmit stereo audio with both channels simultaneously sampled. The left audio is in the A channel sub-frame and the right audio is contained in the B sub-frame. Fig. 5 shows the AES-3 data structure.
Fig. 6. The structure of the data sub-frames
In addition to the digital audio word data bits, each sub-frame contains additional data. Each sub-frame consists of 32 bits: up to 24 bits of audio word data plus 8 bits of additional data. When the audio word is limited to 20 bits, the remaining four bits are available as auxiliary data. Each sub-frame includes bits for preamble or sync data, auxiliary data, audio data word bits, and validity (V), user (U), channel status (C) and parity (P) data bits. Fig. 6 shows the sub-frame structure.
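The sub-frame layout can be sketched as a list of bits. The text above does not spell out the bit positions, so the ordering used here is an assumption taken from the commonly published AES-3 layout: preamble in time slots 0 to 3, auxiliary bits in slots 4 to 7, audio in slots 8 to 27 with the least significant bit first, then V, U, C and P in slots 28 to 31, with P giving even parity over slots 4 to 31. The function name `pack_subframe` is hypothetical. The preamble is omitted because it is not ordinary bi-phase data.

```python
def pack_subframe(audio_sample, validity=0, user=0, channel_status=0, word_length=24):
    """Return time slots 4..31 of one sub-frame as a list of 28 bits.

    Assumed layout (not given in the text): aux 4-7, audio 8-27 LSB first,
    then V, U, C, P. A 24-bit word uses the aux slots for its lowest 4 bits.
    """
    if word_length not in (20, 24):
        raise ValueError("word_length must be 20 or 24")
    sample = audio_sample & ((1 << word_length) - 1)  # two's complement, masked
    if word_length == 20:
        sample <<= 4  # leave the four auxiliary slots at zero
    slots = [(sample >> i) & 1 for i in range(24)]  # slots 4..27, LSB first
    slots += [validity & 1, user & 1, channel_status & 1]
    slots.append(sum(slots) % 2)  # parity bit: makes the count of ones even
    return slots


if __name__ == "__main__":
    bits = pack_subframe(0x123456, channel_status=1)
    print(len(bits), sum(bits) % 2)  # 28 0
```

Adding the four preamble slots to these 28 slots gives the 32 bits of the sub-frame.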
Considering that each frame consists of 2 × 32 bits, occurring in 20.833µs (FS = 48kHz), the bit-rate increases to 1,536,000 × 2 = 3,072,000b/s.
For each sample period, two 32-bit sub-frames are transmitted, which results in a bit-rate of 2.8224Mb/s at a 44.1kHz sampling rate or 3.072Mb/s at a 48kHz sampling rate.
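These interface rates follow directly from 64 bits per frame and one frame per sample period. The sketch below reproduces them and also shows the bi-phase cell rate, which is twice the bit rate because each bit is sent as two cells. The constant and function names are illustrative only.

```python
SUBFRAME_BITS = 32       # bits per sub-frame
SUBFRAMES_PER_FRAME = 2  # sub-frames A and B


def interface_bit_rate(sample_rate_hz: int) -> int:
    """AES-3 interface bit rate in b/s: 64 bits per frame, one frame per sample."""
    return SUBFRAME_BITS * SUBFRAMES_PER_FRAME * sample_rate_hz


def biphase_cell_rate(sample_rate_hz: int) -> int:
    """Bi-phase cell rate: two cells per data bit."""
    return 2 * interface_bit_rate(sample_rate_hz)


if __name__ == "__main__":
    print(interface_bit_rate(48_000))  # 3072000
    print(interface_bit_rate(44_100))  # 2822400
    print(biphase_cell_rate(48_000))   # 6144000
```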
Fig. 7. Detail of the sub-frame preamble
The first four bits of each sub-frame are preamble bits. The preambles may be called sync words, as they identify the start of a new audio block and of each sub-frame. A Z sync word marks the start of the first frame in the 192-frame block. The sync word Y indicates the start of every B sub-frame. The sync word X indicates the start of all remaining frames. The bit patterns are shown in Fig. 7.
The preamble has a distinctive data pattern that deliberately does not comply with the bi-phase coding rules. The first bi-phase coding violation occurs during the initial portion of the preamble marking the start of a frame. The initial portion lacks a normal bi-phase transition. The remaining preamble transitions identify the word type as Z, Y or X. This purposeful violation of the bi-phase coding rule allows a digital audio receiver to identify the start of the audio blocks and sub-frames. Table 1 shows the details of the preamble.
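A receiver can spot a preamble by looking for a time slot that does not start with a transition. The sketch below illustrates the idea. The eight-cell patterns are not given in the text; they are an assumption taken from the commonly published AES-3 preamble table, for the case where the line level before the preamble is 0. The function names are hypothetical.

```python
# Assumed preamble cell patterns (two cells per time slot, preceding level 0).
PREAMBLES = {
    "X": [1, 1, 1, 0, 0, 0, 1, 0],
    "Y": [1, 1, 1, 0, 0, 1, 0, 0],
    "Z": [1, 1, 1, 0, 1, 0, 0, 0],
}


def boundary_violations(cells, previous_level):
    """Return indices of time slots that lack the mandatory starting transition."""
    violations = []
    level = previous_level
    for slot in range(len(cells) // 2):
        if cells[2 * slot] == level:  # no transition at the slot boundary
            violations.append(slot)
        level = cells[2 * slot + 1]
    return violations


def identify_preamble(cells, previous_level):
    """Name the preamble, allowing for an inverted (opposite phase) signal."""
    if previous_level == 1:
        cells = [c ^ 1 for c in cells]  # normalize polarity before comparing
    for name, pattern in PREAMBLES.items():
        if list(cells) == pattern:
            return name
    return None


if __name__ == "__main__":
    for name, pattern in PREAMBLES.items():
        print(name, boundary_violations(pattern, previous_level=0))
```

Under these assumed patterns, every preamble contains a slot boundary with no transition, which ordinary bi-phase data never produces. Each pattern still has equal numbers of high and low cells.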
It should be noted that the bi-phase rule breaking is by design and does not cause any problems. Also by design, the bi-phase coding may cause each bit of the preamble, or each of the 32 bits in the sub-frame, to be opposite in phase.
The four bits following the preamble may be used as part of the main digital audio word or for an additional audio signal known as auxiliary audio data. The use of auxiliary audio is rare. However, one application is voice control communication. If the auxiliary bits are used for a special application, the audio data word that follows is limited to 20 bits. If they are not used for a special application, the auxiliary data bits may be added to the audio data word, extending the word length to 24 bits. Note the location of the auxiliary data bits in Fig. 6.