Audio reproduction

Having covered most of what is needed to understand and produce sound, we can begin to tackle the final part of the audio signal chain. The way we choose to reproduce audio is arguably the most crucial part of the sonic experience, and regrettably the one surrounded by the most undeserved hype and sheer misconception. Hence, audiophiles. We’ll once again aim at a common sense understanding of the topic.

Common requirements for quality reproduction

When talking about audio reproduction, we first need to look at the requirements we place on audio transmission. Most of these are well established and commonly agreed upon. What varies are the methods we use to reach these goals and the specific requirements of a given audio application, of which there are quite a number. This section is about the higher level goals, applicable to all high quality audio applications.

First, any audio signal chain needs to be as linear as possible to avoid the formation of perceptible distortion products. At the reproduction end this is a formidable goal, and one which is not often met. The reason is that while perceptual linearity is quite easy to guarantee in the electrical domain (even with analog equipment), transduction from and into sound is considerably more complex, especially in combination with all the further requirements we might want to place on our equipment. More on this in the section on loudspeakers.
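To make the notion of a distortion product concrete, here is a minimal sketch. It assumes a hypothetical transfer curve with a small square-law term (the coefficient 0.1 is an arbitrary illustration, not a figure for any real device) and measures how much second harmonic a pure tone acquires when passed through it.

```python
import math

def harmonic_amplitude(samples, harmonic, periods):
    """Amplitude of one harmonic, by correlating against a single DFT bin.

    `samples` must span exactly `periods` whole cycles of the fundamental.
    """
    n = len(samples)
    k = harmonic * periods  # DFT bin holding the requested harmonic
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

def weakly_nonlinear(x, a2=0.1):
    """Hypothetical transfer curve: a linear term plus a small square-law term."""
    return x + a2 * x * x

N, PERIODS = 1024, 8
tone = [math.sin(2 * math.pi * PERIODS * i / N) for i in range(N)]
distorted = [weakly_nonlinear(x) for x in tone]

fundamental = harmonic_amplitude(distorted, 1, PERIODS)
second = harmonic_amplitude(distorted, 2, PERIODS)
hd2 = second / fundamental  # second-harmonic distortion as a ratio
```

With these assumed values the second harmonic comes out at 5 % of the fundamental, since sin² contributes a component at twice the input frequency with half the square-law coefficient as its amplitude. A perfectly linear chain would leave that bin empty.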

Second, any audio equipment needs to preserve the frequency composition of sound as well as possible. Ideally we want the whole signal chain, from the studio to the listener’s ears, to have a flat frequency response. Again, this is easy enough to guarantee while working solely in the electronic domain, but we usually run into trouble when we have to transduce the signals into actual sound; the section on loudspeakers gives more details. But even more than the speakers, we have to worry about the acoustics of the space we intend to listen to the sound in. Since sensible rooms rarely add discernible nonlinear components to sound, the room frequency response is the main concern as regards hifi audio. This topic will be touched on briefly in a later section, although a full treatment would lead to an impossibly long detour into structural acoustics and its related math.

Closely related to the requirement of flat frequency response is the phase response of the total signal chain. With analog equipment, this may entail some problems even before transduction. When we consider speakers and room acoustics, phase becomes impossible to retain accurately. Even in optimal conditions phase response is exceedingly nonlinear. Usually this is not a big problem, but when high resolution spatial audio is concerned, it seems that phase is still considerably more significant than the traditional audio literature would suggest. Hence we will dedicate some time to the problem of phase response. We will, after some consideration, place the aim at a balanced (between different channels and listening positions) phase response which is linear enough not to cause significant time domain distortion of the signal (so that transient content is not smeared out of existence).
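The difference between linear and nonlinear phase is easiest to see through group delay, the negative derivative of phase with respect to frequency. The sketch below compares an ideal delay with a first-order all-pass section; the delay length and the all-pass coefficient are arbitrary assumptions chosen for illustration, and the function names are our own.

```python
import cmath

def pure_delay(omega, samples=5):
    """Frequency response of an ideal delay: flat magnitude, linear phase."""
    return cmath.exp(-1j * omega * samples)

def first_order_allpass(omega, a=0.5):
    """H(z) = (a + z^-1) / (1 + a z^-1): flat magnitude, nonlinear phase."""
    z_inv = cmath.exp(-1j * omega)
    return (a + z_inv) / (1 + a * z_inv)

def group_delay(response, omega, d_omega=1e-6):
    """Group delay in samples, -d(phase)/d(omega), by finite difference.

    Taking the phase of the ratio sidesteps phase-wrapping jumps.
    """
    ratio = response(omega + d_omega) / response(omega)
    return -cmath.phase(ratio) / d_omega

# omega is in radians per sample, so pi corresponds to half the sample rate.
delay_low = group_delay(pure_delay, 0.3)
delay_high = group_delay(pure_delay, 2.5)
allpass_low = group_delay(first_order_allpass, 0.01)
allpass_high = group_delay(first_order_allpass, 3.1)
```

The ideal delay shifts every frequency by the same five samples, so a transient passes through intact. The all-pass section leaves the amplitude response perfectly flat, yet with this coefficient it delays the top of the band by roughly three samples and the bottom by about a third of a sample. This is the kind of time domain smearing that a flat frequency response plot does not reveal.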

Depending on the intended application, we have very different requirements for the size of the physical space in which the other requirements must be met. This is the listening area or, when we mean the region where listening conditions are optimal, the sweet spot. Its extent is what makes public address (PA) and theatrical audio the most challenging arenas of high fidelity (hifi) audio reproduction, especially with the emerging applications employing spatial audio technology. Ideally, we would want the listening area to be as large as the physical space requires. In actuality, room acoustics prevents us from achieving this goal except in a narrow sweet spot. Of course, its extent depends strongly on the tolerances we set for the other parameters. This problem will receive considerable attention further down.

The wide dynamic range of hifi audio is a common stumbling block in audio reproduction. Maintaining a constant quality of playback from the threshold of hearing right up to 120‐130 dB of instantaneous and well over 100 dB of continuous sound level is no mean feat, especially when we would ideally like to maintain this level of performance without any degradation in distortion or frequency response. Usually this cannot be achieved, or even approached.
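To get a feeling for what such decibel figures demand, it helps to convert them into plain ratios. The following is only a worked piece of arithmetic, reading the levels above as ranges relative to the threshold of hearing; the function names are our own.

```python
def pressure_ratio(level_db):
    """Ratio of sound pressures (or voltages) corresponding to a level difference."""
    return 10 ** (level_db / 20)

def power_ratio(level_db):
    """Ratio of powers (or intensities) corresponding to a level difference."""
    return 10 ** (level_db / 10)

peak_pressure = pressure_ratio(120)  # one million to one
peak_power = power_ratio(120)        # one million million to one
```

A 120 dB range thus means the loudest peaks carry a million times the pressure amplitude of the quietest audible detail, and every stage of the chain has to stay clean over that span.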

Spatial audio places its own, further requirements on the reproduction setup. First, we need considerably more complex setups to effect the kinds of complicated three dimensional wave phenomena needed to fool our perception of space and direction. Typically we are led to a rapidly increasing number of both transmission and transduction channels. The burden falls on the electronic equipment as well, since spatial applications often require processing at the reproduction end of the signal chain. Second, the demands on the surrounding space become considerably more stringent. This is because, unlike in regular playback, we must meet the excruciatingly tight constraints placed by our directional hearing.

In addition to theoretical considerations, practical realisations of audio listening environments need to satisfy some everyday wishes of the user. These include versatility and ease of use, so that expensive investments do not go to waste. As new technologies (like surround and spatial audio) become available, they need to be seamlessly integrated into existing products and must still be usable by the layman. This goes for television and video equipment as well, since most surround sound applications are nowadays driven by the home theatre phenomenon. Hence, synchronized playback of sound and image and easy, workable interfacing between heterogeneous equipment are a must.

Typical scenarios—where and how do people listen to music?

Home listening conditions

 ‐HIFI
 ‐controllable acoustics

Car audio

 ‐on the road?
 ‐cabin: reactive sound fields
 ‐refer to A. Farina

Portable audio

 ‐bad transducers
 ‐noise ⇒ limited dynamic range ⇒ need for compression
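The chain from noise to compression can be illustrated with the static curve of a simple downward compressor. This is a minimal sketch only: the threshold and ratio are arbitrary assumptions, and a real compressor would add attack and release smoothing, which is omitted here.

```python
def compress_level(level_db, threshold_db=-20.0, ratio=4.0):
    """Static compressor curve: levels above the threshold are scaled down by `ratio`."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

# Programme material spanning 60 dB, from quiet passages to full scale.
quietest_in, loudest_in = -60.0, 0.0
quietest_out = compress_level(quietest_in)
loudest_out = compress_level(loudest_in)
range_out = loudest_out - quietest_out
```

With these assumed settings a 60 dB programme range shrinks to 45 dB. The compressed signal can then be turned up as a whole, lifting the quiet passages above the ambient noise without the peaks becoming any louder than before.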

Television and radio

 ‐mono compatibility
 ‐only middle frequencies can be depended on
 ‐modulation scheme: limited dynamics
 ‐competition for louder commercials

Cinema and home theatre

 ‐Multichannel problems
 ‐delay
 ‐delay equalization
 ‐using the Haas effect
 ‐"center sucking"
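Delay equalization in a multichannel setup comes down to simple arithmetic on speaker distances. The sketch below assumes a speed of sound of 343 m/s and a hypothetical layout given as distances from the listening position; nearer speakers are delayed so that all wavefronts arrive together.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def alignment_delays_ms(distances_m):
    """Delay per speaker, in milliseconds, that time-aligns all arrivals.

    `distances_m` maps a speaker name to its distance from the listener.
    The farthest speaker gets no delay; nearer ones wait for it.
    """
    farthest = max(distances_m.values())
    return {
        name: (farthest - distance) / SPEED_OF_SOUND * 1000.0
        for name, distance in distances_m.items()
    }

# Hypothetical room: fronts 3.43 m away, surrounds only 1.715 m away.
delays = alignment_delays_ms({"front": 3.43, "surround": 1.715})
```

Here the surrounds need 5 ms of delay to line up with the fronts. Any further offset meant to exploit the Haas effect, so that the front channels are heard first and dominate localization, would simply be added to the surround figure.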

The electronic domain

Typical processing at the playback time

 ‐equalization
 ‐compression
  ‐home cinema
  ‐the late night mode
 ‐spatialization/enhancement
  ‐amplifier effects modes
  ‐THX decorrelation etc.

Digital to analog

 ‐jitter
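A common rule of thumb bounds the signal to noise ratio attainable when converting a full-scale sine wave with a jittery sampling clock: SNR = -20 log10(2π f σ), with f the signal frequency and σ the rms timing error. The sketch below merely evaluates that formula; it assumes random, uncorrelated jitter, and the example figures are illustrative rather than measurements of any converter.

```python
import math

def jitter_limited_snr_db(signal_hz, rms_jitter_s):
    """Upper bound on SNR for a full-scale sine sampled with random clock jitter."""
    return -20.0 * math.log10(2.0 * math.pi * signal_hz * rms_jitter_s)

snr_1ns = jitter_limited_snr_db(20_000, 1e-9)      # 20 kHz tone, 1 ns rms jitter
snr_100ps = jitter_limited_snr_db(20_000, 100e-12) # same tone, 100 ps rms jitter
```

At 20 kHz, one nanosecond of rms jitter caps the SNR at about 78 dB, well short of what 16 bit audio can carry, and every tenfold reduction in jitter buys 20 dB. Because the bound scales with signal frequency, high frequency content suffers most.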

Amplification

 ‐powers needed may be a problem
 ‐linearity (consider amplifier classes)
 ‐capacitive coupling brings trouble
 ‐impedance matching to speakers/ability to drive low impedance loads
 ‐chopper circuits (H+ class/PWM) lead to better operation: higher efficiency and linearity (thru delta‐sigma/1‐bit/PWM analysis/inherent linearity of bilevel designs)
 ‐however H+ class has intermodulation and dynamic range (?) problems

Transduction

Idealized sound sources

 ‐point sources
 ‐dipoles
 ‐pressure and velocity fields for the preceding
         (near field analysis)

Types of speakers

Radiation and directionality. Single speakers.

 ‐directionality and radiation patterns (very different depending on frequency)
 ‐polar plots and spherical harmonics
 ‐response graphs (figure of eight and so on)
 ‐efficient radiation calls for elements of right size
 ‐size also controls directionality (plane mounted piston analysis! sonic spotlight example!): we might want a smaller element with a more spread out response!
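The plane mounted piston analysis referred to above can be sketched numerically. For a rigid circular piston in an infinite baffle, the textbook far-field directivity is 2·J1(ka·sin θ)/(ka·sin θ), where k is the wavenumber and a the piston radius. The code below assumes that idealised model and a speed of sound of 343 m/s, and evaluates the Bessel function from its integral representation to stay within the standard library.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def bessel_j1(x, steps=2000):
    """J1(x) from Bessel's integral (1/pi) * int_0^pi cos(t - x sin t) dt, midpoint rule."""
    h = math.pi / steps
    total = sum(math.cos(t - x * math.sin(t))
                for t in ((i + 0.5) * h for i in range(steps)))
    return total * h / math.pi

def ka(frequency_hz, radius_m):
    """Wavenumber times piston radius: the parameter that governs directivity."""
    return 2 * math.pi * frequency_hz / SPEED_OF_SOUND * radius_m

def piston_directivity(ka_value, angle_rad):
    """Far-field pressure relative to on-axis, for a baffled rigid circular piston."""
    x = ka_value * math.sin(angle_rad)
    if abs(x) < 1e-9:
        return 1.0  # limit of 2*J1(x)/x as x -> 0
    return 2 * bessel_j1(x) / x

small_source = piston_directivity(0.5, math.pi / 2)     # nearly omnidirectional
first_null = piston_directivity(3.8317, math.pi / 2)    # J1 has its first zero here
```

When ka is well below one the piston radiates almost equally in all directions of the half-space. Once ka reaches about 3.83 a null appears at 90° off axis, and for larger values the main lobe narrows into the sonic spotlight of the notes. This is why a smaller element gives a more spread out response at a given frequency.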

Multiway speakers. Mounted elements.

 ‐radiation efficiency: quarter wavelength
 ‐size of speaker dictates directivity ⇒ transducer size needs to be matched to range  ⇒ crossover networks ⇒ phase and amplitude problems
 ‐Doppler effects (especially with coincident axis multimode speakers)
 ‐coaxial speakers also display weird directional behavior (evening out at about 20° off centre? generally?)
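The phase and amplitude problems of crossover networks show up already in the simplest textbook cases. The sketch below sums the low-pass and high-pass outputs of normalized Butterworth crossovers (crossover at ω = 1), assuming ideal coincident drivers and on-axis listening; the function names are our own.

```python
def butterworth_pair(order, w):
    """Low- and high-pass responses at normalized angular frequency w."""
    s = 1j * w
    if order == 1:
        den = s + 1
    elif order == 2:
        den = s * s + 2 ** 0.5 * s + 1
    else:
        raise ValueError("only orders 1 and 2 are sketched here")
    return 1 / den, s ** order / den

def summed_magnitude(order, w, invert_high=False):
    """Magnitude of the acoustic sum of both branches at frequency w."""
    low, high = butterworth_pair(order, w)
    return abs(low - high if invert_high else low + high)

first_order_sum = summed_magnitude(1, 1.0)
second_order_sum = summed_magnitude(2, 1.0)
second_order_inverted = summed_magnitude(2, 1.0, invert_high=True)
```

The first-order pair sums to exactly unity at every frequency. The second-order pair, however, has its two branches 180° apart at the crossover point, so wiring both drivers in the same polarity produces a deep null there, while inverting one of them trades the null for a bump of about 3 dB. Real designs must in addition cope with drivers that are not coincident, which makes the sum depend on listening angle.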

Interaction with room modes. Nondirectional sources.

 ‐variable excitation of room modes by a noninfinitesimal source
 ‐dipole/tripole speakers
 ‐monopole design by conical symmetrical reflectors
 ‐nondirectional sources by multiple speakers (matrix surround in theatres! ambisonics with W signal only)
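As a reference point for the room modes mentioned above, the mode frequencies of an idealised rectangular room with rigid walls follow from f = (c/2)·sqrt((nx/Lx)² + (ny/Ly)² + (nz/Lz)²). The sketch below lists them for a hypothetical room; the dimensions and the speed of sound are assumptions, and real rooms with absorbing, non-parallel surfaces will deviate from these figures.

```python
import math
from itertools import product

def room_modes(lx, ly, lz, max_index=2, c=343.0):
    """Mode frequencies (Hz) of a rigid-walled rectangular room, lowest first.

    Returns (frequency, (nx, ny, nz)) tuples for all index triples up to
    `max_index`, excluding the trivial (0, 0, 0) case.
    """
    modes = []
    for nx, ny, nz in product(range(max_index + 1), repeat=3):
        if (nx, ny, nz) == (0, 0, 0):
            continue
        f = (c / 2) * math.sqrt((nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)
        modes.append((f, (nx, ny, nz)))
    return sorted(modes)

# Hypothetical living room of 5 m x 4 m x 2.5 m.
modes = room_modes(5.0, 4.0, 2.5)
lowest_frequency, lowest_indices = modes[0]
```

The lowest mode of this room lies at 34.3 Hz along its longest dimension. How strongly a given speaker excites each mode depends on where it sits relative to the pressure nodes and antinodes of that mode, which is where the finite size and placement of the source enter.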

Reflex design, subwoofers and bass management

 ‐directionality of bass sounds: good or bad? evidence?
 ‐problems with standing waves/room modes: is this the only reason for lack of directionality?
 ‐the issue of resonance (bass reflex design): energy waterfalls show comb resonance

Headsets and externalization

 ‐not suited for direct playback of stereo ⇒ need for crossfeeding
 ‐binaural recordings
 ‐recovery of binaural info on extra assumptions (frontal Blumlein
  recording/ambisonic source/matrix surround/digital multichannel (Dolby
  headphones))
 ‐nearfield directional techniques in phones
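The crossfeeding mentioned at the top of this list can be sketched in its crudest form: each ear receives a delayed, attenuated copy of the opposite channel, mimicking the way both ears hear both loudspeakers. The gain and delay below are arbitrary assumptions, and practical crossfeed circuits also low-pass filter the bleed path, which is omitted here.

```python
def crossfeed(left, right, gain=0.3, delay_samples=12):
    """Mix a delayed, attenuated copy of each channel into the opposite one.

    `left` and `right` are equal-length lists of samples.
    """
    out_left, out_right = [], []
    for i in range(len(left)):
        j = i - delay_samples
        bleed_from_right = right[j] if j >= 0 else 0.0
        bleed_from_left = left[j] if j >= 0 else 0.0
        out_left.append(left[i] + gain * bleed_from_right)
        out_right.append(right[i] + gain * bleed_from_left)
    return out_left, out_right

# A click that exists only in the left channel.
click = [1.0] + [0.0] * 19
silence = [0.0] * 20
out_left, out_right = crossfeed(click, silence)
```

Without crossfeed the click would be heard in one ear only, a situation that never occurs with loudspeakers. With it, the right ear receives a quieter copy twelve samples later (roughly a quarter of a millisecond at 44.1 kHz).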

Acoustical correction and equalization

 ‐sound fields
 ‐single channel correction for better speakers
 ‐optimal correction in multichannel situations: a global optimisation
  problem over an area of the soundfield
 ‐utilising masking on the residual uncorrected reflections
 ‐correction needs to be psychoacoustically matched: precorrections are bad
 ‐oscillation is bad (phase response)
 ‐correction has problem with directional response (needs to be flattened
  out over the sweet spot or kills music)
 ‐room equalisation better for low frequencies with long wavelength ⇒ good
  because higher ones are more easily controlled by absorption
 ‐Volterra filters for nonlinearity?