Sound processing and effects

Nowadays, in the time of exceedingly commercialized and industrialized music production, sound processing, post-production and effects crafting have become an increasingly important part of making music. Practically no one can do without some effects, and mastering can often make a world of difference when aiming music at a specific segment of the market. To satisfy these needs, an overwhelming number of effects and processing engines, algorithms, environments, plugins and the like have become available, and effects algorithms are an integral part of many synthesis platforms (or music workstations, as the more tightly integrated models are called today). This means that understanding effects algorithms is vital to any electronic musician who aims at doing something useful. In this part of the text, then, the most widespread effects algorithms and sound editing methods are explored.

A few words about good effects

In actual studio use, effects share their possibilities and perils with synthesizers: they can save a recording or ruin it. That effects should be used with caution and in small doses is a time-proven fact. This means that quantity is never a substitute for quality. Usually one needs a wide variety of common effects like flange, delay, reverb, fuzz and vocode, in many variations, plus some specialties for sweeteners. A do-it-all multieffect is rarely a good investment: it does everything, but nothing well. I'd buy a well designed specialist tool instead. Effects need to be very accessible, easy to use and intuitive. Cluttered user interfaces, minimalistic state feedback (only a couple of LEDs, for instance) and poor sound quality are some of the more common problem areas. A wide variety of factory setups is always a good thing, whether or not there is other editing functionality. If there is, the parameters should be highly intuitive and varying them should cause perceptually uniform results—effects are often used on intuition and in a hurry. (Check out the PCM80 and faint…) MIDI control is a must these days, because effects-heavy mixes are almost impossible to work through without studio-wide state recall. This is, of course, most easily achieved through SysEx dumps.

Types of effects

As the term effect means anything that transforms sound, the palette of effects is actually endless. Here only those effects which have been widely deployed in commercial studios are discussed. Most of these are fairly standard, such as reverberation and distortion. Some are not (few people ever get around to using a vocoder, for instance).

As I see it, the two important aspects of sound processing algorithms, from the theoretical and technical point of view, are whether the algorithms are linear, and whether they are time-variant.

Filtering, reverberation, echo, volume control, mixing, panning and a whole lot of further derivatives are all linear and time‐invariant operations, which makes them well understood, very easy and efficient to implement, and perceptually significant. As should be clear by now, most operations one wants to accomplish with sound are LTI, well fitting the linear nature of sound transmission in the physical setting.

If we drop the requirement of time-invariance, flanging, phasing, chorus, ring modulation, vocoding and cut-and-paste type operations become possible. These effects are still easy to implement, though a little more difficult to handle mathematically. We also see that LTI processing can only reshape the components already present in a sound, and not really introduce totally new ones.

In the context of nonlinear systems time-variance isn't really an issue. Even if a system is time-invariant, nonlinearity makes it quite difficult to analyze formally. For instance, severely nonlinear processing, like distortion, lets us transmute existing sounds into new ones, even when no time-variance is introduced. On the other hand, controlled nonlinearity and time-variance arise naturally out of psychoacoustics, the science of how we hear: if we build algorithms to achieve a specific psychoacoustic goal, the inherent nonlinearity of many of our perceptual processes is reflected in the result. Effects of this kind are really difficult to handle formally, yet have great use in the studio. Good examples include the guesswork done by logic decoding of multichannel audio, dynamic range processing (compression, limiting, expansion and gating), and further forms one may envision.

Of course, beyond gentle nonlinearity and time‐variance, anything is possible. It’s easy to see that any calculable transformation of sound can be implemented on a computer, working on sampled sound, and that is a broad range of transformations indeed. However, these more complicated forms of processing usually have little perceptual significance, and so are of little importance for practical audio processing. For instance, a few pointers to what results from such generalized thinking in the context of sound synthesis were given in the previous chapter, but the results usually fail to impress.

How to make musique concrète—cut-and-paste type operations

Probably the most rudimentary operations that can be applied to audio are the cut-and-paste type ones. The name comes from the common menu item names of sound editors, which in turn originated in the word processing industry. The idea behind these operations is that as people hear sound as a time phenomenon, parts of sounds should be operated on based on time interval selections. The usual implementation allows time ranges to be selected from audio files (often on a point-and-click basis in graphical user environments, such as Windows or MacOS), after which they can be moved around in time and operated on. The most common operations include cut, copy, paste over, paste insert, delete, crop (trim) and punch-in/punch-out. In order, cutting removes and stores a selected range of audio samples, copying stores but does not remove the range, paste over places a stored sound bit over an existing range in a sound file and paste insert (or just insert) places the stored bit in a user specified location, moving the following sound data forward in time. In essence, cut and insert form an inverse pair, just as copy and paste over do. Cropping removes everything but a selected range (especially useful for isolating shorter sound segments from longer recordings) and deleting just removes a range, splicing the remaining ends of the sound file back together. Punch-in/punch-out refers to the realtime version of paste over wherein the pasted sound segment comes live from outside the editing environment (e.g. from an A/D converter)—when monitoring a track, one punches in to start replacing the sound section and punches out to end the operation.

The basic methods in this class are exceptionally simple and easy to use. Implementing them is only a bit more difficult, although efficiency is a harder issue. (Because of the large amount of sample data often involved in this kind of editing, heavy burdens are placed on data storage subsystems. Computational complexity is extremely low, though.) The paradigm of cut-and-paste can be extended to multichannel audio editing, allowing stereo, quadraphonic and other complex time-domain formats to be easily edited. Other extensions include fade-in/fade-out (crossfade), where edit points (the limits of the range an operation is applied to) are not pointlike in time: the volume of the sound before the edit point is ramped from full scale to zero, and the volume of the sound after the point from zero to full scale. This is done because differing sample values and power spectra on the two sides of an edit point are likely to result in a pop or an abrupt change in sound quality if crossfading is not used. Further extensions are the possibility of more complex selections (noncontiguous blocks etc.), the inclusion of other editing operations in the environment (samplerate conversion, effects etc.), and input/output and sound analysis facilities (which are especially useful in sound post-production, precision sound crafting and bug-hunting). When many of these features are implemented, the result is often called a sample editor (e.g. CoolEdit, GoldWave) and if serious realtime facilities are present, the result is a hard disk recording system.
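The crossfade at an edit point is simple enough to sketch in code. The following is a minimal illustration only (sounds as plain Python lists of sample values, a linear fade ramp; real editors offer several fade shapes and work on buffered file data):

```python
def crossfade_splice(a, b, fade_len):
    """Splice sample list `b` onto the end of `a`, crossfading over
    `fade_len` samples so the edit point does not pop."""
    out = a[:-fade_len]                      # untouched part of `a`
    for i in range(fade_len):
        t = i / fade_len                     # ramp runs 0 -> 1
        # outgoing sound fades out while the incoming one fades in
        out.append(a[len(a) - fade_len + i] * (1.0 - t) + b[i] * t)
    out.extend(b[fade_len:])                 # untouched part of `b`
    return out
```

Splicing, say, a full-scale segment onto a silent one with a four-sample fade yields a smooth ramp across the edit point instead of an abrupt step.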
Such systems have been made popular by the constantly falling cost of computer and electronics technology and the need for a larger palette of editing operations with higher performance, less audible signal degradation in complex editing operations and the possibility to experiment with sound and then roll back (that is, to edit nondestructively and then undo, possibly over multiple generations of edits—called multilevel or multigeneration undo). Also, the common use of highly complex sound fabrics in television and movie sound has placed new burdens on synchronisation accuracy, project and data management and other parts of the audio production chain that only digital technology can ease.

The history of these operations is as long as the history of tape. In the time before accessible digital editing facilities, these operations were implemented by cutting, splicing and taping back together of audio tape, whence the terms cutpoint and splice. Crossfading was achieved by cutting across the tape at an angle, which allowed the sound to fade over the splice point. Punch-in and punch-out are also terms that come from this era. From the forties to the fifties, a discipline of genuine, tape-based music began to evolve, which eventually led to a type of music called musique concrète, which revolved around tape-editing together of natural sounds and natural sounds only. Industrial noises and other sounds not commonly associated with music were given an especially prominent place in timbral textures, leading to the widespread acceptance of such sounds as being in the realm of music. In academic circles, musique concrète was long a serious challenger of synthesized music, especially since its advocates were often furiously opposed to the dead sounds of electronic instruments. Today, most of what remains of musique concrète is its enormous influence on experimental electronic music (and, in more popular circles, on such styles as industrial, ambient and techno) and the highly developed art of sampling.

Amplitude control and signal combination: volume control, panning, balance and mixing

People are used to sound as a composite phenomenon. When multiple sound sources reside in the same acoustic space, we most often perceive them as separate, instead of hearing the resulting combination as a single sound. This naturally leads to sounds being used in combinations and, when we remember that sound transmission is basically a linear phenomenon, to combining sounds by simply adding them together. This is also called mixing. Further, when mixing sounds, a prominent parameter to consider is the amplitude of the constituent sounds we throw into the mix. Again, linearity does the trick and we get amplitude control just by multiplying our sample values by a suitable constant.
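In code, mixing with per-channel amplitude control really is this trivial. A sketch (signals as equal-length Python lists; a real mixer would also guard against overflow/clipping):

```python
def mix(tracks, gains):
    """Linear mix: scale each track by its gain and sum sample by
    sample. `tracks` is a list of equal-length sample lists."""
    n = len(tracks[0])
    return [sum(g * t[i] for t, g in zip(tracks, gains)) for i in range(n)]
```

Because the operation is linear, the order in which tracks are scaled and summed makes no difference to the result.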

Now, as we have two ears, not just one, we might consider what happens if we use a stereophonic signal and assign separate amplitude multipliers to the two channels. What happens is a simple panning operation—the sound seems to move to the side of the greater amplitude. This is natural since humans use the relative amplitude of sound sources, as perceived by our two ears, as a powerful clue to which direction the sound emanates from. However, the selection of the two coefficients is far from linear: when the signals are destined for adjacent speakers in a multichannel reproduction setup, we usually want to fool the listener into believing that the sound emanates from somewhere between the speakers, i.e. we want to create the illusion of a phantom sound source. As it happens, the illusion is quite difficult to achieve for sound sources which are not to the front of the listener, and even there, a simple linear crossfade from speaker to speaker is less than adequate.

The simplest working solution, very common in today's mixing equipment, is to use a constant power panning law, which means that at every position between the speakers, the total sonic power emitted stays constant while the relative contributions of the speakers change. The reasoning is that constant received power roughly corresponds to constant distance of the phantom source. If we do the math for two equidistant, closely spaced speakers, one particularly elegant and effective solution is a panning law in which the channel amplitude coefficients vary between one and zero along one quadrant of a sinusoidal function, in contrary directions for the two channels. This kind of panning is simple to implement, and is the most widely used method of source positioning in stereo sound production. The method also generalizes directly to multiple output channels, by keeping the total power distributed to the channels at unity. Of course, there are then even more possible constant power solutions.
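The quarter-sinusoid law described above can be written down directly. A sketch (one sample at a time; the 0-to-1 position convention is an assumption of this example, not a standard):

```python
import math

def constant_power_pan(sample, position):
    """Constant power pan of a mono sample into (left, right).
    position: 0.0 = hard left, 1.0 = hard right. The two gains run
    along complementary quarter sinusoids, so left^2 + right^2 == 1
    at every position."""
    angle = position * math.pi / 2.0
    return sample * math.cos(angle), sample * math.sin(angle)
```

At the center position both gains equal cos(45°) ≈ 0.707, i.e. each channel is about 3 dB down, rather than the 6 dB a naive linear crossfade would give.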

When multiple channels with separate volumes and pan positions are mixed, we get the basis of what is usually called a mixer—a device capable of independently placing sound sources in a stereo sound field at different relative amplitudes and delivering the result as a stereophonic signal. This kind of processing is the most rudimentary thing in a studio environment and most people do not even consider it worth calling processing at all. However, it is quite useful to know that the ways mixing, amplitude control and panning are implemented have their roots in human perception, and that each one involves simplifications and compromises which, in certain cases, can explain some unexpected results we run up against when using them. It is also worth remembering that simple amplitude control is rarely the optimal way to deal with the spatial attributes of audio. Instead, professional multichannel audio production has to deal with real, psychoacoustically motivated controls. Those are only now becoming available in the commercial arena.

Filter based effects: equalization, chorus, flanging, phasing

It was pointed out in the section on psychoacoustics that people tend to classify sounds by spectral envelopes and their evolution. (As speech is based on modulation of that kind, this is very natural indeed.) Also, linear systems always affect the amplitude and phase spectra of sound and sound transmission is almost always quite accurately modelled as being linear. It is, then, extremely important to be able to shape sound spectra, both to remove or compensate for the effect of sound transmission channels and to achieve some characteristic feel (often best characterized by the long term power spectrum of a given signal). This is why linear filtering is probably the most ubiquitous audio signal processing method in existence, both in digital and analog form.

There are many types of linear operations that can be applied to sound signals. Although they all belong to the same general category, the peculiarities of the human perception make for big differences in using them. For example, when fixed filtering is applied in a way that does not appreciably smear transients (i.e. only results in dispersion of transients over a time period short enough not to be heard as a distinctive sound event in itself), the result is what is usually just called filtering. If the filtering action is slowly varied (i.e. slowly enough not to cause audible sidebands to be generated), filter sweeps, chorus, flange and phasing effects are produced, depending on the type of the filter and the modulation applied. If the group delay of some such system exceeds the threshold of perception, echoes and reverberation result. This way, we again see the importance of the human perceptual parameters in classifying effects—even if two effects can be treated as equal when analysing them mathematically, some additional facts are needed to match the result to what we hear.

The basic filter building blocks, as noted in the DSP section, are the first and second order filters. Of course, the effect generated by such a simple section is not very useful in itself. But by combining these sections, many interesting structures can be generated. For static sound modification, equalizers are the most important tools. There are two main varieties: parametric and graphic. In the parametric type there are usually only a few sections, each of which can be fully adjusted. This means that filter gain (i.e. the amount of attenuation/amplification which the filter applies to its target band), center frequency and often also filter bandwidth can be individually varied. (Of course, each such adjustment necessarily changes many coefficients in digital implementations and the relations between the user parameters and the coefficients used in calculations are far from simple and intuitive.) This allows quick and effective shaping of specific bands, of which there can, however, be only a small number. Parametric EQ is the type most often seen in mixing consoles, where each input channel generally has some parametric EQ facilities. Parametric EQ is mainly used for audio production needs—it is ill suited for compensation of transmission losses, since these often involve modification over a significantly wider frequency range and call for a greater number of small, individual adjustments rather than precise large scale operations. Another tool is needed to compensate such acoustic effects as room and equipment frequency responses. This tool is the graphic equalizer. The idea here is to divide the audible frequency band into fixed-size slices (most often on a logarithmic scale), assign a filter with a dedicated, fixed center frequency and bandwidth to each slice and let the user vary only each filter's gain.
The name comes from the usual layout of the analog user interface—there, adjustment sliders are put side by side in a regular arrangement so that when one wants a specific frequency response from the system, one forms the target spectral envelope directly with the sliders. The interface is very intuitive and is well suited for static spectral envelope modification, especially if a broad part of the spectrum has to be reworked. On the negative side, one cannot make extremely narrow, focused corrections. These are more effectively made with parametric EQ. This is why most consoles include both parametric EQ (per input channel) and graphic EQ (per output channel, to compensate for the room response).
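A single parametric EQ band is commonly realized digitally as a second order "peaking" section. The coefficient formulas below follow the widely circulated Audio EQ Cookbook (Bristow-Johnson); the sketch is illustrative, with signals as plain Python lists:

```python
import math

def peaking_eq_coeffs(fs, f0, gain_db, Q):
    """Biquad coefficients for one parametric EQ band (peaking filter),
    per the RBJ Audio EQ Cookbook. Returns (b, a) with a[0] == 1."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * Q)
    b0, b1, b2 = 1.0 + alpha * A, -2.0 * math.cos(w0), 1.0 - alpha * A
    a0, a1, a2 = 1.0 + alpha / A, -2.0 * math.cos(w0), 1.0 - alpha / A
    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]

def biquad(signal, b, a):
    """Direct form I second order section:
    y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b[0]*x + b[1]*x1 + b[2]*x2 - a[1]*y1 - a[2]*y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

Note how the intuitive user parameters (center frequency, gain in dB, Q) map onto five coefficients in a decidedly non-obvious way; at 0 dB gain the numerator and denominator coincide and the section passes the signal unchanged.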

When using static filters, only the overall sonic feel of the sound can be varied; no gestures can be incorporated. (Speech provides an excellent example: most vowels are characterized well by their frequency spectra. Using fixed filters is like speaking with only one single vowel—not very expressive or informative.) For gestures, filter parameters need to be modulated. A very simple example is the infamous wah-wah pedal. Here, a single resonant lowpass filter (i.e. a three to five pole filter with at least one complex conjugate pole pair and variable pole locations) is modulated by moving its center frequency around with the pedal. At one extreme, practically no effect is created (the filter is open); at the other, almost everything is attenuated (the filter is closed). In between, the moving cutoff frequency creates a distinctive spectral bump (remember that poles create oscillatory behaviour at the cutoff) which is heard as the wah-wah sound. The same basic principles are encountered in analog synthesizers, although there the filters often have more parameters, are more refined overall and take their modulation from envelope generators or other automatic sources (e.g. LFOs).
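A swept resonant lowpass of this kind can be sketched with a Chamberlin state-variable filter, whose cutoff is cheap to modulate per sample. The pedal-to-frequency mapping and damping value below are illustrative assumptions, not taken from any particular pedal:

```python
import math

def wah(signal, fs, pedal):
    """Wah sketch: a state-variable lowpass whose cutoff is swept by
    `pedal`, a list (same length as `signal`) of positions in 0..1.
    Low damping q gives the resonant bump at the cutoff."""
    low = band = 0.0
    q = 0.2                                  # low damping -> resonance
    out = []
    for x, p in zip(signal, pedal):
        fc = 200.0 * (10.0 ** p)             # sweep ~200 Hz .. ~2 kHz
        f = 2.0 * math.sin(math.pi * fc / fs)
        high = x - low - q * band            # state-variable update
        band += f * high
        low += f * band
        out.append(low)                      # lowpass output
    return out
```

Because the coefficient `f` is recomputed every sample, the resonant bump tracks the pedal continuously, which is exactly the moving-formant behaviour described above.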

Most filters used in music making equipment are of the IIR type. This is because IIR filters are easier to implement and to control than FIR ones (in analog circuitry, FIR is almost impossible to do). Moreover, IIR filters can produce pronounced resonances, up to self-oscillation—something that can be used to create formants and is easy to turn to advantage.

However, there are some effects which call for FIR. The simplest of these is flanging. Here, one adds a signal to a delayed version of itself while at the same time varying the delay. (The delay must be kept below 20ms to make it sound like a filter instead of an echo.) The result is a time-variant comb FIR filter—the notches in the frequency response move around to produce the distinctive sloshing flange sound. By varying the relative amplitudes of the two signals, the depth of the effect can be controlled (using negative multipliers on one of the signal paths results in a slightly different sounding flange). Feedback can also be used. This results in an all-pole (versus all-zero for the feed-forward type) response, which sounds quite different. Modifications include stereo versions (in which there are more delay lines and mixing points and/or the direct and delayed signals are fed unequally to the different output channels) and using multiple delayed signal paths in unison.
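A feed-forward flanger is a modulated delay plus an adder. The sketch below uses a sinusoidal LFO and linear interpolation for fractional delays; the rate, depth and maximum delay defaults are illustrative only:

```python
import math

def flange(signal, fs, max_delay_ms=5.0, rate_hz=0.25, depth=0.7):
    """Feed-forward flanger: add a delayed copy of the signal while a
    slow LFO sweeps the delay length. Fractional delays are read with
    linear interpolation between neighbouring samples."""
    max_delay = max_delay_ms * fs / 1000.0   # in samples
    out = []
    for n, x in enumerate(signal):
        # LFO sweeps the delay between ~1 sample and max_delay
        lfo = 0.5 * (1.0 + math.sin(2.0 * math.pi * rate_hz * n / fs))
        d = 1.0 + lfo * (max_delay - 1.0)
        pos = n - d
        i = int(math.floor(pos))
        frac = pos - i
        if i >= 0 and i + 1 < len(signal):
            delayed = (1.0 - frac) * signal[i] + frac * signal[i + 1]
        else:
            delayed = 0.0                    # before the signal starts
        out.append(x + depth * delayed)
    return out
```

Setting `depth` negative gives the inverted-path variant mentioned above, with peaks and notches of the comb exchanged.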

If flanging is what we want, we must modulate the delay length quite slowly. What happens, then, if we use an LFO to control the delay line and set the frequency a bit higher? The result is chorus. Usually the modulation is driven with lowpass filtered noise. Here, audible sidebands should not be generated, but audible filtering effects do arise—a doubling effect takes place: the sound seems to gain more depth as our brain is tricked by the inaccuracy of the recombined sound into believing there is more than one sound source present. (Here, the total variation in delay length must be kept in the 10-20ms range if FM-like effects are not called for…) Chorus effects can vary from subtle to drastic, depending on the amplitude multiplier, delay length, modulation frequency and modulation depth we use. Another effect closely related to chorus is ADT (Automatic Double Tracking), which is, in effect, chorus used with a 15-30ms long delay and a very shallow, regular modulation. The name comes from the fact that ADT creates a two-singer illusion when used on voice—the delay is long enough to be noticed, but as it varies, the result is heard as an additional singer who can't quite duplicate the lead one. ADT is commonly used to add depth to vocals when the singer cannot duplicate his/her parts closely enough to make true doubling possible.

Sometimes the effect we get from a flanger is too obvious or doesn't fit well with the instruments we use. Here, another similar type of filter effect can do the trick: phasing. When we flange, the short delay plus addition causes signal frequencies whose wavelength divides the delay length to be reinforced (or cancelled, if one of the signals is inverted before adding)—that is the effect of the comb filter. Basically, the delay line causes a linear phase shift and upon addition, some frequencies cancel and some reinforce each other. So how about using a nonlinear phase shift instead? Now we get phasing. We shift the relative phases of signal components with an allpass filter and add. The effect is much more subtle since the comb structure is not present in the filter response—whereas flanging produces a repeating filter response, phasing produces only a limited number of dips (usually just one or two) in the system function. This is because the simplest allpass filters (first and second order) have a phase response with a limited total phase span. The filter coefficients are modulated to move the cut-off point of the filter (around which the phase response changes the fastest) and so a moving point of rapidly varying phase and amplitude is produced in the frequency response. Again we can get more out of the effect by cascading sections, using allpass filters of higher order (especially of the type with a delay inside the double loop, a type lending itself to delay length modulation) and constructing stereo versions. In this way, many eerie spatial effects can be achieved. To hear the stereo version in action, listen to Deee-Lite—some of their songs have truly nauseating stereo phaser parts.
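A mono phaser along these lines can be sketched as a cascade of first-order allpass sections with a swept coefficient. The stage count, sweep range and LFO rate below are illustrative choices:

```python
import math

def phaser(signal, fs, rate_hz=0.5, depth=0.7, stages=4):
    """Phaser sketch: cascade of first-order allpass sections
    H(z) = (a + z^-1) / (1 + a z^-1), with the coefficient `a` swept
    by an LFO; the allpassed signal is mixed back with the dry one,
    producing a few moving dips in the response."""
    x1 = [0.0] * stages                      # per-stage input memory
    y1 = [0.0] * stages                      # per-stage output memory
    out = []
    for n, x in enumerate(signal):
        # sweep a in (-0.7, 0.7): moves the fast-phase-change region
        a = 0.7 * math.sin(2.0 * math.pi * rate_hz * n / fs)
        s = x
        for k in range(stages):
            y = a * s + x1[k] - a * y1[k]    # allpass difference eq.
            x1[k], y1[k] = s, y
            s = y
        out.append(x + depth * s)
    return out
```

With four first-order stages the cascade yields up to two notches in the dry-plus-wet sum, in line with the "one or two dips" character described above.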

Some basic multichannel techniques: advanced panning, balance and width control

We already discussed simple, equal power panning. But simplistic panning is far from perfect: the localization effect we get is not as good as it could be, since interaural time lag, ambience and HRTF phenomena are not taken into account. One particularly annoying problem is caused by our hearing, which relies on more than a single set of cues for directional resolution. Most importantly, above some 4kHz sounds are mainly localized based on the IID, while at lower frequencies the ITD dominates. Since most audio transmission systems are predicated on free-field assumptions, that is, they presume the playback is performed in an unobstructed space, inserting the human head into the field disturbs the setup quite a bit. While the free-field assumption enables us to accurately reproduce both IID and ITD over a small volume (the sweet spot), this new situation requires some additional correction. Namely, at high frequencies we would like the setup to pan based on signal power, at lower ones on simple intensity. In the middle, we would like the response to vary smoothly between the two extremes.

This is done by amplifying the directional components in the audio at high frequencies relative to the mono part. This amounts to applying a highpass shelving filter to the sum channel, and a lowpass one to the difference between the channels. This correction is not widely implemented as it requires a regular, known playback array in order to really work. The best example is offered by better ambisonic decoders, where it is called psychoacoustic shelf filtering and is applied to the omnidirectional and velocity components of the signal. However, it is well known that the same kind of correction should in principle be used whenever we pan sounds around.

While panning is meant to be used on pointlike sound sources, the advent of stereophonic sound recordings brought on the need to adjust the relative amplitudes of the two stereo channels in the home. This is most often used in an attempt to balance the stereo image when the speakers cannot be placed symmetrically with respect to the listening position. This sort of control is called balance. It is usually implemented by a dual panpot with a panning law akin to the constant power one. What we get is variable attenuation of the two channels by two complementary quarter sinusoids, this time with each channel retaining its identity through the operation: while panning expands a single channel to two, mixing it in variable proportions to the outgoing feeds, balance keeps left and right separate, changing their relative amplitudes only. Further, if the output channels are tied together, we arrive at a circuit which does very much the opposite of the panpot: an equal power fade between two input channels into a single output. This is what the crossfade slider so often seen in DJ mixers does. Again, these methods generalize to multiple channels.

Panning and balance, for the most part, are simple and understandable. But even simple things like these can be turned into an effects unit: making the panning position change automatically produces the autopanner. These generally move the panning position of a sound periodically under user-adjustable parameters, such as speed and width of movement. Sometimes multieffects units have a setting much akin to an autobalancer, or an autofader. These are hardly established modes of operation, however.

Panning mixes a mono channel into multiple output channels, and balance controls the relative channel volumes so that the multichannel signal can be shifted as a whole. But how about changing the spatial attributes of the multichannel composite instead of just shifting it? The simplest control to accomplish this is the Mid-Side (M/S) control, which is just a linear attenuator applied to the difference between the channels. The effect is to slide smoothly between normal stereo and fully downmixed mono. Sometimes negative gains, or gains above unity, are offered as well, corresponding to inversion of the stereo field and widening of the sound stage, respectively. M/S applied only to the lower frequencies (typically below 700Hz) gives us what is called Spatial EQ. It varies the width of the stereo image far more naturally than simple M/S. Finally, there is a technique called shuffling which is based on lowpass filtering (integrating) the difference between the stereo channels to a variable degree. It is used to convert signals from two omnidirectional microphones into approximate coincident pair intensity stereo. The amount of integration is determined from the distance between the physical microphones, but when placed under continuous user control, the effect can be used to affect the width of the stereo stage and the relative size of the sweet spot, just as ambisonics' shelf filters can. Once again, all the above techniques carry over to the multichannel setup, albeit with considerable complication in the precise underlying mathematical details.
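The basic M/S width control is a one-liner per sample: decompose into mid (sum) and side (difference), scale the side, recompose. A sketch with channels as Python lists:

```python
def stereo_width(left, right, width):
    """Mid/Side width control: scale the side (difference) signal.
    width = 0 collapses to mono, 1 leaves the image unchanged,
    > 1 widens the stage, < 0 inverts the stereo field."""
    out_l, out_r = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)                  # common (mono) part
        side = 0.5 * (l - r) * width         # directional part, scaled
        out_l.append(mid + side)
        out_r.append(mid - side)
    return out_l, out_r
```

The Spatial EQ variant mentioned above would apply the same scaling only to a lowpass-filtered copy of the side signal.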

Adding space—echo, reverberation

The previous section discussed what can be achieved by linear processing if we restrict ourselves to filters which do not smear transients enough for our ear to perceive significant lengthening in them. When this restriction is lifted, we get effects which fall under the general categories of echo, multitap echo and reverberation. The simplest of these is echo: we just use a delay line of some tens to some hundreds of milliseconds (or even longer) and add the delayed version to the original signal. The result is slap echo—any transient is repeated at a fixed time later in the signal. This system is basically a very high order comb filter, but since the delay is so long, we no longer perceive the effect as a filter, but rather hear transients repeating, echoing.

Using the corresponding IIR version of the filter (feeding back instead of forward), we get exponentially decaying echoes. If we use multiple forward delays and give each a separate amplitude multiplier, we get a multitap echo (usually implemented by tapping into a single delay line; hence the name), which is a special case of the general FIR filter. Add limited feedback and we have what usually goes under the name of multitap echo in effects processors. These effects can be used to color sound (especially if very short delays are used) and to create tightly controlled rhythmic echo patterns.
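The feedback version is the classic recirculating comb; each pass through the delay line is scaled by the feedback gain, giving the exponential decay. A sketch (delay time and gain are illustrative parameters):

```python
def feedback_echo(signal, fs, delay_ms=250.0, feedback=0.5):
    """IIR comb echo: y[n] = x[n] + feedback * y[n - d]. Each repeat
    is `feedback` times quieter, so the echoes decay exponentially."""
    d = max(1, int(fs * delay_ms / 1000.0))  # delay in samples
    out = []
    for n, x in enumerate(signal):
        prev = out[n - d] if n >= d else 0.0
        out.append(x + feedback * prev)
    return out
```

Feeding an impulse through it produces repeats at 1, 1/2, 1/4, … of full scale, spaced `delay_ms` apart; `feedback` must stay below 1 for the loop to be stable.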

Echo is nice, but even using multitap constructs it is extremely difficult to simulate anything approaching the echo density of real physical spaces—for that we would need thousands of delay taps. However, in the sixties, Schroeder described an efficient way to achieve a convincing simulation, which Moorer later built upon. The observation is that people mainly discriminate acoustical spaces by the first few perceived echoes after the direct sound (which, of course, arrives first as it takes the straight line path), and after those only hear the approximately exponentially decaying tail of the rest of the echo envelope. (The echoes multiply in complex acoustical spaces, so that towards the end of the echo tail thousands of echoes per second are heard—these necessarily come at intervals significantly shorter than what we can perceive as separate.) It should thus be possible to simulate acoustical environments by using a multitap echo for the early reflections and any linear filter producing sufficiently dense, smooth and exponentially decaying echoes after that. The construct Schroeder devised is the one most reverberators are based on even today: a number of simple allpass and comb IIR filters in series and parallel, possibly inside a feedback loop. In parallel with this bank, a separate multitap echo is used to account for the early reflections. All these constructs are easy to generalize to an arbitrary number of simultaneous channels.
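A toy version of the comb-and-allpass bank can be sketched directly. The delay lengths and gains below are purely illustrative (mutually prime delays avoid coinciding echoes, as in Schroeder-style designs, but these numbers are not tuned for any sample rate), and the early-reflection multitap section is omitted:

```python
def comb(signal, delay, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    out = []
    for n, x in enumerate(signal):
        out.append(x + (g * out[n - delay] if n >= delay else 0.0))
    return out

def allpass(signal, delay, g):
    """Schroeder allpass: y[n] = -g x[n] + x[n-delay] + g y[n-delay].
    Flat amplitude response, but it smears and densifies echoes."""
    out = []
    for n, x in enumerate(signal):
        xd = signal[n - delay] if n >= delay else 0.0
        yd = out[n - delay] if n >= delay else 0.0
        out.append(-g * x + xd + g * yd)
    return out

def schroeder_reverb(signal):
    """Toy Schroeder reverberator: four parallel combs with mutually
    prime delays, averaged, feeding two series allpass sections."""
    combs = [(1116, 0.84), (1188, 0.82), (1277, 0.80), (1356, 0.78)]
    mixed = [0.0] * len(signal)
    for d, g in combs:
        for i, y in enumerate(comb(signal, d, g)):
            mixed[i] += y / len(combs)
    for d, g in ((225, 0.7), (341, 0.7)):
        mixed = allpass(mixed, d, g)
    return mixed
```

The parallel combs set the decay envelope while the series allpasses multiply the echo density without coloring the long-term spectrum; as the surrounding text notes, carelessly chosen delays and gains quickly produce the ringy, metallic quality that plagues this structure.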

The Schroeder‐Moorer construction is nice, since it allows computationally efficient reverberation to be implemented easily. However, just like any other abstraction, it has its problems. Firstly, it doesn’t simulate frequency dependent damping in acoustical spaces—most materials and even the medium (air) exhibit such phenomena. Thus lowpass filtering is sometimes included in the algorithm. The second problem has to do with the static nature of the simulation. Most acoustic spaces call for some dynamics, at least in the early reflection calculation—sound sources often move and even the surrounding space can change shape. Next, due to the inherent structure of the elements used, the reverberator is quite difficult to tune for proper behaviour—it is all too easy to come up with parameters which produce uneven reverberation and a ringy, metallic quality. This is why newer designs based on nested allpass filters, networks of waveguides, feedback networks of multiple delay lines connected by a square matrix multiplication and combinations of sparse FIR filters (essentially multitap all‐zero echo elements) with brute force FFT based convolution are nowadays favored over the classic Schroeder construction. Notice that all such structures are computationally demanding and the art of designing psychoacoustically plausible reverberation without consuming thousands of MIPS is highly advanced.

The algorithm is also global in nature—one cannot account for specific shapes of the surrounding space or for directional reverberation. An example would be a musician playing over a well—the well reverberates heavily and the reverberation comes from a specific direction. This creates a need to apply several separate reverberation algorithms in parallel if such effects are desired. The structure of the algorithm can also be optimized quite heavily under some circumstances. Many different filter structures are used in practical implementations, then, and if great versatility is desired, the dense reverberation tail can be implemented by convolution (with FFT, overlap‐add and frequency domain partitioning for zero latency), allowing accurate simulations of actual acoustical spaces with minimal computational overhead. Sometimes the principles of physical modelling are brought into play by simulating acoustics with meshes of waveguides and filters. These methods are, as is typical of physical modelling in general, computationally heavy. However, they are the only known method capable of simulating dynamic spaces and diffraction effects.

The design of multichannel reverberation is a considerably newer area of study than the classical monophonic case. The reason is, again, the amount of computation to be done. The simplest reverberator of this kind simply mixes the input channels together, applies a mono reverb and sums the result into both output channels. This is bad. Recent studies show that decorrelation between reverberation emanating from the different channels is essential to the sense of envelopment imparted by the effect. Not surprisingly, this holds mostly for the low frequencies: this is where interaural phase differences dominate directional perception of sound. The same considerations make amplitude panning work poorly with spatial effects: correlated signals hamper externalization and envelopment. Early reflections are quite important too, as they control the sense of directionality and spaciousness in reverb. In fact, most reverberators have a setting which produces early, lateral reflections only—this is usually dubbed ambience or warmth, names chosen for the feeling of envelopment and embedding into the sound field caused by the effect. The early reflections are concentrated in roughly the first 60ms after the direct sound is heard. They should never ever be created in mono and in the best case they vary with panning position. The reverberation after about 150ms is the dominating factor for envelopment and the sensation of distance. What matters is the ratio of direct to reverberated sound, not the absolute level of late energy. But surprisingly enough, we perceive the amount of reverb itself absolutely: a louder sound with the same relative level of reverb sounds more reverberated!

Making the sound come from somewhere—directional sound enhancement

In nature, many things affect sound before we hear it: the characteristics of the sound source, radiation effects, the environment we hear the sound in and the way our perception processes the sound. So far we can combine multiple sound sources, generate different kinds of instruments, splice audio clips and so on. That means we can do quite a lot. But one link is still missing—how do we create an illusion of reality? Most effects aim at coloring sound in some specific way which may have no reason other than that it sounds good; they modify sound source parameters. But if we really want to make music and sound effects seem real, we must take into consideration the way physics shapes sound so that we can hear where it comes from. Reverberation is the first logical step since our psyche expects sound to be heard in an acoustical space, something which is always present in our everyday environment. The next step, then, is to make the sound come from somewhere, to have a direction. The surprising fact is that our physiology is invariant enough to permit this to be done with adequate accuracy without separately matching for each and every listener. The linearity of sound transmission in air leads to another surprise: proper binaural processing requires only linear filtering, which means that FFT, vector operation acceleration and all the other usual optimizations found in any book on numerical algorithms can be applied.

In the psychoacoustics section the main cues of directional sound perception were introduced. These are the relative amplitudes and phase differences received by our two ears, the amount of reverberation, the spacing and amplitudes of the early reflections and the head related transfer function. When we want to impose direction upon sound, we need to simulate these effects. First, room response needs to be decoupled from the processing taking place around our ears. This way reverberation can be done separately from binaural processing, leading to a greatly eased computational burden as reverberation must only be implemented once whereas binaural processing needs to be done separately for each sound source. Second, we must decide the accuracy to which we want to simulate the cues. Often HRTF processing (which is expensive) is not needed if sound sources do not need to reside behind the head or move out of the horizontal plane of our ears. In this case it is sufficient to simulate the interaural time difference (and doppler shift, if the sound sources are moving), amplitude differences and possibly high frequency damping when the sound sources are at variable distances from the listener. The positive sides of this approach are the relative ease with which it is implemented, the low expense and listener independence (HRTFs are slightly different for each of us). However, if free and accurate positioning of sound sources is desired, we must either use multiple speakers that surround the listener and pan properly or use HRTFs in combination with headphones and process early reflections and reverberation amplitudes appropriately for each source. This is expensive if done right. It is also listener dependent, although fairly good simulations can be achieved by using sufficiently generic HRTF models.
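The cheap non-HRTF approach can be sketched as a panner combining an interaural time difference with constant-power level panning. The 9 cm head radius, the 343 m/s speed of sound and the Woodworth-style ITD formula are textbook approximations; the particular gain law and function names are assumptions of this sketch.

```python
import math

def itd_pan(x, azimuth_deg, fs=44100):
    """Crude azimuth panner: azimuth_deg runs from -90 (hard left)
    to +90 (hard right).  Returns (left, right) sample lists."""
    az = math.radians(abs(azimuth_deg))
    # Woodworth-style ITD approximation for a spherical head
    itd = 0.09 / 343.0 * (az + math.sin(az))
    lag = int(round(itd * fs))               # far-ear delay in whole samples
    # constant-power amplitude panning between the ears
    p = (azimuth_deg / 90.0 + 1.0) * math.pi / 4.0
    g_left, g_right = math.cos(p), math.sin(p)
    delayed = [0.0] * lag + list(x[:len(x) - lag])
    if azimuth_deg >= 0:     # source on the right: the left ear hears it later
        return [g_left * s for s in delayed], [g_right * s for s in x]
    else:
        return [g_left * s for s in x], [g_right * s for s in delayed]
```

Doppler shift and distance-dependent high frequency damping, mentioned above, would be added on top of this; they are omitted here for brevity.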

Simulating HRTF is the best way to achieve sound positioning, if only stereophonic transmission is available. As was noted above, in this case it is necessary to use headphones instead of normal speakers to avoid the listening environment affecting the stereo image—after all, HRTF processing is a rather delicate technique and as such requires a stable and controlled listening environment.

Implementing HRTF is rather straightforward: one convolves (probably by overlap‐add FFT techniques) the necessary impulse response with the signal, once for each ear, possibly adds some reverberation and/or calculated early reflections to the sound and delivers it for mixing. Standard impulse response patterns, measured from dummy heads and/or actual human ear canals, are readily available, as are the algorithms needed for reverberation. Early reflections may generate more hassle, especially if the sound sources are moving or if HRTF processing needs to be applied to the reflections as well. If the listener is static (i.e. does not move in the acoustical space), early reflection calculation can be combined with HRTF processing by adding the reflections to the HRTF impulse responses. If not, things become much more complex. Consequently, early reflection calculation is often neglected in actual designs. If either the listener or the sound sources move, one also needs to worry about filter interpolation, doppler shifts and parameter update rates; these too are often approximated rather crudely, if at all.
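The per-ear filtering reduces to plain convolution. The sketch below uses the direct form for clarity; a real-time engine would use FFT overlap-add, but the result is mathematically identical. The function names and the idea of passing a pair of measured head related impulse responses (HRIRs) are assumptions of this sketch.

```python
def convolve(x, h):
    """Direct-form FIR convolution of signal x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def binauralize(x, hrir_left, hrir_right):
    # One convolution per ear with the measured impulse responses,
    # e.g. taken from a dummy head data set.
    return convolve(x, hrir_left), convolve(x, hrir_right)
```

Reverberation and early reflections would be mixed in after this stage, as the text describes.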

Positioning sound is difficult, then. Why? Because all current signal processing is based on one dimensional abstractions whereas sound is heard in three dimensions. It is true that point‐to‐point static sound transmission is, by virtue of linearity and time‐invariance, accurately represented by a linear filter. If the points are not static, however, time‐invariance breaks and the complex three dimensional structure of sound fields reveals itself—a linear filter is still exactly what is needed, but the filter is now highly time dependent and its impulse response varies chaotically.

Simulating such a filter calls for great simplification, heavy handed approximation and proper breaking down of the problem into more manageable pieces. This is why this particular field of audio DSP is one of the main research subjects of the moment.

Dynamics: compression/limiting and expansion/gating

Up to this point, only linear systems have been considered. While it is true that most common operations on sound, especially those considered to be commonly applicable effects, are linear in nature, there are some instances where nonlinearity must be brought to bear. The most inconspicuous use of such methods is the modification of sound dynamics, where one attempts to broaden (expansion), flatten (compression) or shape (gating and limiting) the perceived variation in sound loudness.

When we talk about the dynamics of a sound, we refer to variation on rather long time scales—variations in loudness occur at time scales of tens of milliseconds and more. This implies some kind of averaging over periods of hundreds of samples and upwards, and quickly brings forth the nonlinearity of our loudness perception. Processing a parameter such as this thus results in nonlinear, signal dependent amplification, i.e. we multiply our signal by a time‐variant, signal dependent number. However, if we only want to affect loudness without noticeable effects on timbre, we must not change our multiplier rapidly, since rapid amplitude modulation results in sidebands (heard, in this case, as pops, rattle or fuzz). Adjusting sound dynamics consists of estimating the loudness of the input signal (usually done by either peak or square calculation and low‐pass filtering to achieve the averaging effect mentioned above), translating this into a suitable, sufficiently slowly varying multiplier and applying the multiplier to the signal. The only stage that really differs between compression, expansion and gating is the translation. Dynamics processing is usually employed for two very different reasons: we may want to alter the dynamics in a way that is transparent to fulfill some technical or subliminal goals or, on the other hand, we may wish for a specific artistic effect.
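The loudness estimation step might be sketched as a simple peak follower with separate rise and fall smoothing. This is a minimal sketch; the coefficient values are arbitrary per-sample smoothing factors, not derived from any particular unit.

```python
def envelope(x, attack=0.9, release=0.999):
    """One-pole peak follower: fast rise, slow exponential fall.
    attack/release are per-sample smoothing coefficients (closer to 1
    means slower movement)."""
    env, out = 0.0, []
    for s in x:
        level = abs(s)
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out
```

An RMS variant would square the input, smooth, and take the square root; the structure is the same.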

Compression attempts to reduce amplitude or loudness fluctuations. Amplitude fluctuations are bad for analog machinery because of its limited signal to noise ratio and dynamic range, and for digital equipment because of hard clipping and quantization noise. The listener’s perceptual dynamic range can also present problems: in a car on the road, the headroom between the quietest sound heard over the noise and the loudest sound heard without pain is much slimmer than when the listener sits at home where little ambient noise is present. Pronounced loudness fluctuation can also be irritating to the listener or against the rules of some musical genres. (Rock should be just as solid as the name suggests, for instance.) This means that compression can be used either as a fix to circumvent range limitations or as a creative effect to solidify a sound. People actually do the very same thing by themselves: we roll the volume knob to keep the sound in a pleasing range. The recording community even has a term for this in the context of manual dynamics control for tape recording: gain riding. In dealing with technical boundaries, peak detection is usually better (because absolute overshoot is being combated); in the creative case, RMS detection is (since it better approximates perceptual loudness). Compression is achieved by amplifying low‐level sounds and attenuating stronger ones. Usually the determination of our amplitude multiplier takes the form of a two piece line segment approximation on a linear scale: up to some point, output amplitude increases rapidly with input amplitude, after which it increases much more slowly. The point of change is called the knee of the compressor and the ratio between the slopes of the two line segments is called the ratio (an increasing ratio means a sharper knee). Consequently, if one drives a compressor with small enough signals, only amplification results (this section of the compressor’s response is really implemented by sheer amplification; only the upper portion requires dynamic control).
If driven over the knee, however, the compressor begins to limit the increase in amplitude. The higher the ratio, the more this limiting shows. If ratios over 20:1 are used, the action is called, appropriately enough, limiting, since the signal does not appreciably increase in amplitude after hitting the knee. Compression causes peaks in the signal to be attenuated somewhat, so the average amplitude can be raised—the signal gets flattened. This can be used to sustain sounds (indeed, compression based sustain pedals do just this) and to raise their perceptual loudness.

Some further parameters involved in compression are the attack and release times. These are incorporated by tracking the attenuation applied when operating over the knee (gain reduction in studio lingo) and by using a simple first order filter to limit the rate of change of this parameter. Both time parameters control the time constant of one such filter. The filter coupled to the attack parameter is the one used when the estimated input amplitude first rises above the knee; the release filter is switched in when the estimate drops below the knee again. The reason these parameters exist is that they control the amount of time it takes for the compressor to adapt—shorter attack times mean better peak limiting, while shortening the release time causes the compressor to return more rapidly to normal operation after a loud peak. But the shorter the time constants, the more prone the compressor is to pumping: the modulation of quieter (but still audible), mostly steady state sounds by louder, often percussive ones which cause the processor to go in and out of the active region (the operating range above the knee).
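Putting the pieces together, a feed-forward compressor can be sketched as below: a hard knee, a ratio, and separate attack/release smoothing of the gain. The crude peak detector, the linear-domain gain curve and all coefficient values are simplifying assumptions; here attack/release are per-sample smoothing fractions (larger means faster), not times.

```python
def compress(x, threshold=0.5, ratio=4.0, attack=0.2, release=0.01):
    """Feed-forward compressor sketch.  Above `threshold` the output
    level grows `ratio` times slower; the gain change is smoothed by
    separate attack and release one-pole filters."""
    env, gain, out = 0.0, 1.0, []
    for s in x:
        env = max(abs(s), 0.999 * env)        # crude peak detector
        # static gain curve: two line segments meeting at the knee
        if env > threshold:
            target = (threshold + (env - threshold) / ratio) / env
        else:
            target = 1.0
        # attack when the gain must drop, release when it may recover
        coeff = attack if target < gain else release
        gain += coeff * (target - gain)
        out.append(s * gain)
    return out
```

A limiter is the same structure with a very high ratio and a fast attack; an expander or gate inverts the knee, as described below.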

Expansion is the opposite of compression. Here one attenuates low level signals and amplifies stronger ones. The knee is inverted and the active region is now below the knee. When expanding with high ratios, one talks of gating: low‐level signals get attenuated so much that they are practically shut off. Hence, only a signal exceeding the knee sounds through. This way noise in the quiet spots can be thrown away without doing much harm to the rest of the signal. Inserting a gate with a high knee after a reverb gives a gated reverb, which was common in 80’s rock recordings. It is most commonly used on the snare drum to get the decade’s massive, snap‐off reverb sound. Since noise is a major enemy of the recording engineer, dedicated gates are a very common and useful implement. As such, they come in packages of four and beyond. In these models the delicate time‐domain options provided by slew rates, attacks and delays are often missing. Instead, such units implement the knee as a transistor switch which simply cuts the signal below some settable amplitude level.
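A hard gate of the transistor-switch kind mentioned above reduces to a threshold test on the tracked amplitude. The decay constant and the residual attenuation figure are arbitrary choices for this sketch.

```python
def gate(x, threshold=0.1, attenuation=0.001):
    """Hard gate sketch: while the tracked amplitude stays below the
    threshold, the signal is attenuated heavily instead of passed."""
    env, out = 0.0, []
    for s in x:
        env = max(abs(s), 0.995 * env)   # peak follower with slow decay
        out.append(s if env >= threshold else s * attenuation)
    return out
```

The slow decay of the follower acts as a primitive hold: the gate stays open for a while after a loud passage ends, which reduces chattering on decaying sounds.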

There are many enhancements to all these algorithms, some of which are slew rate limiting, more complex response curves, hysteresis, look‐ahead, hold time limiting, soft knee response and multiband operation. Slew rate limiting refers to a technique where the otherwise filter generated transitions between attenuation settings are limited to some preset speed of transition (slope/slew rate). This can reduce some audible artifacts when using short attack/release times without compromising the smoothness of the attack/release envelope ends. Especially in numerical implementations, amplitude response curves can be more complex. The good side of this is that it allows some interesting combinations of compression/expansion and also smoother transitions between operating ranges, also called soft knees, again resulting in fewer audible artifacts. A soft knee rounds off the transition from normal operation into attenuation—this is especially nice in connection with low attack/release settings. Hysteresis tries to reduce pumping by attempting to keep gain reduction constant if the input amplitude changes only by a small amount. Hold time limiting works on the same problem by setting a minimum time the compressor must remain in compressing mode before returning to pass‐through. Multiband operation divides input signals by frequency and operates on the bands separately. This is used to reduce pumping—sounds in different frequency ranges cannot cause other bands to pump and masking takes care of the background sounds in the same band. Quite separately from pumping, multiband compression raises the total signal energy even more than is possible with usual compressors. Hence multiband units are especially popular with radio stations, as more sound energy means more volume and more perceptual strength for the message (e.g. ads cut through ambient noise much more readily).

Actually the worlds of popular music and advertising have pushed the battle for volume even further than the previous paragraphs might suggest. There are now dedicated units, called sonic maximizers or boosters, which are meant precisely to maximize the loudness of a given sound. For the most part they are combinations of the basic dynamics processor building blocks described above. The most common form is an amplifier plus a limiter operated in look‐ahead mode with extremely short attack and release times. Oftentimes the signal is allowed to clip slightly to increase its energy; sometimes there is even separate waveshaping circuitry to handle the overdriven case. One example would be a carefully constructed tube based limiter. Some of the more sophisticated models incorporate a multiband gate/compressor/limiter. If the dynamics section is operated in peak detecting mode (instead of RMS), the average amplification can sometimes be raised by first passing the signal through an allpass filter. All in all, maximizers are one of the best selling rack effects at the moment.

Multiband expansion and gating can be used as a more sophisticated form of noise control, as audio power in one band does not affect the attenuation of noise in another. Look‐ahead, in turn, refers to a mode of operation where the output is slightly delayed and the stored sound is used to adjust attenuation coefficients more optimally—the compressor tries to anticipate changes in the amplitude and react instantaneously. Of course, like all the other effects, dynamics processors also need to come in stereo/multichannel varieties, in which case it is necessary to be able to apply equal compression to all audio channels to avoid shifting the stereo image when only dynamics should be affected. And as a final improvement, one might wish to use the dynamics of one signal to drive the compression of another (called ducking), or use only some specific part of the audio spectrum of a signal to drive the dynamics processor’s attenuation determination algorithm. De‐essing is an example of the latter. De‐essers are essentially compressors with an adjustable bandpass filter in the detector chain (center frequency in the 1kHz to 4kHz range), making them effective in limiting sibilant consonant sounds. The variations and combinations are endless, since dynamics processing is one of the most important studio effects of today. Tinkering with sound dynamics is to be considered an integral part of today’s commercial sound production.

Introducing nonlinearity—distortion, excitation

The previous section also introduced a class of nonlinear operations. There the emphasis was on relatively subtle processing. Instead, here we want the full sound of sidebanding, distortion and—you guessed it—nonlinearity.

When using an instrument which has a very dead characteristic, such as an electric guitar (which by its construction doesn’t allow many modulatory and/or performance techniques to be used and is designed to give an even, predictable sound), one often needs ways to beef up the sound. One such technique, almost ubiquitous among guitar players, is distortion. Distortion refers to the sound one traditionally got by overdriving a guitar amplifier. This results in clipping, resonances, feedback and other nonlinearities not usually a part of the device’s normal operation, adding additional frequencies to the sound. Generally these frequencies take the form of sums and differences of input frequencies, although the limited slew rate and manifold instabilities of analog tube amplifiers can also give rise to more unique distortions. The basic fuzz is usually simulated numerically either by hard clipping (which sounds considerably harsher than analog tube distortion) or by table lookup (i.e. waveshaping, with a similar sound, of course). The kinked frequency response of the analog guitar‐amplifier‐speaker combination also needs to be simulated to make for a good stomp. The name crunch is used for more subtle distortion/filtering action, used mainly for blues/jazz work. All in all, distortion adds depth to sounds and can be used as a creative effect—after all, as in waveshaping, the strength of the effect is heavily dependent on the average loudness of the input.
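Both basic flavors are memoryless waveshapers. In the sketch below, tanh stands in for a smooth tube-style transfer curve (a table-lookup implementation would simply tabulate a curve like this); the drive and limit values are arbitrary.

```python
import math

def soft_clip(x, drive=4.0):
    """Smooth 'tube-like' saturation via a tanh transfer curve,
    normalized so that an input of 1.0 maps to 1.0."""
    return [math.tanh(drive * s) / math.tanh(drive) for s in x]

def hard_clip(x, limit=0.5):
    """Hard clipping: the harsher, more 'digital' sounding variant."""
    return [max(-limit, min(limit, s)) for s in x]
```

A fuller guitar simulation would wrap either shaper with filters modeling the amplifier and speaker response, as the text notes.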

Exciting is a more subtle form of effect, akin to crunch. It is designed to be used on a wider variety of sounds than fuzz. The idea is to add punch or definition to a sound to make it stand out from a mix. The usual approach is to add subtle amounts of harmonic distortion to the sound by adding a bit of a rectified version of the signal back to it. Other ways of creating the same illusion are variations of multiband compression, processing of attack transients and the phases of sound partials, subtle highpass filtering (rather like the technique of presence peaking used in some vocal microphones) plus many more, in all possible combinations. All these come under the names exciter and enhancer. If used sparingly, they can be a great tool when a busy mix needs just that final touch. Consequently exciters often come combined with a maximizer—to get the final cheese into a mix, we first use an exciter to give a bit of a scruffy edge to the track and then maximize to get into competitive volumes. Brutal, ain’t it?

Alien effects—ring/amplitude modulation (AM)

Dynamics processing was based on modifying signal amplitude so that it varied in some predetermined manner. Effects of that kind do not require any outside control to perform their job—we want the amplitude to vary less, for example, but we do not control it in any way from outside the effects box. Not so with ring modulation. Here we want to merge two signals so that they interact with each other. Basically, as the name amplitude modulation suggests, we merge the signals by multiplying them by each other. What is the kind of interaction that results, then?

As was seen in the section on DSP techniques, multiplying a signal by a sine function results in sideband formation with each component sine wave of the signal. Consequently, amplitude modulating with a signal other than a sine wave gives rise to even more sidebands, namely all the combinations between the modulator and the modulated partials. This is why ring modulation produces a sound which is much fuller than the originals. It is also quite nonconsonant, since the sidebands are not in harmonic relationships with each other. This means that amplitude modulation is not a technique that is very generally applicable in musical applications. For sound effects it is very good, witness its ample use in science fiction movies.

Even with such a simple technique, ramifications exist. They are similar to the modifications on which the different AM based modulation methods used in radio communications are based. First, it is sometimes useful to add a constant to the modulator signal. This makes the other signal leak through in variable amounts, so that it is always present in the output. Another modification is what is called SSB in radio technology—killing one of the sidebands formed when modulating a sine wave. This results in a kind of inharmonic pitch shift effect. Finally, one can simulate the approximate AM effect achieved by certain kinds of analog diode bridges. In this case the carrier signal is just clamped by the modulator. The effect is even more harsh than the one achieved by usual AM.
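All of these variants build on the same multiply. A plain ring modulator with an optional carrier-leak term (the added constant discussed above) might look as follows; the parameter names are assumptions of this sketch.

```python
import math

def ring_mod(x, freq, fs=44100, carrier_leak=0.0):
    """Multiply the input by a sine oscillator of frequency `freq`.
    With carrier_leak = 0 this is pure ring modulation; with
    carrier_leak > 0 some unmodulated input leaks through, giving
    classic AM instead."""
    out = []
    for n, s in enumerate(x):
        m = math.sin(2.0 * math.pi * freq * n / fs)
        out.append(s * (carrier_leak + m))
    return out
```

SSB processing and diode-bridge clamping would replace the plain multiply with sideband filtering and a switching nonlinearity, respectively; the skeleton stays the same.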

Cross synthesis—vocoding

The term vocoding originated in the telephone industry, when researchers at Bell Laboratories attempted to find ways to code speech signals more efficiently. It comes from the words voice coding. Its guiding principle is that it should be possible to code speech sound by separating its component frequencies and communicating these. The original application of the technique failed miserably because speech signals have so many partials that separating these into time‐variant spectral estimates actually increases the amount of data needed to represent the signal. However, the technique was taken up by music researchers and later on found a novel application in cross synthesis. Nowadays the term is used almost exclusively to mean such an application.

When vocoding, one attempts to impose the short term spectral envelope of one sound signal upon another. This is accomplished by separating the spectral envelopes of short sound sections (by fast Fourier analysis, pitch synchronous partial estimation methods or with the help of filter banks; the latter is the original Bell Laboratories version) from two signals and multiplying or morphing to combine them.
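The filter-bank version can be sketched in a few lines: a bank of bandpass filters splits both signals into bands, an envelope follower measures the modulator's energy per band, and that envelope scales the carrier's corresponding band. The band centers, Q and smoothing constant are arbitrary choices, and the biquad coefficients follow the common RBJ cookbook bandpass form.

```python
import math

def bandpass(x, f0, q, fs=44100):
    """Biquad bandpass filter (constant 0 dB peak gain form)."""
    w = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w) / (2.0 * q)
    b0, b2 = alpha, -alpha
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w), 1.0 - alpha
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for s in x:
        v = (b0 * s + b2 * x2 - a1 * y1 - a2 * y2) / a0
        y.append(v)
        x2, x1 = x1, s
        y2, y1 = y1, v
    return y

def channel_vocoder(modulator, carrier, bands=(200, 400, 800, 1600, 3200), fs=44100):
    """Classic filter-bank vocoder sketch: impose the modulator's
    per-band envelope on the carrier's matching band."""
    out = [0.0] * min(len(modulator), len(carrier))
    for f0 in bands:
        mb = bandpass(modulator, f0, 4.0, fs)
        cb = bandpass(carrier, f0, 4.0, fs)
        env = 0.0
        for i in range(len(out)):
            env = 0.99 * env + 0.01 * abs(mb[i])   # band envelope follower
            out[i] += cb[i] * env
    return out
```

A real unit uses far more bands (and often an extra noise band for unvoiced consonants); FFT-based implementations do the same thing with spectral bins instead of filters.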

Short sound bits are used because the aim is at catching the (often quite rapid) time variation perceived in acoustical signals. The result is often quite eerie—for instance, when vocoding some wide spectrum signal with speech sound, the speech can be recognized in the result but its character has been replaced by traits of the other signal. This way, speaking and/or singing instruments can be synthesized. Many rap artists used this technique a couple of years back (see California love by 2Pac or Pony by Ginuwine).

The prime concerns in vocoding are avoidance of splicing artifacts (as the sound needs to be divided into sections), interpolation between the processed blocks, finding suitable signals (as narrow‐band signals are not proper excitation for cross‐synthesis with speech signals), sufficiently accurate spectral estimation (since the sound sections are short, spectral smearing easily occurs; the effect can be lessened by oversampling in time, but then interpolation at splice points can generate chorus like distortion which is not very good) and the high computational cost associated with spectral estimation and morphing. Vocoding is also a rather specialized effect not suitable for most types of music. This is why it is used quite rarely and is not incorporated in most popular effects units. Thus it finds application mainly in the hands of music researchers and experimental composers, with occasional appearances in pop music (predominantly rap).

It is questionable whether vocoder techniques should be classified primarily as effects or analysis‐resynthesis. Often intermediate analysis data is modified in vocoders as well, so analysis‐resynthesis might be more suitable. However, in all commercial products only simple cross‐synthesis is possible and the units have been optimized for modulation by speech. This is why this section is placed under effects instead of synthesis techniques.

Pitch modification: compression/expansion and harmonization

Pitch shifting is a technique that is gaining popularity in the remix and dance music industry. It is used to modify the perceived pitch of an audio signal without altering its speed or rhythm, or to change the tempo without changing pitch. The first application is good for correcting slight differences in instrument tuning or inaccurate vocal performances. The second is the one used extensively in remix production—usually one needs to hasten things a bit when making a catchy dance track out of 90BPM rhythm’n’blues. And one certainly doesn’t want the lead vocals to go chipmunks.

There are two main paths to good pitch shifting. The first employs time domain granulation and subsequent reassembly, the second relies on vocoding or analysis‐resynthesis techniques to modify the sound in the frequency domain. Granulation preceded the more sophisticated frequency domain methods, and would probably be considered quite outdated if it were not for the distinctive sound it produces when overdone. Granulation is based on the observation that if we want to stretch sound in real time, we need to read the signal in at a constant speed and output the result at a different one. This can be accomplished by storing the signal in solid state memory (or, as was done in the earliest applications, on tape) and reading it at a different speed. But because of the speed mismatch, we must either drop some data or interpolate some to fill the gaps. This is accomplished by smoothly windowing segments off the incoming signal, stretching these to accomplish the pitch shift and recombining to produce the output. Locally the sound gets stretched, but as each of the windows stays put temporally (window peaks do not move in time), the approximate rhythmic structure of the sound is preserved. If we choose the window lengths and windowing functions properly, very little audible distortion is heard even when doing pitch shifts of over a major third. Going further, the granulation process starts to sound through—the result is best described as choppiness at the granulation frequency. This process is often used for voice robotization in dance music.

If we want to do a tempo change without affecting the pitch, we do the process the other way around: we do not scale our windows but move them in time. This is the same as resampling the output of our pitch changing algorithm. Granulation is a workable method of pitch shifting, especially since it is quite cheap and easy to implement and performs well for small shifts. For extended ranges, however, more sophisticated methods are needed. One good example of a pitch shifter based on this method is the Lexicon Varispeed. Another method, less advanced but based on the same idea, is the one used in the Eventide H910 Harmonizer. Here one doesn’t use windowing at all, but just a circulating buffer with two pointers, one for input and one for output. These advance at different rates, of course, and each time the two pointers would cross, the output pointer is moved to a position determined by a heuristic matching algorithm. This usually means that complete cycles of the original sound are dropped and this way distortion is reduced. The method is rather crude, however, as it produces considerably more distortion than the windowing variant—even at small shifts. For vocal sounds, on the other hand, the results can actually be more accurate, since glottal pulses end up being duplicated by the algorithm instead of being multiply crossfaded. More generally, pitch/time processing is heavily dependent on the properties of the signal being processed, and the tradeoffs made in the algorithm always cause it to be better matched to some signals than others. The human voice is probably the easiest target for processes of this kind, inharmonic and noisy signals the most difficult.
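The windowed tempo-change variant described above (grains kept intact, window positions moved) can be sketched as below: Hann grains read every hop_in samples are written every hop_out samples, changing duration without changing pitch, and resampling the result then converts the time change into a pitch change. No waveform matching is done, so large factors produce the choppiness described earlier. The grain size and overlap are arbitrary choices of this sketch.

```python
import math

def granular_stretch(x, factor, grain=1024):
    """Naive granular time stretch: factor > 1 lengthens the sound
    without changing its pitch."""
    hop_in = grain // 4                      # 75% overlap of Hann grains
    hop_out = int(round(hop_in * factor))
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / grain)
              for i in range(grain)]
    out = [0.0] * (int(len(x) * factor) + grain)
    pos_in = pos_out = 0
    while pos_in + grain <= len(x):
        for i in range(grain):               # overlap-add one grain
            out[pos_out + i] += window[i] * x[pos_in + i]
        pos_in += hop_in
        pos_out += hop_out
    return out
```

With a hop of a quarter grain, the periodic Hann windows overlap-add to a constant factor of 2, so steady signals come out unmodulated; an SOLA-style cross-correlation search before each write would reduce the artifacts at larger factors.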

The other approach, being based on spectral methods, is more robust to large shifts. Here one analyzes the sound by FFT or (preferably for voice) pitch synchronous harmonic analysis, shifts the analysis data and resynthesizes. The results can be very good even when the shift approaches one octave. Beyond that, a new kind of distortion kicks in: the spectral envelope of the sound moves with the shift, which is perceived as a highly irritating kind of tightening. Methods to overcome this include imposing the original spectral envelope of the sound over the shifted spectrum, and intelligent combinations of all the aforementioned algorithms. Windowing effects can also cause problems, since after pitch shifting the component frequencies, phases and amplitudes can be quite incompatible with the window length. These can be combated by appropriately chosen analysis windows, pitch synchronous operation and proper overlapping of the resulting data windows. Of course, resampling and the inverse operation can again be applied to change tempo instead of pitch.
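A crude single‐frame illustration of the analysis/shift/resynthesis loop, assuming nothing beyond a plain DFT: each bin is simply moved to `round(k * ratio)`. A real phase vocoder would additionally keep the phases of successive frames coherent and overlap the output windows properly, both of which this sketch ignores.

```python
import cmath, math

def spectral_shift_frame(frame, ratio):
    """Shift one frame in the frequency domain by moving each DFT bin
    k to round(k * ratio), then resynthesize by inverse DFT. A plain
    O(n^2) DFT is used for clarity; real units use the FFT."""
    n = len(frame)
    # forward DFT, positive-frequency half only (the signal is real)
    spectrum = [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)) for k in range(n // 2 + 1)]
    shifted = [0j] * (n // 2 + 1)
    for k, bin_value in enumerate(spectrum):
        j = round(k * ratio)
        if j <= n // 2:
            shifted[j] += bin_value
    # mirror the half spectrum (conjugate symmetry) and invert
    full = shifted + [shifted[k].conjugate()
                      for k in range(n // 2 - 1, 0, -1)]
    return [sum(full[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]
```

Note that the spectral envelope moves along with the bins here, which is precisely the tightening distortion described above; envelope correction would reweight the shifted bins by the original envelope.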

Harmonizers are a direct application of pitch shifting techniques: they are used to create backing chords from solo tracks. As such, they most often just use multiple pitch shifters in parallel to produce the additional voices. Harmonizers are usually based on spectral methods, since this way all the modifications can be made on the analysis data at once before a single resynthesis. Spectral methods are also more tolerant of large shifts, something which is absolutely necessary if proper chords are to be created. Harmonizers are used mainly on vocal sounds, so linear prediction is a possibility as well. This path is taken mostly in academic endeavours: commercial effects units rarely do LPC, since it often requires a lot of hand tuning to work properly. LPC is also computationally demanding, which makes it less viable for low cost real time operation.
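The parallel‐shifter structure then reduces to running one shifter per interval and mixing. The sketch below is hypothetical glue code (the name `harmonize` and its signature are mine): it accepts any pitch‐shifting function and a list of equal‐tempered intervals in semitones.

```python
def harmonize(signal, shift, intervals):
    """Hypothetical harmonizer front end: mix the dry signal with one
    pitch-shifted voice per interval. `shift` is any pitch-shifting
    function taking (signal, ratio), e.g. a granular or spectral one."""
    voices = [signal]
    for semitones in intervals:
        ratio = 2 ** (semitones / 12)   # equal-tempered interval -> ratio
        voices.append(shift(signal, ratio))
    n = min(len(v) for v in voices)
    return [sum(v[t] for v in voices) / len(voices) for t in range(n)]
```

For example, `harmonize(track, shifter, [4, 7])` would add a major third and a fifth above the original, yielding a major triad from a single recorded line.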

Often combined with harmonizing, pitch quantization is one final variant of commonly utilized pitch processing technology. Its aim is to correct slight inaccuracies in the pitch of recorded material. It works by tracking the pitch of an incoming signal and shifting it by variable degrees to fit a preselected tuning scale (most commonly 12 tone equal temperament). Since pitch tracking and high accuracy of operation without perceptible distortion are required, actual implementations are usually quite limited; commonly voice is the only type of signal properly supported. Settings for algorithms of this kind include the scale we are quantizing to and a number of hysteresis, hold time and aggressiveness parameters which limit the amount of processing incurred (and, hence, the level of artifacts produced). The problem is that most professional singing involves a lot of stylistic gestures and small modulatory traits, all of which can too easily be killed by overly aggressive pitch quantization. If vocoding and/or LPC technology is used to implement the quantizer, we also get annoying flanging artifacts. Today these are one of the best ways to tell poor, overproduced singers from proficient ones without hearing a live performance.
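Stripped of pitch tracking and all the hysteresis machinery, the core of the quantizer is just a snap‐to‐scale mapping. This sketch assumes 12 tone equal temperament around A4 = 440 Hz, with a single hypothetical `strength` knob standing in for the aggressiveness parameters discussed above.

```python
import math

def quantize_pitch(freq_hz, ref=440.0, strength=1.0):
    """Snap a tracked frequency to the nearest 12-TET semitone relative
    to `ref` (A4). `strength` below 1.0 only pulls the pitch part of
    the way toward the scale degree, sparing vibrato and slides."""
    semis = 12 * math.log2(freq_hz / ref)   # distance from A4 in semitones
    target = round(semis)                   # nearest scale degree
    corrected = semis + strength * (target - semis)
    return ref * 2 ** (corrected / 12)
```

With `strength` at 1.0 a sung 450 Hz snaps straight to 440 Hz; it is exactly this hard snapping, applied faster than the voice can naturally move, that produces the stepped artifacts described next.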

Now as it happens, the very artifacts that make pitch quantization such a dangerous tool have found their niche as a creative effect as well. The best example is the peculiar, computerized kind of vocal sound which propelled Cher's hit Believe into the charts. The artifacts are produced when a tightly set quantizer translates brief, artistic slides, vibrato and blue notes into a series of extremely fast slides on a semitone scale, with the attendant transient distortion plunging below the time resolution of the quantizer's resynthesis filterbank.

Multieffects and patching. Composite algorithms.

Most effects in the preceding sections do not stand well on their own: all too often more beef is simply needed before a desirable sound quality is achieved. It is also more economical to build many effects into one box, since only a minute portion of all possible effects is ever needed at a time. This has led to a proliferation of multieffects units, some of which permit patching similar to that used in synthesis. In fact, it is nowadays almost impossible to find dedicated units for most of the effects described above.

The most common type of multieffect is one producing all the delay based effects (flanger and/or phaser, delay, tap‐echo and reverb). This is logical because all of these require similar processing (mainly linear filtering and buffering) and can be handled with moderate hardware (i.e. one can make do with an inexpensive fixed point DSP chip and some memory). To make the unit more attractive, MIDI control, modulation (LFO) and more factory setups are made available. Units like this are everywhere, and rarely amount to anything significant. The worst problem is that the low level parameters of the algorithms are usually not exposed to the user, limiting the usefulness of the hardware somewhat. Moving towards the midrange, distortion and other guitar type effects plus some more complicated and/or unique combinations (like distortion combined with a midrange boost, commonly used to achieve a retro sound) can be added, as can some programmability. In the high end, more signal processing power, multiple simultaneous algorithms, stereo or quadraphonic processing and complex patching come into the picture, as do MIDI based editors and complex modulation possibilities. Some companies (e.g. Lexicon and Yamaha) also produce high end processors which specialize in only a few effect types, like delay/echo/reverb (Lexicon PCM80).

A new trend in the effects world is plugins. Plugins are small software modules which extend the functionality of established computer audio editors and digital recording software. Plugins really offer the best of all worlds: one only needs to pay for the programming, not the hardware; there is no need to keep precious hardware idle, since all the effects run on the same iron; unlimited patching is sometimes possible; more signal processing power can be brought to bear at will; and non‐realtime operation becomes easy. On the downside, one needs some pretty amazing hardware to run complex effects configurations in real time, applications accepting plugins are often heavyweight and expensive, and it is often much more difficult to find a quality plugin and learn to use it than to plug in a conventional effects unit.