Control, modulation and patching

This section is about an aspect of synthesis and sound processing systems which is often overlooked. Namely, it speaks about modulation and control facilities, the layer of indirection which stands between the end user and signal processing views of electronic audio.

There are many perspectives to the control structures covered here. A synth vendor views them as value added or, simply, marketing leverage. A naïve synth buyer thinks of them as something which could be useful. A more experienced synthesist probably equates such facilities with necessity. A veteran sound designer no longer even recognizes them as being separate from the lowlevel synthesis engine. CSOUND programmers and synth implementers view control as anything running at k‐rate (which I sincerely hope isn’t for kontrol). And, finally, someone who has worked long enough with lowlevel tools might even dismiss such sophistication altogether. My opinion (possibly an inch biased by my fondness for abstraction) is that stepping up from the bare metal stance of signal processing is absolutely essential to intelligent sound design.

Abstraction: objects, parameters and protocols

So, what do modulation and control mean and why are they necessary? Let’s start from the basics. To generate or transform sound electronically, we use algorithms. (On analog equipment these take the form of prescribed interconnections between components.) An algorithm takes in signals, usually as streams of numbers, and based on the input and some internal state updates the state and outputs something. Although it is possible to use a hard wired algorithm (one which can only accomplish a single, fixed function), more often we want some flexibility. This is achieved by presenting additional input to the algorithm which is then used to change the way the algorithm works. For instance, halving the amplitude of a signal is not nearly as useful as changing it by a separately specified amount. Algorithms like this are said to be parametrized. Now, signal processing at its lowest level is complicated and computationally intensive. This means that we would rather use simple, easy to understand and highly optimized algorithms at the hardware level. Yet we need a lot of flexibility so that the sound can be molded to our liking. There is also a need to convert machine friendly operations (multiplying by a number) to more understandable ones (like setting the volume of an instrument) and vice versa. We would also like to automate some of the more tedious tasks one necessarily faces when dealing with audio representations (so that gain riding becomes compression and ensemble performance turns into multichannel playback). All this can be accomplished by separating the actual signal chain, optimizing it for some specific function and passing parameters to it from a newly created control plane.
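The difference between a hard wired and a parametrized algorithm, and the split between signal chain and control plane, can be sketched in a few lines. This is only an illustration under simple assumptions: audio is handled as lists of floats in blocks, and the names halve, scale, render and gain_at are made up for the example.

```python
def halve(block):
    """Hard wired algorithm: always multiplies by one half."""
    return [0.5 * x for x in block]


def scale(block, gain):
    """Parametrized algorithm: the gain arrives as additional input."""
    return [gain * x for x in block]


def render(blocks, gain_at):
    """Control plane (hypothetical): fetch one gain value per block
    and hand it to the signal chain, which stays simple and fast."""
    return [scale(block, gain_at(index)) for index, block in enumerate(blocks)]
```

Here gain_at could be anything that yields a number per block: a knob reading, an envelope, a sequencer track. The signal chain does not need to know which.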

Before we can fluently talk about modulation and understand the true depth of the subject, we need to establish some terminology and to see examples of what has already been done in commercial products. By abstracting modulation sources, destinations and their interconnections we can find interesting new possibilities for patching.

Like the final sound signal, the synthesis parameters are time varying: they are represented as functions of time. They originate either in user interface equipment (keyboards, pedals, drum pads, switches, ribbon controllers, wheels etc.) or through some generating algorithm inside the control plane itself (like in a sequencer or an LFO). This way the sources of control data constitute software objects, just like the signal processing algorithm. In particular, they have their own parameters (like keyboard sensitivity, LFO frequency, pedal polarity, sequencer speed and track contents). This means controllers can be used to control controllers (like driving the rate of a low frequency oscillator with a mod wheel). Likewise, the destinations of modulation data are modelled as objects with certain controllable parameters. The interconnection topology between modulation sources and destinations constitutes what is called a modulation routing in synth circles. This mesh of interconnections can be extremely complex: there’s no reason why multiple separate sources could not be routed to the same destination or why multiple destinations could not be driven by the same modulation source. Only the routing architecture of the synth limits how modulation signals can be processed and routed; it tells what can be routed where and how stiff the interconnections are. Some synthesizers (like Kurzweil’s K2000 series keyboards) offer extensive routing possibilities; in this case the modulation routing mesh itself offers possibilities for modulation control (e.g. switching routing configurations by a pedal). In most synthesizers, however, the possibilities are more limited and a given modulation routing is really considered a fixed part of a given patch (i.e. sound configuration). This happens mostly when a lot of structure is present in the synthesis architecture itself (e.g. algorithms, patches, programs, layers, streams, time regions, oscillators, sound grains and so on): we often learn that some control structures have been permanently linked to a part of the object hierarchy (e.g. one may have a dedicated envelope generator for oscillator amplitudes but no modulation possibilities for layer effects pan).
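A modulation routing of the kind described above can be pictured as a list of (source, destination, depth) connections. The sketch below is one possible model, not the architecture of any particular synthesizer; the class name ModMatrix, the additive combination rule and all source and destination names are assumptions made for illustration.

```python
from collections import defaultdict


class ModMatrix:
    """Hypothetical modulation routing: any source may feed any destination."""

    def __init__(self):
        self.routes = []  # list of (source, destination, depth)

    def connect(self, source, destination, depth=1.0):
        self.routes.append((source, destination, depth))

    def evaluate(self, source_values, base_values):
        """Add every routed source, scaled by its depth, to the base value
        of its destination. Summing is an assumed combination rule."""
        offsets = defaultdict(float)
        for source, destination, depth in self.routes:
            offsets[destination] += depth * source_values.get(source, 0.0)
        return {dest: base + offsets[dest] for dest, base in base_values.items()}
```

Routing the mod wheel to an LFO's rate is then just another connection, which is exactly the "controllers controlling controllers" case. A fixed routing architecture corresponds to a route list the user cannot edit.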

To make it possible to unify the patching procedure, it is necessary to define a coherent model of patching data. A common representation of control data is needed and controller data must be passed by common protocols. This way, controllers can be general in purpose instead of being embedded inside the lower level signal processing facility and flexibility is increased. This trend is evident in current synthesizer design, especially high end workstations which must cope with diverse musical data and demands. Nice things result for the sound designer, but there is a downside, too. Often generalization of this kind leads to the controls no longer being intuitive and clear. Where you could once tell amplitude is this and pitch that, you now need to assign ten or so effects routing parameters and the relevant envelope reads EG3/Bus18[mute] instead of just saying volume.

Typical sources of control data include the various measures produced by a keyboard controller (note‐on and note‐off, velocity, aftertouch, program (patch) change plus discrete and continuous controller messages, both specific and generic), LFOs, envelope generators, sequencers, arpeggiators and so on. Some control data is generated through transformations of existing modulation; portamento is a nice example. Typical destinations are oscillator frequency, amplitude and panning position, other controllers’ parameters, mixer settings (like reverb send or vector synthesis mixing state) and, for discrete controllers, patch memory number, arpeggiator (e.g. start sequence on key‐on or pedal press) and effect on/off (like when controlling a unity gain echo).

Algorithms, attributes and parameters. Events vs. continuous control.

It is worth pointing out that not all numbers and switches in DSP systems really constitute parameters that can realistically be modulated, not in real time anyway. Perhaps the best example is seen in patches and algorithms: when changing the settings of a complex algorithm, it may be that a lot of data will be moved around and some complex precalculation done. It may even be that some code needs to be compiled or data unpacked (soft synths can use just in time compilation and memory limited synths sometimes heavily compress their sample data). All this means that it may not be possible to accomplish the operation rapidly enough. This is why we reserve a separate term for such sluggish parameters: they are attributes. We also separate algorithms from patches, like in the previous section: a patch is a given parametrization of an algorithm, composed of a fixed set of data items. It tells how an algorithm is set up to sound like something specific. An algorithm, in turn, establishes the parameters but does not set their values. An algorithm determines the boundaries within which we can patch.

We must also make a distinction between discrete events and continuous control. Events mark point like objects in time: start note, stop note, change patch and so on. Continuous control gives values of a parameter at each moment of a given time period. Some systems model continuous control as a series of set‐value events. In most cases this is less than optimal—the control may not be fine grained enough and keeping long event lists together while editing can be quite tedious. Event lists are not resolution independent, either.
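The resolution issue is easy to demonstrate. In the sketch below a continuous control is stored as a function of time built from a few breakpoints, and the same control is then flattened into set-value events at two different step sizes. The helper names and the choice of linear interpolation are assumptions for the example, and breakpoint times are assumed to be strictly increasing.

```python
def breakpoint_function(points):
    """Return f(t) that linearly interpolates a sorted list of
    (time, value) breakpoints and holds the end values outside them."""
    def f(t):
        if t <= points[0][0]:
            return points[0][1]
        for (t0, v0), (t1, v1) in zip(points, points[1:]):
            if t <= t1:
                return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        return points[-1][1]
    return f


def to_set_value_events(f, duration, step):
    """Flatten a continuous control into a list of (time, value) events."""
    count = int(round(duration / step))
    return [(i * step, f(i * step)) for i in range(count + 1)]
```

Two breakpoints describe the whole ramp, and the function can be sampled at any rate later. The event list has the rate baked in: halving the step roughly doubles the number of events to store and edit.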

There is an even more pressing reason to favor continuous control over long series of events. In Western music there is a long standing tradition of treating music as a series of notes, i.e. an event based view of sound. The best proof is the common music notation which is patently intent on expressing every musical construct as a series of rather inaccurately defined notes and expressive notations. The tradition dictates a rather rigid view of what is relevant in music. For instance, most techno freaks would agree that arranging a song really significantly alters it whereas traditional pop enthusiasts probably would not. On the composer’s behalf, continuous control is an important part in an attempt to break away from the limited traditional view of music; in modern music extensive intra‐note evolution is a defining factor. This is why basing today’s composition software solely on discrete events and metric grids is not a very good idea.

In tune with the previous paragraph, we must also question whether control data should be as tightly bound to note events as it currently is. Nowadays it is the norm that most modulatory control is evoked through note‐on/note‐off events. There are always the continuous controllers (and to a lesser extent, LFOs), true, but it is atypical to find anything other than direct manual control that has no inherent coupling to note events. We can argue that there is considerable demand for control methods driven entirely out of synch with the melodic and metric time structure of music: in such diverse styles as Western contemporary, Oriental, Indian and African traditional, techno/electronica and modern jazz/fusion we see many examples of timbral structures which transcend note boundaries and strict metric time.

In techno, which to me is the most familiar of the aforementioned styles, these structures typically take the form of a separate klangfarbenmelodie on top of a relatively hypnotic bass drone, usually realized through filter modulation. Ample use of time variable delay and echo effects is also observed, especially in goa trance and the many descendants of dub. It must be further stressed that they are not just some sugar coating over the music, but rather constitute an elaborate, absolutely essential part of the musical fabric. Acid house is perhaps the ultimate proof—the genre hinges entirely on timbral modulation. Played on a piano, an acid bassline simply dies; it may be the same three to eight notes are repeated some ten minutes in a row with no dynamic variation or rhythmical gestures.

Ranges, transformations and maps

In the introduction, it was pointed out that we also need the indirection imposed by modulation for reasons other than direct control. This is because few signal processing operations work on parameters which are human readable or understandable. Volume presents a basic example: although volume control is implemented by multiplication, the number by which we must multiply does not live on a linear scale. So one of the important roles of modulation control is to offer linear, psychoacoustically meaningful scales to the user. This is often less than easy: the more complicated a sound processing architecture (and make no mistake about it, even the simplest current synthesizers are pretty difficult beasts to master and design), the more difficult it is to predict what changing a parameter will do to the final resulting sound. A related problem is that of keeping the final transformed parameters inside limits that produce meaningful, correct results when passed on to the signal processing algorithms. For example, it is only too easy to drive an oscillator into uncontrolled self‐oscillation and instability. One of the goals in controlling a signal processing algorithm is to avoid expensive stability and range checks in the engine itself and map user parameters so that no problems can arise.
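The volume case can be made concrete. The sketch below maps a user facing volume between 0 and 1 onto a multiplier by way of decibels, and does the range check in the control plane so the engine never sees an illegal value. Treating "linear in decibels" as perceptually even is a common simplification rather than an exact model of loudness, and the 60 dB range is an arbitrary assumption.

```python
def volume_to_gain(volume, floor_db=-60.0):
    """Map a user volume in [0, 1] to a linear gain multiplier.

    1.0 gives unity gain, 0.0 gives silence, and values in between move
    in even decibel steps (floor_db is an assumed, adjustable range).
    """
    volume = min(max(volume, 0.0), 1.0)  # range check happens here, not in the engine
    if volume == 0.0:
        return 0.0
    db = floor_db * (1.0 - volume)
    return 10.0 ** (db / 20.0)
```

With these settings the halfway point of the control corresponds to -30 dB, a multiplier of roughly 0.03. That shows how far the machine friendly number is from the user friendly one.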

So far it may sound like mappings between machine and user parameters only limit what can be done. This is not so. Mappings can be used as a rich compositional tool and make composing immensely more enjoyable. Examples of useful mappings include tuning tables (instead of Hertz values), cutoff/resonance‐type filter settings (replacing second degree biquad filter coefficients) and modulation index (easing rate setup in FM synths). An even more important benefit is that without perceptually uniform scales one cannot efficiently automate parameter handling. For example, it would be quite arduous to implement vector synthesis if the user had to set four to eight volumes separately for each instant of time. Instead, after transforming the coefficients into balances between some suitable opposite ends, using a joystick to control the synth is easy and fun. (For a working implementation, see the Prophet VS keyboard.)
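As a sketch of the vector synthesis mapping, two joystick coordinates can be turned into four oscillator gains with a bilinear mix. This is an assumed mixing law with the four sources placed at the corners of a unit square; it is not claimed to be the law or layout used in the Prophet VS or any other instrument.

```python
def vector_mix(x, y):
    """Map joystick position (x, y), each in [0, 1], to gains for four
    sources A, B, C, D sitting at the corners of the unit square.
    The gains always sum to one (assumed bilinear law)."""
    x = min(max(x, 0.0), 1.0)
    y = min(max(y, 0.0), 1.0)
    gain_a = (1.0 - x) * (1.0 - y)
    gain_b = x * (1.0 - y)
    gain_c = (1.0 - x) * y
    gain_d = x * y
    return gain_a, gain_b, gain_c, gain_d
```

Two numbers from the stick replace four volumes, and the overall level stays put wherever the stick is moved.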

The same goes for envelope control: once there are enough parameters and flexibility, it is quite productive to map the data into fewer meaningful dimensions and control those instead of delving into hundreds or even thousands of individual numbers. (Here the stark contrast between the control architectures employed by E‐mu in their Morpheus and Kawai in the K5000 gives a fair baseline. In Morpheus we gain intuition and pay in flexibility. K5000 buys a lot of power but ends up paying a high price in editing facilities and performance features.) Relevant parametrizations can even lead to device independence: fixing on meaning sometimes makes it possible to work regardless of the underlying device configuration. (E.g. as mixer settings, direction and distance are more meaningful than channel amplitudes. Unlike the latter, direction and distance do not depend on the equipment used to render the sound; they can be used to derive amplitudes, HRTFs or whatever.)
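To illustrate the device independence point, here is one way direction and distance might be rendered for plain two channel stereo. Everything specific in it is an assumption: the constant power pan law, the ±45 degree range, and the crude one-over-distance attenuation. A different rendering device would use a different function on the same two parameters.

```python
import math


def place(azimuth_deg, distance):
    """Derive (left, right) amplitudes from a direction and a distance.

    azimuth_deg runs from -45 (hard left) to +45 (hard right); distances
    below 1.0 are treated as 1.0. Both conventions are assumptions.
    """
    azimuth_deg = min(max(azimuth_deg, -45.0), 45.0)
    angle = (azimuth_deg + 45.0) / 90.0 * (math.pi / 2.0)
    attenuation = 1.0 / max(distance, 1.0)
    return attenuation * math.cos(angle), attenuation * math.sin(angle)
```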

A further important criterion for a proper parametrization scheme is that parameters seen by the user should, in a sense, be orthogonal. This means that after setting one parameter, altering another does not interfere with the effects of the first one—each parameter has its own regime of outcomes and the ground covered by one parameter does not overlap that of another one.

Granted, that’s all pretty vague. The problem is, again, one of psychometrics. It is notoriously difficult to fit all there is to say about sound into clean cut dimensions and measures. However, it is easy to give working examples of the concept, such as the usefulness of equalizers (frequency ranges do not interfere with one another) or the problems with waveshaping (there, twiddling around with timbre often significantly impacts perceived loudness).

Patches

In modern synthesizers there is a lot of room for twiddling. Unfortunately this also means that building a patch is a highly nontrivial and time consuming task. It is then imperative that patches can efficiently be stored, transferred and fetched back into use. Reuse has its place as well: most people do not have the time or the energy to build a new setup from scratch. The same goes for all setup information and is one of the reasons MIDI SysEx is so popular nowadays. Both in the studio and, especially, on the road, one needs powerful ways of storing and retrieving patch data.

In addition to reducing user data entry, the patch store has other uses as well. One of the more important is that patch data in the form of presets is valuable in itself. There is a burgeoning market out there for new, interesting instruments and it has already been pointed out that interesting, useful presets sell synths and audio processors better than anything else. How a module integrates with patch management, CD‐ROM patch distribution, MIDI and general purpose instrument editors largely determines how it will fare on the open market. Interworkability of equipment is a further issue, especially between equipment of the same manufacturer and product line.

From the above one easily gets the picture of a static, click to use parameter store. This is misleading as well. Many possibilities for creative use of the memory facility exist. They include random patch generation, driving some patch parameters from external sources, recombination of modular patch data (in a workstation synth, separate patches for the tone generator and the builtin effects, layering and separate patches for different parts of the synthesis architecture are but three examples) and morphing from one stored patch to another. Analysis‐resynthesis, cross‐modulation and trial and error methods rhyme well with simple scratch pad memory, also.
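Morphing between stored patches is the easiest of these to sketch. Assume, for the sake of the example, that a patch is a flat dictionary of numeric parameters and that both patches belong to the same algorithm, so they share the same parameter names. Real patch formats are richer, and attributes in the sense used earlier would not interpolate this way.

```python
def morph(patch_a, patch_b, position):
    """Linearly interpolate every parameter between two patches.

    position 0.0 returns patch_a's values, 1.0 returns patch_b's.
    Both patches must define exactly the same parameter names.
    """
    if patch_a.keys() != patch_b.keys():
        raise ValueError("patches must share the same parameters")
    return {
        name: (1.0 - position) * patch_a[name] + position * patch_b[name]
        for name in patch_a
    }
```

Driving position from a controller turns the patch memory itself into a modulation destination.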

User control

In this section we explore most of the commonly used control and modulation sources under direct control of the end user. Most of these are well represented in any current workstation synthesizer. Actually, most of the musician’s interface in studio equipment is there precisely for this end. Most of the controls described in this chapter have their origins in the data model of MIDI. The original MIDI specification dutifully catered for most of the relevant device independent parameter passing needs of its time. It is also the source of some common, fairly constricted views of musical data representation.

Note on/off and velocity

Note on and note off refer to the events caused by pressing and releasing, respectively, a key on a keyboard. Originally they were conceived as digital counterparts of the control voltages (CV) used in old analog synths. The analogy goes as far as to include some of the auxiliary data usually sent alongside these events. From a more abstract point of view, note on starts the processes associated with creating a time event, note off starts the procedure of ending the event. The event can be almost anything, witness the use of note on/off to control effects processors and mixing consoles in MIDI automated environments. Sometimes only the note on event is considered relevant. This most often happens with one‐shot sounds, such as simple drum patches in a sampling environment. As only two discrete events are used as cue points, one can clearly see that note on/off is not the most versatile of control methods and that as a control method, it closely parallels the note oriented discrete view of music mentioned above.

Now we can start and end simple events. As yet there is no way to parametrize the event, however. Now, everybody knows that even in simple synthesis use we need considerably more power. More specifically we need at least a measure of how forcefully a key was pressed so that we can build, e.g., dynamics handling into our instruments. This is why we associate a numerical parameter called velocity with our note on message. For symmetry we also associate a velocity with note off events. If the events are used to control synthesis, we also need to identify the pitch at which the note is to be played. These two values come from MIDI and were originally used to record the speed at which a key was pressed (or released) and the key which caused the event. Again stepping up to a more abstract view, they can be thought of as generic measures of the abruptness and sonic class of the events or simply as untyped parameters to be assigned a function later.

In synthesizers, the usual effects caused by a note on consist of initializing oscillators and automated controllers, such as envelope generators, LFOs, arpeggiators and wave sequencing lists. Velocity is commonly used to set the peak volume of the sound, perhaps the slope of the initial amplitude envelope and often filter parameters as well. To simulate the complex effects that arise in real instruments in response to varying playing force, different velocities can also be mapped to different patches or, more generally, be used to interpolate between settings. This is called velocity switching and is highly useful when realistic instruments are aimed at. Note off most often starts the roll off (release) stage of the envelopes, sets the speed of decay and more generally instantiates processes which are guaranteed to end up killing the note. In use other than synthesis, there is less consensus over what happens after note on/off. This is consistent with the fact that the events were originally meant exclusively for instrument control. (Whence their names.)
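Both uses of velocity can be sketched briefly. The first function maps a MIDI style velocity (0 to 127) to a peak level through a power curve, the second picks a patch by velocity zone. The curve exponent, the zone limits and the layer names are all invented for the example.

```python
def velocity_to_level(velocity, curve=2.0):
    """Map velocity 0..127 to a peak level 0..1 (assumed power curve)."""
    return (velocity / 127.0) ** curve


def velocity_switch(velocity, layers):
    """Pick a patch by velocity zone.

    layers is a list of (upper_velocity_limit, patch_name) pairs sorted
    by ascending limit; the last layer catches anything above.
    """
    for limit, name in layers:
        if velocity <= limit:
            return name
    return layers[-1][1]
```

For instance, with layers [(47, "soft"), (95, "medium"), (127, "hard")], a velocity of 30 selects the hypothetical "soft" patch. Interpolating between neighbouring layers instead of switching is the more general case mentioned above.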

Thus far we have taken a look at a single sonic event. One might ask if multiple simultaneous events would result in a more usable data model. Indeed they would, something which was acknowledged by MIDI designers early on. To implement this, we need a way to correlate note ons with note offs. In MIDI, the events are connected by pitch: only a single note can be in progress per keyboard key. This shows just how tight the keyboard‐MIDI ties are. More generally, we might associate the events by any means. Examples include unique identification numbers or the strict sequence of events regardless of other parameters. (The latter approach is taken in monophonic instruments.)
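Correlating note ons with note offs by pitch amounts to keeping a table of sounding notes keyed by pitch. The sketch below does just that. The class name NoteTracker is made up; treating a note on with velocity zero as a note off follows the usual MIDI convention.

```python
class NoteTracker:
    """Track sounding notes, matching note off to note on by pitch."""

    def __init__(self):
        self.active = {}  # pitch -> velocity of the note in progress

    def note_on(self, pitch, velocity):
        if velocity == 0:
            # MIDI convention: note on with zero velocity means note off.
            self.note_off(pitch)
            return
        # Only one note per pitch: a repeated note on replaces the old entry.
        self.active[pitch] = velocity

    def note_off(self, pitch):
        """End the note at this pitch; returns its velocity or None."""
        return self.active.pop(pitch, None)
```

Replacing the pitch key with a unique note identifier would give the more general association mentioned above, at the cost of having to carry that identifier in every event.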

One might ask how these events could be added to or improved. Obviously we could add to the number of parameters the events carry with them. From a different angle, we might question whether there is anything special in the number two; that is, whether we could associate but one or, on the other hand, more than two events (of both types, on and off) per one controlled musical object. Or add to our selection of event types (e.g. on, release sustain, release note, note off). Similarly we might stretch the interpretation of simultaneous events to include partitioning not only by pitch, but by amplitude or other attributes. Or remove the inherent ties between events altogether and consider all events to be part of a large, very complex event. We might consider whether overlapping event sequences should exhibit interactions. And so on, the variations are endless. This kind of control does not exist, however. This may be because of the traditions mentioned in the previous sections, or because most such events are difficult to generate with current equipment and to interpret into meaningful parameters to the lowlevel signal processing algorithms we use to produce the resulting sound signal. The MIDI line protocol is also a strong limiting factor: it can only accommodate simple note oriented data streams.

Aftertouch

Note events are a nice way to start and end time objects. But for more serious work, especially if non‐keyboard instruments and/or patches are involved, they simply do not suffice. This is because most instrumental music involves a lot of structure other than fixed length notes. Namely, extensive intra‐note control is needed. These needs do not surface with stringed keyboard instruments because the sounds they emit are by their nature almost completely determined as soon as the hammer hits the string. In the case of keyboard synthesizers the analogy to piano has led to similar behavior: only one event at a time per key (pitch) and initial velocity as the main modulatory parameter. The next logical question is whether there are keyboard instruments for which the methods discussed so far do not suffice. The answer is in the affirmative, with the organ as a primary example. Here we need to monitor the degree to which a key is pressed over the course of the note.

In MIDI circles a control which follows the pressure exerted on a key in real time is called aftertouch. In MIDI, there are two separate flavors of aftertouch, namely channel aftertouch and polyphonic aftertouch. The first gives a kind of average pressure for all notes played on a given channel while polyphonic aftertouch tracks each pitch separately. Needless to say, the polyphonic version consumes a lot of bandwidth and is difficult to implement cost efficiently. This means that even these simple types of intra‐note modulation are often unavailable or inadequately supported.
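On the wire the two flavors are separate MIDI channel messages: polyphonic key pressure uses status byte 0xA0 plus the channel number and carries a key number and a pressure value, while channel pressure uses 0xD0 plus the channel number and carries a pressure value only. The small decoder below is an illustrative sketch; the function name and the tuple layout it returns are my own choices.

```python
def decode_aftertouch(message):
    """Decode a raw MIDI aftertouch message given as bytes.

    Returns (kind, channel, key, pressure) with key set to None for
    channel aftertouch, or None if the message is something else.
    """
    status = message[0] & 0xF0
    channel = message[0] & 0x0F
    if status == 0xA0 and len(message) == 3:
        return ("poly", channel, message[1], message[2])
    if status == 0xD0 and len(message) == 2:
        return ("channel", channel, None, message[1])
    return None
```

The bandwidth difference is visible here: one two byte message covers the whole channel, whereas the polyphonic form needs a three byte message for every held key whose pressure changes.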

Aftertouch typically varies filter and volume settings as we would expect—the control was born to handle one subtype of dynamics control. Of course, the control can be bound to any parameters our patching architecture and synthesis algorithm allow. From a wider point of view, we again use a parameter to switch or interpolate between different sets of synthesis settings.

Trying to generalize the notion of aftertouch leads us to think whether it should really be so tightly connected to the idea of key pressure. Perhaps we should view it as a generic parameter as well. There is also an obvious need to tie such a control to the musical object it controls. When multiple such objects overlap in time, all the concerns raised over tying note events to each other above become relevant. Further complicating the situation, we might wish for more degrees of freedom, i.e. more than one separate, simultaneous stream of control data. Channel aftertouch has some additional problems of interpretation—after all, it is basically a monophonic construct which is nevertheless used in a polyphonic environment.

Discrete and continuous control

Using keyboard instruments as a reference, the controls covered up to this point seem to suffice. But other instruments offer significantly more degrees of control. For instance, string and wind instruments can be modulated in almost countless ways, most of which do not admit a simple one dimensional description. Embouchure, wind pressure, string stroke point and the various ways in which a string can be damped offer but a few examples. This is why any serious modulation architecture needs ways to pass fine grained, time variable control data as part of a musical object. Another reason for such a control function is that not all modulation belongs to a specific musical object. There are properties which need to change without any active, associated sound processing, and some which need to affect all (or at least more than one) overlapping object at a time. Such streaming control data is dubbed continuous control, or CC, again after MIDI. Its streaming, possibly note independent nature is what separates CC from aftertouch. Continuous control typically emanates from knob or joystick type controllers, and is the direct result of user interaction.

CC is used for practically every parameter one can imagine in a modern synth or module. It is so simple and straightforward. The downside is, continuous control generates huge amounts of data and multiple streams worth of CC are quite difficult to control in real time. Clearly, we should use CC mainly for slowly varying parameters which are well matched to the effect we want to accomplish. This way we can cope without extensive knob work. Used properly, continuous control gives a wonderfully intuitive handle to the internals of synthesis algorithms. This is why it is often used to liven up performances and to give the kind of hands on feeling available on analog synthesizers.

To account for configuration changes which do not move on a uniform, continuous scale, we need something to represent switch action. The help comes through discrete controllers, in MIDI implemented as a heterogeneous collection of separate methods. MIDI has at least a discrete subtype of CC, program changes, separate time events for synchronisation, all the protocols implementable by SysEx and so on. These controllers are used to carry information about pedal presses and so on. Generalizing, it can be said any discrete action is described by such a controller. Examples include patch, algorithm, routing topology, switch state changes and trigger events for automated controllers and one shot musical objects.

Like before, there is a lot of room for different variants with discrete and continuous controllers. Whether the controllers bind to specific parameters, are configurable, exhibit interactions between successive controller events, are bound to channels or musical objects, are unbound or are generally applicable vs. being limited to specific patches or modulation setups, all raise issues.

Automated control

Simple turn the knob control is nice, but for more complex sonic events, it simply does not suffice. This holds especially for real time and live performances, both situations in which one cannot just slow down and retake. We need automation. This section deals with automatic control generation methods, especially those used in modern synthesizers.

Most of the methods described in this section have originally been developed for one of two ends: to fatten a dull synthetic sound or to mimic a feature of some important enough physical instrument. The same really goes for synthesis methods and effects. Only recently (since about the mid eighties) have electronic instruments gained enough popularity, power and familiarity for sound design to transcend these traditional trends. Nowadays there are more criteria by which to judge a patch than how fat and realistic the resulting sounds are. Liveliness, density (in psychoacoustics, volume), novelty and playability are in high demand. This has acute implications for modulation architectures, and for automatic control methods in particular. Some of these new demands are pointed out in the following sections.

LFO

LFO is an acronym for low frequency oscillator. The term derives from analog synthesizers where an LFO was a special type of module operating at sub‐audio frequencies. A separate LFO was needed because the control inputs of the actual audio oscillators were not meant to be used at such low rates and as such were not accurate enough. A basic LFO, like any oscillator, produces a continuous periodic waveform. Most typically a sinusoid is used. When we feed an LFO signal to some parameter of the synthesis algorithm, we get periodic variation in the resulting sound. The typical modulation targets include oscillator frequency (resulting in vibrato or, at higher rates, frequency modulation artifacts), amplitude (to produce tremolo or amplitude modulation) or pan (especially when used with audio effects). Varying the waveform produced by the oscillator produces slightly different effects, although in typical use some basic waveforms (which include triangle, saw, pulse, rectangle and sine) may be quite hard to distinguish from each other. A special breed of LFO waveforms are the random ones. These are derived from noise, usually by a low rate sample and hold (S/H) process or by lowpass filtering. (Noise fed through a 2 to 10Hz S/H circuit routed to modulator frequency control input is used to produce the typical robot sound of 70s scifi movies: a fast series of random blips.)
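The basic waveforms and the sample and hold trick can be sketched as follows. The LFO is written as a function of phase (one cycle per unit of phase) producing values between -1 and 1, and the S/H source draws a fresh random value at a low rate and holds it in between. The function names, the bipolar output range and the control rate used in the example are assumptions, not taken from any particular instrument.

```python
import math
import random


def lfo(shape, phase):
    """Value in [-1, 1] of a basic LFO waveform at the given phase (cycles)."""
    phase %= 1.0
    if shape == "sine":
        return math.sin(2.0 * math.pi * phase)
    if shape == "triangle":
        return 1.0 - 4.0 * abs(((phase + 0.25) % 1.0) - 0.5)
    if shape == "saw":
        return 2.0 * phase - 1.0
    if shape == "square":
        return 1.0 if phase < 0.5 else -1.0
    raise ValueError("unknown shape: " + shape)


def sample_and_hold(rate_hz, duration, control_rate, rng):
    """Random stepped control signal: a new value every 1/rate_hz seconds,
    held in between, rendered at control_rate values per second."""
    hold = max(1, int(control_rate / rate_hz))
    out, held = [], 0.0
    for n in range(int(duration * control_rate)):
        if n % hold == 0:
            held = rng.uniform(-1.0, 1.0)
        out.append(held)
    return out
```

Vibrato then amounts to computing something like base_frequency * 2 ** (depth_in_semitones * lfo("sine", phase) / 12) for each control tick. Feeding sample_and_hold(5.0, 1.0, 100, random.Random()) to the same place instead gives the stepped random blips.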

LFO is a typical example of a control which is used for both fat and realism. Realistic vibrato can be achieved by letting a sine LFO drive the pitch of an oscillator around some basic setting. Controlling the amplitude of the low frequency oscillator gives us direct control over the amount of vibrato exerted. This is also the basic setup for the mod wheel CC in current synthesizers. For fat, multiple overlaid oscillators can be LFOed to different degrees. Lowpass filtered noise in the amplitude/frequency inputs of an oscillator also greatly broadens any sound emitted. Setups like these come from the early analog synths and pave the way for the wide array of LFO setups available today. LFO is, apart from envelopes, perhaps the most important tool in a sound designer’s arsenal when putting together lively synthetic instruments.

To broaden the field covered by LFO, we might consider chaining multiple LFOs in the way FM operators are cascaded. A wider array of basis waveforms or even completely user definable wavetables can be used. We can sum and cascade LFOs with other controllers. The LFO frequencies can be driven from outside. We might also consider whether there is any reason to limit LFOs to periodic or random waveforms. In synthesizer use we often need to consider the ties to note events, as well: should LFO operation be repeatable (each event causes identical results, meaning LFOs must be retriggered at each note on), or should the LFOs be freerunning, so that the actual outcome of a note event depends on the exact phase relationship between the LFO and the note event?
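
The difference between retriggered and freerunning operation comes down to what happens to the LFO phase at note on. A minimal sketch, assuming a hypothetical PhaseLFO class that is ticked once per control sample:

```python
import math

class PhaseLFO:
    """Sine LFO with an explicit phase accumulator (illustrative only)."""

    def __init__(self, rate_hz, control_rate, retrigger):
        self.increment = rate_hz / control_rate  # cycles per control sample
        self.retrigger = retrigger
        self.phase = 0.0

    def note_on(self):
        # Retriggered: every note starts from the same phase, so the
        # result is repeatable. Freerunning: the phase is left alone.
        if self.retrigger:
            self.phase = 0.0

    def tick(self):
        value = math.sin(2.0 * math.pi * self.phase)
        self.phase = (self.phase + self.increment) % 1.0
        return value
```

Calling note_on() on a freerunning instance does nothing, so the first value a note hears depends on when the note arrives.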

Enveloping

LFO is a nice thing, but it is really meant for repetitive or unstructured modulation. If we want to produce a specific time function in response to an event, we need something else. The controller which takes in events and produces a predetermined, continuous control function in response is called an envelope generator, EG. The name comes from the fact that originally EGs were mainly used to control the amplitude of oscillators. In electronics practice, the instantaneous amplitude of a waveform derived from a periodic source through multiplication by a sufficiently slowly varying multiplier forms the envelope of the waveform: the periodic waveform is enveloped by the amplitude function. Later the term was extended to cover any one shot control voltage/function. Some form of EG is a necessary part of any synthesis architecture.

Envelope generators come in many flavors. Since they have traditionally been used to generate simple, one shot events mainly for amplitude control, the basic structure is almost invariably the same. It accepts only a few time and amplitude parameters and produces a function which first smoothly rises from zero to some level and then returns to zero. There are two to six stages in the process and their length and slope are controlled by the parameters. The basic version is the so called attack–decay (AD) envelope. It consists of a ramp from zero to some predetermined value and back, with controllable times/slopes for up and down ramp. This is sufficient for most analog drum sounds. To account for varying note times, we can add a separate sustain stage to get an attack–sustain–release (ASR) envelope. In this case, the function stays at the peak until a note off is received. To model sound decay, we need one more stage: decay. Now we have the archetypal analog EG, the attack–decay–sustain–release (ADSR) envelope. It first ramps up to a peak value, then decays to a sustain level, stays there until note off and then releases to zero. All slopes and levels are settable. There are countless further variants with more or fewer parameters…
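
To make the ADSR shape concrete, here is a small Python sketch which evaluates the envelope level at a given time. It uses linear ramps and a peak of 1.0 purely for simplicity (analog EGs typically produce exponential ramps); the function name and default times are assumptions, and the attack, decay and release times must be greater than zero.

```python
def adsr(t, note_off, attack=0.01, decay=0.1, sustain=0.7, release=0.2):
    """Linear ADSR level at time t (seconds); note on at 0, note off at note_off."""

    def held_level(u):
        # Level while the key is down, u seconds after note on.
        if u < attack:
            return u / attack                                   # attack ramp
        if u < attack + decay:
            return 1.0 + (sustain - 1.0) * (u - attack) / decay  # decay ramp
        return sustain                                          # sustain

    if t < 0.0:
        return 0.0
    if t < note_off:
        return held_level(t)
    # Release starts from whatever level had been reached at note off.
    start = held_level(note_off)
    return max(0.0, start * (1.0 - (t - note_off) / release))

# Sample a one second note every 50 ms.
curve = [adsr(i * 0.05, note_off=1.0) for i in range(30)]
```

Setting sustain to 1.0 gives the ASR shape, and sending note off right after the attack approximates AD.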

Typical targets of envelope signals range from the age old amplitude envelope to filter cutoff, pan, frequency, patch morph and beyond. The applications are endless—as long as we make music based on notes and need any kind of repeatable intra‐note evolution, envelope generation is the way to go.

EGs in analog synthesizers are heavily driven by the demands of the underlying technology. Ramps are exponential, there are only a few stages in the envelopes and it may even be that none of the parameters can be controlled in real time. Digital implementations can therefore significantly expand the definition of EG. We can consider more stages, different parametrizations, new ramp shapes (linear, log, exponential, spline etc.) and so on. All parameters are easy to control by other modulation sources. We can generalize the model to the point in which there is not a fixed number of segments and ramps between them, but a list of (time,value) pairs or even (time,value,type‐of‐previous‐ramp) triplets which are rendered at runtime to arrive at the desired control signal. Parts of the envelope can loop to generate cyclic events. In fact we can think of LFO as a subtype of general EG. With EG we can combine one shot and cyclic events by substituting an arbitrary loop for the sustain portion of the simple ADSR envelope. (Note off breaks the loop.) Quite rapidly we come to think of EG not just as a simple way to generate an amplitude envelope but as a generic function generator.
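
A generalized envelope of this kind might be rendered roughly as follows. The sketch assumes a sorted list of (time, value) pairs with strictly increasing times and linear ramps only; the function names and the rule that note off jumps straight to the end of the loop are simplifications chosen for the example.

```python
def breakpoint_value(points, t):
    """Linearly interpolate a sorted list of (time, value) pairs at time t."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return points[-1][1]          # hold the last value after the end

def looped_envelope(points, t, loop_start, loop_end, note_off=None):
    """Cycle through [loop_start, loop_end) while the note is held.

    After note off the loop is broken and the tail following loop_end
    is played once (assumed behaviour, other policies are possible).
    """
    if note_off is None or t < note_off:
        if t >= loop_end:
            t = loop_start + (t - loop_start) % (loop_end - loop_start)
        return breakpoint_value(points, t)
    return breakpoint_value(points, loop_end + (t - note_off))

shape = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5), (3.0, 1.0), (4.0, 0.0)]
held = looped_envelope(shape, 3.5, loop_start=1.0, loop_end=3.0)
```

With the loop covering the whole list and no note off, the same code behaves as an LFO with a user drawn waveform.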

Up till now, we have only considered very simple functions: functions of a real time variable which produce a single real value. Or actually families of functions, since our envelopes are parametrized. But we could as well jump up the ladder and make the target a real vector space. In English, why not let an envelope generator generate multiple parameters at the same time? This can be achieved by a proper transformation from our simple real output to a space consisting of multiple controlled parameters and is precisely what is needed to control such parameters as spatial location. Similarly we can make what were previously considered attributes true parameters. This is actually implemented in many synthesizers: other controllers can affect the parameters of an envelope generator. A very simple example involves using velocity to control note onset slope. Of course, the same reasoning goes for any continuous controller. There are also strong links to morphing, presented later.

Considering how multiple simultaneous sonic events interact and thinking about the events driving envelope generation opens still new horizons. The relevant question is, why just note on and note off? Why not something else as well? Further reiterating my concern over note oriented architectures, I again want to question the tight bindings between note events and envelope generation. There is no basic reason why envelope generation should be synchronized to notes and not have its own specific methods of invocation. A design such as this would give us an elegant way of formalizing the notion of timbral melody (Klangfarbenmelodie) in the framework of digital sound synthesis.

Arpeggiation/sequencing

The ideas presented in the sections on LFO and EG work fine when we desire to control continuous parameters: EG and LFO automate CC. But what about events? Could we automate discrete and event based control as well? The answer is, again, in the affirmative. Event generation usually comes under the heading of sequencing (when used to drive an entire synthesis/processing architecture) or arpeggiation (when we talk about generating events for an independent part of a larger synthesis architecture). There is a strong analogy between discrete and continuous control, arpeggiation and generalized LFO, and sequencing and EG/function generation, respectively.

The most straightforward variant of arpeggio involves toggling between a set of two or three sounds at a rapid pace. This is the form which was once again first employed in analog synthesis. Going up a notch, we might give an arbitrary sequence of pitches and then begin to control the length of the inter‐note time interval. After this it is easy to add looping, first the whole sequence, then parts of it. Now only the jump to real events remains. After this we have a very versatile way to generate trains of repeatable time events, complete with any parameter information we need. To add liveliness, some parameters may be subjected to other forms of modulation, such as plain continuous control. Or methods can be cascaded and/or overlaid.
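
A bare bones arpeggiator along these lines fits in a few lines of Python. The event format (onset, pitch, duration), the MIDI style note numbers and the gate parameter are assumptions for the sake of the example.

```python
def arpeggiate(pitches, step, gate=0.5, repeats=1, start=0.0):
    """Return (onset, pitch, duration) events cycling through pitches.

    step is the inter-note interval in seconds, gate the fraction of the
    step each note sounds, repeats the number of loops over the sequence.
    """
    events = []
    for i in range(len(pitches) * repeats):
        onset = start + i * step
        events.append((onset, pitches[i % len(pitches)], gate * step))
    return events

# A C major triad looped twice at eight notes per second.
pattern = arpeggiate([60, 64, 67], step=0.125, repeats=2)
```

Looping only part of the sequence, or attaching further parameters to each event, amounts to extending the tuples and the indexing.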

To see concrete applications, we once again turn to techno. Techno (and especially acid house) is filled with drones: highly repetitive melodies composed mainly of pitch and timbre changes. To make things interesting, we might put synth filter settings or reverb level partly under knob control.

Sequencing is the more comprehensive variant of arpeggio. Most often this means that sequencers work at the level corresponding (at least roughly) to MIDI events. One sign of the difference in conceptual level is that while few arpeggiators connect events to form on/off pairs or allow overlapping events (multichannel operation), in sequencers, this is the norm. To account for this level change, sequencers attempt to normalize the data model so that instead of working with actual synthesis parameters or modulation targets, we pick events from a more limited set, such as MIDI notes. If we have enough variety and the modulation architecture allows it, this is not a significant limitation: we can always view what happens in the sequencer as metaevents which are mapped to more complex event chains at a lower level. The main application of sequencers is the editing and storage of songs. Music as opposed to signals and events, that is. This is why sequencers usually offer fancy editing features and a nice, ergonomic user interface. For a comparison, most arpeggiators don’t do anything even nearly as fancy.

Transitional processing

It has already been mentioned that some interaction between overlapping and consecutive musical objects could benefit us. It seems that most musically relevant bindings between overlapping events must be implemented at audio rate, i.e. at the level of the underlying synthesis architecture, so primarily we are left to consider how adjacent events affect each other. In this arena there is a whole field of transformation and transition idioms which we inherit from traditional music. Idiomatic transitions have mostly evolved as a natural part of playing an instrument with limited polyphony: wind, brass and string. Perhaps the most well known such idiom is portamento, sliding from one tone to another. Adapting the notion to the framework of synthesis and sound processing, we get interpolation from one set of parameters to another. As long as we have musically sensible, uniform scales in which to work, the notion applies equally to amplitude and timbral settings. Some need may therefore arise for additional parameter transformation to generate trajectories better suited for straightforward mutation.
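
Portamento shows why the choice of scale matters. In the sketch below the glide is linear in log frequency (that is, in pitch) rather than in Hertz, so that equal time steps give equal musical intervals. The function name and the fixed glide time are assumptions for the example.

```python
def glide(from_hz, to_hz, t, glide_time):
    """Frequency t seconds into a portamento lasting glide_time seconds."""
    x = min(max(t / glide_time, 0.0), 1.0)      # progress, clamped to [0, 1]
    return from_hz * (to_hz / from_hz) ** x     # linear in log frequency

# Halfway through an octave glide we are a tritone up, not at 330 Hz.
midpoint = glide(220.0, 440.0, 0.5, 1.0)
```

The same interpolation applied to amplitude would call for a decibel scale, for the same reason.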

Interpolation (in the generalized setting, morphing) is nice but does not capture the whole breadth of meaningful transitional effects. An excellent example is the occasional need to generate new effects to achieve a meaningful transition. More concretely, a good physically modelled wind instrument might benefit from simulating a note transition as a sequence of randomly sequenced key hole state changes. Or we might enhance a sample based guitar instrument by selectively inserting fret and plectrum noise. The point is, there is a definite need to generate additional events and control trajectories beyond simply sliding from one parameter state to another.

In fact, some entire synthesis methods are based on the notion of generating transitions. We might see Roland’s linear arithmetic (LA) synthesis as an instance of this class: the synthesis model combines a sampled transient with a simple looped wavetable with envelopes. The transient could be seen as a primitive form of transition processing. A more comprehensive example is given by diphone synthesis, which is based on LPC analysis‐resynthesis and models note to note transitions by driving the LPC synthesizer with a set of precalculated sets of coefficients, one for each before‐after pair of notes. The paradigm is employed in many commercial samplers as well. In fact, this is the reason the concept of layers was originally introduced into sampling: there is a need to model one‐shot, transitory phenomena in addition to playing and looping entire stored waveforms.

Parameter morphing

Morphing has been repeatedly referred to in the preceding. The word comes from metamorphosis, which means transformation. In the musical context it means generating a smooth set of intermediate states to achieve a fluent transition from one state of parameters to another. This is perhaps the most general definition. We must note that it is impossible to develop a general morphing technique if the start and end states do not admit similar parametrization. In English, this means that morphing is only possible between patches that employ the same synthesis algorithm, patches that have the same parameters. When this holds, the problem can be seen as one of trajectory generation in a high dimensional space. It is quite essential that all the parameters which take part in the interpolation process are continuous. This is because otherwise it may be impossible to avoid discontinuities along the way: we want to generate an approximation to a continuous trajectory in our state space.

Once we have an algorithm to generate the intervening steps between two states, we can define a new controller which at its extremities assigns the synthesis algorithm one of the two base states, and in between generates a hybrid form. This is the morph controller and it acts much like any other controller. We can drive it with an LFO, for instance. We can go even further by considering hybrids of three or more states. The morph controller can take other parameters as well—there is always a multitude of ways to generate the trajectory from one state to the next. The trajectory generation method can therefore vary arbitrarily and can be chosen by external input. Morphing from one state to another easily generalizes to morphing through entire lists of states, under sequencer control. The decoding step of the GSM voice codec can be described as precisely such a process, as can most (analysis‐)resynthesis methods.
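
In its simplest form the morph controller is plain linear interpolation between two patches with identical parameter sets. The sketch below represents patches as Python dictionaries of continuous values; the parameter names are invented for illustration, and a straight line is only one of the many possible trajectories.

```python
def morph(patch_a, patch_b, position):
    """Blend two patches; position 0.0 gives patch_a, 1.0 gives patch_b."""
    if patch_a.keys() != patch_b.keys():
        # Morphing only makes sense between identically parametrized states.
        raise ValueError("patches must share the same parameters")
    return {name: (1.0 - position) * patch_a[name] + position * patch_b[name]
            for name in patch_a}

# Hypothetical patches for the same synthesis algorithm.
dull = {"cutoff_hz": 200.0, "resonance": 0.1, "level": 0.8}
bright = {"cutoff_hz": 2000.0, "resonance": 0.5, "level": 0.6}
halfway = morph(dull, bright, 0.5)
```

Since position is just a number, it can be driven by an LFO, an envelope or a knob like any other controlled parameter.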

Finally, to step down from the skies, Emu’s Morpheus does just the kinds of multidimensional morphing described here. It allows morphing between arbitrary patches under combined user and envelope control. Basically, the Morpheus implements a sort of three dimensional morphing scheme.

Signal dependent modulation

 ‐Versus effects?
 ‐Is there really a difference?
 ‐Discuss from the point of view of methodology
  ‐can a scheme like this be used as an afterthought?
  ‐an effect usually can…

We often encounter situations in which we would like very fine grained control over a large set of parameters. The work needed to generate the data may be too large, however. Occasionally we can instead extract the parameters from an existing signal. A basic example is the design of additive instruments: we do not need to produce kilobyte after kilobyte of parameter data if we can extract the data from existing instrumental samples. From the controller viewpoint, this means extracting data from some existing signal on the fly and generating control output from it. Such methods have lots of applications and are also tightly connected to effects and analysis‐resynthesis applications.

Above, in the effects chapter, we have already encountered a rather sophisticated example of precisely the kind of control proposed here. Namely, vocoding extracts control data for a high order filter from an existing sound signal. Dynamics processing offers a simpler example in the form of ducking: here we use the amplitude of one signal to drive the dynamics processor used on another. By now we can see that driving arbitrary parameters by the extract probably isn’t as useful as extracting a parameter and substituting it for a similar one in the controlled process. It is rather clear that this is the easiest way to achieve predictable effects without extensive transformation and mangling of the control data. This is not to say that mapping is not good, of course. Quite impressive effects can be achieved by reorganizing parameter data before resynthesis. It is how vocoders can be turned into harmonizers, pitch quantizers and so on. But in a general purpose modulation architecture arbitrary routings will mostly result in confusion and chaos.
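
Ducking is simple enough to sketch in full. The Python below extracts an amplitude envelope from one signal with a one pole follower and uses it to turn down another. The function names, the per sample smoothing coefficients and the depth parameter are assumptions for illustration, and signals are plain lists of samples with magnitudes up to 1.

```python
def envelope_follower(signal, attack=0.5, release=0.01):
    """Track the amplitude of a signal with separate rise and fall speeds.

    attack and release are per-sample smoothing coefficients in (0, 1].
    """
    level = 0.0
    out = []
    for x in signal:
        target = abs(x)
        coeff = attack if target > level else release
        level += coeff * (target - level)
        out.append(level)
    return out

def duck(program, sidechain, depth=0.8):
    """Reduce the gain of program in proportion to the sidechain amplitude."""
    env = envelope_follower(sidechain)
    return [p * (1.0 - depth * min(e, 1.0)) for p, e in zip(program, env)]

# Music at constant level, ducked while a (full scale) voice is present.
ducked = duck([1.0] * 50, [1.0] * 50)
```

Here an extracted amplitude drives an amplitude, which is exactly the like for like substitution argued for above.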

Parameter extraction can be used at a more basic level, too. Sometimes the signals we want to use to control our instruments and effects come from sources which do not have a clear cut digital interface. The most well known candidate is the electro‐acoustic guitar. In cases such as this, a parameter estimation stage is a fundamental part of interfacing to the controller and can be fitted to our formalism under the current heading.

It was mentioned above that close ties between analysis‐resynthesis, parameter extraction‐reuse and effects exist. Where do the boundaries lie, then? A useful guideline is that if a direct mapping between analysis and the use of the ensuing parameter data exists (like frequencies to frequencies with no transformation), we have analysis‐resynthesis. If on the other hand extensive transformations are used and/or the usage stage involves considerable amounts of variables which are independent of the extracted ones, then we are reusing generated parameters or perhaps doing cross‐synthesis. Finally, if the nature of our setup is such that only slight quantitative changes arise when the extracted parameters change and the bulk of the other input data involved on the synthesis side is audio rate, we are probably in the realm of effects. To put it differently, if something can be used as an after‐thought, it is an effect.

Support functions

In this chapter it has become clear that the control functions associated with sound synthesis, effects and other processing are not at all simple. In fact, they often constitute the bulk of effort going into the design and implementation of digital sound systems. This is greatly compounded by the fact that control as such is never enough. Since there can be substantial amounts of control data associated with each control (hundreds or thousands of parameters, in the case of sequencers even more), proper ways of displaying, searching and editing the data are essential. There are also many one time tasks which need to be accomplished before the control data can be put into real use. Examples include such editing operations as quantizing and transposing sequencer data, compressing additive analysis data for later use in synthesis and so on.
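
Two of the editing operations mentioned, quantizing and transposing, are easy to express once sequencer data is viewed as a list of events. The sketch assumes events are (onset, pitch) pairs with onsets in beats and MIDI style note numbers; real sequencers carry far more per event.

```python
def quantize(events, grid):
    """Snap every onset to the nearest multiple of grid (in beats)."""
    return [(round(onset / grid) * grid, pitch) for onset, pitch in events]

def transpose(events, semitones):
    """Shift every pitch by a number of semitones."""
    return [(onset, pitch + semitones) for onset, pitch in events]

# A sloppy performance, tidied to sixteenth notes and moved up an octave.
played = [(0.13, 60), (0.49, 62), (1.02, 64)]
tidy = transpose(quantize(played, 0.25), 12)
```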

Depending on the environment, these support functions take various forms. In a lowly rack mounted effects module, there is not much room for a fancy user interface or an extensive editing architecture. In a PC sound editing environment, anything is possible. To achieve a uniform, easy to work in environment, many device manufacturers have ditched onboard editing functions altogether and migrated to separate editing programs running on a standard PC. Data is moved through MIDI or, sometimes, SCSI/SMDI. This makes it possible to utilize the lavish graphical user interfaces, all‐round device support and huge software base present on the desktop. It also leads to easy maintenance and upgrade of the editing software, easy interchange of patch data (for reference, see the extensive CD‐ROM patch collections marketed in industry magazines) and more open interfaces to the equipment. These are also the reasons why sound processing to an increasing degree centers around computers.

It is also noteworthy that some of the data and tasks associated with sound synthesis, effects and their control are very different from the kinds of things the standalone equipment was originally meant to handle. Whereas computation speed, tried and true interfaces, out‐of‐the‐box usability and nice sounding presets are the currency of the hardware trade, the control side of the equation concerns itself with very different values. Here, openness of architecture, interfacing across vendor boundaries, easy maintenance of large databases and the need to give the best possible tools to the musician are what counts. This means the separation between actual signal processing apparatus and the higher level editing, transformation and archiving functions is only natural. The only real obstacles to this development are the low speed and representational power of MIDI and the lack of alternative standards for data interchange. This may well cause some real commotion in the music industry—the computer circles are famous for their speed in reacting to deficiencies. It may well be that tomorrow’s audio architecture comes from within the computer circles rather than the traditional music industry.