Vocabulary

Absorption

In absorption, wave motion is attenuated, usually by conversion into heat. The unabsorbed part of the original wave is typically reflected or transmitted.

AES

Audio Engineering Society; an American organization of audio engineers which standardizes audio‐related technology and forms a common forum for experts in the field.

AES10 or MADI

Multichannel Audio Digital Interface; a unidirectional multichannel digital audio transmission standard originated by the AES. MADI is based on the FDDI (Fibre Distributed Data Interface) transmission format, but usually uses coaxial cable instead of optical fibre. Accommodates up to 56 channels and 24 bits per sample. Used for point‐to‐point multichannel digital audio connections in studio and broadcasting environments.

AES/EBU digital audio bus (AES3)

A digital sound transmission standard which is based on a synchronous, self‐clocking RS‐422A compatible physical layer on top of which stereo digital audio and associated data (called sub‐channel data) is transmitted. The standard has been strongly influenced by CD technology, and is mainly used between digital studio equipment. The standard specifies multiple sample rates (32 kHz to 48 kHz) and sample bit depths (up to 24 bits per sample). Originally developed by AES, later adopted by EBU, hence the name; the correct, official name for the standard is AES3‐1985.

Beating

A phenomenon produced by the interference of two sinusoidal sounds of nearly equal frequency; the audible sensation is that of a single sound, periodically modulated in amplitude. The reason for beating is twofold: human hearing cannot separate two frequencies that lie close to each other, and the sum of two such sinusoids is mathematically equivalent to a sine wave at their mean frequency whose amplitude is modulated sinusoidally.
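
The mathematical equivalence mentioned above follows from the sum‐to‐product identity. As a sketch, assume two unit‐amplitude sinusoids at frequencies $f_1$ and $f_2$ (symbols chosen here for illustration only):

```latex
\sin(2\pi f_1 t) + \sin(2\pi f_2 t)
  = 2\cos\!\left(2\pi\,\frac{f_1 - f_2}{2}\,t\right)
     \sin\!\left(2\pi\,\frac{f_1 + f_2}{2}\,t\right)
```

The sine factor is a tone at the mean frequency and the cosine factor is a slowly varying envelope. Since loudness follows the magnitude of the envelope, the perceived beat rate is $|f_1 - f_2|$, for example 2 beats per second for tones at 440 Hz and 442 Hz.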

Causality

A property of systems which states that the system does not laugh before it is tickled: nonzero output only results from nonzero input and the output never occurs before the input. Causality is a necessary condition for realizability: in the real world, systems simply do not know their future history…

Clipping/overdrive/saturation

When circuits or transmission media are driven past the point of their maximum input amplitude, they tend to limit the signal to its maximum value. This can happen sharply (digital full scale) or softly (the sigmoid type limiting action of analog tape) and results in the effect of hard or soft limiting, respectively. Limiting produces heavy sidebanding and, consequently, harsh and nonconsonant distortion. Synonymous terms are overdrive (especially when speaking of amplifiers) and saturation (taken from tube amplifier terminology).
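
The difference between sharp and soft limiting can be sketched in a few lines. The function names, the unit limit and the choice of `tanh` as the sigmoid are assumptions made for illustration; real tape or amplifier saturation curves differ in detail.

```python
import math

def hard_clip(x, limit=1.0):
    """Limit x sharply to the range [-limit, +limit] (digital full scale style)."""
    return max(-limit, min(limit, x))

def soft_clip(x, limit=1.0):
    """Limit x smoothly with a tanh sigmoid that approaches +/-limit (assumed curve)."""
    return limit * math.tanh(x / limit)

# One period of a sine wave driven to twice the maximum amplitude.
overdriven = [2.0 * math.sin(2 * math.pi * n / 32) for n in range(32)]
hard = [hard_clip(s) for s in overdriven]   # flat tops at exactly +/-1.0
soft = [soft_clip(s) for s in overdriven]   # rounded tops that stay below 1.0
```

Small signals pass through `hard_clip` unchanged, whereas `soft_clip` already compresses them slightly; this gradual onset is what makes soft limiting sound less harsh.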

CODEC

Coder/decoder. When talking about data transmission, a coder/decoder is a device or algorithm which works on a bidirectional data link, coding transmitted and decoding received data. Audio codecs usually use computer files, multimedia data streams or TV broadcast channels for their data.

Composite

A loose term used in this tutorial to mean sound signals that originate not from a single, but multiple sound sources, notably instruments. Examples include most recordings and natural sounds.

DCC

Digital Compact Cassette; an attempt by Philips at establishing a digital successor to the common analog C(ompact) cassette. The format utilizes a specially shielded cartridge with the dimensions of the traditional C‐cassette. The tape itself is similar as well, so DCC players can play analog cassettes. Because of this (and because recording on both sides of the tape would have been difficult), rotary heads were not used. Although multiple parallel tracks are used, the data rate of the tape is severely limited by the design, so data compression needs to be used. The method chosen by Philips is MPEG layer 1 audio coding, only under the name PASC. DCC is now officially a dead format, killed by technical difficulties and heavy competition from Sony’s MiniDisc.

decibel

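A logarithmic unit of ratios. Defined as ten times the base‐ten logarithm of a power quantity divided by a reference level; for amplitude quantities such as voltage or sound pressure, whose square is proportional to power, the factor is twenty. For different quantities measured and different reference levels we get all sorts of decibel measures. The most typical are dB SPL (RMS sound pressure relative to 20 µPa, the nominal threshold of hearing), dBu (RMS voltage relative to approximately 0.775 V), dBV (RMS voltage relative to 1 V) and dBW (power relative to 1 W).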
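
As a minimal sketch of the arithmetic, the two helpers below convert a ratio to decibels. The function names are hypothetical. The factor of 20 for amplitude quantities (voltage, sound pressure) arises because power is proportional to the square of amplitude.

```python
import math

def power_db(power, reference):
    """Decibels for a power quantity: 10 * log10(power / reference)."""
    return 10.0 * math.log10(power / reference)

def amplitude_db(amplitude, reference):
    """Decibels for an amplitude quantity (voltage, pressure): 20 * log10(ratio)."""
    return 20.0 * math.log10(amplitude / reference)

# Doubling the power adds about 3 dB; doubling the amplitude adds about 6 dB.
two_watts_dbw = power_db(2.0, 1.0)        # 2 W relative to 1 W
double_amplitude = amplitude_db(2.0, 1.0)
```

Choosing a different `reference` is all that distinguishes one decibel measure from another.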

EBU

European Broadcasting Union; an organization formed originally by national radio stations in Europe. Specializes in broadcast audio distribution technology. Current standardization efforts include terrestrial digital radio, both for audio and various kinds of data.

Harmonic (overtone)

Given a signal, we can decompose it with the Fourier transform. Then, a harmonic (of some particular frequency present in the transform) is any frequency (also present in the analysis) which is at a whole number ratio to our base tone. If the signal is periodic, every partial present in the analysis is harmonic. The term implies an underlying base tone to which the harmonic in question is related. (Thus, one doesn’t say that components present in a composite signal are necessarily harmonics, even though they may appear in integer frequency ratios.)

HRTF

Head related transfer function; the transfer function of the system resulting from the linear filtering action of placing the human body (especially the head) in a sound field. The main components arise from shoulder and ear lobe reflections and from diffraction effects on sound travelling around the head. Also used to denote the impulse response of such a system or any processing method simulating such a system (the usage is quite fuzzy indeed).

IMA

International MIDI Association; a consortium formed as a place for users of MIDI and related software to discuss their problems and proposals. IMA keeps close contacts with MMA to relay user input and suggestions to manufacturers.

Interpolation

Interpolation is a technique used to reconstruct waveforms from discrete samples taken from them. Many different such techniques exist, differing by their underlying mathematical structure. The most common ones are based on fitting a polynomial of some degree to the sample data. From this come the terms linear, quadratic, cubic and so on. These refer to the degree of the polynomial that is fitted. Interpolation methods are also named after the families of polynomials used (Chebyshev, Legendre etc.) and their construction (NURBS). Common to all these methods is that they strive for optimality in some sense; most try to achieve smooth approximations with the resulting curve passing through the data points. When used to reconstruct an acoustic waveform from evenly spaced samples, polynomial interpolation is never the optimal way. Instead, reconstruction by approximations to perfect lowpass reconstructing filters should be used.
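
The contrast between polynomial interpolation and lowpass reconstruction can be sketched as follows. The function names, the Hann window and the kernel half‐width of 16 samples are assumptions chosen for illustration; practical resamplers use more carefully designed filters.

```python
import math

def linear_interp(samples, t):
    """Degree-1 polynomial interpolation between the two samples around t."""
    i = max(0, min(int(math.floor(t)), len(samples) - 2))
    frac = t - i
    return (1.0 - frac) * samples[i] + frac * samples[i + 1]

def sinc(x):
    """Normalized sinc, the impulse response of the ideal lowpass filter."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def windowed_sinc_interp(samples, t, half_width=16):
    """Approximate ideal lowpass reconstruction with a Hann-windowed sinc kernel."""
    centre = int(math.floor(t))
    total = 0.0
    for n in range(centre - half_width + 1, centre + half_width + 1):
        if 0 <= n < len(samples):
            d = t - n
            window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
            total += samples[n] * sinc(d) * window
    return total

# Test tone at 0.2 cycles per sample, well below the Nyquist limit of 0.5.
samples = [math.sin(2 * math.pi * 0.2 * n) for n in range(200)]
t = 100.5
exact = math.sin(2 * math.pi * 0.2 * t)
linear_error = abs(linear_interp(samples, t) - exact)
sinc_error = abs(windowed_sinc_interp(samples, t) - exact)
```

For this tone the linear estimate halfway between two samples is off by more than a tenth of full scale, while the windowed sinc estimate is considerably closer to the exact value.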

Medium

A scientific term used to denote the underlying substance or space where waves travel. In the context of audio systems, air is usually the medium, although compressible liquids and solids can also transmit sound.

MMA

MIDI Manufacturers’ Association; a consortium formed to promote and refine the MIDI specification and to guide in the implementation of the standard. MMA extensions to the original MIDI specification include MIDI time signaling and SDS.

Modulation

The variation of some characteristic of a signal or a parameter of an algorithm producing the signal to achieve some specific goal. Examples include amplitude modulation, a time‐variant scaling of a signal (AM, tremolo), and frequency modulation, a variation of the repetition rate of a (quasi‐)periodic signal (FM, vibrato).

MP3

MPEG‐1 or MPEG‐2 (sic!), layer 3 audio coding; a lossy, perceptual audio coding format widely used for the transmission of stereophonic sound, both in commercial and non‐commercial environments. Layer 3 is the most sophisticated of the 3 layers specified for MPEG‐1 and MPEG‐2. (They share the same audio bitstream formats, only the allowed bitrates differ. Funny enough, MPEG‐2 allows only three of the lower bitrates.) The standard does not specify the codec, per se, only the bitstream syntax. However, implementation issues have stabilized fairly well by now. MP3 offers excellent audio quality for music and similar sound encountered on soundtracks at relatively low bit‐rates (in the range from 48 kbps to 192 kbps). It isn’t suitable for very low bitrate speech coding, for which different methods exist. The acronym comes from the common filename extension used for files of this content. (FYI: Philips used MPEG‐1 layer 1 audio coding in DCC, only under the name PASC.)

MPEG

Moving Picture Experts Group; a joint ISO/IEC working group of experts which standardizes the coding of moving pictures and associated audio. Commonly known for its MPEG‐1, MPEG‐2 and MPEG‐4 standards, which pertain to the digital coding and transmission of moving picture and associated sound (MPEG‐1 and MPEG‐2), and multimedia (MPEG‐4, in draft stage).

Partial

Given a signal, we can decompose it with the Fourier transform. Then, a partial (of some particular frequency present in the transform) is any frequency present in the analysis. The term implies an underlying base tone to which the partial in question is related. (Thus, one doesn’t say that components present in a composite signal are necessarily partials.)

PQ code(d)

PQ refers to the first two (of the eight, named from P to W) subchannel bits on CDs. These are used to carry auxiliary data, such as track information, the table of contents (TOC), catalog numbers, ISRC (International Standard Recording Code) information, de‐emphasis status, SCMS copy propagation control and so on. The majority of this information is carried over the Q channel, accumulated in 98‐bit frames, whereas the P channel carries a simplistic code denoting the starts and ends of CD tracks, lead‐in and lead‐out areas. Most current CD players are sophisticated enough not to use the P channel code at all, since all relevant information is also available through the more sophisticated Q coding scheme. The addition of PQ code is a major portion of the CD mastering process; masters bound for the manufacturing plant are often simply referred to as being PQ coded or PQed.

RCA

Radio Corporation of America; a long gone manufacturer of radio and audio equipment in the US.
A connector used to interface audio equipment at the nominal consumer line level of −10 dBV in an unbalanced mode of transmission. These take the form of an 8 mm wide ground plug/jack with a concentric 2 mm plug/jack (opposite polarity) in the center for signal and are usually color coded red for the right channel and white or black for the left. Originated by RCA the Company. Currently the most common way of interfacing consumer audio equipment.

SCMS

Serial Copy Management System; a protocol used for restricting digital copying of audio material in consumer applications. Based on sub‐channel coding of generation identifiers and copy protection bits on digital audio media, such as DATs and CDs. Only implemented in consumer mode applications; pro mode applications ignore SCMS. AES/EBU in pro mode cannot even convey SCMS information.

Shift‐invariance or time‐invariance

A property of (linear) systems which states that the response of the system does not depend on the time at which an input is applied: delaying the input by some amount delays the output by the same amount and changes nothing else. An FIR or IIR filter with constant coefficients is a prime example. A flanger or a tremolo is a prime example to the contrary.
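
The property can be checked numerically by comparing "delay, then process" against "process, then delay". The sketch below uses a three‐tap FIR filter and a simple tremolo; the coefficients, modulation rate, depth and all function names are hypothetical values chosen for illustration, and a single passing test signal suggests but does not prove invariance.

```python
import math

def fir_filter(x, coeffs=(0.5, 0.3, 0.2)):
    """Constant-coefficient FIR filter: y[n] = sum over k of coeffs[k] * x[n - k]."""
    return [sum(c * x[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(len(x))]

def tremolo(x, rate=0.05, depth=0.5):
    """Amplitude modulation whose gain depends on the absolute sample index n."""
    return [(1.0 - depth + depth * math.cos(2 * math.pi * rate * n)) * s
            for n, s in enumerate(x)]

def delay(x, d):
    """Delay a signal by d samples, padding with zeros and keeping its length."""
    return [0.0] * d + list(x[:len(x) - d])

def is_shift_invariant(system, x, d, tol=1e-12):
    """True if delaying the input gives the delayed output, for this x and d."""
    delayed_input_response = system(delay(x, d))
    delayed_response = delay(system(x), d)
    return all(abs(p - q) <= tol
               for p, q in zip(delayed_input_response, delayed_response))

x = [1.0, 0.5, -0.25] + [0.0] * 13
fir_ok = is_shift_invariant(fir_filter, x, 3)      # filter passes the check
tremolo_ok = is_shift_invariant(tremolo, x, 3)     # tremolo fails the check
```

The tremolo fails because its gain is tied to absolute time: a delayed input meets a different part of the modulation cycle.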

SMDI

SCSI Musical Data Interchange; a data interchange standard originated in 1991 by Peavey Electronics. In the late 1980s and early 1990s, samplers were coming into fashion and a standardized way to exchange sample data was needed. As MIDI was quite old and extremely slow (MIDI choke was a problem even then), it was seen that a new bus was needed. As the SCSI (Small Computer System Interface) bus already existed and had proven to be interoperable, SMDI leveraged the existing technology. Nowadays SMDI can be used to convey all kinds of information besides pure sample data and is invaluable whenever samplers need to be integrated with the rest of the studio. As an added bonus, computer connectivity and use of existing SCSI hard drives became possible.

SDS

Sample Dump Standard; standardized by the MIDI Manufacturers’ Association, this protocol allows unified downloading of sample data to synthesizers and samplers through the MIDI bus. Utilizes SysEx messages and offers two separate modes: open loop and closed loop. Open loop corresponds to the usual MIDI connection topology; in the closed loop configuration a separate return cable is used to provide feedback. SDS is extraordinarily slow, even in the context of the MIDI physical layer. In addition, operating SDS reliably is quite difficult (to use SDS in closed loop mode, the physical cabling has to be changed, for instance) and so the standard is not currently widely deployed in studio environments.

Sidebands

Frequency components added to a signal when it is put through a suitable modulation process. AM and FM in particular produce sidebands. The name implies roughly symmetrical placing of the added components relative to the original unmodulated signal.
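
The symmetrical placement can be seen in a small numerical experiment. The sketch below amplitude modulates a carrier and inspects its spectrum with a directly evaluated DFT; the block length, bin numbers and modulation depth are assumptions chosen so that every component falls exactly on a DFT bin.

```python
import cmath
import math

N = 64            # block length in samples (assumed)
CARRIER_BIN = 16  # carrier frequency, in DFT bins
MOD_BIN = 2       # modulator frequency, in DFT bins
DEPTH = 0.5       # modulation depth

# Amplitude modulation: the carrier is scaled by a slowly varying gain.
signal = [(1.0 + DEPTH * math.cos(2 * math.pi * MOD_BIN * n / N))
          * math.cos(2 * math.pi * CARRIER_BIN * n / N) for n in range(N)]

def dft_magnitude(x, k):
    """Amplitude of the sinusoidal component at bin k (valid for 0 < k < len(x)/2)."""
    acc = sum(s * cmath.exp(-2j * math.pi * k * n / len(x)) for n, s in enumerate(x))
    return 2.0 * abs(acc) / len(x)

spectrum = [dft_magnitude(signal, k) for k in range(N // 2)]
peaks = {k: round(m, 3) for k, m in enumerate(spectrum) if m > 1e-6}
# peaks -> {14: 0.25, 16: 1.0, 18: 0.25}
```

The carrier at bin 16 is unchanged, and one sideband appears on each side of it at a distance equal to the modulator frequency, each with amplitude `DEPTH / 2`.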

SMPTE

Society of Motion Picture and Television Engineers; an organization of motion picture and television technology experts that standardizes technical aspects of moving picture and related data (such as audio) transmission and coding, such as frame rates, time codes and modulation techniques. Responsible for the time code format of the same name which is commonly used in broadcasting, film production and professional audio applications as a common synchronisation standard to relate pieces of audiovisual presentations together.

S/PDIF or IEC‐958

Sony/Philips Digital Interface; a consumer derivative of the AES/EBU bus. Standardized by the International Electrotechnical Commission under the name IEC‐958, but marketed as S/PDIF for consumer applications. (Technically, these are two different standards but in practice, they are almost identical. They interoperate perfectly.) Uses simplified AES/EBU (consumer mode) and includes provisions for copy management through SCMS. Used primarily for digital audio transmission in consumer applications, such as CD players, DATs, MD players, and DCC recorders. Applied on top of both electrical and optical interfaces.

SPL

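Sound Pressure Level; a measure of the effective (RMS) sound pressure. Defined on the decibel scale as twenty times the base‐ten logarithm of the RMS sound pressure divided by a reference pressure of 20 µPa, which corresponds roughly to the threshold of hearing. Because sound intensity is proportional to the square of sound pressure, this equals ten times the logarithm of the corresponding power ratio.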

Stability

A property of linear systems stating that given any bounded input, the system will produce a bounded output. For causal discrete‐time linear time‐invariant systems with a rational system function, an equivalent statement is that the system function has no poles on or outside the unit circle. In that case the impulse response approaches zero as time goes to infinity and the system does not exhibit self‐oscillation.
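
For a causal second‐order recursive filter the pole criterion is easy to evaluate by hand. The sketch below assumes the difference equation y[n] = x[n] − a1·y[n−1] − a2·y[n−2]; the function names and the two coefficient sets are hypothetical examples.

```python
import cmath

def biquad_poles(a1, a2):
    """Poles of H(z) = 1 / (1 + a1*z^-1 + a2*z^-2), i.e. roots of z^2 + a1*z + a2."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return ((-a1 + disc) / 2.0, (-a1 - disc) / 2.0)

def is_stable(a1, a2):
    """True if both poles lie strictly inside the unit circle."""
    return all(abs(p) < 1.0 for p in biquad_poles(a1, a2))

def impulse_response(a1, a2, length):
    """Run y[n] = x[n] - a1*y[n-1] - a2*y[n-2] on a unit impulse."""
    y = []
    for n in range(length):
        x = 1.0 if n == 0 else 0.0
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(x - a1 * y1 - a2 * y2)
    return y

decaying = impulse_response(-1.6, 0.81, 200)   # pole radius 0.9: dies away
growing = impulse_response(-2.0, 1.21, 200)    # pole radius 1.1: blows up
```

With the poles at radius 0.9 the impulse response decays towards zero, while at radius 1.1 it grows without bound, which is the self‐oscillation the definition refers to.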

Subband coding

A generic term describing a method of coding signals where filter banks are used to divide the signal into several (frequency) subbands and their output is encoded instead of the signal itself. Decoding is done with a reconstruction filter bank. Includes transform coding as a subset.

Subchannel coding

The transmission of auxiliary data on CD data frame subchannel bits. Includes PQ coding of track and SCMS data, as well as the additional data oriented applications standardized as CD+G and CD+MIDI. Later, the same coding was transferred to AES/EBU frames and DAT tape.

Superposition

The addition together of multiple signals.
Mathematically, the superposition principle characterizes linear systems. What it says is that, first, if we input two signals to the system and add the respective outputs, we get the same result as we would get by inputting the sum of the original signals and observing the output (additivity). Second, if we amplitude scale the signal by a constant and observe the system output, the result is equal to inputting the unscaled signal and only after that scaling with the constant (homogeneity). All this put into a single formula gives the superposition principle. It is usually applied backwards when we already know the system obeys linearity.
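
Written out, with $T\{\cdot\}$ denoting the system, $x_1$ and $x_2$ arbitrary input signals and $a$, $b$ arbitrary constants (notation assumed here for illustration), the combined statement reads:

```latex
T\{a\,x_1 + b\,x_2\} = a\,T\{x_1\} + b\,T\{x_2\}
```

Setting $a = b = 1$ recovers additivity, and setting $b = 0$ recovers homogeneity.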

Transform coding

A generic term describing a method of coding signals where a suitable (linear) transformation (such as the FFT or the DCT) is used to break up the signal before coding. Decoding is accomplished by the inverse transform (or an approximation to it, if the coding process is lossy). Specifically, the term often implies block coding: the signal is first divided into blocks and the transformation is applied separately to each of these. Transform coding is a special case of subband coding. The idea is to concentrate the energy (variance) of the signal into as few coefficients as possible, giving space‐efficient representation and leading to compression.
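
The energy concentration idea can be illustrated on a single block. The sketch below uses a directly evaluated orthonormal DCT on eight made‐up sample values and keeps only the three largest coefficients; the block contents, the number of kept coefficients and the function names are assumptions for illustration, and a real coder would quantize coefficients rather than simply discard them.

```python
import math

def dct(block):
    """Orthonormal DCT-II of one block."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, x in enumerate(block)))
    return out

def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out

def keep_largest(coeffs, count):
    """Zero all but the `count` coefficients of largest magnitude."""
    keep = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)[:count]
    return [c if k in keep else 0.0 for k, c in enumerate(coeffs)]

block = [10.0, 11.0, 12.5, 14.0, 15.0, 15.5, 15.0, 14.0]   # one smooth block
coeffs = dct(block)
approx = idct(keep_largest(coeffs, 3))   # lossy reconstruction from 3 of 8 values
```

Because the transform is orthonormal, the coefficient energy equals the signal energy, and for a smooth block like this one nearly all of it sits in the few lowest coefficients.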

Vocoding

The superimposition of the estimated varying short‐term spectral envelope of a signal on another. Used as an effect to create illusions of singing instruments and other spectral hybrids of separate sound sources.