<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet alternate="no" href="/shared/style/dc-tiny/dc-tiny-1.css" title="decoy tiny 1" type="text/css" charset="UTF-8" media="screen"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:m="http://www.w3.org/1998/Math/MathML"
      xmlns:mono="http://www.iki.fi/~decoy/shared/namespace/dc-mono">
<head>
    <title>Sound, synthesis and audio reproduction: Sound as a physical phenomenon</title>
    <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
    <meta http-equiv="Content-Language" content="en"/>
    <meta http-equiv="Content-Style-Type" content="text/css"/>
    <meta http-equiv="Content-Math-Type" content="text/mathml"/>
    <meta name="description" content="Some facts about the physics of sound."/>
    <meta name="keywords" content="sound,acoustics,physics,wave transmission,reflection,diffraction,interference,absorption,refraction"/>
    <meta name="robots" content="index,follow"/>
    <meta name="distribution" content="global"/>
    <?i-am 28?>
    <?parent 126?>
    <link rel="P3Pv1" href="/shared/policy/p3p-policyref.xml"/>
    <link rel="shortcut icon" href="/shared/graphics/icon/decoy-logo.png" type="image/png"/>
    <link rel="copyright" href="/shared/policy/siteip" title="Site IP policy"/>
    <link rel="meta" href="/shared/meta/merged"/>
    <link rel="author" href="mailto:decoy@iki.fi" title="Sampo Syreeni"/>
    <link rel="top" href="/front" title="Sampo Syreeni—decoy"/>
    <link rel="help" href="/shared/site" title="About the site"/>
    <link rel="up" href="dsound" title="Sound, synthesis and reproduction"/>
    <link rel="contents" href="dsound" title="Sound, synthesis and audio reproduction"/>
    <link rel="glossary" href="dsound-v-01" title="Vocabulary"/>
    <link rel="bibliography" href="dsound-r-01" title="References. Reading Which Is Worthy."/>
    <link rel="begin" href="dsound-c-01" title="Introduction"/>
    <link rel="end" href="dsound-c-14" title="Ideas to albums—working the sonic technology"/>
    <link rel="prev" href="dsound-c-01" title="Introduction"/>
    <link rel="next" href="dsound-c-03" title="Physical sound sources and sound production in instruments"/>
    <link rel="chapter" href="dsound-c-01" title="Introduction"/>
    <link rel="chapter" href="dsound-c-02" title="Sound as a physical phenomenon"/>
    <link rel="chapter" href="dsound-c-03" title="Physical acoustics"/>
    <link rel="chapter" href="dsound-c-04" title="Hearing, physiological and psychological aspects of"/>
    <link rel="chapter" href="dsound-c-05" title="Signal processing basics"/>
    <link rel="chapter" href="dsound-c-06" title="Sound analysis and visualisation"/>
    <link rel="chapter" href="dsound-c-07" title="Sound synthesis"/>
    <link rel="chapter" href="dsound-c-08" title="Sound processing and effects"/>
    <link rel="chapter" href="dsound-c-09" title="Control, modulation and patching"/>
    <link rel="chapter" href="dsound-c-10" title="Sequencing and MIDI"/>
    <link rel="chapter" href="dsound-c-11" title="Sound recording"/>
    <link rel="chapter" href="dsound-c-12" title="Audio transport"/>
    <link rel="chapter" href="dsound-c-13" title="Audio reproduction"/>
    <link rel="chapter" href="dsound-c-14" title="Ideas to albums—working the sonic technology"/>
    <link rel="appendix" href="dsound-a-01" title="APPENDIX: More on optical discs"/>
    <link rel="appendix" href="dsound-a-02" title="APPENDIX: Noise reduction"/>
    <link rel="appendix" href="dsound-a-03" title="APPENDIX: Audio coding/compression"/>
    <link rel="appendix" href="dsound-a-04" title="APPENDIX: Trackers, modules and sampling"/>
    <link rel="appendix" href="dsound-a-05" title="APPENDIX: Synthesis languages and softsynths"/>
    <link rel="appendix" href="dsound-d-01" title="Digression: common characteristics in signal analyses"/>
    <link rel="appendix" href="dsound-d-02" title="Digression: some common functions"/>
    <link rel="appendix" href="dsound-d-03" title="Digression: traditional psychoacoustic demonstrations"/>
    <link rel="appendix" href="dsound-d-04" title="Digression: consonance and dissonance"/>
    <link rel="appendix" href="dsound-d-05" title="Digression: commercial synthesis methods"/>
    <link rel="appendix" href="dsound-d-06" title="Digression: commercial sound effects algorithms"/>
    <link rel="appendix" href="dsound-d-07" title="Digression: some unconventional dimensions of sound"/>
    <link rel="appendix" href="dsound-d-08" title="Digression: more on sound fields"/>
    <link rel="appendix" href="dsound-d-09" title="Digression: synthesis tips and tricks"/>
</head>
<body xml:lang="en">
    <div>
    <h1>Sound as a physical phenomenon</h1>
    <div class="intro">
    <p>
        Sound is something most of us know and love. But although we hear sounds
        every single day of our lives, there are many aspects of the experience
        that one usually doesn’t pay attention to. These are the basic questions
        of the nature of sound, the reasons for our hearing anything at all and
        the mechanics underlying the sensation. It is not at all obvious how
        sound is actually generated in physical objects, why different objects
        do not all sound alike or how all this relates to our sensory apparatus.
        When taking a closer look at sound as a physical phenomenon, many
        interesting characteristics of sound come into proper focus. The
        relevant topics include the wave characteristics of sound: diffraction,
        reflection, interference and so on. Taking into account the
        peculiarities of the human sensory organ and the psychology of
        perception in general, one is taken into the field of
        psychoacoustics—the study of human auditory perception and its
        underlying mechanisms. This is a field rich in applications and
        interesting discoveries, not all of which seem intuitive at first sight.
    </p>
    <mono:sidebar>
    <p>
        Perceptual sound codecs can be mentioned as one application—how can
        an MP3 codec throw 90% of an audio signal away and still reproduce a
        perceptually near perfect replica of the original?
    </p>
    </mono:sidebar>
    </div>

    <div>
    <h2>What is sound?</h2>
    <p>
        Sound is a wave phenomenon. That is something we’re all told in high
        school physics courses. But usually no time is left for a nice intuitive
        picture of the thing to build up. That’s what we will try to
        construct next.
    </p>
    <p>
        To create waves we need a medium in which to put them. Obviously the
        medium needs to be elastic in order to support any waves at all.
        Additionally there must be a kind of <q>stiffness</q> which holds the
        adjacent parts of the medium together. This is what makes wave
        propagation, a form of energy transfer, possible. It also determines
        the kinds of vibration which can propagate in the medium: unlike
        solids, gases and liquids lack transverse binding forces and so only
        longitudinal energy transfer is possible. This is why sound cannot be
        polarised. The other important intrinsic property of the medium is its
        density—together density and the strength of the binding forces
        determine the speed of wave motion in the medium.
    </p>
    <mono:sidebar>
    <p>
        Note that both density and the stiffness can vary within the
        substance: the medium can be <dfn>inhomogeneous</dfn>. In addition to
        this, the binding forces can be different depending on direction. In
        the latter case we talk about <dfn>anisotropy</dfn>, which fortunately
        does not occur in gases.
    </p>
    </mono:sidebar>
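    <p>
        To make the numbers concrete, the relation can be tried out in a few
        lines of Python. This is an illustrative sketch of mine, not part of
        the original text: for a gas, the effective stiffness is the adiabatic
        bulk modulus, 1.4 times the static pressure in the case of air.
    </p>
    <pre>
# Speed of sound from stiffness and density: c = sqrt(K / rho), where the
# bulk modulus K plays the role of the binding forces discussed above.
from math import sqrt

gamma = 1.4        # adiabatic index of air (a diatomic gas)
P     = 101325.0   # static pressure, Pa
rho   = 1.204      # density of air at 20 degrees C, kg/m^3

c = sqrt(gamma * P / rho)
print(f"speed of sound in air: {c:.0f} m/s")   # about 343 m/s
    </pre>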
    <p>
        In the case of sound, the density is just the usual mass per unit
        volume of the gas and the stiffness is born of a balance between the
        repulsive forces of the molecules and their mean kinetic energy
        (temperature). In a closed system of moving particles the total kinetic
        energy stays constant and statistical physics predicts that the mean
        distance between the mutually repellent particles will tend to even
        out. Similarly the expected velocity (with direction) of particles over
        an arbitrary volume will be zero—at large scales the gas will tend to
        stay put even if the molecules themselves move quite a bit. Only after
        we upset the balance do the mean properties ripple. These mean
        properties are used as a stepping stone to the analytic model of a
        sound field, which simply forgets that the molecular level ever
        existed, assigns a real velocity vector and a real pressure measure to
        each point in space and lets these vary over time. From Newton’s laws
        we then get a relatively simple partial differential equation which
        tells us that pressure gradients cause acceleration of particles and
        that the velocity at some point causes the pressures around it to change.
    </p>
<p class="disposition">
        Often we make the further assumption that the velocity field is
        irrotational, that is, the gradient of a scalar potential; this
        simplifies the mathematics but loses part of the generality of the
        model.
</p>
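    <p>
        The two statements translate almost line for line into a small
        simulation. The following sketch (my own illustration; the grid, time
        step and variable names are arbitrary choices) steps a one‐dimensional
        pressure field and velocity field against each other:
    </p>
    <pre>
# Minimal 1D linear acoustics: pressure gradients accelerate the medium,
# and the motion of the medium changes the pressures around each point.
import numpy as np

N, dx, dt = 400, 0.01, 1e-5          # cells, cell size (m), time step (s)
rho, K    = 1.204, 1.4 * 101325.0    # air density, adiabatic bulk modulus

p = np.zeros(N)                      # pressure deviation at cell centres
u = np.zeros(N + 1)                  # particle velocity at cell faces
p[N // 2] = 1.0                      # a point-like initial disturbance

for step in range(1000):
    u[1:-1] -= (dt / (rho * dx)) * np.diff(p)   # gradient accelerates
    p -= (K * dt / dx) * np.diff(u)             # motion changes pressure
# p now shows two pulses travelling outward from the centre.
    </pre>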
    <p>
        To get a hold of the process of sound propagation, we first look at a
        simple example: a <dfn>point source</dfn>. This is simply one point at
        which we explicitly control the sound pressure or the velocity vector of
        the field. In practice, neither of these can be controlled independently
        of the other. When we excite the medium by creating a disturbance in its
        structure, coupling between adjacent particles makes the disturbance try
        to even out, globally. The forces arising from the new inhomogeneity
        accelerate the particles toward lower pressure. But this, assuming the
        pressurised region is point‐like, can only mean that the disturbance
        moves outward. The compressed particles get pushed away from the center
        of the region. This makes them move and, consequently, pushes fresh
        molecules out of the way. Voilà: motion. Once this has happened, the
        pressure is evened out. It is worth pointing out that the pressure wave
        does not get ironed out in the inward direction. This is due to the
        inertia of the molecules—once the pressure sets them in motion, the
        pressure moves in the direction the first molecules go in. What happens
        is similar to slamming a pool ball against another. Since all this
        happens through what are almost 100% elastic collisions of particles,
        little energy is lost. (Some is, to the mean kinetic energy of the
        particles, raising the gas temperature. This accounts for some of the
        attenuation that sound experiences while travelling.) As long as no new
        particles are brought into play, the net effect is that the pressurised
        region moves outward from the point source. Note that individual
        particles do <em>not</em> move appreciable distances in the process but
        stop after transferring their kinetic energy to the next ones. This is
        characteristic of wave phenomena: seen at a larger scale, energy moves,
        not the medium. Put another way, turning the volume knob east does not
        produce a tropical storm.
    </p>
    <mono:sidebar>
    <p>
        It is very important to separate such concepts as the pressure field
        (scalar in 3D), the density field (scalar in 3D), the velocity (3D
        vector in 3D), the gradient fields of the first two scalar ones (3D
        vector in 3D), the time derivative fields of the scalar ones (scalar in
        3D) and all the derived fields on top of these. When bypassing the
        mathematical notation, it is exceptionally easy to confuse the first and
        second time derivatives of the pressure field with velocities and
        accelerations (which are taken relative to spatial coordinates and are,
        hence, vector fields).
    </p>
    </mono:sidebar>
    <p>
        Above, we compressed the air at the source. But the same principle of
        wave transmission applies if the opposite is done—namely, if we create
        a depressurisation zone. In this case, the surrounding particles move in
        and the zone moves outward again. Also, the amount of depressurisation
        is significant—the more violent the original disturbance, the bigger
        the propagating <q>bump</q> in the medium. In fact, aside from the
        energy of the original disturbance getting spread over a larger and
        larger area and thus growing fainter and fainter per unit volume, one
        can accurately reconstruct a series of more or less violent
        disturbances at the source by measuring the local air pressure at a
        point some distance removed from it. This is, roughly, how sound
        propagates through air in free space and is experienced from afar.
    </p>
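    <p>
        The spreading loss alluded to above can be quantified. For a point
        source in free space the energy spreads over a sphere of area 4πr², so
        the pressure amplitude falls as 1/r, that is, six decibels per doubling
        of distance. A sketch (not part of the original text):
    </p>
    <pre>
# Spreading loss from a point source: intensity falls as 1/r^2, pressure
# amplitude as 1/r, i.e. -6 dB per doubling of distance.
from math import log10

p_ref = 1.0                          # pressure amplitude at 1 m (arbitrary)
for r in (1.0, 2.0, 4.0, 8.0):
    print(f"r = {r:3.0f} m: relative level {20 * log10(p_ref / r):6.1f} dB")
    </pre>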
    <p>
        A few things have to be noted about sound radiation. The first is
        that we speak about pressures. The importance of this is seen when
        thinking about a speaker cone that moves very slowly. In this case, the
        air has time to escape from before the cone instead of forming a high
        pressure zone. Evidently, efficient radiation is not possible this way.
        We see that in order to emit considerable radiation, rapid variations
        in pressure or large radiators must be used. This is typical of wave
        emission—it is why microwave radio transmission requires only small
        antennas while low frequency AM radio often employs antennas that are
        tens of meters long. From this we come to the second point: in order to
        continuously emit sound, we cannot just move the speaker cone further
        and further ahead. Instead, the cone has to come back before the air has
        time to escape around the edges. In physics, the situation is discussed
        in terms of coupled systems and impedance matching. The principle is
        that sound emitters work best when the inertia of the medium keeps the
        medium from moving appreciably while, on the other hand, the emitter’s
        own inertia isn’t large enough to make the emitter itself hard to move.
    </p>
    <p>
        Back and forth motion is the normal mode of wave transmission, not the
        impulses we have discussed so far. A special case of such movement
        occurs when the motion repeats at a constant rate and each cycle
        involves the same precise pattern of movement. In this case we speak of
        <dfn>periodic</dfn> motion and periodic sounds/signals. The rate of
        repetition of a periodic motion is dubbed the <em>frequency</em>, with
        hertz (<abbr title="Hertz" xml:lang="en">Hz</abbr>) as its unit. Hertz
        is the SI unit meaning <q>cycles per second</q>. As each part of a wave
        travels at a constant velocity and, at each fixed point in space, the
        vibratory motion repeats at a constant rate, we see that one period of
        the motion is always exactly duplicated over a certain interval of
        space that depends only on the speed of the wave motion and the
        frequency of the wave we drive through the medium. As we work in a
        single medium, the speed stays constant, so the length of our interval
        depends only on the frequency of our vibration. This is the
        <em>wavelength</em> corresponding to the frequency, inversely
        proportional to it.
    </p>
    <p>
        Because of the properties of what we will come to know as linear
        systems, a certain type of periodic wave has a very special position in
        our treatment. This wave is the <dfn>sinusoid</dfn>. It is the smooth,
        endless, periodic function which we bump into in trigonometry. The sine
        wave has the property that when put through a linear system (in our
        case, transmitted through air), it comes through as a sine wave with
        the same frequency. The only variation comes about in the form of a
        time lag and a change in strength. When a combination of sine waves of
        different frequencies is introduced, they go through as if the other
        waves weren’t even present.
    </p>
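    <p>
        Since the wavelength is just the speed of sound divided by the
        frequency, the audible range is easy to map out. A sketch (assuming
        the usual room temperature figure of 343 m/s):
    </p>
    <pre>
# Wavelength corresponding to a frequency: lambda = c / f.
c = 343.0                            # speed of sound in air, m/s
for f in (20.0, 100.0, 1000.0, 10000.0, 20000.0):
    print(f"{f:7.0f} Hz -> {c / f:8.4f} m")
# The audible range spans roughly 17 m (20 Hz) down to 17 mm (20 kHz).
    </pre>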
    <p>
        In later parts of the text when we talk about sound, we usually mean
        pressure variations measured at a point. This is because we have ears
        which are relatively small compared to the wavelength of audible
        sound—we can with good accuracy say that ears are
        point‐like with regard to sound fields. Thus few humans ever fully
        comprehend the real, complex vibrational patterns which occur in three
        dimensional spaces—evolution has not equipped our brains
        to do such analysis. This fact is a double‐edged sword,
        really—it would be nice to actively understand all the
        phenomena involved in sound transmission since all such things affect
        what we hear but, on the other hand, mathematical description and
        manipulation of 2+ dimensional wave phenomena quickly becomes quite
        unwieldy. It is quite a relief to scientists, engineers, technicians
        and artists that such considerations are not strictly necessary to
        fool our hearing.
    </p>
    </div>

    <div>
    <h2>Interference</h2>
    <p>
        When nonsinusoidal sources and/or a number of radiators and/or closed
        spaces are considered, things get interesting. At once we note something
        called <em>interference</em>. It is what happens when more than
        one source is placed in the same space. At each point in space, the
        individual contributions of our moving pressure zones (one for each
        emitter) just add up. We get what is called <dfn>linear</dfn> wave
        transmission. The name comes from mathematics and means, roughly, that
        given a bunch of signals, we can first add them and then feed the sum
        through a system, or first feed each through the system and then add,
        with equal results. To a
        considerable degree, this is what happens with sound. In spite of its
        rather technical connotations, linearity is a true friend. Without it,
        there would be little hope of understanding anything about sound at an
        undergraduate level.
    </p>
    <mono:sidebar>
    <p>
        Said another way, at small to moderate amplitudes, large scale sound
        transmission obeys a second order linear partial differential equation,
        called the wave equation, which is seen in all branches of physics and
        is covered early on in physics education. As is well known, once we
        know some solutions to a linear differential equation, we get more by
        scaling and summing.
    </p>
    </mono:sidebar>
    <p>
        Now, as periodic waves interfere, it is interesting to see what happens
        at a single, fixed point in space as time evolves. Let’s suppose we have
        a one‐dimensional string on which a single sinusoidal sound source acts.
        We know that the pressure at any single point reflects that of the
        source, save for the time lag it takes the vibratory motion to reach
        our point and the attenuation resulting from friction and other damping
        forces. If we now add a second source with an identical frequency but
        a different position on the string, we get <em>standing waves</em>. How
        does this happen? Think about the peak of one period of the motion. As
        it leaves the two sources, it travels at a constant velocity away from
        them. Precisely in the middle, the two waves meet and we let them
        interfere; they add together. The same applies for the valley parts of
        the wave. So in the middle, we get twice the amplitude. We say the two
        sounds are <em>in phase</em> with each other. Let’s take another point,
        this time choosing it so that the time to get from source 1 to the point
        is precisely half a cycle greater than the time to get to our point
        from source 2, that is, the difference between the distances to the
        sources is a whole number of wavelengths plus half a wavelength. This
        time, the sinusoids always arrive at our point precisely so that they
        cancel each other out. So at this point, we never observe vibratory
        motion. Points of these two kinds occur repeatedly over the entire
        length of our string, with the amplitude of the sinusoidal motion
        varying between them from zero to double the source amplitude.
    </p>
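    <p>
        The pattern just described can be computed directly. The sketch below
        (an illustration of mine; the positions and wavelength are arbitrary)
        evaluates the textbook two‐source interference formula along the
        string:
    </p>
    <pre>
# Local amplitude between two in-phase sources: with path difference D, the
# waves sum to a vibration of amplitude 2*A*|cos(pi * D / lambda)| -- doubled
# where D is a whole number of wavelengths, zero where it is off by a half.
import numpy as np

lam, A = 0.5, 1.0                    # wavelength (m) and source amplitude
x1, x2 = 0.0, 3.0                    # source positions on the string (m)
x      = np.linspace(0.0, 3.0, 13)   # observation points between them

D = np.abs(np.abs(x - x1) - np.abs(x - x2))
amplitude = 2.0 * A * np.abs(np.cos(np.pi * D / lam))
for xi, ai in zip(x, amplitude):
    print(f"x = {xi:5.2f} m: amplitude {ai:4.2f}")
    </pre>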
    <p>
        The last example was very simple, as only one‐dimensional effects were
        considered. In two dimensions we get a nice interference
        pattern, where our special points recur wherever the difference of the
        distances to the sources is, again, a whole multiple of half the
        wavelength.
    </p>
    <mono:sidebar>
    <p>
        We remember from high school geometry that, given two fixed points, if
        we draw a curve through those points where the difference of distances
        from the fixed points is constant, we get a hyperbola. So the knots and
        humps of our interference pattern on a plane occur on hyperbolas with
        the point sound sources as foci and the spacing of the points
        determined by the wavelength of the sound. The same deduction goes for
        the 3D case, only the sound field is quite a lot more difficult to
        visualise. We get, logically enough, hyperboloids. (To see this, put a
        line through the two point sources, rotate a plane around this line and
        repeat the two dimensional reasoning on the plane.)
    </p>
    </mono:sidebar>
    <p>
        One should note that when different frequencies are combined, the result
        is more complex, since now we cannot combine the resultant vibration
        pointwise into a single sinusoid. But keeping to two close frequencies,
        we get an interesting phenomenon called <em>beating</em>. When two
        frequencies that are close to each other are combined, we get, not an
        audible combination of the two, but the frequency in the middle of the
        two, varying sinusoidally in amplitude at the rate of the difference
        between the two original frequencies. This is seen as follows. Suppose
        we have two sine waves with frequencies
        <m:math><m:msub><m:mi>f</m:mi><m:mn>1</m:mn></m:msub></m:math> and
        <m:math><m:msub><m:mi>f</m:mi><m:mn>2</m:mn></m:msub></m:math>
        and we form their product,
        <math xmlns="http://www.w3.org/1998/Math/MathML">
            <mrow>
                <mi>sin</mi>
                <mo><mchar name="ApplyFunction"/></mo>
                <mrow>
                    <mo>(</mo>
                    <mrow>
                        <mn>2</mn>
                        <mo><mchar name="InvisibleTimes"/></mo>
                        <mi>π</mi>
                        <mo><mchar name="InvisibleTimes"/></mo>
                        <msub>
                            <mi>f</mi>
                            <mn>1</mn>
                        </msub>
                        <mo><mchar name="InvisibleTimes"/></mo>
                        <mi>x</mi>
                    </mrow>
                    <mo>)</mo>
                </mrow>
            </mrow>
            <mo><mchar name="InvisibleTimes"/></mo>
            <mrow>
                <mi>sin</mi>
                <mo><mchar name="ApplyFunction"/></mo>
                <mrow>
                    <mo>(</mo>
                    <mrow>
                        <mn>2</mn>
                        <mo><mchar name="InvisibleTimes"/></mo>
                        <mi>π</mi>
                        <mo><mchar name="InvisibleTimes"/></mo>
                        <msub>
                            <mi>f</mi>
                            <mn>2</mn>
                        </msub>
                        <mo><mchar name="InvisibleTimes"/></mo>
                        <mi>x</mi>
                    </mrow>
                    <mo>)</mo>
                </mrow>
            </mrow>
        </math>. Through a basic trigonometric identity, the result is
        <math xmlns="http://www.w3.org/1998/Math/MathML">
            <mfrac>
                <mn>1</mn>
                <mn>2</mn>
            </mfrac>
            <mo><mchar name="InvisibleTimes"/></mo>
            <mrow>
                <mo>(</mo>
                <mrow>
                    <mrow>
                        <mi>cos</mi>
                        <mo><mchar name="ApplyFunction"/></mo>
                        <mrow>
                            <mo>(</mo>
                            <mrow>
                                <mn>2</mn>
                                <mo><mchar name="InvisibleTimes"/></mo>
                                <mi>π</mi>
                                <mo><mchar name="InvisibleTimes"/></mo>
                                <mrow>
                                    <mo>(</mo>
                                    <mrow>
                                        <msub>
                                            <mi>f</mi>
                                            <mn>1</mn>
                                        </msub>
                                        <mo>−</mo>
                                        <msub>
                                            <mi>f</mi>
                                            <mn>2</mn>
                                        </msub>
                                    </mrow>
                                    <mo>)</mo>
                                </mrow>
                                <mo><mchar name="InvisibleTimes"/></mo>
                                <mi>x</mi>
                            </mrow>
                            <mo>)</mo>
                        </mrow>
                    </mrow>
                    <mo>−</mo>
                    <mrow>
                        <mi>cos</mi>
                        <mo><mchar name="ApplyFunction"/></mo>
                        <mrow>
                            <mo>(</mo>
                            <mrow>
                                <mn>2</mn>
                                <mo><mchar name="InvisibleTimes"/></mo>
                                <mi>π</mi>
                                <mo><mchar name="InvisibleTimes"/></mo>
                                <mrow>
                                    <mo>(</mo>
                                    <mrow>
                                        <msub>
                                            <mi>f</mi>
                                            <mn>1</mn>
                                        </msub>
                                        <mo>+</mo>
                                        <msub>
                                            <mi>f</mi>
                                            <mn>2</mn>
                                        </msub>
                                    </mrow>
                                    <mo>)</mo>
                                </mrow>
                                <mo><mchar name="InvisibleTimes"/></mo>
                                <mi>x</mi>
                            </mrow>
                            <mo>)</mo>
                        </mrow>
                    </mrow>
                </mrow>
                <mo>)</mo>
            </mrow>
        </math>, which shows the symmetrical placement of the sidebands.
        (Don’t worry about the cosines, since they have the same form as
        sines. They are only a bit ahead in time…) The identity works
        backwards, of course: adding two sinusoids at nearby frequencies
        produces a carrier at the mean frequency whose amplitude envelope
        varies at half the separation of the originals. Since the loudness
        peaks twice per envelope cycle, the beats are heard at the full
        difference frequency.
    </p>
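    <p>
        The identity is easy to verify numerically. A sketch (the frequencies
        are arbitrary choices of mine):
    </p>
    <pre>
# Beating: the sum of 440 Hz and 444 Hz sines equals a 442 Hz carrier whose
# envelope varies at 2 Hz; loudness peaks twice per envelope cycle, so four
# beats per second are heard.
import numpy as np

fs = 8000.0                          # sample rate, Hz (arbitrary)
t  = np.arange(0.0, 1.0, 1.0 / fs)

pair    = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 444 * t)
carrier = 2 * np.sin(2 * np.pi * 442 * t) * np.cos(2 * np.pi * 2 * t)
print(np.max(np.abs(pair - carrier)))   # ~1e-12: the two forms agree
    </pre>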
    </div>

    <div>
    <h2>Reflection and absorption</h2>
    <p>
        So now we have multiple sound sources, but still nothing but an empty
        medium for our waves to travel in. How about obstacles? Starting from a
        single dimension once more, we send a single pulse wave towards the
        end of a string which is tied to a rigid wall. What happens? Well, the
        pulse comes back: it gets reflected. This is easy to understand—when
        a pressurised zone meets the wall, it cannot move it, and the pressure
        pushes back instead, making for a reflected copy. If the wall <q>gives
        in</q> a little and takes a bit of energy from the wave (turning it
        into heat through friction, usually), the wave still bounces back but
        gets attenuated. We say <em>absorption</em> has occurred. Absorption is
        the reason rooms do not have indefinitely long echoes. In a sense,
        absorption is the precise opposite of radiation, so it is quite logical
        that, here too, the size of the object and the frequency of the wave
        matter. Usually, though, the relevant <q>size</q> isn’t so much the
        overall size of the absorber as the scale of detail and the material of
        the object. For example, a paper wall can only stop the highest
        frequencies, whereas a soft, heavy curtain can absorb significant mid
        and low frequency sounds. In higher dimensions (2+), reflections become
        much more difficult to handle. There, approaches similar to ray optics
        work much better.
    </p>
    </div>

    <div>
    <h2>Resonance</h2>
    <p>
        When we combine reflection and interference, interesting things happen.
        Taking our 1D standing waves, we can now generate them by a single source
        and a wall that reflects the waves back.
    </p>
    <mono:sidebar>
    <p>
        One can think, as in ray optics, that the mirror image of the source now
        provides the other source. A similar view works in higher dimensions,
        but gets intractable quite fast when the number of reflections and
        reflecting objects increases. Even more troublesome is the situation
        in which the reflecting objects are not infinite, straight planes. At
        a very basic level the problem with higher dimensional differential
        equations is precisely the one of curved boundaries, which naturally
        make no sense in dimension one.
    </p>
    </mono:sidebar>
    <p>
        If we put two obstacles and send a pulse between them, a periodic motion
        arises. If we put a source there, instead, we observe a complex
        interference pattern as the waves get reflected again and again and
        interfere with other reflections and the source signal. Again, the same
        thing happens in higher dimensions, only with patterns that are harder
        to follow. If regular echoes, which reinforce each other, can be
        produced at some frequency (in the case of periodic sources, this
        happens when the distance between our two obstacles is a multiple of
        half the wavelength), <em>resonance</em> results. If such resonant
        frequencies exist, they reinforce sounds of the same frequency. The
        opposite (and all that is in between) can also happen—destructive
        interference can greatly damp some frequencies. Resonance gives rise to
        different <em>modes</em> of vibration—if resonance can happen on
        several frequencies, complex patterns of vibration can arise. These
        patterns are taken advantage of in the design of traditional
        instruments. For instance, only a slight variation in the design of a
        violin can cause significant variations in its perceived timbre. Since
        acoustically significant vibrational modes always appear as (composite)
        standing wave formations in physical media (such as air columns, solids
        and water), the different modes can often be independently
        controlled—they all have their own characteristic vibrational shape
        with humps and knots, which gives us the possibility of exciting or
        damping the modes differently relative to one another.
    </p>
    <p>
        Further, since air columns can vibrate, so can spaces filled with air.
        This leads, in due course, to the issue of room acoustics: if one puts
        a point source (a very rough approximation of a loudspeaker) in a room,
        the more the walls reflect sound, the more the room colours the sound
        (a longer echo means more chances for interference). As sound
        circulating around a room gets reflected many times, it is necessary to
        ensure that no prominent resonances occur (these are called <em>room
        modes</em> or just modes and usually result from echoes between
        opposite walls). The same general principles apply here as in the case
        of 1D resonance, with the exception of many unusual and inharmonic
        modes—as such, the placement of speakers, room geometry and decoration
        crucially affect the sound field in the room. In addition,
        psychoacoustical phenomena further complicate matters. Thus, for
        instance, the more random the directions prominent echoes can be made
        to come from, the better (as this lessens the effect of room modes, and
        obvious echo directions get reduced). This is why audiophiles use
        highly damped and irregularly shaped rooms to achieve a hi‐fi listening
        environment. (Basic measures include thick carpets to absorb stray
        sound, book shelves to absorb and scatter, absorbers in the ceiling and
        the placement of heavy furniture around the rim of the room.)
    </p>
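    <p>
        For the idealised case of a rectangular room with rigid walls, the mode
        frequencies have a well known closed form. A sketch (the room
        dimensions are invented for the example; real rooms deviate from the
        ideal considerably):
    </p>
    <pre>
# Modes of a rigid-walled rectangular room:
# f = (c/2) * sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2)
from itertools import product
from math import sqrt

c          = 343.0                   # speed of sound, m/s
Lx, Ly, Lz = 5.0, 4.0, 2.5           # room dimensions, m

modes = sorted(
    (c / 2 * sqrt((nx / Lx) ** 2 + (ny / Ly) ** 2 + (nz / Lz) ** 2),
     (nx, ny, nz))
    for nx, ny, nz in product(range(3), repeat=3)
    if (nx, ny, nz) != (0, 0, 0)
)
for f, n in modes[:8]:               # the lowest, most troublesome modes
    print(f"{f:6.1f} Hz  mode {n}")
    </pre>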
    </div>

    <div>
    <h2>Refraction and total reflection in a boundary</h2>
    <p>
        Until now, we have assumed that the medium in which our waves travel is
        homogeneous—the speed of travel of wave motion is constant throughout
        the space. Often this is not the case, though. In the case of sound, the
        speed depends on the material the waves travel in and its temperature.
        Often one can ignore the inhomogeneity, but sometimes it produces
        important effects. The main one is <dfn>refraction</dfn>: velocity
        dependent bending of wavefronts. Refraction is most pronounced if sharp
        boundaries between media of different properties are present—an
        excellent example is the boundary between water and air. If a wavefront
        hits such a boundary at anything other than a right angle, the
        direction of the waves is bent. If the speed decreases across the
        boundary, the motion bends towards the normal of the boundary. If it
        increases, bending is away from the normal. If the incident angle is
        great enough and the waves would be speeding up, total reflection
        occurs. All this is precisely analogous to what happens in ray optics.
        The only difference is that in acoustics, one needs to worry about
        nonsharp boundaries more often. This is because we are mostly dealing
        with sound transmission in air at normal atmospheric pressures and in
        this case, the speed differences usually arise from temperature
        differences—always a continuous phenomenon. As you can already guess,
        refraction and total reflection happen with graded boundaries as well.
        Here they take the form of smooth bending, not abrupt changes of
        direction. One must also observe the fact that refraction, just like
        diffraction, is frequency dependent—different frequencies refract
        differently. What is the significance of all this, then? Most often, at
        least indoors, none. Outdoors, where temperature gradients can be much
        greater, refraction effects can become significant, though. A prime
        example is the way sound can propagate over lakes—if the water is
        colder than the air above it, a cold‐warm graded boundary can form in
        the air just above the water. Such a gradient can, under some
        circumstances, bend upward‐travelling sound waves back down towards the
        surface and keep them from escaping. This can lead to the sound
        propagating unusually long distances over the lake. (The phenomenon is
        similar to the one employed in graded‐index optical fibres.)
    </p>
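    <p>
        The bending at a sharp boundary obeys the same Snell’s law as in
        optics. A sketch (the wave speeds are rough textbook values):
    </p>
    <pre>
# Snell's law for sound: sin(t2) / sin(t1) = c2 / c1. When the wave would
# speed up across the boundary, a critical incident angle exists beyond
# which total reflection occurs.
from math import asin, degrees, radians, sin

c1, c2 = 343.0, 1480.0               # sound speeds: air and water, m/s

critical = degrees(asin(c1 / c2))
print(f"critical angle: {critical:.1f} degrees")   # about 13.4 degrees

theta1 = radians(10.0)               # a shallow angle of incidence
theta2 = degrees(asin(sin(theta1) * c2 / c1))
print(f"10 degrees in air refracts to {theta2:.1f} degrees in water")
    </pre>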
    </div>

    <div>
    <h2>Diffraction</h2>
    <p>
        One final phenomenon is yet to be uncovered, namely,
        <em>diffraction</em>. This is something that is often, sadly enough,
        given little notice. All waves behave rather weirdly when they pass
        around objects. If very thin (compared to the wavelength) objects are
        passed, no substantial effects are produced—such little defects in the
        medium drown in the large scale wave motion. Very large objects exhibit
        reflection, at least locally. But in between (e.g. around object edges
        and suitably sized obstacles overall), the wave motion <q>bends</q>,
        creating some <em>pretty</em> complex interference patterns. Even in
        the case of exceedingly simple geometric objects (e.g. balls,
        cylinders…), the resulting interference is difficult to master
        mathematically. This is a phenomenon specific to 2+ dimensional cases
        and something that greatly affects the behavior of sound in natural
        environments. Thus, the behavior of sound near objects and object edges
        is really quite poorly understood, leading to the term <em>near field
        effect</em> being used in situations where such behavior is
        significant. Noteworthy examples are the sound field of a loudspeaker
        and the field formed around a human head standing in a larger sound
        field. The latter to a considerable degree dominates how we hear sound
        and mostly determines how the direction of a sound source affects our
        perception of it.
    </p>
    <p>
        Diffraction is something which is not often taken into account when
        simulating sound behaviour. Reasons for this are multiple. Firstly,
        diffraction is rather difficult to simulate efficiently. As it is a 2+
        dimensional phenomenon, it does not naturally lend itself to the one
        dimensional abstractions of today’s simulation methods and 2+ dimensional
        simulations cost dearly in terms of processing power and memory. Secondly,
        diffraction is heavily frequency dependent—it disperses waves of
        differing frequencies. This is one of the reasons why accurate
        prediction of room acoustics is so difficult. Thirdly, there is little
        need to think about 2+ dimensional effects when analysing static, linear
        point‐to‐point transmission. Though all this may sound purely
        academic, when one tries to create convincing simulations of
        sound behaviour for reverberation and binaural processing, this is where
        we usually hit the wall.
    </p>
    <p>
        Now we know diffraction does not fit in and is difficult to handle. Under
        what assumptions, then, can we ignore the problem? Let’s start at the
        bottom of things… To get a hold on wave phenomena, one needs to
        simplify quite a bit. The most common way is to try to linearize and
        then reduce the dimensionality of the problem. The latter part often
        consists of building meshes of one dimensional simulations or neglecting
        the size of phenomena in certain directions. The latter is the way we
        arrive at ray optics and its audio counterpart—if we neglect the fact
        that our waves have a finite wavelength, i.e. we <q>pass it to the limit</q>,
        many ugly things go away and we get nice, one‐dimensional, cleanly
        behaved rays instead of multidimensional wavefronts. We can do this if
        the waves are very short compared to the feature size of the
        surrounding space. In the case of light and natural objects, we can
        quite safely assume this to be the case. (The speed of light is high
        but its frequency is higher still, so the wavelength comes out very
        small. Also, the relative frequency range of visible electromagnetic
        radiation is much narrower
        than the range of audible sound.) With sound we bump into a relatively
        wide frequency range and feature sizes in our environment which sit
        right in the middle of audible wavelengths. This means that sound
        diffraction in our surroundings is often considerable and can only be
        neglected if few obstacles are present, sound sources can be considered
        point‐like, enough damping is present and reflective surfaces are simple
        enough.
    </p>
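    <p>
        A rough rule of thumb can still be coded up. The thresholds below are
        indicative only (my own choices, not sharp physical boundaries):
    </p>
    <pre>
# Compare the wavelength to the feature size of an obstacle to guess
# whether diffraction can be neglected.
def regime(frequency, feature_size, c=343.0):
    lam = c / frequency
    if lam > 10.0 * feature_size:
        return "obstacle nearly invisible to the wave"
    if lam > feature_size / 10.0:
        return "strong diffraction, full wave treatment needed"
    return "ray-like behaviour, reflection and shadowing dominate"

for f in (20.0, 1000.0, 20000.0):
    print(f"{f:7.0f} Hz vs a 1 m obstacle: {regime(f, 1.0)}")
    </pre>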
    </div>

    <div>
    <h2>Sound as signals/functions</h2>
    <p>
        Before any mathematical treatment of sound is possible, we must represent
        it somehow in the language of mathematics. To do this, we note what sound
        is: it is just time‐dependent pressure variation. Furthermore, by taking
        a point in space, we can represent sound at this point with a single
        number, the pressure. When there is no sound, the pressure is just the
        normal atmospheric pressure (around 100 000 Pascals on average), so
        it is a good idea to assign numbers with respect to this level. So
        we represent the pressure at our point by telling how much it
        differs from the long term average air pressure—rarefaction results in
        negative values, compression in positive.
        much matter—since most DSP is linear, the same basic concepts apply
        regardless of scale. Now that we have chosen a pressure scale, we just
        present the pressure as a function of time. If we want a more complete
        description of the sound field, we take more points and form a vector
        (a list of numbers, basically) of the pressures in those points and
        represent this vector as a function of time. Usually we do not use more
        than two to four points since the resulting description mostly suffices
        for audio systems. Most people have never had a chance to hear anything
        exceeding two channels (i.e. stereo).
    </p>
    <p>
        So we now have functions of time. These we call signals. They can be
        represented by voltages or currents in electric circuits and wires
        (this is the way microphone cables, amplifiers and most consumer audio
        equipment work), as grooves of varying depth on an LP, as numbers of
        some given precision in a computer or as numbers encoded in the tiny
        pits and ridges of a CD. Mathematically we treat these functions as
        mappings from real numbers to real numbers (i.e. for each possible
        instant of time, we assign an infinitely accurate measure of pressure).
        In digital systems, we use a string of numbers which give a
        sufficiently accurate measure of the pressure at points sufficiently
        close to each other in time (these numbers are called <dfn>samples</dfn>
        and under proper conditions, they represent the original signal with
        near perfect quality). (See the first section of the chapter on DSP for
        a closer look at sampling.) Having got used to thinking about sound in
        terms of signals, we often equate the two. This makes it possible to
        use mathematical terminology (which is suitable for signals) to
        describe what happens or is to be done to sound. It may sound a bit
        strange, for instance, to talk about <q>squaring a sound</q>. Thought
        of as a sequence of numbers, though, it makes perfect sense. Especially
        since we aim at understanding DSP as well.
    </p>
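    <p>
        As a concrete illustration (a sketch with made‐up parameters), squaring
        a sampled sound really is a perfectly ordinary numeric operation:
    </p>
    <pre>
# "Squaring a sound", sample by sample.
import numpy as np

fs = 8000                            # samples per second
t  = np.arange(0.0, 0.01, 1.0 / fs)  # ten milliseconds of time points
x  = np.sin(2 * np.pi * 440 * t)     # a sampled 440 Hz sine: our "sound"

y = x ** 2                           # the squared sound
# By the identity sin^2(a) = (1 - cos(2a)) / 2, y is a constant offset plus
# an 880 Hz component: squaring doubled the frequency.
    </pre>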
    </div>

    <div>
    <h2>Amplitude, decibels and the spectrum</h2>
    <p>
        Not every sound has a frequency—no repetition, no frequency. However,
        measured at a point, every sound has an <dfn>amplitude</dfn>. This means
        roughly the same as the strength of the sound and could be defined in a
        variety of ways. We pick one and speak of peak‐to‐peak amplitude,
        defined as the difference between the maximum compression and the
        maximum rarefaction that our sound wave causes during a given period of
        time. The term can also
        be used without exact, mathematically defined meaning to mean the
        (relative) strength of the sound (with respect to another).
    </p>
    <p>
        When we present some sound to people, we soon realize that amplitude
        (peak‐to‐peak pressure variation) is not very significant perceptually.
        Instead, <dfn>average power</dfn> seems to be. This is why most volume
        monitors use an <dfn>RMS (Root Mean Square)</dfn> scale.
    </p>
    <mono:sidebar>
    <p>
        This is a time localized estimate of the average signal power, and is
        calculated by squaring the signal, taking a weighted average over a
        period of time and then taking a square root. Why should this work? One
        reason is that power is preserved in Fourier decompositions whereas
        amplitude is not. Since we process signals mainly in a frequency
        decomposed form, it is to be expected that time‐domain characterizations
        which can be directly translated to frequency domain should work the
        best. As the ear seems to do time‐localized filterbank analysis (as
        opposed to real Fourier analysis which really has <q>infinite memory</q>),
        time‐localized averaging should not come as a surprise, either.
    </p>
    </mono:sidebar>
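    <p>
        The recipe in the sidebar translates directly into code. A minimal
        sketch, with an arbitrary window length and, for simplicity, uniform
        weighting:
    </p>
    <pre>
# Time-localized RMS: square, average over a window, take the square root.
import numpy as np

def sliding_rms(x, window=256):
    w = np.ones(window) / window     # uniform weights
    return np.sqrt(np.convolve(x ** 2, w, mode="same"))

fs = 8000
t  = np.arange(0.0, 1.0, 1.0 / fs)
x  = np.sin(2 * np.pi * 440 * t)     # a full-scale sine...
print(sliding_rms(x)[fs // 2])       # ...has RMS 1/sqrt(2), about 0.707
    </pre>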
    <p>
        Now, the dynamic range of human hearing is exceptionally wide—the
        amplitude ratio of the softest sound heard to the loudest noise tolerated
        is in the vicinity of 10 000 000 to 1 (ten million to one), with
        most of the resolution in the quiet end. Around 1<abbr title="kiloHertz" xml:lang="en">kHz</abbr> people tend to classify a
        ten‐fold increase or decrease in sound energy as a doubling or halving,
        respectively, of perceived loudness. This means that a suitable scale
        for sound amplitude is not linear, but logarithmic. Values from this
        scale are called <dfn>sound pressure levels</dfn> (<abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr>) and their unit
        is the <dfn>decibel (<abbr title="Decibel" xml:lang="en">dB</abbr>)</dfn>. It is defined as twenty times the base
        ten logarithm of the ratio of sound pressure variation (effective level)
        to that of the softest sound heard by an average human (<dfn>the
        threshold of human hearing</dfn>, defined as 20 micropascals effective
        variation for a 1<abbr title="kiloHertz" xml:lang="en">kHz</abbr> sine wave). This means that 0<abbr title="Decibel" xml:lang="en">dB</abbr> equals the
        threshold and an increase of twenty decibels means a
        <em>ten‐fold</em> increase in pressure variation. To illustrate, going
        from 0<abbr title="Decibel" xml:lang="en">dB</abbr> to 140<abbr title="Decibel" xml:lang="en">dB</abbr> means multiplying by
        <m:math><m:msup><m:mn>10</m:mn><m:mn>7</m:mn></m:msup></m:math>, so
        140<abbr title="Decibel" xml:lang="en">dB</abbr> <abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr> equals a variation of 200 Pascals (effective
        level)—plenty. Ever wonder why going from 80<abbr title="Decibel" xml:lang="en">dB</abbr> to
        100<abbr title="Decibel" xml:lang="en">dB</abbr> is considered harmful while 60<abbr title="Decibel" xml:lang="en">dB</abbr> to 80<abbr title="Decibel" xml:lang="en">dB</abbr> isn’t?
    </p>
    <p>
        Yet another amusing calculation reveals that with a sinusoid of roughly 194<abbr title="Decibel" xml:lang="en">dB</abbr>
        <abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr>, the rarefying part of the fluctuation reaches vacuum. This is the
        theoretical limit on sinusoidal pressure fluctuations at normal
        atmospheric pressure, then. (Compressive impulses can, of course, reach
        much higher <abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr>s; cf. the hydrogen bomb.) Doubling the pressure
        variation yields an increase of 6<abbr title="Decibel" xml:lang="en">dB</abbr> <abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr>; doubling the power yields 3<abbr title="Decibel" xml:lang="en">dB</abbr>. When we think a bit, we
        see that if two sounds with a significant <abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr> difference (say, over
        15<abbr title="Decibel" xml:lang="en">dB</abbr>) are added together, their relative difference is much greater than
        we would think. In effect, adding a 30<abbr title="Decibel" xml:lang="en">dB</abbr> <abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr> sound to one of 60<abbr title="Decibel" xml:lang="en">dB</abbr> does
        not increase the <abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr> significantly beyond 60<abbr title="Decibel" xml:lang="en">dB</abbr>.
    </p>
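    <p>
        These rules are easy to verify numerically. A sketch (assuming
        incoherent sources, whose powers rather than pressures add):
    </p>
    <pre>
# Decibel arithmetic against the 20 micropascal reference.
from math import log10, sqrt

P0 = 20e-6                           # reference pressure, Pa

def spl(p):
    return 20 * log10(p / P0)

def pressure(db):
    return P0 * 10 ** (db / 20)

print(spl(2 * pressure(60)))         # doubling the pressure: +6 dB
print(spl(sqrt(2) * pressure(60)))   # doubling the power:    +3 dB

# Adding a 30 dB sound to a 60 dB one barely moves the level:
p_sum = sqrt(pressure(60) ** 2 + pressure(30) ** 2)
print(f"{spl(p_sum):.3f} dB")        # about 60.004 dB
    </pre>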
    <mono:sidebar>
    <p>
        Similarly we define the <dfn>intensity level</dfn> (ten times the
        logarithm of the ratio of sound intensity to a reference intensity of
        <m:math><m:msup><m:mn>10</m:mn><m:mrow><m:mo>−</m:mo><m:mn>12</m:mn></m:mrow></m:msup></m:math>
        Watts per square meter) and the power level (ten times the logarithm
        of the ratio of power to a reference power of
        <m:math><m:msup><m:mn>10</m:mn><m:mrow><m:mo>−</m:mo><m:mn>12</m:mn></m:mrow></m:msup></m:math>
        Watts). These scales are used much less frequently than <abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr>s.
    </p>
    </mono:sidebar>
    <p>
        Now, although it was established a while ago that not all sounds need
        to have a properly defined frequency, the concept of frequency still
        has its uses. This is because, as we shall see later on, it is quite
        possible to uniquely construct signals from sine waves with definite
        frequencies. This makes it possible to talk about frequency ranges of
        <em>any</em> signal—we break the signal into sine waves and discard
        everything but the frequencies of interest. This can also be accomplished
        directly. Such ranges (called <dfn>bands</dfn> or <dfn>subbands</dfn>)
        can then be processed and analysed separately, which, of course, is
        precisely what goes on when we watch the spectrum analyzer on a hip
        sound system, crank up the bass on a car stereo or speak through a
        telephone (which constitutes a severely bandlimited channel).
        Simultaneously measuring the relative contributions of all the
        different frequency ranges in a signal gives rise to the
        <dfn>spectrum</dfn> of a sound. Depending on the way in which we
        extract the subbands, we arrive at different kinds of spectra.
        Nevertheless, they all give some sort of budget of how much bass,
        middle and treble our signal has. Since our ear performs an analysis
        somewhat reminiscent of the kind described above, spectra are
        invaluable in discussing and analysing sound and related technology,
        even when working with the simple, intuitive definition given above.
    </p>
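    <p>
        As a first taste of breaking a signal into sine waves (a sketch using
        the discrete Fourier transform, whose rigorous treatment the text
        defers):
    </p>
    <pre>
# A crude spectrum: magnitudes of the sinusoidal components of a signal.
import numpy as np

fs = 8000
t  = np.arange(0.0, 1.0, 1.0 / fs)
x  = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1320 * t)

spectrum = np.abs(np.fft.rfft(x)) / (len(x) / 2)
freqs    = np.fft.rfftfreq(len(x), 1.0 / fs)
for i in np.argsort(spectrum)[-2:]:  # the two strongest components
    print(f"{freqs[i]:6.1f} Hz, amplitude {spectrum[i]:.2f}")
    </pre>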
    <mono:note>
    <p>
        We want to defer the introduction of math, so any rigorous treatment
        of spectra (amongst other things) is necessarily postponed as well.
        This will leave some holes and vaguely defined concepts, here. Be
        forewarned when we use such terms as periodic, continuous, discrete,
        spectrum and so on.
    </p>
    </mono:note>
    </div>

    <div>
    <h2>Periodicity, quasi‐periodicity and aperiodicity</h2>
    <p>
        Most traditional acoustic research has centered around highly
        reductionistic approaches, such as using anechoic chambers, sinusoid
        test tones and so on. In the real world, however, we never encounter
        strictly periodic sounds, let alone pure sinusoids—musical sounds are
        never pure enough and in addition are strictly time limited. In fact,
        most musical sounds do not even approximate periodic behavior. To get a
        hold on the following topics, we need to classify sounds a bit further,
        and to establish an intuition as to how the different types of tones
        behave and what they sound like.
    </p>
    <p>
        Periodic sounds we have already seen. The simplest example is the sine
        wave. All periodic sounds repeat over and over, reaching over all of
        time. It is clear that such sounds do not really exist, but they are a
        neat conceptual tool when analyzing sounds which are locally stable.
        This can be done once a system in a sense no longer remembers that
        some input <q>started</q> a finite time ago, that is, once any
        transient phenomena have diminished sufficiently. As to why we would
        go with periodic analysis, periodic signals have extremely nice
        properties. For instance, frequency is a concept which is only defined
        for signals which are periodic. If we look at the spectrum of a periodic
        signal, we quickly learn that only whole multiples of some fundamental
        frequency (<dfn>harmonics</dfn>) are present. Later, when stated
        formally, this notion leads to the classic theorem on Fourier series.
    </p>
    <mono:sidebar>
    <p>
        This does not imply that the fundamental or all the harmonics need to
        be present. When they are not, the actual frequency (rate of repetition)
        of the signal can be higher than the fundamental frequency. In fact, one
        can always reinterpret a series of harmonic partials as containing only
        the even harmonics of a fundamental an octave lower. Each such shift in
        the point of view leaves the fundamental lower than before.
        Consequently the concept of fundamental frequency is not very
        well‐defined and certainly does not relate uniquely to the actual
        frequency of the signal. This permits some interesting acoustical
        illusions and even serious musical applications.
    </p>
    </mono:sidebar>
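    <p>
        One can also run the construction the other way and build a periodic
        signal out of harmonics. A sketch (the classic odd‐harmonic
        approximation of a square wave; all parameters are arbitrary):
    </p>
    <pre>
# Summing odd harmonics with weights 1/n converges toward a square wave.
import numpy as np

fs, f0 = 8000, 100.0                 # sample rate and fundamental, Hz
t = np.arange(0.0, 0.02, 1.0 / fs)   # two periods of the fundamental

x = np.zeros_like(t)
for n in range(1, 40, 2):            # odd harmonics only: 1, 3, 5, ...
    x += np.sin(2 * np.pi * n * f0 * t) / n
# x now approximates a square wave at 100 Hz (scaled by pi/4).
    </pre>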
    <p>
        Investigating a bit further we find that the relative amplitudes and
        phases of constituent harmonics uniquely determine a periodic signal.
        Later we shall see that the absolute phases of the harmonics in a
        periodic sound actually matter little to us, and even the amplitudes
        are perceived a bit vaguely. There is no time information, either.
        This means that there are actually not so many perceptually separate
        periodic tones. Further, all of them sound <em>extremely</em> dull
        and sterile.
    </p>
    <mono:sidebar>
    <p>
        The importance of periodic signals and their spectra lies in the fact
        that they are exceedingly simple mathematically—periodic sounds avoid
        the topological complications of Fourier analysis. They lead to the
        Fourier series which is discrete and as such quite simple to understand
        and derive. The Fourier series serves as a starting point for the
        construction of the discrete Fourier transform which is of pivotal
        importance in DSP. More about all this in the math section.
    </p>
    </mono:sidebar>
    <p>
        Previously we established that every periodic signal can be
        constructed from harmonics of some fundamental. Now, nobody says we
        cannot add together <dfn>partials</dfn> which are not in a harmonic
        relationship with each other. When we do this, we obtain
        <dfn>quasi‐periodic</dfn> signals. These sounds still have discrete
        spectra, but they need not be periodic. Quasi‐periodic sounds are more
        relevant to musical acoustics than periodic ones—locally, the
        steady‐state part of an instrumental sound is usually best described as
        quasi‐periodic. Again we assume that all the partials are in the
        audible range. Unlike periodic signals, quasi‐periodic ones can have
        some time content—closely spaced partials beat against each other,
        possibly contributing to harshness and time evolution in the composite
        tone. Inharmonic partials often lead to bell‐like or metallic timbres,
        or even chord or noise like textures if enough partials are present. No
        strict time features emerge, however, because any transient content
        would necessarily imply a continuous spectrum. For the same reason, any
        sound with a discrete spectrum will reach indefinitely back and forth
        in time.
    </p>
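    <p>
        A quasi‐periodic, bell‐like tone is equally easy to sketch (the partial
        ratios below are invented for the example; they are merely inharmonic):
    </p>
    <pre>
# Partials that are not whole multiples of a common fundamental.
import numpy as np

fs = 8000
t  = np.arange(0.0, 1.0, 1.0 / fs)

ratios = (1.0, 2.76, 5.40, 8.93)     # inharmonic frequency ratios
x = sum(np.sin(2 * np.pi * 220.0 * r * t) for r in ratios)
# The inharmonic spacing keeps the partials drifting in and out of phase,
# giving the metallic, slowly evolving character described above.
    </pre>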
    <p>
        Finally we have signals with continuous (in the strict sense) spectra,
        i.e. aperiodic signals. Sounds like these can be practically anything,
        but they never display truly periodic time‐domain behavior. Usually
        white noise is given as an example, but actually all time localized
        and discontinuous signals belong to this class. All transients
        (because they are time‐localized) and physical signals (because they
        have finite energy) also have continuous spectra.
    </p>
    <mono:sidebar>
    <p>
        Strictly speaking, <em>noise</em> is mathematically defined in terms of
        its generating process and some statistical properties of that process.
        The actual signals we process are just <em>examples</em> (a countable
        collection of which is, in proper mathematical terms, an <dfn>ensemble</dfn>)
        of what such a process can produce, and should be strictly separated
        from the process itself. This means that mathematically derived spectra
        for stochastic processes are expectations—they relate to <q>real</q>
        spectra like the expected result of half heads and half tails relates to
        an actual experimental record of coin tosses. In statistical analysis, a
        property called <dfn>ergodicity</dfn> then guarantees that averages taken
        in the time domain faithfully represent the properties of the stochastic
        process across its ensemble, so we can often handwave the distinction
        between the properties of the process and the properties of its example
        output. (Ergodicity guarantees that time averages taken over one output
        equal those taken over all the signals in an ensemble.) One should keep
        in mind that they are not the same thing, however. Otherwise one runs
        into some deep math. To get rid of the process description and to work
        solely on time series, one must first consider such fun subjects as
        information theory, Kolmogorov complexity, Bayesian statistics and
        estimation theory, to mention a few. Those are topics <em>well</em>
        outside both the scope of this presentation and the capability of the
        author.
    </p>
    </mono:sidebar>
    </div>
    </div>

    <?stamp?>
</body>
</html>