<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet alternate="no" href="/shared/style/dc-tiny/dc-tiny-1.css" title="decoy tiny 1" type="text/css" charset="UTF-8" media="screen"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:m="http://www.w3.org/1998/Math/MathML"
      xmlns:mono="http://www.iki.fi/~decoy/shared/namespace/dc-mono">
<head>
    <title>Sound, synthesis and audio reproduction: Sound as a physical phenomenon</title>
    <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
    <meta http-equiv="Content-Language" content="en"/>
    <meta http-equiv="Content-Style-Type" content="text/css"/>
    <meta http-equiv="Content-Math-Type" content="text/mathml"/>
    <meta name="description" content="Some facts about the physics of sound."/>
    <meta name="keywords" content="sound,acoustics,physics,wave transmission,reflection,diffraction,interference,absorption,refraction"/>
    <meta name="robots" content="index,follow"/>
    <meta name="distribution" content="global"/>
    <?i-am 28?>
    <?parent 126?>
    <link rel="P3Pv1" href="/shared/policy/p3p-policyref.xml"/>
    <link rel="shortcut icon" href="/shared/graphics/icon/decoy-logo.png" type="image/png"/>
    <link rel="copyright" href="/shared/policy/siteip" title="Site IP policy"/>
    <link rel="meta" href="/shared/meta/merged"/>
    <link rel="author" href="mailto:decoy@iki.fi" title="Sampo Syreeni"/>
    <link rel="top" href="/front" title="Sampo Syreeni—decoy"/>
    <link rel="help" href="/shared/site" title="About the site"/>
    <link rel="up" href="dsound" title="Sound, synthesis and reproduction"/>
    <link rel="contents" href="dsound" title="Sound, synthesis and audio reproduction"/>
    <link rel="glossary" href="dsound-v-01" title="Vocabulary"/>
    <link rel="bibliography" href="dsound-r-01" title="References. Reading Which Is Worthy."/>
    <link rel="begin" href="dsound-c-01" title="Introduction"/>
    <link rel="end" href="dsound-c-14" title="Ideas to albums—working the sonic technology"/>
    <link rel="prev" href="dsound-c-01" title="Introduction"/>
    <link rel="next" href="dsound-c-03" title="Physical sound sources and sound production in instruments"/>
    <link rel="chapter" href="dsound-c-01" title="Introduction"/>
    <link rel="chapter" href="dsound-c-02" title="Sound as a physical phenomenon"/>
    <link rel="chapter" href="dsound-c-03" title="Physical acoustics"/>
    <link rel="chapter" href="dsound-c-04" title="Hearing, physiological and psychological aspects of"/>
    <link rel="chapter" href="dsound-c-05" title="Signal processing basics"/>
    <link rel="chapter" href="dsound-c-06" title="Sound analysis and visualisation"/>
    <link rel="chapter" href="dsound-c-07" title="Sound synthesis"/>
    <link rel="chapter" href="dsound-c-08" title="Sound processing and effects"/>
    <link rel="chapter" href="dsound-c-09" title="Control, modulation and patching"/>
    <link rel="chapter" href="dsound-c-10" title="Sequencing and MIDI"/>
    <link rel="chapter" href="dsound-c-11" title="Sound recording"/>
    <link rel="chapter" href="dsound-c-12" title="Audio transport"/>
    <link rel="chapter" href="dsound-c-13" title="Audio reproduction"/>
    <link rel="chapter" href="dsound-c-14" title="Ideas to albums—working the sonic technology"/>
    <link rel="appendix" href="dsound-a-01" title="APPENDIX: More on optical discs"/>
    <link rel="appendix" href="dsound-a-02" title="APPENDIX: Noise reduction"/>
    <link rel="appendix" href="dsound-a-03" title="APPENDIX: Audio coding/compression"/>
    <link rel="appendix" href="dsound-a-04" title="APPENDIX: Trackers, modules and sampling"/>
    <link rel="appendix" href="dsound-a-05" title="APPENDIX: Synthesis languages and softsynths"/>
    <link rel="appendix" href="dsound-d-01" title="Digression: common characteristics in signal analyses"/>
    <link rel="appendix" href="dsound-d-02" title="Digression: some common functions"/>
    <link rel="appendix" href="dsound-d-03" title="Digression: traditional psychoacoustic demonstrations"/>
    <link rel="appendix" href="dsound-d-04" title="Digression: consonance and dissonance"/>
    <link rel="appendix" href="dsound-d-05" title="Digression: commercial synthesis methods"/>
    <link rel="appendix" href="dsound-d-06" title="Digression: commercial sound effects algorithms"/>
    <link rel="appendix" href="dsound-d-07" title="Digression: some unconventional dimensions of sound"/>
    <link rel="appendix" href="dsound-d-08" title="Digression: more on sound fields"/>
    <link rel="appendix" href="dsound-d-09" title="Digression: synthesis tips and tricks"/>
</head>
<body xml:lang="en">
    <div>
    <h1>Sound as a physical phenomenon</h1>
    <div class="intro">
    <p>
        Sound is something most of us know and love. But although we hear sounds
        every single day of our lives, there are many aspects of the experience
        that one usually doesn’t pay attention to. These are the basic questions
        of the nature of sound, the reasons for our hearing anything at all and
        the mechanics underlying the sensation. It is not at all obvious how
        sound is actually generated in physical objects, why different objects
        do not all sound alike or how all this relates to our sensory apparatus.
        When taking a closer look at sound as a physical phenomenon, many
        interesting characteristics of sound come into proper focus. The
        relevant topics include the wave characteristics of sound: diffraction,
        reflection, interference and so on. Taking into account the
        peculiarities of the human sensory organ and the psychology of
        perception in general, one is taken into the field of
        psychoacoustics—the study of human auditory perception and its
        underlying mechanisms. This is a field rich in applications and
        interesting discoveries, not all of which seem intuitive at first sight.
    </p>
    <mono:sidebar>
    <p>
        Perceptual sound codecs can be mentioned as one application—how can
        an MP3 codec throw 90% of an audio signal away and still reproduce a
        perceptually near perfect replica of the original?
    </p>
    </mono:sidebar>
    </div>

    <div>
    <h2>What is sound?</h2>
    <p>
        Sound is a wave phenomenon. That is something we’re all told in high
        school physics courses. But usually no time is left for a nice intuitive
        picture of the thing to build up. That’s what we will try to
        construct next.
    </p>
    <p>
        To create waves we need a medium in which to put them. Obviously the
        medium needs to be elastic in order to support any waves at all.
        Additionally there must be a kind of <q>stiffness</q> which holds the
        adjacent parts of the medium together. This is what makes wave
        propagation, a form of energy transfer, possible. It also determines
        the kinds of vibration which can propagate in the medium: unlike
        solids, gases and liquids lack transverse binding forces and so only
        longitudinal energy transfer is possible. This is why sound cannot be
        polarised. The other important intrinsic property of the medium is its
        density—together density and the strength of the binding forces
        determine the speed of wave motion in the medium.
    </p>
    <mono:sidebar>
    <p>
        Note that both density and the stiffness can vary within the
        substance: the medium can be <dfn>inhomogeneous</dfn>. In addition to
        this, the binding forces can be different depending on direction. In
        the latter case we talk about <dfn>anisotropy</dfn>, which fortunately
        does not occur in gases.
    </p>
    </mono:sidebar>
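    <p>
        To make the numbers concrete, the relation can be tried out in a few
        lines of Python. This is an illustrative sketch of mine, not part of
        the original text: for a gas, the effective stiffness is the adiabatic
        bulk modulus, 1.4 times the static pressure in the case of air.
    </p>
    <pre>
# Speed of sound from stiffness and density: c = sqrt(K / rho), where the
# bulk modulus K plays the role of the binding forces discussed above.
from math import sqrt

gamma = 1.4        # adiabatic index of air (a diatomic gas)
P     = 101325.0   # static pressure, Pa
rho   = 1.204      # density of air at 20 degrees C, kg/m^3

c = sqrt(gamma * P / rho)
print(f"speed of sound in air: {c:.0f} m/s")   # about 343 m/s
    </pre>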
    <p>
        In the case of sound, the density is just the usual mass per unit
        volume of the gas and the stiffness is born of a balance between the
        repulsive forces of the molecules and their mean kinetic energy
        (temperature). In a closed system of moving particles the total kinetic
        energy stays constant and statistical physics predicts that the mean
        distance between the mutually repellent particles will tend to even
        out. Similarly the expected velocity (with direction) of particles over
        an arbitrary volume will be zero—at large scales the gas will tend to
        stay put even if the molecules themselves move quite a bit. Only after
        we upset the balance do the mean properties ripple. These mean
        properties are used as a stepping stone to the analytic model of a
        sound field, which simply forgets that the molecular level ever
        existed, assigns a real velocity vector and a real pressure measure to
        each point in space and lets these vary over time. From Newton’s laws
        we then get a relatively simple partial differential equation which
        tells us that pressure gradients cause acceleration of particles and
        that the velocity at some point causes the pressures around it to change.
    </p>
<p class="disposition">
        Often we make the further assumption that the velocity field is
        irrotational, that is, the gradient of a scalar potential; this
        simplifies the mathematics but loses part of the generality of the
        model.
</p>
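    <p>
        The two statements translate almost line for line into a small
        simulation. The following sketch (my own illustration; the grid, time
        step and variable names are arbitrary choices) steps a one‐dimensional
        pressure field and velocity field against each other:
    </p>
    <pre>
# Minimal 1D linear acoustics: pressure gradients accelerate the medium,
# and the motion of the medium changes the pressures around each point.
import numpy as np

N, dx, dt = 400, 0.01, 1e-5          # cells, cell size (m), time step (s)
rho, K    = 1.204, 1.4 * 101325.0    # air density, adiabatic bulk modulus

p = np.zeros(N)                      # pressure deviation at cell centres
u = np.zeros(N + 1)                  # particle velocity at cell faces
p[N // 2] = 1.0                      # a point-like initial disturbance

for step in range(1000):
    u[1:-1] -= (dt / (rho * dx)) * np.diff(p)   # gradient accelerates
    p -= (K * dt / dx) * np.diff(u)             # motion changes pressure
# p now shows two pulses travelling outward from the centre.
    </pre>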
    <p>
        To get a hold of the process of sound propagation, we first look at a
        simple example: a <dfn>point source</dfn>. This is simply one point at
        which we explicitly control the sound pressure or the velocity vector of
        the field. In practice, neither of these can be controlled independently
        of the other. When we excite the medium by creating a disturbance in its
        structure, coupling between adjacent particles makes the disturbance try
        to even out, globally. The forces arising from the new inhomogeneity
        accelerate the particles toward lower pressure. But this, assuming the
        pressurised region is point‐like, can only mean that the disturbance
        moves outward. The compressed particles get pushed away from the center
        of the region. This makes them move and, consequently, pushes fresh
        molecules out of the way. Voilà: motion. Once this has happened, the
        pressure is evened out. It is worth pointing out that the pressure wave
        does not get ironed out in the inward direction. This is due to the
        inertia of the molecules—once the pressure sets them in motion, the
        pressure moves in the direction the first molecules go in. What happens
        is similar to slamming a pool ball against another. Since all this
        happens through what are almost 100% elastic collisions of particles,
        little energy is lost. (Some is, to the mean kinetic energy of the
        particles, raising the gas temperature. This accounts for some of the
        attenuation that sound experiences while travelling.) As long as no new
        particles are brought into play, the net effect is that the pressurised
        region moves outward from the point source. Note that individual
        particles do <em>not</em> move appreciable distances in the process but
        stop after transferring their kinetic energy to the next ones. This is
        characteristic of wave phenomena: seen at a larger scale, energy moves,
        not the medium. Put another way, turning the volume knob east does not
        produce a tropical storm.
    </p>
    <mono:sidebar>
    <p>
        It is very important to separate such concepts as the pressure field
        (scalar in 3D), the density field (scalar in 3D), the velocity (3D
        vector in 3D), the gradient fields of the first two scalar ones (3D
        vector in 3D), the time derivative fields of the scalar ones (scalar in
        3D) and all the derived fields on top of these. When bypassing the
        mathematical notation, it is exceptionally easy to confuse the first and
        second time derivatives of the pressure field with velocities and
        accelerations (which are taken relative to spatial coordinates and are,
        hence, vector fields).
    </p>
    </mono:sidebar>
    <p>
        Above, we compressed the air at the source. But the same principle of
        wave transmission applies if the opposite is done—namely, if we create
        a depressurisation zone. In this case, the surrounding particles move in
        and the zone moves outward again. Also, the amount of depressurisation
        is significant—the more violent the original disturbance, the bigger
        the propagating <q>bump</q> in the medium. In fact, aside from the
        energy of the original disturbance getting spread over a larger and
        larger area and thus growing fainter and fainter per unit volume, one
        can accurately reconstruct a series of more or less violent
        disturbances at the source by measuring the local air pressure at a
        point some distance removed from it. This is, roughly, how sound
        propagates through air in free space and is experienced from afar.
    </p>
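    <p>
        The spreading loss alluded to above can be quantified. For a point
        source in free space the energy spreads over a sphere of area 4πr², so
        the pressure amplitude falls as 1/r, that is, six decibels per doubling
        of distance. A sketch (not part of the original text):
    </p>
    <pre>
# Spreading loss from a point source: intensity falls as 1/r^2, pressure
# amplitude as 1/r, i.e. -6 dB per doubling of distance.
from math import log10

p_ref = 1.0                          # pressure amplitude at 1 m (arbitrary)
for r in (1.0, 2.0, 4.0, 8.0):
    print(f"r = {r:3.0f} m: relative level {20 * log10(p_ref / r):6.1f} dB")
    </pre>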
    <p>
        A few things have to be noted about sound radiation. The first is
        that we speak about pressures. The importance of this is seen when
        thinking about a speaker cone that moves very slowly. In this case, the
        air has time to escape from before the cone instead of forming a high
        pressure zone. Evidently, efficient radiation is not possible this way.
        We see that in order to emit considerable radiation, rapid variations
        in pressure or large radiators must be used. This is typical of wave
        emission—it is why microwave radio transmission requires only small
        antennas while low frequency AM radio often employs antennas that are
        tens of meters long. From this we come to the second point: in order to
        continuously emit sound, we cannot just move the speaker cone further
        and further ahead. Instead, the cone has to come back before the air has
        time to escape around the edges. In physics, the situation is discussed
        in terms of coupled systems and impedance matching. The principle is
        that sound emitters work best when the inertia of the medium keeps the
        medium from moving appreciably while, on the other hand, the emitter’s
        own inertia isn’t large enough to make the emitter itself hard to move.
    </p>
    <p>
        Back and forth motion is the normal mode of wave transmission, not the
        impulses we have discussed so far. A special case of such movement
        occurs when the motion repeats at a constant rate and each cycle
        involves the same precise pattern of movement. In this case we speak of
        <dfn>periodic</dfn> motion and periodic sounds/signals. The rate of
        repetition of a periodic motion is dubbed the <em>frequency</em>, with
        hertz (<abbr title="Hertz" xml:lang="en">Hz</abbr>) as its unit. Hertz
        is the SI unit meaning <q>cycles per second</q>. As each part of a wave
        travels at a constant velocity and, at each fixed point in space, the
        vibratory motion repeats at a constant rate, we see that one period of
        the motion is always exactly duplicated over a certain interval of
        space that depends only on the speed of the wave motion and the
        frequency of the wave we drive through the medium. As we work in a
        single medium, the speed stays constant, so the length of our interval
        depends only on the frequency of our vibration. This is the
        <em>wavelength</em> corresponding to the frequency, inversely
        proportional to it.
    </p>
    <p>
        Because of the properties of what we will come to know as linear
        systems, a certain type of periodic wave has a very special position in
        our treatment. This wave is the <dfn>sinusoid</dfn>. It is the smooth,
        endless, periodic function which we bump into in trigonometry. The sine
        wave has the property that when put through a linear system (in our
        case, transmitted through air), it comes through as a sine wave with
        the same frequency. The only variation comes about in the form of a
        time lag and a change in strength. When a combination of sine waves of
        different frequencies is introduced, they go through as if the other
        waves weren’t even present.
    </p>
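    <p>
        Since the wavelength is just the speed of sound divided by the
        frequency, the audible range is easy to map out. A sketch (assuming
        the usual room temperature figure of 343 m/s):
    </p>
    <pre>
# Wavelength corresponding to a frequency: lambda = c / f.
c = 343.0                            # speed of sound in air, m/s
for f in (20.0, 100.0, 1000.0, 10000.0, 20000.0):
    print(f"{f:7.0f} Hz -> {c / f:8.4f} m")
# The audible range spans roughly 17 m (20 Hz) down to 17 mm (20 kHz).
    </pre>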
    <p>
        In later parts of the text when we talk about sound, we usually mean
        pressure variations measured at a point. This is because we have ears
        which are relatively small compared to the wavelength of audible
        sound—we can with good accuracy say that ears are
        point‐like with regard to sound fields. Thus few humans ever fully
        comprehend the real, complex vibrational patterns which occur in three
        dimensional spaces—evolution has not equipped our brains
        to do such analysis. This fact is a double‐edged sword,
        really—it would be nice to actively understand all the
        phenomena involved in sound transmission since all such things affect
        what we hear but, on the other hand, mathematical description and
        manipulation of 2+ dimensional wave phenomena quickly becomes quite
        unwieldy. It is quite a relief to scientists, engineers, technicians
        and artists that such considerations are not strictly necessary to
        fool our hearing.
    </p>
    </div>

    <div>
    <h2>Interference</h2>
    <p>
        When nonsinusoidal sources and/or a number of radiators and/or closed
        spaces are considered, things get interesting. At once we note something
        called <em>interference</em>. It is what happens when more than
        one source is placed in the same space. At each point in space, the
        individual contributions of our moving pressure zones (one for each
        emitter) just add up. We get what is called <dfn>linear</dfn> wave
        transmission. The name comes from mathematics and means, roughly, that
        given a bunch of signals, we can first add them and then feed the sum
        through a system, or first feed each through the system and then add,
        with equal results. To a
        considerable degree, this is what happens with sound. In spite of its
        rather technical connotations, linearity is a true friend. Without it,
        there would be little hope of understanding anything about sound at an
        undergraduate level.
    </p>
    <mono:sidebar>
    <p>
        Said another way, at small to moderate amplitudes, large scale sound
        transmission obeys a second order linear partial differential equation,
        called the wave equation, which is seen in all branches of physics and
        is covered early on in physics education. As is well known, once we
        know some solutions to a linear differential equation, we get more by
        scaling and summing.
    </p>
    </mono:sidebar>
    <p>
        Now, as periodic waves interfere, it is interesting to see what happens
        at a single, fixed point in space as time evolves. Let’s suppose we have
        a one‐dimensional string on which a single sinusoidal sound source acts.
        We know that the pressure at any single point reflects that of the
        source, save for the time lag it takes the vibratory motion to reach
        our point and the attenuation resulting from friction and other damping
        forces. If we now add a second source with an identical frequency but
        a different position on the string, we get <em>standing waves</em>. How
        does this happen? Think about the peak of one period of the motion. As
        it leaves the two sources, it travels at a constant velocity away from
        them. Precisely in the middle, the two waves meet and we let them
        interfere; they add together. The same applies for the valley parts of
        the wave. So in the middle, we get twice the amplitude. We say the two
        sounds are <em>in phase</em> with each other. Let’s take another point,
        this time choosing it so that the time to get from source 1 to the point
        is precisely half a cycle greater than the time to get to our point
        from source 2, that is, the difference between the distances to the
        sources is a whole number of wavelengths plus half a wavelength. This
        time, the sinusoids always arrive at our point precisely so that they
        cancel each other out. So at this point, we never observe vibratory
        motion. Points of these two kinds occur repeatedly over the entire
        length of our string, with the amplitude of the sinusoidal motion
        varying between them from zero to double the source amplitude.
    </p>
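    <p>
        The pattern just described can be computed directly. The sketch below
        (an illustration of mine; the positions and wavelength are arbitrary)
        evaluates the textbook two‐source interference formula along the
        string:
    </p>
    <pre>
# Local amplitude between two in-phase sources: with path difference D, the
# waves sum to a vibration of amplitude 2*A*|cos(pi * D / lambda)| -- doubled
# where D is a whole number of wavelengths, zero where it is off by a half.
import numpy as np

lam, A = 0.5, 1.0                    # wavelength (m) and source amplitude
x1, x2 = 0.0, 3.0                    # source positions on the string (m)
x      = np.linspace(0.0, 3.0, 13)   # observation points between them

D = np.abs(np.abs(x - x1) - np.abs(x - x2))
amplitude = 2.0 * A * np.abs(np.cos(np.pi * D / lam))
for xi, ai in zip(x, amplitude):
    print(f"x = {xi:5.2f} m: amplitude {ai:4.2f}")
    </pre>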
    <p>
        The last example was very simple, as only one‐dimensional effects were
        considered. In two dimensions we get a nice interference
        pattern, where our special points recur wherever the difference of the
        distances to the sources is, again, a whole multiple of half the
        wavelength.
    </p>
    <mono:sidebar>
    <p>
        We remember from high school geometry that, given two fixed points, if
        we draw a curve through those points where the difference of distances
        from the fixed points is constant, we get a hyperbola. So the knots and
        humps of our interference pattern on a plane occur on hyperbolas with
        the point sound sources as foci and the spacing of the points
        determined by the wavelength of the sound. The same deduction goes for
        the 3D case, only the sound field is quite a lot more difficult to
        visualise. We get, logically enough, hyperboloids. (To see this, put a
        line through the two point sources, rotate a plane around this line and
        repeat the two dimensional reasoning on the plane.)
    </p>
    </mono:sidebar>
    <p>
        One should note that when different frequencies are combined, the result
        is more complex, since now we cannot combine the resultant vibration
        pointwise into a single sinusoid. But keeping to two close frequencies,
        we get an interesting phenomenon called <em>beating</em>. When two
        frequencies that are close to each other are combined, we get, not an
        audible combination of the two, but the frequency in the middle of the
        two, varying sinusoidally in amplitude at the rate of the difference
        between the two original frequencies. This is seen as follows. Suppose
        we have two sine waves with frequencies
        <m:math><m:msub><m:mi>f</m:mi><m:mn>1</m:mn></m:msub></m:math> and
        <m:math><m:msub><m:mi>f</m:mi><m:mn>2</m:mn></m:msub></m:math>
        and we form their product,
        <math xmlns="http://www.w3.org/1998/Math/MathML">
            <mrow>
                <mi>sin</mi>
                <mo><mchar name="ApplyFunction"/></mo>
                <mrow>
                    <mo>(</mo>
                    <mrow>
                        <mn>2</mn>
                        <mo><mchar name="InvisibleTimes"/></mo>
                        <mi>π</mi>
                        <mo><mchar name="InvisibleTimes"/></mo>
                        <msub>
                            <mi>f</mi>
                            <mn>1</mn>
                        </msub>
                        <mo><mchar name="InvisibleTimes"/></mo>
                        <mi>x</mi>
                    </mrow>
                    <mo>)</mo>
                </mrow>
            </mrow>
            <mo><mchar name="InvisibleTimes"/></mo>
            <mrow>
                <mi>sin</mi>
                <mo><mchar name="ApplyFunction"/></mo>
                <mrow>
                    <mo>(</mo>
                    <mrow>
                        <mn>2</mn>
                        <mo><mchar name="InvisibleTimes"/></mo>
                        <mi>π</mi>
                        <mo><mchar name="InvisibleTimes"/></mo>
                        <msub>
                            <mi>f</mi>
                            <mn>2</mn>
                        </msub>
                        <mo><mchar name="InvisibleTimes"/></mo>
                        <mi>x</mi>
                    </mrow>
                    <mo>)</mo>
                </mrow>
            </mrow>
        </math>. Through a basic trigonometric identity, the result is
        <math xmlns="http://www.w3.org/1998/Math/MathML">
            <mfrac>
                <mn>1</mn>
                <mn>2</mn>
            </mfrac>
            <mo><mchar name="InvisibleTimes"/></mo>
            <mrow>
                <mo>(</mo>
                <mrow>
                    <mrow>
                        <mi>cos</mi>
                        <mo><mchar name="ApplyFunction"/></mo>
                        <mrow>
                            <mo>(</mo>
                            <mrow>
                                <mn>2</mn>
                                <mo><mchar name="InvisibleTimes"/></mo>
                                <mi>π</mi>
                                <mo><mchar name="InvisibleTimes"/></mo>
                                <mrow>
                                    <mo>(</mo>
                                    <mrow>
                                        <msub>
                                            <mi>f</mi>
                                            <mn>1</mn>
                                        </msub>
                                        <mo>−</mo>
                                        <msub>
                                            <mi>f</mi>
                                            <mn>2</mn>
                                        </msub>
                                    </mrow>
                                    <mo>)</mo>
                                </mrow>
                                <mo><mchar name="InvisibleTimes"/></mo>
                                <mi>x</mi>
                            </mrow>
                            <mo>)</mo>
                        </mrow>
                    </mrow>
                    <mo>−</mo>
                    <mrow>
                        <mi>cos</mi>
                        <mo><mchar name="ApplyFunction"/></mo>
                        <mrow>
                            <mo>(</mo>
                            <mrow>
                                <mn>2</mn>
                                <mo><mchar name="InvisibleTimes"/></mo>
                                <mi>π</mi>
                                <mo><mchar name="InvisibleTimes"/></mo>
                                <mrow>
                                    <mo>(</mo>
                                    <mrow>
                                        <msub>
                                            <mi>f</mi>
                                            <mn>1</mn>
                                        </msub>
                                        <mo>+</mo>
                                        <msub>
                                            <mi>f</mi>
                                            <mn>2</mn>
                                        </msub>
                                    </mrow>
                                    <mo>)</mo>
                                </mrow>
                                <mo><mchar name="InvisibleTimes"/></mo>
                                <mi>x</mi>
                            </mrow>
                            <mo>)</mo>
                        </mrow>
                    </mrow>
                </mrow>
                <mo>)</mo>
            </mrow>
        </math>, which shows the symmetrical placement of the sidebands.
        (Don’t worry about the cosines, since they have the same form as
        sines. They are only a bit ahead in time…) The identity works
        backwards, of course: adding two sinusoids at nearby frequencies
        produces a carrier at the mean frequency whose amplitude envelope
        varies at half the separation of the originals. Since the loudness
        peaks twice per envelope cycle, the beats are heard at the full
        difference frequency.
    </p>
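    <p>
        The identity is easy to verify numerically. A sketch (the frequencies
        are arbitrary choices of mine):
    </p>
    <pre>
# Beating: the sum of 440 Hz and 444 Hz sines equals a 442 Hz carrier whose
# envelope varies at 2 Hz; loudness peaks twice per envelope cycle, so four
# beats per second are heard.
import numpy as np

fs = 8000.0                          # sample rate, Hz (arbitrary)
t  = np.arange(0.0, 1.0, 1.0 / fs)

pair    = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 444 * t)
carrier = 2 * np.sin(2 * np.pi * 442 * t) * np.cos(2 * np.pi * 2 * t)
print(np.max(np.abs(pair - carrier)))   # ~1e-12: the two forms agree
    </pre>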
    </div>

    <div>
    <h2>Reflection and absorption</h2>
    <p>
        So now we have multiple sound sources, but still nothing but an empty
        medium for our waves to travel in. How about obstacles? Starting from a
        single dimension once more, we send a single pulse wave towards the
        end of a string which is tied to a rigid wall. What happens? Well, the
        pulse comes back: it gets reflected. This is easy to understand—when
        a pressurised zone meets the wall, it cannot move it, and the pressure
        pushes back instead, making for a reflected copy. If the wall <q>gives
        in</q> a little and takes a bit of energy from the wave (turning it
        into heat through friction, usually), the wave still bounces back but
        gets attenuated. We say <em>absorption</em> has occurred. Absorption is
        the reason rooms do not have indefinitely long echoes. In a sense,
        absorption is the precise opposite of radiation, so it is quite logical
        that, here too, the size of the object and the frequency of the wave
        matter. Usually, though, the relevant <q>size</q> isn’t so much the
        overall size of the absorber as the scale of detail and the material of
        the object. For example, a paper wall can only stop the highest
        frequencies, whereas a soft, heavy curtain can absorb significant mid
        and low frequency sounds. In higher dimensions (2+), reflections become
        much more difficult to handle. There, approaches similar to ray optics
        work much better.
    </p>
    </div>

    <div>
    <h2>Resonance</h2>
    <p>
        When we combine reflection and interference, interesting things happen.
        Taking our 1D standing waves, we can now generate them by a single source
        and a wall that reflects the waves back.
    </p>
    <mono:sidebar>
    <p>
        One can think, as in ray optics, that the mirror image of the source now
        provides the other source. A similar view works in higher dimensions,
        but gets intractable quite fast when the number of reflections and
        reflecting objects increases. Even more troublesome is the situation
        in which the reflecting objects are not infinite, straight planes. At
        a very basic level the problem with higher dimensional differential
        equations is precisely the one of curved boundaries, which naturally
        make no sense in dimension one.
    </p>
    </mono:sidebar>
    <p>
        If we put two obstacles and send a pulse between them, a periodic motion
        arises. If we put a source there, instead, we observe a complex
        interference pattern as the waves get reflected again and again and
        interfere with other reflections and the source signal. Again, the same
        thing happens in higher dimensions, only with patterns that are harder
        to follow. If regular echoes, which reinforce each other, can be
        produced at some frequency (in the case of periodic sources, this
        happens when the distance between our two obstacles is a multiple of
        half the wavelength), <em>resonance</em> results. If such resonant
        frequencies exist, they reinforce sounds of the same frequency. The
        opposite (and all that is in between) can also happen—destructive
        interference can greatly damp some frequencies. Resonance gives rise to
        different <em>modes</em> of vibration—if resonance can happen on
        several frequencies, complex patterns of vibration can arise. These
        patterns are taken advantage of in the design of traditional
        instruments. For instance, only a slight variation in the design of a
        violin can cause significant variations in its perceived timbre. Since
        acoustically significant vibrational modes always appear as (composite)
        standing wave formations in physical media (such as air columns, solids
        and water), the different modes can often be independently
        controlled—they all have their own characteristic vibrational shape
        with humps and knots, which gives us the possibility of exciting or
        damping the modes differently relative to one another.
    </p>
    <p>
        Further, since air columns can vibrate, so can spaces filled with air.
        This leads, in due course, to the issue of room acoustics: if one puts
        a point source (a very rough approximation of a loudspeaker) in a room,
        the more the walls reflect sound, the more the room colours the sound
        (a longer echo means more chances for interference). As sound
        circulating around a room gets reflected many times, it is necessary to
        ensure that no prominent resonances occur (these are called <em>room
        modes</em> or just modes and usually result from echoes between
        opposite walls). The same general principles apply here as in the case
        of 1D resonance, with the exception of many unusual and inharmonic
        modes—as such, the placement of speakers, room geometry and decoration
        crucially affect the sound field in the room. In addition,
        psychoacoustical phenomena further complicate matters. Thus, for
        instance, the more random the directions prominent echoes can be made
        to come from, the better (as this lessens the effect of room modes, and
        obvious echo directions get reduced). This is why audiophiles use
        highly damped and irregularly shaped rooms to achieve a hi‐fi listening
        environment. (Basic measures include thick carpets to absorb stray
        sound, book shelves to absorb and scatter, absorbers in the ceiling and
        the placement of heavy furniture around the rim of the room.)
    </p>
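    <p>
        For the idealised case of a rectangular room with rigid walls, the mode
        frequencies have a well known closed form. A sketch (the room
        dimensions are invented for the example; real rooms deviate from the
        ideal considerably):
    </p>
    <pre>
# Modes of a rigid-walled rectangular room:
# f = (c/2) * sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2)
from itertools import product
from math import sqrt

c          = 343.0                   # speed of sound, m/s
Lx, Ly, Lz = 5.0, 4.0, 2.5           # room dimensions, m

modes = sorted(
    (c / 2 * sqrt((nx / Lx) ** 2 + (ny / Ly) ** 2 + (nz / Lz) ** 2),
     (nx, ny, nz))
    for nx, ny, nz in product(range(3), repeat=3)
    if (nx, ny, nz) != (0, 0, 0)
)
for f, n in modes[:8]:               # the lowest, most troublesome modes
    print(f"{f:6.1f} Hz  mode {n}")
    </pre>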
    </div>

    <div>
    <h2>Refraction and total reflection in a boundary</h2>
    <p>
        Until now, we have assumed that the medium in which our waves travel is
        homogeneous—the speed of travel of wave motion is constant throughout
        the space. Often this is not the case, though. In the case of sound, the
        speed depends on the material the waves travel in and its temperature.
        Often one can ignore the inhomogeneity, but sometimes it produces
        important effects. The main one is <dfn>refraction</dfn>: velocity
        dependent bending of wavefronts. Refraction is most pronounced if sharp
        boundaries between media of different properties are present—an
        excellent example is the boundary between water and air. If a wavefront
        hits such a boundary at anything other than a right angle, the
        direction of the waves is bent. If the speed decreases across the
        boundary, the motion bends towards the normal of the boundary. If it
        increases, bending is away from the normal. If the incident angle is
        great enough and the waves would be speeding up, total reflection
        occurs. All this is precisely analogous to what happens in ray optics.
        The only difference is that in acoustics, one needs to worry about
        nonsharp boundaries more often. This is because we are mostly dealing
        with sound transmission in air at normal atmospheric pressures and in
        this case, the speed differences usually arise from temperature
        differences—always a continuous phenomenon. As you can already guess,
        refraction and total reflection happen with graded boundaries as well.
        Here they take the form of smooth bending, not abrupt changes of
        direction. One must also observe the fact that refraction, just like
        diffraction, is frequency dependent—different frequencies refract
        differently. What is the significance of all this, then? Most often, at
        least indoors, none. Outdoors, where temperature gradients can be much
        greater, refraction effects can become significant, though. A prime
        example is the way sound can propagate over lakes—if the water is
        colder than the air above it, a cold‐warm graded boundary can form in
        the air just above the water. Such a gradient can, under some
        circumstances, bend upward‐travelling sound waves back down towards the
        surface and keep them from escaping. This can lead to the sound
        propagating unusually long distances over the lake. (The phenomenon is
        similar to the one employed in graded‐index optical fibres.)
    </p>
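    <p>
        The bending at a sharp boundary obeys the same Snell’s law as in
        optics. A sketch (the wave speeds are rough textbook values):
    </p>
    <pre>
# Snell's law for sound: sin(t2) / sin(t1) = c2 / c1. When the wave would
# speed up across the boundary, a critical incident angle exists beyond
# which total reflection occurs.
from math import asin, degrees, radians, sin

c1, c2 = 343.0, 1480.0               # sound speeds: air and water, m/s

critical = degrees(asin(c1 / c2))
print(f"critical angle: {critical:.1f} degrees")   # about 13.4 degrees

theta1 = radians(10.0)               # a shallow angle of incidence
theta2 = degrees(asin(sin(theta1) * c2 / c1))
print(f"10 degrees in air refracts to {theta2:.1f} degrees in water")
    </pre>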
    </div>

    <div>
    <h2>Diffraction</h2>
    <p>
        One final phenomenon is yet to be uncovered, namely,
        <em>diffraction</em>. This is something that is often, sadly enough,
        given little notice. All waves behave rather weirdly when they pass
        around objects. If very thin (compared to the wavelength) objects are
        passed, no substantial effects are produced—such little defects in the
        medium drown in the large scale wave motion. Very large objects exhibit
        reflection, at least locally. But in between (e.g. around object edges
        and suitably sized obstacles overall), the wave motion <q>bends</q>,
        creating some <em>pretty</em> complex interference patterns. Even in
        the case of exceedingly simple geometric objects (e.g. balls,
        cylinders…), the resulting interference is difficult to master
        mathematically. This is a phenomenon specific to 2+ dimensional cases
        and something that greatly affects the behavior of sound in natural
        environments. Thus, the behavior of sound near objects and object edges
        is really quite poorly understood, leading to the term <em>near field
        effect</em> being used in situations where such behavior is
        significant. Noteworthy examples are the sound field of a loudspeaker
        and the field formed around a human head standing in a larger sound
        field. The latter to a considerable degree dominates how we hear sound
        and mostly determines how the direction of a sound source affects our
        perception of it.
    </p>
    <p>
        Diffraction is something which is not often taken into account when
        simulating sound behaviour. Reasons for this are multiple. Firstly,
        diffraction is rather difficult to simulate efficiently. As it is a 2+
        dimensional phenomenon, it does not naturally lend itself to the one
        dimensional abstractions of today’s simulation methods and 2+ dimensional
        simulations cost dearly in terms of processing power and memory. Secondly,
        diffraction is heavily frequency dependent—it disperses waves of
        differing frequencies. This is one of the reasons why accurate
        prediction of room acoustics is so difficult. Thirdly, there is little
        need to think about 2+ dimensional effects when analysing static, linear
        point‐to‐point transmission. Though all this may sound purely
        academic, when one tries to create convincing simulations of
        sound behaviour for reverberation and binaural processing, this is where
        we usually hit the wall.
    </p>
    <p>
        Now we know diffraction does not fit in and is difficult to handle. Under
        what assumptions, then, can we ignore the problem? Let’s start at the
        bottom of things… To get a hold on wave phenomena, one needs to
        simplify quite a bit. The most common way is to try to linearize and
        then reduce the dimensionality of the problem. The latter part often
        consists of building meshes of one dimensional simulations or neglecting
        the size of phenomena in certain directions. The latter is the way we
        arrive at ray optics and its audio counterpart—if we neglect the fact
        that our waves have a finite wavelength, i.e. we <q>pass it to the limit</q>,
        many ugly things go away and we get nice, one‐dimensional, cleanly
        behaved rays instead of multidimensional wavefronts. We can do this if
        the waves are very short compared to the feature size of the
        surrounding space. In the case of light and natural objects, we can
        quite safely assume this to be the case. (The speed of light is high
        but its frequency is higher still, so the wavelength comes out very
        small. Also, the relative frequency range of visible electromagnetic
        radiation is much narrower
        than the range of audible sound.) With sound we bump into a relatively
        wide frequency range and feature sizes in our environment which sit
        right in the middle of audible wavelengths. This means that sound
        diffraction in our surroundings is often considerable and can only be
        neglected if few obstacles are present, sound sources can be considered
        point‐like, enough damping is present and reflective surfaces are simple
        enough.
    </p>
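    <p>
        A rough rule of thumb can still be coded up. The thresholds below are
        indicative only (my own choices, not sharp physical boundaries):
    </p>
    <pre>
# Compare the wavelength to the feature size of an obstacle to guess
# whether diffraction can be neglected.
def regime(frequency, feature_size, c=343.0):
    lam = c / frequency
    if lam > 10.0 * feature_size:
        return "obstacle nearly invisible to the wave"
    if lam > feature_size / 10.0:
        return "strong diffraction, full wave treatment needed"
    return "ray-like behaviour, reflection and shadowing dominate"

for f in (20.0, 1000.0, 20000.0):
    print(f"{f:7.0f} Hz vs a 1 m obstacle: {regime(f, 1.0)}")
    </pre>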
    </div>

    <div>
    <h2>Sound as signals/functions</h2>
    <p>
        Before any mathematical treatment of sound is possible, we must represent
        it somehow in the language of mathematics. To do this, we note what sound
        is: it is just time‐dependent pressure variation. Furthermore, by taking
        a point in space, we can represent sound at this point with a single
        number, the pressure. When there is no sound, the pressure is just the
        normal atmospheric pressure (around 100 000 Pascals on average), so
        it is a good idea to assign numbers with respect to this level. So
        we represent the pressure at our point by telling how much it
        differs from the long term average air pressure—rarefaction results in
        negative values, compression in positive.
        much matter—since most DSP is linear, the same basic concepts apply
        regardless of scale. Now that we have chosen a pressure scale, we just
        present the pressure as a function of time. If we want a more complete
        description of the sound field, we take more points and form a vector
        (a list of numbers, basically) of the pressures in those points and
        represent this vector as a function of time. Usually we do not use more
        than two to four points since the resulting description mostly suffices
        for audio systems. Most people have never had a chance to hear anything
        exceeding two channels (i.e. stereo).
    </p>
    <p>
        So we now have functions of time. These we call signals. They can be
        represented by voltages or currents in electric circuits and wires
        (this is the way microphone cables, amplifiers and most consumer audio
        equipment work), as grooves of varying depth on an LP, as numbers of
        some given precision in a computer or as numbers encoded in the tiny
        pits and ridges of a CD. Mathematically we treat these functions as
        mappings from real numbers to real numbers (i.e. for each possible
        instant of time, we assign an infinitely accurate measure of pressure).
        In digital systems, we use a string of numbers which give a
        sufficiently accurate measure of the pressure at points sufficiently
        close to each other in time (these numbers are called <dfn>samples</dfn>
        and under proper conditions, they represent the original signal with
        near perfect quality). (See the first section of the chapter on DSP for
        a closer look at sampling.) Having got used to thinking about sound in
        terms of signals, we often equate the two. This makes it possible to
        use mathematical terminology (which is suitable for signals) to
        describe what happens or is to be done to sound. It may sound a bit
        strange, for instance, to talk about <q>squaring a sound</q>. Thought
        of as a sequence of numbers, though, it makes perfect sense. Especially
        since we aim at understanding DSP as well.
    </p>
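    <p>
        As a concrete illustration (a sketch with made‐up parameters), squaring
        a sampled sound really is a perfectly ordinary numeric operation:
    </p>
    <pre>
# "Squaring a sound", sample by sample.
import numpy as np

fs = 8000                            # samples per second
t  = np.arange(0.0, 0.01, 1.0 / fs)  # ten milliseconds of time points
x  = np.sin(2 * np.pi * 440 * t)     # a sampled 440 Hz sine: our "sound"

y = x ** 2                           # the squared sound
# By the identity sin^2(a) = (1 - cos(2a)) / 2, y is a constant offset plus
# an 880 Hz component: squaring doubled the frequency.
    </pre>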
    </div>

    <div>
    <h2>Amplitude, decibels and the spectrum</h2>
    <p>
        Not every sound has a frequency—no repetition, no frequency. However,
        measured at a point, every sound has an <dfn>amplitude</dfn>. This means
        roughly the same as the strength of the sound and could be defined in a
        variety of ways. We pick one and speak of peak‐to‐peak amplitude,
        defined as the difference between the maximum compression and the
        maximum rarefaction that our sound wave causes during a given period of
        time. The term can also
        be used without exact, mathematically defined meaning to mean the
        (relative) strength of the sound (with respect to another).
    </p>
    <p>
        When we present some sound to people, we soon realize that amplitude
        (peak‐to‐peak pressure variation) is not very significant perceptually.
        Instead, <dfn>average power</dfn> seems to be. This is why most volume
        monitors use an <dfn>RMS (Root Mean Square)</dfn> scale.
    </p>
    <mono:sidebar>
    <p>
        This is a time localized estimate of the average signal power, and is
        calculated by squaring the signal, taking a weighted average over a
        period of time and then taking a square root. Why should this work? One
        reason is that power is preserved in Fourier decompositions whereas
        amplitude is not. Since we process signals mainly in a frequency
        decomposed form, it is to be expected that time‐domain characterizations
        which can be directly translated to frequency domain should work the
        best. As the ear seems to do time‐localized filterbank analysis (as
        opposed to real Fourier analysis which really has <q>infinite memory</q>),
        time‐localized averaging should not come as a surprise, either.
    </p>
    </mono:sidebar>
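    <p>
        The recipe in the sidebar translates directly into code. A minimal
        sketch, with an arbitrary window length and, for simplicity, uniform
        weighting:
    </p>
    <pre>
# Time-localized RMS: square, average over a window, take the square root.
import numpy as np

def sliding_rms(x, window=256):
    w = np.ones(window) / window     # uniform weights
    return np.sqrt(np.convolve(x ** 2, w, mode="same"))

fs = 8000
t  = np.arange(0.0, 1.0, 1.0 / fs)
x  = np.sin(2 * np.pi * 440 * t)     # a full-scale sine...
print(sliding_rms(x)[fs // 2])       # ...has RMS 1/sqrt(2), about 0.707
    </pre>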
    <p>
        Now, the dynamic range of human hearing is exceptionally wide—the
        amplitude ratio of the softest sound heard to the loudest noise tolerated
        is in the vicinity of 10 000 000 to 1 (ten million to one), with
        most of the resolution in the quiet end. Around 1<abbr title="kiloHertz" xml:lang="en">kHz</abbr> people tend to classify a
        ten‐fold increase or decrease in sound energy as a doubling or halving,
        respectively, of perceived loudness. This means that a suitable scale
        for sound amplitude is not linear, but logarithmic. Values from this
        scale are called <dfn>sound pressure levels</dfn> (<abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr>) and their unit
        is the <dfn>decibel (<abbr title="Decibel" xml:lang="en">dB</abbr>)</dfn>. It is defined as twenty times the base
        ten logarithm of the ratio of sound pressure variation (effective level)
        to that of the softest sound heard by an average human (<dfn>the
        threshold of human hearing</dfn>, defined as 20 micropascals effective
        variation for a 1<abbr title="kiloHertz" xml:lang="en">kHz</abbr> sine wave). This means that 0<abbr title="Decibel" xml:lang="en">dB</abbr> equals the
        threshold and an increase of twenty decibels means a
        <em>ten‐fold</em> increase in pressure variation. To illustrate, going
        from 0<abbr title="Decibel" xml:lang="en">dB</abbr> to 140<abbr title="Decibel" xml:lang="en">dB</abbr> means multiplying by
        <m:math><m:msup><m:mn>10</m:mn><m:mn>7</m:mn></m:msup></m:math>, so
        140<abbr title="Decibel" xml:lang="en">dB</abbr> <abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr> equals a variation of 200 Pascals (effective
        level)—plenty. Ever wonder why going from 80<abbr title="Decibel" xml:lang="en">dB</abbr> to
        100<abbr title="Decibel" xml:lang="en">dB</abbr> is considered harmful while 60<abbr title="Decibel" xml:lang="en">dB</abbr> to 80<abbr title="Decibel" xml:lang="en">dB</abbr> isn’t?
    </p>
    <p>
        Yet another amusing calculation reveals that with a sinusoid of roughly 194<abbr title="Decibel" xml:lang="en">dB</abbr>
        <abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr>, the rarefying part of the fluctuation reaches vacuum. This is the
        theoretical limit on sinusoidal pressure fluctuations at normal
        atmospheric pressure, then. (Compressive impulses can, of course, reach
        much higher <abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr>s; cf. the hydrogen bomb.) Doubling the pressure
        variation yields an increase of 6<abbr title="Decibel" xml:lang="en">dB</abbr> <abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr>; doubling the power yields 3<abbr title="Decibel" xml:lang="en">dB</abbr>. When we think a bit, we
        see that if two sounds with a significant <abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr> difference (say, over
        15<abbr title="Decibel" xml:lang="en">dB</abbr>) are added together, their relative difference is much greater than
        we would think. In effect, adding a 30<abbr title="Decibel" xml:lang="en">dB</abbr> <abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr> sound to one of 60<abbr title="Decibel" xml:lang="en">dB</abbr> does
        not increase the <abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr> significantly beyond 60<abbr title="Decibel" xml:lang="en">dB</abbr>.
    </p>
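    <p>
        These rules are easy to verify numerically. A sketch (assuming
        incoherent sources, whose powers rather than pressures add):
    </p>
    <pre>
# Decibel arithmetic against the 20 micropascal reference.
from math import log10, sqrt

P0 = 20e-6                           # reference pressure, Pa

def spl(p):
    return 20 * log10(p / P0)

def pressure(db):
    return P0 * 10 ** (db / 20)

print(spl(2 * pressure(60)))         # doubling the pressure: +6 dB
print(spl(sqrt(2) * pressure(60)))   # doubling the power:    +3 dB

# Adding a 30 dB sound to a 60 dB one barely moves the level:
p_sum = sqrt(pressure(60) ** 2 + pressure(30) ** 2)
print(f"{spl(p_sum):.3f} dB")        # about 60.004 dB
    </pre>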
    <mono:sidebar>
    <p>
        Similarly we define the <dfn>intensity level</dfn> (ten times the
        logarithm of the ratio of sound intensity to a reference intensity of
        <m:math><m:msup><m:mn>10</m:mn><m:mrow><m:mo>−</m:mo><m:mn>12</m:mn></m:mrow></m:msup></m:math>
        Watts per square meter) and the power level (ten times the logarithm
        of the ratio of power to a reference power of
        <m:math><m:msup><m:mn>10</m:mn><m:mrow><m:mo>−</m:mo><m:mn>12</m:mn></m:mrow></m:msup></m:math>
        Watts). These scales are used much less frequently than <abbr title="Sound Pressure Level" xml:lang="en">SPL</abbr>s.
    </p>
    </mono:sidebar>
    <p>
        Now, although it was established a while ago that not all sounds need
        to have a properly defined frequency, the concept of frequency still
        has its uses. This is because, as we shall see later on, it is quite
        possible to uniquely construct signals from sine waves with definite
        frequencies. This makes it possible to talk about frequency ranges of
        <em>any</em> signal—we break the signal into sine waves and discard
        everything but the frequencies of interest. This can also be accomplished
        directly. Such ranges (called <dfn>bands</dfn> or <dfn>subbands</dfn>)
        can then be processed and analysed separately, which, of course, is
        precisely what goes on when we watch the spectrum analyzer on a hip
        sound system, crank up the bass on a car stereo or speak through a
        telephone (which constitutes a severely bandlimited channel).
        Simultaneously measuring the relative contributions of all the
        different frequency ranges in a signal gives rise to the
        <dfn>spectrum</dfn> of a sound. Depending on the way in which we
        extract the subbands, we arrive at different kinds of spectra.
        Nevertheless, they all give some sort of budget of how much bass,
        middle and treble our signal has. Since our ear performs an analysis
        somewhat reminiscent of the kind described above, spectra are
        invaluable in discussing and analysing sound and related technology,
        even when working with the simple, intuitive definition given above.
    </p>
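    <p>
        As a first taste of breaking a signal into sine waves (a sketch using
        the discrete Fourier transform, whose rigorous treatment the text
        defers):
    </p>
    <pre>
# A crude spectrum: magnitudes of the sinusoidal components of a signal.
import numpy as np

fs = 8000
t  = np.arange(0.0, 1.0, 1.0 / fs)
x  = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1320 * t)

spectrum = np.abs(np.fft.rfft(x)) / (len(x) / 2)
freqs    = np.fft.rfftfreq(len(x), 1.0 / fs)
for i in np.argsort(spectrum)[-2:]:  # the two strongest components
    print(f"{freqs[i]:6.1f} Hz, amplitude {spectrum[i]:.2f}")
    </pre>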
    <mono:note>
    <p>
        We want to defer the introduction of math, so any rigorous treatment
        of spectra (amongst other things) is necessarily postponed as well.
        This will leave some holes and vaguely defined concepts, here. Be
        forewarned when we use such terms as periodic, continuous, discrete,
        spectrum and so on.
    </p>
    </mono:note>
    </div>

    <div>
    <h2>Periodicity, quasi‐periodicity and aperiodicity</h2>
    <p>
        Most traditional acoustic research has centered around highly
        reductionistic approaches, such as using anechoic chambers, sinusoid
        test tones and so on. In the real world, however, we never encounter
        strictly periodic sounds, let alone pure sinusoids—musical sounds are
        never pure enough and in addition are strictly time limited. In fact,
        most musical sounds do not even approximate periodic behavior. To get a
        hold on the following topics, we need to classify sounds a bit further,
        and to establish an intuition as to how the different types of tones
        behave and what they sound like.
    </p>
    <p>
        Periodic sounds we have already seen. The simplest example is the sine
        wave. All periodic sounds repeat over and over, reaching over all of
        time. It is clear that such sounds do not really exist, but they are a
        neat conceptual tool when analyzing sounds which are locally stable.
        This can be done once a system in a sense no longer remembers that
        some input <q>started</q> a finite time ago, that is, once any
        transient phenomena have diminished sufficiently. As to why we would
        go with periodic analysis, periodic signals have extremely nice
        properties. For instance, frequency is a concept which is only defined
        for signals which are periodic. If we look at the spectrum of a periodic
        signal, we quickly learn that only whole multiples of some fundamental
        frequency (<dfn>harmonics</dfn>) are present. Later, when stated
        formally, this notion leads to the classic theorem on Fourier series.
    </p>
    <mono:sidebar>
    <p>
        This does not imply that the fundamental or all the harmonics need to
        be present. When they are not, the actual frequency (rate of repetition)
        of the signal can be higher than the fundamental frequency. In fact, one
        can always reinterpret a series of harmonic partials as containing only
        the even harmonics of a fundamental an octave lower. Each such shift in
        the point of view leaves the fundamental lower than before.
        Consequently the concept of fundamental frequency is not very
        well‐defined and certainly does not relate uniquely to the actual
        frequency of the signal. This permits some interesting acoustical
        illusions and even serious musical applications.
    </p>
    </mono:sidebar>
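    <p>
        One can also run the construction the other way and build a periodic
        signal out of harmonics. A sketch (the classic odd‐harmonic
        approximation of a square wave; all parameters are arbitrary):
    </p>
    <pre>
# Summing odd harmonics with weights 1/n converges toward a square wave.
import numpy as np

fs, f0 = 8000, 100.0                 # sample rate and fundamental, Hz
t = np.arange(0.0, 0.02, 1.0 / fs)   # two periods of the fundamental

x = np.zeros_like(t)
for n in range(1, 40, 2):            # odd harmonics only: 1, 3, 5, ...
    x += np.sin(2 * np.pi * n * f0 * t) / n
# x now approximates a square wave at 100 Hz (scaled by pi/4).
    </pre>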
    <p>
        Investigating a bit further we find that the relative amplitudes and
        phases of constituent harmonics uniquely determine a periodic signal.
        Later we shall see that the absolute phases of the harmonics in a
        periodic sound actually matter little to us, and even the amplitudes
        are perceived a bit vaguely. There is no time information, either.
        This means that there are actually not so many perceptually separate
        periodic tones. Further, all of them sound <em>extremely</em> dull
        and sterile.
    </p>
    <mono:sidebar>
    <p>
        The importance of periodic signals and their spectra lies in the fact
        that they are exceedingly simple mathematically—periodic sounds avoid
        the topological complications of Fourier analysis. They lead to the
        Fourier series which is discrete and as such quite simple to understand
        and derive. The Fourier series serves as a starting point for the
        construction of the discrete Fourier transform which is of pivotal
        importance in DSP. More about all this in the math section.
    </p>
    </mono:sidebar>
    <p>
        Previously we established that every periodic signal can be
        constructed from harmonics of some fundamental. Now, nobody says we
        cannot add together <dfn>partials</dfn> which are not in a harmonic
        relationship with each other. When we do this, we obtain
        <dfn>quasi‐periodic</dfn> signals. These sounds still have discrete
        spectra, but they need not be periodic. Quasi‐periodic sounds are more
        relevant to musical acoustics than periodic ones—locally, the
        steady‐state part of an instrumental sound is usually best described as
        quasi‐periodic. Again we assume that all the partials are in the
        audible range. Unlike periodic signals, quasi‐periodic ones can have
        some time content—closely spaced partials beat against each other,
        possibly contributing to harshness and time evolution in the composite
        tone. Inharmonic partials often lead to bell‐like or metallic timbres,
        or even chord or noise like textures if enough partials are present. No
        strict time features emerge, however, because any transient content
        would necessarily imply a continuous spectrum. For the same reason, any
        sound with a discrete spectrum will reach indefinitely back and forth
        in time.
    </p>
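    <p>
        A quasi‐periodic, bell‐like tone is equally easy to sketch (the partial
        ratios below are invented for the example; they are merely inharmonic):
    </p>
    <pre>
# Partials that are not whole multiples of a common fundamental.
import numpy as np

fs = 8000
t  = np.arange(0.0, 1.0, 1.0 / fs)

ratios = (1.0, 2.76, 5.40, 8.93)     # inharmonic frequency ratios
x = sum(np.sin(2 * np.pi * 220.0 * r * t) for r in ratios)
# The inharmonic spacing keeps the partials drifting in and out of phase,
# giving the metallic, slowly evolving character described above.
    </pre>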
    <p>
        Finally we have signals with continuous (in the strict sense) spectra,
        i.e. aperiodic signals. Sounds like these can be practically anything,
        but they never display truly periodic time‐domain behavior. Usually
        white noise is given as an example, but actually all time localized
        and discontinuous signals belong to this class. All transients
        (because they are time‐localized) and physical signals (because they
        have finite energy) also have continuous spectra.
    </p>
    <mono:sidebar>
    <p>
        Strictly speaking, <em>noise</em> is mathematically defined in terms of
        its generating process and some statistical properties of that process.
        The actual signals we process are just <em>examples</em> (a countable
        collection of which is, in proper mathematical terms, an <dfn>ensemble</dfn>)
        of what such a process can produce, and should be strictly separated
        from the process itself. This means that mathematically derived spectra
        for stochastic processes are expectations—they relate to <q>real</q>
        spectra like the expected result of half heads and half tails relates to
        an actual experimental record of coin tosses. In statistical analysis, a
        property called <dfn>ergodicity</dfn> then guarantees that averages taken
        in the time domain faithfully represent the properties of the stochastic
        process across its ensemble, so we can often handwave the distinction
        between the properties of the process and the properties of its example
        output. (Ergodicity guarantees that time averages taken over one output
        equal those taken over all the signals in an ensemble.) One should keep
        in mind that they are not the same thing, however. Otherwise one runs
        into some deep math. To get rid of the process description and to work
        solely on time series, one must first consider such fun subjects as
        information theory, Kolmogorov complexity, Bayesian statistics and
        estimation theory, to mention a few. Those are topics <em>well</em>
        outside both the scope of this presentation and the capability of the
        author.
    </p>
    </mono:sidebar>
    </div>
    </div>

    <?stamp?>
</body>
</html>