Sampling basics

This article aims at giving an audio DSP newcomer an intuitive understanding of what Shannon’s sampling theorem, discrete time and bandlimited functions mean, and how they affect practical audio signal processing algorithms. Generation of common bandlimited waveforms for digital sound synthesis will serve as a practical example of the difficulties encountered with discrete time signals, and will also be used to demonstrate how common intuition initially fails us in the discrete domain. For those with a previous background in signal processing and/or mathematics, a proof of the sampling theorem is given which highlights the connection between equidistant sampling and the Hilbert space structure of L2.

A first look at how sampling works

Signals

When we deal with sound, or any other physical signal, we are working in continuous time: the signals are defined at any point in time. When we represent these signals mathematically, we can choose an arbitrary moment in time and obtain the value of the signal at this moment, as opposed to having to deal with some predetermined set of instants of time outside of which we have no knowledge of the signal. Similarly, the value at each instant of time could be absolutely anything, instead of only having a finite set of possible values, and under the assumption that the signal is of tangible, physical origin, it will slide from value to value in a continuous manner. Nothing happens infinitely fast in the real world.

This domain of continuous time functions is also called the analogue domain, since the instant to instant value of the signal is usually in direct correspondence with the physical quantity under study. In our case, this quantity would most likely be air pressure variation at a point around the average air pressure, that is, the thing that travelling waves of sonic energy cause, locally. This sort of description is very elegant, straight forward, and mathematically sound.

But there’s a catch: computers don’t deal with continuity very well, since continuity, with its infinities of possibilities, does not exactly fit the strictly finite resources of real world computers. But computers do take finite sequences of finite precision numbers just fine.

So in order to process sound with a computer we want to turn it into numbers. The problem is, we do not yet have any guarantee that we can do this without losing some essential quality of the original, continuous signal. But it just so happens we have a very simple, practically lossless representation that suits our needs. This is called the pointwise sampled representation of our signal/sound, and we can prove that were infinite amplitude resolution available, it would be perfectly invertible. And while infinite resolution is not an option, we can obtain perceptually transparent representation anyway.

The basic idea is real simple: we want to track where the value of the signal is going, so we take regular snapshots of its value, at our chosen resolution. Such a snapshot is the sample we already mentioned, and represents the instantaneous value the analog signal takes at a specific moment in time. Constant times between samples make this uniform sampling. For now we assume that the precise value of the samples can indeed be stored and trust that the times between the samples are truly equal; the ramifications will be dealt with later. The time between samples is the sample period, and the number of times per second of samples taken is the sample rate or sampling frequency, expressed in Hertz (Hz).

The idea is that the higher the sample rate, the more samples there are per unit time, and the less the analogue waveform has time to change between successive samples. So we hope that by raising the sample rate high enough, we can in fact capture the essentials of what happens with the wave. Our only worry is, what if something happens between the sample times, when we aren’t watching? (Keep in mind that the samples are instantaneous, so by far most of the time is spent between samples and all sorts of nasty stuff might very well go on, there.)

And indeed, this is the primary complication of sampling: the signal cannot change too unpredictably. We need to make it smooth, somehow, before we can sample it. Obviously the necessary amount of smoothing depends inversely on the sample rate. This is very intuitive, but the precise formalization is actually far more subtle ‐ for instance, it is quite possible to embed arbitrary amounts of information into an arbitrarily short continuous signal with a strictly limited rate of change per unit time. All we have to do is to code the information so that the derivative of the resulting continuous signal is finite, a simple problem of modulating it suitably, then compress the time axis to make the duration of the signal small enough to fit where we want it, and then lower the amplitude to take the rate of change down. So limits on the derivative of the signal will not get us far. Besides, applying the idea backwards we see that raising the amplitude of a signal would make its rate of change change, anyway, and we do have to deal with arbitrary amplitudes for now.

Linearity in sampling

At this point we’ll have to take a detour into the basics

 ‐detour: LTI systems
  ‐characterization by impulse response
   ‐both continuous and discrete
  ‐sinusoids (actually complex exponentials) as the eigen functions
   ‐sinusoids are real special, then
   ‐we are lead to consider how sinusoids can be represented and recovered
   ‐the connection to Fourier analysis
    ‐decomposition into sinusoids is possible for all real signals
    ‐the superposition principle for LTI systems
    ‐the convolution theorem: this is why sinusoids are so special
    ‐the fact that the ear is almost linear too: we need fixed frequencies
     to come out intact

    ‐if we manage to deal with sinusoids, superposition takes care of the
     rest
     ‐the system is linear: if we can pass a number of signals through it
      unharmed, scaled sums of them will pass through the system unharmed as
      well
     ‐at this point, we borrow from math, and assume a bit: all physical
      signals can be represented as infinite weighted sums (actually,
      integrals) of sinusoids
      ‐sinusoids have a very special place in the theory of linear systems,
       which is what audio transmission chain aspire to
      ‐sinusoids are also very smooth: their instantaneous rate of change is
       another sinusoid of equal amplitude and frequency, and so their rate of
       change and, the rate of rate of change and so on are all bounded
       ‐really, the only thing affecting the maximum rate of change of any
        order of a sinusoid is the original one’s frequency
     ‐this means that if a signal can be decomposed into sines which can all
      be put through the system unharmed, then the signal itself can be
      passed through the system unharmed; this happens because we can first
      decompose the signal, pass the parts through without modifying them,
      and recompose the signal afterwards, arriving at the same signal
      ‐this of course is a mathematical way of showing that the original
       system doesn’t harm the signal, we never actually do this de/recompose
       step in practice
      ‐sinusoids are a wonderful bunch of functions to do this sort of thing
       in, since they are mathematically very well behaved
     ‐so we want to know the conditions under which the sampled
      representations of two sinusoids can be confused with each other

Aliasing, and how to avoid it

 ‐aliasing
  ‐it is obvious that two sinusoids which are identical except for amplitude
   will have separate sampled representations
  ‐it is also quite clear that two sinusoids with identical amplitude and
   frequency but different delays can be separated from each other if the
   delay is kept below the common period of the sinusoids; pick to samples
   which do not hit the same part of the wave, compute their fraction and you
   have a separating representation which does not depend on the overall
   amplitude of the wave, thus making the separation work even when amplitude
   is varied
   ‐this is simply because in this range, any two sample values which are not
    precisely in the same part of the sine cycle will unambiguously tell which
    sinusoid we are talking about, and if all samples are in the same part of
    the wave, the sampled representation will be confused with DC; if this
    can be prevented, separability is not an issue
   ‐at delay=period, the infinitely long signals agree completely, but then
    they are really the same function, so it is not really a bug at all, but a
    feature
  ‐frequency is then the only question
   ‐if f_1 is the frequency of our sinusoid s_1, s_1 becomes s_1(t)=sin(2*pi*t*f_1)
    ‐if we sample at integer times n, the sampled version becomes s_1(n)=sin(2*pi*n*f_1)
    ‐let’s then take another sinusoid s_2 such that s_2(n)=sin(2*pi*n*f_2)
    ‐is it possible to have s_1(n)=sin(2*pi*n*f_1)=sin(2*pi*n*f_2)=s_2(n) for
     all n?
    ‐yes: we choose f_1 and f_2 such that |f_2‐f_1|=i, with i an arbitrary
     integer; for instance f_1=0.2 and f_2=1.2; in this case we get
     s_1(n)=sin(2*pi*n*0.2), and via the periodicity of sin(x), sin(2*pi*n*0.2+2*pi)=
     sin(2*pi*n*0.2+2*pi*n)=sin(2*pi*n*0.2+2*pi*n*1)=sin(2*pi*n*1.2)=s_2(n)
     ‐this is an example of aliasing: frequencies above half the
      sample rate fold below it upon sampling; aliasing turns out to be the
      only thing we have to prevent to get unique sampled representations
    ‐in fact, if we set f_2 to 1‐f_1, we get s_2(n)=sin(2*pi*n*(1‐f_1))=
     sin(2*pi*n‐2*pi*n*f_1)=sin(‐2*pi*n*f_1)=sin(2*pi*n*f_1‐pi), i.e. the
     signal is now even confused with a sinusoid of different phase
     ‐this is a special case of aliasing, in which signals from half the
      sample rate to the sample rate fold into the range with inverted phase;
      after we’ve seen aliasing for a while, we recognize that all frequencies
      from (an integer times the sample rate) to ((the integer plus one) times
      the sample rate) will fold equally to the space between zero and half
      the sample rate; especially this means that frequencies from ((an
      integer times the sample rate) minus half the sample rate) to (an
      integer times the sample rate) will always alias with their phase
      inverted
    ‐to prevent both of these troubles we limit f_1 and f_2 so that we always
     have |f_2‐f_1|<ω
     ‐if we have all the frequencies f we might care for, setting f<ω will
      accomplish the above
     ‐as we are only concerned with physical possibilities here, setting f>=0
      is a good idea too, leaving us with 0≤f<ω
     ‐this means that all sinusoids we care to represent must have their
      frequency between zero and half the sample rate, called the Nyquist
      frequency (remember that we sampled at integers, and the f’s
      of the sinusoids must be less than ω, which corresponds to a period of
      two samples)
     ‐under this condition two sinusoids whose phases agree at zero will
      always have separate sampled representations; the proof is by
      contraposition: assuming the sampled values are the same
      (sin(2*pi*n*f_1)=sin(2*pi*n*f_2)), f_1,f_2<ω, the frequencies are
      still different (f_1!=f_2) and setting n=1 implies
      sin(2*pi*f_1)=sin(2*pi*f_2); f_1,f_2<ω implies both sin(2*pi*f_1) and
      sin(2*pi*f_2) will hit the first halfcycle of the sine (reaching from 0 to
      pi) and strictly positive (unless one of them is 0, in which case the other
      one has to be too, since f_1,f_2 must be strictly less than ω); in
      this case sin(2*pi*f_1)=sin(2*pi*f_2) implies that one of 2*pi*f_1
      and 2*pi*f_2 will be in the first quarterwave, the other in the second,
      since otherwise their values could not agree (by symmetry of the sine
      function around pi/2); if this is the case, setting n=2 means that one
      of 2*pi*n*f_1 and 2*pi*n*f_2 will be in the first halfwave of the sine,
      and the other one in the second; but in this case
      sin(2*pi*n*f_1)!=sin(2*pi*n*f_2) because the first halfwave is positive
      and the second negative; hence, the samples cannot agree for all of
      n=0,1,2 and the representations are separate
     ‐the same proof works when one of the waves is phase shifted, only this
        time we cannot use positivity/negativity but must use actual ranges
        for the above proof; this is trivial, and works with arbitrary
        amplitudes once we separate them by amplitude and normalize
     ‐since under our condition, the sampled representations of sinusoids with
      separate frequencies with identical phase are separable, and we already
      showed that under this condition, different phases are separable upto
      identity, and that under the condition of separable phases and
      frequencies different amplitudes are separable as well, we have showed
      that under our condition, all nonequal sinusoids can be discriminated
      based on their sampled representations
     ‐based on this, and the assumption that each analog signal can be
      injectively represented as a Fourier sum, we conclude that under our
      condition separate analog signals have separate sampled representations

So, we take a bit more sophisticated approach, borrowing liberally from the theory of Fourier decompositions: according to Fourier theory, certain well behaved continuous functions can be systematically decomposed into integrals of sine functions. This is an operation which is quite close to what our ears do when they pick out the different frequencies off complex sound signals, in that the sound is decomposed into a number of frequencies. These are the frequencies of the sine waves that must be integrated to get the original wave. To a math newcomer, the biggest surprise is that taking an exhaustive toll of which frequencies are needed, and recording their relative delays (phases) produces a perfect, invertible presentation of a wide class of usable signals. In fact, we might go as far as to say that all naturally occurring functions can be losslessly decomposed into Fourier integrals, or transforms. What makes Fourier decompositions so nice is that they are linear (more on that later) and that sine functions are very well behaved mathematical beasts. In particular, at a given amplitude, sinusoids will have continuous, limited derivatives of all orders, with the limit established as the inverse of the frequency.

The crucial condition that we need to guarantee invertible sampling, which we already established has something to do with smoothness, becomes the concept of bandlimitation. A bandlimited function is one which has a Fourier transform, and whose Fourier transformation is zero above some limit frequency. In essence, no content is present above some established frequency limit, making the signal smooth even as its rate of change can be arbitrarily large. This happens because arbitrarily high frequencies, which could make for uncontrollably rapid change (like outright jumps) in the value of the signal, are no longer present. We hope to show that there is now some sampling rate at which the samples accurately capture the global behavior of the function, set the frequency limit somewhere above the frequency of the highest sine wave humans can hear, derive the necessary sampling rate to perfectly describe the stuff we do hear, figure out how to actually reconstruct signal and be done with it. This is the content of Shannon’s sampling theorem, which we will prove in detail in part two. The sampling theorem is what makes digital processing of analog signals possible in the first place, and so is what the possibility of digital signal processing ultimately rests.

First a couple of intuitive observations, though. Math being dull n’all… The reconstruction step will of course have to derive a bandlimited waveform from our sampled representation.

 ‐we know that reconstruction will have to smooth the waveform and be
  bandlimited; strict LTI lowpass filtering will leave a bandlimited
  function alone and bandlimit others; it is also a smoothing operator; we
  can derive the necessary impulse response from the Fourier spectrum by
  explicit integration (carry it out); it becomes a sinc(x) function; a
  sinc(x) is bandlimited itself and, centered at a sample, must come
  out the same; at other sample times it equals zero; hence its sampled
  representation is a discrete impulse, and the reconstruction of a discrete
  impulse will have to be a sinc(x) function
 ‐sinc(x) is a bandlimited impulse: it is the bandlimited counterpart of
  Dirac’s delta; what follows? TEMP!!!

 ‐interpolation and common misunderstandings
  ‐zero degree interpolation would make the signal horizontal between
   samples, with jumps at sample times
   ‐this is terrible, since there are even discontinuities
   ‐it sounds pretty nasty
  ‐first degree interpolation would draw lines between the sample points
   ‐better, but if does this really have anything to do with the kinds of
    signals we input? none, really: there is a whole lot of nasty
    distortion because of the angles
  ‐actually a sample is instantaneous, it has no width
   ‐especially, it does not represent a staircase waveform, but
    instantaneous values of the continuous function; these are widely
    interspersed
   ‐if we draw these points, the analog waveform represented by the numbers
    will go through them, of course, but it will not follow any simple path,
    like a straight line, between the points!

 ‐reconstruction
  ‐we also have to get back from our sampled representation, so how does that
   work?
  ‐obviously the signal will have to assume the values which we sampled at
   sample times
  ‐by our above construction, we see that reconstruction ought to be
     possible, somehow; as we bandlimited the wave, there is an intrinsic
     smoothness to any bandlimited waveform (imposed on the bandlimited wave
     by the mean value theorem on the inverse Fourier transformation with a
     limited highest frequency: the derivatives of sinusoids are continuous
     and limited, and a limited highest frequency means limited, continuous
     derivatives of all orders for the integral at any point) that should come
     out unchanged after the reconstruction
  ‐linearity solves our amplitude‐raises‐rate‐of‐change problem too:
     different derivatives are determined along with the relative amplitudes
     of the impulses; it is a natural consequence of linear representation and
     reconstruction
  ‐TEMP!!!
 ‐quantization distortion
  ‐each of the individual samples is represented by a number, usually by
   dividing the maximum swing of the analog input voltage to a constant
   number of equally large steps, and picking the number of the closest
   accurately representable signal value as the output for any given input
   level; such round to closest type operations are called
   quantization
   ‐because of the constant change in the number with a constant change in
    the input voltage, this is called linear quantization
   ‐each of these steps is represented by a single number
   ‐since computers are binary, we usually want the total number of these
    steps to be a power of two, or better yet a power of two divisible by 8,
    since today’s computers deal with numbers that are multiples of 8 bits
    ‐hence, 8‐bit, 16‐bit and 32‐bit sampling, corresponding to 256, 65536
     and 4294967296 possible different values, or steps within the full
     analog operating range
   ‐the number of possible steps affects the accuracy with which we can
    represent sound, since differences smaller than one step cannot be
    directly represented
    ‐the input signal will have to be rounded to the nearest representable
     level, which results in noise, or distortion, whose amplitude depends
     on the size of one step: the average error will be one half step in
     either direction, a noise signal with a half bit amplitude
     ‐the ratio of the maximum representable voltage swing to this noise
      signal is called the signal‐to‐noise ratio (S/N) of our system
    ‐32 bits (some 192dB) is far more than we need, since then one step can
     be set to represent inaudible sonic details, and the largest swing
     will still be far beyond the threshold of pain
    ‐16 bits (about 96dB) is slightly less than we need, since setting the
     largest swing to the threshold of pain will leave the softest directly
     representable sound (one step, peak to peak) some 20dB above the
     threshold of hearing
    ‐also notice that our proof of unique representations is based on
     infinitely accurate sample values, and so can actually fail when we go to
     finite accuracy
     ‐example: take a system where we quantize the wave to, let’s say, two
      bits, making for only four different values; then represent a sinusoid
      with an arbitrary frequency in this system; if you add a very small
      phase shift to this sinusoid, the values of the continuous function at
      the sampling instants will change very little, and the rounding
      operation will cause the samples to fall into the same values
      as without the phase shift; this means that these two different phases
      will be confused
      ‐multibit systems are plagued by the same problem, only it’s not as
       severe: many more phases can be represented since the smaller sampling
       step will cause smaller phase shifts to be enough to change the sampled
       representation
      ‐in fact, since this problem arises because of the nonlinearity of
       quantization, we cannot use reasoning based on Fourier decompositions
       at all ‐ a certain phase may well be accurately representable if
       sinusoids of other frequencies are present; in fact, in complex program
       material there will likely not be any problems
       ‐this is, in essence, what dither is all about: it adds noise, which is
        actually sinusoids of all frequencies with random phase; it can be
        proven that this makes the average phase resolution of a
        multibit representation infinite
 ‐oftentimes we sample multiple signals simultaneously, like the two stereo
  channels, or the 5.1 channels of surround
  ‐these are then treated as a multidimensional vector, the sampled sound
   becomes a vector valued function of time
  ‐however, the 5.1 organization actually has the LFE channel running at
   about 0.1 times the main channel rate, meaning it doesn’t really fit into
   our concept of how vector valued functions should work
   ‐this is probably the easiest example of the kinds of trouble one
    encounters in multirate signal processing, where many different sample
    rates are present at the same time

Shannon’s sampling theorem

Now we will get to the real meat, proving the sampling theorem. This section will borrow considerably more from pure math than the preceding text, and isn’t really essential for a working understanding of sampling. Those who are not interested in mathematics proper can skip the proof with a clear conscience.

Before starting, we will need to fix a notation for sampling periods and rates, so we set $T \equiv \frac{1}{ω}$ where omega stands for half the sample rate (the so called Nyquist rate), and T the sampling period. From here on, saying that a function is bandlimited means that its Fourier transform equals zero beyond frequencies with absolute value greater than omega, and the sampling times are fixed at integral multiples of the sampling period.

The proof

The proof begins from the knowledge of Fourier transforms and their properties. The Fourier series is not used, and instead of conciseness we will aim at a proof in terms of vector spaces and their bases. More specifically, we will first show that each bandlimited function in L2 has a continuous representative, after which we will construct a function (the so called bandlimited impulse) with the property that taking a standard L2 inner product of some bandlimited f with it is equivalent to sampling f at zero. Then we will show that time‐shifted versions of the impulse, centered on sampling times, form an orthogonal basis of the space of ω‐bandlimited functions. This suffices to show that the stream of numbers obtained by uniform sampling is simply a coordinate representation with respect to a specific basis on the space of bandlimited functions, as such spaces are complete in L2. It also implies invertibility, turning the sampling operator into a bijective homomorphism from the space of bandlimited functions to square summable sequences of samples, in accordance with the basic theory of Hilbert bases. If we were to normalize the basis vectors it would also preserve the norm. But in the case of sampling, we only modify the basis vectors used in the reconstruction step. Once we do this, the normalized impulse used for reconstruction assumes the usual form of the sinc function.

From where I stand, this formulation nicely motivates the naïve reconstruction—the combination of a S/H stage followed by an anti‐imaging filter is really just a cheap easy approximation of a bunch of identical‐looking basis vectors, multiplied by the weights provided by the incoming stream of sample values.

We start by defining the bandlimited impulse and demonstrating its connection to sampling.

$imp (x) \equiv \int_{0}^{ω} \cos (π x t) d t imp (x) = {\begin{matrix} ω, & x = 0 \\ ω \frac{\sin ω π x}{ω π x}, & x \neq 0 \end{matrix}, for all ω ‐bandlimited f, imp * f = f, and < imp | f > = f (0) . For a constant x we have imp (x) \equiv \int_{0}^{ω} \cos (π x t) d t = {\begin{matrix} ω, & x = 0 \\ \frac{ω \int_{0}^{1} ω π x \cos (ω π x t) d t}{ω π x} = \frac{ω /_{t = 0}^{1} \sin (π x t)}{ω π x} = ω \frac{\sin ω π x}{ω π x}, & x \neq 0 \end{matrix} Any ω ‐bandlimited function will have a continuous representative
because taking the Fourier and inverse Fourier transforms, an
identity operation on bandlimited functions, produces such a
representative: the family of sinusoids utilized in the inverse
Fourier transform is \infty‐differentiable and hence an integral of the
sinusoids over a compact real interval will be \infty‐differentiable, and
so continuous. Since we defined imp as the above
integral, we know that its Fourier transform will have unity modulus
precisely upto ω, with zero phase,
after which the magnitude will drop to zero. Hence, the function is
bandlimited. Further, by the convolution theorem, its convolution with
any ω ‐bandlimited function will in
the Fourier domain turn into a multiplication which will then result
in the spectrum of f, unchanged. Taking
the inverse Fourier transform we see that the convolution will not
change f . Finally, if we look at the form of the convolution integral of imp and f, it is seen that at zero, it
simplifies into the normal L 2 inner
product which from the above is known to be equal to the value of the
function at zero: f (0) \overset{imp * f = f}{=} (imp * f) (0) \equiv \int_{t ∊ ℝ} imp (t) f (0 - t) d t = \int_{t ∊ ℝ} imp (t) f (t) d t . We see that sampling and bandlimited impulses have an intimate
connection. By the convolution theorem, it is easy to see that
sampling the function at some other point is equal to taking the inner
product with a imp time‐shifted to
center at the sample time; arbitrary sampling can be completely stated
in terms of inner products. Now we go on to prove that suitably
spaced and scaled imp functions form a
basis for our chosen space of bandlimited functions. This we do by first
noting that any space of bandlimited functions is necessarily complete,
i.e. as a subspace of L 2, it is
a Hilbert space. This is trivial once we realize that the Fourier
transform is an isometric isomorphism of L 2 onto L 2 which
maps bandlimited functions onto L^{2} ([- ω, ω]) : the latter is known to be a closed subspace of L 2 and so
complete. Via the inverse isometry, so is our space of bandlimited
functions. To prove that the representation of any bandlimited function by its
samples is unique and invertible, it now suffices to the set of time‐
shifted imp functions centered at sampling
times constitutes an orthogonal basis. To prove it as such in a Hilbert
space, we need to show that the functions are square integrable, are
orthogonal and constitute a maximal set. We define {imp}_{y} (x) \equiv imp (x - y), and note that sinc y is
trivially bandlimited. By Parseval’s equality, the norm squared of imp y equals the norm squared of its Fourier transformation, which from the
definition is seen to be ω, i.e. finite
and equal for all y . Hence, all imp y are
square integrable. By the first theorem, and the bilinearity of
convolution, ({imp}_{y} * {imp}_{z}) (x) = {imp}_{y + z} (x), which equals zero when y + z is a
multiple of the sampling period not equal to x . Applying the reasoning on convolutions
and inner products that we used to find the connection between sampling
and bandlimited impulses, we see that the inner product of two imp functions centered at arbitrary
inequal sample times is always null, making the functions orthogonal.
Now that we’ve proven the set orthogonal, and also have a constant bound
for the norms of the basis vectors, the set is definitely a basis for
the space of its linear combinations. All the linear combinations of the
basis vectors are guaranteed to be bandlimited as weighted sums of
bandlimited functions. It remains to be shown is that the set of imp ’s is maximal in the space of
bandlimited functions. Maximality is equivalent to showing that no non‐zero function in the
space can be simultaneously orthogonal to all the basis vectors. We
choose an arbitrary bandlimited function, project it into the space
spanned by the impulses, and define d as
the difference between the original function and its projection. The
difference will then be perpendicular to any finite linear combination
of the impulse basis functions. The crux of the proof is that we can approximate any bandlimited
impulse, even those centered arbitrarily between sampling
times, by finite linear combinations of the basis functions. This
means that the inner product of the difference with an impulse centered
over an arbitrary real number will be zero. But we’ve already proven
that bandlimited functions have continuous representatives, and that
sampling them is equal to taking inner products with shifted impulses.
Hence, sampling d at any point will give
us zero, and d becomes the zero function.
The original bandlimited function will then belong to the space spanned
by the imp basis. We then show that arbitrary bandlimited impulses can indeed be
approximated by finite linear combinations of the sampling time centered
ones. ‐take the projection x’ of an arbitrary ω ‐bandlimited signal x into
the subspace spanned by {sinc_n} signals; let d=x‐x’; by the properties
of the Fourier transform, d is bandlimited; by the above construction,
d is orthogonal to sinc_n for all n; hence it is orthogonal to all
linear combinations of sinc_n; if we can show that sinc_y is in
span({sinc_n}) for all real y, it follows that d must be orthogonal to
sinc_y for or real y; from this it follows that d(y)=<sinc_y|d>=0 for
all real y, i.e. that d==0, and x is in span({e_n}), proving that any ω ‐bandlimited function is in the space spanned by {sinc_n}; it
remains to be shown that any sinc_y can be written as a linear
combination of sinc_n, n integer; the projection of sinc_y into
span({sinc_n}) is sum_i=‐inf^inf{<sinc_y|sinc_i>sinc_i}, whence
(sum_(i=‐inf)^inf{<sinc_y|sinc_i>sinc_i})(x)=sum_(i=‐inf)^inf{sinc(y‐i)sinc(x‐i)}=
sum_i=‐inf^inf{sinc(y‐i‐y)sinc(x‐i‐y)}=sum_i=‐inf^inf{sinc(‐i)sinc(x‐y‐i)}=
sum_i=‐inf^inf{sinc(i)sinc(x‐y‐i)}, and since sinc(i)=0 for all i!=0,
sum_i=‐inf^inf{sinc(i)sinc(x‐y‐i)}=sinc(x‐y)=sinc_y(x), and so sinc_y
is in span({sinc_n}), ending the proof Finally, we must figure how to deal with the fact that the above impulse
basis is only orthogonal, not orthonormal as we’d like. This is a simple
problem of normalization. It was already pointed out in the introduction
that we do not want to make our decomposition completely symmetrical,
but instead wish to retain pure sampling as the decomposition operator.
This implies that the decomposition is done by taking inner products
with imp y, y ranging over the sampling times. If we
were to normalize imp y we
would end up dividing it by its length, and doing both the decomposition
and the reconstruction with respect to this normalized vector. Instead
we normalize only the reconstruction basis, so the normalization is by
the square of the length, that is, the norm. As it happens, starting
with the normalized impulse, we have sinc (x) \equiv \frac{imp (x)}{ω} = \frac{\sin ω π x}{ω π x} i.e. we get a nice, time stretched \sin (x) x ‐waveform with a unity maximum value and nulls aligned with
sampling times. In this form, it is easy to see that the
sampling/reconstruction operation finally amounts to simple
interpolation, just as we expected. Final remarks ‐after this we automatically see that if the domain of the projection‐reconstruct
pair is broadened to C(R), it becomes a projection to the corresponding
subspace of bandlimited functions
‐this is the rationale for anti‐aliasing functions which actually do not
belong to the particular subspace of bandlimited functions: the brickwall
filtered version is the optimum representative of any band‐unlimited
function in the subspace with respect to the normal L 2 norm (i.e the RMS error metric)
‐this characterization of sampling begs the question, what if we were to
use a subspace which is defined by another criterion, such as a one
spanned by a progressively less frequency localized wavelet basis in the
upper frequencies?
‐could this sort of construction be used to arrive at an extended sampled
representation
‐could some such representation have DSD ‐like properties like temporal
accuracy via sacrificing the spectral one?
‐could we have such properties with linearity, average quantized/dithered
linearity and processability properties like those of PCM ?
‐would something like this necessitate overcompleteness? Numerical generation of bandlimited waveforms ‐numerical generation of bandlimited signals
‐digital synthesis is about generating a stream of samples without sampling
an analog signal first
‐the first idea: we cannot make a continuous signal in memory, but we can
often calculate its precise value at any instant in time
‐let’s just calculate the quantized versions of these values at each
sampling instant
‐this approach works, but only if the formulas we are using to derive
the values describe a properly bandlimited function ‐many naïve synthesis algorithms fail in that they try to work with
analog waveforms that are not bandlimited
‐for instance, the variable duty cycle pulse
‐bandlimiting the formulas is a real pain, bandlimiting after sampling
does not work since the harm has already been done
‐so we need to start with bandlimited functions, and work from
there
‐after this, we at least know that any LTI operation will not alias
‐so we can filter; especially we can both integrate and differentiate
‐what would be the kind of function we would like to start with? where to
get its bandlimited version?
‐integrating an analog impulse would give a step
‐subtracting an equal, but delayed, step from that would give a
rectangular wave
‐linearity means that this is just an integral of a signal consisting of
two impulses with equal but opposite magnitudes, separated by a delay
equal to the length of the desired pulse
‐summing multiple such pulses with a constant increase in delay (which
must be in excess of the pulse width) gets us a varible duty cycle pulse
wave
‐again we use linearity: this is just the integral of the sum of two
regular trains of impulses with equal periods, equal but opposite
magnitudes, and a mutual delay equal to the pulse width
‐the proper bandlimited representation of such a wave would be brickwall
lowpassed
‐integration is linear, so this lowpass could be done before integration as well as after
‐so if the impulse train generators were already bandlimited,
the output would be bandlimited
‐surprise: there is a closed form, discrete summation formula for
bandlimited impulse trains
(BLIT s),
and integration can be approximated digitally to any desired level of
accuracy!!!
‐this means that we can use the formula to get alias‐free impulse
trains, do a weighted sum of their output and feed it to an integrator
(which is a kind of lowpass filter) to arrive at properly bandlimited PWM waveforms
‐we get square waves as a special case, when the second BLIT lags
precisely half a cycle behind the first one
‐what’s more, triangle waves are integrated square waves, so we can
generate them too, by adding a second integrator
‐if we make the absolute amplitudes of our impulses inequal, but make
sure that the integrated sum signal still does not have a DC component, we get an asymmetrical staircase waveform
‐yo! saw waves are integrals of asymmetrical PWM waves, assuming
there is no DC component which would make this second integral
diverge; just add another integrator and we’re done
‐and again we get duty cycle modulation for free!
‐alternatively we could generate sine waves, which are inherently
bandlimited
‐by utilizing the Fourier theorems, we could explicitly sum only as many
partials of the BLIT (or more probably the target waveform) as fit
below the Nyquist frequency, and work from there
‐three problems
‐#1 DC ‐removing them by balancing the duty cycle against the two BLIT amplitudes is too expensive to do on the fly
‐however, we can prepend a DC killer to the integrator
‐this compound is called a leaky integrator ‐these can be used with DC as well, and they are fairly efficient to
implement
‐#2 integration is not very efficient if done right
‐it has an infinite impulse response, and it’s not even remotely causal
‐shortening the response will result in high frequency attenuation
‐approximating it by an IIR filter will cause some nonlinear phase
shift
‐after we calculate the amount of attenuation, we can follow
the whole package with a highpass filter to emphasize the lost
frequencies; phase shift, on the other hand, is so common in analog
circuitry that nonlinear phase will almost be a feature instead of a
bug; besides, the leaky integrators already introduce ample low
frequency phase shift
‐summa summarum, so fuckin’ what? we want it to sound fat and analog, no?
‐we can in fact do with something like fourth to sixth degree
approximations for the leaky integrators without noticeable trouble
‐#3 we used steady‐state assumptions
‐the sequences we get from the above methods assume that the resulting
wave will be infinitely long, and of constant frequency
‐for discrete notes, we could just add an envelope generator and use it
to window the start and end of the sound into a sensible, finite note
‐but this does not solve the problem of frequency modulation (e.g. pitch
bends)
‐for FM, we often just go ahead and vary the parameters of the BLIT s in real time, dropping partials based on the current fundamental frequency; when the highest partial goes above Nyquist, we
jump to settings in which that partial is not present, and vice versa
when the fundamental goes low enough for new partials to appear from
beyond Nyquist
‐this calls for more complex control, as we have to keep track of the
relative phase shifts of our two BLIT s, et cetera
‐the problem is, even this does not guarantee that the result is properly
bandlimited if we try rapid frequency sweeps
‐bah! who cares? we just don’t do rapid FM ‐beyond who cares?, we get another problem when upper partials
drop out or drop in‐range: there will be a discontinuity in the BLIT waveforms, heard as a click
‐to combat this, we can establish a sort of guard zone below the
Nyquist frequency, duplicate each BLIT in the design, and run
them in parallel such that if one them has its highest partial in the
range [Nyquist‐Guard_Band_Width,Nyquist] or
[Nyquist‐Fundamental‐Guard_Band_Width,Nyquist‐Fundamental], the other
one will have its highest partial in the complementary range
‐then we weight these two BLIT s by
(Nyquist‐Highest_Partial_Freq_Brighter_BLIT)/Guard_Band_Width and
1‐(Nyquist‐Highest_Partial_Freq_Brighter_BLIT)/Guard_Band_Width, using
the first coefficient for the brighter and the second for the lower BLIT ‐this way any partials appearing from beyond Nyquist or raising above
it will get faded in/out when passing the guard zone
‐if we are doing the BLIT s via table summation instead of closed
form expressions, the above is equal to running the whole wave stack
and the highest partial in parallel, mixing in a variable proportion
of the highest wave based on how far along the guard band it is
‐of course, if it’s below the guard band, it will be at full volume;
it can never go above the guard band since then it would alias: this
is where the highest wave in the stack becomes the new highest
partial
‐alternatively, when we have to add or kill a high partial, we kick in
a copy of the current BLIT generator with the new settings, and
crossfade to it, usually over a single impulse time
‐this is not as clean as the above method, but considerably more
efficient$