===============================================================
THE APPLICATION OF DITHERING AND NOISE-SHAPING TO DIGITAL AUDIO
                  Christopher Hicks, June 1994.            V0.9
===============================================================

0.  COPYRIGHT
=============

Permission is hereby granted to distribute this document in electronic
or magnetic form, and to make hard copies for private study, education
or other non-commercial purposes, provided that both of the following
conditions are adhered to:

   1) the document must remain complete and unaltered, and
   2) no charge may be levied for copying it, or for access to it.

I reserve all other rights to this document. I accept no
responsibility for damage or loss caused by inaccuracies contained
herein.
                                        Christopher Hicks, June 1994.

1.  INTRODUCTION
================

This article discusses in some detail the theory and practice of
dithering and noise-shaping, with particular reference to the field of
digital audio. Inevitably this involves some mathematics but it has
been kept to a minimum. Furthermore the limitations of an ASCII
distribution medium mean that fewer diagrams have been included than
would have been ideal.


2.  DITHERING
=============

Quantisation of a signal is nothing more complex than rounding it to
the nearest whole number. If the input x to a quantiser lies between p
and p+0.5, then we round down to p; if x lies between p+0.5 and p+1,
then we round up to p+1. In general, the output step size (or quantum)
need not be unity, but can be of arbitrary size q; in this case p is
not an integer but a whole multiple of q, and the decision threshold
between p and p+q lies at p+q/2.
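As a minimal sketch of this rounding rule, assuming Python and a
hypothetical helper named `quantise`, a quantiser of step q can be
written as follows. Exact ties are rounded up, which is an assumption,
since the text does not say which way a value of exactly p+q/2 goes.

```python
import math

def quantise(x, q=1.0):
    """Round x to the nearest whole multiple of the step size q.

    Ties (x exactly halfway between two levels) are rounded up.
    """
    return q * math.floor(x / q + 0.5)

print(quantise(2.3), quantise(2.5), quantise(-0.7))   # unit step: 2.0 3.0 -1.0
print(quantise(0.37, q=0.25))                         # step q = 0.25: 0.25
```

Python's built-in round() rounds halves to the nearest even value, which
is why floor(x/q + 0.5) is used here instead.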

2.1 Non-dithered Quantisation
=============================
Consider the following simple system, where x(n) is a discrete-time
input signal and Q is a quantiser whose output is y(n).

                           +-----+
          x(n) ------------|  Q  |------------ y(n)
                           +-----+

If x(n) is random and uncorrelated then at any given sample instant it
is equally likely to lie anywhere in the interval (p - q/2) to (p + q/2).
In other words, the quantisation error e(n) = y(n) - x(n) is random
and uncorrelated, and has a rectangular distribution of peak value
q/2. Elementary statistics show that the expected (or average)
quantisation error is zero, and that the quantisation noise power is
(q^2)/12.
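These two figures are easy to check numerically. The sketch below,
which assumes Python's seeded `random.Random` purely for repeatability
and a hypothetical `undithered_error_stats` helper, feeds a random input
spread over many quantisation steps through the quantiser and measures
the mean and power of e(n) = y(n) - x(n).

```python
import math
import random

def quantise(x, q=1.0):
    # Round to the nearest multiple of q (ties round up).
    return q * math.floor(x / q + 0.5)

def undithered_error_stats(n=200_000, q=1.0, seed=1):
    """Quantise a random, uncorrelated input and measure e(n) = y(n) - x(n)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        x = rng.uniform(-100.0, 100.0)   # spans many steps, so x is uniform within each cell
        errors.append(quantise(x, q) - x)
    mean = sum(errors) / n
    power = sum(e * e for e in errors) / n
    return mean, power

mean, power = undithered_error_stats(q=0.5)
print(f"mean error  = {mean:+.4f}")
print(f"error power = {power:.5f}  (q^2/12 = {0.5 ** 2 / 12:.5f})")
```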

If however x(n) exhibits some correlation (as music does, for example)
then this simple analysis is no longer valid. The errors become
correlated with the signal and show up as distortion rather than as
broadband hiss. Analysis of this phenomenon is very difficult, but for
audio it is of little consequence: the result sounds terrible, and we
need concern ourselves with it no further. Instead we have to look for
a way around the problem, rather than analysing the problem itself.
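The following sketch illustrates the point with a low-level sine wave,
quantised without dither. The amplitude of 1.3 steps and the period of
32 samples are arbitrary illustrative choices. Because the error is a
fixed function of the input, it repeats exactly with the signal period,
so its energy lies at harmonics of the signal frequency (distortion)
rather than being spread out as hiss; it is also visibly correlated
with the signal.

```python
import math

def quantise(x, q=1.0):
    # Round to the nearest multiple of q (ties round up).
    return q * math.floor(x / q + 0.5)

q, amp, period = 1.0, 1.3, 32
x = [amp * math.sin(2 * math.pi * n / period) for n in range(4 * period)]
e = [quantise(v, q) - v for v in x]

# Without dither the error repeats exactly with the signal period.
repeats = all(abs(e[n] - e[n + period]) < 1e-12 for n in range(len(e) - period))
print("error repeats with the signal period:", repeats)

# Normalised correlation between the signal and its quantisation error.
corr = sum(a * b for a, b in zip(x, e)) / math.sqrt(
    sum(a * a for a in x) * sum(b * b for b in e))
print(f"correlation between x and e: {corr:+.2f}")
```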

2.2 Dithered Quantisation
=========================
If we add a suitable uncorrelated random noise source to the input
signal x, then the quantisation error becomes decorrelated from the
signal. This is because the value of the quantiser input relative to
its output steps no longer depends solely upon the input x, but also
upon an uncorrelated random process d.

                  d(n)
                    |
                 +  |+     +-----+
          x(n) -----O------|  Q  |------------ y(n)
                           +-----+

The distribution of d is critical; it must effectively decorrelate the
quantisation error from the input signal x, while adding a minimum of
noise power to the output signal y.

A zero-mean rectangular distribution of peak amplitude q/2 is
effective at decorrelating the expected error (ie the first moment of
the error) and this adds excess noise power of approximately (q^2)/12
to the output. This in conjunction with the noise contributed by the
quantiser itself makes the total output noise power approximately
(q^2)/6.
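A minimal Monte Carlo sketch of both claims, assuming Python and
hypothetical helpers `quantise` and `rpdf`: the mean of the total error
y - x is close to zero for constant inputs at any offset from a
quantisation level, and for an input spread over many steps the total
error power comes out near (q^2)/6.

```python
import math
import random

def quantise(x, q=1.0):
    # Round to the nearest multiple of q (ties round up).
    return q * math.floor(x / q + 0.5)

def rpdf(rng, q=1.0):
    # Zero-mean rectangular dither with peak amplitude q/2.
    return rng.uniform(-q / 2, q / 2)

rng = random.Random(2)
q, n = 1.0, 100_000

# Mean error E[y - x] for constant inputs at several offsets from a level.
means = {}
for a in (0.0, 0.1, 0.25, 0.4):
    means[a] = sum(quantise(a + rpdf(rng, q), q) - a for _ in range(n)) / n
    print(f"x = {a:4.2f}: mean error {means[a]:+.4f}")

# Total error power for an input spread uniformly over many steps.
errors = []
for _ in range(n):
    x = rng.uniform(-50.0, 50.0)
    errors.append(quantise(x + rpdf(rng, q), q) - x)
power = sum(e * e for e in errors) / n
print(f"total error power {power:.4f}  (q^2/6 = {q * q / 6:.4f})")
```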

However, this is still not adequate for high-quality audio, as the
error power (ie the second moment of the error) still depends on the
signal. Adding a second uniform random variable removes this
dependence. Now the total output noise power is (q^2)/4, which is
three times the original of (q^2)/12. However, we achieve more
acceptable audio performance, as neither the expected error nor the
error power depends upon the signal x.
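The difference between one and two uniform variables shows up clearly
if we measure the error power for constant inputs at different offsets
from a quantisation level. In this sketch (Python; `error_power`,
`rpdf` and `tpdf` are hypothetical names), the power with rectangular
dither swings between zero and (q^2)/4 depending on the input, while
with the sum of two uniform variables it stays at about (q^2)/4 at
every offset.

```python
import math
import random

def quantise(x, q=1.0):
    # Round to the nearest multiple of q (ties round up).
    return q * math.floor(x / q + 0.5)

def rpdf(rng, q):
    # One uniform variable on [-q/2, q/2).
    return rng.uniform(-q / 2, q / 2)

def tpdf(rng, q):
    # Sum of two independent uniform variables on [-q/2, q/2).
    return rng.uniform(-q / 2, q / 2) + rng.uniform(-q / 2, q / 2)

def error_power(x, dither, q=1.0, n=50_000, seed=3):
    """Mean-square of the total error e = Q(x + d) - x for a constant input x."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        e = quantise(x + dither(rng, q), q) - x
        total += e * e
    return total / n

print(" offset   RPDF power   TPDF power")
for a in (0.0, 0.125, 0.25, 0.375, 0.5):
    print(f"{a:7.3f}   {error_power(a, rpdf):10.3f}   {error_power(a, tpdf):10.3f}")
```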

We can go on adding more random numbers in this fashion, and the effect
of each is to decorrelate the next statistical moment of the error
signal from the input. However, each also adds more noise power to the
output signal, and two is found in practice to be the most satisfactory
solution for audio work. The sum of these two uniform random variables
has a zero-mean triangular distribution with peak value of q, and is
therefore commonly referred to as "LSB TPDF dither" (Least Significant
Bit Triangular Probability Density Function).
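Generating LSB TPDF dither therefore needs nothing more than two calls
to a uniform random number generator. The sketch below assumes Python
and a hypothetical `tpdf_dither` helper; it checks that the samples lie
within plus or minus q, that their variance is close to (q^2)/6 (twice
that of a single uniform variable), and draws a coarse text histogram
showing the triangular shape.

```python
import random

def tpdf_dither(rng, q=1.0):
    """LSB TPDF dither: sum of two independent uniforms on [-q/2, q/2)."""
    return rng.uniform(-q / 2, q / 2) + rng.uniform(-q / 2, q / 2)

rng = random.Random(4)
q, n = 1.0, 200_000
d = [tpdf_dither(rng, q) for _ in range(n)]

variance = sum(v * v for v in d) / n
print(f"range    [{min(d):+.3f}, {max(d):+.3f}]   (peak value q = {q})")
print(f"mean     {sum(d) / n:+.4f}")
print(f"variance {variance:.4f}   (q^2/6 = {q * q / 6:.4f})")

# Coarse histogram: a triangle peaking at 0 and falling to zero at +/-q.
bins = 8
counts = [0] * bins
for v in d:
    counts[min(int((v + q) / (2 * q) * bins), bins - 1)] += 1
for i, c in enumerate(counts):
    lo = -q + 2 * q * i / bins
    print(f"{lo:+.2f} {'#' * round(60 * c / max(counts))}")
```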


3.  THEORETICAL ASPECTS OF NOISE SHAPING
========================================

Consider the addition of a feedback loop to the dithered quantiser to
give the following system.

                         d(n)
                          |
                +   u(n)  |+     +-----+
    x(n) ---------O---+---O------|  Q  |-----+------ y(n)
                - |   | +        +-----+     |
                  |   |          -   +       |
                  |   +------------O---------+
                  |                |
                  |    +------+    |
                  +----| h(m) |----+
                       +------+

Here, x is the 'noiseless' input sample, d is the TPDF random dither
process and y is the quantised output sample. Block Q represents the
quantiser as before, and h(m) is a discrete-time filter which, as we
will see, affects the spectrum of the error in the output y.

The equations governing the behaviour of this system are

    u(n) = x(n) - ( y(n) - u(n) ) * h(m)
    y(n) = Q( u(n) + d(n) ),

where '*' represents the discrete convolution operator, and Q(.)
represents the quantisation function.
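A direct sample-by-sample implementation of these two equations is
sketched below, assuming Python and a hypothetical `noise_shape`
function. The feedback taps are passed as h(1), h(2), ... with h(0)
taken as zero, so that u(n) depends only on past values of
y(n) - u(n); the random generator is seeded only for repeatability.

```python
import math
import random

def quantise(x, q=1.0):
    # Round to the nearest multiple of q (ties round up).
    return q * math.floor(x / q + 0.5)

def noise_shape(x, h, q=1.0, seed=5):
    """Dithered error-feedback quantiser following the block diagram above.

    x -- input samples, in the same units as q
    h -- feedback taps [h(1), h(2), ...]; h(0) is taken as zero
    Returns the output y(n) and the loop error eps(n) = y(n) - u(n).
    """
    rng = random.Random(seed)
    past = [0.0] * len(h)              # eps(n-1), eps(n-2), ..., most recent first
    y, eps = [], []
    for xn in x:
        # u(n) = x(n) - sum_k h(k) (y(n-k) - u(n-k))
        un = xn - sum(hk * ek for hk, ek in zip(h, past))
        d = rng.uniform(-q / 2, q / 2) + rng.uniform(-q / 2, q / 2)   # TPDF dither
        yn = quantise(un + d, q)       # y(n) = Q(u(n) + d(n))
        en = yn - un
        past = ([en] + past)[:len(h)]
        y.append(yn)
        eps.append(en)
    return y, eps

# Usage: a 3.3-step sine through the single-delay filter h(1) = 1.
x = [3.3 * math.sin(2 * math.pi * n / 50) for n in range(1000)]
y, eps = noise_shape(x, h=[1.0])
# The loop satisfies y(n) = x(n) + eps(n) - eps(n-1) to rounding precision.
print(max(abs(y[n] - x[n] - (eps[n] - eps[n - 1])) for n in range(1, len(x))))
```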

As they stand, these equations are difficult to analyse because of the
non-linearity introduced by the quantisation function. However, if the
random dither sample d(n) is drawn from a suitable distribution then
the combined effect of adding d(n) and then quantising is
statistically equivalent to the addition of a different random
variable e(n). We may therefore redraw the block diagram, and rewrite
the system equations thus:

                                  e(n)
                                   |
                +      u(n)     +  |+    
    x(n) ---------O---+------------O---------+------ y(n)
                - |   |                      |
                  |   |          -   +       |
                  |   +------------O---------+
                  |                |
                  |    +------+    |
                  +----| h(m) |----+
                       +------+

    u(n) = x(n) - ( y(n) - u(n) ) * h(m)
    y(n) = u(n) + e(n)

Now the system contains no non-linearities, and it therefore becomes
useful to take Z-transforms, thereby converting the convolution
operator to its equivalent multiplication.

                                  E(z)
                                   |
                +     U(z)         |+    
    X(z) ---------O---+------------O---------+------ Y(z)
                - |   | +                    |
                  |   |          -   +       |
                  |   +------------O---------+
                  |                |
                  |    +------+    |
                  +----| H(z) |----+
                       +------+

Now the system equations become

    U(z) = X(z) - (Y(z) - U(z)).H(z)
    Y(z) = U(z) + E(z)

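Substituting Y(z) - U(z) = E(z), from the second equation, into the
first gives U(z) = X(z) - E(z).H(z); adding E(z) to both sides then
eliminates U and gives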

    Y(z) = X(z) + E(z).(1 - H(z))

Replacing z by w' = exp(jwT) (ie calculating spectra by evaluating the
z-transform on the unit circle) gives

    Y(w') = X(w') + E(w').(1 - H(w'))                   (1)

where w is the frequency in radians per second, and T is the sample
period. Being engineers we use j = sqrt(-1), rather than i.

So now we see that the spectrum of the input signal is unchanged at
the output, but it has added to it a random process E whose spectrum
has been modified by the function (1 - H(w')), where H(w') is the
frequency response of the feedback filter in the noise shaper. 

3.1 Statistics of E
===================
It is sensible to use dither with a triangular distribution and peak
value of q (the quantiser step-size), for the same reasons as in the
non-noise-shaped case. We saw in section 2.2 that this results in an
uncorrelated (white) quantisation error with zero mean, and of
approximate power (q^2)/4.
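The claim that the loop error behaves like a white process can be
checked even inside the feedback loop, where u(n) depends on past
errors and the input is highly correlated. This sketch assumes Python,
the single-delay filter H(z) = 1/z and an arbitrary slow sine input;
it measures the mean, the power and the normalised autocorrelation of
e(n) = Q(u(n) + d(n)) - u(n) at small lags.

```python
import math
import random

def quantise(x, q=1.0):
    # Round to the nearest multiple of q (ties round up).
    return q * math.floor(x / q + 0.5)

rng = random.Random(6)
q, n = 1.0, 200_000
e_prev, eps = 0.0, []
for i in range(n):
    x = 5.7 * math.sin(2 * math.pi * i / 441)     # slowly varying, highly correlated input
    u = x - e_prev                                # H(z) = 1/z: u(n) = x(n) - e(n-1)
    d = rng.uniform(-q / 2, q / 2) + rng.uniform(-q / 2, q / 2)   # TPDF dither
    e = quantise(u + d, q) - u                    # equivalent additive error e(n)
    eps.append(e)
    e_prev = e

mean = sum(eps) / n
power = sum(v * v for v in eps) / n
print(f"mean {mean:+.4f}   power {power:.4f}   (q^2/4 = {q * q / 4})")
for lag in (1, 2, 3):
    r = sum(eps[i] * eps[i - lag] for i in range(lag, n)) / (n - lag)
    print(f"normalised autocorrelation at lag {lag}: {r / power:+.4f}")
```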

3.2 Design of the filter H
==========================
We have seen that the filter H affects the noise spectrum, and now the
strategy is to design a filter such that the noise is moved to
frequency bands where it does not matter.

A first glance at equation 1 suggests that H(z) = 1 would be perfect,
eliminating all the noise, since then (1 - H(z)) = 0. Alas this cannot
be, as it would result in a non-causal (delay-free) loop: computing
u(n) would require the error y(n) - u(n), which itself depends on
u(n). For the loop to be realisable the filter must delay its input by
at least one sample period, ie its impulse response must be zero at
lag zero, which is equivalent to saying that we can have only negative
powers of z in the filter transform.

So the simplest filter we can use is a single sample delay, whose
transfer function is 1/z; it is interesting to calculate the resulting
noise spectrum for this case, since to implement this filter requires
very little computation - one addition, and one subtraction per
sample.

We saw above that the output noise is given by

    N(z) = E(z).(1 - H(z))

and that replacing z by w' gives the equivalent spectral expression

    N(w') = E(w').(1 - H(w')).

Additionally we saw that the power spectrum of E is flat (since E is a
white process). Now we do the same again, with H(z) = 1/z.

    N(z) = E(z).(1 - 1/z)

    N(w') = E(w').(1 - 1/w')

The noise power gain is therefore given by | 1 - 1/w' |^2, which
simplifies to 2 - 2cos(wT), and plugging in some numbers we can
calculate the noise gain at a few spot frequencies (assuming 48kHz
sampling for convenience):

       f              w            w'          | 1 - 1/w' |^2
     ==========================================================
       0   (0kHz)     0            1            0    (-inf dB)
     1/8T  (6kHz)   pi/4T     exp(j pi/4)       0.59 (-2.3 dB)
     1/6T  (8kHz)   pi/3T     exp(j pi/3)       1    (   0 dB)
     1/4T (12kHz)   pi/2T     exp(j pi/2)       2    (  +3 dB)
     1/2T (24kHz)   pi/T          -1            4    (  +6 dB)

So immediately we have more noise at high frequency than at low
frequency and, therefore, we have succeeded in our aims so far.
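The table can be reproduced for any FIR feedback filter with a few
lines of complex arithmetic. In this sketch (Python; the function name
`noise_power_gain` is hypothetical), the taps h = [h(1), h(2), ...]
define H(z) = h(1)/z + h(2)/z^2 + ..., and the single-delay filter of
the table is simply h = [1.0].

```python
import cmath
import math

def noise_power_gain(h, f, fs):
    """|1 - H(w')|^2 at frequency f (Hz), with w' = exp(j w T), w = 2 pi f, T = 1/fs."""
    wp = cmath.exp(2j * math.pi * f / fs)
    H = sum(hk * wp ** -(k + 1) for k, hk in enumerate(h))
    return abs(1 - H) ** 2

fs = 48_000
for f in (0, 6_000, 8_000, 12_000, 24_000):
    g = noise_power_gain([1.0], f, fs)            # H(z) = 1/z
    db = 10 * math.log10(g) if g > 0 else float("-inf")
    print(f"{f / 1000:5.1f} kHz   gain {g:5.3f}   ({db:+5.1f} dB)")
```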

One more thing has to concern us before going any further; we should
derive an expression for the total noise power gain of the noise-
shaper, so that we can predict the total noise power at the system
output. We do this by integrating the noise power gain over frequency
thus:
 
             w=2pi/T
        T     /              2
      -----   | | 1 - H(w') |  dw
       2pi    /
            w=0

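where w' = exp(jwT) as before. If we have no noise shaper (ie H(w')=0)
then the integral is trivial and evaluates to unity, and ideally a
noise-shaper should keep this figure as close to unity as possible.
For an FIR filter, Parseval's theorem shows that the integral equals
the sum of the squares of the coefficients of (1 - H(z)); for the
single-delay filter these are 1 and -1, so that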
             w=2pi/T
         T    /             2
       -----  | | 1 - 1/w' |  dw  =  2
        2pi   /
            w=0

so the simple noise-shaping filter we designed above increases the
total noise power by 3dB.
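A numerical check of this result, assuming Python and hypothetical
helpers `total_noise_gain` and `parseval_gain`: substituting wT for
the integration variable turns the integral into the average of
| 1 - H |^2 around the unit circle, which an equally spaced sum
evaluates exactly (to rounding) for a short FIR filter. Both methods
give 1 for no shaping and 2 (about +3dB) for H(z) = 1/z.

```python
import cmath
import math

def total_noise_gain(h, points=4096):
    """(T / 2pi) * integral of |1 - H(w')|^2 dw over one period, computed as the
    average of |1 - H(exp(j theta))|^2 over equally spaced theta = wT."""
    total = 0.0
    for i in range(points):
        z = cmath.exp(2j * math.pi * i / points)
        H = sum(hk * z ** -(k + 1) for k, hk in enumerate(h))
        total += abs(1 - H) ** 2
    return total / points

def parseval_gain(h):
    # Sum of squared coefficients of 1 - H(z) = 1 - h(1)/z - h(2)/z^2 - ...
    return 1.0 + sum(hk * hk for hk in h)

for name, h in [("no shaping", []), ("H(z) = 1/z", [1.0])]:
    g = total_noise_gain(h)
    print(f"{name:12s} numerical {g:.6f}  Parseval {parseval_gain(h):.1f}  "
          f"({10 * math.log10(g):+.2f} dB)")
```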


4.  SPECIFIC NOISE SHAPERS
==========================

We will now consider two specific applications of noise shaping and
the general forms of typical filters for each case. The applications
in question are analogue to digital conversion, and CD mastering.

4.1 Analogue to digital conversion
==================================
There is a quantisation implicit in any A/D conversion, since the
analogue input is continuous but the digital word is of finite
length. Noise shaping can be employed to obtain the high-resolution,
low-frequency performance required for digital audio from a
low-resolution converter running at a much higher sample rate.

Employing a high oversampling factor (typically 64 to 512 times for
audio) means that the audio band occupies only a small region near dc,
so that is where we want the least quantisation noise; noise moved to
higher frequencies is removed by subsequent filtering. The simple
noise-shaper designed above, with noise transfer function (1 - 1/z),
has this property, but is generally not effective enough.
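To put a number on this, the sketch below (Python; `inband_noise` is a
hypothetical name) integrates the shaped noise over the audio band
|wT| < pi/R for an oversampling factor R, with the unshaped noise power
normalised to 1. The first-order result is compared against the closed
form (2/pi)(pi/R - sin(pi/R)), obtained by integrating 2 - 2cos(wT).

```python
import cmath
import math

def inband_noise(h, R, points=20_000):
    """Fraction of the (unit) unshaped noise power falling in |w T| < pi / R
    after shaping by 1 - H(z), with taps h = [h(1), h(2), ...]."""
    band = math.pi / R
    total = 0.0
    for i in range(points):
        theta = band * (i + 0.5) / points          # midpoint rule on (0, band)
        z = cmath.exp(1j * theta)
        H = sum(hk * z ** -(k + 1) for k, hk in enumerate(h))
        total += abs(1 - H) ** 2
    # (1 / 2pi) * integral over (-band, band) = (1 / pi) * integral over (0, band)
    return total * (band / points) / math.pi

R = 64
plain = inband_noise([], R)
first = inband_noise([1.0], R)
print(f"in-band noise, no shaping:        {plain:.3e}  (1/R = {1 / R:.3e})")
print(f"in-band noise, (1 - 1/z) shaping: {first:.3e}")
print(f"improvement: {10 * math.log10(plain / first):.1f} dB")
```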

Instead, we could design the filter such that the noise transfer
function becomes (1 - 1/z)^K where K is an integer equal to the order
of the noise shaper required. Expanding with the binomial theorem
enables the filter itself, H(z), to be calculated thus:

                        K                                
                      -----                              
               K       \        k      K!        -k      
      (1 - 1/z)    =     >  (-1)   -----------  z        
                       /            (K-k)! k!            
                      -----                              
                       k=0                               

                               K                                
                             -----              
                             \        k-1      K!       -k      
                   =  1  -     >  (-1)    -----------  z        
                             /             (K-k)! k!            
                             -----
                              k=1                              

Now this is in the form (1 - H(z)) and we can implement a suitable
filter H(z) trivially, as an FIR filter with coefficients given by 

                  k-1      K!
       b(k) = (-1)    ------------      ;  k = 1, 2, ... K 
                       (K-k)! k!
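The coefficients are easily tabulated by machine. This sketch assumes
Python 3.8 or later (for math.comb) and hypothetical helper names; it
computes b(1..K), checks by direct polynomial multiplication that
1 - H(z) really equals (1 - 1/z)^K, and prints the total noise power
gain from the sum of squared coefficients, as in section 3.2. The
rapidly growing figure shows how much noise high-order shapers push
towards half the sample rate.

```python
import math

def shaper_coefficients(K):
    """FIR taps b(1..K) of H(z) such that 1 - H(z) = (1 - 1/z)^K."""
    return [(-1) ** (k - 1) * math.comb(K, k) for k in range(1, K + 1)]

def poly_pow_one_minus_zinv(K):
    # Coefficients of (1 - 1/z)^K in powers z^0, z^-1, ..., z^-K.
    c = [1]
    for _ in range(K):
        c = [a - b for a, b in zip(c + [0], [0] + c)]   # multiply by (1 - 1/z)
    return c

for K in range(1, 6):
    taps = shaper_coefficients(K)
    assert [1] + [-bk for bk in taps] == poly_pow_one_minus_zinv(K)
    gain = 1 + sum(bk * bk for bk in taps)          # Parseval: total noise gain
    print(f"K={K}: b = {taps}   total noise gain {gain} ({10 * math.log10(gain):+.1f} dB)")
```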

4.2 CD Mastering
================
If a 16-bit CD master is to be prepared from a master of higher
resolution (for example 20 bits) then we can use noise-shaping to
preserve a higher dynamic range where the ear is most sensitive, by
forcing the quantisation noise associated with the word-length
reduction into frequency bands where the ear is relatively
insensitive.

Experimental evidence shows the ear to be most sensitive at around
3kHz, and to have a second, smaller sensitivity peak around 12kHz. If
we arrange for the noise gain (1 - H(w')) to have corresponding dips
at these frequencies then there will appear to be less background hiss
despite a small overall noise power gain due to the noise-shaper.
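One crude way to place such dips, shown here only as an illustration
and not as a practical mastering filter, is to build 1 - H(z) from
second-order sections of the form 1 - 2cos(theta)/z + 1/z^2, each of
which has zeros on the unit circle at the frequency corresponding to
theta. The sketch assumes Python, the CD sample rate of 44.1kHz and
hypothetical helper names; the final line shows the total noise power
paid for placing exact nulls rather than optimised dips.

```python
import cmath
import math

fs = 44_100   # CD sample rate

def notch_section(f):
    # 1 - 2cos(theta) z^-1 + z^-2 has a pair of zeros on the unit circle at +/-theta.
    theta = 2 * math.pi * f / fs
    return [1.0, -2.0 * math.cos(theta), 1.0]

def poly_mul(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

# Noise transfer function 1 - H(z) with nulls at 3kHz and 12kHz (illustrative only).
ntf = poly_mul(notch_section(3_000), notch_section(12_000))
h = [-c for c in ntf[1:]]          # taps h(1..4) of the feedback filter H(z)

def gain(f):
    z = cmath.exp(2j * math.pi * f / fs)
    return abs(sum(c * z ** -k for k, c in enumerate(ntf))) ** 2

print("feedback taps h(1..4):", [round(v, 4) for v in h])
for f in (1_000, 3_000, 6_000, 12_000, 18_000):
    print(f"{f / 1000:5.1f} kHz  |1 - H|^2 = {gain(f):8.4f}")
print("total noise power gain:", round(sum(c * c for c in ntf), 2))
```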

Typically one would use an FIR filter of order ten to fifteen for this
type of audio work. The desired shape of such a filter cannot be
described conveniently in mathematical terms, so its coefficients
cannot in general be found analytically. Usually such a problem has to
be tackled by a computer using an iterative numerical method. This is
not a problem, as it only has to be done once, at the design stage;
after that, a list of a dozen numbers is stored for use in the
noise-shaper.
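As an illustration of a numerical design, the sketch below swaps in a
different, non-iterative technique: weighted least squares. It chooses
taps h(1..K) to minimise the weighted noise power, the sum of
W(f).| 1 - H |^2 over a frequency grid; setting the gradient to zero
gives a small set of Toeplitz linear equations, solved here by Gaussian
elimination. The weighting function is a hypothetical stand-in with
peaks near 3kHz and 12kHz, not a measured hearing curve, and all names
are assumptions; it assumes Python and a 44.1kHz sample rate.

```python
import cmath
import math

fs = 44_100

def weight(f):
    # Hypothetical sensitivity weighting (NOT measured data): peaks near 3 and 12 kHz.
    return (1.0 + 100.0 * math.exp(-((f - 3_000) / 1_500) ** 2)
            + 30.0 * math.exp(-((f - 12_000) / 2_000) ** 2))

def solve(A, b):
    # Gaussian elimination with partial pivoting on the augmented matrix.
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def design(order, grid=4096):
    """Taps h(1..order) minimising sum W |1 - H|^2 on a grid over (0, fs/2).
    Normal equations: sum_k r(|i-k|) h(k) = r(i), with r(m) = mean of W cos(m theta)."""
    thetas = [math.pi * (i + 0.5) / grid for i in range(grid)]
    W = [weight(t * fs / (2 * math.pi)) for t in thetas]
    r = [sum(w * math.cos(m * t) for w, t in zip(W, thetas)) / grid
         for m in range(order + 1)]
    A = [[r[abs(i - k)] for k in range(order)] for i in range(order)]
    return solve(A, r[1:order + 1])

h = design(order=12)

def ntf_gain(f):
    z = cmath.exp(2j * math.pi * f / fs)
    return abs(1 - sum(hk * z ** -(k + 1) for k, hk in enumerate(h))) ** 2

print("taps:", [round(v, 3) for v in h])
for f in (1_000, 3_000, 8_000, 12_000, 20_000):
    print(f"{f / 1000:5.1f} kHz  |1 - H|^2 = {ntf_gain(f):.3f}")
```

As with the author's iterative approach, this only has to be run once;
the resulting list of taps is then fixed in the noise-shaper.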

It is important to note that once the word-length has been reduced in
this way, subsequent operations in the digital domain will completely
undo all the benefits of the noise-shaping. This is due to the
inherent quantisation that exists in even a simple operation, such as
application of gain in the digital domain. For this reason, all
digital processing should be done to as high a resolution as is
possible; the very last step in the mastering process is then the
production of a sixteen bit master, using noise-shaping if desired.


5.  CONCLUSION
==============

We have seen that dithering is essential prior to quantisation, in
order to avoid signal distortion arising from correlated error
components. Triangular pdf dither is found to be suitable for audio
applications as it eliminates quantisation-related distortion, while
adding only a small amount of noise to the final signal.

We have seen that noise-shaping filters can be used to further reduce
the perceived signal degradation caused by quantisation, by
concentrating noise power either at frequencies where it will be
subsequently filtered out, or at the frequencies where the ear is
least sensitive.

We have derived principles for the design of noise-shaping filters for
discrete time systems such as digital audio, and discussed two
specific applications in this area, namely analogue to digital
conversion and wordlength reduction for CD mastering.