===============================================================
THE APPLICATION OF DITHERING AND NOISE-SHAPING TO DIGITAL AUDIO
              Christopher Hicks, June 1994.  V0.9
===============================================================


0. COPYRIGHT
=============

Permission is hereby granted to distribute this document in
electronic or magnetic form, and to make hard copies for private
study, education or other non-commercial purposes, provided that
both of the following conditions are adhered to:

  1) the document must remain complete and unaltered, and
  2) no charge may be levied for copying it, or for access to it.

I reserve all other rights to this document. I accept no
responsibility for damage or loss caused by inaccuracies contained
herein.

Christopher Hicks, June 1994.


1. INTRODUCTION
================

This article discusses in some detail the theory and practice of
dithering and noise-shaping, with particular reference to the field
of digital audio. Inevitably this involves some mathematics but it
has been kept to a minimum. Furthermore the limitations of an ASCII
distribution medium mean that fewer diagrams have been included than
would have been ideal.


2. DITHERING
=============

Quantisation of a signal is nothing more complex than rounding it to
the nearest whole number. If the input to a quantiser x is between p
and p+0.5, then we round down to p; if x lies between p+0.5 and p+1
then we round up to p+1. In general, the output step size (or
quantum) need not be unity, but can be of arbitrary size q; in this
case p is not an integer, but a whole multiple of q.


2.1 Non-dithered Quantisation
=============================

Consider the following simple system, where x(n) is a discrete-time
input signal, Q is a quantiser whose output is y(n).

                 +-----+
x(n) ------------|  Q  |------------ y(n)
                 +-----+

If x(n) is random and uncorrelated then at any given sample instant
it is equally likely to lie anywhere in the interval (p - q/2) to
(p + q/2).
In other words, the quantisation error e(n) = y(n) - x(n) is random
and uncorrelated, and has a rectangular distribution of peak value
q/2. Elementary statistics show that the expected (or average)
quantisation error is zero, and that the quantisation noise power is
(q^2)/12.

If however x(n) exhibits some correlation (as does music for example)
then this simple analysis is no longer valid. The errors become
correlated with the signal and show up as distortion rather than as
broadband hiss. Analysis of this phenomenon is very difficult, but
for audio the details are of little consequence: the result sounds
terrible, so we have to look for a way around the problem, rather
than analysing the problem itself.


2.2 Dithered Quantisation
=========================

If we add a suitable uncorrelated random noise source to the input
signal x, then the quantisation error becomes decorrelated from the
signal. This is because the value of the quantiser input relative to
its output steps no longer depends solely upon the input x, but also
upon an uncorrelated random process d.

        d(n)
          |
       +  |+     +-----+
x(n) -----O------|  Q  |------------ y(n)
                 +-----+

The distribution of d is critical; it must effectively decorrelate
the quantisation error from the input signal x, while adding a
minimum of noise power to the output signal y. A zero-mean
rectangular distribution of peak amplitude q/2 is effective at
decorrelating the expected error (ie the first moment of the error)
and this adds excess noise power of approximately (q^2)/12 to the
output. This in conjunction with the noise contributed by the
quantiser itself makes the total output noise power approximately
(q^2)/6. However, this still is not adequate for high-quality audio
as the output noise power (ie the second moment of the error) is
still correlated with the signal. Adding a second uniform random
variable removes this correlation.
Now the total output noise power is (q^2)/4, which is three times the
undithered value of (q^2)/12. However we achieve more acceptable
audio performance, as neither the expected error nor the error power
depends upon the signal x.

We can go on adding more random numbers in this fashion, and the
effect of each is to decorrelate the next statistical moment of the
error signal from the input. However, each also adds more noise power
to the output signal, and two is found in practice to be the most
satisfactory solution for audio work. The sum of these two uniform
random variables has a zero-mean triangular distribution with peak
value of q, and is therefore commonly referred to as "LSB TPDF
dither" (Least Significant Bit Triangular Probability Density
Function).


3. THEORETICAL ASPECTS OF NOISE SHAPING
========================================

Consider the addition of a feedback loop to the dithered quantiser to
give the following system.

                    d(n)
                      |
            +   u(n)  |+     +-----+
x(n) ---------O---+---O------|  Q  |-----+------ y(n)
            - |   | +        +-----+     |
              |   |          -   +       |
              |   +------------O---------+
              |                |
              |    +------+    |
              +----| h(m) |----+
                   +------+

Here, x is the 'noiseless' input sample, d is the TPDF random dither
process and y is the quantised output sample. Block Q represents the
quantiser as before, and h(m) is a discrete-time filter which, as we
will see, affects the spectrum of the error in the output y.

The equations governing the behaviour of this system are

    u(n) = x(n) - ( y(n) - u(n) ) * h(m)
    y(n) = Q( u(n) + d(n) ),

where '*' represents the discrete convolution operator, and Q(.)
represents the quantisation function. The feedback term enters the
first summing node with a negative sign, as drawn above.

As they stand, these equations are difficult to analyse because of
the non-linearity introduced by the quantisation function. However,
if the random dither sample d(n) is drawn from a suitable
distribution then the combined effect of adding d(n) and then
quantising is statistically equivalent to the addition of a different
random variable e(n).
We may therefore redraw the block diagram, and rewrite the system
equations thus:

                             e(n)
                               |
            +   u(n)         + |+
x(n) ---------O---+------------O---------+------ y(n)
            - |   |                      |
              |   |          -   +       |
              |   +------------O---------+
              |                |
              |    +------+    |
              +----| h(m) |----+
                   +------+

    u(n) = x(n) - ( y(n) - u(n) ) * h(m)
    y(n) = u(n) + e(n)

Now the system contains no non-linearities, and it therefore becomes
useful to take Z-transforms, thereby converting the convolution
operator to its equivalent multiplication.

                             E(z)
                               |
            +   U(z)           |+
X(z) ---------O---+------------O---------+------ Y(z)
            - |   |          +           |
              |   |          -   +       |
              |   +------------O---------+
              |                |
              |    +------+    |
              +----| H(z) |----+
                   +------+

Now the system equations become

    U(z) = X(z) - (Y(z) - U(z)).H(z)
    Y(z) = U(z) + E(z)

and rearranging to eliminate U gives

    Y(z) = X(z) + E(z).(1 - H(z))

Replacing z by w' = exp(jwT) (ie calculating spectra by evaluating
the z-transform on the unit circle) gives

    Y(w') = X(w') + E(w').(1 - H(w'))                    (1)

where w is the frequency in radians per second, and T is the sample
period. Being engineers we use j = sqrt(-1), rather than i.

So now we see that the spectrum of the input signal is unchanged at
the output, but it has added to it a random process E whose spectrum
has been modified by the function (1 - H(w')), where H(w') is the
frequency response of the feedback filter in the noise shaper.


3.1 Statistics of E
===================

It is sensible to use dither with a triangular distribution and peak
value of q (the quantiser step-size), for the same reasons as in the
non-noise-shaped case. It was shown that this results in a white
quantisation error with zero mean, and of approximate power (q^2)/4.


3.2 Design of the filter H
==========================

We have seen that the filter H affects the noise spectrum, and now
the strategy is to design a filter such that the noise is moved to
frequency bands where it does not matter. A first glance at
equation 1 suggests that H(z) = 1 would be perfect, eliminating all
the noise, since then (1 - H(z)) = 0.
Alas this cannot be, as it would result in a non-causal system; for
causality, the filter must have a group delay of at least one sample
period at all frequencies, which is equivalent to saying that we can
have only negative powers of z in the filter transform.

So the simplest filter we can use is a single sample delay, whose
transfer function is 1/z; it is interesting to calculate the
resulting noise spectrum for this case, since to implement this
filter requires very little computation - one addition, and one
subtraction per sample.

We saw above that the output noise is given by

    N(z) = E(z).(1 - H(z))

and that replacing z by w' gives the equivalent spectral expression

    N(w') = E(w').(1 - H(w')).

Additionally we saw that E(w') is constant (since E is a white
process). Now we do the same again, with H(z) = 1/z.

    N(z)  = E(z).(1 - 1/z)
    N(w') = E(w').(1 - 1/w')

The noise power gain is therefore given by | 1 - 1/w' |^2, which
equals 2 - 2.cos(wT). Plugging in some numbers we can calculate the
noise gain at a few spot frequencies (assuming 48kHz sampling for
convenience):

                                                        2
   f               w          w'            | 1 - 1/w' |
   ========================================================
   0    (0kHz)     0          1             0    (-inf dB)
   1/8T (6kHz)     pi/4T      exp(j pi/4)   0.59 (-2.3 dB)
   1/6T (8kHz)     pi/3T      exp(j pi/3)   1    (   0 dB)
   1/4T (12kHz)    pi/2T      exp(j pi/2)   2    (  +3 dB)
   1/2T (24kHz)    pi/T       -1            4    (  +6 dB)

So immediately we have more noise at high frequency than at low
frequency and, therefore, we have succeeded in our aims so far.

One more thing has to concern us before going any further; we should
derive an expression for the total noise power gain of the
noise-shaper, so that we can predict the total noise power at the
system output. We do this by integrating the noise power gain over
frequency thus:

        w=2pi/T
   T    /                 2
 -----  |  | 1 - H(w') |   dw
  2pi   /
        w=0

where w' = exp(jwT) as before. If we have no noise shaper (ie
H(w')=0) then the integral is trivial and evaluates to unity. This
therefore should be our target for non-zero H(w').
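
The spot-frequency gains of the single-delay filter can be reproduced
numerically. The following is a minimal Python sketch, not part of
the original derivation; the function name noise_power_gain and its
parameters are illustrative assumptions, and 48kHz sampling is
assumed as in the text.

```python
import cmath
import math


def noise_power_gain(f_hz, fs_hz=48000.0):
    """Return |1 - H(w')|^2 for H(z) = 1/z, with w' = exp(j*2*pi*f/fs)."""
    w_prime = cmath.exp(2j * math.pi * f_hz / fs_hz)
    return abs(1 - 1 / w_prime) ** 2


# Evaluate the gain at the spot frequencies used in the text.
for f in (0, 6000, 8000, 12000, 24000):
    g = noise_power_gain(f)
    db = 10 * math.log10(g) if g > 0 else float("-inf")
    print(f"{f:6d} Hz  gain {g:.3f}  ({db:+.1f} dB)")
```

Changing fs_hz shows that only the ratio of frequency to sampling
rate matters, which is why the spot frequencies are quoted as
fractions of 1/T.
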
It can be easily shown that

        w=2pi/T
   T    /                 2
 -----  |  | 1 - 1/w' |   dw  =  2
  2pi   /
        w=0

so the simple noise-shaping filter we designed above increases the
total noise power by 3dB.


4. SPECIFIC NOISE SHAPERS
==========================

We will now consider two specific applications of noise shaping and
the general forms of typical filters for each case. The applications
in question are analogue to digital conversion, and CD mastering.


4.1 Analogue to digital conversion
==================================

There is a quantisation implicit in any A/D conversion since the
analogue input is continuous but the digital word is of finite
length. Noise shaping can be employed to get the high-resolution low
frequency performance required for digital audio from a
low-resolution high-frequency converter.

Employing a high oversampling factor (typically 64 to 512 times for
audio) effectively means that we want the least quantisation noise
near dc as that is where the desired signal is found. The simple
noise-shaper designed above with noise transfer function (1 - 1/z)
has this property, but is generally not effective enough. Instead, we
could design the filter such that the noise transfer function becomes

    (1 - 1/z)^K

where K is an integer equal to the order of the noise shaper
required. Expanding with the binomial theorem enables the filter
itself, H(z), to be calculated thus:

                 K
               -----
          K    \         k      K!        -k
 (1 - 1/z)   =  >    (-1)  ----------- z
               /            (K-k)! k!
               -----
                k=0

                     K
                   -----
                   \         k-1      K!        -k
             = 1 -  >    (-1)     ----------- z
                   /               (K-k)! k!
                   -----
                    k=1

Now this is in the form (1 - H(z)) and we can implement a suitable
filter H(z) trivially, as an FIR filter with coefficients given by

              k-1      K!
 b(k) =  (-1)     ------------ ;    k = 1, 2, ... K
                   (K-k)! k!
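
The coefficient formula can be evaluated directly. The sketch below
is an illustration in Python rather than anything from the original
article; the function names are hypothetical. It also sums the
squares of the taps of (1 - H(z)), which by Parseval's relation
equals the total noise power gain integral of section 3.2, so K = 1
should give 2.

```python
from math import comb


def shaper_coefficients(K):
    """FIR taps b(1)..b(K) of H(z) such that 1 - H(z) = (1 - 1/z)^K."""
    return [(-1) ** (k - 1) * comb(K, k) for k in range(1, K + 1)]


def noise_transfer_function(K):
    """Taps of 1 - H(z) for the powers z^0, z^-1, ..., z^-K."""
    return [1] + [-b for b in shaper_coefficients(K)]


def total_noise_power_gain(K):
    """Sum of squared taps of 1 - H(z), equal to the frequency integral."""
    return sum(c * c for c in noise_transfer_function(K))


# Print the feedback taps and the total noise power gain for low orders.
for K in (1, 2, 3):
    print(K, shaper_coefficients(K), total_noise_power_gain(K))
```

The total noise power grows quickly with K, which is acceptable in an
oversampled converter only because most of that noise lies outside
the audio band and is filtered out afterwards.
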

4.2 CD Mastering
================

If a 16-bit CD master is to be prepared from a master of higher
resolution (for example 20 bits) then we can use noise-shaping to
preserve a higher dynamic range where the ear is most sensitive, by
forcing the quantisation noise associated with the word-length
reduction into frequency bands where the ear is relatively
insensitive.

Experimental evidence shows the ear to be most sensitive at around
3kHz, and to have a second, smaller sensitivity peak around 12kHz. If
we arrange for the noise gain (1 - H(w')) to have corresponding dips
at these frequencies then there will appear to be less background
hiss despite a small overall noise power gain due to the
noise-shaper.

Typically one would use an FIR filter of order ten to fifteen for
this type of audio work. The design of such a filter, whose shape
cannot be described conveniently in mathematical terms, cannot in
general be carried out analytically. Usually such a problem has to be
tackled by a computer using an iterative numerical method. This is
not a problem as it only has to be done once, at the design stage;
after that, a list of a dozen numbers is stored for use in the
noise-shaper.

It is important to note that once the word-length has been reduced in
this way, subsequent operations in the digital domain will completely
undo all the benefits of the noise-shaping. This is due to the
inherent quantisation that exists in even a simple operation, such as
application of gain in the digital domain. For this reason, all
digital processing should be done to as high a resolution as is
possible; the very last step in the mastering process is then the
production of a sixteen bit master, using noise-shaping if desired.


5. CONCLUSION
==============

We have seen that dithering is essential prior to quantisation, in
order to avoid signal distortion arising from correlated error
components.
Triangular pdf dither is found to be suitable for audio applications
as it eliminates quantisation-related distortion, while adding only a
small amount of noise to the final signal.

We have seen that noise-shaping filters can be used to further reduce
the perceived signal degradation caused by quantisation, by
concentrating noise power either at frequencies where it will be
subsequently filtered out, or at the frequencies where the ear is
least sensitive.

We have derived principles for the design of noise-shaping filters
for discrete time systems such as digital audio, and discussed two
specific applications in this area, namely analogue to digital
conversion and wordlength reduction for CD mastering.
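
As a closing illustration, the dithered noise-shaping loop of
section 3 can be sketched in a few lines of Python. This is a minimal
sketch under stated assumptions and not a mastering-grade
implementation: the function name requantise, its parameters and the
fixed random seed are all hypothetical choices made for the example,
and the default feedback filter is the single sample delay
H(z) = 1/z.

```python
import math
import random


def requantise(x, q=1.0, h=(1.0,), seed=0):
    """Dithered, noise-shaped quantisation of the samples in x to step q.

    h holds the feedback taps h(1), h(2), ...; h=(1.0,) is the single
    sample delay H(z) = 1/z, and h=() disables the noise shaping.
    """
    rng = random.Random(seed)      # seeded so that runs are repeatable
    err = [0.0] * len(h)           # err[k] holds y(n-1-k) - u(n-1-k)
    y = []
    for xn in x:
        # u(n) = x(n) - sum over k of h(k) * ( y(n-k) - u(n-k) )
        u = xn - sum(hk * ek for hk, ek in zip(h, err))
        # TPDF dither: sum of two uniform variables, each of peak q/2
        d = q * (rng.random() + rng.random() - 1.0)
        yn = q * math.floor((u + d) / q + 0.5)   # round to nearest step
        y.append(yn)
        if err:
            err = [yn - u] + err[:-1]            # shift the error history
    return y


# A constant input of 0.3 steps: no single output sample can represent
# it, but the long-run average of the output stays close to 0.3.
y = requantise([0.3] * 1000)
print(sum(y) / len(y))
```

With the single-delay filter the differences y(n) - x(n) telescope,
so their running sum stays bounded and the average of the output
converges on the input; this is the time-domain counterpart of the
zero noise gain at dc.
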