Introduction to DSP

DSP processors store data in fixed or floating point formats.

It is worth noting that fixed point format is not quite the same as integer:

The integer format is straightforward: it represents whole numbers up to the largest magnitude that the available number of bits allows. Fixed point format is used to represent fractions with magnitude less than 1 (typically in two's complement, from -1 up to just under +1), with a 'binary point' assumed to lie just after the most significant bit. The most significant bit in both cases carries the sign of the number.

The size of the fraction represented by the smallest bit is the precision of the fixed point format.
The size of the largest number that can be represented in the available word length, relative to the smallest, is the dynamic range of the fixed point format.
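
As a rough illustration of precision and range, the sketch below assumes a 16 bit word with one sign bit and 15 fraction bits (often called Q15). The word length, the names float_to_q15 and q15_to_float, and the choice to saturate at the limits are assumptions made for this example, not something the text prescribes.

```python
# Assumed layout: 16-bit two's complement word, binary point after the sign bit.
Q15_FRACTION_BITS = 15
Q15_SCALE = 1 << Q15_FRACTION_BITS      # 32768 codes per unit
Q15_MIN = -Q15_SCALE                    # code for -1.0
Q15_MAX = Q15_SCALE - 1                 # code for the largest value, just under +1.0


def float_to_q15(x):
    """Quantise a real number to the nearest Q15 code, saturating at the limits."""
    code = int(round(x * Q15_SCALE))
    return max(Q15_MIN, min(Q15_MAX, code))


def q15_to_float(code):
    """Interpret a Q15 code as a fraction."""
    return code / Q15_SCALE


precision = q15_to_float(1)             # value of the smallest bit: 2**-15
largest = q15_to_float(Q15_MAX)         # 1 - 2**-15
```

Under these assumptions every representable value is a multiple of 2**-15, and +1.0 itself cannot be represented, which is why the sketch saturates it to the largest code.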

To make the best use of the full available word length in the fixed point format, the programmer has to make some decisions:

If a fixed point number becomes too large for the available word length, the programmer has to scale the number down by shifting it to the right. In the process, the lower bits may drop off the end and be lost.
If a fixed point number is small, the number of bits actually used to represent it is small. The programmer may decide to scale the number up, in order to use more of the available word length.

In both cases the programmer has to keep track of how far the binary point has been shifted, in order to restore all numbers to the same scale at some later stage.
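
The bookkeeping described above can be sketched as follows. This is a minimal illustration assuming a 16 bit word; the helper names scale_down, scale_up and restore, and the convention that shift_count holds the net number of right shifts, are hypothetical choices for the example.

```python
WORD_BITS = 16                            # assumed word length
WORD_MAX = (1 << (WORD_BITS - 1)) - 1     # 32767
WORD_MIN = -(1 << (WORD_BITS - 1))        # -32768


def scale_down(value, shift_count, bits=1):
    """Shift right by `bits`; the lowest bits drop off the end and are lost."""
    return value >> bits, shift_count + bits


def scale_up(value, shift_count, bits=1):
    """Shift left by `bits` so that more of the word length is used."""
    scaled = value << bits
    if not WORD_MIN <= scaled <= WORD_MAX:
        raise OverflowError("scaled value does not fit in the word")
    return scaled, shift_count - bits


def restore(value, shift_count):
    """Undo the recorded shifts to bring the value back to the original scale."""
    if shift_count >= 0:
        return value << shift_count       # Python ints are unbounded, so this cannot overflow
    return value >> -shift_count
```

For example, scaling 40001 down by one bit gives 20000 with a shift count of 1, and restoring gives 40000: the lowest bit was lost in the shift. On real hardware the restored value would need a wider word or accumulator, which this sketch sidesteps by using Python's unbounded integers.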

Floating point format has the remarkable property of automatically scaling all numbers by moving, and keeping track of, the binary point, so that all numbers use the full word length available and overflow only if the exponent itself runs out of range:

Floating point numbers have two parts: the mantissa, which is similar to the fixed point part of the number, and an exponent which is used to keep track of how the binary point is shifted. Every number is scaled by the floating point hardware:

If a number becomes too large for the available word length, the hardware automatically scales it down, by shifting it to the right.
If a number is small, the hardware automatically scales it up, in order to use the full available word length of the mantissa.

In both cases the exponent is used to count how many times the number has been shifted.

In floating point numbers the binary point comes after the second most significant bit in the mantissa.
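
To see the mantissa and exponent split in practice, the sketch below uses Python's standard math.frexp, which returns a pair (m, e) such that x == m * 2**e. Note that math.frexp normalises the mantissa magnitude into the range 0.5 up to (but not including) 1; that is its own convention and is not meant to match the bit layout of any particular DSP. The wrapper name decompose is hypothetical.

```python
import math


def decompose(x):
    """Split x into (mantissa, exponent) such that x == mantissa * 2**exponent."""
    return math.frexp(x)


# Small and large values end up with mantissas of similar size;
# the exponent records how far each one was shifted.
for value in (0.001, 0.75, 1000.0):
    mantissa, exponent = decompose(value)
    print(f"{value}: mantissa={mantissa}, exponent={exponent}")
```

Whatever the size of the input, the mantissa stays in the same range, so the same number of significant bits is available for very small and very large values.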

The block floating point format provides some of the benefits of floating point, but by scaling blocks of numbers rather than each individual number:

Block floating point numbers are actually represented by the full word length of a fixed point format.

If any one of a block of numbers becomes too large for the available word length, the programmer scales down all the numbers in the block, by shifting them to the right.
If the largest of a block of numbers is small, the programmer scales up all numbers in the block, in order to use the full available word length of the mantissa.

In both cases a single exponent, shared by the whole block, is used to count how many times the numbers in the block have been shifted.
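
The block scaling described above can be sketched as follows, assuming a 16 bit word and a block held as a list of integers. The function names fits and normalise_block are hypothetical, and the sign convention (the exponent counts net right shifts) is a choice made for this example.

```python
WORD_BITS = 16                            # assumed word length
WORD_MAX = (1 << (WORD_BITS - 1)) - 1     # 32767
WORD_MIN = -(1 << (WORD_BITS - 1))        # -32768


def fits(block):
    """True if every value in the block fits in the word."""
    return all(WORD_MIN <= v <= WORD_MAX for v in block)


def normalise_block(block, exponent=0):
    """Rescale a whole block so that its largest value uses the word fully.

    Returns (scaled_block, exponent); each original value is approximately
    scaled_value * 2**exponent (exactly, if no low bits were lost).
    """
    block = list(block)
    # Scale the whole block down while any one value is too large.
    while not fits(block):
        block = [v >> 1 for v in block]
        exponent += 1
    # Scale the whole block up while doubling every value still fits.
    while any(block) and fits([v << 1 for v in block]):
        block = [v << 1 for v in block]
        exponent -= 1
    return block, exponent
```

Because the scaling is driven by the largest value in the block, small values sharing a block with a large one lose precision: in the block [100000, -3, 12], two right shifts are needed for 100000 to fit, and the -3 is reduced to -1.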

Some specialised processors, such as those from Zilog, have special features to support the use of block floating point format. More usually, it is up to the programmer to test each block of numbers and carry out the necessary scaling.

The floating point format has one further advantage over fixed point: it can be faster. Because of the limited range of the fixed point word, a basic direct form 1 IIR filter second order section requires an extra multiplier, to scale numbers and avoid overflow. The floating point hardware automatically scales every number to avoid overflow, so this extra multiplier is not required.
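
The difference can be sketched with two versions of a direct form 1 second order section. This is only an illustration under stated assumptions: samples are in Q15, coefficients are in Q14 (so that magnitudes up to 2 fit), the extra multiply is placed at the input, and the output saturates. The function names, the number formats and the position of the scaling multiply are all hypothetical choices, not taken from any particular processor.

```python
Q15_MAX = 32767
Q15_MIN = -32768


def saturate_q15(value):
    """Clamp a value to the 16-bit signed range."""
    return max(Q15_MIN, min(Q15_MAX, value))


def biquad_df1_float(samples, b, a):
    """Direct form 1 section in floating point: no scaling step is needed."""
    b0, b1, b2 = b
    a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in samples:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out


def biquad_df1_q15(samples_q15, b_q14, a_q14, input_scale_q15):
    """Direct form 1 section in fixed point, with one extra scaling multiply."""
    b0, b1, b2 = b_q14
    a1, a2 = a_q14
    x1 = x2 = y1 = y2 = 0
    out = []
    for xn in samples_q15:
        # The extra multiply: scale the input down so the output cannot overflow.
        xs = (xn * input_scale_q15) >> 15
        # Q14 coefficients times Q15 samples give a Q29 accumulator.
        acc = b0 * xs + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        yn = saturate_q15(acc >> 14)      # back to Q15
        x2, x1 = x1, xs
        y2, y1 = y1, yn
        out.append(yn)
    return out
```

As a usage example, take b = (0.25, 0.5, 0.25) and a = (-0.5, 0.25), a stable section whose gain at DC is 4/3. A step input of 0.9 then drives the unscaled fixed point output into saturation, while an input scale of 0.5 (16384 in Q15) keeps it in range at the cost of one more multiply per sample. The floating point version handles the same input with no scaling step at all.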

DSP processors: Data formats