Floating Point to Fixed Point Conversion

Floating point to fixed point conversion is necessary to interface a floating point processor with a fixed point implementation, so that the transition from one type of architecture to the other is smooth.

The steps involved in this conversion are:

Concatenate the hidden bit with the mantissa and add leading zeros according to the fixed point length.
Find the absolute difference between the exponent of the floating point number and the bias.
If $E>bias$, left shift the number by this absolute difference. Otherwise, if $E<bias$, right shift it by the same amount.
Finally, invert the number if the sign bit is 1.
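
The steps above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not the implementation behind the architecture: the field widths (1 sign, 4 exponent, 11 mantissa bits), the bias of 7, and the final discard-the-LSB step are taken from the worked example in this post; the name `float_to_fixed` is hypothetical; "invert" is assumed to mean two's complement negation; and zero, denormal and overflow cases are not handled.

```python
BIAS = 7         # exponent bias (0111), as in the worked example
MANT_BITS = 11   # stored mantissa bits
WORD_MASK = 0xFFFF  # 16-bit fixed point word


def float_to_fixed(sign, exponent, mantissa):
    """Convert sign/exponent/mantissa fields to a 16-bit fixed point word."""
    # Step 1: concatenate the hidden bit; leading zeros are implicit in an int.
    value = (1 << MANT_BITS) | mantissa
    # Step 2: absolute difference between the exponent and the bias.
    shift = abs(exponent - BIAS)
    # Step 3: shift left if E > bias, right if E < bias.
    if exponent > BIAS:
        value = (value << shift) & WORD_MASK  # bits beyond 16 are lost
    elif exponent < BIAS:
        value >>= shift
    # Discard the LSB so that the MSB position is free for the sign.
    value >>= 1
    # Step 4: negate if the sign bit is 1 (assumed: two's complement).
    if sign:
        value = (-value) & WORD_MASK
    return value


word = float_to_fixed(0, 0b1011, 0b01000000000)
print(format(word, "016b"))  # 0101000000000000
```

With these assumptions, the call reproduces the output of the worked example for the input $0\_1011\_01000000000$.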

Example: Floating Point to Fixed Point Conversion

The input data is represented in floating point as $a = 0\_1011\_01000000000$ (1 sign bit, 4 exponent bits and 11 mantissa bits).
Prepare the data as $0000\_1\_01000000000$ for a 16-bit fixed point representation with 6 integer bits.
The difference between the exponent and the bias is $1011 - 0111 = 0100$, and the exponent is greater than the bias.
Left shift the number by 4 bits. The result is $1010000000000000$.
As the sign bit is 0, no inversion is needed. Discard the LSB and concatenate the sign bit at the MSB side. The final output is $0101000000000000$.
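
As a sanity check, both bit patterns can be decoded to real values. The sketch below assumes a bias of 7 with an implicit leading 1 for the floating point input, and a two's complement fixed point word whose 6 integer bits include the sign, leaving 10 fractional bits. The helper names `float_value` and `fixed_value` are hypothetical.

```python
def float_value(sign, exponent, mantissa, bias=7, mant_bits=11):
    """Value of a normalized number: (-1)^sign * 1.mantissa * 2^(exponent - bias)."""
    significand = 1 + mantissa / (1 << mant_bits)
    return (-1) ** sign * significand * 2.0 ** (exponent - bias)


def fixed_value(word, frac_bits=10, width=16):
    """Value of a two's complement word with frac_bits fractional bits."""
    if word >= 1 << (width - 1):
        word -= 1 << width  # the MSB is set, so the word is negative
    return word / (1 << frac_bits)


print(float_value(0, 0b1011, 0b01000000000))  # 20.0
print(fixed_value(0b0101000000000000))        # 20.0
```

Both representations decode to 20, so under these assumptions no precision is lost for this particular input.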

In the above example, the LSB is discarded to fit the result in the 16-bit format, so the conversion process can introduce error. For the same word length, only a certain range of floating point values can be represented in fixed point. An architecture for this conversion is shown in Figure 1. Two variable shifters are used, viz., VLSH and VRSH: VLSH performs left shifting, just as VRSH performs right shifting. Two 4-bit adder/subtractors and one 16-bit adder/subtractor are used. This architecture can be adapted to any word length.
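
The range limitation can be illustrated with a rough sketch that classifies each exponent by how a normalized number would map to the fixed point word. It assumes the formats of the example (bias 7, 11 mantissa bits, and a fixed point word with a sign bit, 5 integer bits and 10 fractional bits) and treats every exponent code as a normalized number. The function name `classify` is hypothetical.

```python
BIAS, MANT_BITS = 7, 11      # floating point format of the example
INT_BITS, FRAC_BITS = 5, 10  # fixed point magnitude bits (sign excluded)


def classify(exponent):
    """Describe how a normalized number with this exponent fits in fixed point."""
    e = exponent - BIAS
    if e >= INT_BITS:
        return "overflow"  # the leading 1 falls beyond the integer bits
    # The lowest mantissa bit has weight 2^(e - MANT_BITS);
    # the fixed point word resolves down to 2^(-FRAC_BITS).
    lost = max(0, (MANT_BITS - e) - FRAC_BITS)
    return "exact" if lost == 0 else f"{lost} low bit(s) truncated"


for exponent in (0b0000, 0b0111, 0b1011, 0b1100):
    print(format(exponent, "04b"), classify(exponent))
```

Under these assumptions, exponents from $1000$ to $1011$ convert exactly, smaller exponents lose low-order bits, and larger ones overflow the 5 integer bits.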
