Floating Point to Fixed Point Conversion

Floating point to fixed point conversion is needed to interface a floating point processor with a fixed point implementation, so that data passes smoothly from one type of architecture to the other.

The steps involved in this conversion are:

  • Concatenate the hidden bit to the mantissa and add leading zeros according to the fixed point length.
  • Find the absolute difference between the exponent E of the floating point number and the bias.
  • If E > bias, left-shift the prepared number by this difference; if E < bias, right-shift it instead. If E = bias, no shift is needed.
  • Finally, if the sign bit is 1, invert the number to obtain its negative.
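The four steps above can be sketched as a small Python behavioral model. It is a minimal sketch, not the hardware itself, and it assumes the word layout used in the example below (1 sign bit, 4-bit exponent, 11-bit mantissa, bias 7) and a 16-bit output with 6 integer bits including the sign. The function name `float_to_fixed` and its parameters are hypothetical, "invert" is assumed to mean two's-complement negation, and special cases such as zero, subnormals and infinities are not handled.

```python
def float_to_fixed(word: int, exp_bits: int = 4, man_bits: int = 11,
                   width: int = 16, int_bits: int = 6) -> int:
    """Convert a (1 + exp_bits + man_bits)-bit float word into a width-bit
    fixed point word with int_bits integer bits (sign bit included)."""
    mask = (1 << width) - 1
    bias = (1 << (exp_bits - 1)) - 1               # 0111 = 7 for a 4-bit exponent
    sign = (word >> (exp_bits + man_bits)) & 1
    exp = (word >> man_bits) & ((1 << exp_bits) - 1)
    man = word & ((1 << man_bits) - 1)

    # Step 1: concatenate the hidden bit; the leading zeros are the unused
    # upper bits of the width-bit register.
    reg = (1 << man_bits) | man

    # Steps 2-3: shift by |E - bias| inside the width-bit register.
    diff = abs(exp - bias)
    if exp > bias:
        reg = (reg << diff) & mask      # bits pushed past the MSB are lost (overflow)
    elif exp < bias:
        reg >>= diff                    # bits pushed past the LSB are lost (truncation)

    # Align to the output format: drop LSBs so that width - int_bits fraction
    # bits remain (one bit for the defaults), making room for the sign bit.
    drop = man_bits - (width - int_bits)
    assert drop >= 0, "sketch assumes the mantissa has at least as many fraction bits"
    reg = (reg >> drop) & (mask >> 1)   # keep width - 1 magnitude bits

    # Step 4: negate if the sign bit is 1 (assumed two's complement, i.e. 0 - reg).
    return (-reg) & mask if sign else reg

print(format(float_to_fixed(0b0_1011_01000000000), "016b"))  # 0101000000000000
```

The parameters default to the example's formats, so other word lengths can be tried by changing `exp_bits`, `man_bits`, `width` and `int_bits`.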

Example: Floating Point to Fixed Point Conversion

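  • Input data is represented in floating point (1 sign bit, 4-bit exponent, 11-bit mantissa) as a = 0_1011_01000000000.
  • Prepare the data as 0000_1_01000000000 by concatenating the hidden bit and four leading zeros. The target is a 16-bit fixed point representation with 6 integer bits (including the sign bit).
  • The bias for a 4-bit exponent is 0111 (7). The difference between the exponent and the bias is 1011 - 0111 = 0100 (11 - 7 = 4), and the exponent is greater than the bias.
  • Left-shift the number by 4 bits. The result is 1010000000000000.
  • As the sign bit is 0, no inversion is needed. Discard the LSB and concatenate the sign bit at the MSB side. The final output is 0101000000000000, i.e. 010100.0000000000 in binary, which equals 20 = 1.01 (binary) × 2^4.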
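The worked example can be checked numerically by decoding both words. A normalized float word has the value (-1)^s × 1.m × 2^(E - bias), and the fixed point word, read as a two's-complement integer, is scaled by 2^-10 because 10 of its 16 bits are fraction bits. The sketch assumes the same 1-4-11 layout with bias 7; the helper names `float_value` and `fixed_value` are hypothetical.

```python
def float_value(word: int, exp_bits: int = 4, man_bits: int = 11) -> float:
    """Value of a normalized float word: (-1)^s * 1.m * 2^(E - bias)."""
    bias = (1 << (exp_bits - 1)) - 1
    sign = (word >> (exp_bits + man_bits)) & 1
    exp = (word >> man_bits) & ((1 << exp_bits) - 1)
    man = word & ((1 << man_bits) - 1)
    significand = 1 + man / (1 << man_bits)      # hidden bit plus fraction
    return (-1) ** sign * significand * 2.0 ** (exp - bias)

def fixed_value(word: int, width: int = 16, int_bits: int = 6) -> float:
    """Value of a two's-complement fixed point word with int_bits integer bits."""
    if word >> (width - 1):                      # MSB set: negative number
        word -= 1 << width
    return word / (1 << (width - int_bits))

a = 0b0_1011_01000000000
out = 0b0101000000000000
print(float_value(a), fixed_value(out))  # 20.0 20.0
```

Both decodings give 20, so the discarded LSB (a 0 here) costs no precision in this particular case.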

In the above example, the LSB is discarded to fit the result into the 16-bit format. That bit happens to be 0 here, but in general discarding bits (and right-shifting when E < bias) introduces truncation error into the conversion. Moreover, for the same word length, only a limited range of floating point values can be represented in fixed point.

An architecture for this conversion is shown in Figure 1. It uses two variable shifters, VLSH and VRSH: VLSH performs the left shift and VRSH performs the right shift. Two 4-bit adder/subtractors and one 16-bit adder/subtractor are also used. This architecture can be adapted to any word length.

Figure 1: A Basic Scheme for Floating Point to Fixed Point Conversion
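The exact signal routing of Figure 1 is not described in the text, so the following is only a hypothetical behavioral model of such a datapath, not the published design. It assumes that the two 4-bit adder/subtractors compute E - bias and bias - E (the borrow of the first one selecting between the VLSH and VRSH outputs) and that the 16-bit adder/subtractor produces 0 - x for negative inputs. These role assignments and the names `sub4` and `convert_datapath` are illustrative guesses. The last call shows the range limit: 32 = 1.0 × 2^5 does not fit in 6 integer bits, so the leading 1 is shifted out and the output wraps to 0.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1   # 0xFFFF
BIAS = 0b0111             # bias for a 4-bit exponent

def sub4(a: int, b: int) -> tuple[int, int]:
    """Hypothetical 4-bit subtractor: (a - b) mod 16 and a borrow flag."""
    return (a - b) & 0xF, int(a < b)

def convert_datapath(word: int) -> int:
    """Behavioral model of a hypothetical 1-4-11 float to 16-bit fixed datapath."""
    sign = (word >> 15) & 1
    exp = (word >> 11) & 0xF
    man = word & 0x7FF
    reg = (1 << 11) | man                # hidden bit; upper 4 bits stay 0

    # Two 4-bit subtractors; the borrow of E - bias signals E < bias.
    up, borrow = sub4(exp, BIAS)
    down, _ = sub4(BIAS, exp)

    # Both shifters operate in parallel and a multiplexer picks one output.
    vlsh = (reg << up) & MASK            # variable left shifter (VLSH)
    vrsh = reg >> down                   # variable right shifter (VRSH)
    shifted = vrsh if borrow else vlsh

    mag = shifted >> 1                   # discard the LSB: 15 magnitude bits
    # 16-bit adder/subtractor used as 0 - mag when the sign bit is 1.
    return (0 - mag) & MASK if sign else mag

print(format(convert_datapath(0b0_1011_01000000000), "016b"))  # 0101000000000000
print(convert_datapath(0b0_1100_00000000000))                  # 0: 32 overflows
```

Computing both shift amounts and both shifted values in parallel, then selecting with the borrow, mirrors how a combinational circuit avoids a data-dependent branch.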