# Floating Point Addition and Subtraction

Compared to fixed point addition and subtraction, floating point addition and subtraction is more complex and consumes more hardware. This is because fixed point arithmetic has no exponent field to process. A floating point addition of two numbers $a = M_a \times 2^{E_a}$ and $b = M_b \times 2^{E_b}$ can be expressed as

$$a + b = \left(M_a + M_b \times 2^{-(E_a - E_b)}\right) \times 2^{E_a}$$

Here, it is considered that $E_a > E_b$. In this case, $M_b \times 2^{-(E_a - E_b)}$ represents the right shifted version of $M_b$ by $(E_a - E_b)$ bits. A similar operation is carried out for $E_b > E_a$, where $M_a$ is shifted instead. Thus floating point addition and subtraction is not as simple as fixed point addition and subtraction.
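The alignment to the larger exponent can be checked numerically. The following is a minimal sketch, assuming mantissas held as exact fractions and unbiased integer exponents; the function name `aligned_sum` and the operand values are illustrative and do not come from the text.

```python
from fractions import Fraction


def aligned_sum(m_a, e_a, m_b, e_b):
    """Add m_a * 2**e_a and m_b * 2**e_b by aligning to the larger exponent."""
    if e_a < e_b:
        # Swap so that (m_a, e_a) always carries the larger exponent.
        m_a, e_a, m_b, e_b = m_b, e_b, m_a, e_a
    shift = e_a - e_b
    # Dividing by 2**shift models the right shift of the smaller operand's mantissa.
    return m_a + m_b / 2**shift, e_a


# Illustrative operands: 1.125 * 2**2 (4.5) and 1.875 * 2**1 (3.75).
mantissa, exponent = aligned_sum(Fraction(9, 8), 2, Fraction(15, 8), 1)
print(mantissa, exponent, float(mantissa * 2**exponent))  # 33/16 2 8.25
```

The returned mantissa 33/16 is larger than 2, so a real adder would still have to normalize it by a 1-bit right shift and an exponent increment.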

The major steps for a floating point addition and subtraction are:

• Extract the sign of the result from the two sign bits.
• Subtract the two exponents $E_a$ and $E_b$. Find the absolute value of the exponent difference ($E = |E_a - E_b|$) and choose the exponent of the greater number.
• Shift the mantissa of the lesser number right by $E$ bits, taking the hidden bit into account.
• Execute the addition or subtraction between the shifted mantissa and the mantissa of the other number. Consider the hidden bits here as well.
• Normalization for addition: If the addition generates a carry, the result is right shifted by 1 bit. This shift is reflected in the exponent computation by an increment.
• Normalization for subtraction: If the subtraction result has leading zeros, it is left shifted by the leading zero count. Accordingly, the exponent is decremented by the number of leading zeros.
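The steps above can be sketched in software. This is a minimal illustration, not a hardware description: it assumes a 16-bit word with 1 sign bit, 4 exponent bits (bias 7), and 11 mantissa bits plus a hidden bit, a layout inferred from bit patterns such as `0_1010_00001000000`. The names `unpack` and `fp_add` are hypothetical. Shifted-out bits are truncated, and zero, subnormal, overflow, and underflow cases are not handled.

```python
EXP_BITS, MAN_BITS = 4, 11  # assumed field widths, bias 7 is implied by the encoding


def unpack(word):
    """Split a 16-bit word into sign, biased exponent, and mantissa with hidden bit."""
    sign = (word >> (EXP_BITS + MAN_BITS)) & 1
    exp = (word >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    man = (word & ((1 << MAN_BITS) - 1)) | (1 << MAN_BITS)  # restore hidden bit
    return sign, exp, man


def fp_add(x, y):
    """Add two words; subtraction is an addition with the second sign bit flipped."""
    sx, ex, mx = unpack(x)
    sy, ey, my = unpack(y)
    # Order the operands so that x has the greater magnitude.
    if (ex, mx) < (ey, my):
        sx, ex, mx, sy, ey, my = sy, ey, my, sx, ex, mx
    sign = sx                      # sign of the greater operand
    my >>= ex - ey                 # align the lesser mantissa (truncating)
    if sx == sy:
        m = mx + my
        if m >> (MAN_BITS + 1):    # carry out: shift right, increment exponent
            m >>= 1
            ex += 1
    else:
        m = mx - my
        if m == 0:
            return 0               # placeholder for exact cancellation
        while not (m >> MAN_BITS) & 1:  # leading zeros: shift left, decrement
            m <<= 1
            ex -= 1
    return (sign << (EXP_BITS + MAN_BITS)) | (ex << MAN_BITS) | (m & ((1 << MAN_BITS) - 1))


a = 0b0_1001_00100000000  # 4.5 (illustrative operand)
b = 0b0_1000_11100000000  # 3.75 (illustrative operand)
print(f"{fp_add(a, b):016b}")  # 0101000001000000
```

The printed word splits into `0_1010_00001000000`, which is 8.25 under the assumed layout. A hardware adder would typically replace the `while` loop with a leading zero counter and a barrel shifter, and would keep guard bits instead of truncating.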

Example: Floating Point Addition

• Representation: The input operands are represented as and .
• Sign extraction: As both numbers are positive, the sign of the output is positive. Thus S = 0.
• Exponent subtraction: The two exponents are 1001 and 1000. Thus the result of the subtraction is E = 0001.
• Shifting of mantissa of lesser number: The mantissa is shifted right by 1 bit and the result is .
• Mantissa addition: The result of the mantissa addition is 000010000000 and a carry is generated. This means the mantissa sum is 2 or more, so it no longer fits the normalized range.
• Normalization: The output of the adder is right shifted by 1 bit and the exponent is incremented to get the correct result. Taking the last 11 bits from the LSB, the new mantissa is 00001000000, and the exponent is 1010.
• The final result is 0_1010_00001000000 which is equivalent to 8.25 in decimal.
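The final bit pattern can be checked with a small decoder. This is a sketch under stated assumptions: a bias of 7 for the 4-bit exponent and a hidden leading one on the mantissa. The function name `decode` is hypothetical.

```python
def decode(bits, exp_bits=4, man_bits=11):
    """Decode an 'S_EEEE_MMMMMMMMMMM' string into a Python float."""
    sign, exp, man = bits.split("_")
    assert len(exp) == exp_bits and len(man) == man_bits
    bias = 2 ** (exp_bits - 1) - 1                  # assumed bias: 7 for 4 exponent bits
    significand = 1 + int(man, 2) / 2 ** man_bits   # restore the hidden bit
    return (-1) ** int(sign) * significand * 2 ** (int(exp, 2) - bias)


print(decode("0_1010_00001000000"))  # 8.25
```

Here the exponent field 1010 is 10, giving a scale of 2^3, and the mantissa field contributes 1 + 1/32, so the value is 1.03125 × 8 = 8.25.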

Example: Floating Point Subtraction

• Representation: The input operands are represented as and .
• Sign extraction: The negative operand has the greater magnitude, thus S = 1.
• Exponent subtraction: The two exponents are 1010 and 1000. Thus the result of the subtraction is E = 0010.
• Shifting of mantissa of lesser number: The mantissa is shifted right by 2 bits and the result is .
• Mantissa subtraction: The result of the mantissa subtraction is 010100010000. The leading zero indicates that the result is less than 1.
• Normalization: The output of the adder is left shifted by 1 bit, as there is one leading zero, and the exponent is decremented by 1 to get the correct result. Taking the last 11 bits from the LSB, the new mantissa is 01000100000, and the exponent is 1001.
• The final result is 1_1001_01000100000, which is equivalent to -5.0625 in decimal.
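The normalization step of the subtraction can be sketched on its own. This is a minimal illustration assuming a 12-bit adder output (hidden bit plus 11 mantissa bits); the function name `normalize_after_sub` is hypothetical, and the input value is chosen for illustration so that it has one leading zero.

```python
def normalize_after_sub(mantissa, exponent, width=12):
    """Left-shift a `width`-bit subtraction result until its MSB (hidden bit) is 1."""
    if mantissa == 0:
        raise ValueError("exact cancellation: there is no leading one to normalize")
    lzc = width - mantissa.bit_length()          # leading zero count
    shifted = (mantissa << lzc) & ((1 << width) - 1)
    return shifted, exponent - lzc, lzc          # exponent drops by the shift amount


m, e, lzc = normalize_after_sub(0b010100010000, 0b1010)
print(f"{m:012b} {e:04b} {lzc}")  # 101000100000 1001 1
```

Dropping the leading one of the shifted value leaves the 11-bit mantissa field 01000100000. In hardware, `bit_length` would correspond to a leading zero counter feeding a left shifter.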

A simple architecture of a floating point adder is shown below in Figure 1.