We will not discuss the RCA here as it is discussed in the tutorial of combinational circuits. But it is used in implementing other adders.

Table 1: Truth table of full adder

 ROW A B Cin S Cout 1 0 0 0 0 0 2 0 0 1 1 0 3 0 1 0 1 0 4 0 1 1 0 1 5 1 0 0 1 0 6 1 0 1 0 1 7 1 1 0 0 1 8 1 1 1 1 1

From the truth table of the full adder, some conclusions can be made. In the first two rows output carry ($C_{out}$) is always zero. In the rows from 3 to 6, input carry ($C_{in}$) propagates to output carry. And in the last two rows, $C_{out}$ is one. These are the conditions for output carry. The general Boolean expression for the output carry can be written in the following style.

$G_i = A_iB_iC_{in}+A_iB_i\overline{C_{in}}=A_iB_i$

$P_i = A_i\mathbin{\oplus}B_i$

$C_{i+1}(C_{out}) = G_i+P_iC_i$

An alternate equation of propagated carry is

$P_i = A_i+B_i$

Both the equations can be used. The first equation is hardware efficient but the second equation is more popular as only a ‘OR ’ gate is required. The use of the second equation will be discussed later. Until now stick to the first one.

The variable $G_i$ represents the generated carry and $P_i$ is the propagation condition. In the rows from 3 to 6, when $C_i$ is true $C_{out}$ is equal to $C_{in}$. Here i = 0, 1… (n-1), where n is data width. Initially $C_0$ is equal to $C_{in}$. Then the equation for sum (S) is

$S_i = A_i\mathbin{\oplus}B_i\mathbin{\oplus}C_i$

The architecture for the 4-bit CLA is given below.

Figure 1: 4-bit CLA

Carry look ahead adders proves to be very fast regardless of the data width. It takes $5t_g$ delay to compute sum of numbers of any width where $t_g$ is the delay of a single gate. But for higher data width the hardware complexity to generate carry is extremely high and thus area increases. To reduce the hardware overhead, several techniques are proposed to efficiently design the carry generation block for higher order adders. Higher order adders can be implemented in terms of smaller adders. Like a 64-bit adder can be implemented using 16 numbers of 4-bit adders. Generally, 4-bit adders are taken as the basic building block.

In a 4-bit adder, 4-bits are group together and so the corresponding signals can also be grouped. The equations for block propagated carry, block generated carry and carryout are given below.

$P_{i:j} = P_{i}$ for  $i = j$

$P_{i:j} = P_iP_{i-1:j}$ for  $i > j$

$G_{i:j} = G_{i}$for  $i = j$

$G_{i:j} = G_{i}+P_iG_{i-1:j}$ for  $i > j$

$C_{i+1} = G_{i-1:j}+P_{i-1:j}C_j$

First consider an 8-bit adder is to be implemented using CLA. It can be designed using two 4-bit adders.

Figure 2: 8-bit adder using CLA

Go to the Top

The carry Generation Unit in the Carry look adders is the most critical. Several tree based designs are available in literature for the carry generation unit. The tree based designs are implemented using parallel prefix circuit. A prefix circuit receives inputs $x_1$, $x_2$,…, $x_n$ and generates outputs like $x_1$, $x_2\circ x_1$,….$x_n\circ x_{n-1}$….$\circ x_1$ where $\circ$ is symbol for binary associative operation. Carry look ahead adders with prefix circuit are sometimes called as prefix adders. To understand the prefix tree adders a new associative binary function is defined.

$(P_{i:m},G_{i:m})\circ(P_{m-1:j},G_{m-1:j})=(P_{i:m}P_{m-1:j},G_{i:m}+P_{i:m}G_{m-1:j})$

For n=4, associative operation will be

$(P_{3:2},G_{3:2})\circ(P_{1:0},G_{1:0})=(P_{3:2}P_{1:0},G_{3:2}+P_{3:2}G_{1:0})$

Some of the popular prefix adders are discussed in this tutorial. In the figures, the empty circles represents the block which generates the individual propagate and generate conditions and the bullets represents the block which performs the associative binary operation which is discussed above.

Brent-Kung Parallel Prefix Tree-

Figure 3: Brent-Kung Parallel Prefix Tree

• Total stages – $(2\log_2 n – 1)$ (Higher Depth and so delay is higher)
• Fan out = 2 at each stage, avoids explosion of wires, Odd computation

Figure 4: Ladner-Fischer Parallel Prefix Tree

• Stages – $\log_2 n$(Low depth).
• Total nodes = $n/2\log_2 n$
• Nodes are having high fan-out.

• Stages – $\log_2 n$ (Low depth).
• Total Nodes = $(n-1)+(n-2)+(n-4)+(n-8)+….$ (Higher Area, Long Wires)
• Fan out is minimum – 2 at each stage. (Faster Performance)

• Stages – $\log_2 n+1$ (Moderate Stage)
• Hybrid Adder, 1st stage Kogge-Stone and 2nd stage is Brent-kung adder.
• Efficient and suitable for VLSI implementation.

Manchester Carry Chain Module (MCC)

The Manchester carry chain is a variation of the carry-lookahead adder that uses shared logic to lower the transistor count. A Manchester carry chain generates the intermediate carries by tapping off nodes in the gate that calculates the most significant carry value. However, not all logic families have these internal nodes, CMOS being a major example. Dynamic logic can support shared logic, as can transmission gate logic. One of the major drawbacks of the Manchester carry chain is that the capacitive load of all of these outputs, together with the resistance of the transistors causes the propagation delay to increase much more quickly than a regular carry lookahead. A Manchester-carry-chain section generally doesn’t exceed 4 bits. The Manchester carry scheme for the group of 4 is shown below.

Figure 7: Manchester Carry Chain Scheme

The Sarry Skip Adder, also known as carry bypass adder, based on the similar carry propagation criteria. If the propagation criterion is satisfied, the input carry is passed to the output. Thus the status of output carry is evaluated using propagation criteria. The simple carry skip adder for n=8 is shown in figure 8. The performance of the carry skip adder scheme depends on the size of carry skip adder block size. Here 8-bit adder is implemented using the block size of 4-bit. The condition for the block size is  $\sqrt{n/2}$. The variable block size is adopted to achieve fast addition.

Go to the Top

The design of Carry Increment Adder (CIA) consists of an adder block (CLA, RCA) and an incremental circuitry. The incremental circuit can be designed using HA’s in ripple carry chain with a sequential order. For example, the addition operation for 8-bit data can be done by dividing the total number of bits in two groups of 4bits and addition operation is done using 4bit RCA or CLA. This fast addition technique is not much popular since a carry chain still exists. The architecture of CIA is shown in figure 9.

Go to the Top

The carry select adder consists of two addition paths. One path calculates sum considering carry in is equal to zero and the other path calculates sum with carry in is equal to one. After the sum is calculated, correct sum and correct carry out is selected through a mux. Thus it has two adder blocks and two mux units. The adder block can be an RCA or a CLA. The size of the carry select block can be fixed or can be variable. For fixed-size carry select adder, optimum delay occurs when block size is $\sqrt{n}$ . For example, the block size is 4 for n = 16. So, there will be two 4-bit RCA/CLA in a carry select adder block. Thus carry select adders achieve fast addition by consuming more hardware. The simple block diagram of a carry select adder is shown in figure 10.

Figure 10: 8-bit Carry Select Adder

If the block size is fixed then carry select adder is called linear. For example, a 16-bit adder can be realized using block sizes 4-4-4-4. On the other hand to improve the performance non-linear carry select adders choose variable block sizes. For example, the same 16-bit adder can be implemented by block sizes 2-2-3-4-5.

The idea of carry select adder is behind the idea of fast conditional sum adders. An n-bit adder can be designed using smaller n/2 or n/4 bit adders using the same carry select concept. For example, a 4-bit adder can be built using seven 1-bit adders. This example is shown in figure 11.

Figure 11: 4-bit Conditional Sum Adder

Ling adders are variations of carry-look ahead adders. It uses the simpler equation of group generated carry and thus resulting fast addition. The equation for the output carry of a 4-bit adder block can be expressed as

$C_{4} = G_{3:0}=G_{3:2}+P_{3:2}G_{1:0}$

Assume,  $C_0=0$ or $G_0=A_0B_0+P_0C_0$. The equation of group generated carry in a block of 4-bit adder is

$G_{3:0} = G_{3}+P_3G_2+P_{3}P2(G_1+P_1G_0)$

Assume As, $P_i=A_i+B_i$ , $G_3P_3=G3$ . So, in the above equation $P_3$ can be taken as common factor and the equation be re-written as

$G_{3:0} = G_{3}H_{3:0}$

Where  $H_{3:0}$ can be written as

$H_{3:0} = H_{3:2}+P_{2:1}H_{1:0}$

And

$H_{3:2} = G_{3}+G_2,H_{1:0}=G_1+G_0$

The general expression is

$H_{i:0} = G_1+P_{i-1}H_{i-1:0}$

The sum is calculated by the following equation

$S_{i} = C_i\mathbin{\oplus}(A_i\mathbin{\oplus}B_i)=P_{i-1}H_{i-1:0}(A_i\mathbin{\oplus}B_i)$

$= \overline{H_{i-1:0}}A_iC_i\mathbin{\oplus}(A_i\mathbin{\oplus}B_i)+H_{i-1:0}(P_{i-1}(A_i\mathbin{\oplus}B_i))$

The calculation of  $H_{i-1:0}$ is faster than calculation of $C_i$  which reduces the delay in calculating sum.

A hybrid adder is which uses two or more above kind of Addition techniques to implement higher order adder. Generally, a hybrid adder employs one kind of adder for generating the carry and another kind for computing sum. For example, Manchester Carry Chain (MCC) can be used for Carry generation and Carry select Adder can be used for sum calculation.

Up to now, we have discussed fast addition with two operands. If several operands are to be added then these adders are needed to be used several times. Direct application of the adders is in the multiplication process where several operands are to be added. Carry save addition is one such technique which reduces the complexity involved in multi-operand addition.

Carry Save Adder (CSA) is actually a three input adder which receives three operands and produces two outputs. For 1-bit data, CSA is actually a full adder. It sometimes called as a 3:2 counter as it compresses three inputs to two outputs. The general equation for a CSA is given below

$x+y+z=2c+s$

Where s and c are the sum and carry outputs from CSA. They are evaluated as

$s = (x+y+z)mod\hspace{0.1pt}2$   and   $c = \frac{(x+y+z)-s}{2}$

A simple CSA based addition of three operands is shown below. Here we need one stage CSA and one Carry Propagated Adder (CPA).

For fast addition of multi operands tree of CSAs can be formed. An example of the addition of seven 8-bit operands is shown below. The scheme mentioned below to add seven operands is popularly known as Wallace Tree. Five CSA modules and one CPA module are used here.

Figure 13: 7 operands addition using CSA

The scheme of the fast addition of multiple operands is needed to be understood. A basic scheme of the addition of 7 four bit numbers is shown below using Wallace tree addition method.

Figure 13: Scheme for the addition of 7 operands

There are other type CSA trees are available for multi-operand addition. We will discuss them in more detail in the tutorial for multiplication.

Though BCD addition does not belong to the category of fast addition, we discussed it here as it is also a type of adder. In BCD addition inputs are in BCD format and the output is also in BCD format. In BCD format, counting is done from 0 to 9. If the summation result is greater than 9, then correction is needed. And that correction is done by adding 6 to the summation result. The equation for the correction logic is

$C_c = C_3+S_3(S_2+S_1)$

Here $C_3$ is the carryout of the last adder in the first stage. The logic diagram is shown below in figure 3.

Conclusion

We have discussed several techniques for fast addition in this tutorial but a proper comparison not shown. So that the general question arises that which adder is suitable for system design? For two operands Carry Lookahead Adder performs well. For higher data width, CLA with prefix adders is a common choice. In signal processing, vector multiplication is a common operation to be performed and it involves multipliers and several multi-operand additions. Carry save adders are always suitable for these type of multi-operand addition. We will discuss further multi-operand addition techniques in the tutorial for multiplication.