Several techniques have been proposed for accumulating partial products. Some aim to reduce the number of logic elements, and hence the hardware complexity, whereas others aim to reduce the number of levels in the partial-product tree to achieve higher speed. As the number of levels increases, the irregularity of the design also increases. Irregular designs make it difficult to generate an area-efficient layout. Some techniques therefore target more modular architectures that are easier to implement.
Traditionally, efficient accumulation of partial products has been achieved by carry-save addition using 2:2 or 3:2 counters. The number of levels can be reduced further using compressors such as the 4:2 compressor and the 7:2 compressor. A basic 4:2 compressor operates on four operands and produces two results. The advantage of a compressor is that its carry output to the next bit position (Cout) is not a function of its carry input (Cin), so the ripple-carry effect is eliminated. The compressor therefore has a lower overall delay. A simple design of a 4:2 compressor using 3:2 counters is shown in Figure 1.
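The cascade of two 3:2 counters can be sketched at the bit level as follows. This is a minimal behavioural model, not the circuit of Figure 1 itself: the function names and the signal names `x1` to `x4`, `cin`, `carry` and `cout` are assumptions introduced for illustration. It checks exhaustively that the number of ones is preserved and that `cout` does not change when `cin` is flipped.

```python
from itertools import product

def counter_3_2(a, b, c):
    """3:2 counter (full adder): a + b + c == s + 2 * carry."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry

def compressor_4_2(x1, x2, x3, x4, cin):
    """4:2 compressor modelled as two cascaded 3:2 counters (assumed wiring)."""
    s1, cout = counter_3_2(x1, x2, x3)        # first counter never sees cin
    total, carry = counter_3_2(s1, x4, cin)   # second counter absorbs cin
    return total, carry, cout

# Exhaustive check over all 32 input combinations.
for x1, x2, x3, x4, cin in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(x1, x2, x3, x4, cin)
    # carry and cout both have twice the weight of the sum bit.
    assert x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout)
    # cout is identical for cin = 0 and cin = 1, so no carry ripples.
    assert cout == compressor_4_2(x1, x2, x3, x4, 1 - cin)[2]
```

In this model the sum output passes through two XOR stages in each counter, which matches the four-XOR critical path quoted for the counter-based design.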
A 4:2 compressor can also be designed as a multilevel circuit, as shown in Figure 2. This design has a lower delay than the compressor built from 3:2 counters.
The critical path of the compressor built from 3:2 counters contains at most four XOR gates, whereas the critical path of the compressor circuit of Figure 2 contains three XOR gates. Many realizations of a compressor are possible, but every compressor circuit must satisfy the same arithmetic relation between its inputs and outputs.
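In its standard form for a 4:2 compressor, this relation can be written as below. The symbol names are assumptions, since the text does not fix them: $x_1,\dots,x_4$ are the four operand bits, $C_{in}$ is the carry from the neighbouring lower bit slice, $\mathrm{Sum}$ has the same weight as the inputs, and $\mathrm{Carry}$ and $C_{out}$ both have twice that weight.

$$
x_1 + x_2 + x_3 + x_4 + C_{in} = \mathrm{Sum} + 2\,(\mathrm{Carry} + C_{out})
$$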
The use of compressors reduces the number of levels in the accumulation process. Accumulation of partial products using compressor circuits is therefore expected to be faster than accumulation using 3:2 counters. However, the delay of a 4:2 compressor is 1.5 times that of a 3:2 counter, so this is not true in every case.
Several techniques have been proposed to improve the performance of carry-save adder (CSA) based accumulation of partial products using 3:2 counters. The objective is either to reduce the overall delay by reducing the number of levels or to obtain a more regular structure. Three such techniques are shown in this tutorial.
- First, the Wallace tree structure shown in Figure 3 provides a regular structure. It uses a total of 16 CSA blocks and has 6 levels. From a layout perspective, the Wallace tree uses 6 wiring tracks between adjacent bit slices.
- Researchers have proposed the overturned-stair trees shown in Figure 4 to reduce the number of wiring tracks from 6 to 3. Overturned-stair trees are more regular, use the same number of CSA modules, and have the same number of levels.
- Finally, a balanced tree structure has been proposed, as shown in Figure 5. The balanced tree has the highest delay because it has 7 levels, but it requires only two wiring tracks.
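The block and level counts quoted for the Wallace tree can be reproduced with a small word-level model. The sketch below assumes 18 operands, which is inferred rather than stated: each CSA replaces three rows with two, so 16 CSA blocks take 18 rows down to 2. The names `csa` and `wallace_reduce` are hypothetical, and the model captures only the reduction schedule, not wiring tracks or layout.

```python
import random

def csa(a, b, c):
    """Word-wide 3:2 counter: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (b & c) | (a & c)) << 1
    return s, carry

def wallace_reduce(operands):
    """Reduce operands to two words; return (rows, levels, csa_count)."""
    rows = list(operands)
    levels = 0
    csa_count = 0
    while len(rows) > 2:
        next_rows = []
        full_groups = len(rows) // 3
        for g in range(full_groups):
            s, carry = csa(*rows[3 * g: 3 * g + 3])
            next_rows += [s, carry]
            csa_count += 1
        next_rows += rows[3 * full_groups:]  # leftover rows pass through
        rows = next_rows
        levels += 1
    return rows, levels, csa_count

random.seed(0)
operands = [random.getrandbits(16) for _ in range(18)]  # assumed operand count
(final_a, final_b), levels, csa_count = wallace_reduce(operands)
assert final_a + final_b == sum(operands)  # a final adder would add these two
print(levels, csa_count)  # 6 16
```

The row count per level follows the sequence 18, 12, 8, 6, 4, 3, 2, which gives the 6 levels and 16 CSA blocks stated for the Wallace tree.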
Similarly, a compressor tree is shown in Figure 6. The compressor tree needs only 4 levels to operate on 17 to 32 operands, whereas a CSA-based tree takes 6 to 8 levels. Thus the compressor tree provides a lower overall delay for larger numbers of operands.
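The level counts above, and the earlier caveat that compressors are not faster in every case, can be checked with a rough estimate. The sketch below is a simplified model under stated assumptions: `tree_levels` is a hypothetical helper, each 3:2 counter is charged two XOR delays and each 4:2 compressor three (consistent with the 1.5 ratio given earlier), and wiring delay is ignored.

```python
def tree_levels(n, k):
    """Levels needed to reduce n operands to 2 using k:2 cells (k = 3 or 4)."""
    levels = 0
    while n > 2:
        # Each full group of k rows becomes 2 rows. A leftover of 1 or 2 rows
        # passes through; a leftover of 3 rows (k = 4 only) is reduced to 2
        # by one more cell with an unused input.
        n = 2 * (n // k) + min(n % k, 2)
        levels += 1
    return levels

XOR_PER_COUNTER = 2      # assumed XOR delays per 3:2 counter level
XOR_PER_COMPRESSOR = 3   # assumed XOR delays per 4:2 compressor level

print("operands  CSA levels  4:2 levels  CSA delay  4:2 delay")
for n in (17, 19, 20, 28, 29, 32):
    csa_levels = tree_levels(n, 3)
    comp_levels = tree_levels(n, 4)
    print(n, csa_levels, comp_levels,
          csa_levels * XOR_PER_COUNTER, comp_levels * XOR_PER_COMPRESSOR)
```

Under these assumptions, both trees need 12 XOR delays for 17 to 19 operands, while for 32 operands the CSA tree needs 16 XOR delays against 12 for the compressor tree. The advantage of the compressor tree therefore appears only toward the upper end of the range.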