The previous tutorials discussed techniques for achieving fast multiplication. When the multiplicand and the multiplier are the same, however, the implementation can be simplified, so a squaring operation does not require the full hardware of a multiplier. In applications that require squaring, a dedicated square block can be used instead.
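The saving comes from two facts about single bits: $a_i a_i = a_i$, and the cross products $a_i a_j$ and $a_j a_i$ are equal, so they can be merged into one bit of double weight. Writing a $w$-bit operand as $A = \sum_{i=0}^{w-1} a_i 2^i$ (the symbols $A$, $a_i$ and $w$ are our own notation and are not taken from the figures), the square expands as:

$$
A^2 = \sum_{i=0}^{w-1} a_i^2\,2^{2i} + 2\sum_{0 \le i < j \le w-1} a_i a_j\,2^{i+j}
    = \sum_{i=0}^{w-1} a_i\,2^{2i} + \sum_{0 \le i < j \le w-1} a_i a_j\,2^{i+j+1}
$$

The array therefore needs $w + w(w-1)/2 = w(w+1)/2$ partial-product bits instead of the $w^2$ bits of a general $w \times w$ multiplier; for $w = 8$ that is 36 bits instead of 64.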

Figure 1 shows the squaring operation for a 2-bit operand, together with the simplification of the result.
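Since Figure 1 is not reproduced here, the following is a minimal Python sketch of one way the 2-bit case simplifies; the bit names `a1`, `a0`, `s3`..`s0` and the function `square_2bit` are our own, hypothetical labels. Expanding $(2a_1 + a_0)^2 = 4a_1 + 4a_1a_0 + a_0 = 4a_1(1 + a_0) + a_0$ shows that bit 1 of the result is always zero and bits 2 and 3 need only one gate each.

```python
# Sketch (assumed bit naming): squaring a 2-bit operand a = 2*a1 + a0.
# Since a1*a1 = a1 and a0*a0 = a0, a^2 = 4*a1*(1 + a0) + a0.

def square_2bit(a1: int, a0: int) -> tuple[int, int, int, int]:
    """Return the result bits (s3, s2, s1, s0) of (2*a1 + a0)**2."""
    s0 = a0              # a0 * a0 = a0
    s1 = 0               # no term has weight 2^1
    s2 = a1 & (1 - a0)   # a1 AND NOT a0: contributes 4 only when a0 = 0
    s3 = a1 & a0         # a1 AND a0: 4*a1*(1 + a0) = 8 when both bits are 1
    return s3, s2, s1, s0

# Exhaustive check against ordinary squaring.
for a in range(4):
    a1, a0 = (a >> 1) & 1, a & 1
    s3, s2, s1, s0 = square_2bit(a1, a0)
    assert (s3 << 3) | (s2 << 2) | (s1 << 1) | s0 == a * a
print("2-bit squarer matches a*a for a = 0..3")
```

Only an AND gate and an AND-with-inverted-input gate are needed, compared with four AND gates plus adders for a general 2-bit multiplier.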

Figure 2 shows the squaring operation performed by a dedicated square block for an 8-bit operand. The first array of partial products shows the original structure, and the second array shows a rearrangement of the first.
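The exact layout of the second array in Figure 2 may differ, but the reduction it performs can be sketched in Python under the assumption that it follows the standard identities $a_i a_i = a_i$ and $a_i a_j + a_j a_i = 2a_ia_j$. Each partial-product bit is modeled as a `(value, weight exponent)` pair; the helper names are hypothetical and chosen for illustration only.

```python
# Sketch: partial-product arrays for squaring a `width`-bit operand.
# Each entry is (bit value, exponent) meaning bit * 2**exponent.

def bits(a: int, width: int) -> list[int]:
    """Little-endian list of the operand bits a_0 .. a_{width-1}."""
    return [(a >> i) & 1 for i in range(width)]

def original_array(a: int, width: int = 8) -> list[tuple[int, int]]:
    """Full multiplier array: every a_i & a_j at weight 2**(i + j)."""
    b = bits(a, width)
    return [(b[i] & b[j], i + j) for i in range(width) for j in range(width)]

def folded_array(a: int, width: int = 8) -> list[tuple[int, int]]:
    """Squarer array: diagonal a_i at 2**(2i), one a_i & a_j (i < j) at 2**(i+j+1)."""
    b = bits(a, width)
    diag = [(b[i], 2 * i) for i in range(width)]
    cross = [(b[i] & b[j], i + j + 1)
             for i in range(width) for j in range(i + 1, width)]
    return diag + cross

def total(array: list[tuple[int, int]]) -> int:
    """Value represented by an array of weighted bits."""
    return sum(bit << exp for bit, exp in array)

for a in range(256):
    assert total(original_array(a)) == total(folded_array(a)) == a * a
print(len(original_array(0)), "bits reduced to", len(folded_array(0)))
# prints: 64 bits reduced to 36
```

Both arrays evaluate to $a^2$ for every 8-bit input, but the folded one carries 36 bits instead of 64, which is where the hardware saving of a dedicated square block comes from.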

Figure 2 also shows the simplification logic. The fast multiplication techniques from the previous tutorials can be applied to this array of partial products. Sometimes the partial products must be rearranged so that these general techniques can be applied; this is why the array is rearranged here, although the rearrangement is not strictly necessary.
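We cannot confirm the exact simplification logic drawn in Figure 2, so the following Python sketch shows one common squarer simplification as an assumption. The diagonal bit $a_{i+1}$ and the adjacent cross bit $a_i a_{i+1}$ share the weight $2^{2i+2}$, and their sum $a_{i+1}(1 + a_i)$ can be produced by two gates: $a_{i+1}\bar{a}_i$ at weight $2^{2i+2}$ and $a_{i+1}a_i$ at weight $2^{2i+3}$. All function names are hypothetical.

```python
from collections import Counter

def bits(a: int, width: int) -> list[int]:
    return [(a >> i) & 1 for i in range(width)]

def folded_array(a: int, width: int) -> list[tuple[int, int]]:
    """Diagonal a_i at 2**(2i) plus one a_i & a_j (i < j) at 2**(i+j+1)."""
    b = bits(a, width)
    diag = [(b[i], 2 * i) for i in range(width)]
    cross = [(b[i] & b[j], i + j + 1)
             for i in range(width) for j in range(i + 1, width)]
    return diag + cross

def simplified_array(a: int, width: int) -> list[tuple[int, int]]:
    """Assumed simplification: for i = 0..width-2 replace
    a_{i+1} (weight 2i+2) + a_i & a_{i+1} (weight 2i+2) = a_{i+1} * (1 + a_i)
    by a_{i+1} & ~a_i at weight 2i+2 and a_{i+1} & a_i at weight 2i+3."""
    b = bits(a, width)
    out = [(b[0], 0)]  # a_0 * a_0 = a_0 stays at weight 2**0
    for i in range(width - 1):
        out.append((b[i + 1] & (1 - b[i]), 2 * i + 2))
        out.append((b[i + 1] & b[i], 2 * i + 3))
    # Cross terms between non-adjacent bits (j >= i + 2) are unchanged.
    out += [(b[i] & b[j], i + j + 1)
            for i in range(width) for j in range(i + 2, width)]
    return out

def total(array: list[tuple[int, int]]) -> int:
    return sum(bit << exp for bit, exp in array)

def max_column_height(array: list[tuple[int, int]]) -> int:
    """Number of bit positions in the tallest column of the array."""
    return max(Counter(exp for _, exp in array).values())

width = 8
for a in range(1 << width):
    assert total(simplified_array(a, width)) == total(folded_array(a, width)) == a * a
print(max_column_height(folded_array(0, width)),
      max_column_height(simplified_array(0, width)))
# prints: 5 4
```

For 8 bits this leaves the bit count at 36 but moves one bit per pair from an even column into the next odd column, lowering the tallest column from 5 bits to 4, which in turn can shorten the adder stages that follow.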

Figure 3 shows a possible architecture for the dedicated square block. Carry-save adder (CSA) optimization techniques are not applied here; instead, the partial products are added in parallel to increase speed. The values of *m*, *n*, and *p* are 13, 5, and 11, respectively. This architecture is not optimized, but it is simple to implement.
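As a behavioral illustration of adding the partial products in parallel, the following Python sketch assumes a balanced tree of ordinary two-operand adders, so eight rows are reduced in three adder levels. It does not model Figure 3 itself: the widths *m*, *n* and *p* are not represented, and the row grouping (row $i$ holds $a_i$ at $2^{2i}$ plus the cross bits $a_i a_j$, $j > i$) and all function names are our own assumptions.

```python
def square_rows(a: int, width: int = 8) -> list[int]:
    """Assumed row grouping of the squarer array for 0 <= a < 2**width:
    row i = a_i * (2**(2i) + sum_{j>i} a_j * 2**(i+j+1))."""
    assert 0 <= a < (1 << width)
    rows = []
    for i in range(width):
        ai = (a >> i) & 1
        upper = a >> (i + 1)  # bits a_{i+1} .. a_{width-1}, re-based to bit 0
        rows.append(ai * ((1 << (2 * i)) + (upper << (2 * i + 2))))
    return rows

def adder_tree(values: list[int]) -> tuple[int, int]:
    """Add rows pairwise, level by level, as parallel two-operand adders would.
    Returns (sum, number of adder levels). `values` must be non-empty."""
    levels = 0
    while len(values) > 1:
        nxt = [values[k] + values[k + 1] for k in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            nxt.append(values[-1])  # an unpaired row passes to the next level
        values = nxt
        levels += 1
    return values[0], levels

for a in range(256):
    result, depth = adder_tree(square_rows(a))
    assert result == a * a and depth == 3
print("8 rows summed in 3 parallel adder levels for every 8-bit input")
```

In hardware each level's additions run simultaneously, so the delay grows with the number of levels (logarithmic in the row count) rather than with the number of rows; a CSA-based design would reduce this further at the cost of a less regular structure.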
