Sorting Processor Design to Sort a Serial Stream

Till now the sorting architectures discussed are based on the accessing of data elements in parallel. In real time situation the data streams are serial and serial to parallel conversion is costly as well as time consuming. On the other hand, parallel sorting architectures are very costly in terms of comparators. Thus alternate sorting architectures should be adopted in case of huge data stream. Here, we will discuss a sorting processor which is based on the insertion sort technique and works on a serial stream of M data words.

In insertion sort technique, values from the unsorted part is inserted at the correct position of the sorted part. Two data elements are sorted in similar way as it was done previously but the BN block is different from the BN block discussed in previous tutorials. Architecture of this BN block is shown in Figure 1. Maximum of two elements is stored in a register and again fed to the comparator. One rst signal is there to clear the contents of the registers as per requirement.

Maximum value finder
Figure 1: Architecture of the BN block for the Sorting Processor

Insertion sorting algorithm based sorting network to sort N data elements is shown in Figure 2. Here, N number of BN blocks are used. Serial data is fed to the C_0 pin. The BN_1 block sorts two data and maximum of them is A_1. The minimum value from the BN_1 block is passed to the next block through a MUX. If a data greater than the present value of A_1 occurs at input of BN_1 then this value will be assigned to A_1 and the past value of A_1 is passed to the next BN block. Similar operation is carried out for the other BN blocks. After the M clock cycles A_1 holds the maximum of M data elements. Similarly values of A_2, A_3, … , A_N are evaluated. The Sorting Network-N block takes (N+N) clock cycles to sort N data elements.

Sorting a serial data stream
Figure 2: Architecture of the Sorting Network-N using BN blocks

The next job is to sort the (M-N) elements. The sorting processor architecture is built using the Sorting Network-N block and is shown in Figure 3. The sorted elements can be stored in an output memory through a MUX unit. Here, unsorted elements are again fed to the Sorting Network-N through the C_N net. The unsorted elements are again sorted serially using the same block. After N clock cycles A_1 holds the maximum value of the unsorted part. This way the whole array of M words is sorted.

sorting processor
Figure 3: Data path of the insertion sort algorithm based sorting processor

This sorting processor sorts a serial array of data elements and results sorted array array of N elements at a time. Apart from the MUX and registers, the sorting processor mainly consists of N BN blocks. If the value of N is to be increased, the number of BN blocks will increase. The timing complexity of this sort processor can be estimated by expressing M in terms of N as M = k\times N. Then total timing complexity will be T_{sort} = M+(M-N)+(M-2N)+...+(M-(k-1)N)+N. If the value of N is 8 then total time to sort 32 data elements will be 88 number of clock cycles. This is obviously higher than the processing time of the parallel sorting processors but this sorting processor is hardware efficient. The XILINX output of this sorting processor is shown in Figure 4.

Figure 4: XILINX Output.

Shopping Basket