The parallel sort architecture described in [1] for n=8 has a regular structure that can be divided into similar stages. It consumes 23 basic node (BN) blocks but sorts 8 data elements in a single iteration. A serial sort architecture can be derived from this structure that sorts 8 data elements over several iterations. Here a basic sub-block is designed and reused in every iteration. The structure of this sub-block is shown in Figure 1.
This sub-block consumes only 7 BN blocks. The BN blocks are described in [1]. The serial sort architecture is shown below in Figure 2.
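As a rough behavioural illustration, one pass through such a sub-block can be modelled in Python. This is a sketch under assumptions, not the circuit of Figure 1: it assumes each BN block is a two-input compare-and-swap node, and it assumes the 7 BN blocks are wired as an even phase (4 nodes) followed by an odd phase (3 nodes) of an odd-even transposition network. The names `bn` and `sub_block` are hypothetical.

```python
def bn(a, b):
    """Assumed BN behaviour: return the pair ordered as (smaller, larger)."""
    return (a, b) if a <= b else (b, a)


def sub_block(data):
    """One pass through an assumed 7-BN sub-block on 8 elements."""
    d = list(data)
    assert len(d) == 8, "the sub-block is modelled for exactly 8 elements"
    # Even phase: 4 BN blocks on pairs (0,1), (2,3), (4,5), (6,7).
    for i in (0, 2, 4, 6):
        d[i], d[i + 1] = bn(d[i], d[i + 1])
    # Odd phase: 3 BN blocks on pairs (1,2), (3,4), (5,6).
    for i in (1, 3, 5):
        d[i], d[i + 1] = bn(d[i], d[i + 1])
    return d


if __name__ == "__main__":
    # A single pass only partially orders the data.
    print(sub_block([8, 7, 6, 5, 4, 3, 2, 1]))  # [7, 5, 8, 3, 6, 1, 4, 2]
```

Under this assumed wiring, a single pass leaves the data only partially ordered, which is why the sub-block has to be applied repeatedly.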
In this serial sort architecture, the inputs are fed to the sub-block through 8 MUXes. Initially, a start signal selects the external inputs and feeds them to the sub-block in the first iteration. In the following iterations, the outputs of the sub-block are fed back to its inputs. For 8 elements, four iterations are sufficient to sort the data. The serial structure takes more time to sort than the parallel one but uses far fewer resources (7 BN blocks instead of 23), so it suits architectures where resources are limited.
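The feedback loop can be sketched as a cycle-level Python model. It is illustrative only and rests on assumptions not stated in the text: the sub-block is taken to be an even/odd pair of compare-and-swap phases (4 + 3 nodes), and its outputs are assumed to be held in a register bank between iterations. `sub_block`, `serial_sort`, and `ITERATIONS` are hypothetical names.

```python
from itertools import product

ITERATIONS = 4  # the text states that four iterations suffice for 8 elements


def sub_block(d):
    """Assumed 7-node sub-block: even phase (4 nodes), then odd phase (3 nodes)."""
    d = list(d)
    for i in (0, 2, 4, 6, 1, 3, 5):
        if d[i] > d[i + 1]:
            d[i], d[i + 1] = d[i + 1], d[i]
    return d


def serial_sort(inputs):
    """Model the 8 MUXes and the feedback path, one loop pass per iteration."""
    regs = [0] * 8  # assumed register bank holding the sub-block outputs
    for cycle in range(ITERATIONS):
        start = cycle == 0
        # MUX: select the external inputs while start is high,
        # otherwise select the fed-back sub-block outputs.
        mux_out = list(inputs) if start else regs
        regs = sub_block(mux_out)
    return regs


if __name__ == "__main__":
    print(serial_sort([5, 2, 7, 1, 8, 3, 6, 4]))
    # Zero-one principle: a compare-and-swap network that sorts every 0/1
    # input sorts every input, so 2**8 cases cover this assumed network.
    assert all(serial_sort(b) == sorted(b) for b in product((0, 1), repeat=8))
```

With this assumed sub-block, four iterations amount to eight alternating phases of odd-even transposition sort, which is enough for 8 elements. The actual sub-block in Figure 1 may be wired differently.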
[1]. Parallel Sorting
Verilog Code of Serial Sort Architecture