Showing papers on "Bitonic sorter published in 1999"


DissertationDOI
01 Jan 1999
TL;DR: This thesis presents efficient algorithms for internal and external parallel sorting and remote data update and examines a number of related algorithms for text compression, differencing and incremental backup.
Abstract: This thesis presents efficient algorithms for internal and external parallel sorting and remote data update. The sorting algorithms approach the problem by concentrating first on highly efficient but incorrect algorithms followed by a cleanup phase that completes the sort. The remote data update algorithm, rsync, operates by exchanging block signature information followed by a simple hash search algorithm for block matching at arbitrary byte boundaries. The last chapter of the thesis examines a number of related algorithms for text compression, differencing and incremental backup.

431 citations
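
The block-matching idea in the abstract above (block signatures followed by a hash search at arbitrary byte boundaries) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the 16-bit modulus, the use of SHA-256 as the strong hash, the tiny block size, and the names `weak_checksum`, `roll`, `signatures` and `find_matches` are all assumptions made for the example. A production tool would also skip ahead by a whole block after a match instead of testing every offset.

```python
import hashlib

MOD = 1 << 16  # assumed modulus for the weak rolling checksum


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Return (a, b): a is the byte sum, b weights each byte by its distance from the block end."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide a window of length n one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def signatures(old: bytes, n: int) -> dict:
    """Map weak checksum -> [(strong hash, block index)] for each full block of `old`."""
    table = {}
    for start in range(0, len(old) - n + 1, n):
        block = old[start:start + n]
        entry = (hashlib.sha256(block).digest(), start // n)
        table.setdefault(weak_checksum(block), []).append(entry)
    return table


def find_matches(new: bytes, table: dict, n: int) -> list[tuple[int, int]]:
    """Return (offset in `new`, block index in `old`) for matches at any byte offset."""
    matches = []
    if len(new) < n:
        return matches
    a, b = weak_checksum(new[:n])
    pos = 0
    while True:
        # Cheap weak lookup first; confirm candidates with the strong hash.
        for strong, idx in table.get((a, b), []):
            if hashlib.sha256(new[pos:pos + n]).digest() == strong:
                matches.append((pos, idx))
                break
        if pos + n >= len(new):
            break
        a, b = roll(a, b, new[pos], new[pos + n], n)
        pos += 1
    return matches


if __name__ == "__main__":
    old = b"abcdefghijkl"   # blocks: abcd (0), efgh (1), ijkl (2)
    new = b"XXefghYabcd"
    print(find_matches(new, signatures(old, 4), 4))  # [(2, 1), (7, 0)]
```

The weak checksum is cheap to update as the window slides, so the expensive strong hash is computed only when the weak value already appears in the signature table.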


Journal ArticleDOI
TL;DR: This paper shows how futures (a parallel language construct) can be used to implement pipelining without requiring the user to code it explicitly, allowing for much simpler code and more asynchronous execution.
Abstract: Pipelining has been used in the design of many PRAM algorithms to reduce their asymptotic running time. Paul, Vishkin, and Wagener (PVW) used the approach in a parallel implementation of 2-3 trees. The approach was later used by Cole in the first O(lg n) time sorting algorithm on the PRAM not based on the AKS sorting network, and has since been used to improve the time of several other algorithms. Although the approach has improved the asymptotic time of many algorithms, there are two practical problems: maintaining the pipeline is quite complicated for the programmer, and the pipelining forces highly synchronous code execution. Synchronous execution is less practical on asynchronous machines and makes it difficult to modify a schedule to use less memory or to take better advantage of locality. In this paper we show how futures (a parallel language construct) can be used to implement pipelining without requiring the user to code it explicitly, allowing for much simpler code and more asynchronous execution. A runtime system manages the pipelining implicitly. As with user-managed pipelining, we show how the technique reduces the depth of many algorithms by a logarithmic factor over the nonpipelined version. We describe and analyze four algorithms for which this is the case: a parallel merging algorithm on trees, parallel algorithms for finding the union and difference of two randomized balanced trees (treaps), and insertion into a variant of the PVW 2-3 trees. For three of these, the pipeline delays are data dependent, making them particularly difficult to pipeline by hand. To determine the runtime of algorithms, we first analyze the algorithms in a language-based cost model in terms of the work w and depth d of the computations, and then show universal bounds for implementing the language on various machine models.

46 citations


Journal ArticleDOI
TL;DR: This work proposes a simple sorting architecture whose main feature is the pipelined use of a sorting network of fixed I/O size p to sort an arbitrarily large data set of N elements and shows that by using the design N elements can be sorted in Θ(N/p log N/p) time without memory access conflicts.
Abstract: Sorting networks of fixed I/O size p have been used, thus far, for sorting a set of p elements. Somewhat surprisingly, the important problem of using such a sorting network for sorting arbitrarily large datasets has not been addressed in the literature. Our main contribution is to propose a simple sorting architecture whose main feature is the pipelined use of a sorting network of fixed I/O size p to sort an arbitrarily large data set of N elements. A noteworthy feature of our design is that no extra data memory space is required, other than what is used for storing the input. As it turns out, our architecture is feasible for VLSI implementation and its time performance is virtually independent of the cost and depth of the underlying sorting network. Specifically, we show that by using our design N elements can be sorted in Θ(N/p log N/p) time without memory access conflicts. Finally, we show how to use an AT²-optimal sorting network of fixed I/O size p to construct a similar architecture that sorts N elements in Θ(N/p log N/p log p) time.

30 citations


Proceedings ArticleDOI
01 Jun 1999
TL;DR: The simple randomized merging (SRM) mergesort algorithm proposed by Barve et al. is the first parallel disk sorting algorithm that requires a provably optimal number of passes and that is fast in practice.
Abstract: External sorting, the process of sorting a file that is too large to fit into the computer's internal memory and must be stored externally on disks, is a fundamental subroutine in database systems [G], [IBM]. Of prime importance are techniques that use multiple disks in parallel in order to speed up the performance of external sorting. The simple randomized merging (SRM) mergesort algorithm proposed by Barve et al. [BGV] is the first parallel disk sorting algorithm that requires a provably optimal number of passes and that is fast in practice. Knuth [K, Section 5.4.9] recently identified SRM (which he calls "randomized striping") as the method of choice for sorting with parallel disks.

16 citations


Proceedings ArticleDOI
22 Aug 1999
TL;DR: This paper investigates the fault-tolerance properties of a special class of sorting networks called the odd-even transposition sorting networks, whose simple and reliable hardware structure is easy to implement with VLSI technology.
Abstract: Sorting networks are important hardware and software models of parallel sorting operations. They have several applications such as ATM switching, distributed processing, and optical implementation of sorting. In this paper we investigate the fault-tolerance properties of a special class of sorting networks called the odd-even transposition sorting networks. These networks have a simple and reliable hardware structure, which is easy to implement with VLSI technology. A simulation program of these networks' operation has been developed in C++. The simulation results revealed two important properties of odd-even transposition sorting networks: any single stuck-at-X fault occurring in an internal comparator is redundant, and any two stuck-at-X faults occurring in a large number of internal comparators are redundant.

7 citations
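
For readers unfamiliar with the network studied in the entry above, a fault-free odd-even transposition network can be sketched in a few lines. This is only an illustration of the standard n-round construction; it does not reproduce the paper's C++ simulator or its stuck-at-X fault model, and the names `network` and `apply_network` are hypothetical.

```python
def network(n):
    """Comparator schedule for n inputs: n rounds of neighbour pairs.

    Even-numbered rounds compare (0,1), (2,3), ...; odd-numbered rounds
    compare (1,2), (3,4), ...
    """
    return [[(i, i + 1) for i in range(r % 2, n - 1, 2)] for r in range(n)]


def apply_network(values, rounds):
    """Run the values through every comparator; each one orders its pair ascending."""
    a = list(values)
    for comparators in rounds:
        # Comparators within one round touch disjoint pairs, so in hardware
        # they would all fire in parallel.
        for i, j in comparators:
            if a[i] > a[j]:
                a[i], a[j] = a[j], a[i]
    return a


if __name__ == "__main__":
    print(network(4))
    # [[(0, 1), (2, 3)], [(1, 2)], [(0, 1), (2, 3)], [(1, 2)]]
    print(apply_network([5, 1, 4, 2, 3], network(5)))  # [1, 2, 3, 4, 5]
```

Keeping the comparator schedule as plain data, separate from the code that applies it, makes it straightforward to experiment with altered comparators, under whatever fault model one chooses to assume.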


Journal ArticleDOI
TL;DR: It is shown that ZZ-sort can be used to convert a non-adaptive parallel sorting algorithm into an in-place and adaptive one by considering the problem of sorting an arbitrarily large input on fixed-size reconfigurable meshes.
Abstract: We present a simple and general parallel sorting scheme, ZZ-sort, which can be used to derive a class of efficient in-place sorting algorithms on realistic parallel machine models. We prove a tight bound for the worst case performance of ZZ-sort. We also demonstrate the average performance of ZZ-sort by experimental results obtained on a MasPar parallel computer. Our experiments indicate that ZZ-sort can be incorporated into a distributed memory parallel computer system as a standard routine, and this routine is useful for space critical situations. Finally, we show that ZZ-sort can be used to convert a non-adaptive parallel sorting algorithm into an in-place and adaptive one by considering the problem of sorting an arbitrarily large input on fixed-size reconfigurable meshes.

6 citations



Journal ArticleDOI
TL;DR: A k-bitonic sort is proposed which generalizes the bitonic sort (which merges two monotonic sequences into one ordered sequence) and which is Batcher's bitonic sort when k=1.
Abstract: A k-bitonic sort which generalizes the bitonic sort is proposed. The theorem of the bitonic sort, which merges two monotonic sequences into one ordered sequence, is extended into the theorem of k-bitonic sort. The k-bitonic sort merges K (= 2k or 2k−1) monotonic sequences into one ordered sequence in \(\lceil \log_2 K \rceil \lceil \log_2 N \rceil - \tfrac{\lceil \log_2 K \rceil (\lceil \log_2 K \rceil - 1)}{2}\) steps, where \(k = \lceil \tfrac{K}{2} \rceil\) is an integer and k≥1. The k-bitonic sort is Batcher's bitonic sort when k=1.

1 citation
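
The k=1 case mentioned in the entry above, Batcher's bitonic sort, can be sketched to make the step count concrete. The code below is an illustrative sketch only: it handles power-of-two lengths, it does not implement the paper's k-bitonic generalization, and it assumes that N in the formula denotes the total number of elements being merged (the abstract does not define N). The names `merge_steps`, `bitonic_merge` and `bitonic_sort` are hypothetical.

```python
def ceil_log2(x):
    """Ceiling of log2(x) for an integer x >= 1, without floating point."""
    return (x - 1).bit_length()


def merge_steps(K, N):
    """Evaluate the abstract's step-count formula for K sequences and size N."""
    lk, ln = ceil_log2(K), ceil_log2(N)
    return lk * ln - lk * (lk - 1) // 2


def bitonic_merge(seq):
    """Sort a bitonic sequence of power-of-two length.

    Returns (sorted list, number of compare-exchange stages).
    """
    a = list(seq)
    n = len(a)
    stages = 0
    half = n // 2
    while half >= 1:
        # One stage: every element whose `half` bit is clear is compared
        # with its partner `half` positions to the right.
        for i in range(n):
            if i & half == 0 and a[i] > a[i + half]:
                a[i], a[i + half] = a[i + half], a[i]
        stages += 1
        half //= 2
    return a, stages


def bitonic_sort(seq):
    """Batcher's bitonic sort for power-of-two lengths (recursive sketch)."""
    a = list(seq)
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = bitonic_sort(a[:mid])
    right = bitonic_sort(a[mid:])
    # Ascending half followed by descending half forms a bitonic sequence.
    return bitonic_merge(left + right[::-1])[0]


if __name__ == "__main__":
    merged, stages = bitonic_merge([1, 4, 6, 7, 8, 5, 3, 2])
    print(merged, stages)       # [1, 2, 3, 4, 5, 6, 7, 8] 3
    print(merge_steps(2, 8))    # 3
    print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

With K=2 the formula reduces to ⌈log2 N⌉, which matches the three compare-exchange stages the merge uses for eight elements under the assumption about N stated above.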