Journal ArticleDOI
Tight Bounds on the Complexity of Parallel Sorting
TLDR
Tight upper and lower bounds are proved on the number of processors, information transfer, wire area, and time needed to sort N numbers in a bounded-degree fixed-connection network.Abstract:
In this paper, we prove tight upper and lower bounds on the number of processors, information transfer, wire area, and time needed to sort N numbers in a bounded-degree fixed-connection network. Our most important new results are: 1) the construction of an N-node degree-3 network capable of sorting N numbers in O(log N) word steps; 2) a proof that any network capable of sorting N (7 log N)-bit numbers in T bit steps requires area A where AT2 = ?(N2 log2 N); and 3) the construction of a ``small-constant-factor'' bounded-degree network that sorts N ?(log N)-bit numbers in T = ?(log N) bit steps with A = ?(N2) area.read more
Citations
More filters
Journal ArticleDOI
A bridging model for parallel computation
TL;DR: The bulk-synchronous parallel (BSP) model is introduced as a candidate for this role, and results quantifying its efficiency both in implementing high-level language features and algorithms, as well as in being implemented in hardware.
Journal ArticleDOI
The input/output complexity of sorting and related problems
Alok Aggarwal,S. Vitter Jeffrey +1 more
TL;DR: Tight upper and lower bounds are provided for the number of inputs and outputs (I/OS) between internal memory and secondary storage required for five sorting-related problems: sorting, the fast Fourier transform (FFT), permutation networks, permuting, and matrix transposition.
Book
Fat-trees: universal networks for hardware-efficient supercomputing
TL;DR: In this article, the authors presented a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer, and proved that a fat-tree of a given size is nearly the best routing network of that size.
Journal ArticleDOI
Fat-trees: Universal networks for hardware-efficient supercomputing
TL;DR: In this article, the authors presented a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer, and proved that a fat-tree of a given size is nearly the best routing network of that size.
Journal ArticleDOI
External memory algorithms and data structures: dealing with massive data
TL;DR: The state of the art in the design and analysis of external memory algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs is surveyed.
References
More filters
Proceedings ArticleDOI
Sorting networks and their applications
TL;DR: To achieve high throughput rates today's computers perform several operations simultaneously; not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing operations are done concurrently.
Journal ArticleDOI
Parallel Processing with the Perfect Shuffle
TL;DR: Given a vector of N elements, the perfect shuffle of this vector is a permutation of the elements that are identical to aperfect shuffle of a deck of cards.
Proceedings ArticleDOI
Universal schemes for parallel communication
Leslie G. Valiant,G. J. Brebner +1 more
TL;DR: This paper shows that there exists an N-processor computer that can simulate arbitrary N- processor parallel computations with only a factor of O(log N) loss of runtime efficiency, and isolates a combinatorial problem that lies at the heart of this question.
Proceedings ArticleDOI
An 0(n log n) sorting network
TL;DR: A sorting network of size 0(n log n) and depth 0(log n) is described, and a derived procedure (&egr;-nearsort) are described below, and the sorting network will be centered around these elementary steps.
Proceedings ArticleDOI
Area-time complexity for VLSI
TL;DR: The complexity of the Discrete Fourier Transform is studied with respect to a new model of computation appropriate to VLSI technology, which focuses on two key parameters, the amount of silicon area and time required to implement a DFT on a single chip.