Journal Article

Tight Bounds on the Complexity of Parallel Sorting

TLDR
Tight upper and lower bounds are proved on the number of processors, information transfer, wire area, and time needed to sort N numbers in a bounded-degree fixed-connection network.
Abstract
In this paper, we prove tight upper and lower bounds on the number of processors, information transfer, wire area, and time needed to sort N numbers in a bounded-degree fixed-connection network. Our most important new results are: 1) the construction of an N-node degree-3 network capable of sorting N numbers in O(log N) word steps; 2) a proof that any network capable of sorting N (7 log N)-bit numbers in T bit steps requires area A where AT² = Ω(N² log² N); and 3) the construction of a "small-constant-factor" bounded-degree network that sorts N Θ(log N)-bit numbers in T = Θ(log N) bit steps with A = Θ(N²) area.
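Results 2 and 3 fit together: substituting A = N² and T = log N gives AT² = N² log² N, so the third construction meets the lower bound up to constant factors. The Python sketch below inverts the bound to show the area floor it implies for a given bit-step budget. It is illustrative only: it drops the unspecified constant factor (the hypothetical parameter `c` stands in for it) and assumes base-2 logarithms, neither of which the abstract fixes.

```python
import math

def min_area(n: int, t: float, c: float = 1.0) -> float:
    """Area floor implied by A * T**2 >= c * N**2 * log2(N)**2.

    n: number of keys sorted; t: time in bit steps.
    c is a hypothetical stand-in for the unspecified constant factor.
    """
    return c * n ** 2 * math.log2(n) ** 2 / t ** 2

n = 2 ** 16                # sort 65,536 keys
log_n = math.log2(n)       # 16.0
for t in (log_n, 2 * log_n, 4 * log_n):
    # Doubling the time budget lowers the area floor by a factor of four.
    print(f"T = {t:4.0f} bit steps -> A >= {min_area(n, t):.3e}")
```

At T = log N the floor equals N², the area of the third construction, again up to constants.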


Citations
Journal Article

A bridging model for parallel computation

TL;DR: The bulk-synchronous parallel (BSP) model is introduced as a candidate for the role of a bridging model for parallel computation, and results are presented quantifying its efficiency both in implementing high-level language features and algorithms and in being implemented in hardware.
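The TL;DR does not say how BSP charges for a computation. One widely used form of the BSP cost model charges each superstep w + g·h + l, where w is the largest local work on any processor, h is the largest number of messages any processor sends or receives, g is the per-message communication cost, and l is the cost of barrier synchronization (the `sync_cost` parameter below). The Python sketch assumes that form, which may differ in detail from the cited paper's exact formulation; the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Superstep:
    work: list[float]     # local computation time on each processor
    sent: list[int]       # messages sent by each processor
    received: list[int]   # messages received by each processor

def superstep_cost(step: Superstep, g: float, sync_cost: float) -> float:
    """Cost w + g*h + l of one superstep (assumed cost model)."""
    w = max(step.work)
    h = max(max(step.sent), max(step.received))  # the step routes an h-relation
    return w + g * h + sync_cost

def program_cost(steps: list[Superstep], g: float, sync_cost: float) -> float:
    """A BSP program's cost is the sum of its superstep costs."""
    return sum(superstep_cost(s, g, sync_cost) for s in steps)

steps = [
    Superstep(work=[10, 12, 8, 9], sent=[3, 1, 2, 0], received=[1, 2, 1, 2]),
    Superstep(work=[5, 5, 5, 5], sent=[0, 0, 0, 0], received=[0, 0, 0, 0]),
]
print(program_cost(steps, g=2.0, sync_cost=20.0))  # (12 + 2*3 + 20) + (5 + 0 + 20) = 63.0
```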
Journal Article

The input/output complexity of sorting and related problems

TL;DR: Tight upper and lower bounds are provided for the number of inputs and outputs (I/Os) between internal memory and secondary storage required for five sorting-related problems: sorting, the fast Fourier transform (FFT), permutation networks, permuting, and matrix transposition.
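For reference, the sorting bound in this line of work is commonly stated as Θ((N/B) · log(1 + N/B) / log(1 + M/B)) I/Os when one block moves per I/O, where N is the number of records, M the number that fit in internal memory, and B the number transferred per block. The Python sketch below evaluates that expression with constant factors dropped; the function name and the example sizes are hypothetical.

```python
import math

def sort_io_bound(n: int, m: int, b: int) -> float:
    """(N/B) * log(1 + N/B) / log(1 + M/B): the sorting I/O bound, constants dropped.

    n: records to sort; m: records that fit in internal memory;
    b: records moved per block I/O (assumes 1 <= b <= m).
    """
    blocks = n / b
    return blocks * math.log(1 + blocks) / math.log(1 + m / b)

n, m, b = 2 ** 30, 2 ** 20, 2 ** 10
print(f"{sort_io_bound(n, m, b):,.0f} I/Os (constants dropped)")
print(f"{n / b:,.0f} I/Os for a single scan")
```

With these sizes the logarithmic factor is close to 2, so sorting costs on the order of two scans of the data, far fewer than the N I/Os of fetching one record per transfer.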
Book

Fat-trees: universal networks for hardware-efficient supercomputing

TL;DR: In this article, the authors presented a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer, and proved that a fat-tree of a given size is nearly the best routing network of that size.
Journal Article

Fat-trees: Universal networks for hardware-efficient supercomputing

TL;DR: In this article, the authors presented a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer, and proved that a fat-tree of a given size is nearly the best routing network of that size.
Journal Article

External memory algorithms and data structures: dealing with massive data

TL;DR: The state of the art is surveyed in the design and analysis of external memory algorithms and data structures, where the goal is to exploit locality in order to reduce I/O costs.
References
Proceedings Article

Sorting networks and their applications

TL;DR: To achieve high throughput rates today's computers perform several operations simultaneously; not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing operations are done concurrently.
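Batcher's paper is the source of the classic bitonic and odd-even merge sorting networks, which sort N = 2^k inputs in depth O(log² N), compared with the O(log N) word steps claimed in the abstract above. As a point of comparison, here is a Python sketch of the bitonic construction; it is not the network of the paper this page describes, and the function names are hypothetical.

```python
def bitonic_network(n: int) -> list[list[tuple[int, int]]]:
    """Comparator stages of a bitonic sorting network on n = 2**k wires.

    Each comparator (a, b) leaves the smaller value on wire a and the larger
    on wire b. Comparators within a stage touch disjoint wires, so each stage
    is one parallel step.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    stages = []
    size = 2
    while size <= n:                 # build sorted runs of length `size`
        half = size // 2
        while half >= 1:             # bitonic merge: compare wires `half` apart
            stage = []
            for i in range(n):
                j = i ^ half
                if j > i:
                    # Runs alternate direction so pairs of them form bitonic sequences.
                    stage.append((i, j) if (i & size) == 0 else (j, i))
            stages.append(stage)
            half //= 2
        size *= 2
    return stages

def apply_network(stages: list[list[tuple[int, int]]], values: list) -> list:
    """Run the comparators stage by stage and return the output wires."""
    out = list(values)
    for stage in stages:
        for a, b in stage:
            if out[a] > out[b]:
                out[a], out[b] = out[b], out[a]
    return out

net = bitonic_network(8)
print(len(net), "stages")                         # k(k+1)/2 stages for n = 2**k
print(apply_network(net, [5, 1, 7, 3, 8, 2, 6, 4]))
```

Each of the k(k + 1)/2 stages holds n/2 comparators, which is where the O(log² N) depth comes from.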
Journal Article

Parallel Processing with the Perfect Shuffle

TL;DR: Given a vector of N elements, the perfect shuffle of this vector is a permutation of the elements that is identical to a perfect shuffle of a deck of cards.
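The card-deck analogy can be made concrete: split the vector into halves and interleave them. For N = 2^k, the element at index i moves to the index obtained by rotating i's k-bit binary representation left by one place, which is why k shuffles restore the original order. The Python sketch below checks both views; the function names are hypothetical.

```python
def perfect_shuffle(v: list) -> list:
    """Interleave the two halves of v, like a perfect riffle shuffle of cards."""
    if len(v) % 2:
        raise ValueError("vector length must be even")
    half = len(v) // 2
    out = []
    for a, b in zip(v[:half], v[half:]):
        out.extend((a, b))
    return out

def shuffle_index(i: int, k: int) -> int:
    """Destination of index i under a perfect shuffle of 2**k elements:
    a one-bit cyclic left rotation of i's k-bit binary representation."""
    msb = (i >> (k - 1)) & 1
    return ((i << 1) & ((1 << k) - 1)) | msb

v = list(range(8))
print(perfect_shuffle(v))
print([shuffle_index(i, 3) for i in v])
```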
Proceedings Article

Universal schemes for parallel communication

TL;DR: This paper shows that there exists an N-processor computer that can simulate arbitrary N-processor parallel computations with only a factor of O(log N) loss of runtime efficiency, and isolates a combinatorial problem that lies at the heart of this question.
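The randomized two-phase routing scheme usually associated with this work sends each message first to a randomly chosen intermediate node and then on to its true destination, so that no fixed permutation can consistently overload particular links. The Python sketch below shows only the path selection on a d-dimensional hypercube using bit-fixing routes; it does not model queueing or contention, and the function names are hypothetical.

```python
import random

def bit_fixing_path(src: int, dst: int, d: int) -> list[int]:
    """Hypercube nodes visited from src to dst, correcting differing address
    bits from least to most significant (one bit per hop)."""
    path = [src]
    node = src
    for bit in range(d):
        if ((node ^ dst) >> bit) & 1:
            node ^= 1 << bit
            path.append(node)
    return path

def two_phase_path(src: int, dst: int, d: int, rng: random.Random) -> list[int]:
    """Route via a uniformly random intermediate node, bit-fixing in both phases."""
    mid = rng.randrange(1 << d)
    first = bit_fixing_path(src, mid, d)
    second = bit_fixing_path(mid, dst, d)
    return first + second[1:]   # drop the duplicated intermediate node

rng = random.Random(42)
print(two_phase_path(0b000, 0b111, 3, rng))
```

Each path has at most 2d hops, at most twice a direct bit-fixing route; the random intermediate step is what spreads traffic evenly regardless of the permutation being routed.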
Proceedings Article

An O(n log n) sorting network

TL;DR: A sorting network of size O(n log n) and depth O(log n) is described; the network is built around a few elementary steps, including a derived procedure called ε-nearsort.
Proceedings Article

Area-time complexity for VLSI

TL;DR: The complexity of the Discrete Fourier Transform is studied with respect to a new model of computation appropriate to VLSI technology, which focuses on two key parameters, the amount of silicon area and time required to implement a DFT on a single chip.