Journal ArticleDOI

A logarithmic time sort for linear size networks

01 Jan 1987 - Journal of the ACM (ACM) - Vol. 34, Iss. 1, pp. 60-76
TL;DR: A randomized algorithm that sorts on an N-node network with constant valence in O(log N) time, terminating with probability at least 1 - N^-α for all large enough α.
Abstract: A randomized algorithm that sorts on an N-node network with constant valence in O(log N) time is given. More particularly, the algorithm sorts N items on an N-node cube-connected cycles graph, and, for some constant k, for all large enough α, it terminates within kα log N time with probability at least 1 - N^-α.
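For context, the cube-connected cycles (CCC) graph is what keeps the valence constant: each corner w of a d-dimensional hypercube is replaced by a cycle of d nodes, giving N = d·2^d nodes of degree 3 (for d ≥ 3). A minimal Python sketch of the construction; the labeling and the d = 3 check are illustrative, not from the paper:

```python
def cube_connected_cycles(d):
    """Build the CCC of dimension d: each corner w of the d-cube becomes a
    cycle of d nodes (w, 0) .. (w, d-1); node (w, i) also connects across
    cube dimension i to (w XOR 2^i, i). Total N = d * 2**d nodes."""
    edges = set()
    for w in range(2 ** d):
        for i in range(d):
            edges.add(frozenset({(w, i), (w, (i + 1) % d)}))   # cycle edge
            edges.add(frozenset({(w, i), (w ^ (1 << i), i)}))  # cube edge
    return edges

edges = cube_connected_cycles(3)
nodes = {v for e in edges for v in e}
degrees = [sum(v in e for e in edges) for v in nodes]
assert len(nodes) == 3 * 2 ** 3 and max(degrees) == 3  # constant valence
```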


Citations
Journal ArticleDOI
TL;DR: A digital signature scheme based on the computational difficulty of integer factorization possesses the novel property of being robust against an adaptive chosen-message attack: an adversary who receives signatures for messages of his choice cannot later forge the signature of even a single additional message.
Abstract: We present a digital signature scheme based on the computational difficulty of integer factorization. The scheme possesses the novel property of being robust against an adaptive chosen-message attack: an adversary who receives signatures for messages of his choice (where each message may be chosen in a way that depends on the signatures of previously chosen messages) cannot later forge the signature of even a single additional message. This may be somewhat surprising, since in the folklore the properties of having forgery be equivalent to factoring and being invulnerable to an adaptive chosen-message attack were considered to be contradictory. More generally, we show how to construct a signature scheme with such properties based on the existence of a "claw-free" pair of permutations--a potentially weaker assumption than the intractability of integer factorization. The new scheme is potentially practical: signing and verifying signatures are reasonably fast, and signatures are compact.
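The attack model is easiest to see as an experiment. Below is a hedged Python sketch of the adaptive chosen-message game; ToyScheme is a keyed stand-in used only to make the sketch runnable, not the paper's factoring-based scheme, and all names here are hypothetical:

```python
import hashlib, hmac, os

class ToyScheme:
    """Toy keyed construction standing in for a real signature scheme;
    the point is the attack interface, not the cryptography."""
    def __init__(self):
        self.key = os.urandom(32)
    def sign(self, message: bytes) -> bytes:
        return hmac.new(self.key, message, hashlib.sha256).digest()
    def verify(self, message: bytes, sig: bytes) -> bool:
        return hmac.compare_digest(sig, self.sign(message))

def adaptive_chosen_message_attack(scheme, adversary):
    """The adversary queries signatures on messages of its choice, each
    possibly depending on earlier answers; it wins by producing a valid
    signature on any message it never queried."""
    queried = set()
    def oracle(m: bytes) -> bytes:
        queried.add(m)
        return scheme.sign(m)
    forged_msg, forged_sig = adversary(oracle)
    return forged_msg not in queried and scheme.verify(forged_msg, forged_sig)

def naive_adversary(sign_oracle):
    sign_oracle(b"chosen message")          # adaptive query
    return b"never queried", b"\x00" * 32   # hopeless forgery attempt

print(adaptive_chosen_message_attack(ToyScheme(), naive_adversary))  # False
```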

3,150 citations

Book
Maurice Herlihy
14 Mar 2008
TL;DR: Transactional memory as discussed by the authors is a computational model in which threads synchronize by optimistic, lock-free transactions, and there is a growing community of researchers working on both software and hardware support for this approach.
Abstract: Computer architecture is about to undergo, if not another revolution, then a vigorous shaking-up. The major chip manufacturers have, for the time being, simply given up trying to make processors run faster. Instead, they have recently started shipping "multicore" architectures, in which multiple processors (cores) communicate directly through shared hardware caches, providing increased concurrency instead of increased clock speed. As a result, system designers and software engineers can no longer rely on increasing clock speed to hide software bloat. Instead, they must somehow learn to make effective use of increasing parallelism. This adaptation will not be easy. Conventional synchronization techniques based on locks and conditions are unlikely to be effective in such a demanding environment. Coarse-grained locks, which protect relatively large amounts of data, do not scale, and fine-grained locks introduce substantial software engineering problems. Transactional memory is a computational model in which threads synchronize by optimistic, lock-free transactions. This synchronization model promises to alleviate many (not all) of the problems associated with locking, and there is a growing community of researchers working on both software and hardware support for this approach. This talk will survey the area, with a focus on open research problems.
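For a flavor of the programming-model contrast, here is a hedged Python sketch; the atomic() context manager is hypothetical, standing in for an STM runtime that would execute the block optimistically and retry on conflict:

```python
import threading
from contextlib import contextmanager

accounts = {"a": 100, "b": 50}
bank_lock = threading.Lock()

# Coarse-grained locking: one lock guards every account, so even transfers
# touching disjoint accounts serialize.
def transfer_locked(src, dst, amount):
    with bank_lock:
        accounts[src] -= amount
        accounts[dst] += amount

@contextmanager
def atomic():
    # Hypothetical stand-in for an STM runtime: a real one would run the
    # block optimistically against a read/write log and retry on conflict;
    # a single global lock here mimics only the atomicity, not the optimism.
    with bank_lock:
        yield

# Transactional style: the programmer states *what* must be atomic; the
# runtime, not a lock discipline, supplies the *how*.
def transfer_transactional(src, dst, amount):
    with atomic():
        accounts[src] -= amount
        accounts[dst] += amount

transfer_transactional("a", "b", 25)
assert accounts == {"a": 75, "b": 75}
```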

1,268 citations

Book ChapterDOI
02 Jan 1991
TL;DR: In this paper, the authors discuss parallel algorithms for shared-memory machines, survey the theoretical foundations of parallel algorithms and parallel architectures, and review the abstract models of parallel computation that have been pursued.
Abstract: Publisher Summary This chapter discusses parallel algorithms for shared-memory machines. Parallel computation is rapidly becoming a dominant theme in all areas of computer science and its applications. It is estimated that, within a decade, virtually all developments in computer architecture, systems programming, computer applications and the design of algorithms will be taking place within the context of parallel computation. In preparation for this revolution, theoretical computer scientists have begun to develop a body of theory centered on parallel algorithms and parallel architectures. As there is no consensus yet on the appropriate logical organization of a massively parallel computer, and as the speed of parallel algorithms is constrained as much by limits on interprocessor communication as it is by purely computational issues, it is not surprising that a variety of abstract models of parallel computation have been pursued. Closest to the hardware level are the VLSI models, which focus on the technological limits of today's chips, in which gates and wires are packed into a small number of planar layers.

812 citations

Proceedings ArticleDOI
23 May 2009
TL;DR: The design of high-performance parallel radix sort and merge sort routines for manycore GPUs, taking advantage of the full programmability offered by CUDA, is described; the radix sort is the fastest GPU sort and the merge sort the fastest comparison-based sort reported in the literature.
Abstract: We describe the design of high-performance parallel radix sort and merge sort routines for manycore GPUs, taking advantage of the full programmability offered by CUDA. Our radix sort is the fastest GPU sort and our merge sort is the fastest comparison-based sort reported in the literature. Our radix sort is up to 4 times faster than the graphics-based GPUSort and greater than 2 times faster than other CUDA-based radix sorts. It is also 23% faster, on average, than even a very carefully optimized multicore CPU sorting routine. To achieve this performance, we carefully design our algorithms to expose substantial fine-grained parallelism and decompose the computation into independent tasks that perform minimal global communication. We exploit the high-speed on-chip shared memory provided by NVIDIA's GPU architecture and efficient data-parallel primitives, particularly parallel scan. While targeted at GPUs, these algorithms should also be well-suited for other manycore processors.
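For intuition, here is a serial Python sketch of the scan-based split that one pass of such a radix sort is built from; in the actual CUDA implementation each pass is a parallel flag/scan/scatter over thread blocks, so this shows only the data flow, not the paper's code:

```python
def exclusive_scan(xs):
    """Exclusive prefix sum: the data-parallel primitive the sorts lean on."""
    total, out = 0, []
    for x in xs:
        out.append(total)
        total += x
    return out

def radix_sort(keys, bits=32):
    """One scan-based stable split per bit, least significant bit first.
    On a GPU each pass is a parallel flag + scan + scatter; here it's serial."""
    for b in range(bits):
        flags = [(k >> b) & 1 for k in keys]
        zero_slots = exclusive_scan([1 - f for f in flags])  # slots for 0-bits
        one_slots = exclusive_scan(flags)                    # slots for 1-bits
        n_zeros = len(keys) - ((one_slots[-1] + flags[-1]) if keys else 0)
        out = [None] * len(keys)
        for k, f, z, o in zip(keys, flags, zero_slots, one_slots):
            out[n_zeros + o if f else z] = k
        keys = out
    return keys

data = [170, 45, 75, 90, 2, 24]
assert radix_sort(data) == sorted(data)
```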

684 citations


Cites methods from "A logarithmic time sort for linear size networks"

  • ...Another elegant parallelization technique is used by sample sort [30], [31], [32]... (a minimal sketch of the idea follows below)

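As noted above, a minimal Python sketch of the sample sort idea: choose splitters from a random sample, partition items into buckets, and sort the buckets independently (on separate processors, on a real machine). The bucket count and oversampling factor are illustrative choices:

```python
import bisect, random

def sample_sort(items, n_buckets=8, oversample=4):
    """Splitter-based sorting: an oversampled random sample yields splitters,
    each item is routed to its bucket, and buckets are sorted independently."""
    if len(items) <= n_buckets * oversample:
        return sorted(items)
    sample = sorted(random.sample(items, n_buckets * oversample))
    splitters = sample[oversample::oversample][: n_buckets - 1]
    buckets = [[] for _ in range(n_buckets)]
    for x in items:
        buckets[bisect.bisect_right(splitters, x)].append(x)
    # each bucket would be sorted by a separate processor in parallel
    return [x for b in buckets for x in sorted(b)]

data = [random.randrange(1000) for _ in range(500)]
assert sample_sort(data) == sorted(data)
```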

Book
25 Aug 2011
TL;DR: A new algorithm for finding the blocks (biconnected components) of an undirected graph is proposed, and a general algorithmic technique that simplifies and improves the computation of various functions on trees is introduced.
Abstract: In this paper we propose a new algorithm for finding the blocks (biconnected components) of an undirected graph. A serial implementation runs in $O(n + m)$ time and space on a graph of n vertices and m edges. A parallel implementation runs in $O(\log n)$ time and $O(n + m)$ space using $O(n + m)$ processors on a concurrent-read, concurrent-write parallel RAM. An alternative implementation runs in $O(n^2 /p)$ time and $O(n^2 )$ space using any number $p \leqq n^2 /\log ^2 n$ of processors, on a concurrent-read, exclusive-write parallel RAM. The last algorithm has optimal speedup, assuming an adjacency matrix representation of the input. A general algorithmic technique that simplifies and improves computation of various functions on trees is introduced. This technique typically requires $O(\log n)$ time using $O(n/\log n)$ processors and $O(n)$ space on an exclusive-read exclusive-write parallel RAM.
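A serial sketch of the blocks computation helps fix ideas. This is the classical depth-first-search low-point method, written in Python, which meets the stated $O(n + m)$ bound; it is not necessarily the paper's own serial algorithm, and the parallel versions and the tree technique are not reproduced here:

```python
def biconnected_components(n, edges):
    """Serial O(n + m) sketch: DFS with low-point values; pop the edge stack
    each time a low point proves a cut vertex, emitting one block.
    Recursive DFS is fine for a sketch, though it limits the input depth."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    disc = [0] * n        # discovery times, 0 = unvisited
    low = [0] * n
    stack, blocks, time = [], [], 1

    def dfs(u, parent):
        nonlocal time
        disc[u] = low[u] = time
        time += 1
        for v in adj[u]:
            if not disc[v]:
                stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:        # u separates v's subtree
                    block, e = [], None
                    while e != (u, v):
                        e = stack.pop()
                        block.append(e)
                    blocks.append(block)
            elif v != parent and disc[v] < disc[u]:
                stack.append((u, v))         # back edge
                low[u] = min(low[u], disc[v])

    for s in range(n):
        if not disc[s]:
            dfs(s, -1)
    return blocks

# a triangle plus a pendant edge: two blocks
assert len(biconnected_components(4, [(0, 1), (1, 2), (2, 0), (2, 3)])) == 2
```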

501 citations

References
Journal ArticleDOI
TL;DR: In this paper, it was shown that the likelihood ratio test for fixed sample size can be reduced to this form, and that, for large samples, a sample of size $n$ with the first test will give about the same probabilities of error as a sample of size $en$ with the second test.
Abstract: In many cases an optimum or computationally convenient test of a simple hypothesis $H_0$ against a simple alternative $H_1$ may be given in the following form. Reject $H_0$ if $S_n = \sum^n_{j=1} X_j \leqq k,$ where $X_1, X_2, \cdots, X_n$ are $n$ independent observations of a chance variable $X$ whose distribution depends on the true hypothesis and where $k$ is some appropriate number. In particular the likelihood ratio test for fixed sample size can be reduced to this form. It is shown that with each test of the above form there is associated an index $\rho$. If $\rho_1$ and $\rho_2$ are the indices corresponding to two alternative tests $e = \log \rho_1/\log \rho_2$ measures the relative efficiency of these tests in the following sense. For large samples, a sample of size $n$ with the first test will give about the same probabilities of error as a sample of size $en$ with the second test. To obtain the above result, use is made of the fact that $P(S_n \leqq na)$ behaves roughly like $m^n$ where $m$ is the minimum value assumed by the moment generating function of $X - a$. It is shown that if $H_0$ and $H_1$ specify probability distributions of $X$ which are very close to each other, one may approximate $\rho$ by assuming that $X$ is normally distributed.
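The quantitative claim is easy to check numerically. A small Python illustration for Bernoulli trials compares the empirical lower-tail probability with the bound m^n, where m minimizes the moment generating function of X - a; the parameters p, a, n are arbitrary choices, not from the paper:

```python
import math, random

# Lower tail P(S_n <= n a) for S_n a sum of n Bernoulli(p) trials, compared
# with the bound m**n, where m = min_t E[exp(t (X - a))].
p, a, n = 0.5, 0.3, 50

def mgf(t):
    # E[exp(t (X - a))] for X ~ Bernoulli(p)
    return p * math.exp(t * (1 - a)) + (1 - p) * math.exp(-t * a)

m = min(mgf(-0.01 * i) for i in range(1, 500))   # crude 1-D minimization, t < 0
bound = m ** n

trials = 100_000
hits = sum(sum(random.random() < p for _ in range(n)) <= n * a
           for _ in range(trials))
print(f"empirical ~ {hits / trials:.4f}   Chernoff bound = {bound:.4f}")
# the bound is about 0.016 here; the empirical frequency comes out smaller,
# as the bound only caps the tail probability
```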

3,760 citations

Proceedings ArticleDOI
30 Apr 1968
TL;DR: To achieve high throughput rates, today's computers perform several operations simultaneously; not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing operations are done concurrently.
Abstract: To achieve high throughput rates, today's computers perform several operations simultaneously. Not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing operations are done concurrently. A major problem in the design of such a computing system is the connecting together of the various parts of the system (the I/O devices, memories, processing units, etc.) in such a way that all the required data transfers can be accommodated. One common scheme is a high-speed bus which is time-shared by the various parts; speed of available hardware limits this scheme. Another scheme is a cross-bar switch or matrix; limiting factors here are the amount of hardware (an m × n matrix requires m × n cross-points) and the fan-in and fan-out of the hardware.

2,553 citations


"A logarithmic time sort for linear ..." refers background in this paper

  • ...The bitonic sorter of Batcher [4] achieves this bound on such networks as the cube-connected cycles network [13]... (a sketch of the bitonic sorter follows below)

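Since the excerpt above names Batcher's bitonic sorter, here is a compact Python sketch of it, assuming a power-of-two input; this is a minimal illustration, not code from either paper. Its compare-exchange pattern is fixed in advance, which is what lets it run on fixed interconnection networks such as the cube-connected cycles in O(log^2 N) time, the bound the featured paper improves to O(log N):

```python
def bitonic_sort(xs, ascending=True):
    """Batcher's bitonic sorter (len(xs) must be a power of two): sort the
    halves in opposite directions, then merge the bitonic result."""
    if len(xs) <= 1:
        return xs
    half = len(xs) // 2
    first = bitonic_sort(xs[:half], True)
    second = bitonic_sort(xs[half:], False)
    return bitonic_merge(first + second, ascending)

def bitonic_merge(xs, ascending):
    if len(xs) <= 1:
        return xs
    half = len(xs) // 2
    # one round of compare-exchanges; the pairs are independent, so a
    # parallel machine does this whole loop in a single step
    for i in range(half):
        if (xs[i] > xs[i + half]) == ascending:
            xs[i], xs[i + half] = xs[i + half], xs[i]
    return (bitonic_merge(xs[:half], ascending) +
            bitonic_merge(xs[half:], ascending))

assert bitonic_sort([7, 3, 8, 1, 6, 2, 5, 4]) == [1, 2, 3, 4, 5, 6, 7, 8]
```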

Journal ArticleDOI
TL;DR: This work describes in detail how to program the cube-connected cycles for efficiently solving a large class of problems that include Fast Fourier transform, sorting, permutations, and derived algorithms.
Abstract: An interconnection pattern of processing elements, the cube-connected cycles (CCC), is introduced which can be used as a general purpose parallel processor. Because its design complies with present technological constraints, the CCC can also be used in the layout of many specialized large scale integrated circuits (VLSI). By combining the principles of parallelism and pipelining, the CCC can emulate the cube-connected machine and the shuffle-exchange network with no significant degradation of performance but with a more compact structure. We describe in detail how to program the CCC for efficiently solving a large class of problems that include Fast Fourier transform, sorting, permutations, and derived algorithms.

1,046 citations

Book
01 Jan 1984

862 citations

Proceedings ArticleDOI
11 May 1981
TL;DR: This paper shows that there exists an N-processor realistic computer that can simulate arbitrary idealistic N-processor parallel computations with only a factor of O(log N) loss of runtime efficiency, and isolates a combinatorial problem that lies at the heart of this question.
Abstract: In this paper we isolate a combinatorial problem that, we believe, lies at the heart of this question and provide some encouragingly positive solutions to it. We show that there exists an N-processor realistic computer that can simulate arbitrary idealistic N-processor parallel computations with only a factor of O(log N) loss of runtime efficiency. The main innovation is an O(log N) time randomized routing algorithm. Previous approaches were based on sorting or permutation networks, and implied loss factors of order at least (log N)2.
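A hedged Python sketch of the two-phase idea by which this style of randomized routing is usually described on the hypercube: every packet first travels to an independently chosen random intermediate node, then on to its true destination, fixing address bits in order. Queueing and congestion are not simulated; this computes routes only:

```python
import random

def bit_fixing_path(src, dst, d):
    """Oblivious hypercube route: correct differing address bits in order."""
    path, cur = [src], src
    for i in range(d):
        if (cur ^ dst) >> i & 1:
            cur ^= 1 << i
            path.append(cur)
    return path

def two_phase_route(src, dst, d):
    """Phase 1: go to a random intermediate node. Phase 2: continue to the
    true destination. Randomizing the midpoint breaks up the worst-case
    permutations that make purely oblivious routing slow."""
    mid = random.randrange(2 ** d)
    return bit_fixing_path(src, mid, d) + bit_fixing_path(mid, dst, d)[1:]

route = two_phase_route(0, 2 ** 10 - 1, d=10)   # on a 1024-node hypercube
assert route[0] == 0 and route[-1] == 2 ** 10 - 1
```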

694 citations