
Optimal bounds for decision problems on the CRCW PRAM


PAUL BEAME
University of Washington, Seattle, Washington
AND
JOHAN HÅSTAD
Royal Institute of Technology, Stockholm, Sweden
Abstract. Optimal $\Omega(\log n/\log\log n)$ lower bounds on the time for CRCW PRAMS with polynomially
bounded numbers of processors or memory cells to compute parity and a number of related problems
are proven. A strict time hierarchy of explicit Boolean functions of $n$ bits on such machines that holds
up to $O(\log n/\log\log n)$ time is also exhibited. That is, for every time bound $T$ within this range a
function is exhibited that can be easily computed using polynomial resources in time $T$ but requires
more than polynomial resources to be computed in time $T - 1$. Finally, it is shown that almost all
Boolean functions of $n$ bits require $\log n - \log\log n + \Omega(1)$ time when the number of processors is at
most polynomial in $n$. The bounds do not place restrictions on the uniformity of the algorithms nor on
the instruction sets of the machines.
Categories and Subject Descriptors: F.1.2 [Computation by Abstract Devices]: Modes of Computation-parallelism;
F.1.3 [Computation by Abstract Devices]: Complexity Classes-complexity hierarchies, relations among complexity measures;
F.2.3 [Analysis of Algorithms and Problem Complexity]: Tradeoffs among complexity classes
General Terms: Theory, Verification
Additional Key Words and Phrases: Concurrent-write, lower bounds, parallel random-access machines,
parity, sorting
The work of P. Beame was supported by a University of Toronto Open Fellowship and by National
Science Foundation grant PYI-25800. The work of J. Håstad was supported by an IBM Postdoctoral
Fellowship and supported in part by NSF grant DCR MCS-85-09905.
This research was done while P. Beame was at the University of Toronto and while both authors were
at the Massachusetts Institute of Technology.
Authors’ present addresses: P. Beame, Computer Science Department, FR-35, University of Washington,
Seattle, Washington 98195; J. Håstad, Royal Institute of Technology, Stockholm, S-100-44, Sweden.
Permission to copy without fee all or part of this material is granted provided that the copies are not
made or distributed for direct commercial advantage, the ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is by permission of the Association for
Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1989 ACM 0004-5411/89/0700-0643 $01.50
Journal of the Association for Computing Machinery, Vol. 36, No. 3, July 1989, pp. 643-670.

1. Introduction
One of the most widely used models of parallel computation is the parallel random
access machine (PRAM). In this model any processor can access any memory
location at a given time-step. The most powerful form of the PRAM, the CRCW
PRAM, in which both concurrent read and concurrent write accesses are allowed,
has received particular attention both from designers of algorithms and from those
studying the limitations of parallel machine computation. Despite the significant
interest, the only nontrivial lower bounds for decision problems on CRCW PRAMS
that do not have drastic restrictions placed on either their processor and memory
resources or on the instruction sets of their processors are due independently to
Beame [3] and to Li and Yesha [13]. The lower bounds are for parity and related
problems and are far from optimal. In both of these bounds no restriction is placed
on the instruction set of the processors, no limitation is placed on how much
information a single memory location may store, and the resources allowed are
only polynomially bounded. We call a machine with these properties an abstract
or ideal PRAM.
In this very general setting we prove the first optimal bound for any nontrivial
decision problem on the CRCW PRAM by showing a time lower bound of
$\Omega(\log n/\log\log n)$ for parity that matches the known upper bound. This lower
bound holds even in the cases when only one of the two resources, processors or
memory cells, is bounded by a polynomial in the input size. Because parity
constant-depth reduces to a large number of problems, this $\Omega(\log n/\log\log n)$-time
lower bound for the CRCW PRAM applies to a wide variety of interesting functions
that include sorting or adding $n$ bits, as well as multiplying two $n$-bit integers.
Also, by looking at the so-called “Sipser” functions, which are defined by
circuits, we obtain a very sharp time hierarchy for CRCW PRAMS of polynomial-
bounded resources. That is, for every time bound $T(n)$ at most
$\log n/(3\log\log n) - \Theta(\log n/(\log\log n)^2)$ we exhibit a family of functions which is computable in time
bound $T$ with $n$ processors and memory cells, but which cannot be computed just
one step faster by any machine with a polynomial bound on the number of
processors even with no bound on the number of memory cells. A similar separation
holds for machines with a polynomial bound on the number of memory cells even
without a bound on the number of processors.
The proofs of both these results follow lines similar to the proofs in [2] and [3]
and involve new lemmas that generalize the key lemmas used in Håstad’s unbounded
fan-in circuit lower bounds [10] and [11].
We also prove a tight $\Theta(\log n)$ lower bound on the time to compute almost
all n-bit Boolean functions on CRCW PRAMS with polynomial numbers of
processors.
A preliminary version of these results appeared in [5]. Many of these results also
form a part of the first author’s Ph.D. dissertation [4].
2. History of the Problem
Much of the lower bound work for CRCW PRAMS has been based on their close
relationship to unbounded fan-in circuits. These were defined by Furst et al. [9]
largely as a tool for trying to get an oracle to separate the polynomial-time hierarchy
from PSPACE. Stockmeyer and Vishkin [15] showed that simple CRCW PRAMS
can simulate unbounded fan-in circuits with essentially the same number of
processors as the circuit size and the same time as the circuit depth. In fact, by
restricting the instruction set of the CRCW PRAM to a limited set that includes
addition, comparison, indirect addressing and a few related instructions, Stockmeyer
and Vishkin also showed that unbounded fan-in circuits can easily simulate
restricted CRCW PRAMS. The size of the resulting circuit is polynomial in the
number of processors multiplied by the time and its depth is only a constant factor
larger than the time. Using the latter result and an $\Omega(\log^* n)$ lower bound of Furst et
al. [9] on the depth of polynomial size unbounded fan-in circuits computing parity,
Stockmeyer and Vishkin [15] obtained lower bounds for this restricted form
of CRCW PRAM.
Because disjunctive normal-form formulas are unbounded fan-in circuits of
depth two it follows that all Boolean functions may be computed in two steps using
exponential resources on the CRCW PRAM. However, it is not reasonable to be
using exponentially many processors and memory cells. With polynomial resource
bounds, CRCW PRAMS can compute any function with formula size $n^{O(1)}$ in time
$O(\log n/\log\log n)$, using an algorithm based on an upper bound of size $n^{O(1)}$ and
depth $O(\log n/\log\log n)$ for unbounded fan-in circuits given by Chandra et al. [7].
Since Stockmeyer and Vishkin’s paper, the lower bounds for unbounded fan-in
circuits have been significantly improved. Ajtai, extending [1], and L. Babai (private
communication) derived $\Omega(\sqrt{\log n})$ depth lower bounds for polynomial size circuits
computing parity. Yao [16] markedly improved these results by showing truly
exponential size lower bounds for circuits of constant depth but this improvement
did not increase the depth lower bound beyond $\Omega(\sqrt{\log n})$. Finally, Håstad [10],
using some techniques similar to those used by Yao, obtained an $\Omega(\log n/\log\log n)$
depth lower bound for polynomial-size circuits computing parity, which matches
the bound from the algorithm of Chandra et al. However, the CRCW PRAM lower
bounds that follow using Stockmeyer and Vishkin’s simulation are still not entirely
satisfactory since the bounds rely in an essential way on the specific restriction that
is placed on the instruction set. Some operations that are prohibited in this model
seem to be perfectly reasonable ones.
Abstract CRCW PRAMS can be shown to be much more powerful than these
restricted machines; because of their equivalence with unbounded fan-in circuits,
restricted CRCW PRAMS with polynomially many processors require exponential
time to compute almost all Boolean functions whereas an abstract PRAM only
takes O(logn) time without even using its power of concurrent reads or writes.
Nevertheless, for certain specific functions we shall see that, by using direct
techniques, lower bounds as strong as those derived for these restricted CRCW
machines can be obtained for the most powerful model of CRCW PRAM.
By applying and modifying the techniques of [9], Beame [2] derived the first
nontrivial lower bound that applies to the CRCW PRAM model described here.
He showed that any CRCW PRAM computing the parity function with $n^{O(1)}$
memory cells and an unbounded number of processors requires time $\Omega(\sqrt{\log n})$.
Later, using the main lemma in [10], Beame [3] obtained the following: any CRCW
PRAM that computes the parity function with $n^{O(1)}$ processors (in fact with as
many as n- + processors for some $\delta > 0$) and unbounded memory requires time
$\Omega(\sqrt{\log n})$. With the same techniques, an $\Omega(\sqrt{\log n})$ lower bound is easily shown for
common-write CRCW PRAMS (for definitions, see Section 3) that have no bound
on the number of processors but have a bound of O(nz6*) on the number
of cells for some $\delta > 0$.
It was shown by B. Chor (private communication) and Li and Yesha [13] that a
simulation of abstract CRCW PRAMS by unbounded fan-in circuits can be
combined directly with Håstad’s circuit lower bound to obtain the $\Omega(\sqrt{\log n})$ lower
bound. However, this simulation does not yield the above lower bound for the
common-write model with an unbounded number of processors. The simulation
states that any CRCW PRAM solving a decision problem on n Boolean inputs
using p(n) processors and T(n) time can be simulated by an unbounded fan-in
circuit of size p(n)2”“‘+0”) and depth $O(T(n))$.
Beame [3] and Li and Yesha [13] have also independently shown optimal bounds
on the time needed by CRCW PRAMS to compute functions whose many-bit
output is required to appear in a single memory cell. However, as was noted in
[3], such an output requirement is somewhat artificial and the lower bounds
disappear if each bit of the output is allowed to appear in a separate memory cell.
3. Definitions and Preliminaries
Definition. A CRCW PRAM is a shared memory machine with processors
$P_1, \ldots, P_{p(n)}$ which communicate through memory cells $C_1, \ldots, C_{c(n)}$. The
values of the input variables $x_1, \ldots, x_n$ are initially stored in the first $n$ cells of
memory $C_1, \ldots, C_n$, respectively. Initially all cells other than the input cells
contain the value 0. The output of the machine is the value in the cell $C_1$ at
termination.
Before each step $t$, processor $P_i$ is in state $q_i^t$. At time step $t$, depending on $q_i^t$,
processor $P_i$ reads some cell $C_j$ of shared memory, then, depending on the contents,
$\langle C_j \rangle$, and $q_i^t$, assumes a new state $q_i^{t+1}$ and, depending on this state, writes a value
$v = v(q_i^{t+1})$ into some cell.
When several processors are attempting to write into a single cell at the same
time step the one that succeeds will be the lowest numbered processor. (A CRCW
PRAM is defined to be a common-write machine if, whenever several processors
are attempting to write into the same cell at a given time step, they all try to write
the same value.)
The CRCW PRAM defined above has been called the PRIORITY CRCW
PRAM and is the most powerful version of CRCW PRAM normally considered.
Thus lower bounds for this model will apply to any standard model of CRCW
PRAM.
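The step semantics and the PRIORITY write rule above can be sketched in a few lines of Python. This is only an illustrative simulation, not part of the paper: the names `step`, `read_addr`, `transition`, `write_op` and `or_in_one_step` are hypothetical, processors are numbered from 0 in the code, and a processor is allowed to skip its write (returning `None`), which is an assumed convenience; in the model as defined it would write into some otherwise unused cell instead.

```python
def step(memory, states, read_addr, transition, write_op):
    """Run one synchronous step of a PRIORITY CRCW PRAM (illustrative sketch)."""
    # Read phase: each processor reads one cell chosen from its current state.
    # Cells that were never written hold the value 0.
    contents = [memory.get(read_addr(i, q), 0) for i, q in enumerate(states)]
    # State phase: the new state depends on the old state and the contents read.
    new_states = [transition(i, q, c)
                  for i, (q, c) in enumerate(zip(states, contents))]
    # Write phase: processors are scanned in increasing number, so the
    # lowest-numbered processor writing to a cell is the one that succeeds.
    new_memory = dict(memory)
    claimed = set()
    for i, q in enumerate(new_states):
        op = write_op(i, q)          # (address, value), or None for no write
        if op is None:
            continue
        addr, value = op
        if addr not in claimed:
            claimed.add(addr)
            new_memory[addr] = value
    return new_memory, new_states


def or_in_one_step(bits):
    """OR of n bits in a single step; the answer ends up in cell C_1."""
    memory = {j + 1: b for j, b in enumerate(bits)}   # x_j starts in cell C_j
    memory, _ = step(
        memory,
        [None] * len(bits),
        read_addr=lambda i, q: i + 1,                 # processor i reads C_{i+1}
        transition=lambda i, q, c: c,                 # remember the bit read
        write_op=lambda i, q: (1, 1) if q == 1 else None,
    )
    return memory[1]
```

Under these assumptions `or_in_one_step` uses one processor per input bit, and every processor that writes tries to write the same value, so the same sketch also behaves as a common-write machine on this example.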
In studying the progress of CRCW PRAM computations, what is important is
the set of inputs which lead to a given value in a memory cell or a given state of a
processor at a particular time step. The computation then may be viewed as
operating not on actual values so much as on the partitions associated with them.
Definition. Let M be a CRCW PRAM. For any processor Pi the processor
partition, P(M, i, t), of the input set at time step t is defined so that two inputs are
in the same equivalence class of P(M, i, t) if and only if they lead to the same state
of processor $P_i$ at the end of time step $t$.
For any cell $C_j$ the cell partition, $C(M, j, t)$, of the input set at time $t$ is defined
so that two inputs are in the same equivalence class of C(M, j, t) if and only if they
lead to the same contents of cell Cj at the end of time step t.
At time 0, the cell partitions for the first n memory cells have exactly two
equivalence classes, one consisting of those inputs for which the value of the
variable in the cell is 0, the other consisting of those inputs for which the value of
that variable is 1. Initially all other processor and cell partitions have only one
equivalence class consisting of all the inputs.
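As a small illustration of the partitions at time 0, the following Python sketch groups all inputs of length $n = 3$ by the initial contents of a cell. The helper names `partition_by` and `initial_cell_contents` are hypothetical and exist only for this example.

```python
from collections import defaultdict
from itertools import product


def partition_by(inputs, observe):
    """Group inputs into equivalence classes by the value `observe` returns."""
    classes = defaultdict(list)
    for x in inputs:
        classes[observe(x)].append(x)
    return sorted(classes.values())


n = 3
inputs = list(product((0, 1), repeat=n))


def initial_cell_contents(j):
    """Contents of cell C_j (1-indexed) at time 0 as a function of the input."""
    if j <= n:
        return lambda x: x[j - 1]   # input cell: holds x_j
    return lambda x: 0              # every other cell starts at 0


input_cell = partition_by(inputs, initial_cell_contents(2))       # cell C_2
spare_cell = partition_by(inputs, initial_cell_contents(n + 1))   # cell C_4
```

Here `input_cell` has two classes (inputs with $x_2 = 0$ and inputs with $x_2 = 1$), while `spare_cell` has a single class containing all eight inputs.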
We now look at a measure of the complexity of partitions that was used in [2]
and [3] to prove lower bounds for CRCW PRAMS.
Definition. Let $f$ be a Boolean function defined on a set $I \subseteq \{0, 1\}^n$. A Boolean
formula $F$ represents $f$ on $I$ if the inputs $x \in I$ satisfy $F$ exactly when $f(x) = 1$. Let
the maximum clause length of a DNF formula $F$ be the maximum number of
literals in any clause of $F$. The (Boolean) degree of $f$ on $I$, $\delta(f)$, is the smallest
maximum clause length of all disjunctive normal form (DNF) formulas representing
$f$ on $I$. We extend this definition to sets of functions $\mathcal{F}$ by letting
$\delta(\mathcal{F}) = \max_{f \in \mathcal{F}} \delta(f)$.
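For very small $n$ the degree can be computed by brute force, which may help in checking the definition on examples. The sketch below is an assumption-laden illustration rather than anything from the paper: the function name `degree` is hypothetical, variables are indexed from 0, and the search is exponential in $n$. It uses the observation that a DNF formula represents $f$ on $I$ exactly when every input in $I$ with $f(x) = 1$ satisfies some clause and no input in $I$ with $f(x) = 0$ satisfies any clause.

```python
from itertools import combinations, product


def degree(f, I, n):
    """Smallest k such that a DNF with clauses of at most k literals represents f on I."""
    ones = [x for x in I if f(x)]
    zeros = [x for x in I if not f(x)]

    def coverable(x, k):
        # Look for a clause on at most k variables that x satisfies and that
        # every zero-input in I falsifies (it must differ from x on the clause).
        for size in range(k + 1):
            for S in combinations(range(n), size):
                if all(any(z[i] != x[i] for i in S) for z in zeros):
                    return True
        return False

    for k in range(n + 1):
        if all(coverable(x, k) for x in ones):
            return k


cube = list(product((0, 1), repeat=3))
parity = lambda x: sum(x) % 2
disjunction = lambda x: int(any(x))

deg_parity = degree(parity, cube, 3)
deg_or = degree(disjunction, cube, 3)
# Restricting the input set to x_1 = x_2 = 0 leaves parity equal to x_3.
deg_parity_small = degree(parity, [x for x in cube if x[0] == 0 and x[1] == 0], 3)
```

On these inputs parity on the full cube has degree 3, the OR of three bits has degree 1, and parity on the restricted input set has degree 1.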

The terminology of degree is derived from the standard way of writing a formula
with the Boolean $\lor$ as addition and the Boolean $\land$ as multiplication and then
viewing the resulting formula as a polynomial. This should not be confused with
the degree of a polynomial in the finite field of two elements where the exclusive-OR
rather than the $\lor$ is the appropriate additive operation.
In the notation of many lower bound proofs for monotone formulas, we could
define the prime implicants and prime clauses of a Boolean function $f$. (Prime
clauses are essentially prime implicants of $\bar{f}$.) These have been described as
minterms and maxterms, respectively, in the notation used by Yao [16] or Håstad
[10]. Observe that the degree of a function and the length of its longest minterm
or maxterm may differ because its longest minterm may be longer than the longest
clause in an optimal DNF formula representing it. Consider the function $f$ defined
by the DNF formula $x_1 x_2 x_3 + \bar{x}_1 x_4 x_5$. It has a minterm $x_2 x_3 x_4 x_5$, which is larger
than $\delta(f)$.
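The example can be checked mechanically. The short Python sketch below (hypothetical helper `forces_one`, variables indexed from 0) confirms that $x_2 x_3 x_4 x_5$ forces $f = 1$ and that dropping any one of its four literals breaks this, so it is a minterm of length 4, whereas the defining DNF already shows $\delta(f) \le 3$.

```python
from itertools import product


def f(x):
    x1, x2, x3, x4, x5 = x
    return bool((x1 and x2 and x3) or ((not x1) and x4 and x5))


def forces_one(fixed, n=5):
    """True if every input agreeing with the partial assignment `fixed` satisfies f."""
    return all(
        f(x)
        for x in product((0, 1), repeat=n)
        if all(x[i] == v for i, v in fixed.items())
    )


# x_2 x_3 x_4 x_5 written with 0-based indices 1..4, all set to 1.
minterm = {1: 1, 2: 1, 3: 1, 4: 1}
is_implicant = forces_one(minterm)
is_prime = is_implicant and all(
    not forces_one({i: v for i, v in minterm.items() if i != drop})
    for drop in minterm
)
```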
Definition. Let $A$ be a partition of a set $I \subseteq \{0, 1\}^n$. Define the degree of $A$,
$\delta(A)$, to be $\delta(\mathcal{F})$ on $I$ where $\mathcal{F}$ is the set of characteristic functions of the
equivalence classes of $A$ in $I$.
The major proof technique of the lower bounds for parity on unbounded fan-in
circuits is the use of restrictions to set some of the input bits. Using restrictions
permits a simplified description of the results of computations but does not
drastically reduce the difficulty of the function being computed. The main idea
behind using them is that, although apparently complex operations like the OR of
n bits are computed in one step, by setting relatively few inputs to 0 or 1 the results
of these operations are simple. In the case of the OR of n bits, setting a single input
to 1 makes it trivial.
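Definition. A restriction $\pi$ on $K \subseteq \{1, \ldots, n\}$ is a function $\pi: K \to \{0, 1, *\}$
where:
$$\pi(i) = \begin{cases} 1 & \text{means } x_i \text{ is set to } 1, \\ 0 & \text{means } x_i \text{ is set to } 0, \\ * & \text{means } x_i \text{ is unset.} \end{cases}$$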
We define the results of applying a restriction $\pi$ to a partition, $A\lceil_\pi$, a function,
$f\lceil_\pi$, a Boolean formula, $F\lceil_\pi$, a circuit, $C\lceil_\pi$, as well as sets of these objects, $\mathcal{F}\lceil_\pi$, etc., in
the natural way. If $\sigma$ and $\tau$ are restrictions, then $\sigma\tau$ is a restriction that is the result
of applying $\sigma$ first and then applying $\tau$. For any $K \subseteq \{1, \ldots, n\}$ define Proj[K] to
be the set of restrictions that assign 0 or 1 exactly to the inputs in $K$.
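The following Python sketch illustrates restrictions, their composition and Proj[K] on small examples. It is an illustration under stated assumptions, not the paper's formalism: a restriction is modeled as a dictionary from 0-based variable indices to 0, 1 or the marker `STAR`, a restricted function still takes full-length inputs and simply ignores the variables that were set, and `compose` assumes that the second restriction only affects variables the first one left unset. All names are hypothetical.

```python
from itertools import product

STAR = "*"


def apply_restriction(f, pi, n):
    """Return f restricted by pi; variables set by pi override the input."""
    def g(x):
        return f(tuple(x[i] if pi.get(i, STAR) == STAR else pi[i]
                       for i in range(n)))
    return g


def compose(sigma, tau):
    """sigma tau: apply sigma first, then tau to the variables still unset."""
    out = dict(sigma)
    for i, v in tau.items():
        if out.get(i, STAR) == STAR:
            out[i] = v
    return out


def proj(K):
    """Proj[K]: all restrictions assigning 0 or 1 to exactly the inputs in K."""
    K = sorted(K)
    return [dict(zip(K, bits)) for bits in product((0, 1), repeat=len(K))]


def is_constant(f, n):
    return len({f(x) for x in product((0, 1), repeat=n)}) == 1


or_n = lambda x: int(any(x))
parity = lambda x: sum(x) % 2

# Setting a single input to 1 makes the OR of 4 bits constant ...
or_is_trivial = is_constant(apply_restriction(or_n, {0: 1}, 4), 4)
# ... whereas parity of the remaining bits is still not constant.
parity_is_trivial = is_constant(apply_restriction(parity, {0: 1}, 4), 4)
```

With these definitions `or_is_trivial` is true and `parity_is_trivial` is false, which matches the remark that a restriction setting few inputs simplifies an OR without trivializing parity.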
Definition. If a circuit $D$ is $C\lceil_\pi$ for some restriction $\pi$, then we say that $C$
contains $D$ and the gates of $C$ that remain undetermined in $D$ will be said to take
on the value $*$ in $C$ when $\pi$ is applied.
In several places we need the following simple observation.
LEMMA 3.1. Let $A$ be a partition of a set $I \subseteq \{0, 1\}^n$. For every $K \subseteq \{1, \ldots, n\}$,
there exists a restriction $\sigma \in$ Proj[K] such that $\delta(A) \le |K| + \delta(A\lceil_\sigma)$.
PROOF. For each $\sigma \in$ Proj[K] let $\mathcal{F}_\sigma$ be a set of DNF formulas that represent
the characteristic functions of the equivalence classes in $A\lceil_\sigma$ and that have
maximum clause length bounded by $\delta(A\lceil_\sigma)$. To each clause of every formula
in $\mathcal{F}_\sigma$ append the conjunctive clause $C_\sigma$, which is true exactly on those inputs in
$\{0, 1\}^n$ that agree with $\sigma$, to obtain a set of formulas $\mathcal{F}'_\sigma$. By construction, the

References
Random Graphs. (Book.)
Parity, circuits and the polynomial time hierarchy. (Journal article.)
Almost optimal lower bounds for small depth circuits. (Proceedings article.)
Johan Håstad. Computational limitations of small-depth circuits. (Book.)