Optimal bounds for decision problems on the CRCW PRAM

doi:10.1145/65950.65958


PAUL BEAME

University of Washington, Seattle, Washington

AND

JOHAN HASTAD

Royal Institute of Technology, Stockholm, Sweden

Abstract. Optimal $\Omega(\log n/\log\log n)$ lower bounds on the time for CRCW PRAMS with polynomially

bounded numbers of processors or memory cells to compute parity and a number of related problems

are proven. A strict time hierarchy of explicit Boolean functions of n bits on such machines that holds

up to $O(\log n/\log\log n)$ time is also exhibited. That is, for every time bound T within this range a

function is exhibited that can be easily computed using polynomial resources in time T but requires

more than polynomial resources to be computed in time T - 1. Finally, it is shown that almost all

Boolean functions of n bits require $\log n - \log\log n + \Omega(1)$ time when the number of processors is at

most polynomial in n. The bounds do not place restrictions on the uniformity of the algorithms nor on

the instruction sets of the machines.

Categories and Subject Descriptors: F.1.2 [Computation by Abstract Devices]: Modes of Computation-parallelism; F.1.3 [Computation by Abstract Devices]: Complexity Classes-complexity hierarchies, relations among complexity measures; F.2.3 [Analysis of Algorithms and Problem Complexity]: Tradeoffs among complexity classes

General Terms: Theory, Verification

Additional Key Words and Phrases: Concurrent-write, lower bounds, parallel random-access machines,

parity, sorting

1. Introduction

One of the most widely used models of parallel computation is the parallel random

access machine (PRAM). In this model any processor can access any memory

location at a given time-step. The most powerful form of the PRAM, the CRCW

PRAM, in which both concurrent read and concurrent write accesses are allowed,

has received particular attention both from designers of algorithms and from those

The work of P. Beame was supported by a University of Toronto Open Fellowship and by National

Science Foundation grant PYI-25800. The work of J. Hastad was supported by an IBM Postdoctoral

Fellowship and supported in part by NSF grant DCR MCS-85-09905.

This research was done while P. Beame was at the University of Toronto and while both authors were

at the Massachusetts Institute of Technology.

Authors’ present addresses: P. Beame, Computer Science Department, FR-35, University of Washington,

Seattle, Washington 98195; J. Hastad, Royal Institute of Technology, Stockholm, S-100-44, Sweden.

Permission to copy without fee all or part of this material is granted provided that the copies are not

made or distributed for direct commercial advantage, the ACM copyright notice and the title of the

publication and its date appear, and notice is given that copying is by permission of the Association for

Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1989 ACM 0004-5411/89/0700-0643 $01.50

Journal of the Association for Computing Machinery, Vol. 36, No. 3, July 1989. pp. 643-670.


studying the limitations of parallel machine computation. Despite the significant

interest, the only nontrivial lower bounds for decision problems on CRCW PRAMS

that do not have drastic restrictions placed on either their processor and memory

resources or on the instruction sets of their processors are due independently to

Beame [3] and to Li and Yesha [13]. The lower bounds are for parity and related

problems and are far from optimal. In both of these bounds no restriction is placed

on the instruction set of the processors, no limitation is placed on how much

information a single memory location may store, and the resources allowed are

only polynomially bounded. We call a machine with these properties an abstract or ideal PRAM.

In this very general setting we prove the first optimal bound for any nontrivial

decision problem on the CRCW PRAM by showing a time lower bound of

$\Omega(\log n/\log\log n)$ for parity that matches the known upper bound. This lower

bound holds even in the cases when only one of the two resources, processors or

memory cells, is bounded by a polynomial in the input size. Because parity

constant-depth reduces to a large number of problems, this $\Omega(\log n/\log\log n)$-time

lower bound for the CRCW PRAM applies to a wide variety of interesting functions

that include sorting or adding n bits, as well as multiplying two n-bit integers.

Also, by looking at the so-called “Sipser” functions, which are defined by

circuits, we obtain a very sharp time hierarchy for CRCW PRAMS of polynomial-bounded

resources. That is, for every time bound $T(n)$ at most $\log n/(3\log\log n) - \Theta(\log n/(\log\log n)^2)$

we exhibit a family of functions which is computable in time bound $T$

with n processors and memory cells, but which cannot be computed just

one step faster by any machine with a polynomial bound on the number of

processors even with no bound on the number of memory cells. A similar separation

holds for machines with a polynomial bound on the number of memory cells even

without a bound on the number of processors.

The proofs of both these results follow lines similar to the proofs in [2] and [3]

and involve new lemmas that generalize the key lemmas used in Hastad’s unbounded

fan-in circuit lower bounds [10] and [11].

We also prove a tight $\Theta(\log n)$ lower bound on the time to compute almost

all n-bit Boolean functions on CRCW PRAMS with polynomial numbers of

processors.

A preliminary version of these results appeared in [5]. Many of these results also

form a part of the first author’s Ph.D. dissertation [4].

2. History of the Problem

Much of the lower bound work for CRCW PRAMS has been based on their close

relationship to unbounded fan-in circuits. These were defined by Furst et al. [9]

largely as a tool for trying to get an oracle to separate the polynomial-time hierarchy

from PSPACE. Stockmeyer and Vishkin [15] showed that simple CRCW PRAMS

can simulate unbounded fan-in circuits with essentially the same number of

processors as the circuit size and the same time as the circuit depth. In fact, by

restricting the instruction set of the CRCW PRAM to a limited set that includes

addition, comparison, indirect addressing and a few related instructions, Stockmeyer

and Vishkin also showed that unbounded fan-in circuits can easily simulate

restricted CRCW PRAMS. The size of the resulting circuit is polynomial in the

number of processors multiplied by the time and its depth is only a constant factor

larger than the time. Using the latter result and an $\Omega(\log^* n)$ lower bound of Furst et

al. [9] on the depth of polynomial size unbounded fan-in circuits computing parity,


Stockmeyer and Vishkin [15] obtained lower bounds for this restricted form

of CRCW PRAM.

Because disjunctive normal-form formulas are unbounded fan-in circuits of

depth two it follows that all Boolean functions may be computed in two steps using

exponential resources on the CRCW PRAM. However, it is not reasonable to be

using exponentially many processors and memory cells. With polynomial resource

bounds, CRCW PRAMS can compute any function with formula size $n^{O(1)}$ in time

$O(\log n/\log\log n)$, using an algorithm based on an upper bound of size $n^{O(1)}$ and

depth $O(\log n/\log\log n)$ for unbounded fan-in circuits given by Chandra et al. [7].

Since Stockmeyer and Vishkin’s paper, the lower bounds for unbounded fan-in

circuits have been significantly improved. Ajtai, extending [1], and L. Babai (private

communication) derived $\Omega(\sqrt{\log n})$ depth lower bounds for polynomial size circuits

computing parity. Yao [16] markedly improved these results by showing truly

exponential size lower bounds for circuits of constant depth but this improvement

did not increase the depth lower bound beyond $\Omega(\sqrt{\log n})$. Finally, Hastad [10],

using some techniques similar to those used by Yao, obtained an $\Omega(\log n/\log\log n)$

depth lower bound for polynomial-size circuits computing parity, which matches

the bound from the algorithm of Chandra et al. However, the CRCW PRAM lower

bounds that follow using Stockmeyer and Vishkin’s simulation are still not entirely

satisfactory since the bounds rely in an essential way on the specific restriction that

is placed on the instruction set. Some operations that are prohibited in this model

seem to be perfectly reasonable ones.

Abstract CRCW PRAMS can be shown to be much more powerful than these

restricted machines; because of their equivalence with unbounded fan-in circuits,

restricted CRCW PRAMS with polynomially many processors require exponential

time to compute almost all Boolean functions whereas an abstract PRAM only

takes $O(\log n)$ time without even using its power of concurrent reads or writes.

Nevertheless, for certain specific functions we shall see that, by using direct

techniques, lower bounds as strong as those derived for these restricted CRCW

machines can be obtained for the most powerful model of CRCW PRAM.

By applying and modifying the techniques of [9], Beame [2] derived the first

nontrivial lower bound that applies to the CRCW PRAM model described here.

He showed that any CRCW PRAM computing the parity function with $n^{O(1)}$

memory cells and an unbounded number of processors requires time $\Omega(\sqrt{\log n})$.

Later, using the main lemma in [10], Beame [3] obtained the following: any CRCW

PRAM that computes the parity function with $n^{O(1)}$ processors (in fact with as

man asn-

+

processors for some $\delta > 0$) and unbounded memory requires time

$\Omega(\sqrt{\log n})$. With the same techniques, an $\Omega(\sqrt{\log n})$ lower bound is easily shown for

common-write CRCW PRAMS (for definitions, see Section 3) that have no bound

on the number of processors but have a bound of O(nz6*) on the number

of cells for some $\delta > 0$.

It was shown by B. Chor (private communication) and Li and Yesha [13] that a

simulation of abstract CRCW PRAMS by unbounded fan-in circuits can be

combined directly with Hastad’s circuit lower bound to obtain the $\Omega(\sqrt{\log n})$ lower

bound. However, this simulation does not yield the above lower bound for the

common-write model with an unbounded number of processors. The simulation

states that any CRCW PRAM solving a decision problem on n Boolean inputs

using p(n) processors and T(n) time can be simulated by an unbounded fan-in

circuit of size

p(n)2”“‘+0”)

and depth $O(T(n))$.

Beame [3] and Li and Yesha [13] have also independently shown optimal bounds

on the time needed by CRCW PRAMS to compute functions whose many-bit


output is required to appear in a single memory cell. However, as was noted in

[3], such an output requirement is somewhat artificial and the lower bounds

disappear if each bit of the output is allowed to appear in a separate memory cell.

3. Definitions and Preliminaries

Definition. A CRCW PRAM is a shared memory machine with processors

$P_1, \ldots, P_{p(n)}$ which communicate through memory cells $C_1, \ldots, C_{c(n)}$. The

values of the input variables $x_1, \ldots, x_n$ are initially stored in the first n cells of

memory $C_1, \ldots, C_n$, respectively. Initially all cells other than the input cells

contain the value 0. The output of the machine is the value in the cell $C_1$ at

termination.

Before each step t, processor $P_i$ is in state $q_i^t$. At time step t, depending on $q_i^t$,

processor $P_i$ reads some cell $C_j$ of shared memory, then, depending on the contents,

$\langle C_j \rangle$, and $q_i^t$, assumes a new state $q_i^{t+1}$ and, depending on this state, writes a value

$v = v(q_i^{t+1})$ into some cell.

When several processors are attempting to write into a single cell at the same

time step the one that succeeds will be the lowest numbered processor. (A CRCW

PRAM is defined to be a common-write machine if, whenever several processors

are attempting to write into the same cell at a given time step, they all try to write

the same value.)

The CRCW PRAM defined above has been called the PRIORITY CRCW

PRAM and is the most powerful version of CRCW PRAM normally considered.

Thus lower bounds for this model will apply to any standard model of CRCW

PRAM.
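To make the write-conflict rule concrete, the following Python sketch simulates one write phase of a PRIORITY CRCW PRAM and uses it to compute the OR of n bits in a single step. The function names and the dictionary encoding of shared memory are illustrative assumptions and are not part of the model's definition.

```python
def priority_write_step(memory, writes):
    """Resolve one concurrent-write phase under the PRIORITY rule.

    memory: dict mapping cell index -> current value
    writes: iterable of (processor_index, cell_index, value) write attempts
    Returns a new dict; for each contested cell the value of the
    lowest-numbered processor is stored.
    """
    winners = {}
    for proc, cell, value in writes:
        if cell not in winners or proc < winners[cell][0]:
            winners[cell] = (proc, value)
    updated = dict(memory)
    for cell, (_, value) in winners.items():
        updated[cell] = value
    return updated


def or_in_one_step(bits):
    """Compute the OR of the input bits with a single read/write step."""
    n = len(bits)
    # Cells C_1..C_n initially hold the inputs x_1..x_n.
    memory = {i: bits[i - 1] for i in range(1, n + 1)}
    # Processor P_i reads C_i; if it sees a 1 it attempts to write 1 into C_1.
    writes = [(i, 1, 1) for i in range(1, n + 1) if memory[i] == 1]
    memory = priority_write_step(memory, writes)
    return memory[1]  # the output is the content of C_1
```

Every processor that writes in `or_in_one_step` writes the same value, so this particular step also satisfies the common-write condition.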

In studying the progress of CRCW PRAM computations, what is important is

the set of inputs which lead to a given value in a memory cell or a given state of a

processor at a particular time step. The computation then may be viewed as

operating not on actual values so much as on the partitions associated with them.

Definition. Let M be a CRCW PRAM. For any processor $P_i$ the processor

partition, P(M, i, t), of the input set at time step t is defined so that two inputs are

in the same equivalence class of P(M, i, t) if and only if they lead to the same state

of processor $P_i$ at the end of time step t.

For any cell $C_j$ the cell partition, C(M, j, t), of the input set at time t is defined

so that two inputs are in the same equivalence class of C(M, j, t) if and only if they

lead to the same contents of cell $C_j$ at the end of time step t.

At time 0, the cell partitions for the first n memory cells have exactly two

equivalence classes, one consisting of those inputs for which the value of the

variable in the cell is 0, the other consisting of those inputs for which the value of

that variable is 1. Initially all other processor and cell partitions have only one

equivalence class consisting of all the inputs.
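As a small illustration of the partitions at time 0, the sketch below enumerates all inputs for n = 3 and groups them by the contents of a cell. The helper `partition_by` and the choice n = 3 are assumptions made only for this example.

```python
from itertools import product


def partition_by(inputs, observe):
    """Group inputs into equivalence classes by the value `observe` returns."""
    classes = {}
    for x in inputs:
        classes.setdefault(observe(x), []).append(x)
    return list(classes.values())


n = 3
inputs = list(product((0, 1), repeat=n))

# At time 0 the input cell C_1 holds x_1, so its partition has two classes.
c1_partition = partition_by(inputs, lambda x: x[0])

# A non-input cell holds 0 on every input, so its partition has one class.
c4_partition = partition_by(inputs, lambda x: 0)

print(len(c1_partition), len(c4_partition))  # prints: 2 1
```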

We now look at a measure of the complexity of partitions that was used in [2]

and [3] to prove lower bounds for CRCW PRAMS.

Definition. Let f be a Boolean function defined on a set $I \subseteq \{0, 1\}^n$. A Boolean

formula F represents f on I if the inputs $x \in I$ satisfy F exactly when f(x) = 1. Let

the maximum clause length of a DNF formula F be the maximum number of

literals in any clause of F. The (Boolean) degree of f on I, $\delta(f)$, is the smallest

maximum clause length of all disjunctive normal form (DNF) formulas representing

f on I. We extend this definition to sets of functions $\mathcal{F}$ by letting

$\delta(\mathcal{F}) = \max_{f \in \mathcal{F}} \delta(f)$.


The terminology of degree is derived from the standard way of writing a formula

with the Boolean $\vee$ as addition and the Boolean $\wedge$ as multiplication and then

viewing the resulting formula as a polynomial. This should not be confused with

the degree of a polynomial in the finite field of two elements where the exclusive-OR

rather than the $\vee$ is the appropriate additive operation.

In the notation of many lower bound proofs for monotone formulas, we could

define the prime implicants and prime clauses of a Boolean function f. (Prime

clauses are essentially prime implicants of $\bar{f}$.) These have been described as

minterms and maxterms, respectively, in the notation used by Yao [16] or Hastad

[10]. Observe that the degree of a function and the length of its longest minterm

or maxterm may differ because its longest minterm may be longer than the longest

clause in an optimal DNF formula representing it. Consider the function f defined

by the DNF formula $x_1 x_2 x_3 + \bar{x}_1 x_4 x_5$. It has a minterm $x_2 x_3 x_4 x_5$, whose length 4 is larger

than $\delta(f) = 3$.
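The example can be checked by brute force. The sketch below takes f to be x1 x2 x3 + (not x1) x4 x5 on the full cube {0,1}^5, confirms that x2 x3 x4 x5 is a minterm (an implicant from which no literal can be dropped), and computes the degree as the smallest k such that every satisfying input is covered by an implicant of at most k literals. The function names and the dictionary encoding of partial assignments are assumptions made for illustration.

```python
from itertools import combinations, product

N = 5
POINTS = list(product((0, 1), repeat=N))


def f(x):
    x1, x2, x3, x4, x5 = x
    return bool((x1 and x2 and x3) or ((not x1) and x4 and x5))


def is_implicant(partial):
    """partial maps a 0-based variable index to 0/1; True if it forces f to 1."""
    return all(f(y) for y in POINTS if all(y[i] == v for i, v in partial.items()))


def is_minterm(partial):
    """An implicant that stops being one when any single literal is dropped."""
    return is_implicant(partial) and not any(
        is_implicant({i: v for i, v in partial.items() if i != drop})
        for drop in partial
    )


def degree():
    """Smallest k such that implicants of k literals cover every 1-input of f."""
    ones = [x for x in POINTS if f(x)]
    for k in range(N + 1):
        if all(
            any(is_implicant({i: x[i] for i in S}) for S in combinations(range(N), k))
            for x in ones
        ):
            return k


long_minterm = {1: 1, 2: 1, 3: 1, 4: 1}  # the term x2 x3 x4 x5 (0-based indices)
print(is_minterm(long_minterm), len(long_minterm), degree())  # prints: True 4 3
```

Checking subsets of exactly k variables suffices in `degree` because any superset of an implicant consistent with the same input is again an implicant.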

Definition. Let A be a partition of a set $I \subseteq \{0, 1\}^n$. Define the degree of A,

$\delta(A)$, to be $\delta(\mathcal{F})$ on I where $\mathcal{F}$ is the set of characteristic functions of the

equivalence classes of A in I.

The major proof technique of the lower bounds for parity on unbounded fan-in

circuits is the use of restrictions to set some of the input bits. Using restrictions

permits a simplified description of the results of computations but does not

drastically reduce the difficulty of the function being computed. The main idea

behind using them is that, although apparently complex operations like the OR of

n bits are computed in one step, by setting relatively few inputs to 0 or 1 the results

of these operations are simple. In the case of the OR of n bits, setting a single input

to 1 makes it trivial.

Definition. A restriction $\pi$ on $K \subseteq \{1, \ldots, n\}$ is a function $\pi: K \to \{0, 1, *\}$

where:

$\pi(i) = 1$ means $x_i$ is set to 1,

$\pi(i) = 0$ means $x_i$ is set to 0,

$\pi(i) = *$ means $x_i$ is unset.

We define the results of applying a restriction $\pi$ to a partition, $A\lceil_\pi$, a function,

$f\lceil_\pi$, a Boolean formula, $F\lceil_\pi$, a circuit, $C\lceil_\pi$, as well as sets of these objects, $\mathcal{F}\lceil_\pi$, etc., in

the natural way. If $\sigma$ and $\tau$ are restrictions, then $\sigma\tau$ is a restriction that is the result

of applying $\sigma$ first and then applying $\tau$. For any $K \subseteq \{1, \ldots, n\}$ define Proj[K] to

be the set of restrictions that assign 0 or 1 exactly to the inputs in K.
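A minimal Python sketch of restrictions follows. It represents a restriction as a dictionary from 1-based variable indices to 0, 1, or "*", treats indices missing from the dictionary as unset (a convenience assumed here, not taken from the definition), and shows that the OR of four bits becomes constant once a single input is set to 1. The names `restrict`, `compose`, and `is_constant` are hypothetical.

```python
from itertools import product

STAR = "*"


def restrict(func, n, pi):
    """Apply restriction pi to func on n variables.

    Returns the restricted function, which takes values for the unset
    variables in increasing index order, and the list of unset indices.
    """
    free = [i for i in range(1, n + 1) if pi.get(i, STAR) == STAR]

    def restricted(free_values):
        assignment = dict(zip(free, free_values))
        x = tuple(assignment[i] if i in assignment else pi[i] for i in range(1, n + 1))
        return func(x)

    return restricted, free


def compose(sigma, tau):
    """Apply sigma first, then tau to the variables sigma left unset."""
    result = dict(sigma)
    for i, v in tau.items():
        if result.get(i, STAR) == STAR:
            result[i] = v
    return result


def is_constant(restricted, free):
    values = {restricted(v) for v in product((0, 1), repeat=len(free))}
    return len(values) == 1


def or_n(x):
    return int(any(x))


g, free = restrict(or_n, 4, {3: 1})   # set x_3 to 1, leave the rest unset
h, free_h = restrict(or_n, 4, {3: 0})  # set x_3 to 0 instead
print(is_constant(g, free), is_constant(h, free_h))  # prints: True False
```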

Definition. If a circuit D is $C\lceil_\pi$ for some restriction $\pi$, then we say that C

contains D and the gates of C that remain undetermined in D will be said to take

on the value * in C when $\pi$ is applied.

In several places we need the following simple observation.

LEMMA 3.1. Let A be a partition of a set $I \subseteq \{0, 1\}^n$. For every $K \subseteq \{1, \ldots, n\}$,

there exists a restriction $\sigma \in$ Proj[K] such that $\delta(A) \le |K| + \delta(A\lceil_\sigma)$.
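The inequality can be checked numerically on a small instance. The sketch below is an illustration under stated assumptions rather than part of the proof: it takes I to be the full cube, A to be the partition induced by the parity of 3 bits, and K to consist of the first variable, and it computes degrees by brute force. The helper names are hypothetical.

```python
from itertools import combinations, product


def degree(func, n):
    """Smallest k such that implicants of k literals cover every 1-input."""
    points = list(product((0, 1), repeat=n))
    ones = [x for x in points if func(x)]

    def forces_one(x, S):
        # The literals of x on the variables in S force func to 1.
        return all(func(y) for y in points if all(y[i] == x[i] for i in S))

    for k in range(n + 1):
        if all(any(forces_one(x, S) for S in combinations(range(n), k)) for x in ones):
            return k


def partition_degree(label, n):
    """Degree of the partition whose classes are the level sets of label."""
    values = {label(x) for x in product((0, 1), repeat=n)}
    return max(degree(lambda x, v=v: label(x) == v, n) for v in values)


def parity(x):
    return sum(x) % 2


n = 3
K_SIZE = 1  # K consists of the first variable
lhs = partition_degree(parity, n)
# Largest degree of the restricted partition over all settings of K.
best = max(
    partition_degree(lambda y, s=s: parity(s + y), n - K_SIZE)
    for s in product((0, 1), repeat=K_SIZE)
)
print(lhs, K_SIZE + best)  # prints: 3 3
```

For parity the bound holds with equality, since fixing one variable leaves the parity (or its complement) of the remaining two.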

PROOF. For each $\sigma \in$ Proj[K] let $\mathcal{F}_\sigma$ be a set of DNF formulas that represent

the characteristic functions of the equivalence classes in $A\lceil_\sigma$ and that have

maximum clause length bounded by $\delta(A\lceil_\sigma)$. To each clause of every formula

in $\mathcal{F}_\sigma$ append the conjunctive clause $C_\sigma$, which is true exactly on those inputs in

$\{0, 1\}^n$ that agree with $\sigma$, to obtain a set of formulas $\mathcal{F}'_\sigma$. By construction, the


References

Random Graphs

Parity, circuits and the polynomial time hierarchy

Almost optimal lower bounds for small depth circuits

∑11-Formulae on finite structures

Computational limitations of small-depth circuits
