The Parallel Evaluation of General Arithmetic Expressions
RICHARD P. BRENT
Australian National University, Canberra, Australia
ABSTRACT. It is shown that arithmetic expressions with n ≥ 1 variables and constants; operations of addition, multiplication, and division; and any depth of parenthesis nesting can be evaluated in time 4 log2 n + 10(n - 1)/p using p ≥ 1 processors which can independently perform arithmetic operations in unit time. This bound is within a constant factor of the best possible. A sharper result is given for expressions without the division operation, and the question of numerical stability is discussed.
KEY WORDS AND PHRASES: arithmetic expressions, compilation of arithmetic expressions, computational complexity, general arithmetic expressions, numerical stability, parallel computation, code optimization

CR CATEGORIES: 4.12, 5.11, 5.25
1. Introduction
The question of how quickly arithmetic expressions can be evaluated on a computer with
several independent arithmetic processors is of theoretical and practical interest. In this
paper we determine the answer to within a constant multiplicative factor (see Corollary
2 in Section 4). All our proofs are constructive, and reasonably efficient algorithms for
compiling expressions for subsequent execution on a parallel computer may be derived
from our proofs. These algorithms compare favorably with those given in [1, 2].
We assume that a number of processors are available and that each can perform an
arithmetic operation (addition, multiplication, and sometimes division) in unit time.
The time required for accessing data, storing results, communicating between processors,
etc., is ignored. Also, the effect of rounding errors is neglected, except in Section 5. The
results hold for exact arithmetic with expressions over any commutative field.
Several special cases have been considered previously. For example, Maruyama [14] and Munro and Paterson [19] have shown that polynomials of degree n can be evaluated in time log2 n + O((log2 n)^(1/2)) if sufficiently many processors are available, and Brent [3] has shown that this is true for expressions of the form a0 + x1(a1 + x2(a2 + ... (a_{n-1} + a_n x_n) ... )). Baer and Bovet [1] and Muraoka [20] considered expressions with n distinct variables and operations of addition and multiplication over a commutative ring. It has recently been shown in [5] that such expressions can be evaluated in time 2.465 log2 n if sufficiently many processors are available. (For results that apply if a fixed number of processors is available, see Section 5.) Kuck and Maruyama [12] have shown that continued fractions of the form

    b0 + a1/(b1 + a2/( ... (b_{n-1} + a_n/b_n) ... ))

can be evaluated in time 2 log2 n + O(1). Kuck [10], Maruyama [15], and Muraoka [20] have considered expressions with a limited depth of parenthesis nesting and/or a limited number of divisions. See also [6, 8, 9, 13, 18] and the references given there.
Copyright © 1974, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.
Author's address: Computer Centre, Australian National University, P.O. Box 4, Canberra, A.C.T. 2600, Australia.
Journal of the Association for Computing Machinery, Vol. 21,
No. 2, April 1974, pp. 201-206.

Our results (Corollary 1 and Theorem 2) show that parallelism may be used to speed up the evaluation of large arithmetic expressions. Knuth [7] has shown that most expressions which occur in real FORTRAN programs have only a small number of operands. Nevertheless, our results (or the method used to obtain them) may ultimately be of practical value, for Kuck [11] has shown that an optimizing compiler for a parallel machine might generate large expressions when compiling programs like those studied by Knuth [7].
In this paper we assume commutativity, but Maruyama [16] has recently extended
some of our results to expressions over noncommutative rings (e.g. rings of matrices).
2. Notation and Assumptions
We consider well-formed arithmetic expressions with the operations addition ("+"), multiplication ("*"), and division ("/"); any level of parenthesis nesting; and distinct indeterminates (or "atoms") x1, x2, ... over a commutative field. We neglect the subtraction operation because expressions containing it can easily be transformed into equivalent expressions with "+", "*", "/" and (at most) some unary minus signs acting on atoms, e.g.

    a - (b + c/(d - e) - f) = a + ((-b) + c/((-d) + e) + f).
The restriction to expressions with distinct atoms means that we do not consider expressions such as

    a + x(b + x(c + x)),  a + 1/(b + 1/(c + 1/d)),  and  x^100.

However, our results give upper bounds on the time required to evaluate such expressions, because they apply to the more general expressions

    a + x1(b + x2(c + x3)),  a + u1/(b + u2/(c + u3/d)),  and  x1 x2 ... x100,

respectively. For further discussion and examples, see [5].
If E is an arithmetic expression then |E| denotes the number of atoms (relabeled if necessary to become distinct) in E. If T is a parse tree for E then |T| = |E| is the number of terminal nodes of T. If |T| > 1 we write T = LθR, where L and R are the maximal proper subtrees of T and θ is the operation at the root. A subexpression of E is the expression corresponding to a subtree (not necessarily proper) of a parse tree for E.

If r is a real number then ⌈r⌉ denotes the integer satisfying r ≤ ⌈r⌉ < r + 1.
3. Main Theorem
Theorem 1 states slightly more than we use subsequently, but the stronger statement is necessary so that the result may be proved by induction. The most interesting consequences of the theorem are stated in Corollaries 1 and 2 (Section 4).

We first state, without proof, a trivial but useful lemma.

LEMMA 1. If 1 ≤ m ≤ n and T is a binary tree with |T| = n, then there is a subtree X1 = L1θ1R1 of T such that |X1| ≥ m, |L1| < m, and |R1| < m. Also, if x is one of the terminal nodes of T, there is a subtree X2 = L2θ2R2 of T such that |X2| ≥ m and either
(1) x is a terminal node of L2 and |L2| < m, or
(2) x is a terminal node of R2 and |R2| < m.
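The first half of Lemma 1 is constructive: start at the root and repeatedly descend into a child that still has at least m leaves; since leaf counts only shrink on the way down, the walk stops at a node with at least m leaves whose children are both below m. A small Python sketch (tuple-encoded trees, an encoding assumed here for illustration; it requires m ≥ 2 so that the node found is internal):

```python
def leaves(t):
    """Number of terminal nodes of a tuple-encoded binary tree."""
    return 1 if isinstance(t, str) else leaves(t[1]) + leaves(t[2])

def split_subtree(t, m):
    """Find a subtree X = (op, L, R) with |X| >= m, |L| < m, |R| < m.

    Requires 2 <= m <= leaves(t).  The loop maintains leaves(t) >= m,
    so the first node both of whose children fall below m satisfies
    the lemma.
    """
    while True:
        _, l, r = t
        if leaves(l) >= m:
            t = l
        elif leaves(r) >= m:
            t = r
        else:
            return t

# ((x1 + x2) * (x3 + x4)) + x5 has n = 5 atoms; take m = ceil((n+1)/2) = 3.
tree = ('+', ('*', ('+', 'x1', 'x2'), ('+', 'x3', 'x4')), 'x5')
X = split_subtree(tree, 3)     # the '*' subtree: |X| = 4, |L| = |R| = 2
```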
THEOREM 1. Let E be any arithmetic expression with n (distinct) atoms and operations "+", "*", and "/" over a commutative field. Suppose that sufficiently many processors capable of performing "+" and "*" (but not necessarily "/") in unit time are available. Let P1(n) = 3(n - 1), P2(n) = max(0, 3n - 4), Q1(n) = max(0, 10n - 19), Q2(n) = max(0, 10n - 29), and

    k = n - 1               if n ≤ 2,
    k = ⌈4 log2(n - 1)⌉     if n ≥ 3.

Then (1) and (2) below hold:

(1) E = F/G, where F and G are expressions which can be evaluated simultaneously in time k - 2 with P1(n) processors and Q1(n) operations.

(2) If x is any atom of E, then E = (Ax + B)/(Cx + D), where A, B, C, and D are expressions which do not contain x and which can be evaluated simultaneously in time k with P2(n) processors and Q2(n) operations. (Note that some of A, ..., G may be identically 0 or 1.)
PROOF. By inspection, the result holds for n ≤ 4, so we assume that n = N ≥ 5 (so k ≥ 8). The proof is by induction on N. As inductive hypothesis we assume that parts (1) and (2) of the theorem hold for n < N.

We shall show that part (1) holds with n = N. Applying Lemma 1 with m = ⌈(n + 1)/2⌉ to a parse tree for E, we see that there is a subexpression X1 = L1θ1R1 of E such that

    |X1| ≥ (n + 1)/2,  |L1| ≤ n/2,  |R1| ≤ n/2,

and θ1 = "+", "*", or "/". From the definition of k, n ≤ 2^(k/4) + 1; so |L1| ≤ n/2 < 2^((k-4)/4) + 1, and similarly for R1. Thus, by part (1) of the inductive hypothesis, L1 = F1/G1 and R1 = F2/G2, where F1, G1, F2, and G2 can be evaluated simultaneously in time (k - 4) - 2 = k - 6 with P1(|L1|) + P1(|R1|) processors and Q1(|L1|) + Q1(|R1|) operations.

Now

    X1 = L1θ1R1 = (F1/G1) θ1 (F2/G2) = F3/G3,

where

    F3 = F1G2 + F2G1   if θ1 = "+",
    F3 = F1F2          if θ1 = "*",
    F3 = F1G2          if θ1 = "/";
and
    G3 = G1G2          if θ1 = "+" or "*",
    G3 = G1F2          if θ1 = "/".

Hence F3 and G3 can be evaluated in time k - 4.
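The case analysis for F3 and G3 is ordinary fraction arithmetic with the division deferred; a minimal sketch (Python, plain numbers standing in for the values of the subexpressions):

```python
def combine(op, f1, g1, f2, g2):
    """Given L = f1/g1 and R = f2/g2, return (f3, g3) with (L op R) = f3/g3.

    The products can all be formed in one parallel step and the sum (for
    the "+" case) in one more, which is why F3 and G3 are ready by time
    k - 4 when F1, G1, F2, G2 are ready at time k - 6.
    """
    if op == '+':
        return f1 * g2 + f2 * g1, g1 * g2
    if op == '*':
        return f1 * f2, g1 * g2
    if op == '/':
        return f1 * g2, g1 * f2
    raise ValueError(op)
```

For example, combine('+', 1, 2, 1, 3) returns (5, 6), i.e. 1/2 + 1/3 = 5/6; no division is performed until the final step E = F/G.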
Let E1 be the expression formed by replacing X1 by an atom in E. Since |E1| = n + 1 - |X1| ≤ (n + 1)/2 < 2^((k-4)/4) + 1, part (2) of the inductive hypothesis (applied to E1) gives E = (A1X1 + B1)/(C1X1 + D1), where A1, B1, C1, and D1 can be evaluated simultaneously in time k - 4 with P2(|E1|) processors and Q2(|E1|) operations. Since X1 = F3/G3, it follows that E = F/G, where F = A1F3 + B1G3 and G = C1F3 + D1G3 can be evaluated in time k - 2.

Consider the number of processors required to compute F and G as above. In the first k - 6 steps we compute F1, G1, F2, G2 and start computing A1, B1, C1, and D1, using P1(|L1|) + P1(|R1|) + P2(|E1|) processors. From time k - 6 to k - 4 we compute F3 and G3 and finish computing A1, B1, C1, and D1, using 2 + P2(|E1|) processors. Finally, from time k - 4 to k - 2 we compute F and G, using four processors. Thus, the number of processors required is

    max[P1(|L1|) + P1(|R1|) + P2(|E1|), 2 + P2(|E1|), 4]
      = max[3(|L1| + |R1| + |E1|) - 10, 3(|L1| + |R1|) - 6, 3|E1| - 2, 4]
      ≤ 3(n - 1)
      = P1(n),

as |L1| + |R1| + |E1| = n + 1, |L1| + |R1| ≤ n, |E1| ≤ (n + 1)/2, and n ≥ 2.

Now consider the number of operations required to compute F and G as above. Since 3 ≤ (n + 1)/2 ≤ |X1| = |L1| + |R1|, the definition of Q1 gives Q1(|L1|) + Q1(|R1|) ≤ 10(|L1| + |R1|) - 29. Thus, the number of operations is at most

    10 + Q1(|L1|) + Q1(|R1|) + Q2(|E1|)
      ≤ max[10(|L1| + |R1| + |E1|) - 48, 10(|L1| + |R1|) - 19]
      ≤ 10n - 19 = Q1(n),

so part (1) holds with n = N.
To complete the proof, we must show that part (2) holds with n = N. Let x be an atom of E. Applying the second half of Lemma 1 with m = ⌈(n + 1)/2⌉ to a parse tree for E, we see that there is a subexpression X2 = L2θ2R2 of E such that |X2| ≥ (n + 1)/2; θ2 = "+", "*", or "/"; and either x is an atom of L2 and |L2| ≤ n/2, or x is an atom of R2 and |R2| ≤ n/2. We shall suppose that x is an atom of L2. (The proof is similar if x is an atom of R2.)

Let E2 be the expression formed by replacing X2 by an atom in E. Thus |E2| = n + 1 - |X2| ≤ (n + 1)/2 < 2^((k-4)/4) + 1, and part (2) of the inductive hypothesis (applied to E2) gives E = (A2X2 + B2)/(C2X2 + D2), where A2, B2, C2, and D2 can be evaluated simultaneously in time k - 4 with P2(|E2|) processors and Q2(|E2|) operations.

Similarly, L2 = (A3x + B3)/(C3x + D3), where A3, B3, C3, and D3 can be evaluated in time k - 4 with P2(|L2|) processors and Q2(|L2|) operations. Also, since |R2| ≤ n - 1, part (1) of the inductive hypothesis shows that R2 = F4/G4, where F4 and G4 can be evaluated in time k - 2 with P1(|R2|) processors and Q1(|R2|) operations.

From X2 = L2θ2R2 and the above expressions for E, L2, and R2, we find that E = (Ax + B)/(Cx + D), where

    A = (A2C3)F4 + (A2A3 + B2C3)G4   if θ2 = "+",
    A = (A2A3)F4 + (B2C3)G4          if θ2 = "*",
    A = (A2A3)G4 + (B2C3)F4          if θ2 = "/",

and B, C, and D are given by similar expressions. Thus A, B, C, and D can be evaluated in time k.

The number of processors required to compute A, ..., D simultaneously in time k is at most

    max[P2(|E2|) + P2(|L2|) + P1(|R2|), 8 + P1(|R2|)]
      = max[3(|E2| + |L2| + |R2|) - 11, 3(|L2| + |R2|) - 7, 3(|E2| + |R2|) - 7, 3|R2| + 5].

Since |E2| + |L2| + |R2| = n + 1, |L2| + |R2| ≤ n, |E2| + |R2| ≤ n, and n > 1, the number of processors required is at most 3n - 4 = P2(n) provided 3|R2| + 5 ≤ 3n - 4, i.e. provided |R2| ≤ n - 3. If |R2| = n - 2 or n - 1, the expressions for A, B, C, and D simplify, and a straightforward examination of cases shows that P2(n) processors suffice.

Similarly, if |E2| > 2 and |L2| > 2, the number of operations required is at most 28 + Q2(|E2|) + Q2(|L2|) + Q1(|R2|) ≤ 10n - 30 < Q2(n). If |E2| ≤ 2 or |L2| ≤ 2 or both, the expressions for A, B, C, and D simplify, and Q2(n) operations suffice. This completes the proof of part (2), so the theorem follows by induction on N.
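The displayed coefficient A for the case θ2 = "+" can be checked by exact rational arithmetic. In the sketch below (not from the paper), the formulas for B, C, and D are my own expansion of the "similar expressions" left implicit in the text, so they are an assumption; the assertion verifies that evaluating E directly agrees with evaluating (Ax + B)/(Cx + D).

```python
from fractions import Fraction
import random

rng = random.Random(1)

def rand_val():
    # random positive rationals keep every denominator nonzero
    return Fraction(rng.randint(1, 9), rng.randint(1, 9))

A2, B2, C2, D2 = (rand_val() for _ in range(4))
A3, B3, C3, D3 = (rand_val() for _ in range(4))
F4, G4, x = (rand_val() for _ in range(3))

# Direct evaluation for the case theta2 = "+":
# L2 = (A3*x + B3)/(C3*x + D3),  R2 = F4/G4,  X2 = L2 + R2,
# E = (A2*X2 + B2)/(C2*X2 + D2).
L2 = (A3 * x + B3) / (C3 * x + D3)
X2 = L2 + F4 / G4
E_direct = (A2 * X2 + B2) / (C2 * X2 + D2)

# Coefficients of E = (A*x + B)/(C*x + D).  A is the paper's formula;
# B, C, D follow the same pattern (hypothetical expansion).
A = (A2 * C3) * F4 + (A2 * A3 + B2 * C3) * G4
B = (A2 * D3) * F4 + (A2 * B3 + B2 * D3) * G4
C = (C2 * C3) * F4 + (C2 * A3 + D2 * C3) * G4
D = (C2 * D3) * F4 + (C2 * B3 + D2 * D3) * G4
E_coeff = (A * x + B) / (C * x + D)

assert E_direct == E_coeff
```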
4. Consequences of Theorem 1

We need the following lemma, which is of some independent interest.

LEMMA 2. If a computation C can be performed in time t with q operations and sufficiently many processors which perform arithmetic operations in unit time, then C can be performed in time t + (q - t)/p with p such processors.

PROOF. Suppose that s_i operations are performed at step i, for i = 1, 2, ..., t. Thus sum_{i=1}^{t} s_i = q. Using p processors, we can simulate step i in time ⌈s_i/p⌉. Hence, the computation C can be performed with p processors in time

    sum_{i=1}^{t} ⌈s_i/p⌉ ≤ (1 - 1/p)t + (1/p) sum_{i=1}^{t} s_i = t + (q - t)/p.
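The proof of Lemma 2 is directly executable: each step that performed s_i operations with unbounded processors becomes ⌈s_i/p⌉ steps on p processors. A small check with hypothetical step widths:

```python
import math

def simulated_time(step_ops, p):
    """Time to run a computation on p processors when step_ops[i]
    operations were performed at step i of the unbounded-processor run."""
    return sum(math.ceil(s / p) for s in step_ops)

# A hypothetical computation with t = 4 steps and q = 18 operations.
steps = [8, 6, 3, 1]
t, q, p = len(steps), sum(steps), 3
bound = t + (q - t) / p        # Lemma 2's bound: 4 + 14/3
```

Here simulated_time(steps, 3) = 3 + 2 + 1 + 1 = 7, within the bound 4 + 14/3.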
COROLLARY 1. Let E be as in Theorem 1 and suppose that p processors which can perform addition, multiplication, and division in unit time are available. Then E can be evaluated in time 4 log2 n + 10(n - 1)/p.

PROOF. Suppose that n ≥ 3, for otherwise the result is trivial. By Theorem 1, E = F/G, where F and G can be evaluated in time ⌈4 log2(n - 1)⌉ - 2 < 4 log2 n - 1 with less than 10(n - 1) operations. Applying Lemma 2 with t = ⌈4 log2(n - 1)⌉ - 2 and q = 10(n - 1), we see that F and G can be evaluated in time 4 log2 n - 1 + 10(n - 1)/p with p processors. Finally, E = F/G can be evaluated in one more unit of time. (Note that only one division is performed, so the result is easily modified if a division takes longer than an addition or multiplication.)

COROLLARY 2. Let τ(n, p) be the maximum time required to evaluate arithmetic expressions with n atoms, using p processors which can perform arithmetic operations in unit time. Let φ(n, p) = max(log2 n, (n - 1)/p). Then, for all n ≥ 1 and p ≥ 1, φ(n, p) ≤ τ(n, p) ≤ 14 φ(n, p).

PROOF. Consider the expression x1 + x2 + ... + xn. By a fan-in argument, its evaluation requires time at least log2 n. Also, at least n - 1 operations must be performed, so p processors require time at least (n - 1)/p. Hence, the lower bound on τ(n, p) is established. The upper bound follows from Corollary 1.
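The upper bound is immediate arithmetic: with φ(n, p) = max(log2 n, (n - 1)/p), Corollary 1's bound satisfies 4 log2 n + 10(n - 1)/p ≤ 4φ + 10φ = 14φ. A brute-force sanity check of this inequality (illustration only):

```python
import math

def phi(n, p):
    return max(math.log2(n), (n - 1) / p)

def corollary1_bound(n, p):
    return 4 * math.log2(n) + 10 * (n - 1) / p

# Each term of the bound is dominated by a multiple of phi, so the
# sum never exceeds 14 * phi.
for n in range(2, 200):
    for p in (1, 2, 7, 50, 1000):
        assert corollary1_bound(n, p) <= 14 * phi(n, p)
```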
5. Concluding Remarks

Corollary 2 establishes the complexity of parallel evaluation of general arithmetic expressions to within a constant factor. The constant 14 can doubtless be reduced by more refined arguments, and the lower bound for τ(n, p) can be improved slightly (see [5]).

The proof of Theorem 1 simplifies, and the constants can be reduced, if division is excluded. Corresponding to Corollary 1 we have the following, which is slightly weaker than Theorems 1 and 2 of [5] if p ≥ n, but much stronger if p is of order n or less.

THEOREM 2. Let E be any arithmetic expression with n (distinct) atoms and operations "+" and "*" over a commutative ring. If p processors which can perform "+" and "*" in unit time are available, then E can be evaluated in time 4 log2 n + 2(n - 1)/p.

A proof of Theorem 2 is given in [4], where we also show that, for real expressions and approximate arithmetic, the evaluation of E in the time given by Theorem 2 is numerically stable (in the sense that the computed result can be obtained by making small relative changes in the values assigned to the atoms and then performing exact arithmetic). Unfortunately, this result does not extend to expressions with division, and examples found by a program of Miller [17] show that the algorithm implied by the proof of Theorem 1 is not always numerically stable. Hence, it is an open question whether general arithmetic expressions can be evaluated stably in the time given by Corollary 1.
Acknowledgments. David Kuck and Kiyoshi Maruyama made several stimulating suggestions, without which this paper might not have been written. Webb Miller kindly verified the numerical instability mentioned above, and a referee's comments were useful in clarifying the proof of Theorem 1.
REFERENCES

1. BAER, J. L., AND BOVET, D. P. Compilation of arithmetic expressions for parallel computations. Proc. IFIP Congr. 1968, North-Holland Pub. Co., Amsterdam, pp. 340-346.
2. BEATTY, J. C. An axiomatic approach to code optimization for expressions. J. ACM 19, 4 (Oct. 1972), 613-640.
3. BRENT, R. P. On the addition of binary numbers. IEEE Trans. Comput. C-19 (Aug. 1970), 758-759.
4. BRENT, R. P. The parallel evaluation of arithmetic expressions in logarithmic time. Proc. Symposium on Complexity of Sequential and Parallel Numerical Algorithms (Carnegie-Mellon U., Pittsburgh, Pa., May 1973), Academic Press, New York, 1973, pp. 83-102.
5. BRENT, R. P., KUCK, D. J., AND MARUYAMA, K. M. The parallel evaluation of arithmetic expressions without division. IEEE Trans. Comput. C-22 (May 1973), 532-534.
6. HOBBS, L. C. (Ed.) Parallel Processor Systems, Technologies and Applications. Spartan Books, New York, 1970.
7. KNUTH, D. E. An empirical study of FORTRAN programs. Software: Practice and Experience 1 (April 1971), 105-133.
8. KOGGE, P. M. Parallel algorithms for the efficient solution of recurrence problems; the numerical stability of parallel algorithms for solving recurrence problems; and minimal parallelism in the solution of recurrence problems. Stanford Electronics Lab. Reps. 43-45, Sept. 1972.
9. KOGGE, P. M., AND STONE, H. S. A parallel algorithm for the efficient solution of a general class of recurrence equations. Rep. CS-72-298, Comput. Sci. Dep., Stanford U., Stanford, Calif., March 1972.
10. KUCK, D. J. Evaluating arithmetic expressions of n atoms and k divisions in a (log2 n + 2 log2 k) + c steps. Manuscript, March 1973.
