
Parallel Searching in Generalized Monge Arrays

01 Nov 1997-Algorithmica (Springer Science and Business Media LLC)-Vol. 19, Iss: 3, pp 291-317
Abstract: This paper investigates the parallel time and processor complexities of several searching problems involving Monge, staircase-Monge, and Monge-composite arrays. We present array-searching algorithms for concurrent-read-exclusive-write (CREW) PRAMs, hypercubes, and several hypercubic networks. All these algorithms run in near-optimal time, and their processor-time products are all within an \(O (\lg n)\) factor of the worst-case sequential bounds. Several applications of these algorithms are also given. Two applications improve previous results substantially, and the others provide novel parallel algorithms for problems not previously considered.

Summary (1 min read)

1. Introduction

  • Larmore and Przytycka in [30] reduce Huffman coding to the Concave Least Weight Subsequence (CLWS) problem (defined in Section 4.2) and then show how to solve CLWS, and thereby Huffman coding, in O(√n lg n) time using n processors on a CREW PRAM.
  • For this problem, the authors obtain (in Section 4.4) a CREW-PRAM algorithm that takes O(lg n) time and uses n processors.

2. CREW-PRAM Algorithms to Compute Row Minima in Staircase-Monge Arrays

  • In this section the authors give CREW-PRAM algorithms for computing row minima in staircase-Monge arrays.
  • From [3], the minima of At induce a partitioning of A such that certain regions can be omitted from further searching for row minima because of the Monge condition.
  • Thus, both µ1 and µ3 are bracketed by µ0, adding regions F3 and F2, respectively.
  • Thus, Atallah and Kosaraju [14] show how to find the row minima for an m × n Monge array in O(lg mn) time using m + n processors.
  • The authors first determine the minima in all the feasible Monge arrays.

3. Hypercube Algorithms

  • In this section the authors give an O(lg m lg n)-time mn-processor hypercube algorithm for the string editing problem.
  • Respectively, determine, for each vertex x of P: 1. the vertex of Q nearest to x that is not visible to x, 2. the vertex of Q farthest from x that is not visible to x, 3. the vertex of Q nearest to x that is visible to x, and 4. the vertex of Q farthest from x that is visible to x.

4. Applications

  • 1. The All Pairs Shortest Path Problem.
  • The following theorem is due to Aggarwal et al. [2].
  • Given a directed acyclic graph whose edge weights satisfy the Monge condition (or the inverse-Monge condition), the APSP problem can be solved in O(lg² n) time using n² processors on a CREW PRAM.
  • This decomposition of D reduces the computation of Dⁿ to lg n array multiplications and additions.
  • In this section the authors present the first NC algorithm for the Huffman coding problem that does o(n²) work.


Algorithmica (1997) 19: 291–317
© 1997 Springer-Verlag New York Inc.
Parallel Searching in Generalized Monge Arrays
A. Aggarwal,¹ D. Kravets,² J. K. Park,³ and S. Sen⁴
Abstract. This paper investigates the parallel time and processor complexities of several searching problems
involving Monge, staircase-Monge, and Monge-composite arrays. We present array-searching algorithms for
concurrent-read-exclusive-write (CREW) PRAMs, hypercubes, and several hypercubic networks. All these
algorithms run in near-optimal time, and their processor-time products are all within an O(lg n) factor of
the worst-case sequential bounds. Several applications of these algorithms are also given. Two applications
improve previous results substantially, and the others provide novel parallel algorithms for problems not
previously considered.
Key Words. Monge arrays, CREW-PRAM algorithms, Hypercubes.
1. Introduction
1.1. Background. An m × n array A = {a[i, j]} containing real numbers is called
Monge if, for 1 ≤ i < k ≤ m and 1 ≤ j < l ≤ n,

a[i, j] + a[k, l] ≤ a[i, l] + a[k, j].    (1.1)
We refer to (1.1) as the Monge condition. Monge arrays have many applications. In the
late eighteenth century, Monge [34] observed that if unit quantities (cannonballs, for
example) need to be transported from locations X and Y (supply depots) in the plane
to locations Z and W (artillery batteries), not necessarily respectively, in such a way
as to minimize the total distance traveled, then the paths followed in transporting these
quantities must not properly intersect. In 1961, Hoffman [24] elaborated upon this idea
and showed that a greedy algorithm correctly solves the transportation problem for m
sources and n sinks if and only if the corresponding m ×n cost array is a Monge array.
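Condition (1.1) is easy to test mechanically: it suffices to check adjacent index pairs, since the inequality for arbitrary i < k and j < l follows by summing the adjacent-pair inequalities. A minimal Python sketch (the function name and example array are ours, for illustration):

```python
def is_monge(a):
    """Test the Monge condition a[i][j] + a[k][l] <= a[i][l] + a[k][j].

    Checking every adjacent 2x2 submatrix suffices: the inequality for
    arbitrary i < k and j < l follows by summing adjacent-pair inequalities.
    """
    m, n = len(a), len(a[0])
    return all(
        a[i][j] + a[i + 1][j + 1] <= a[i][j + 1] + a[i + 1][j]
        for i in range(m - 1)
        for j in range(n - 1)
    )

# a[i][j] = i * (n - j) is Monge: the cross-difference (i - k)(l - j) <= 0.
monge = [[i * (4 - j) for j in range(4)] for i in range(4)]
print(is_monge(monge))             # True
print(is_monge([[0, 0], [0, 1]]))  # False: 0 + 1 > 0 + 0
```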
¹ IBM Research Division, T. J. Watson Research Center, Yorktown Heights, NY 10598, USA. aggarwa@watson.ibm.com.
² Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA. dina@cis.njit.edu. This author's research was supported by the NSF Research Initiation Award CCR-9308204, and the New Jersey Institute of Technology SBR Grant #421220. Part of this research was done while the author was at MIT and supported by the Air Force under Contract AFOSR-89-0271 and by the Defense Advanced Research Projects Agency under Contracts N00014-87-K-825 and N00014-89-J-1988.
³ Bremer Associates, Inc., 215 First Street, Cambridge, MA 02142, USA. james.park@bremer-inc.com. This author's work was supported in part by the Defense Advanced Research Projects Agency under Contract N00014-87-K-0825 and the Office of Naval Research under Contract N00014-86-K-0593 (while the author was a graduate student at MIT) and by the Department of Energy under Contract DE-AC04-76DP00789 (while the author was a member of the Algorithms and Discrete Mathematics Department of Sandia National Laboratories).
⁴ Department of Computer Science, Indian Institute of Technology, New Delhi, India. ssen@cse.iitd.ernet.in. Part of the work was done when the author was a summer visitor at IBM T. J. Watson Research Center.
Received July 25, 1994; revised March 5, 1996. Communicated by B. Chazelle.

More recently, Monge arrays have found applications in many other areas. Yao [37]
used these arrays to explain Knuth’s [28] efficient sequential algorithm for computing
optimal binary trees. Aggarwal et al. [4] showed that the all-farthest-neighbors problem
for the vertices of a convex n-gon can be solved in linear time using Monge arrays.
Aggarwal and Park [6] gave efficient sequential algorithms based on the Monge-array
abstraction for several problems in computational geometry and VLSI river routing.
Furthermore, many researchers [6], [31], [21], [22] have used Monge arrays to obtain
efficient dynamic programming algorithms for problems related to molecular biology.
More recently, Aggarwal and Park [9] have used Monge arrays to obtain efficient algo-
rithms for the economic-lot size model.
In many applications, the underlying array satisfies conditions that are similar but not
the same as in (1.1). An m × n array A is called inverse-Monge if, for 1 i < k m
and 1 j < l n,
a[i, j] + a[k, l] a[i, l] + a[k, j].
5
(1.2)
An m × n array S = {s[i, j]} is called staircase-Monge if
(i) every entry is either a real number or ∞,
(ii) s[i, j] = ∞ implies s[i, ℓ] = ∞ for ℓ > j and s[k, j] = ∞ for k > i, and
(iii) for 1 ≤ i < k ≤ m and 1 ≤ j < ℓ ≤ n, (1.1) holds if all four entries s[i, j], s[i, ℓ], s[k, j], and s[k, ℓ] are finite.
The definition of a staircase-inverse-Monge array is similar:
(i) every entry is either a real number or ∞,
(ii) s[i, j] = ∞ implies s[i, ℓ] = ∞ for ℓ < j and s[k, j] = ∞ for k > i, and
(iii) for 1 ≤ i < k ≤ m and 1 ≤ j < ℓ ≤ n, (1.2) holds if all four entries s[i, j], s[i, ℓ], s[k, j], and s[k, ℓ] are finite.
Observe that a Monge array is a special case of a staircase-Monge array. Finally, a p × q × r
array C = {c[i, j, k]} is called Monge-composite if c[i, j, k] = d[i, j] + e[j, k] for all
i, j, and k, where D = {d[i, j]} is a p × q Monge array and E = {e[j, k]} is a q × r
Monge array.
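Conditions (i)-(iii) translate directly into a membership test, with Python's `math.inf` standing in for ∞. A naive sketch (ours, not the paper's; it checks all quadruples rather than only adjacent ones, since ∞ entries invalidate the adjacent-pair shortcut that works for ordinary Monge arrays):

```python
import math

def is_staircase_monge(s):
    """Check the three staircase-Monge conditions on a 2-D list."""
    m, n = len(s), len(s[0])
    # (i) and (ii): infinities must propagate rightward and downward.
    for i in range(m):
        for j in range(n):
            if s[i][j] == math.inf:
                if j + 1 < n and s[i][j + 1] != math.inf:
                    return False
                if i + 1 < m and s[i + 1][j] != math.inf:
                    return False
    # (iii): the Monge condition on every all-finite quadruple.
    for i in range(m - 1):
        for k in range(i + 1, m):
            for j in range(n - 1):
                for l in range(j + 1, n):
                    quad = (s[i][j], s[i][l], s[k][j], s[k][l])
                    if math.inf not in quad and \
                            s[i][j] + s[k][l] > s[i][l] + s[k][j]:
                        return False
    return True

print(is_staircase_monge([[0, 1, math.inf],
                          [0, 0, math.inf]]))  # True
```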
Like Monge arrays, staircase-Monge arrays have also found applications in many
areas. Aggarwal and Park [6], Larmore and Schieber [31], and Eppstein et al. [21],
[22] use staircase-Monge arrays to obtain algorithms for problems related to molecular
biology. Aggarwal and Suri [10] used these arrays to obtain fast sequential algorithms
for computing the following largest-area empty rectangle problem: given a rectangle
containing n points, find the largest-area rectangle that lies inside the given rectangle, that
does not contain any points in its interior, and whose sides are parallel to those of the given
rectangle. Furthermore, Aggarwal and Klawe [3] and Klawe and Kleitman [27] have
demonstrated other applications of staircase-Monge arrays in computational geometry.
Finally, both Monge and Monge-composite arrays have found applications in parallel
computation. In particular, Aggarwal and Park [5] exploit Monge arrays to obtain efficient
CRCW- and CREW-PRAM algorithms for certain geometric problems, and they exploit
Monge-composite arrays to obtain efficient CRCW- and CREW-PRAM algorithms for
⁵ We refer to (1.2) as the inverse-Monge condition.

string editing and other related problems. (See also [12].) Similarly, Atallah et al. [15]
haveusedMonge-composite arraystoconstruct Huffmanand othersuchcodes onCRCW
and CREW PRAMs. Larmore and Przytycka in [30] used Monge arrays to solve the
Concave Least Weight Subsequence (CLWS) problem (defined in Section 4.2).
Unlike Monge and Monge-composite arrays, staircase-Monge arrays have not been
studied in a parallel setting (in spite of their immense utility). Furthermore, even for
Monge and Monge-composite arrays, the study of parallel array-search algorithms has
so far been restricted to CRCW and CREW PRAMs. In this paper we fill in these gaps
by providing efficient parallel algorithms for searching in Monge, staircase-Monge,
and Monge-composite arrays. We develop algorithms for the CREW-PRAM models
of parallel computation, as well as for several interconnection networks including the
hypercube, the cube-connected cycles, the butterfly, and the shuffle-exchange network.
Before we can describe our results, we need a few definitions which we give in the next
section.
1.2. Definitions. In this section we explain the specific searching problems we solve
and give the previously known results for these problems. The row-minima problem
for a two-dimensional array is that of finding the minimum entry in each row of the
array. (If a row has several minima, then we take the leftmost one.) In dealing with
Monge arrays we assume that for any given i and j, a processor can compute the
(i, j)th entry of this array in O(1) time. For parallel machines without global memory
we need to use a more restrictive model. The details of this model are given in later
sections. Aggarwal et al. [4] showed that the row-minima problem for an m ×n Monge
array can be solved in O(m + n) time, which is optimal. Also, Aggarwal and Park [5]
have shown that the row-minima problem for such an array can be solved in O(lg mn)
time on an (m + n)-processor CRCW PRAM, and in O(lg mn lg lg mn) time on an
((m + n)/lg lg mn)-processor CREW PRAM. Atallah and Kosaraju in [14] improved
this to O(lg mn) using m + n processors on a (weaker) EREW PRAM. Note that all the
algorithms dealing with finding row-minima in Monge and inverse-Monge arrays can
also be used to solve the analogously defined row-maxima problem for the same arrays.
In particular, if A = {a[i, j]} is an m × n Monge (resp. inverse-Monge) array, then
A′ = {a′[i, j] : a′[i, j] = −a[i, n − j + 1]} is an m × n Monge (resp. inverse-Monge)
array. Thus, solving the row-minima problem for A′ gives us row-maxima for A.
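The transform just described is mechanical; the sketch below (our names) pairs it with a naive leftmost-row-minima routine standing in for any of the row-minima algorithms above. Note the answer it returns for A is the rightmost maximum column, since leftmost positions in A′ correspond to mirrored columns of A:

```python
def naive_row_minima(a):
    """Column of the leftmost minimum in each row (O(mn) reference)."""
    return [min(range(len(row)), key=lambda j: row[j]) for row in a]

def row_maxima(a, row_minima=naive_row_minima):
    """Row maxima of A via row minima of A'[i, j] = -a[i, n - j + 1]
    (0-indexed: -a[i][n - 1 - j]), which is again Monge if A is."""
    n = len(a[0])
    a_prime = [[-row[n - 1 - j] for j in range(n)] for row in a]
    return [n - 1 - c for c in row_minima(a_prime)]

# A small Monge array: both adjacent 2x2 checks hold (7 <= 8, 4 <= 5).
a = [[3, 2, 1], [6, 4, 2]]
print(row_maxima(a))  # [0, 0]: each row's maximum sits in column 0
```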
Unfortunately, the row-minima and row-maxima problems are not interchangeable
when dealing with staircase-Monge and staircase-inverse-Monge arrays. Aggarwal and
Klawe [3] showed that the row-minima problem for an m ×n staircase-Monge array can
be solved in O((m +n) lg lg(m +n)) sequential time, and Klawe and Kleitman [27] have
improved the time bound to O(m + nα(m)), where α(·) is the inverse of Ackermann’s
function. However, if we wanted to solve the row-maxima problem (instead of the row-
minima problem) for an m ×n staircase-Monge array, then we could, in fact, employ the
sequential algorithm given in [4] and solve the row-maxima problem in O(m +n) time.
No parallel algorithms were known for solving the row-minima problem for staircase-
Monge arrays.
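A structural fact behind both the sequential and the parallel bounds is that in a Monge array the column of the leftmost row minimum never decreases as the row index increases, so each row's search range can be bracketed by the answers of the rows above and below it. A sequential divide-and-conquer sketch in that spirit (ours; an O(n lg m)-flavored illustration, not the linear-time algorithm of [4] nor any parallel algorithm from this paper):

```python
def monge_row_minima(a):
    """Leftmost row-minima columns of a Monge array.

    Solve the middle row by direct scan, then recurse on the rows above
    (minima in columns <= the middle answer) and below (columns >= it).
    """
    m, n = len(a), len(a[0])
    out = [0] * m

    def solve(top, bot, lo, hi):
        if top > bot:
            return
        mid = (top + bot) // 2
        best = min(range(lo, hi + 1), key=lambda j: a[mid][j])
        out[mid] = best
        solve(top, mid - 1, lo, best)
        solve(mid + 1, bot, best, hi)

    solve(0, m - 1, 0, n - 1)
    return out

a = [[i * (4 - j) for j in range(4)] for i in range(4)]
print(monge_row_minima(a))  # [0, 3, 3, 3] -- nondecreasing columns
```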
Given a p × q × r Monge-composite array, for 1 ≤ i ≤ p and 1 ≤ k ≤ r, the (i, k)th
tube consists of all those entries of the array whose first coordinate is i and whose third
coordinate is k. The tube-minima problem for a p × q × r Monge-composite array

is that of finding the minimum entry in each tube of the array. (If a tube has several
minima, then we take the one with the minimum second coordinate.) For sequential
computation, the result of [4] can be trivially used to solve the tube-minima problem in
O((p + r)q) time. Aggarwal and Park [5] and Apostolico et al. [12] have independently
shown that the tube-minima problem for an n × n × n Monge-composite array can
be solved in O(lg n) time using n²/lg n processors on a CREW PRAM, and, recently,
Atallah [13] has shown that this tube-minima problem can be solved in O(lg lg n) time
using n²/lg lg n processors on a CRCW PRAM. Both results are optimal with respect
to time and processor-time product. In view of the applications, we assume that the two
n × n Monge arrays D ={d[i,j]}and E ={e[j,k]}, that together form the Monge-
composite array, are stored in the global memory of the PRAM. Again, for parallel
machines without a global memory, we need to use a more restrictive model; the details
of this model are given later. No efficient algorithms (other than the one that simulates
the CRCW-PRAM algorithm) were known for solving the tube-minima problem for a
hypercube or a shuffle-exchange network.
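Concretely, the tube-minima problem asks, for every (i, k), for the j minimizing d[i, j] + e[j, k]. The naive O(pqr) sequential version below (ours) just makes the problem statement executable; it is the baseline the bounds above improve on:

```python
def tube_minima(d, e):
    """Tube minima of c[i][j][k] = d[i][j] + e[j][k].

    Returns, for each (i, k), a pair (minimum value, smallest minimizing j),
    matching the tie-breaking rule of taking the minimum second coordinate.
    """
    p, q, r = len(d), len(e), len(e[0])
    result = [[None] * r for _ in range(p)]
    for i in range(p):
        for k in range(r):
            best_j = min(range(q), key=lambda j: d[i][j] + e[j][k])
            result[i][k] = (d[i][best_j] + e[best_j][k], best_j)
    return result

print(tube_minima([[0, 1], [1, 0]], [[0, 2], [1, 0]]))
# [[(0, 0), (1, 1)], [(1, 0), (0, 1)]]
```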
1.3. Our Main Results. The time and processor complexities of algorithms for computing row minima in two-dimensional Monge, row minima in two-dimensional staircase-Monge arrays, and tube minima in three-dimensional Monge-composite arrays are listed
in Tables 1.1, 1.2, and 1.3, respectively. We assume a normal model of hypercube com-
putation, in which each processor uses only one of its edges in a single time step, only
one dimension of edges is used at any given time step, and the dimension used at time
step t + 1 is within 1 modulo d of the dimension used at time step t, where d is the
dimension of the hypercube (see Section 3.1.3 of [32]). It is known that such algorithms
for the hypercube can be implemented on other hypercubic bounded-degree networks
like the butterfly and the shuffle-exchange without asymptotic slow-down. Observe that our
results for staircase-Monge arrays match the corresponding bounds for Monge arrays.
Following are some applications of these new array-searching algorithms.
1. All Pairs Shortest Path (APSP) Problem. Consider the following problem: given a
weighted directed graph G = (V, E), |V| = n, |E| = m, we want to find the shortest
path between every pair of vertices in V. In the sequential case, Johnson [26] gave an
O(n² lg n + mn)-time algorithm for APSP. In the parallel case, APSP can be solved by
repeated squaring in O(lg² n) time using n³/lg n processors on a CREW PRAM. Atallah
et al. [15] show how to solve APSP in O(lg² n) time using n³/lg n processors on a CREW
PRAM (this solution follows from their O(lg² n)-time (n²/lg n)-processor solution to
the single source shortest paths problem on such a graph). In Section 4.1 we give the
algorithm of Aggarwal et al. [2] which runs in O(lg² n) time using n² CREW-PRAM
Table 1.1. Row-minima results for an n × n Monge array.
Model Time Processors Reference
CREW PRAM O(lg n) n [14]
Hypercube O(lg n lg lg n) n Theorem 3.2

Table 1.2. Row-minima results for an n × n staircase-Monge array.
Model Time Processors Reference
CREW PRAM O(lg n) n Theorem 2.3
Hypercube O(lg n lg lg n) n Theorem 3.4
processors for the special case of the APSP problem when the graph is acyclic and the
edge weights satisfy the quadrangle inequality.⁶
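The repeated-squaring approach mentioned above computes Dⁿ under (min, +) array multiplication: since a shortest path uses at most n − 1 edges, ⌈lg n⌉ squarings of the weight array suffice. A sequential sketch of the idea (ours; it assumes no negative cycles):

```python
INF = float("inf")

def min_plus_square(d):
    """One (min, +) 'multiplication' of the n x n array d with itself."""
    n = len(d)
    return [[min(d[i][j] + d[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]

def apsp_repeated_squaring(w):
    """All-pairs shortest path distances from a weight array w
    (w[i][j] = INF if there is no edge), by lg n repeated squarings."""
    n = len(w)
    d = [[0 if i == j else w[i][j] for j in range(n)] for i in range(n)]
    paths = 1
    while paths < n - 1:   # d is correct for paths of <= `paths` edges
        d = min_plus_square(d)
        paths *= 2
    return d

w = [[0, 1, INF], [INF, 0, 1], [INF, INF, 0]]
print(apsp_repeated_squaring(w)[0][2])  # 2, via the middle vertex
```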
2. Huffman Coding Problem. Consider the following problem: given an alphabet C
of n characters and the function f_i indicating the frequency of character c_i ∈ C in a
file, construct a prefix code which minimizes the number of bits needed to encode the
file, i.e., construct a binary tree T such that each leaf corresponds to a character in the
alphabet and the weight of the tree, W(T), is minimized, where

W(T) = Σ_{i=1}^{n} f_i d_i,    (1.3)

and d_i is the depth in T of the leaf corresponding to character c_i. The weight of the tree
W(T) is exactly the minimum number of bits needed to encode the file (see [18]). The
construction of such an optimal code (which is called a Huffman code) is a classical
problem in data compression. In the sequential domain, Huffman in [25] showed how to
construct Huffman codes greedily in O(n) time (once the character frequencies are in
sorted order). In [15], Atallah et al. reduced Huffman coding to O(lg n) tube minimization
problems on Monge-composite arrays, thereby obtaining parallel algorithms for Huffman
coding that run in O(lg² n) time using n²/lg n processors on a CREW PRAM and in
O(lg n (lg lg n)²) time using n²/(lg lg n)² processors on a CRCW PRAM. Larmore and
Przytycka in [30] reduce Huffman coding to the Concave Least Weight Subsequence
(CLWS) problem (defined in Section 4.2) and then show how to solve CLWS, and thereby
Huffman coding, in O(√n lg n) time using n processors on a CREW PRAM. Theirs is the
first known parallel algorithm for Huffman coding requiring o(n²) work. In Section 4.2
we present the result of Czumaj [20] for finding the Huffman code in O(lg^{r+1} n) time
and a total of O(n² lg^{2r} n) work on a CREW PRAM, for any r ≥ 1. This is the first
NC algorithm that achieves o(n²) work.
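For contrast with these parallel bounds, the sequential greedy construction is short: repeatedly merge the two smallest weights, and the sum of all merge costs equals W(T) from (1.3), because each merge charges one extra level of depth to every leaf beneath it. A heap-based sketch (ours; O(n lg n), and O(n) once the frequencies are sorted, as noted above):

```python
import heapq

def huffman_weight(freqs):
    """Weight W(T) = sum of f_i * d_i of an optimal Huffman-code tree.

    Each merge of the two smallest current weights adds their sum to the
    total, which charges one unit of depth to every leaf under the merge.
    """
    heap = list(freqs)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        x = heapq.heappop(heap)
        y = heapq.heappop(heap)
        total += x + y
        heapq.heappush(heap, x + y)
    return total

print(huffman_weight([1, 2, 3]))  # 9: depths 2, 2, 1 give 2 + 4 + 3
```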
Table 1.3. Tube-minima results for an n × n × n Monge-composite array.
Model Time Processors Reference
CREW PRAM O(lg n) n²/lg n [5], [12]
Hypercube O(lg n) n² Theorem 3.5
⁶ Given an ordering of the vertices of a graph, the quadrangle inequality states that any four distinct vertices
appearing in increasing order in that ordering, i_1, i_2, j_1, and j_2, must satisfy d(i_1, j_1) + d(i_2, j_2) ≥ d(i_1, j_2) + d(i_2, j_1). In other words, in the quadrangle formed by i_1 i_2 j_1 j_2, the sum of the diagonals is greater than the sum of the sides. Notice that this condition is the same as (1.2) and they both appear in the literature.

Citations
More filters
01 Jan 2013
TL;DR: This chapter contains an extensive discussion of dynamic programming speedup, with a focus on dynamic programming, online algorithms, and work functions.
Abstract: This is an overview over dynamic programming with an emphasis on advanced methods. Problems discussed include path problems, construction of search trees, scheduling problems, applications of dynamic programming for sorting problems, server problems, as well as others. This chapter contains an extensive discussion of dynamic programming speedup. There exist several general techniques in the literature for speeding up naive implementations of dynamic programming. Two of the best known are the Knuth-Yao quadrangle inequality speedup and the SMAWK/LARSCH algorithm for finding the row minima of totally monotone matrices. The chapter includes “ready to implement” descriptions of the SMAWK and LARSCH algorithms. Another focus is on dynamic programming, online algorithms, and work functions.

9 citations

Journal ArticleDOI
TL;DR: This paper reformulates the above shortest-path problems in terms of a dynamic programming scheme involving falling staircase anti-Monge weight-arrays, and provides an O(nlogn) time and Θ(n) space algorithm to solve the following one-dimensional dynamic programming recurrence.
Abstract: Given an n-vertex convex polygon, we show that a shortest Hamiltonian path visiting all vertices without imposing any restriction on the starting and ending vertices of the path can be found in O(nlogn) time and Θ(n) space. The time complexity increases to O(nlog2 n) for computing this path inside an n-vertex simple polygon. The previous best algorithms for these problems are quadratic in time and space. For our purposes, we reformulate the above shortest-path problems in terms of a dynamic programming scheme involving falling staircase anti-Monge weight-arrays, and, in addition, we provide an O(nlogn) time and Θ(n) space algorithm to solve the following one-dimensional dynamic programming recurrence $$E[i] = \min _{1\le j\le k}\min _{k\le i} \{V[k-1] + b(i,j) + c(j,k)\},\quad i=1, \dots,n,$$ where V[0] is known, V[k], for k=1,…,n, can be computed from E[k] in constant time, and B={b(i,j)} and C={c(j,k)} are known falling staircase anti-Monge weight-arrays of size n×n.

2 citations

Posted Content
TL;DR: In this article, a deterministic parallel all-pairs shortest paths algorithm for real-weighted directed graphs was presented, which has O(n/d)$-work and O(tilde{O}(d)depth.
Abstract: In this paper we show a deterministic parallel all-pairs shortest paths algorithm for real-weighted directed graphs. The algorithm has $\tilde{O}(nm+(n/d)^3)$ work and $\tilde{O}(d)$ depth for any depth parameter $d\in [1,n]$. To the best of our knowledge, such a trade-off has only been previously described for the real-weighted single-source shortest paths problem using randomization [Bringmann et al., ICALP'17]. Moreover, our result improves upon the parallelism of the state-of-the-art randomized parallel algorithm for computing transitive closure, which has $\tilde{O}(nm+n^3/d^2)$ work and $\tilde{O}(d)$ depth [Ullman and Yannakakis, SIAM J. Comput. '91]. Our APSP algorithm turns out to be a powerful tool for designing efficient planar graph algorithms in both parallel and sequential regimes. One notable ingredient of our parallel APSP algorithm is a simple deterministic $\tilde{O}(nm)$-work $\tilde{O}(d)$-depth procedure for computing $\tilde{O}(n/d)$-size hitting sets of shortest $d$-hop paths between all pairs of vertices of a real-weighted digraph. Such hitting sets have also been called $d$-hub sets. Hub sets have previously proved especially useful in designing parallel or dynamic shortest paths algorithms and are typically obtained via random sampling. Our procedure implies, for example, an $\tilde{O}(nm)$-time deterministic algorithm for finding a shortest negative cycle of a real-weighted digraph. Such a near-optimal bound for this problem has been so far only achieved using a randomized algorithm [Orlin et al., Discret. Appl. Math. '18].
References
More filters
Proceedings ArticleDOI
01 Jan 1990
TL;DR: In this paper, an on-line two-dimensional dynamic programming algorithm for the prediction of RNA secondary structure is presented. But the complexity of the algorithm is not the same as the one presented in this paper.
Abstract: An on-line problem is a problem where each input is available only after certain outputs have been calculated. The usual kind of problem, where all inputs are available at all times, is referred to as an off-line problem. We present an efficient algorithm for Waterman's problem, an on-line two-dimensional dynamic programming problem that is used for the prediction of RNA secondary structure. Our algorithm uses as a module an algorithm for solving a certain on-line one-dimensional dynamic programming problem. The time complexity of our algorithm is n times the complexity of the on-line one-dimensional dynamic programming problem. For the concave case, we present a linear time algorithm for on-line searching in totally monotone matrices which is a generalization of the on-line one-dimensional problem. This yields an optimal O(n2) time algorithm for the on-line two-dimensional concave problem. The constants in the time complexity of this algorithm are fairly small, which make it practical. For the convex case, we use an O(nα(n)) time algorithm for the on-line one-dimensional problem, where α(·) is the functional inverse of Ackermann's function. This yields an O(n2α(n)) time algorithm for the on-line two-dimensional convex problem. Our techniques can be extended to solve the sparse version of Waterman's problem. We obtain an O(n + h log min {h, n 2 h }) time algorithm for the sparse concave case, and an O(n + hα(h)) log min {h, n 2 h }) time algorithm for the sparse convex case, where h is the number of possible base pairs in the RNA structure. All our algorithms improve on previously known algorithms.

79 citations

Proceedings ArticleDOI
01 Mar 1989
TL;DR: Several new paradigms for improving the processor efficiency for dynamic programming problems are presented and an O(log n) time, n processor parallel algorithm is given for the general tree construction problem and a nearly optimal binary search tree is given.
Abstract: An O(log ~ n) time, n2 / logn processor as well as an O(log n) time, n3/log n processor CREW deterministic parallel algorithms are presented for constructing Huffman codes from a given list of frequences. The time can be reduced to O(log n(loglog n) 2) on an CRCW model, using only n2/(log log n) 2 processors. Also presented is an optimal O(log n) time, O(n/ log n) processor EREW parallel algorithm for constructing a tree given a list of leaf depths when the depths are monotonic. An O(log 2 n) time, n processor parallel algorithm is given for the general tree construction problem. We also give an O(log 2 n) time n2/ log2n processor algorithm which finds a nearly optimal binary search tree. An O(log 2 n) time n 2'36 processor algorithm for recognizing linear context free languages is given. A crucial ingredient in achieving those bounds is a formulation of these problems as multiplications of special matrices which we call concave matrices. The structure of these matrices makes their parallel multiplication dramatically more efficient than that of arbitrary matrices. *Depar tment of Computer Science, Purdue University. Supported by the Office of Naval Research under Grants N00014-84K-0502 and N00014-86-K-0689, and the National Science Foundation under Grant DCR-8451393, with matchlng funds from AT&T. tDepar tment of Computer Science, Johns Hopkins University. Suppor ted by National Science Foundation through grant CCR-88-04284 tICS, UC Irvine. §School of Compute r Science, CMU and Depar tment of Computer Science, USC. Supported by National Science Foundation through grant CCR-87-13489. I'ermission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. 
'1'o copy otherwise, or to republish, requires a fee and/or specific permission. ~,:~ 1989 ACM 0-89791-323-X/89/0006/0421 $1.50 421 1 I n t r o d u c t i o n In this paper we present several new parallel algorithms. Each algorithm uses substantially fewer processors than used in previously known algorithms. The four problems considered are: The Tree Construction Problem, The Huffman Code Problem, The Linear Context Free Language Recognition Problem, and The Optimal Binary Search Tree Problem. In each of these problems the computational expensive part of the problem is finding the associated tree. We shall show that these trees are not arbitrary trees but are special. We take advantage of the special form of these trees to decrease the number of processors used. All of the problems we consider in this paper, as well as many other problems, can be performed in sequential polynomial time using Dynamic Programming. Arc algorithms for each of these problems can be obtained by parallelization of Dynamic Programming. Unfortunately, this approach produces parallel algorithms which use O(n 6) or more processors. An algorithm which increases the work performed from O(n) or O(n 2) to O(n 6) is not of much practical value. In this paper we present several new paradigms for improving the processor efficiency for dynamic programming problems. For all the problems considered a tree or class of trees is given implicitly and the algorithm must find one such tree. The construction of optimal codes is a classical problem in communication. Let ~ = {0, 1 .... , o" 1} be an alphabet. A code £ = {cl . . . . . cn} over E is a finite nonempty set of distinct finite sequences over ~, Each sequence ci is called code word. A code C is a prefix code if no code-word in C is a prefix of another code-word. A message over. C is a word resulting from the concatenation of code words from d. We assume the words over a source alphabet a l , . . . 
, a n are to be transmitted over a communication channel which can transfer one symbol of ~ per unit of time, and the probability of appearance of ai is Pi C ~ . The H u f f m a n Cod ing P r o b l e m is to construct a prefix code C =: {c l , . . . , cn E ~*} such f g . that the average word length Ei=lp, •Icil is minimum, where Ici] is the length of ci. It is easy to see that prefix codes have the nice property that a message can be decomposed in code word in only one waythey are uniquely decipherable. It is interesting to point out that Kraft and McMillan proved that for any code which is uniquely decipherable there is always a prefix code with the same average word length [13]. In 1952, IIuffman [9] gave an elegant sequential algorithm which can generate an optimal prefix code in O(n log n) time. If the probabilities are presorted then his algorithm is actually linear time [11]. Using parallel dynamic programming, Kosaraju and Teng [18], independently, gave the first A/'C algorithm for the IIuffman Coding Problem. However, b~th constructions use n e processors. In this paper, we first show how to reduce the processor count to n s, while using O(log n) time, by showing that we may assume that the tree associated with the prefix code is left-justified (to be defined in Section 2). The n 3 processor count arises from the fact that we are multiplying n x n matrices over a closed semiring. We reduce the processor count still further to n2/log n by showing that, after suitable modification, the matrices which are multiplied are concave (to be defined later). The structure of these matrices makes their parallel multiplication dramatically more efficient than that of arbitrary matrices. An O(logn log log n) time nZ/log n processor CREW algorithm is presented for multiplying them. Also given is an O((loglogn) 2) time, n2/log log n processor CRCW algorithm for multiplying two concave" matrices 1. 
The algorithm for construction of a ttuffman code still uses n 2 processors, which is probably too large for practical consideration since Huffman's algorithm only takes O(n log n) sequential time. Shannon and Fano gave a code, the Shannon-Fano Code, which is only one bit off from optimal. That is, the expected length o fa Shannon-Fano code word is at most one bit longer than the Huffman code word. The construction of the Shannon-Fano Code reduces to the following Tree C o n s t r u c t i o n P r o b l e m , Def in i t ion 1.1 (Tree C o n s t r u c t i o n P r o b l e m ) Given n integer values ll , . . . . ln, construct an ordered Mnary tree with n leaves whose levels when read form left to right are 11,..., 1,. 1Independently, [1] and [2] improved the CREW algorithm results by showing that two concave matrices can be rnultiplied in O(logn) time, using n2/logn CREW PRAM processors. Also, [2] improved the CRCW algorithm by reducing the number ofCRCW PRAM processors required to n2/(log log n) 2. We give an O(log 2 n) time, n processor EREW PRAM parallel algorithm for the tree construction problem. In the case when ll, .. •, 1, are monotonic, we give an O(logn) time and n / l o g n processor EREW PRAM parallel algorithm. In fact, trees where the level of the leaves are monotone will be used for both constructing Huffman Codes and Shannon-Fano Codes. Using our solution of the tree construction problem we get an O(logn) time n / logn processor EREW PRAM algorithm for constructing ShannonFano Codes. We also consider the problem of parallel constructing optimal binary search trees as defined by Knuth [10]. The best known NC algorithm for this problem is the parallelization of dynamic programming which uses n 6 processors. In this paper, using the new concave matrix multiplication algorithm, we show how to compute nearly optimal binary search tree in O(log 2 n) time using n2/ logn processors. 
Our search trees are off from optimal only by an additive amount of 1/n^k, for any fixed k. Finally, we consider recognition of linear context-free languages. A CFL is said to be linear if all productions are of the form A → bB, A → Bb, or A → a, where A and B are nonterminal variables and a and b are terminal variables. It is well known from Ruzzo [17] that the general CFL recognition problem can be performed on a CRCW PRAM in O(log n) time using n^6 processors, again by parallelization of dynamic programming. By observing that the parse tree of a linear context-free language has a very restricted form, we construct an O(n^3) processor, O(log^2 n) time CREW PRAM algorithm for it. Using the fact that we are doing Boolean matrix multiplication, we can reduce the processor count to n^2.376.

2 Preliminaries

Throughout this paper a tree will be a rooted tree. It is ordered if the children of each node are ordered from left to right. The level of a node in a tree is its distance from the root. A binary tree T is complete at level l if there are 2^l nodes in T at level l. A binary tree is empty at level l if there is no vertex at level l. A binary tree T is a left-justified tree if it satisfies the following property:

1. if a vertex has only one child, then it is a left child;

2. if u and v are sibling nodes of T, where u is to the left of v, then if T_v is not empty at some level l,

63 citations


"Parallel Searching in Generalized M..." refers methods in this paper

  • ...Notice that the entries of D above the diagonal obey the Monge condition....


  • ...This was subsequently improved to O(m + nα(m)) time by Klawe and Kleitman [27]....


Journal ArticleDOI
Alok Aggarwal, Maria Klawe
TL;DR: If P and Q are nonintersecting n- and m-vertex convex polygons, respectively, the methods given yield an O((m+n) log log n) algorithm for finding, for each vertex x of P, the farthest vertex of Q which is not visible to x, and the nearest vertex of Q which is not visible to x.
Abstract: This paper introduces a generalization of totally monotone matrices, namely totally monotone partial matrices, shows how a number of problems in computational geometry can be reduced to the problem of finding the row maxima and minima in totally monotone partial matrices, and gives an O((m+n) log log n) algorithm for finding row maxima and minima in an n×m totally monotone partial matrix. In particular, if P and Q are nonintersecting n- and m-vertex convex polygons, respectively, our methods give an O((m+n) log log n) algorithm for finding, for each vertex x of P, the farthest vertex of Q which is not visible to x, and the nearest vertex of Q which is not visible to x.
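The partial-matrix algorithm of this abstract is involved, but the basic fact it builds on is simple: in a totally monotone matrix the positions of the leftmost row minima are nondecreasing down the rows. That alone already gives a short divide-and-conquer for full (non-partial) matrices. A sketch under that assumption, not the paper's O((m+n) log log n) method (the function name is illustrative):

```python
def row_minima(M):
    """Leftmost row-minimum positions in a totally monotone matrix M,
    given as a list of rows.  Total monotonicity forces these positions
    to be nondecreasing down the rows, so the middle row's minimum
    splits the column range; overall O((m + n) log m) comparisons."""
    m, n = len(M), len(M[0])
    pos = [0] * m

    def solve(top, bottom, left, right):
        if top > bottom:
            return
        mid = (top + bottom) // 2
        # Scan the allowed column window for the middle row's leftmost minimum.
        best = min(range(left, right + 1), key=lambda j: M[mid][j])
        pos[mid] = best
        solve(top, mid - 1, left, best)      # rows above look no further right
        solve(mid + 1, bottom, best, right)  # rows below look no further left

    solve(0, m - 1, 0, n - 1)
    return pos
```

The SMAWK algorithm does the same job in linear time; the point here is only the monotone structure of the answer positions.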

56 citations

Journal ArticleDOI
TL;DR: A modified algorithm is presented that reduces the processor requirement to O(n^6/log^5 n) while maintaining the same time complexity of O(log^2 n).
Abstract: Recurrence formulations for various problems, such as finding an optimal order of matrix multiplication, finding an optimal binary search tree, and optimal triangulation of polygons, assume a similar form. A. Gibbons and W. Rytter (1988) gave a CREW PRAM algorithm to solve such dynamic programming problems. The algorithm uses O(n^6/log n) processors and runs in O(log^2 n) time. In this article, a modified algorithm is presented that reduces the processor requirement to O(n^6/log^5 n) while maintaining the same time complexity of O(log^2 n).
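The recurrences this abstract refers to share the form m[i][j] = min over i ≤ k < j of m[i][k] + m[k+1][j] + w(i, k, j); the PRAM algorithms above parallelize this scheme. For orientation, here is the standard sequential O(n^3) instance for matrix-chain ordering (illustrative only, not the parallel algorithm):

```python
def matrix_chain_cost(dims):
    """Minimum number of scalar multiplications to evaluate a chain of
    matrices, where matrix i has shape dims[i-1] x dims[i].  Implements
    the recurrence
        m[i][j] = min over i <= k < j of
                  m[i][k] + m[k+1][j] + dims[i-1]*dims[k]*dims[j]."""
    n = len(dims) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    for span in range(2, n + 1):          # subchain length, shortest first
        for i in range(1, n - span + 2):
            j = i + span - 1
            m[i][j] = min(m[i][k] + m[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                          for k in range(i, j))
    return m[1][n]
```

For shapes 10x20, 20x30, 30x40 the best order is ((A B) C), costing 18000 scalar multiplications.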

51 citations