Algorithmica (1997) 19: 291–317
© 1997 Springer-Verlag New York Inc.

Parallel Searching in Generalized Monge Arrays

A. Aggarwal,¹ D. Kravets,² J. K. Park,³ and S. Sen⁴

Abstract. This paper investigates the parallel time and processor complexities of several searching problems

involving Monge, staircase-Monge, and Monge-composite arrays. We present array-searching algorithms for

concurrent-read-exclusive-write (CREW) PRAMs, hypercubes, and several hypercubic networks. All these

algorithms run in near-optimal time, and their processor-time products are all within an O(lg n) factor of

the worst-case sequential bounds. Several applications of these algorithms are also given. Two applications

improve previous results substantially, and the others provide novel parallel algorithms for problems not

previously considered.

Key Words. Monge arrays, CREW-PRAM algorithms, Hypercubes.

1. Introduction

1.1. Background. An m × n array A = {a[i, j]} containing real numbers is called
Monge if, for 1 ≤ i < k ≤ m and 1 ≤ j < l ≤ n,

    a[i, j] + a[k, l] ≤ a[i, l] + a[k, j].    (1.1)

We refer to (1.1) as the Monge condition. Monge arrays have many applications. In the

late eighteenth century, Monge [34] observed that if unit quantities (cannonballs, for

example) need to be transported from locations X and Y (supply depots) in the plane

to locations Z and W (artillery batteries), not necessarily respectively, in such a way

as to minimize the total distance traveled, then the paths followed in transporting these

quantities must not properly intersect. In 1961, Hoffman [24] elaborated upon this idea

and showed that a greedy algorithm correctly solves the transportation problem for m

sources and n sinks if and only if the corresponding m ×n cost array is a Monge array.
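For concreteness, the Monge condition (1.1) can be verified by brute force. The following Python sketch is purely illustrative (it is not from the paper); it checks every pair of rows and every pair of columns:

```python
from itertools import combinations

def is_monge(a):
    """Brute-force check of the Monge condition (1.1):
    a[i][j] + a[k][l] <= a[i][l] + a[k][j] for all i < k, j < l.
    O(m^2 n^2) time; checking only adjacent rows and columns
    (all 2x2 contiguous submatrices) would in fact suffice."""
    m, n = len(a), len(a[0])
    return all(a[i][j] + a[k][l] <= a[i][l] + a[k][j]
               for i, k in combinations(range(m), 2)
               for j, l in combinations(range(n), 2))
```

For example, the array a[i][j] = (i − j)² satisfies (1.1), while the identity-like array [[1, 0], [0, 1]] violates it.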

¹ IBM Research Division, T. J. Watson Research Center, Yorktown Heights, NY 10598, USA. aggarwa@watson.ibm.com.
² Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA. dina@cis.njit.edu. This author's research was supported by the NSF Research Initiation Award CCR-9308204 and the New Jersey Institute of Technology SBR Grant #421220. Part of this research was done while the author was at MIT and supported by the Air Force under Contract AFOSR-89-0271 and by the Defense Advanced Research Projects Agency under Contracts N00014-87-K-825 and N00014-89-J-1988.
³ Bremer Associates, Inc., 215 First Street, Cambridge, MA 02142, USA. james.park@bremer-inc.com. This author's work was supported in part by the Defense Advanced Research Projects Agency under Contract N00014-87-K-0825 and the Office of Naval Research under Contract N00014-86-K-0593 (while the author was a graduate student at MIT) and by the Department of Energy under Contract DE-AC04-76DP00789 (while the author was a member of the Algorithms and Discrete Mathematics Department of Sandia National Laboratories).
⁴ Department of Computer Science, Indian Institute of Technology, New Delhi, India. ssen@cse.iitd.ernet.in. Part of the work was done when the author was a summer visitor at IBM T. J. Watson Research Center.

Received July 25, 1994; revised March 5, 1996. Communicated by B. Chazelle.


More recently, Monge arrays have found applications in many other areas. Yao [37]

used these arrays to explain Knuth’s [28] efﬁcient sequential algorithm for computing

optimal binary trees. Aggarwal et al. [4] showed that the all-farthest-neighbors problem

for the vertices of a convex n-gon can be solved in linear time using Monge arrays.

Aggarwal and Park [6] gave efﬁcient sequential algorithms based on the Monge-array

abstraction for several problems in computational geometry and VLSI river routing.

Furthermore, many researchers [6], [31], [21], [22] have used Monge arrays to obtain

efﬁcient dynamic programming algorithms for problems related to molecular biology.

More recently, Aggarwal and Park [9] have used Monge arrays to obtain efficient algorithms for the economic-lot-size model.

In many applications, the underlying array satisfies conditions that are similar to, but not the same as, (1.1). An m × n array A is called inverse-Monge if, for 1 ≤ i < k ≤ m and 1 ≤ j < l ≤ n,

    a[i, j] + a[k, l] ≥ a[i, l] + a[k, j].⁵    (1.2)

An m × n array S = {s[i, j]} is called staircase-Monge if

(i) every entry is either a real number or ∞,
(ii) s[i, j] = ∞ implies s[i, ℓ] = ∞ for ℓ > j and s[k, j] = ∞ for k > i, and
(iii) for 1 ≤ i < k ≤ m and 1 ≤ j < ℓ ≤ n, (1.1) holds if all four entries s[i, j], s[i, ℓ], s[k, j], and s[k, ℓ] are finite.

The definition of a staircase-inverse-Monge array is similar:

(i) every entry is either a real number or ∞,
(ii) s[i, j] = ∞ implies s[i, ℓ] = ∞ for ℓ < j and s[k, j] = ∞ for k > i, and
(iii) for 1 ≤ i < k ≤ m and 1 ≤ j < ℓ ≤ n, (1.2) holds if all four entries s[i, j], s[i, ℓ], s[k, j], and s[k, ℓ] are finite.
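The two staircase properties above are also easy to test directly. The sketch below (illustrative only, not from the paper) checks the staircase-Monge definition: infinities must propagate rightward and downward, and (1.1) must hold on every all-finite quadruple:

```python
from itertools import combinations

def is_staircase_monge(s, INF=float('inf')):
    """Check conditions (ii) and (iii) of the staircase-Monge
    definition: s[i][j] = INF forces INF to its right in row i and
    below it in column j, and the Monge inequality (1.1) holds
    whenever all four entries involved are finite."""
    m, n = len(s), len(s[0])
    for i in range(m):
        for j in range(n):
            if s[i][j] == INF:
                if any(s[i][l] != INF for l in range(j + 1, n)):
                    return False          # (ii) fails rightward
                if any(s[k][j] != INF for k in range(i + 1, m)):
                    return False          # (ii) fails downward
    for i, k in combinations(range(m), 2):
        for j, l in combinations(range(n), 2):
            quad = (s[i][j], s[i][l], s[k][j], s[k][l])
            if INF not in quad and s[i][j] + s[k][l] > s[i][l] + s[k][j]:
                return False              # (iii) fails
    return True
```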

Observe that a Monge array is a special case of a staircase-Monge array. Finally, a p × q × r array C = {c[i, j, k]} is called Monge-composite if c[i, j, k] = d[i, j] + e[j, k] for all i, j, and k, where D = {d[i, j]} is a p × q Monge array and E = {e[j, k]} is a q × r Monge array.

Like Monge arrays, staircase-Monge arrays have also found applications in many

areas. Aggarwal and Park [6], Larmore and Schieber [31], and Eppstein et al. [21],

[22] use staircase-Monge arrays to obtain algorithms for problems related to molecular

biology. Aggarwal and Suri [10] used these arrays to obtain fast sequential algorithms

for computing the following largest-area empty rectangle problem: given a rectangle containing n points, find the largest-area rectangle that lies inside the given rectangle, that does not contain any points in its interior, and whose sides are parallel to those of the given rectangle. Furthermore, Aggarwal and Klawe [3] and Klawe and Kleitman [27] have

demonstrated other applications of staircase-Monge arrays in computational geometry.

Finally, both Monge and Monge-composite arrays have found applications in parallel computation. In particular, Aggarwal and Park [5] exploit Monge arrays to obtain efficient

CRCW- and CREW-PRAM algorithms for certain geometric problems, and they exploit

Monge-composite arrays to obtain efﬁcient CRCW- and CREW-PRAM algorithms for

⁵ We refer to (1.2) as the inverse-Monge condition.


string editing and other related problems. (See also [12].) Similarly, Atallah et al. [15] have used Monge-composite arrays to construct Huffman and other such codes on CRCW and CREW PRAMs. Larmore and Przytycka [30] used Monge arrays to solve the Concave Least Weight Subsequence (CLWS) problem (defined in Section 4.2).

Unlike Monge and Monge-composite arrays, staircase-Monge arrays have not been

studied in a parallel setting (in spite of their immense utility). Furthermore, even for

Monge and Monge-composite arrays, the study of parallel array-search algorithms has

so far been restricted to CRCW and CREW PRAMs. In this paper we ﬁll in these gaps

by providing efﬁcient parallel algorithms for searching in Monge, staircase-Monge,

and Monge-composite arrays. We develop algorithms for the CREW-PRAM models

of parallel computation, as well as for several interconnection networks including the

hypercube, the cube-connected cycles, the butterﬂy, and the shufﬂe-exchange network.

Before we can describe our results, we need a few deﬁnitions which we give in the next

section.

1.2. Definitions. In this section we explain the specific searching problems we solve and give the previously known results for these problems. The row-minima problem for a two-dimensional array is that of finding the minimum entry in each row of the array. (If a row has several minima, then we take the leftmost one.) In dealing with

Monge arrays we assume that for any given i and j, a processor can compute the

(i, j)th entry of this array in O(1) time. For parallel machines without global memory

we need to use a more restrictive model. The details of this model are given in later

sections. Aggarwal et al. [4] showed that the row-minima problem for an m ×n Monge

array can be solved in O(m + n) time, which is optimal. Also, Aggarwal and Park [5]

have shown that the row-minima problem for such an array can be solved in O(lg mn) time on an (m + n)-processor CRCW PRAM, and in O(lg mn lg lg mn) time on an ((m + n)/lg lg mn)-processor CREW PRAM. Atallah and Kosaraju in [14] improved this to O(lg mn) using m + n processors on a (weaker) EREW PRAM. Note that all the

algorithms dealing with ﬁnding row-minima in Monge and inverse-Monge arrays can

also be used to solve the analogously deﬁned row-maxima problem for the same arrays.

In particular, if A = {a[i, j]} is an m × n Monge (resp. inverse-Monge) array, then A′ = {a′[i, j] : a′[i, j] = −a[i, n − j + 1]} is an m × n Monge (resp. inverse-Monge) array. Thus, solving the row-minima problem for A′ gives us the row maxima for A.
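This negate-and-reverse-columns trick is one line of code. The sketch below (illustrative, 0-indexed rather than the paper's 1-indexing) pairs it with a brute-force row-minima routine to show the correspondence:

```python
def negate_reverse(a):
    """a'[i][j] = -a[i][(n-1) - j], the 0-indexed form of
    a'[i, j] = -a[i, n - j + 1]. If a is Monge, the result is
    Monge, and its row minima sit at the reversed columns of
    row maxima of a (tie-breaking may pick a different maximizer)."""
    n = len(a[0])
    return [[-row[n - 1 - j] for j in range(n)] for row in a]

def row_minima_cols(a):
    """Leftmost minimizing column of each row, by brute force."""
    return [min(range(len(row)), key=lambda j: row[j]) for row in a]
```

For the Monge array a = [[0, 1, 4], [1, 0, 1], [4, 1, 0]], the row minima of negate_reverse(a) land in columns [0, 0, 2]; mapping column c back to n − 1 − c yields a maximum entry of each row of a.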

Unfortunately, the row-minima and row-maxima problems are not interchangeable

when dealing with staircase-Monge and staircase-inverse-Monge arrays. Aggarwal and

Klawe [3] showed that the row-minima problem for an m ×n staircase-Monge array can

be solved in O((m +n) lg lg(m +n)) sequential time, and Klawe and Kleitman [27] have

improved the time bound to O(m + nα(m)), where α(·) is the inverse of Ackermann’s

function. However, if we wanted to solve the row-maxima problem (instead of the row-

minima problem) for an m ×n staircase-Monge array, then we could, in fact, employ the

sequential algorithm given in [4] and solve the row-maxima problem in O(m +n) time.

No parallel algorithms were known for solving the row-minima problem for staircase-

Monge arrays.

Given a p ×q ×r Monge-composite array, for 1 ≤ i ≤ p and 1 ≤ k ≤ r, the (i, k)th

tube consists of all those entries of the array whose ﬁrst coordinate is i and whose third

coordinate is k. The tube-minima problem for a p × q × r Monge-composite array


is that of ﬁnding the minimum entry in each tube of the array. (If a tube has several

minima, then we take the one with the minimum second coordinate.) For sequential

computation, the result of [4] can be trivially used to solve the tube-minima problem in O((p + r)q) time. Aggarwal and Park [5] and Apostolico et al. [12] have independently shown that the tube-minima problem for an n × n × n Monge-composite array can be solved in O(lg n) time using n²/lg n processors on a CREW PRAM, and, recently, Atallah [13] has shown that this tube-minima problem can be solved in O(lg lg n) time using n²/lg lg n processors on a CRCW PRAM. Both results are optimal with respect

to time and processor-time product. In view of the applications, we assume that the two n × n Monge arrays D = {d[i, j]} and E = {e[j, k]} that together form the Monge-composite array are stored in the global memory of the PRAM. Again, for parallel

machines without a global memory, we need to use a more restrictive model; the details

of this model are given later. No efﬁcient algorithms (other than the one that simulates

the CRCW-PRAM algorithm) were known for solving the tube-minima problem for a

hypercube or a shufﬂe-exchange network.
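As a baseline for the parallel bounds discussed below, the tube-minima problem has a direct O(pqr)-work sequential solution. This sketch is for illustration only; the algorithms in Section 3 do substantially less work:

```python
def tube_minima(d, e):
    """For the Monge-composite array c[i][j][k] = d[i][j] + e[j][k],
    return argmin[i][k] = the j minimizing c[i][j][k] for each tube
    (i, k). Python's min returns the first minimizer, which matches
    the minimum-second-coordinate tie-breaking rule above."""
    p, q, r = len(d), len(e), len(e[0])
    return [[min(range(q), key=lambda j: d[i][j] + e[j][k])
             for k in range(r)]
            for i in range(p)]
```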

1.3. Our Main Results. The time and processor complexities of algorithms for computing row minima in two-dimensional Monge arrays, row minima in two-dimensional staircase-Monge arrays, and tube minima in three-dimensional Monge-composite arrays are listed in Tables 1.1, 1.2, and 1.3, respectively. We assume a normal model of hypercube computation, in which each processor uses only one of its edges in a single time step, only one dimension of edges is used at any given time step, and the dimension used at time step t + 1 is within 1 modulo d of the dimension used at time step t, where d is the dimension of the hypercube (see Section 3.1.3 of [32]). It is known that such algorithms for the hypercube can be implemented on other hypercubic bounded-degree networks like the butterfly and the shuffle-exchange network without asymptotic slowdown. Observe that our results for staircase-Monge arrays match the corresponding bounds for Monge arrays.

Following are some applications of these new array-searching algorithms.

1. All Pairs Shortest Path (APSP) Problem. Consider the following problem: given a weighted directed graph G = (V, E), |V| = n, |E| = m, we want to find the shortest path between every pair of vertices in V. In the sequential case, Johnson [26] gave an O(n² lg n + mn)-time algorithm for APSP. In the parallel case, APSP can be solved by repeated squaring in O(lg² n) time using n³/lg n processors on a CREW PRAM. Atallah et al. [15] show how to solve APSP in O(lg² n) time using n³/lg n processors on a CREW PRAM (this solution follows from their O(lg² n)-time, (n²/lg n)-processor solution to the single-source shortest paths problem on such a graph). In Section 4.1 we give the algorithm of Aggarwal et al. [2], which runs in O(lg² n) time using n² CREW-PRAM processors for the special case of the APSP problem when the graph is acyclic and the edge weights satisfy the quadrangle inequality.⁶

Table 1.1. Row-minima results for an n × n Monge array.

Model        Time             Processors  Reference
CREW PRAM    O(lg n)          n           [14]
Hypercube    O(lg n lg lg n)  n           Theorem 3.2

Table 1.2. Row-minima results for an n × n staircase-Monge array.

Model        Time             Processors  Reference
CREW PRAM    O(lg n)          n           Theorem 2.3
Hypercube    O(lg n lg lg n)  n           Theorem 3.4
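The repeated-squaring bound mentioned above comes from squaring the weight matrix over the (min, +) semiring ⌈lg n⌉ times. A sequential sketch (illustrative only; it assumes w[i][i] = 0 and ∞ for absent edges) looks like this; parallelizing the min over j across n³/lg n processors yields the O(lg² n) CREW-PRAM bound:

```python
def apsp_repeated_squaring(w):
    """All-pairs shortest paths by (min,+) repeated squaring.
    Each squaring doubles the number of edges a shortest path may
    use, so ceil(lg n) squarings suffice. Assumes w[i][i] = 0 and
    float('inf') marks missing edges; no negative cycles."""
    n = len(w)
    d = [row[:] for row in w]
    length = 1              # paths of <= `length` edges are correct
    while length < n - 1:
        d = [[min(d[i][j] + d[j][k] for j in range(n))
              for k in range(n)]
             for i in range(n)]
        length *= 2
    return d
```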

2. Huffman Coding Problem. Consider the following problem: given an alphabet C of n characters and the function f_i indicating the frequency of character c_i ∈ C in a file, construct a prefix code which minimizes the number of bits needed to encode the file, i.e., construct a binary tree T such that each leaf corresponds to a character in the alphabet and the weight of the tree, W(T), is minimized, where

    W(T) = Σ_{i=1}^{n} f_i d_i,    (1.3)

and d_i is the depth in T of the leaf corresponding to character c_i. The weight of the tree W(T) is exactly the minimum number of bits needed to encode the file (see [18]). The construction of such an optimal code (which is called a Huffman code) is a classical problem in data compression. In the sequential domain, Huffman in [25] showed how to construct Huffman codes greedily in O(n) time (once the character frequencies are in sorted order). In [15], Atallah et al. reduced Huffman coding to O(lg n) tube-minimization problems on Monge-composite arrays, thereby obtaining parallel algorithms for Huffman coding that run in O(lg² n) time using n²/lg n processors on a CREW PRAM and in O(lg n (lg lg n)²) time using n²/(lg lg n)² processors on a CRCW PRAM. Larmore and Przytycka in [30] reduce Huffman coding to the Concave Least Weight Subsequence (CLWS) problem (defined in Section 4.2) and then show how to solve CLWS, and thereby Huffman coding, in O(√n lg n) time using n processors on a CREW PRAM. Theirs is the first known parallel algorithm for Huffman coding requiring o(n²) work. In Section 4.2 we present the result of Czumaj [20] for finding the Huffman code in O(lg^{r+1} n) time and a total of O(n² lg^{2−r} n) work on a CREW PRAM, for any r ≥ 1. This is the first NC algorithm that achieves o(n²) work.
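For reference, the greedy sequential construction is short. The following sketch (illustrative, not from the paper) computes just the weight W(T) of (1.3); the code tree itself can be recovered by recording which pairs are merged:

```python
import heapq

def huffman_cost(freqs):
    """W(T) = sum of f_i * d_i over an optimal prefix code. Each
    greedy merge of the two lightest weights adds their sum to the
    total, accounting for one level of depth for every leaf below
    the merged node."""
    heap = list(freqs)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total
```

With a heap this runs in O(n lg n) time; with two queues it runs in O(n) time once the frequencies are sorted, which is the sequential bound cited above.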

Table 1.3. Tube-minima results for an n × n × n Monge-composite array.

Model        Time     Processors  Reference
CREW PRAM    O(lg n)  n²/lg n     [5], [12]
Hypercube    O(lg n)  n²          Theorem 3.5

⁶ Given an ordering of the vertices of a graph, the quadrangle inequality states that any four distinct vertices appearing in increasing order in that ordering, i₁, i₂, j₁, and j₂, must satisfy d(i₁, j₁) + d(i₂, j₂) ≥ d(i₁, j₂) + d(i₂, j₁). In other words, in the quadrangle formed by i₁i₂j₁j₂, the sum of the diagonals is greater than the sum of the sides. Notice that this condition is the same as (1.2) and they both appear in the literature.