
Algorithmica (1997) 19: 291–317
© 1997 Springer-Verlag New York Inc.
Parallel Searching in Generalized Monge Arrays
A. Aggarwal,^1 D. Kravets,^2 J. K. Park,^3 and S. Sen^4
Abstract. This paper investigates the parallel time and processor complexities of several searching problems
involving Monge, staircase-Monge, and Monge-composite arrays. We present array-searching algorithms for
concurrent-read-exclusive-write (CREW) PRAMs, hypercubes, and several hypercubic networks. All these
algorithms run in near-optimal time, and their processor-time products are all within an O(lg n) factor of
the worst-case sequential bounds. Several applications of these algorithms are also given. Two applications
improve previous results substantially, and the others provide novel parallel algorithms for problems not
previously considered.
Key Words. Monge arrays, CREW-PRAM algorithms, Hypercubes.
1. Introduction
1.1. Background. An m × n array A = {a[i, j]} containing real numbers is called
Monge if, for 1 ≤ i < k ≤ m and 1 ≤ j < l ≤ n,

    a[i, j] + a[k, l] ≤ a[i, l] + a[k, j].    (1.1)
We refer to (1.1) as the Monge condition. Monge arrays have many applications. In the
late eighteenth century, Monge [34] observed that if unit quantities (cannonballs, for
example) need to be transported from locations X and Y (supply depots) in the plane
to locations Z and W (artillery batteries), not necessarily respectively, in such a way
as to minimize the total distance traveled, then the paths followed in transporting these
quantities must not properly intersect. In 1961, Hoffman [24] elaborated upon this idea
and showed that a greedy algorithm correctly solves the transportation problem for m
sources and n sinks if and only if the corresponding m ×n cost array is a Monge array.
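As a concrete illustration of condition (1.1) (our sketch, not from the paper; the function and example array are our own), the following Python fragment checks the Monge condition by brute force and verifies it on a small transportation-style cost array built from squared distances between sorted source and sink positions.

```python
def is_monge(a):
    """Brute-force check of the Monge condition (1.1):
    a[i][j] + a[k][l] <= a[i][l] + a[k][j] for all i < k and j < l."""
    m, n = len(a), len(a[0])
    return all(a[i][j] + a[k][l] <= a[i][l] + a[k][j]
               for i in range(m) for k in range(i + 1, m)
               for j in range(n) for l in range(j + 1, n))

# Squared distances between sorted source and sink positions form a Monge array,
# in the spirit of Monge's transportation observation.
xs, ys = [0, 1, 3, 6], [2, 4, 5, 9]
A = [[(x - y) ** 2 for y in ys] for x in xs]
assert is_monge(A)
```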
^1 IBM Research Division, T. J. Watson Research Center, Yorktown Heights, NY 10598, USA.
aggarwa@watson.ibm.com.
^2 Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA.
dina@cis.njit.edu. This author's research was supported by the NSF Research Initiation Award CCR-9308204,
and the New Jersey Institute of Technology SBR Grant #421220. Part of this research was done while the author
was at MIT and supported by the Air Force under Contract AFOSR-89-0271 and by the Defense Advanced
Research Projects Agency under Contracts N00014-87-K-825 and N00014-89-J-1988.
^3 Bremer Associates, Inc., 215 First Street, Cambridge, MA 02142, USA. james.park@bremer-inc.com. This
author's work was supported in part by the Defense Advanced Research Projects Agency under Contract
N00014-87-K-0825 and the Office of Naval Research under Contract N00014-86-K-0593 (while the author
was a graduate student at MIT) and by the Department of Energy under Contract DE-AC04-76DP00789
(while the author was a member of the Algorithms and Discrete Mathematics Department of Sandia National
Laboratories).
^4 Department of Computer Science, Indian Institute of Technology, New Delhi, India. ssen@cse.iitd.ernet.in.
Part of the work was done when the author was a summer visitor at IBM T. J. Watson Research Center.
Received July 25, 1994; revised March 5, 1996. Communicated by B. Chazelle.

More recently, Monge arrays have found applications in many other areas. Yao [37]
used these arrays to explain Knuth’s [28] efficient sequential algorithm for computing
optimal binary trees. Aggarwal et al. [4] showed that the all-farthest-neighbors problem
for the vertices of a convex n-gon can be solved in linear time using Monge arrays.
Aggarwal and Park [6] gave efficient sequential algorithms based on the Monge-array
abstraction for several problems in computational geometry and VLSI river routing.
Furthermore, many researchers [6], [31], [21], [22] have used Monge arrays to obtain
efficient dynamic programming algorithms for problems related to molecular biology.
More recently, Aggarwal and Park [9] have used Monge arrays to obtain efficient algo-
rithms for the economic-lot size model.
In many applications, the underlying array satisfies conditions that are similar to, but not
the same as, (1.1). An m × n array A is called inverse-Monge if, for 1 ≤ i < k ≤ m
and 1 ≤ j < l ≤ n,
    a[i, j] + a[k, l] ≥ a[i, l] + a[k, j].^5    (1.2)

An m × n array S = {s[i, j]} is called staircase-Monge if
(i) every entry is either a real number or ∞,
(ii) s[i, j] = ∞ implies s[i, ℓ] = ∞ for ℓ > j and s[k, j] = ∞ for k > i, and
(iii) for 1 ≤ i < k ≤ m and 1 ≤ j < ℓ ≤ n, (1.1) holds if all four entries s[i, j], s[i, ℓ],
s[k, j], and s[k, ℓ] are finite.
The definition of a staircase-inverse-Monge array is similar:
(i) every entry is either a real number or ∞,
(ii) s[i, j] = ∞ implies s[i, ℓ] = ∞ for ℓ < j and s[k, j] = ∞ for k > i, and
(iii) for 1 ≤ i < k ≤ m and 1 ≤ j < ℓ ≤ n, (1.2) holds if all four entries s[i, j], s[i, ℓ],
s[k, j], and s[k, ℓ] are finite.
Observe that a Monge array is a special case of a staircase-Monge array. Finally, a p × q × r
array C = {c[i, j, k]} is called Monge-composite if c[i, j, k] = d[i, j] + e[j, k] for all
i, j, and k, where D = {d[i, j]} is a p × q Monge array and E = {e[j, k]} is a q × r
Monge array.
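To make properties (i)-(iii) of staircase-Monge arrays concrete, here is a small Python sketch (our illustration; the function name and example array are ours) that checks whether an array with ∞ entries is staircase-Monge.

```python
import math

INF = math.inf

def is_staircase_monge(s):
    """Check a staircase-Monge array: (ii) an infinite entry forces every entry to its
    right and below to be infinite, and (iii) the Monge condition (1.1) holds whenever
    all four entries involved are finite."""
    m, n = len(s), len(s[0])
    for i in range(m):
        for j in range(n):
            if s[i][j] == INF:
                if any(s[i][l] != INF for l in range(j + 1, n)):   # rightward
                    return False
                if any(s[k][j] != INF for k in range(i + 1, m)):   # downward
                    return False
    for i in range(m):
        for k in range(i + 1, m):
            for j in range(n):
                for l in range(j + 1, n):
                    q = (s[i][j], s[i][l], s[k][j], s[k][l])
                    if INF not in q and s[i][j] + s[k][l] > s[i][l] + s[k][j]:
                        return False
    return True

# The finite entries occupy an upper-left "staircase"; the infinite region is
# closed under moving right and down.
S = [[1, 2, 4],
     [2, 3, INF],
     [4, INF, INF]]
assert is_staircase_monge(S)
```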
Like Monge arrays, staircase-Monge arrays have also found applications in many
areas. Aggarwal and Park [6], Larmore and Schieber [31], and Eppstein et al. [21],
[22] use staircase-Monge arrays to obtain algorithms for problems related to molecular
biology. Aggarwal and Suri [10] used these arrays to obtain fast sequential algorithms
for the following largest-area empty rectangle problem: given a rectangle
containing n points, find the largest-area rectangle that lies inside the given rectangle, that
does not contain any points in its interior, and whose sides are parallel to those of the given
rectangle. Furthermore, Aggarwal and Klawe [3] and Klawe and Kleitman [27] have
demonstrated other applications of staircase-Monge arrays in computational geometry.
Finally, both Monge and Monge-composite arrays have found applications in parallel
computation. In particular, Aggarwal and Park [5] exploit Monge arrays to obtain efficient
CRCW- and CREW-PRAM algorithms for certain geometric problems, and they exploit
Monge-composite arrays to obtain efficient CRCW- and CREW-PRAM algorithms for
^5 We refer to (1.2) as the inverse-Monge condition.

string editing and other related problems. (See also [12].) Similarly, Atallah et al. [15]
have used Monge-composite arrays to construct Huffman and other such codes on CRCW
and CREW PRAMs. Larmore and Przytycka in [30] used Monge arrays to solve the
Concave Least Weight Subsequence (CLWS) problem (defined in Section 4.2).
Unlike Monge and Monge-composite arrays, staircase-Monge arrays have not been
studied in a parallel setting (in spite of their immense utility). Furthermore, even for
Monge and Monge-composite arrays, the study of parallel array-search algorithms has
so far been restricted to CRCW and CREW PRAMs. In this paper we fill in these gaps
by providing efficient parallel algorithms for searching in Monge, staircase-Monge,
and Monge-composite arrays. We develop algorithms for the CREW-PRAM models
of parallel computation, as well as for several interconnection networks including the
hypercube, the cube-connected cycles, the butterfly, and the shuffle-exchange network.
Before we can describe our results, we need a few definitions which we give in the next
section.
1.2. Definitions. In this section we explain the specific searching problems we solve
and give the previously known results for these problems. The row-minima problem
for a two-dimensional array is that of finding the minimum entry in each row of the
array. (If a row has several minima, then we take the leftmost one.) In dealing with
Monge arrays we assume that for any given i and j, a processor can compute the
(i, j)th entry of this array in O(1) time. For parallel machines without global memory
we need to use a more restrictive model. The details of this model are given in later
sections. Aggarwal et al. [4] showed that the row-minima problem for an m × n Monge
array can be solved in O(m + n) time, which is optimal. Also, Aggarwal and Park [5]
have shown that the row-minima problem for such an array can be solved in O(lg mn)
time on an (m + n)-processor CRCW PRAM, and in O(lg mn lg lg mn) time on an
((m + n)/lg lg mn)-processor CREW PRAM. Atallah and Kosaraju in [14] improved
this to O(lg mn) using m + n processors on a (weaker) EREW PRAM. Note that all the
algorithms dealing with finding row-minima in Monge and inverse-Monge arrays can
also be used to solve the analogously defined row-maxima problem for the same arrays.
In particular, if A = {a[i, j]} is an m × n Monge (resp. inverse-Monge) array, then
A' = {a'[i, j] : a'[i, j] = −a[i, n − j + 1]} is an m × n Monge (resp. inverse-Monge)
array. Thus, solving the row-minima problem for A' gives us row-maxima for A.
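The negate-and-reverse reduction just described is easy to state in code. The Python sketch below (ours) turns any row-minima routine for Monge arrays into a row-maxima routine, using 0-indexed arrays so that a'[i][j] = −a[i][n − 1 − j]; the naive placeholder routine stands in for the paper's faster algorithms.

```python
def naive_row_minima(a):
    """Placeholder O(mn) row-minima routine; any routine that returns,
    for each row, a column index of a minimum entry works here."""
    return [min(range(len(row)), key=lambda j: row[j]) for row in a]

def row_maxima(a, row_minima=naive_row_minima):
    """Row maxima of a Monge array A via row minima of A',
    where a'[i][j] = -a[i][n - 1 - j]."""
    n = len(a[0])
    a_prime = [[-row[n - 1 - j] for j in range(n)] for row in a]
    # A' is again Monge (resp. inverse-Monge), so the row-minima routine applies;
    # map each minimizing column of A' back to the corresponding column of A.
    return [n - 1 - j for j in row_minima(a_prime)]
```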
Unfortunately, the row-minima and row-maxima problems are not interchangeable
when dealing with staircase-Monge and staircase-inverse-Monge arrays. Aggarwal and
Klawe [3] showed that the row-minima problem for an m × n staircase-Monge array can
be solved in O((m + n) lg lg(m + n)) sequential time, and Klawe and Kleitman [27] have
improved the time bound to O(m + nα(m)), where α(·) is the inverse of Ackermann’s
function. However, if we wanted to solve the row-maxima problem (instead of the row-
minima problem) for an m × n staircase-Monge array, then we could, in fact, employ the
sequential algorithm given in [4] and solve the row-maxima problem in O(m + n) time.
No parallel algorithms were known for solving the row-minima problem for staircase-
Monge arrays.
Given a p × q × r Monge-composite array, for 1 ≤ i ≤ p and 1 ≤ k ≤ r, the (i, k)th
tube consists of all those entries of the array whose first coordinate is i and whose third
coordinate is k. The tube-minima problem for a p × q × r Monge-composite array

is that of finding the minimum entry in each tube of the array. (If a tube has several
minima, then we take the one with the minimum second coordinate.) For sequential
computation, the result of [4] can be trivially used to solve the tube-minima problem in
O((p + r)q) time. Aggarwal and Park [5] and Apostolico et al. [12] have independently
shown that the tube-minima problem for an n × n × n Monge-composite array can
be solved in O(lg n) time using n^2/lg n processors on a CREW PRAM, and, recently,
Atallah [13] has shown that this tube-minima problem can be solved in O(lg lg n) time
using n^2/lg lg n processors on a CRCW PRAM. Both results are optimal with respect
to time and processor-time product. In view of the applications, we assume that the two
n × n Monge arrays D = {d[i, j]} and E = {e[j, k]} that together form the Monge-
composite array, are stored in the global memory of the PRAM. Again, for parallel
machines without a global memory, we need to use a more restrictive model; the details
of this model are given later. No efficient algorithms (other than the one that simulates
the CRCW-PRAM algorithm) were known for solving the tube-minima problem for a
hypercube or a shuffle-exchange network.
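For reference, a naive sequential tube-minima computation for a Monge-composite array c[i, j, k] = d[i, j] + e[j, k] looks as follows (our sketch; it takes O(pqr) time and ignores the Monge structure that the faster algorithms cited above exploit).

```python
def naive_tube_minima(d, e):
    """For the Monge-composite array c[i][j][k] = d[i][j] + e[j][k], return for each
    tube (i, k) the smallest second coordinate j minimizing c[i][j][k]."""
    p, q, r = len(d), len(d[0]), len(e[0])
    argmin = [[0] * r for _ in range(p)]
    for i in range(p):
        for k in range(r):
            # min with a key returns the first (smallest) minimizing j on ties.
            argmin[i][k] = min(range(q), key=lambda j: d[i][j] + e[j][k])
    return argmin
```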
1.3. Our Main Results. The time and processor complexities of algorithms for comput-
ing row minima in two-dimensional Monge arrays, row minima in two-dimensional staircase-
Monge arrays, and tube minima in three-dimensional Monge-composite arrays are listed
in Tables 1.1, 1.2, and 1.3, respectively. We assume a normal model of hypercube com-
putation, in which each processor uses only one of its edges in a single time step, only
one dimension of edges is used at any given time step, and the dimension used at time
step t + 1 is within 1 modulo d of the dimension used at time step t, where d is the
dimension of the hypercube (see Section 3.1.3 of [32]). It is known that such algorithms
for the hypercube can be implemented on other hypercubic bounded-degree networks
like the butterfly and the shuffle-exchange network without asymptotic slowdown. Observe that our
results for staircase-Monge arrays match the corresponding bounds for Monge arrays.
Following are some applications of these new array-searching algorithms.
1. All Pairs Shortest Path (APSP) Problem. Consider the following problem: given a
weighted directed graph G = (V, E), |V| = n, |E| = m, we want to find the shortest
path between every pair of vertices in V . In the sequential case, Johnson [26] gave an
O(n^2 lg n + mn)-time algorithm for APSP. In the parallel case, APSP can be solved by
repeated squaring in O(lg^2 n) time using n^3/lg n processors on a CREW PRAM. Atallah
et al. [15] show how to solve APSP in O(lg^2 n) time using n^3/lg n processors on a CREW
PRAM (this solution follows from their O(lg^2 n)-time (n^2/lg n)-processor solution to
the single source shortest paths problem on such a graph). In Section 4.1 we give the
algorithm of Aggarwal et al. [2] which runs in O(lg^2 n) time using n^2 CREW-PRAM
Table 1.1. Row-minima results for an n × n Monge array.

Model         Time             Processors   Reference
CREW PRAM     O(lg n)          n            [14]
Hypercube     O(lg n lg lg n)  n            Theorem 3.2

Table 1.2. Row-minima results for an n × n staircase-Monge array.

Model         Time             Processors   Reference
CREW PRAM     O(lg n)          n            Theorem 2.3
Hypercube     O(lg n lg lg n)  n            Theorem 3.4
processors for the special case of the APSP problem when the graph is acyclic and the
edge weights satisfy the quadrangle inequality.^6
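The repeated-squaring bound quoted above comes from min-plus matrix multiplication: O(lg n) squarings of the weight matrix yield all shortest-path lengths, and the parallel versions assign the independent entries of each product to separate processors. A sequential Python sketch of this idea follows (ours, not the algorithm of Section 4.1).

```python
import math

def apsp_repeated_squaring(w):
    """All-pairs shortest-path lengths by repeated min-plus squaring.
    w[i][j] is the edge weight (math.inf if no edge), with w[i][i] = 0.
    After t squarings the matrix holds shortest paths using at most 2^t edges,
    so O(lg n) squarings suffice; each squaring is the step the PRAM
    algorithms parallelize."""
    n = len(w)
    dist = [row[:] for row in w]
    length = 1
    while length < n - 1:
        dist = [[min(dist[i][k] + dist[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
        length *= 2
    return dist
```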
2. Huffman Coding Problem. Consider the following problem: given an alphabet C
of n characters and the function f_i indicating the frequency of character c_i ∈ C in a
file, construct a prefix code which minimizes the number of bits needed to encode the
file, i.e., construct a binary tree T such that each leaf corresponds to a character in the
alphabet and the weight of the tree, W(T), is minimized, where

    W(T) = Σ_{i=1}^{n} f_i d_i,    (1.3)

and d_i is the depth in T of the leaf corresponding to character c_i. The weight of the tree
W(T ) is exactly the minimum number of bits needed to encode the file (see [18]). The
construction of such an optimal code (which is called a Huffman code) is a classical
problem in data compression. In the sequential domain, Huffman in [25] showed how to
construct Huffman codes greedily in O(n) time (once the character frequencies are in
sorted order). In [15], Atallah et al. reduced Huffman coding to O(lg n) tube minimization
problems on Monge-composite arrays, thereby obtaining parallel algorithms for Huffman
coding that run in O(lg^2 n) time using n^2/lg n processors on a CREW PRAM and in
O(lg n (lg lg n)^2) time using n^2/(lg lg n)^2 processors on a CRCW PRAM. Larmore and
Przytycka in [30] reduce Huffman coding to the Concave Least Weight Subsequence
(CLWS) problem (defined in Section 4.2) and then show how to solve CLWS, and thereby
Huffman coding, in O(√n lg n) time using n processors on a CREW PRAM. Theirs is the
first known parallel algorithm for Huffman coding requiring o(n^2) work. In Section 4.2
we present the result of Czumaj [20] for finding the Huffman code in O(lg^(r+1) n) time
and a total of O(n^2 lg^(2−r) n) work on a CREW PRAM, for any r ≥ 1. This is the first
NC algorithm that achieves o(n^2) work.
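For reference, the classical greedy (heap-based) construction mentioned above is short; the Python sketch below (ours) computes the optimal code weight W(T) from (1.3) by repeatedly merging the two lightest subtrees, which costs O(n lg n) in general and O(n) once the frequencies are presorted.

```python
import heapq

def huffman_weight(freqs):
    """Weight W(T) = sum_i f_i * d_i of an optimal (Huffman) prefix-code tree."""
    if len(freqs) < 2:
        return 0
    heap = list(freqs)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        # Merging two subtrees deepens all their leaves by one level,
        # which adds a + b to W(T).
        total += a + b
        heapq.heappush(heap, a + b)
    return total

# Classical example: frequencies 5, 9, 12, 13, 16, 45 yield W(T) = 224.
assert huffman_weight([5, 9, 12, 13, 16, 45]) == 224
```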
Table 1.3. Tube-minima results for an n × n × n Monge-composite array.

Model         Time      Processors   Reference
CREW PRAM     O(lg n)   n^2/lg n     [5], [12]
Hypercube     O(lg n)   n^2          Theorem 3.5
^6 Given an ordering of the vertices of a graph, the quadrangle inequality states that any four distinct vertices
appearing in increasing order in that ordering, i_1, i_2, j_1, and j_2, must satisfy d(i_1, j_1) + d(i_2, j_2) ≥
d(i_1, j_2) + d(i_2, j_1). In other words, in the quadrangle formed by i_1 i_2 j_1 j_2, the sum of the diagonals is
greater than the sum of the sides. Notice that this condition is the same as (1.2) and they both appear in the literature.