Journal ArticleDOI

Fast parallel algorithms for the maximum sum problem

01 Mar 1995-Vol. 21, Iss: 3, pp 461-466
TL;DR: This work gives an algorithm for the 1D version of the problem (finding the maximum sum over all contiguous subvectors of a given vector of real numbers) that takes O(log n) time using O(n / log n) processors on the EREW PRAM.
Abstract: A problem in pattern recognition is to find the maximum sum over all rectangular subregions of a given (n × n) matrix of real numbers. The problem has one-dimensional (1D) and two-dimensional (2D) versions. For the 1D version, it is to find the maximum sum over all contiguous subvectors of a given vector of n real numbers. We give an algorithm for the 1D version running in O(log n) time using O(n / log n) processors on the EREW PRAM, and an algorithm for the 2D version which takes O(log n) time using O(n³ / log n) processors on the EREW PRAM.
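For reference, the sequential 1D problem that this parallel algorithm targets is solved in linear time by the classic scan often attributed to Kadane. A minimal sketch, assuming a nonempty input (this is the sequential baseline, not the paper's PRAM algorithm):

```python
def max_subvector_sum(a):
    """Maximum sum over all contiguous subvectors of a (Kadane's scan).

    Sequential O(n) baseline; the paper's EREW PRAM algorithm matches this
    O(n) total work, spread over O(n / log n) processors in O(log n) time.
    """
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)      # best sum of a subvector ending at x
        best = max(best, cur)      # best sum seen anywhere so far
    return best
```

Note that the scan also handles all-negative inputs, where the answer is the single largest element.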
Citations
Proceedings ArticleDOI
10 May 2004
TL;DR: A VLSI K maximum subarrays algorithm with O(Kn) steps and a circuit size of O(n²), which is cost-optimal in parallelisation of the sequential algorithm.
Abstract: Given an array of positive and negative values, we consider the problem of K maximum sums. When overlapping subarrays are allowed, previous algorithms for the maximum sum are not directly applicable. We designed an O(Kn) algorithm for the K maximum subsequences problem. This was then modified to solve the K maximum subarrays problem in O(Kn³) time. Finally, we present a VLSI K maximum subarrays algorithm with O(Kn) steps and a circuit size of O(n²), which is cost-optimal in parallelisation of the sequential algorithm.
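A brute-force reference for the K maximum (overlapping) sums is easy to state with prefix sums; the sketch below enumerates all O(n²) candidate sums and keeps the K largest. It is a correctness oracle for small inputs, not the paper's O(Kn) algorithm:

```python
import heapq
from itertools import accumulate

def k_max_sums(a, k):
    """K largest contiguous-subsequence sums, in descending order.

    Every subarray sum is a difference p[j] - p[i] of prefix sums with
    i < j; taking the K largest of all such differences costs
    O(n^2 log K), far from the cited O(Kn) bound, but trivially correct.
    """
    p = [0] + list(accumulate(a))
    sums = (p[j] - p[i]
            for i in range(len(a))
            for j in range(i + 1, len(a) + 1))
    return heapq.nlargest(k, sums)
```

Since `heapq.nlargest` returns its result sorted, the sketch also illustrates the "sorted order" variant discussed by later citing papers.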

53 citations


Cites background from "Fast parallel algorithms for the ma..."

  • ...Keyword: maximum subarray, maximum subsequence, prefix sums, VLSI...


Book ChapterDOI
16 Aug 2005
TL;DR: The K-maximum subarray problem is to find the K subarrays with largest sums; the time complexity is improved from O(min{K + n log² n, n√K}) to O(n log K + K²) for K ≤ n.
Abstract: The maximum subarray problem for a one- or two-dimensional array is to find the array portion that maximizes the sum of array elements in it. The K-maximum subarray problem is to find the K subarrays with largest sums. We improve the time complexity for the one-dimensional case from O(min{K + n log² n, n√K}) for 0 ≤ K ≤ n(n−1)/2 to O(n log K + K²) for K ≤ n. The latter is better when K ≤ √n log n. If we simply extend this result to the two-dimensional case, we will have the complexity of O(n³ log K + K²n²). We improve this complexity to O(n³) for K ≤ √n.

29 citations


Cites methods from "Fast parallel algorithms for the ma..."

  • ...For the two-dimensional case, EREW PRAM solutions achieving O(log n) time with O(n³/log n) processors are given in [7, 8] and a comparable result on interconnection networks is given in [9]....


01 Jan 2007
TL;DR: This thesis explores various techniques to speed up the computation, presents several new algorithms for the maximum subarray problem, and investigates a speed-up option through parallel computation.
Abstract: The maximum subarray problem (MSP) involves selecting the segment of consecutive array elements that has the largest possible sum over all segments in a given array. Efficient algorithms for the MSP and related problems are expected to contribute to various applications in genomic sequence analysis, data mining, computer vision, etc. The MSP is a conceptually simple problem, and several linear-time optimal algorithms for the 1D version of the problem are already known. For the 2D version, the currently known upper bounds are cubic or near-cubic time. For wider applications, it would be interesting if multiple maximum subarrays were computed instead of just one, which motivates the work in the first half of the thesis. The generalized K-maximum subarray problem involves finding K segments of the largest sum in sorted order. Two subcategories of the problem can be defined: the K-overlapping maximum subarray problem (K-OMSP) and the K-disjoint maximum subarray problem (K-DMSP). Studies on the K-OMSP had not been undertaken previously, hence the thesis explores various techniques to speed up the computation and presents several new algorithms. The first algorithm for the 1D problem runs in O(Kn) time, and increasingly efficient algorithms of O(K + n log K) time, O((n + K) log K) time and O(n + K log min(K, n)) time are presented. Considerations on extending these results to higher dimensions are made, which contributes to establishing O(n) time for the 2D version of the problem where K is bounded by a certain range. Ruzzo and Tompa studied the problem of all maximal scoring subsequences, whose definition is almost identical to that of the K-DMSP with a few subtle differences. Despite the slight differences, their linear-time algorithm is readily capable of computing the 1D K-DMSP, but it is not easily extended to higher dimensions. This observation motivates a new algorithm based on the tournament data structure, which runs in O(n + K log min(K, n)) worst-case time.
The extended version of the new algorithm is capable of processing a 2D problem in O(n + min(K, n) · n log min(K, n)) time, that is, O(n) for K ≤ n log n. For the 2D MSP, the cubic-time sequential computation is still expensive for practical purposes, considering potential applications in computer vision and data mining. The second half of the thesis investigates a speed-up option through parallel computation. Previous parallel algorithms for the 2D MSP have huge demands for hardware resources, or their target parallel-computation models are in the realm of pure theory. A nice compromise between speed and cost can be realized by utilizing a mesh topology. Two mesh algorithms for the 2D MSP with O(n) running time that require a network of size O(n) are designed and analyzed, and various techniques are considered to maximize their practicality.
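The cubic-time 2D bound mentioned above comes from the textbook reduction to the 1D problem: for every pair of rows, collapse the strip between them into one vector and run the linear 1D scan on it. A minimal sketch of that reduction (an illustration of the standard technique, not the thesis's code):

```python
def max_subarray_2d(m):
    """Maximum sum over all rectangular subregions of matrix m, in O(n^3).

    For each (top, bottom) row pair, strip[c] accumulates the column sums
    of the strip; Kadane's scan over strip then finds the best rectangle
    spanning exactly those rows. O(n^2) strips times O(n) per scan.
    """
    rows, cols = len(m), len(m[0])
    best = m[0][0]
    for top in range(rows):
        strip = [0] * cols                 # column sums of rows top..bottom
        for bottom in range(top, rows):
            cur = 0
            for c in range(cols):
                strip[c] += m[bottom][c]
                cur = max(strip[c], cur + strip[c])   # inline Kadane step
                best = max(best, cur)
    return best
```

This sequential baseline is the reference point for the parallel speed-ups the thesis pursues.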

29 citations


Cites background or methods from "Fast parallel algorithms for the ma..."

  • ...Previously known parallel algorithms for the maximum subarray problem (MSP) [103, 77, 79] are also derived from the sequential solutions....


  • ...This size is far below the requirement of O(n³/log n) size by previous parallel solutions [103, 77, 79], and each processing unit used in this solution is simple enough that modern technology can include millions of them in a single VLSI chip....


  • ...Wen [103] presented a parallel algorithm for the one-dimensional version running in O(log n) time using O(n/ log n) processors on the EREW PRAM (Exclusive Read, Exclusive Write Parallel Random Access Machine) and a similar result is given by Perumalla and Deo [77]....


  • ...Perumalla and Deo [77], Wen [103] and Qiu and Akl [79] gave optimal parallel algorithms....


  • ...Wen’s EREW PRAM algorithm [103] employs Smith’s recursive linear time algorithm (Algorithm 5) [84] and does not separate four different phases....


Journal ArticleDOI
TL;DR: This work finds K maximum subarrays in sorted order with an improved complexity of O((n + K) log K).
Abstract: The maximum subarray problem is to find the contiguous array elements having the largest possible sum. We extend this problem to find K maximum subarrays. For general K maximum subarrays where overlapping is allowed, Bengtsson and Chen presented an O(min{K + n log² n, n√K}) time algorithm for the one-dimensional case, which finds unsorted subarrays. Our algorithm finds K maximum subarrays in sorted order with an improved complexity of O((n + K) log K). For the two-dimensional case, we introduce two techniques that establish O(n³) and subcubic time.

25 citations


Cites methods from "Fast parallel algorithms for the ma..."

  • ...For the two-dimensional case, EREW PRAM solutions achieving O(log n) time with O(n³/log n) processors are given in [7, 8] and a comparable result on interconnection networks is given in [9]....


Dissertation
01 Jan 2007
TL;DR: A new framework for parallel programming for trees on the basis of the programming model called skeletal parallel programming is developed, in which a tree is divided with high locality and good load balance and tree skeletons are executed efficiently.
Abstract: Parallel computing is an essential technique for dealing with large-scale problems. In recent years, while hardware for parallel computing has become widely available, developing software for parallel computing remains a hard task for many programmers. The main difficulties are caused by the communication, synchronization, and data distribution required in parallel programs. This thesis studies the theory and practice of parallel programming for trees based on parallel primitives called tree skeletons. Trees are important data structures for representing structured data. However, their irregular and ill-balanced structure makes it hard to develop efficient parallel programs on them, because naive divide-and-conquer parallel computation may lead to poor performance for ill-balanced trees. To remedy this situation, this thesis develops a new framework for parallel programming for trees on the basis of the programming model called skeletal parallel programming. Skeletal parallel programming, first proposed by Cole, encourages programmers to develop parallel programs by composing ready-made components called parallel skeletons (or algorithmic skeletons). A theory has been proposed for the design of parallel skeletons for lists based on constructive algorithmics, and several libraries of parallel skeletons have been developed to bring the theory into practice. This thesis extends these ideas from lists to trees. The following are three important contributions of the thesis. The first contribution is the design of parallel tree skeletons for both binary trees and general trees of arbitrary shape. Our parallel tree skeletons have a sequential interface but a parallel implementation; the sequential interface is designed based on the theory of constructive algorithmics, while the parallel implementation is either based on tree contraction algorithms or newly developed ones. The second contribution is a set of theories for skeletal parallel programming on trees.
These theories provide us with a systematic method for deriving skeletal parallel programs from sequential programs. We illustrate the effectiveness of the method by solving two classes of nontrivial problems: maximum marking problems and XPath queries. The third contribution is an implementation of a parallel skeleton library for trees. We developed a new implementation algorithm for tree skeletons, in which a tree is divided with high locality and good load balance and tree skeletons are executed efficiently in parallel.
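The "sequential interface, parallel implementation" idea behind tree skeletons can be illustrated with a tiny reduce skeleton. The names below are illustrative, not the thesis's actual library API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    val: int
    left: "Optional[Node]" = None
    right: "Optional[Node]" = None

def reduce_skeleton(t, f, unit):
    """A 'reduce' tree skeleton: combine each node with its subtree results.

    f(l, v, r) folds a node value v with the results l and r of its
    subtrees. The two recursive calls are independent, which is what lets
    a parallel implementation (e.g. via tree contraction) evaluate them
    concurrently behind the same sequential-looking interface.
    """
    if t is None:
        return unit
    return f(reduce_skeleton(t.left, f, unit), t.val,
             reduce_skeleton(t.right, f, unit))
```

For example, `reduce_skeleton(t, lambda l, v, r: l + v + r, 0)` sums a tree, and swapping in `max` computes a tree maximum without changing the traversal code.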

15 citations


Cites background from "Fast parallel algorithms for the ma..."

  • ...Several researchers have studied parallel algorithms for the maximum segment sum problem on lists (arrays) and the maximum subarray sum problem (maximum segment sum problem on two-dimensional arrays) for several parallel-computation models [6,9,59,106,109,127]....


References
Journal ArticleDOI
TL;DR: This work defines a novel scheduling problem, which leads to the first optimal logarithmic time PRAM algorithm for list ranking, and shows how to apply these results to obtain improved PRAM upper bounds for a variety of problems on graphs.
Abstract: We define a novel scheduling problem; it is solved in parallel by repeated, rapid, approximate reschedulings. This leads to the first optimal logarithmic time PRAM algorithm for list ranking. Companion papers show how to apply these results to obtain improved PRAM upper bounds for a variety of problems on graphs, including the following: connectivity, biconnectivity, Euler tour and $st$-numbering, and a number of problems on trees.
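For context, the classic non-optimal approach to list ranking is pointer jumping, which takes O(log n) rounds but O(n log n) total work; the cited paper's contribution is reaching the optimal O(n / log n) processor bound. A sequential simulation of the synchronous rounds (assumed illustration, not the paper's scheduling technique):

```python
def list_rank(succ):
    """Pointer-jumping list ranking, simulated sequentially.

    succ[i] is node i's successor; the tail points to itself. Returns
    rank[i], the distance from i to the tail. Each synchronous round
    doubles every pointer's reach, so O(log n) rounds suffice.
    """
    n = len(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    succ = list(succ)
    for _ in range(n.bit_length()):                      # >= ceil(log2 n) rounds
        rank = [rank[i] + rank[succ[i]] for i in range(n)]  # read old rank/succ
        succ = [succ[succ[i]] for i in range(n)]            # then jump pointers
    return rank
```

Both comprehensions read the previous round's arrays in full before replacing them, mimicking the PRAM's synchronous update.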

194 citations

Journal ArticleDOI
TL;DR: This paper derives an O(n³) divide-and-conquer algorithm, then shows that it can be executed in O(log² n) time in parallel and, furthermore, that with pipelining of inputs it can be executed with O(1) time between successive outputs.

76 citations

Journal ArticleDOI
David Gries1
TL;DR: The by-now-standard strategy for developing a loop invariant and loop was developed in [1] and explained in [2].

73 citations

Journal ArticleDOI
TL;DR: An adaptive parallel algorithm for inducing a priority queue structure on an n-element array is presented, and the algorithm is extended to provide optimal parallel construction algorithms for three other heap-like structures useful in implementing double-ended priority queues, namely min-max heaps, deaps, and min-max-pair heaps.
Abstract: An adaptive parallel algorithm for inducing a priority queue structure on an n-element array is presented. The algorithm is extended to provide optimal parallel construction algorithms for three other heap-like structures useful in implementing double-ended priority queues, namely min-max heaps, deaps, and min-max-pair heaps. It is shown that an n-element array can be made into a heap, a deap, a min-max heap, or a min-max-pair heap in O(log n + (n/p)) time using no more than n/log n processors, in the exclusive-read exclusive-write parallel random-access machine model.
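The sequential baseline being parallelized here is Floyd's bottom-up heap construction, which builds a heap in O(n) total work. A minimal sketch (the sequential method, not the paper's adaptive parallel algorithm):

```python
def build_heap(a):
    """Floyd's bottom-up construction of a min-heap, in place, O(n) time.

    Sift every internal node down, starting from the last one; subtrees
    at the same depth are independent, which is what a parallel version
    exploits by processing whole levels concurrently.
    """
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        j = i
        while True:                       # sift a[j] down to its place
            small = j
            for child in (2 * j + 1, 2 * j + 2):
                if child < n and a[child] < a[small]:
                    small = child
            if small == j:
                break
            a[j], a[small] = a[small], a[j]
            j = small
    return a
```

Because sift-downs within one level never touch each other's subtrees, they can run in parallel level by level, which is the structure the cited EREW PRAM bound builds on.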

25 citations