
Showing papers on "Parallel algorithm published in 1992"


Book
01 Oct 1992
TL;DR: This book provides an introduction to the design and analysis of parallel algorithms, with the emphasis on the application of the PRAM model of parallel computation, with all its variants, to algorithm analysis.
Abstract: Written by an authority in the field, this book provides an introduction to the design and analysis of parallel algorithms. The emphasis is on the application of the PRAM (parallel random access machine) model of parallel computation, with all its variants, to algorithm analysis. Special attention is given to the selection of relevant data structures and to algorithm design principles that have proved to be useful. Features: uses the PRAM as the model for parallel computation; covers all essential classes of parallel algorithms; rich exercise sets; written by a highly respected author within the field.
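
To ground the PRAM style of analysis the book centers on, here is a minimal sketch (not from the book) of a classic PRAM algorithm: prefix sums by recursive doubling, simulated sequentially in Python. With one processor per element, each pass of the loop is one synchronous PRAM step, giving O(log n) parallel time.

```python
# Prefix sums by recursive doubling: each pass simulates one synchronous
# PRAM step in which every position i >= step adds the value step places
# to its left. With n processors this is O(log n) parallel time.
def prefix_sums(a):
    x = list(a)
    n = len(x)
    step = 1
    while step < n:
        # All updates in a pass read the old values, as on a synchronous PRAM.
        x = [x[i] + (x[i - step] if i >= step else 0) for i in range(n)]
        step *= 2
    return x

print(prefix_sums([1, 2, 3, 4, 5]))  # [1, 3, 6, 10, 15]
```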

1,577 citations


Journal ArticleDOI
TL;DR: The author, a well-known researcher in parallel computing, once again proves his expertise and authority on the material covered; this book will certainly influence how students and researchers alike correlate parallel architectures and algorithms.
Abstract: In the ever-expanding field of parallel computing, we have seen a number of textbooks, some emphasizing the design aspects of parallel algorithms based on abstract models of parallel machines (such as PRAMs) and some others focusing on the topological properties of parallel architectures. What is needed in this area is a book which provides a linkage between the topological properties of a parallel network and its computational capabilities or limitations, as well as comparative analyses of parallel architectures, not only among the proposed ones but also in view of a desirable general-purpose parallel machine which is yet to be built. The book under review comes closest to this goal. The author, a well-known researcher in parallel computing, once again has proved his expertise and authority on the materials covered. This book will certainly influence how students and researchers alike correlate parallel architectures and algorithms. Physically, this book is organized around three categories of parallel architectures: Arrays and Trees, Meshes of Trees, and Hypercubic Networks. Each category covers not only the basic type of architectures but also other variants or related models. For example, Chapter 1 on Arrays and Trees encompasses linear arrays, two-dimensional arrays, trees, rings, tori, X-trees, pyramids, multigrid networks, systolic and semisystolic networks, and higher-dimensional arrays as well. Similarly, Chapter 2 on Meshes of Trees shows different ways of looking at two-dimensional meshes of trees at the beginning and further extends to higher-dimensional meshes of trees and shuffle-tree graphs at the end. The third chapter, Hypercubes and Related Networks, covers the butterfly, cube-connected cycles, the Benes network, the shuffle-exchange network, the de Bruijn network, butterfly-like networks (the Omega network, flip network, baseline and reverse baseline networks, Banyan and delta networks, and the k-ary butterfly), and de Bruijn-type networks (the k-ary de Bruijn network and the generalized shuffle-exchange network). Whereas the above parallel networks constitute the architectural domain of the book as the basis, the application domain — parallel computation problems and algorithms — threads the chapters together and helps a reader to view the similarities and differences of each network from an algorithm design standpoint. In addition to the definitions and characterizations of the topological properties of the parallel architectures, each chapter examines a carefully chosen subset of fundamental computational problems such as integer arithmetic, prefix computation, list ranking, sorting and counting, matrix arithmetic, graph problems, the Fast Fourier Transform and Discrete Fourier Transform, computational geometry, and image analysis. The solutions to these problems are explored from simple algorithms to more complicated ones until optimality is achieved. This approach seems adequate to reveal the capabilities and limitations of each network. The problems and algorithms are not treated in an isolated context but provoke a reader to consider what is achievable in terms of speedup and efficiency, and what the limits are in terms of lower bounds, in the particular parallel network under focus. The author pays special attention to the routing problem.
Considering that routing is a common vehicle for solving most of the regular and irregular parallel computation problems in a fixed-connection network, the general capability of each network against an abstract parallel machine model is properly exposed via the routing problem. Also discussed are the containment/embedding of one network in another, i.e., mapping between networks and the simulation of one network by another.
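
As a small illustration of the topology-versus-capability theme (not from the book), the following sketch builds the n-dimensional hypercube, one of the review's reference networks, and checks by brute-force BFS that its diameter equals its dimension.

```python
from collections import deque

def hypercube_edges(n):
    # Vertices are n-bit labels; edges join labels differing in one bit.
    return [(u, u ^ (1 << b)) for u in range(2 ** n) for b in range(n)
            if u < u ^ (1 << b)]

def diameter(num_nodes, edges):
    adj = {v: [] for v in range(num_nodes)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    worst = 0
    for s in range(num_nodes):          # BFS from every source
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        worst = max(worst, max(dist.values()))
    return worst

print(diameter(2 ** 4, hypercube_edges(4)))  # 4: diameter equals dimension
```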

665 citations


Journal ArticleDOI
TL;DR: The construction of a crossed cube which has many of the properties of the hypercube, but has diameter only about half as large, is discussed, and it is shown that the CQ_n architecture can profitably emulate the ordinary hypercube.
Abstract: The construction of a crossed cube which has many of the properties of the hypercube, but has diameter only about half as large, is discussed. This network is self-routing, in the sense that there is a simple distributed routing algorithm which guarantees optimal paths between any pair of vertices. This fact, together with other properties such as regularity, symmetry, high connectivity, and a simple recursive structure, suggests that the crossed cube may be an attractive alternative to the ordinary hypercube for massively parallel architectures. SIMD algorithms which utilize the architecture are developed, and it is shown that the CQ_n architecture can profitably emulate the ordinary hypercube. It is also shown that the addition of simple switches can improve the capabilities of the system significantly. For instance, the dynamic reconfiguration capability allows hypercube algorithms to be executed on the proposed architecture. The use of these switches also improves the embedding properties of the system.

398 citations


Journal ArticleDOI
TL;DR: This paper identifies important characteristics of clustering algorithms, proposes a general framework for analyzing and evaluating such algorithms, and presents an analytic performance comparison explaining why Dominant Sequence Clustering (DSC) is superior to other algorithms.

393 citations


Journal ArticleDOI
TL;DR: An efficient scheme for two-dimensional data encryption is presented, based on the principles and ideas reflected in the specification and development of the SCAN language, and mainly motivated by the encryption of 2D digital pictures.

278 citations


Journal ArticleDOI
TL;DR: The proposed parallel algorithm is based on an artificial neural network composed of nm processing elements for an n-cell, m-frequency problem, and it found better solutions than the existing algorithm in one of eight benchmark problems.
Abstract: The channel assignment problem involves not only assigning channels or frequencies to each radio cell, but also satisfying frequency constraints given by a compatibility matrix. The proposed parallel algorithm is based on an artificial neural network composed of nm processing elements for an n-cell, m-frequency problem. The algorithm runs not only on a sequential machine but also on a parallel machine with up to a maximum of nm processors. The algorithm was tested by solving eight benchmark problems where the total number of frequencies varied from 100 to 533. The algorithm found solutions in nearly constant time with nm processors. The simulation results showed that the algorithm found better solutions than the existing algorithm in one of the eight problems.
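
To make the constraint structure concrete, here is a hedged greedy baseline, not the authors' neural network: C[i][j] is the required frequency separation between cells i and j (C[i][i] is the co-cell separation), and each cell i demands demand[i] channels.

```python
# Greedy channel assignment honoring a compatibility matrix C:
# channels f, g of cells i, j must satisfy |f - g| >= C[i][j].
def greedy_assign(C, demand, max_freq):
    assigned = [[] for _ in C]
    for i in sorted(range(len(C)), key=lambda k: -demand[k]):  # hardest cells first
        f = 0
        while len(assigned[i]) < demand[i]:
            if f >= max_freq:
                raise ValueError("not enough frequencies")
            ok = all(abs(f - g) >= C[i][j]
                     for j, chans in enumerate(assigned) for g in chans)
            if ok:
                assigned[i].append(f)
            f += 1
    return assigned

C = [[3, 2], [2, 3]]   # co-cell separation 3, cross-cell separation 2
print(greedy_assign(C, demand=[2, 2], max_freq=12))  # [[0, 3], [5, 8]]
```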

264 citations


01 Jan 1992
TL;DR: NESL is intended to be used as a portable interface for programming a variety of parallel and vector supercomputers, and as a basis for teaching parallel algorithms, and several examples of algorithms coded in the language are described.
Abstract: This report describes NESL, a strongly-typed, applicative, data-parallel language. NESL is intended to be used as a portable interface for programming a variety of parallel and vector supercomputers, and as a basis for teaching parallel algorithms. Parallelism is supplied through a simple set of data-parallel constructs based on vectors, including a mechanism for applying any function over the elements of a vector in parallel and a rich set of parallel functions that manipulate vectors. NESL fully supports nested vectors and nested parallelism--the ability to take a parallel function and apply it over multiple instances in parallel. Nested parallelism is important for implementing algorithms with complex and dynamically changing data structures, such as required in many graph and sparse matrix algorithms. NESL also provides a mechanism for calculating the asymptotic running time for a program on various parallel machine models, including the parallel random access machine (PRAM). This is useful for estimating running times of algorithms on actual machines and, when teaching algorithms, for supplying a close correspondence between the code and the theoretical complexity. This report defines NESL and describes several examples of algorithms coded in the language. The examples include algorithms for median finding, sorting, string searching, finding prime numbers, and finding a planar convex hull. NESL currently compiles to an intermediate language called VCODE, which runs on the Cray Y-MP, Connection Machine CM-2, and Encore Multimax. For many algorithms, the current implementation gives performance close to optimized machine-specific code for these machines.
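
As a loose illustration (this is not NESL syntax), the following Python sketch mimics NESL's nested apply-to-each: a function mapped over every element of an irregularly nested vector, the pattern NESL parallelizes at both levels.

```python
# Simulating NESL-style nested data-parallelism: apply f over each element
# of a nested (irregular) vector. In NESL both levels of mapping could run
# in parallel; here map is a sequential stand-in for the apply-to-each.
def nested_apply(f, nested):
    return [list(map(f, row)) for row in nested]

print(nested_apply(lambda x: x * x, [[1, 2, 3], [4], [5, 6]]))
# [[1, 4, 9], [16], [25, 36]]
```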

262 citations


Journal ArticleDOI
TL;DR: The authors describe PROOFS, a fast fault simulator for synchronous sequential circuits that achieves high performance by combining all the advantages of differential fault simulation, single fault propagation, and parallel fault simulation while minimizing their individual disadvantages.
Abstract: The authors describe PROOFS, a fast fault simulator for synchronous sequential circuits. PROOFS achieves high performance by combining all the advantages of differential fault simulation, single fault propagation, and parallel fault simulation, while minimizing their individual disadvantages. The fault simulator minimizes the memory requirements, reduces the number of gate evaluations, and simplifies the complexity of the software implementation. PROOFS requires on average one fifth of the memory required for concurrent fault simulation and runs six to 67 times faster on the ISCAS-89 sequential benchmark circuits.
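
A hedged sketch of just the word-level ingredient (PROOFS combines it with differential simulation and single fault propagation): one machine word carries a signal's value in 32 circuit copies at once, so one bitwise operation evaluates a gate for all copies.

```python
# Parallel fault simulation in one word: copy 0 is fault-free, other bit
# positions carry copies with injected faults.
WORD = 0xFFFFFFFF

def sim_and(a, b):
    return a & b                       # one AND gate, 32 copies at once

a = WORD                               # signal a is 1 in every copy
b = WORD & ~0b10                       # inject stuck-at-0 on b in copy 1
out = sim_and(a, b)

reference = WORD if (out & 1) else 0   # copy 0's fault-free value, replicated
detected = out ^ reference             # copies whose output differs
print(f"copies where the fault is visible: {detected:#034b}")  # bit 1 set
```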

222 citations


Journal ArticleDOI
TL;DR: The algorithm reduces memory and bus contention, which many parallel sorting algorithms suffer from, by using a regular sampling of the data to ensure good pivot selection and is shown to be asymptotically optimal.
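
A hedged sketch of just the regular-sampling step the TL;DR refers to (assuming len(data) >= p*p): each of p blocks is sorted, contributes p evenly spaced samples, and the sorted samples yield p-1 pivots that keep the resulting buckets balanced.

```python
# Regular-sampling pivot selection, the heart of sample-based parallel sorts.
def regular_sample_pivots(data, p):
    n = len(data)
    blocks = [sorted(data[i * n // p:(i + 1) * n // p]) for i in range(p)]
    samples = sorted(s for b in blocks for s in b[::max(1, len(b) // p)][:p])
    return [samples[i * p + p // 2] for i in range(1, p)]

data = list(range(100, 0, -1))          # 100 .. 1, reverse sorted
print(regular_sample_pivots(data, 4))   # [38, 63, 88]: no bucket exceeds 2n/p = 50
```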

218 citations


Journal ArticleDOI
Masao Fukushima
TL;DR: A decomposition algorithm for solving convex programming problems with separable structure that reduces to the ordinary method of multipliers when the problem is regarded as nonseparable.
Abstract: This paper presents a decomposition algorithm for solving convex programming problems with separable structure. The algorithm is obtained through application of the alternating direction method of multipliers to the dual of the convex programming problem to be solved. In particular, the algorithm reduces to the ordinary method of multipliers when the problem is regarded as nonseparable. Under the assumption that both primal and dual problems have at least one solution and the solution set of the primal problem is bounded, global convergence of the algorithm is established.
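
To make the mechanism concrete, here is a minimal ADMM iteration on a toy separable problem, minimize (x - 3)^2 + |z| subject to x = z; the steps below are the standard x-minimization, soft-threshold z-step, and dual update, a sketch in the spirit of the alternating direction method rather than the paper's general algorithm.

```python
# ADMM on: minimize (x - 3)^2 + |z|  subject to  x = z.
# True minimizer of (x - 3)^2 + |x| is x = 2.5.
rho = 1.0
x = z = u = 0.0
for _ in range(50):
    x = (6 + rho * (z - u)) / (2 + rho)    # argmin (x-3)^2 + (rho/2)(x - z + u)^2
    v = x + u
    z = max(0.0, abs(v) - 1.0 / rho) * (1 if v >= 0 else -1)  # soft threshold
    u = u + x - z                          # dual ascent on the constraint x = z
print(round(x, 4), round(z, 4))            # both approach 2.5
```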

216 citations


Journal ArticleDOI
TL;DR: A one-pass parallel thinning algorithm based on a number of criteria, including connectivity, unit-width convergence, medial axis approximation, noise immunity, and efficiency, is proposed and extended to the derived-grid to attain an isotropic medial axis representation.
Abstract: A one-pass parallel thinning algorithm based on a number of criteria, including connectivity, unit-width convergence, medial axis approximation, noise immunity, and efficiency, is proposed. A pipeline processing model is assumed for the development. Precise analysis of the thinning process is presented to show its properties, and proofs of skeletal connectivity and convergence are provided. The proposed algorithm is further extended to the derived-grid to attain an isotropic medial axis representation. A set of measures based on the desired properties of thinning is used for quantitative evaluation of various algorithms. Image reconstruction from connected skeletons is also discussed. Evaluation shows that the procedures compare favorably to others.

Journal ArticleDOI
TL;DR: To assess stability, which is also a precondition for scalability, the authors introduce and measure the load-sharing hit-ratio, the ratio of remote execution requests concluded successfully.
Abstract: A method for qualitative and quantitative analysis of load sharing algorithms is presented, using a number of well-known examples as illustration. Algorithm design choices are considered with respect to the main activities of information dissemination and allocation decision making. It is argued that nodes must be capable of making local decisions, and for this, efficient state-dissemination techniques are necessary. Activities related to remote execution should be bounded and restricted to a small proportion of the activity of the system. The quantitative analysis provides both performance and efficiency measures, including consideration of the load and delay characteristics of the environment. To assess stability, which is also a precondition for scalability, the authors introduce and measure the load-sharing hit-ratio, the ratio of remote execution requests concluded successfully. Using their analysis method, they are able to suggest improvements to some published algorithms.

Journal ArticleDOI
TL;DR: An optimal algorithm for one-to-all broadcasting in the star graph is proposed and works by recursively partitioning the original star graph into smaller star graphs.
Abstract: The star graph has been shown to be an attractive alternative to the widely used n-cube. Like the n-cube, the star graph possesses rich structure and symmetry as well as fault-tolerant capabilities, but has a smaller diameter and degree. However, very few algorithms exist to show its potential as a multiprocessor interconnection network. Many fast and efficient parallel algorithms require broadcasting as a basic step. An optimal algorithm for one-to-all broadcasting in the star graph is proposed. The algorithm can broadcast a message to N processors in O(log₂ N) time. The algorithm exploits the rich structure of the star graph and works by recursively partitioning the original star graph into smaller star graphs. In addition, an optimal all-to-all broadcasting algorithm is developed.
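
The doubling argument behind the O(log₂ N) bound can be sketched with the topology abstracted away (the paper's contribution is realizing each round along the star graph's links via recursive partitioning): in every round each informed node informs one new node.

```python
import math

def broadcast_rounds(n):
    # Each round the informed set doubles (capped at n), so ceil(log2 n)
    # rounds suffice to reach everyone.
    informed = {0}
    rounds = 0
    while len(informed) < n:
        k = len(informed)
        informed |= {i + k for i in informed if i + k < n}
        rounds += 1
    return rounds

print(broadcast_rounds(24), math.ceil(math.log2(24)))  # 5 5
```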

Journal ArticleDOI
TL;DR: The power test is a combination of the extended GCD algorithm and the Fourier-Motzkin method for eliminating variables in a system of inequalities; it is the first test that can generate the information needed for some advanced transformations and that can handle complex simultaneous loop limits.
Abstract: A data dependence decision algorithm called the power test is introduced. The power test is a combination of the extended GCD algorithm and the Fourier-Motzkin method for eliminating variables in a system of inequalities. This is the first test that can generate the information needed for some advanced transformations, and that can handle complex simultaneous loop limits. Previous work in data dependence decision algorithms is reviewed. Some examples which motivated the development of this test are examined, including those which demonstrate its additional power. Although it may be too expensive for use as a general-purpose dependence test in a compiler, the power test has proved useful in an interactive program restructuring environment.
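
For orientation, here is the power test's simplest relative, the classic GCD test, in executable form: the dependence equation a*i - b*j = c arising from array subscripts has an integer solution iff gcd(a, b) divides c. The power test refines this with the extended GCD algorithm and Fourier-Motzkin elimination to account for loop bounds.

```python
from math import gcd

def gcd_test(a, b, c):
    """May iterations i, j satisfy a*i - b*j = c for some integers i, j?"""
    return c % gcd(a, b) == 0

# Do A[2*i] and A[2*j + 1] ever touch the same element?  2i - 2j = 1:
print(gcd_test(2, 2, 1))   # False -> provably independent
print(gcd_test(4, 2, 6))   # True  -> dependence not ruled out
```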

Journal ArticleDOI
TL;DR: A survey and a characterization of the various parallel algorithms and architectures developed for the problem of labeling digitized images over the last two decades are presented, and it is shown that four basic parallel techniques underlie the various parallel algorithms for this problem.
Abstract: A survey and a characterization of the various parallel algorithms and architectures developed for the problem of labeling digitized images over the last two decades are presented. It is shown that four basic parallel techniques underlie the various parallel algorithms for this problem. However, because most of these techniques have been developed at a theoretical level, it is still not clear which techniques are most efficient in practical terms. Parallel architectures and parallel models of computation that implement these techniques are also studied.
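
For contrast with the parallel techniques surveyed, here is a hedged sequential baseline: two-pass, 4-connected component labeling with union-find.

```python
def label(image):
    # image: 2D list of 0/1 pixels; returns small-integer component labels.
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for y, row in enumerate(image):         # pass 1: union with prior neighbors
        for x, v in enumerate(row):
            if v:
                p = (y, x)
                parent[p] = p
                for ny, nx in ((y - 1, x), (y, x - 1)):
                    if ny >= 0 and nx >= 0 and image[ny][nx]:
                        parent[find(p)] = find((ny, nx))

    labels, roots = [[0] * len(r) for r in image], {}
    for y, row in enumerate(image):          # pass 2: number the roots
        for x, v in enumerate(row):
            if v:
                labels[y][x] = roots.setdefault(find((y, x)), len(roots) + 1)
    return labels

img = [[1, 1, 0, 1],
       [0, 1, 0, 1],
       [1, 0, 0, 1]]
print(label(img))  # [[1, 1, 0, 2], [0, 1, 0, 2], [3, 0, 0, 2]]
```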

Book ChapterDOI
08 Jul 1992
TL;DR: It is shown that optimality to within a multiplicative factor close to one can be achieved for the problems of Gauss-Jordan elimination and sorting, by transportable algorithms that can be applied for a wide range of values of the parameters p, g, and L.
Abstract: We describe a methodology for constructing parallel algorithms that are transportable among parallel computers having different numbers of processors, different bandwidths of interprocessor communication and different periodicity of global synchronisation. We do this for the bulk-synchronous parallel (BSP) model, which abstracts the characteristics of a parallel machine into three numerical parameters p, g, and L, corresponding to processors, bandwidth, and periodicity respectively. The model differentiates memory that is local to a processor from that which is not, but, for the sake of universality, does not differentiate network proximity. The advantages of this model in supporting shared memory or PRAM style programming have been treated elsewhere. Here we emphasise the viability of an alternative direct style of programming where, for the sake of efficiency, the programmer retains control of memory allocation. We show that optimality to within a multiplicative factor close to one can be achieved for the problems of Gauss-Jordan elimination and sorting, by transportable algorithms that can be applied for a wide range of values of the parameters p, g, and L. We also give some simulation results for PRAMs on the BSP to identify the level of slack at which corresponding efficiencies can be approached by shared memory simulations, provided the bandwidth parameter g is good enough.
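
The three parameters make the model's cost accounting concrete. A sketch of the standard BSP convention (exact conventions vary slightly across papers): a superstep with local work w and h-relation h, the maximum number of messages any processor sends or receives, costs w + g*h + L.

```python
# BSP cost: sum over supersteps of (local work) + g * (h-relation) + L.
def bsp_cost(supersteps, g, L):
    return sum(w + g * h + L for (w, h) in supersteps)

# Two supersteps on a machine with bandwidth factor g = 4 and latency L = 100:
print(bsp_cost([(1000, 50), (500, 10)], g=4, L=100))  # 1940
```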

Journal ArticleDOI
Tadao Takaoka
TL;DR: A new algorithm is presented for the all pairs shortest path problem, running in O(n³ (log log n / log n)^(1/2)) time on a uniform RAM, an improvement of Fredman's result by a factor of (log n / log log n)^(1/6).
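
For context, here is the classical O(n³) baseline that results like this improve on by sub-logarithmic factors (this is Floyd-Warshall, not Takaoka's algorithm).

```python
# Floyd-Warshall all-pairs shortest paths: d[i][j] is the edge weight
# (d[i][i] == 0, missing edges are infinity); updated in place.
def floyd_warshall(d):
    n = len(d)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

INF = float("inf")
print(floyd_warshall([[0, 3, INF], [INF, 0, 1], [2, INF, 0]]))
# [[0, 3, 4], [3, 0, 1], [2, 5, 0]]
```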

Journal ArticleDOI
TL;DR: The authors propose a regularization method for determining scales for edge detection adaptively at each site in the image plane; it extends the optimal filter concept of Poggio et al. and can detect both step and diffuse edges while drastically filtering out random noise.
Abstract: The authors suggest a regularization method for determining scales for edge detection adaptively for each site in the image plane. Specifically, they extend the optimal filter concept of T. Poggio et al. (1984) and the scale-space concept of A. Witkin (1983) to an adaptive scale parameter. To avoid an ill-posed feature synthesis problem, the scheme automatically finds optimal scales adaptively for each pixel before detecting final edge maps. The authors introduce an energy function defined as a functional over continuous scale space. Natural constraints for edge detection are incorporated into the energy function. To obtain a set of optimal scales that can minimize the energy function, a parallel relaxation algorithm is introduced. Experiments for synthetic and natural scenes show the advantages of the algorithm. In particular, it is shown that this system can detect both step and diffuse edges while drastically filtering out the random noise.

Journal ArticleDOI
TL;DR: This work develops a general and easy-to-implement technique to make robust many efficient parallel algorithms, e.g., algorithms for list ranking, integer sorting, and computing preorder numberings on trees, and obtains an optimal cost algorithm by exploiting parallel slackness.
Abstract: The efficient parallel algorithms proposed for many fundamental problems, such as list ranking, integer sorting and computing preorder numberings on trees, are very sensitive to processor failures. The requirement of efficiency (commonly formalized using Parallel-time × Processors as a cost measure) has led to the design of highly tuned PRAM algorithms which, given the additional constraint of simple processor failures, unfortunately become inefficient or even incorrect. We propose a new notion of robustness that combines efficiency with fault tolerance. For the common case of fail-stop errors, we develop a general and easy to implement technique to make robust many efficient parallel algorithms, e.g., algorithms for all the problems listed above. More specifically, for any dynamic pattern of fail-stop errors on a CRCW PRAM with at least one surviving processor, our method increases the original algorithm cost by at most a log² N multiplicative factor. Our technique is based on a robust solution of the problem of Write-All, i.e., using P processors, write 1's in all locations of an N-sized array. In addition we show that at least a log N / log log N multiplicative overhead will be incurred for certain patterns of failures by any algorithm that implements robust solutions to Write-All with P = N. However, by exploiting parallel slackness, we obtain an optimal cost algorithm when P ≤ N log log N / log² N.
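
To see what Write-All asks for, here is a toy fail-stop simulation; it cheats by using a free global view to reassign survivors each round, whereas the hard part the paper solves is coordinating that reassignment within the stated cost bounds.

```python
import random
random.seed(1)

def write_all(N, P, crash_prob=0.2):
    # P processors must set all N cells to 1; each round every survivor
    # writes one unwritten cell, then may fail-stop with probability crash_prob.
    a = [0] * N
    alive = list(range(P))
    work = 0
    while 0 in a and alive:
        todo = [i for i, v in enumerate(a) if v == 0]
        for k in range(len(alive)):
            a[todo[k % len(todo)]] = 1
            work += 1
        alive = [p for p in alive if random.random() > crash_prob]
    return all(a), work          # (completed?, total work charged)

print(write_all(N=64, P=16))
```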

Journal ArticleDOI
TL;DR: A domain-splitting algorithm for unstructured grids, which tries to minimize the surface-to-volume ratio of each subdomain, is described, which is employed both for grid generation and grid smoothing.
Abstract: A parallel unstructured grid generation algorithm is presented and implemented on the INTEL hypercube. Different processor hierarchies are discussed, and the appropriate hierarchies for mesh generation and mesh smoothing are selected. A domain-splitting algorithm for unstructured grids, which tries to minimize the surface-to-volume ratio of each subdomain, is described. This splitting algorithm is employed both for grid generation and grid smoothing. Results obtained on the INTEL hypercube demonstrate the effectiveness of the algorithms developed.
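
A hedged sketch of the splitting idea in miniature, not the paper's exact algorithm: recursive coordinate bisection halves the point set along its longest axis, which tends to keep each subdomain's surface-to-volume ratio low.

```python
# Recursive coordinate bisection into `parts` subdomains (parts a power of 2).
def rcb(points, parts):
    if parts == 1:
        return [points]
    axis = max(range(len(points[0])),
               key=lambda a: max(p[a] for p in points) - min(p[a] for p in points))
    pts = sorted(points, key=lambda p: p[axis])   # split longest axis at median
    mid = len(pts) // 2
    return rcb(pts[:mid], parts // 2) + rcb(pts[mid:], parts // 2)

import random
random.seed(0)
pts = [(random.random(), random.random()) for _ in range(16)]
for sub in rcb(pts, 4):
    print(len(sub))   # four subdomains of 4 points each
```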

BookDOI
01 Jan 1992
TL;DR: The proceedings of the Parallel Computational Fluid Dynamics 2008 conference examine major developments in block-structured grid and boundary methods for simulating flows over moving bodies, methods for optimization in aerodynamics design, innovative parallel algorithms and numerical solvers such as scalable algebraic multilevel preconditioners and the acceleration of iterative solutions, software frameworks and component architectures for parallelism, large-scale computing and parallel efficiency in the industrial context, lattice Boltzmann and SPH methods, and applications in the environment, biofluids, and nuclear engineering.
Abstract: This book collects the proceedings of the Parallel Computational Fluid Dynamics 2008 conference held in Lyon, France. Contributed papers by over 40 researchers representing the state of the art in parallel CFD and architecture from Asia, Europe, and North America examine major developments in (1) block-structured grid and boundary methods to simulate flows over moving bodies, (2) specific methods for optimization in aerodynamics design, (3) innovative parallel algorithms and numerical solvers, such as scalable algebraic multilevel preconditioners and the acceleration of iterative solutions, (4) software frameworks and component architectures for parallelism, (5) large scale computing and parallel efficiencies in the industrial context, (6) lattice Boltzmann and SPH methods, and (7) applications in the environment, biofluids, and nuclear engineering.

Journal ArticleDOI
TL;DR: This work presents a unified derivation of four rotation-based recursive least squares algorithms that solve the adaptive least squares problems of the linear combiner, thelinear combiner without a desired signal, the single channel, and the multichannel linear prediction and transversal filtering.
Abstract: This work presents a unified derivation of four rotation-based recursive least squares (RLS) algorithms. They solve the adaptive least squares problems of the linear combiner, the linear combiner without a desired signal, the single channel, and the multichannel linear prediction and transversal filtering. Compared to other approaches, the authors' derivation is simpler and unified, and may be useful to readers for better understanding the algorithms and their relationships. Moreover, it enables improvements of some algorithms in the literature in both the computational and the numerical issues. All algorithms derived in this work are based on Givens rotations. They offer superior numerical properties as shown by computer simulations. They are computationally efficient and highly concurrent. Aspects of parallel implementation and parameter identification are discussed. >
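
The primitive underneath all four derivations is the 2×2 Givens rotation, the numerically robust building block of rotation-based RLS; a minimal sketch:

```python
import math

def givens(a, b):
    # Orthogonal rotation (c, s) that zeroes b:  c*a + s*b = r,  -s*a + c*b = 0.
    if b == 0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r

c, s = givens(3.0, 4.0)
print(c * 3 + s * 4, -s * 3 + c * 4)   # 5.0, 0.0 (up to rounding)
```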

Proceedings ArticleDOI
01 Mar 1992
TL;DR: An optimal sorting algorithm on the reconfigurable mesh is presented; it matches the AT² lower bound of Ω(n²) for sorting n numbers in the word model of VLSI.
Abstract: An optimal sorting algorithm on the reconfigurable mesh is proposed. The algorithm sorts n numbers in constant time using n × n processors. The best previously known result uses O(n × n log² n) processors. The presented algorithm matches the AT² lower bound of Ω(n²) for sorting n numbers in the word model of VLSI. A modification of the algorithm for area-time trade-offs is shown to achieve AT² optimality over a range of running times.

Journal ArticleDOI
TL;DR: The parallel method provides rapid, high-resolution alignments for users of the software toolkit for pairwise sequence comparison, as illustrated here by a comparison of the chloroplast genomes of tobacco and liverwort.
Abstract: The local similarity problem is to determine the similar regions within two given sequences. We recently developed a dynamic programming algorithm for the local similarity problem that requires only space proportional to the sum of the two sequence lengths, whereas earlier methods use space proportional to the product of the lengths. In this paper, we describe how to parallelize the new algorithm and present results of experimental studies on an Intel hypercube. The parallel method provides rapid, high-resolution alignments for users of our software toolkit for pairwise sequence comparison, as illustrated here by a comparison of the chloroplast genomes of tobacco and liverwort.
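
A hedged sketch of the space-saving half of the idea: Smith-Waterman local-similarity scores need only two DP rows, i.e., space proportional to one sequence. Recovering the alignments themselves in linear space, and parallelizing, is the paper's contribution; this computes scores only.

```python
# Local similarity score in O(len(t)) space (two DP rows).
def local_score(s, t, match=2, mismatch=-1, gap=-1):
    prev = [0] * (len(t) + 1)
    best = 0
    for a in s:
        cur = [0]
        for j, b in enumerate(t, 1):
            cur.append(max(0,
                           prev[j - 1] + (match if a == b else mismatch),
                           prev[j] + gap,
                           cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best

print(local_score("ACGTTG", "CGTT"))   # 8: the exact CGTT match scores 4 * 2
```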

Journal ArticleDOI
TL;DR: Computations of three-dimensional compressible flows using unstructured meshes having close to one million elements, such as a complete airplane, demonstrate that the Connection Machine systems are suitable for these applications.
Abstract: A finite element method for computational fluid dynamics has been implemented on the Connection Machine systems CM-2 and CM-200. An implicit iterative solution strategy, based on the preconditioned matrix-free GMRES algorithm, is employed. Parallel data structures built on both nodal and elemental sets are used to achieve maximum parallelization. Communication primitives provided through the Connection Machine Scientific Software Library substantially improved the overall performance of the program. Computations of three-dimensional compressible flows using unstructured meshes having close to one million elements, such as a complete airplane, demonstrate that the Connection Machine systems are suitable for these applications. Performance comparisons are also carried out with the vector computers Cray Y-MP and Convex C-1.
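
A minimal modern sketch of the matrix-free ingredient, using SciPy for illustration (the paper's solver is a custom preconditioned GMRES on the Connection Machine): GMRES needs only a routine that applies the operator to a vector, so no matrix is ever assembled.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

n = 100

def matvec(v):
    # A diagonally dominant tridiagonal operator applied on the fly.
    out = 3.0 * v
    out[:-1] -= v[1:]
    out[1:] -= v[:-1]
    return out

A = LinearOperator((n, n), matvec=matvec)
b = np.ones(n)
x, info = gmres(A, b)
print(info, float(np.linalg.norm(matvec(x) - b)))  # 0 and a small residual
```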

Proceedings Article
A. Etemadi
07 Apr 1992
TL;DR: A new, robust, parallel algorithm for the segmentation of edge data into straight lines and circular arcs is described; it is implemented within the FEX software package for use on sequential machines.
Abstract: The author describes a new, robust, parallel algorithm for the segmentation of edge data. The segmentation is in the form of straight lines and circular arcs. No user-supplied thresholds are necessary, and the further grouping of the segments is simplified. The algorithm has been implemented within the FEX software package for use on sequential machines. Using a number of complex edge maps, the author compares the results of the algorithm with those of the method developed by West and Rosin (1991), based on a technique suggested by Lowe (1987).

Proceedings ArticleDOI
30 Aug 1992
TL;DR: The author presents an approach to the approximation of curves for the description and recognition of curved objects; it needs only a rotation and a scaling of a circular arc to find the longest arc approximating a part of a curve, which may contain linear segments.
Abstract: The author presents an approach to the approximation of curves for the description and recognition of curved objects. Only a rotation and a scaling of a circular arc are needed to find the longest arc approximating the part of a curve under consideration, which may contain linear segments. Since all points on the curve can be processed independently of one another, the method lends itself to a parallel algorithm for fast computation. Experiments on several high-resolution tactile images show that the algorithm behaves well. The method can also be extended to the recognition of occluded curved objects.

Journal ArticleDOI
TL;DR: A class of symmetric Hopfield networks with nonpositive synapses and zero threshold is analyzed in detail and it is shown that this class naturally solves the vertex cover problem.
Abstract: A class of symmetric Hopfield networks with nonpositive synapses and zero thresholds is analyzed in detail. It is shown that all stationary points have a one-to-one correspondence with the minimal vertex covers of certain undirected graphs, that the sequential Hopfield algorithm as applied to this class of networks converges in at most 2n steps (n being the number of neurons), and that the parallel Hopfield algorithm either converges in one step or enters a two-cycle in one step. The necessary and sufficient conditions on the initial iterate for the parallel algorithm to converge in one step are given. A modified parallel algorithm which is guaranteed to converge in ⌊3n/2⌋ steps (⌊x⌋ being the integer part of x) for an n-neuron network of this particular class is also given. By way of application, it is shown that this class naturally solves the vertex cover problem. Simulations confirm that the solutions provided by this method are better than those provided by other known methods.
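
A toy illustration in the spirit of the result, not the paper's exact construction: a sequential zero-threshold iteration with -1 synapses on the edges of a graph (without isolated vertices). Reading +1 states as "in the cover", the fixed point reached below is a minimal vertex cover.

```python
# Sequential Hopfield-style iteration: neuron i flips to +1 when the sum of
# its neighbors' states is <= 0 (zero threshold, -1 synapses), else to -1.
def hopfield_cover(edges, n, max_sweeps=100):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    s = [-1] * n
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):                  # asynchronous (sequential) updates
            new = 1 if sum(s[j] for j in adj[i]) <= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return {i for i in range(n) if s[i] == 1}

print(hopfield_cover([(0, 1), (0, 2), (1, 2)], 3))  # {0, 1}: a minimal cover of the triangle
```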

Proceedings Article
23 Aug 1992
TL;DR: A new declustering method for parallel disk systems, called coordinate modulo distribution (CMD), is proposed and analysis shows that the method achieves optimum parallelism for a very high percentage of range queries on multidimensional data, if the distribution of data on each dimension is stationary.
Abstract: I/O parallelism appears to be a promising approach to achieving high performance in parallel database systems. In such systems, it is essential to decluster database files into fragments and spread them across multiple disks so that the DBMS software can exploit the I/O bandwidth by reading and writing the disks in parallel. In this paper, we consider the problem of declustering multidimensional data on a parallel disk system. Since the multidimensional range query is the main work-horse for applications accessing such data, our aim is to provide efficient support for it. A new declustering method for parallel disk systems, called coordinate modulo distribution (CMD), is proposed. Our analysis shows that the method achieves optimum parallelism for a very high percentage of range queries on multidimensional data, if the distribution of data on each dimension is stationary. We have derived the exact conditions under which optimality is achieved. Also provided are the worst and average case bounds on multidimensional range query performance. Experimental results show that the method achieves near optimum performance in almost all cases, even when the stationarity assumption does not hold. Details of the parallel algorithms for range query processing and data maintenance are also provided.
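
The CMD idea is compact enough to sketch. Assuming the usual presentation of coordinate modulo declustering, grid cell (c1, ..., cd) of the partitioned space goes to disk (c1 + ... + cd) mod M, so the cells of an axis-aligned range spread almost evenly over the M disks.

```python
# Coordinate modulo distribution: cell -> disk.
def cmd_disk(cell, M):
    return sum(cell) % M

M = 4
for x in range(4):
    print([cmd_disk((x, y), M) for y in range(4)])
# Every row and column of cells hits all four disks,
# so range queries read the disks in parallel.
```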

Journal ArticleDOI
TL;DR: The purpose of the ordering is to limit fill and enhance concurrency in the subsequent Cholesky factorization of the matrix, and a geometric approach to nested dissection is used based on a given Cartesian embedding of the graph of the Matrix in Euclidean space.
Abstract: This paper is concerned with the distributed parallel computation of an ordering for a symmetric positive definite sparse matrix. The purpose of the ordering is to limit fill and enhance concurrency in the subsequent Cholesky factorization of the matrix. A geometric approach to nested dissection is used based on a given Cartesian embedding of the graph of the matrix in Euclidean space. The resulting algorithm can be implemented efficiently on massively parallel, distributed memory computers. One unusual feature of the distributed algorithm is that its effectiveness does not depend on data locality, which is critical in this context, since an appropriate partitioning of the problem is not known until after the ordering has been determined. The ordering algorithm is the first component in a suite of scalable parallel algorithms currently under development for solving large sparse linear systems on massively parallel computers.
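
A hedged sketch of geometric nested dissection on point coordinates, a simplification of the paper's Cartesian approach (it assumes distinct points and ignores the actual matrix graph): split by a coordinate plane, order the two halves first and the separator last, and recurse; numbering separators last is what limits fill in the Cholesky factor.

```python
# Geometric nested dissection ordering of points with Cartesian coordinates.
def nested_dissection(points):
    if len(points) <= 2:
        return points
    axis = max(range(len(points[0])),
               key=lambda a: max(p[a] for p in points) - min(p[a] for p in points))
    pts = sorted(points, key=lambda p: p[axis])
    med = pts[len(pts) // 2][axis]
    left = [p for p in pts if p[axis] < med]
    sep = [p for p in pts if p[axis] == med]    # separator plane, ordered last
    right = [p for p in pts if p[axis] > med]
    return nested_dissection(left) + nested_dissection(right) + sep

grid = [(x, y) for x in range(4) for y in range(4)]
print(nested_dissection(grid))  # the 16 grid points reordered; separators come last
```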