
Showing papers by "Vipin Kumar published in 1993"


Journal ArticleDOI
TL;DR: Isoefficiency analysis helps to determine the best algorithm/architecture combination for a particular problem without explicitly analyzing all possible combinations under all possible conditions.
Abstract: Isoefficiency analysis helps us determine the best algorithm/architecture combination for a particular problem without explicitly analyzing all possible combinations under all possible conditions.
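As a quick reference, the relation behind the metric can be stated in the standard notation of this literature (W for problem size, T_o for total parallel overhead, E for efficiency); the following is a sketch added for orientation, not text from the paper.

```latex
% Parallel run time, speedup, and efficiency on p processors:
%   T_p = (W + T_o(W,p))/p,  S = W/T_p,  E = S/p.
\[
  E \;=\; \frac{W}{W + T_o(W,p)} \;=\; \frac{1}{1 + T_o(W,p)/W}
  \qquad\Longrightarrow\qquad
  W \;=\; \frac{E}{1-E}\,T_o(W,p).
\]
% The isoefficiency function is the asymptotic rate at which W must grow
% with p for this equality to keep holding at a fixed E; a slowly growing
% isoefficiency function indicates a highly scalable combination.
```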

329 citations


Journal ArticleDOI
TL;DR: The authors present the scalability analysis of a parallel fast Fourier transform (FFT) algorithm on mesh and hypercube connected multicomputers using the isoefficiency metric and show that it is more cost-effective to implement the FFT algorithm on a hypercube rather than a mesh.
Abstract: The authors present the scalability analysis of a parallel fast Fourier transform (FFT) algorithm on mesh- and hypercube-connected multicomputers using the isoefficiency metric. The isoefficiency function of an algorithm-architecture combination is defined as the rate at which the problem size should grow with the number of processors to maintain a fixed efficiency. It is shown that it is more cost-effective to implement the FFT algorithm on a hypercube than on a mesh, despite the fact that large-scale meshes are cheaper to construct than large hypercubes. Although the scope of this work is limited to the Cooley-Tukey FFT algorithm on a few classes of architectures, the methodology can be used to study the performance of various FFT algorithms on a variety of architectures, such as SIMD hypercube and mesh architectures and shared-memory architectures.
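To make the comparison concrete, here is a toy cost model in the spirit of the analysis: a binary-exchange-style FFT cost on a hypercube versus a mesh. The constants and exact communication terms are assumptions for illustration, not the paper's expressions.

```python
# Toy isoefficiency-style comparison: efficiency of a radix-2 FFT of size n
# on p processors under assumed hypercube and mesh communication costs.
import math

t_c, t_s, t_w = 1.0, 25.0, 4.0   # assumed compute, startup, per-word costs

def fft_time_hypercube(n, p):
    # local work plus log p exchange steps, each moving n/p words
    return t_c * (n / p) * math.log2(n) + (t_s + t_w * n / p) * math.log2(p)

def fft_time_mesh(n, p):
    # on a sqrt(p) x sqrt(p) mesh, distant exchanges traverse O(sqrt(p)) links
    q = math.isqrt(p)
    return t_c * (n / p) * math.log2(n) + 2 * (t_s * q + t_w * n / q)

def efficiency(time_model, n, p):
    t_seq = t_c * n * math.log2(n)           # sequential FFT cost
    return t_seq / (p * time_model(n, p))

n = 1 << 14
for p in (16, 64, 256):
    print(p, round(efficiency(fft_time_hypercube, n, p), 3),
             round(efficiency(fft_time_mesh, n, p), 3))
```

Under this model the hypercube sustains a markedly higher efficiency at every processor count, matching the paper's qualitative conclusion: to hold efficiency fixed, n must grow much faster with p on the mesh.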

139 citations


Journal ArticleDOI
TL;DR: Experimental results for many synthetic and practical problems run on various parallel machines that validate the theoretical analysis are presented, and it is shown that the average speedup obtained is linear when the distribution of solutions is uniform and superlinear when the distribution of solutions is nonuniform.
Abstract: Analytical models and experimental results concerning the average case behavior of parallel backtracking are presented. Two types of backtrack search algorithms are considered: simple backtracking, which does not use heuristics to order and prune the search, and heuristic backtracking, which does. Analytical models are used to compare the average number of nodes visited in sequential and parallel search for each case. For simple backtracking, it is shown that the average speedup obtained is linear when the distribution of solutions is uniform and superlinear when the distribution of solutions is nonuniform. For heuristic backtracking, the average speedup obtained is at least linear, and the speedup obtained on a subset of instances is superlinear. Experimental results for many synthetic and practical problems, run on various parallel machines, validate the theoretical analysis.
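The flavor of the average-case claim can be reproduced with a small Monte Carlo model (assumed here for illustration: a single solution among N leaves, the space split into p contiguous chunks, speedup measured as the ratio of average visit counts):

```python
# Average speedup of partitioned backtracking vs. sequential depth-first
# search in a toy model: ~p for a uniformly placed solution, superlinear
# (~2p - 1) when the solution hides in the region searched last sequentially.
import random

def avg_speedup(N, p, draw, trials=20000):
    chunk = N // p
    seq = par = 0
    for _ in range(trials):
        s = draw(N)                 # leaf index of the single solution
        seq += s + 1                # leaves visited by sequential search
        par += s % chunk + 1        # leaves visited by the finding processor
    return seq / par                # ratio of average visit counts

N, p = 10**6, 16
uniform   = lambda n: random.randrange(n)
clustered = lambda n: n - n // p + random.randrange(n // p)  # in the last chunk

print("uniform:  ", round(avg_speedup(N, p, uniform), 1))    # ~= p
print("clustered:", round(avg_speedup(N, p, clustered), 1))  # ~= 2p - 1
```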

97 citations


Proceedings ArticleDOI
02 May 1993
TL;DR: It is shown that parallel search techniques derived from their sequential counterparts can enable the solution of instances of the robot motion planning problem which are computationally infeasible on sequential machines.
Abstract: The authors show that parallel search techniques derived from their sequential counterparts can enable the solution of instances of the robot motion planning problem which are computationally infeasible on sequential machines. A parallel version of a robot motion planning algorithm based on quasibest first search with randomized escape from local minima and random backtracking is presented. Its performance on a problem instance, which was computationally infeasible on a single processor of an nCUBE2 multicomputer, is discussed. The limitations of parallel robot motion planning systems are discussed, and a course for future work is suggested.
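The abstract's description of the search strategy suggests roughly the following control loop; everything below (the configuration-space abstraction, escape-walk length, backtracking probability) is an assumption sketched for illustration, not the authors' implementation.

```python
# Quasi-best-first descent of a potential field with randomized escape
# from local minima and random backtracking (illustrative sketch only).
import random

def plan(start, goal, neighbors, potential, max_steps=100_000):
    """neighbors(c) yields adjacent configurations; potential(c) falls toward goal."""
    path, current = [start], start
    for _ in range(max_steps):
        if current == goal:
            return path
        best = min(neighbors(current), key=potential)
        if potential(best) < potential(current):
            current = best                              # greedy descent
        elif random.random() < 0.1:
            current = random.choice(path)               # random backtrack
        else:
            for _ in range(50):                         # random escape walk
                current = random.choice(list(neighbors(current)))
        path.append(current)
    return None                                         # gave up
```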

77 citations


Proceedings ArticleDOI
16 Aug 1993
TL;DR: This paper analyzes the performance and scalability of a number of parallel formulations of the matrix multiplication algorithm and predicts the conditions under which each formulation is better than the others.
Abstract: A number of parallel formulations of the dense matrix multiplication algorithm have been developed. For an arbitrarily large number of processors, any of these algorithms or their variants can provide near-linear speedup for sufficiently large matrix sizes, and none of the algorithms can be clearly claimed to be superior to the others. In this paper we analyze the performance and scalability of a number of parallel formulations of the matrix multiplication algorithm and predict the conditions under which each formulation is better than the others.
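The kind of prediction the paper makes can be sketched with simple runtime models: evaluate each formulation's modeled cost for a given (n, p) and pick the cheapest. The two models below (a Cannon-style 2-D formulation and a DNS-style 3-D formulation) and their constants are assumptions, not the paper's expressions.

```python
# Compare assumed cost models of two parallel matrix multiplication
# formulations and report which is predicted to be faster for each (n, p).
import math

t_c, t_s, t_w = 1.0, 100.0, 2.0      # assumed machine parameters

def cannon(n, p):                     # 2-D layout on a sqrt(p) x sqrt(p) grid
    q = math.isqrt(p)
    return t_c * n**3 / p + 2 * q * t_s + 2 * t_w * n**2 / q

def dns(n, p):                        # 3-D layout; can use up to n**3 processors
    q = round(p ** (1 / 3))
    return t_c * n**3 / p + (t_s + t_w * (n / q) ** 2) * math.log2(p)

for n in (256, 1024):
    for p in (64, 4096):              # both perfect squares and perfect cubes
        winner = min(("cannon", cannon(n, p)), ("dns", dns(n, p)), key=lambda x: x[1])
        print(f"n={n:5d} p={p:5d} -> {winner[0]} ({winner[1]:.0f})")
```

Even this crude model reproduces the headline observation: for small p relative to n the 2-D formulation wins, while for sufficiently large p relative to n the 3-D formulation overtakes it.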

66 citations


Journal ArticleDOI
TL;DR: The authors study the impact of parallel processing overheads and the degree of concurrency of a parallel algorithm on the optimal number of processors to be used when the criterion for optimality is minimizing the parallel execution time.

66 citations


Proceedings ArticleDOI
13 Apr 1993
TL;DR: The authors are concerned with dynamic programming (DP) algorithms whose solution is given by a recurrence relation similar to that for the matrix parenthesization problem, and present three different mappings of a systolic algorithm for this problem onto a mesh-connected parallel computer.
Abstract: The authors are concerned with dynamic programming (DP) algorithms whose solution is given by a recurrence relation similar to that for the matrix parenthesization problem. Guibas, Kung, and Thompson (1979) presented a systolic array algorithm for this problem that uses O(n²) processing cells and solves the problem in O(n) time. The authors present three different mappings of this systolic algorithm onto a mesh-connected parallel computer. The first two mappings use commonly known techniques for mapping systolic arrays onto mesh computers; both are able to obtain only a fraction of the maximum possible performance. The primary reason for the poor performance of these formulations is that different nodes at different levels of the multistage graph in the DP formulation require different amounts of computation, and any adaptation has to take this into consideration and evenly distribute the work among the processors. The third mapping balances the workload among processors and is thus capable of providing efficiency approximately equal to 1 (i.e., speedup approximately equal to the number of processors) for any number of processors and sufficiently large problem sizes. The authors experimentally evaluate these mappings on a mesh embedded onto a 256-processor nCUBE/2.
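For orientation, the best-known instance of this recurrence class is matrix-chain parenthesization, shown below as a plain sequential reference (the paper's contribution is the mesh mapping, not this recurrence). Note that a cell at chain length l takes O(l) work, which is exactly the nonuniformity across levels that the third mapping balances.

```python
# Sequential DP for the matrix parenthesization recurrence:
#   c[i][j] = min over k of c[i][k] + c[k+1][j] + cost of the final multiply.
def matrix_chain_cost(dims):
    """dims[i], dims[i+1] are the dimensions of matrix i; returns minimal cost."""
    n = len(dims) - 1
    c = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):            # diagonal = chain length
        for i in range(n - length + 1):
            j = i + length - 1
            c[i][j] = min(
                c[i][k] + c[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)          # O(length) work per cell
            )
    return c[0][n - 1]

print(matrix_chain_cost([30, 35, 15, 5, 10, 20, 25]))  # textbook example: 15125
```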

14 citations


Proceedings ArticleDOI
05 Jan 1993
TL;DR: The authors study the impact of parallel processing overhead and the degree of concurrency of a parallel algorithm on the optimal number of processors to be used when the criterion for optimality is minimizing the parallel execution time and evaluate a more general criterion of optimality.
Abstract: The authors study the impact of parallel processing overhead and the degree of concurrency of a parallel algorithm on the optimal number of processors to be used when the criterion for optimality is minimizing the parallel execution time. They evaluate a more general criterion of optimality and show how operating at the optimal point is equivalent to operating at a unique value of efficiency, which is a characteristic of the criterion of optimality and the properties of the parallel system under study. The technical results derived are put in perspective with similar results that have appeared in the literature. It is shown that this study generalizes and/or extends these earlier results.
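A numerical sketch of the question being studied: for a fixed amount of work W and an assumed total-overhead function To(W, p), sweep p to find the processor count minimizing parallel time, then read off the efficiency at that operating point. The overhead model and constants here are assumptions for illustration, not the paper's.

```python
# Find the p that minimizes Tp = (W + To(W, p)) / p under an assumed
# overhead model, and report the efficiency at the minimizing point.
import math

def parallel_time(W, p, t_o=500.0):
    To = t_o * p * math.log2(p)            # assumed total overhead
    return (W + To) / p

W = 10**6
times = {p: parallel_time(W, p) for p in range(2, 4097)}
p_opt = min(times, key=times.get)
E_opt = W / (p_opt * times[p_opt])         # efficiency at the time-optimal p
print(p_opt, round(times[p_opt], 1), round(E_opt, 3))
```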

8 citations


15 Nov 1993
TL;DR: The objective of this research is to develop efficient parallel algorithms for a variety of problems and to analyze the scalability of new and existing parallel algorithms.
Abstract: The objective of this research is to develop efficient parallel algorithms for a variety of problems and to analyze the scalability of new and existing parallel algorithms. Scalability analysis is an important tool used for predicting the performance of an algorithm-architecture combination when one or more of the hardware-related parameters (interconnection network, speed of processors, speed of communication channels, number of processors) are changed. The problems studied as a part of this project come from diverse domains such as solution of differential equations, discrete optimization, neural network based learning, sorting, and graph algorithms. In particular, we have studied parallel algorithms for solving linear systems using the preconditioned conjugate gradient method, partitioning of finite element meshes, balancing load in unstructured tree search arising in discrete optimization, the backpropagation neural network learning algorithm, dynamic programming, fast Fourier transform, sorting, shortest-path computation for graphs, robot motion planning, and matrix multiplication. Keywords: Parallel algorithms, scalability analysis, isoefficiency.

4 citations


01 Jan 1993
TL;DR: The analysis and experiments show that the new load balancing methods presented are highly scalable on SIMD architectures, with scalability no worse than that of the best load balancing schemes on MIMD architectures.
Abstract: In this paper, we present new methods for load balancing of unstructured tree computations on large-scale SIMD machines, and analyze the scalability of these and other existing schemes. An efficient formulation of tree search on a SIMD machine comprises two major components: (i) a triggering mechanism, which determines when search space redistribution must occur to balance the search space over processors; and (ii) a scheme to redistribute the search space. We have devised a new redistribution mechanism and a new triggering mechanism. Either of these can be used in conjunction with triggering and redistribution mechanisms developed by other researchers. We analyze the scalability of these mechanisms and verify the results experimentally. The analysis and experiments show that our new load balancing methods are highly scalable on SIMD architectures. Their scalability is shown to be no worse than that of the best load balancing schemes on MIMD architectures. We verify our theoretical results by implementing the 15-puzzle problem on a CM-2 SIMD parallel computer.
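A toy rendering of the two components named above (assumed model: per-processor node counts, a fixed idle-fraction trigger, and a global even rebalance; real schemes redistribute subtrees, not scalar counts):

```python
# SIMD tree search with an idle-fraction triggering mechanism and a
# global redistribution step (toy model with scalar work counts).
import random

def simd_tree_search(work, threshold=0.5):
    """work: nodes left on each processor. Returns SIMD steps executed."""
    steps = 0
    while any(work):
        # one SIMD step: each busy processor expands a node, which may
        # generate new work (modeled as a coin flip)
        work = [w - 1 + 2 * (random.random() < 0.4) if w else 0 for w in work]
        steps += 1
        idle_fraction = sum(w == 0 for w in work) / len(work)
        if idle_fraction > threshold:          # (i) triggering mechanism
            total = sum(work)                  # (ii) redistribution: even
            q, r = divmod(total, len(work))    #      split of remaining work
            work = [q + (i < r) for i in range(len(work))]
    return steps

random.seed(1)
print(simd_tree_search([random.randrange(50) for _ in range(64)]))
```

The trigger/redistribution split mirrors the abstract's decomposition; in the actual schemes each component carries its own communication cost, and it is the interplay of those costs that the scalability analysis quantifies.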