
Showing papers by "Vipin Kumar published in 1993"


Journal ArticleDOI
TL;DR: Isoefficiency analysis helps to determine the best algorithm/architecture combination for a particular problem without explicitly analyzing all possible combinations under all possible conditions.
Abstract: Isoefficiency analysis helps us determine the best algorithm/architecture combination for a particular problem without explicitly analyzing all possible combinations under all possible conditions.
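As a quick reference, the relation behind the metric can be stated in the standard notation of this literature (W for problem size, T_o for total parallel overhead, E for efficiency); the following is a sketch added for orientation, not text from the paper.

```latex
% Parallel run time, speedup, and efficiency on p processors:
%   T_p = (W + T_o(W,p))/p,  S = W/T_p,  E = S/p.
\[
  E \;=\; \frac{W}{W + T_o(W,p)} \;=\; \frac{1}{1 + T_o(W,p)/W}
  \qquad\Longrightarrow\qquad
  W \;=\; \frac{E}{1-E}\,T_o(W,p).
\]
% The isoefficiency function is the asymptotic rate at which W must grow
% with p for this equality to keep holding at a fixed E; a slowly growing
% isoefficiency function indicates a highly scalable combination.
```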

329 citations


Journal ArticleDOI
TL;DR: The authors present the scalability analysis of a parallel fast Fourier transform (FFT) algorithm on mesh and hypercube connected multicomputers using the isoefficiency metric and show that it is more cost-effective to implement the FFT algorithm on a hypercube rather than a mesh.
Abstract: The authors present the scalability analysis of a parallel fast Fourier transform (FFT) algorithm on mesh- and hypercube-connected multicomputers using the isoefficiency metric. The isoefficiency function of an algorithm-architecture combination is defined as the rate at which the problem size should grow with the number of processors to maintain a fixed efficiency. It is shown that it is more cost-effective to implement the FFT algorithm on a hypercube than on a mesh, despite the fact that large-scale meshes are cheaper to construct than large hypercubes. Although the scope of this work is limited to the Cooley-Tukey FFT algorithm on a few classes of architectures, the methodology can be used to study the performance of various FFT algorithms on a variety of architectures, such as SIMD hypercube and mesh architectures and shared-memory architectures.
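To make the comparison concrete, here is a toy cost model in the spirit of the analysis: a binary-exchange-style FFT cost on a hypercube versus a mesh. The constants and exact communication terms are assumptions for illustration, not the paper's expressions.

```python
# Toy isoefficiency-style comparison: efficiency of a radix-2 FFT of size n
# on p processors under assumed hypercube and mesh communication costs.
import math

t_c, t_s, t_w = 1.0, 25.0, 4.0   # assumed compute, startup, per-word costs

def fft_time_hypercube(n, p):
    # local work plus log p exchange steps, each moving n/p words
    return t_c * (n / p) * math.log2(n) + (t_s + t_w * n / p) * math.log2(p)

def fft_time_mesh(n, p):
    # on a sqrt(p) x sqrt(p) mesh, distant exchanges traverse O(sqrt(p)) links
    q = math.isqrt(p)
    return t_c * (n / p) * math.log2(n) + 2 * (t_s * q + t_w * n / q)

def efficiency(time_model, n, p):
    t_seq = t_c * n * math.log2(n)           # sequential FFT cost
    return t_seq / (p * time_model(n, p))

n = 1 << 14
for p in (16, 64, 256):
    print(p, round(efficiency(fft_time_hypercube, n, p), 3),
             round(efficiency(fft_time_mesh, n, p), 3))
```

Under this model the hypercube sustains a markedly higher efficiency at every processor count, matching the paper's qualitative conclusion: to hold efficiency fixed, n must grow much faster with p on the mesh.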

139 citations


Journal ArticleDOI
TL;DR: Experimental results for many synthetic and practical problems run on various parallel machines that validate the theoretical analysis are presented, and it is shown that the average speedup obtained is linear when the distribution of solutions is uniform and superlinear when the distribution of solutions is nonuniform.
Abstract: Analytical models and experimental results concerning the average case behavior of parallel backtracking are presented. Two types of backtrack search algorithms are considered: simple backtracking, which does not use heuristics to order and prune the search, and heuristic backtracking, which does. Analytical models are used to compare the average number of nodes visited in sequential and parallel search for each case. For simple backtracking, it is shown that the average speedup obtained is linear when the distribution of solutions is uniform and superlinear when the distribution of solutions is nonuniform. For heuristic backtracking, the average speedup obtained is at least linear, and the speedup obtained on a subset of instances is superlinear. Experimental results for many synthetic and practical problems, run on various parallel machines, validate the theoretical analysis.
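The flavor of the average-case claim can be reproduced with a small Monte Carlo model (assumed here for illustration: a single solution among N leaves, the space split into p contiguous chunks, speedup measured as the ratio of average visit counts):

```python
# Average speedup of partitioned backtracking vs. sequential depth-first
# search in a toy model: ~p for a uniformly placed solution, superlinear
# (~2p - 1) when the solution hides in the region searched last sequentially.
import random

def avg_speedup(N, p, draw, trials=20000):
    chunk = N // p
    seq = par = 0
    for _ in range(trials):
        s = draw(N)                 # leaf index of the single solution
        seq += s + 1                # leaves visited by sequential search
        par += s % chunk + 1        # leaves visited by the finding processor
    return seq / par                # ratio of average visit counts

N, p = 10**6, 16
uniform   = lambda n: random.randrange(n)
clustered = lambda n: n - n // p + random.randrange(n // p)  # in the last chunk

print("uniform:  ", round(avg_speedup(N, p, uniform), 1))    # ~= p
print("clustered:", round(avg_speedup(N, p, clustered), 1))  # ~= 2p - 1
```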

97 citations


Proceedings ArticleDOI
02 May 1993
TL;DR: It is shown that parallel search techniques derived from their sequential counterparts can enable the solution of instances of the robot motion planning problem which are computationally infeasible on sequential machines.
Abstract: The authors show that parallel search techniques derived from their sequential counterparts can enable the solution of instances of the robot motion planning problem which are computationally infeasible on sequential machines. A parallel version of a robot motion planning algorithm based on quasibest first search with randomized escape from local minima and random backtracking is presented. Its performance on a problem instance, which was computationally infeasible on a single processor of an nCUBE2 multicomputer, is discussed. The limitations of parallel robot motion planning systems are discussed, and a course for future work is suggested.
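The abstract's description of the search strategy suggests roughly the following control loop; everything below (the configuration-space abstraction, escape-walk length, backtracking probability) is an assumption sketched for illustration, not the authors' implementation.

```python
# Quasi-best-first descent of a potential field with randomized escape
# from local minima and random backtracking (illustrative sketch only).
import random

def plan(start, goal, neighbors, potential, max_steps=100_000):
    """neighbors(c) yields adjacent configurations; potential(c) falls toward goal."""
    path, current = [start], start
    for _ in range(max_steps):
        if current == goal:
            return path
        best = min(neighbors(current), key=potential)
        if potential(best) < potential(current):
            current = best                              # greedy descent
        elif random.random() < 0.1:
            current = random.choice(path)               # random backtrack
        else:
            for _ in range(50):                         # random escape walk
                current = random.choice(list(neighbors(current)))
        path.append(current)
    return None                                         # gave up
```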

77 citations


Proceedings ArticleDOI
16 Aug 1993
TL;DR: This paper analyzes the performance and scalability of a number of parallel formulations of the matrix multiplication algorithm and predicts the conditions under which each formulation is better than the others.
Abstract: A number of parallel formulations of the dense matrix multiplication algorithm have been developed. For an arbitrarily large number of processors, any of these algorithms or their variants can provide near-linear speedup for sufficiently large matrix sizes, and none of the algorithms can be clearly claimed to be superior to the others. In this paper we analyze the performance and scalability of a number of parallel formulations of the matrix multiplication algorithm and predict the conditions under which each formulation is better than the others.
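The kind of prediction the paper makes can be sketched with simple runtime models: evaluate each formulation's modeled cost for a given (n, p) and pick the cheapest. The two models below (a Cannon-style 2-D formulation and a DNS-style 3-D formulation) and their constants are assumptions, not the paper's expressions.

```python
# Compare assumed cost models of two parallel matrix multiplication
# formulations and report which is predicted to be faster for each (n, p).
import math

t_c, t_s, t_w = 1.0, 100.0, 2.0      # assumed machine parameters

def cannon(n, p):                     # 2-D layout on a sqrt(p) x sqrt(p) grid
    q = math.isqrt(p)
    return t_c * n**3 / p + 2 * q * t_s + 2 * t_w * n**2 / q

def dns(n, p):                        # 3-D layout; can use up to n**3 processors
    q = round(p ** (1 / 3))
    return t_c * n**3 / p + (t_s + t_w * (n / q) ** 2) * math.log2(p)

for n in (256, 1024):
    for p in (64, 4096):              # both perfect squares and perfect cubes
        winner = min(("cannon", cannon(n, p)), ("dns", dns(n, p)), key=lambda x: x[1])
        print(f"n={n:5d} p={p:5d} -> {winner[0]} ({winner[1]:.0f})")
```

Even this crude model reproduces the headline observation: for small p relative to n the 2-D formulation wins, while for sufficiently large p relative to n the 3-D formulation overtakes it.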

66 citations


Journal ArticleDOI
TL;DR: The authors study the impact of parallel processing overheads and the degree of concurrency of a parallel algorithm on the optimal number of processors to be used when the criterion for optimality is minimizing the parallel execution time.

66 citations


Proceedings ArticleDOI
13 Apr 1993
TL;DR: The authors are concerned with dynamic programming (DP) algorithms whose solution is given by a recurrence relation similar to that for the matrix parenthesization problem, and present three different mappings of a systolic algorithm for this problem onto a mesh-connected parallel computer.
Abstract: The authors are concerned with dynamic programming (DP) algorithms whose solution is given by a recurrence relation similar to that for the matrix parenthesization problem. Guibas, Kung, and Thompson (1979) presented a systolic array algorithm for this problem that uses O(n²) processing cells and solves the problem in O(n) time. The authors present three different mappings of this systolic algorithm onto a mesh-connected parallel computer. The first two mappings use commonly known techniques for mapping systolic arrays onto mesh computers; both are able to obtain only a fraction of the maximum possible performance. The primary reason for the poor performance of these formulations is that different nodes at different levels of the multistage graph in the DP formulation require different amounts of computation, and any adaptation has to take this into consideration and evenly distribute the work among the processors. The third mapping balances the workload among processors and is thus capable of providing efficiency approximately equal to 1 (i.e., speedup approximately equal to the number of processors) for any number of processors and sufficiently large problem sizes. The authors experimentally evaluate these mappings on a mesh embedded onto a 256-processor nCUBE/2.
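For orientation, the best-known instance of this recurrence class is matrix-chain parenthesization, shown below as a plain sequential reference (the paper's contribution is the mesh mapping, not this recurrence). Note that a cell at chain length l takes O(l) work, which is exactly the nonuniformity across levels that the third mapping balances.

```python
# Sequential DP for the matrix parenthesization recurrence:
#   c[i][j] = min over k of c[i][k] + c[k+1][j] + cost of the final multiply.
def matrix_chain_cost(dims):
    """dims[i], dims[i+1] are the dimensions of matrix i; returns minimal cost."""
    n = len(dims) - 1
    c = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):            # diagonal = chain length
        for i in range(n - length + 1):
            j = i + length - 1
            c[i][j] = min(
                c[i][k] + c[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)          # O(length) work per cell
            )
    return c[0][n - 1]

print(matrix_chain_cost([30, 35, 15, 5, 10, 20, 25]))  # textbook example: 15125
```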

14 citations


Proceedings ArticleDOI
05 Jan 1993
TL;DR: The authors study the impact of parallel processing overhead and the degree of concurrency of a parallel algorithm on the optimal number of processors to be used when the criterion for optimality is minimizing the parallel execution time and evaluate a more general criterion of optimality.
Abstract: The authors study the impact of parallel processing overhead and the degree of concurrency of a parallel algorithm on the optimal number of processors to be used when the criterion for optimality is minimizing the parallel execution time. They evaluate a more general criterion of optimality and show how operating at the optimal point is equivalent to operating at a unique value of efficiency, which is a characteristic of the criterion of optimality and the properties of the parallel system under study. The technical results derived are put in perspective with similar results that have appeared in the literature. It is shown that this study generalizes and/or extends these earlier results.
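A numerical sketch of the question being studied: for a fixed amount of work W and an assumed total-overhead function To(W, p), sweep p to find the processor count minimizing parallel time, then read off the efficiency at that operating point. The overhead model and constants here are assumptions for illustration, not the paper's.

```python
# Find the p that minimizes Tp = (W + To(W, p)) / p under an assumed
# overhead model, and report the efficiency at the minimizing point.
import math

def parallel_time(W, p, t_o=500.0):
    To = t_o * p * math.log2(p)            # assumed total overhead
    return (W + To) / p

W = 10**6
times = {p: parallel_time(W, p) for p in range(2, 4097)}
p_opt = min(times, key=times.get)
E_opt = W / (p_opt * times[p_opt])         # efficiency at the time-optimal p
print(p_opt, round(times[p_opt], 1), round(E_opt, 3))
```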

8 citations


15 Nov 1993
TL;DR: The objective of this research is to develop efficient parallel algorithms for a variety of problems and to analyze the scalability of new and existing parallel algorithms.
Abstract: The objective of this research is to develop efficient parallel algorithms for a variety of problems and to analyze the scalability of new and existing parallel algorithms. Scalability analysis is an important tool used for predicting the performance of an algorithm-architecture combination when one or more of the hardware-related parameters (interconnection network, speed of processors, speed of communication channels, number of processors) are changed. The problems studied as a part of this project come from diverse domains such as solution of differential equations, discrete optimization, neural network based learning, sorting, and graph algorithms. In particular, we have studied parallel algorithms for solving linear systems using the preconditioned conjugate gradient method, partitioning of finite element meshes, balancing load in unstructured tree search arising in discrete optimization, the backpropagation neural network learning algorithm, dynamic programming, fast Fourier transform, sorting, shortest-path computation for graphs, robot motion planning, and matrix multiplication. Keywords: Parallel algorithms, scalability analysis, isoefficiency.

4 citations


01 Jan 1993
TL;DR: The analysis and experiments show that the new load balancing methods presented are highly scalable on SIMD architectures, with scalability no worse than that of the best load balancing schemes on MIMD architectures.
Abstract: In this paper, we present new methods for load balancing of unstructured tree computations on large-scale SIMD machines, and analyze the scalability of these and other existing schemes. An efficient formulation of tree search on a SIMD machine comprises two major components: (i) a triggering mechanism, which determines when search space redistribution must occur to balance the search space over processors; and (ii) a scheme to redistribute the search space. We have devised a new redistribution mechanism and a new triggering mechanism. Either of these can be used in conjunction with triggering and redistribution mechanisms developed by other researchers. We analyze the scalability of these mechanisms and verify the results experimentally. The analysis and experiments show that our new load balancing methods are highly scalable on SIMD architectures. Their scalability is shown to be no worse than that of the best load balancing schemes on MIMD architectures. We verify our theoretical results by implementing the 15-puzzle problem on a CM-2 SIMD parallel computer.
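A toy rendering of the two components named above (assumed model: per-processor node counts, a fixed idle-fraction trigger, and a global even rebalance; real schemes redistribute subtrees, not scalar counts):

```python
# SIMD tree search with an idle-fraction triggering mechanism and a
# global redistribution step (toy model with scalar work counts).
import random

def simd_tree_search(work, threshold=0.5):
    """work: nodes left on each processor. Returns SIMD steps executed."""
    steps = 0
    while any(work):
        # one SIMD step: each busy processor expands a node, which may
        # generate new work (modeled as a coin flip)
        work = [w - 1 + 2 * (random.random() < 0.4) if w else 0 for w in work]
        steps += 1
        idle_fraction = sum(w == 0 for w in work) / len(work)
        if idle_fraction > threshold:          # (i) triggering mechanism
            total = sum(work)                  # (ii) redistribution: even
            q, r = divmod(total, len(work))    #      split of remaining work
            work = [q + (i < r) for i in range(len(work))]
    return steps

random.seed(1)
print(simd_tree_search([random.randrange(50) for _ in range(64)]))
```

The trigger/redistribution split mirrors the abstract's decomposition; in the actual schemes each component carries its own communication cost, and it is the interplay of those costs that the scalability analysis quantifies.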