
Showing papers on "Parallel algorithm published in 1995"


Journal ArticleDOI
TL;DR: In this article, three parallel algorithms for classical molecular dynamics are presented; each can be implemented on any distributed-memory parallel machine that allows for message-passing of data between independently executing processors.

32,670 citations


Journal ArticleDOI
TL;DR: Serial and parallel algorithms are presented for solving a system of equations that arises from the discretization of the Hamilton-Jacobi equation associated with a trajectory optimization problem.
Abstract: We present serial and parallel algorithms for solving a system of equations that arises from the discretization of the Hamilton-Jacobi equation associated to a trajectory optimization problem of the following type. A vehicle starts at a prespecified point x_0 and follows a unit speed trajectory x(t) inside a region in R^m until an unspecified time T at which the region is exited. A trajectory minimizing a cost function of the form ∫_0^T r(x(t))dt + q(x(T)) is sought. The discretized Hamilton-Jacobi equation corresponding to this problem is usually solved using iterative methods. Nevertheless, assuming that the function r is positive, we are able to exploit the problem structure and develop one-pass algorithms for the discretized problem. The first algorithm resembles Dijkstra's shortest path algorithm and runs in time O(n log n), where n is the number of grid points. The second algorithm uses a somewhat different discretization and borrows some ideas from a variation of Dial's shortest path algorithm (1969) that we develop here; it runs in time O(n), which is the best possible, under some fairly mild assumptions. Finally, we show that the latter algorithm can be efficiently parallelized: for two-dimensional problems and with p processors, its running time becomes O(n/p), provided that p = O(√n / log n).
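
The one-pass idea above can be illustrated with a small sketch. The following Python fragment runs a Dijkstra-like sweep over a 4-connected grid, finalizing each grid point once; the neighbour cost h·(r[u]+r[v])/2 is a simplified stand-in for the paper's actual discretization, and all names are illustrative.

    import heapq

    def one_pass_value(r, q_exit, exits, h=1.0):
        """Dijkstra-like one-pass solver on a 4-connected grid (sketch).
        r      : 2D list of positive running costs r(x) at grid points
        q_exit : 2D list of terminal costs q(x), used only at exit points
        exits  : set of (i, j) exit grid points
        h      : grid spacing
        The edge cost h * (r[u] + r[v]) / 2 is a simplification used only
        to show the O(n log n) label-setting structure."""
        n, m = len(r), len(r[0])
        INF = float("inf")
        V = [[INF] * m for _ in range(n)]
        heap = []
        for (i, j) in exits:
            V[i][j] = q_exit[i][j]
            heapq.heappush(heap, (V[i][j], i, j))
        while heap:
            v, i, j = heapq.heappop(heap)
            if v > V[i][j]:
                continue                      # stale heap entry
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < n and 0 <= b < m:
                    cand = v + h * (r[i][j] + r[a][b]) / 2.0
                    if cand < V[a][b]:        # each point is finalized once
                        V[a][b] = cand
                        heapq.heappush(heap, (cand, a, b))
        return V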

816 citations


Journal ArticleDOI
01 Aug 1995
TL;DR: This paper discusses parallel algorithms to perform hierarchical clustering using various distance metrics, and a general algorithm is given that can be used to perform clustering with the complete link and average link metrics on a butterfly.
Abstract: Hierarchical clustering is a common method used to determine clusters of similar data points in multi-dimensional spaces. O(n^2) algorithms, where n is the number of points to cluster, have long been known for this problem. This paper discusses parallel algorithms to perform hierarchical clustering using various distance metrics. I describe O(n) time algorithms for clustering using the single link, average link, complete link, centroid, median, and minimum variance metrics on an n-node CRCW PRAM and O(n log n) algorithms for these metrics (except average link and complete link) on n/log n node butterfly networks or trees. Thus, optimal efficiency is achieved for a significant number of processors using these distance metrics. A general algorithm is given that can be used to perform clustering with the complete link and average link metrics on a butterfly. While this algorithm achieves optimal efficiency for the general class of metrics, it is not optimal for the specific cases of complete link and average link clustering.
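
As a point of reference for the O(n) PRAM results above, the serial O(n^2) single-link case can be computed through a minimum spanning tree, as in this short sketch (Euclidean points assumed; this is the sequential baseline, not the paper's parallel algorithm):

    import math

    def single_link_dendrogram(points):
        """O(n^2) single-link clustering via Prim's MST: the serial baseline
        that a parallel single-link algorithm must beat.  Returns the MST
        edges sorted by length; cutting them longest-first yields the
        single-link dendrogram levels."""
        n = len(points)
        dist = lambda a, b: math.dist(points[a], points[b])
        in_tree = [False] * n
        best = [math.inf] * n            # best[j]: distance from j to the tree
        best_from = [0] * n
        in_tree[0] = True
        for j in range(1, n):
            best[j] = dist(0, j)
        edges = []
        for _ in range(n - 1):
            u = min((j for j in range(n) if not in_tree[j]), key=lambda j: best[j])
            edges.append((best[u], best_from[u], u))
            in_tree[u] = True
            for j in range(n):
                if not in_tree[j]:
                    d = dist(u, j)
                    if d < best[j]:
                        best[j], best_from[j] = d, u
        return sorted(edges)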

429 citations


Journal ArticleDOI
11 Jan 1995
TL;DR: The algorithm is implemented on the CM-5 and is run repeatedly on two deceptive problems to demonstrate the added implicit parallelism and faster convergence which can result from larger population sizes.
Abstract: This paper introduces and analyzes a parallel method of simulated annealing. Borrowing from genetic algorithms, an effective combination of simulated annealing and genetic algorithms, called parallel recombinative simulated annealing, is developed. This new algorithm strives to retain the desirable asymptotic convergence properties of simulated annealing, while adding the population approach and recombinative power of genetic algorithms. The algorithm iterates a population of solutions rather than a single solution, employing a binary recombination operator as well as a unary neighborhood operator. Proofs of global convergence are given for two variations of the algorithm. Convergence behavior is examined, and empirical distributions are compared to Boltzmann distributions. Parallel recombinative simulated annealing is amenable to straightforward implementation on SIMD, MIMD, or shared-memory machines. The algorithm, implemented on the CM-5, is run repeatedly on two deceptive problems to demonstrate the added implicit parallelism and faster convergence which can result from larger population sizes.
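
A minimal serial sketch of one generation of the method described above, assuming an even population size and user-supplied energy, crossover, and mutation callables; the paper's exact acceptance rule and operators may differ:

    import math, random

    def prsa_generation(pop, energy, temperature, crossover, mutate):
        """One generation of (a serial sketch of) parallel recombinative
        simulated annealing: recombine pairs, mutate, then hold a Boltzmann
        trial between each parent/child pair.  Assumes an even population;
        energy, crossover and mutate are problem-specific callables."""
        random.shuffle(pop)
        next_pop = []
        for a, b in zip(pop[0::2], pop[1::2]):
            c, d = crossover(a, b)
            c, d = mutate(c), mutate(d)
            for parent, child in ((a, c), (b, d)):
                delta = energy(child) - energy(parent)
                # Metropolis acceptance: keep the child if it is better,
                # or with probability exp(-delta/T) if it is worse.
                if delta <= 0 or random.random() < math.exp(-delta / temperature):
                    next_pop.append(child)
                else:
                    next_pop.append(parent)
        return next_pop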

326 citations


Book
01 Jan 1995
TL;DR: In this article, the authors introduce a new metric for measuring the performance of a parallel algorithm running on a parallel processor that has some advantages over existing metrics; its use is illustrated with data from the Linpack benchmark report and the winners of the Gordon Bell Award.
Abstract: Many metrics are used for measuring the performance of a parallel algorithm running on a parallel processor. This article introduces a new metric that has some advantages over the others. Its use is illustrated with data from the Linpack benchmark report and the winners of the Gordon Bell Award.

267 citations


Journal ArticleDOI
TL;DR: A new algorithm for computing optical flow in a differential framework based on a robust version of total least squares is developed, incorporating only past time frames.
Abstract: We have developed a new algorithm for computing optical flow in a differential framework. The image sequence is first convolved with a set of linear, separable spatiotemporal filter kernels similar to those that have been used in other early vision problems such as texture and stereopsis. The brightness constancy constraint can then be applied to each of the resulting images, giving us, in general, an overdetermined system of equations for the optical flow at each pixel. There are three principal sources of error: (a) stochastic error due to sensor noise, (b) systematic errors in the presence of large displacements, and (c) errors due to failure of the brightness constancy model. Our analysis of these errors leads us to develop an algorithm based on a robust version of total least squares. Each optical flow vector computed has an associated reliability measure which can be used in subsequent processing. The performance of the algorithm on the data set used by Barron et al. (IJCV 1994) compares favorably with other techniques. In addition to being separable, the filters used are also causal, incorporating only past time frames. The algorithm is fully parallel and has been implemented on a multiple processor machine.
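
The per-pixel estimation step can be sketched as an ordinary total-least-squares solve; the robust weighting the paper adds is omitted here, and the variable names are illustrative:

    import numpy as np

    def tls_flow(Ix, Iy, It):
        """Total-least-squares flow estimate at one pixel (sketch).
        Ix, Iy, It are 1-D arrays of the spatial and temporal derivative
        responses of the K filtered images at that pixel, so each row of
        A = [Ix Iy It] is one brightness-constancy equation.  The TLS
        solution is the eigenvector of A^T A with smallest eigenvalue,
        rescaled so its last component is 1; the flow is its first two
        components."""
        A = np.stack([Ix, Iy, It], axis=1)      # K x 3 system
        w, V = np.linalg.eigh(A.T @ A)          # eigenvalues in ascending order
        v = V[:, 0]                             # smallest eigenvector
        if abs(v[2]) < 1e-12:
            return None                         # unreliable (aperture problem)
        return v[0] / v[2], v[1] / v[2]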

264 citations


Journal ArticleDOI
TL;DR: The proposed systolic array and the parallel filter architectures implement these on-line algorithms and are optimal both with respect to area and time (under the word-serial model).
Abstract: This paper presents a wide range of algorithms and architectures for computing the 1D and 2D discrete wavelet transform (DWT) and the 1D and 2D continuous wavelet transform (CWT). The algorithms and architectures presented are independent of the size and nature of the wavelet function. New on-line algorithms are proposed for the DWT and the CWT that require significantly less storage. The proposed systolic array and the parallel filter architectures implement these on-line algorithms and are optimal both with respect to area and time (under the word-serial model). Moreover, these architectures are very regular and support single chip implementations in VLSI. The proposed SIMD architectures implement the existing pyramid and à trous algorithms and are optimal with respect to time.

244 citations


Journal ArticleDOI
TL;DR: A subpixel addressing mechanism (called linear interpolation) is utilized for intermediate pixel addressing in the differentiation step, which results in improved accuracy of corner localization and reduced computational complexity.

207 citations


Proceedings ArticleDOI
29 May 1995
TL;DR: In this paper, it is shown that a unit-cost RAM with a word length of w bits can sort n integers in the range 0..2^w - 1 in O(n log log n) time, for arbitrary w ≥ log n, a significant improvement over the bound achieved by the fusion trees of Fredman and Willard; provided that w ≥ (log n)^(2+ε) for some fixed ε > 0, the sorting can even be accomplished in linear expected time with a randomized algorithm.
Abstract: We show that a unit-cost RAM with a word length of w bits can sort n integers in the range 0..2^w - 1 in O(n log log n) time, for arbitrary w ≥ log n, a significant improvement over the bound achieved by the fusion trees of Fredman and Willard. Provided that w ≥ (log n)^(2+ε) for some fixed ε > 0, the sorting can even be accomplished in linear expected time with a randomized algorithm. Both of our algorithms parallelize without loss on a unit-cost PRAM with a word length of w bits. The first one yields an algorithm that uses O(log n) time and O(n log log n) operations on a deterministic CRCW PRAM. The second one yields an algorithm that uses O(log n) expected time and O(n) expected operations on a randomized EREW PRAM, provided that w ≥ (log n)^(2+ε) for some fixed ε > 0. Our deterministic and randomized sequential and parallel algorithms generalize to the lexicographic sorting problem of sorting multiple-precision integers represented in several words.

194 citations


Book
01 Jan 1995
TL;DR: A thorough introduction to this technology, explaining the fundamentals of parallelism in a logical and readable way, developing the algorithms of vector processors, shared-memory parallel machines and distributed-memory machines emphasising the link between architectures, models and algorithms.
Abstract: From the Publisher: Developments in parallel computing in recent years have made it possible to build multi-processor architectures that enable efficiency and speed in today's computing environment. Parallel Algorithms and Architectures provides a thorough introduction to this technology, explaining the fundamentals of parallelism in a logical and readable way. Progressing from theory to implementation, the text develops the algorithms of vector processors, shared-memory parallel machines and distributed-memory machines emphasising the link between architectures, models and algorithms. In addition, the book addresses a number of issues that are of great practical importance to people developing parallel programs, including coverage of LINPACK and BLAS, vectorisation, task placement and scheduling. Parallel Algorithms and Architectures is ideal for both computer science students and people in industry who require an understanding of parallelism.

194 citations


Proceedings ArticleDOI
Neal E. Young
22 Jan 1995
TL;DR: A new technique called oblivious rounding is introduced, a variant of randomized rounding that avoids the bottleneck of first solving the linear program; this yields more efficient algorithms and brings probabilistic methods to bear on a new class of problems.
Abstract: We introduce a new technique called oblivious rounding, a variant of randomized rounding that avoids the bottleneck of first solving the linear program. Avoiding this bottleneck yields more efficient algorithms and brings probabilistic methods to bear on a new class of problems. We give oblivious rounding algorithms that approximately solve general packing and covering problems, including a parallel algorithm to find sparse strategies for matrix games.
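
For the matrix-game application, a multiplicative-weights-style sketch conveys the flavour of building a sparse strategy without solving the LP first; this is not the paper's exact algorithm, and the step count and update rule below are standard textbook choices:

    import math

    def sparse_strategy(A, eps):
        """Greedy sketch for a sparse near-optimal mixed strategy for the
        row player of a matrix game A (entries in [0, 1]).  At each of
        T = O(log n / eps^2) steps pick the row that looks best against an
        exponential penalty over columns; the uniform mixture over the
        picked rows is then sparse by construction."""
        m, n = len(A), len(A[0])
        T = max(1, int(4 * math.log(n) / (eps * eps)))
        weights = [1.0] * n                 # column penalties
        picks = []
        for _ in range(T):
            # row maximizing the payoff against the current column penalties
            i = max(range(m),
                    key=lambda r: sum(A[r][j] * weights[j] for j in range(n)))
            picks.append(i)
            for j in range(n):
                weights[j] *= math.exp(-eps * A[i][j])   # down-weight covered columns
        strategy = [0.0] * m                # uniform mixture over picked rows
        for i in picks:
            strategy[i] += 1.0 / len(picks)
        return strategy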

Journal ArticleDOI
TL;DR: This paper presents a distributed genetic algorithm for optimization of large structures on a cluster of workstations connected via a local area network (LAN) based on its adaptability to a high degree of parallelism.
Abstract: Parallel algorithms for optimization of structures reported in the literature have been restricted to shared-memory multiprocessors. This paper presents a distributed genetic algorithm for optimization of large structures on a cluster of workstations connected via a local area network (LAN). The selection of genetic algorithm is based on its adaptability to a high degree of parallelism. Two different approaches are used to transform the constrained structural optimization problem to an unconstrained optimization problem: a penalty-function method and augmented Lagrangian approach. For the solution of the resulting simultaneous linear equations the iterative preconditioned conjugate gradient (PCG) method is used because of its low memory requirement. A dynamic load-balancing mechanism is developed to account for the unpredictable multiuser, multitasking environment of a networked cluster of workstations, heterogeneity of machines, and indeterminate nature of the iterative PCG equation solver. The algorithm ...
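
The penalty-function transformation mentioned above can be sketched in a few lines; the quadratic exterior penalty below is one common choice and is only illustrative of how the GA fitness is formed:

    def penalized_weight(weight, constraint_violations, penalty_coeff):
        """Exterior penalty-function transformation: turn the constrained
        sizing problem (minimize weight subject to g_i(x) <= 0) into an
        unconstrained fitness for the genetic algorithm.  Names are
        illustrative; the paper also describes an augmented Lagrangian
        alternative."""
        penalty = sum(max(0.0, g) ** 2 for g in constraint_violations)
        return weight + penalty_coeff * penalty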

Journal ArticleDOI
TL;DR: The key issues addressed by Matcher are described, along with the underlying parallel algorithm, which generates the data structures needed for handling arbitrary and non-conforming fluid/structure interfaces in aeroelastic computations.

19 Sep 1995
TL;DR: NESL, as described in this paper, is a strongly-typed, applicative, data-parallel language intended as a portable interface for programming a variety of parallel and vector computers and as a basis for teaching parallel algorithms.
Abstract: : This report describes NESL, a strongly-typed, applicative, data-parallel language. NESL is intended to be used as a portable interface for programming a variety of parallel and vector computers, and as a basis for teaching parallel algorithms. Parallelism is supplied through a simple set of data-parallel constructs based on sequences, including a mechanism for applying any function over the elements of a sequence in parallel and a rich set of parallel functions that manipulate sequences.

Journal ArticleDOI
TL;DR: A new algorithm for the fast Hough transform (FHT) is described that satisfactorily solves the problems of other fast algorithms proposed in the literature (erroneous solutions, point redundancy, scaling, and detection of straight lines of different sizes) and needs less storage space.
Abstract: The authors describe a new algorithm for the fast Hough transform (FHT) that satisfactorily solves the problems of other fast algorithms proposed in the literature (erroneous solutions, point redundancy, scaling, and detection of straight lines of different sizes) and needs less storage space. By using the information generated by the algorithm for the detection of straight lines, they manage to detect the segments of the image without appreciable computational overhead. They also discuss the performance and the parallelization of the algorithm and show its efficiency with some examples.

01 Aug 1995
TL;DR: The performance results show that, while both algorithms parallelize easily and obtain good speedup and scale-up results, the parallel SEAR version performs better than parallel SPEAR, despite the fact that it uses more communication.
Abstract: The field of knowledge discovery in databases, or "Data Mining", has received increasing attention during recent years as large organizations have begun to realize the potential value of the information that is stored implicitly in their databases. One specific data mining task is the mining of Association Rules, particularly from retail data. The task is to determine patterns (or rules) that characterize the shopping behavior of customers from a large database of previous consumer transactions. The rules can then be used to focus marketing efforts such as product placement and sales promotions. Because early algorithms required an unpredictably large number of IO operations, reducing IO cost has been the primary target of the algorithms presented in the literature. One of the most recently proposed algorithms, called PARTITION, uses a new TID-list data representation and a new partitioning technique. The partitioning technique reduces IO cost to a constant amount by processing one database portion at a time in memory. We implemented an algorithm called SPTID that incorporates both TID-lists and partitioning to study their benefits. For comparison, a non-partitioning algorithm called SEAR, which is based on a new prefix-tree data structure, is used. Our experiments with SPTID and SEAR indicate that TID-lists have inherent inefficiencies; furthermore, because all of the algorithms tested tend to be CPU-bound, trading CPU-overhead against I/O operations by partitioning did not lead to better performance. In order to scale mining algorithms to the huge databases (e.g., multiple Terabytes) that large organizations will manage in the near future, we implemented parallel versions of SEAR and SPEAR (its partitioned counterpart). The performance results show that, while both algorithms parallelize easily and obtain good speedup and scale-up results, the parallel SEAR version performs better than parallel SPEAR, despite the fact that it uses more communication.
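
The core counting step that such miners parallelize can be sketched level-wise as follows; the prefix-tree and TID-list structures discussed above are replaced here by a plain Counter for brevity:

    from itertools import combinations
    from collections import Counter

    def frequent_itemsets(transactions, min_support, max_size=3):
        """Level-wise support counting, the basic operation behind
        Apriori-style association-rule miners (illustrative sketch, not the
        paper's data structures).  Partitioning the transaction list across
        processes and summing the per-partition Counters is what makes the
        counting phase straightforward to parallelize."""
        frequent = {}
        for k in range(1, max_size + 1):
            counts = Counter()
            for t in transactions:
                for itemset in combinations(sorted(t), k):
                    # extend only itemsets whose (k-1)-subsets were frequent
                    if k == 1 or all(sub in frequent
                                     for sub in combinations(itemset, k - 1)):
                        counts[itemset] += 1
            level = {s: c for s, c in counts.items() if c >= min_support}
            if not level:
                break
            frequent.update(level)
        return frequent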

Proceedings ArticleDOI
25 Apr 1995
TL;DR: A new heuristic algorithm for optimizing the assignment of priorities to tasks and messages in distributed hard realtime systems that executes two orders of magnitude faster than simulated annealing, finds better solutions, and finds solutions in cases where the latter method fails.
Abstract: Recent advances in the analysis of distributed realtime systems have made it possible to predict if hard realtime requirements will be met. However, it is still difficult to find a feasible priority assignment when the utilization levels of the CPUs and communication networks are pushed near to their limits. This paper presents a new heuristic algorithm for optimizing the assignment of priorities to tasks and messages in distributed hard realtime systems. The algorithm is based on the knowledge of the parameters that influence the worst-case response time of a distributed application. This algorithm is compared to simulated annealing, which is a general optimization technique for discrete functions that had been previously used for solving similar problems. On average, our heuristic algorithm executes two orders of magnitude faster than simulated annealing, finds better solutions, and finds solutions in cases where the latter method fails.

ReportDOI
31 Dec 1995
TL;DR: A new parallel algorithm for mesh smoothing that has a fast parallel runtime both in theory and in practice and experimental results obtained on the IBM SP system demonstrating the efficiency of this approach are presented.
Abstract: Automatic mesh generation and adaptive refinement methods have proven to be very successful tools for the efficient solution of complex finite element applications. A problem with these methods is that they can produce poorly shaped elements; such elements are undesirable because they introduce numerical difficulties in the solution process. However, the shape of the elements can be improved through the determination of new geometric locations for mesh vertices by using a mesh smoothing algorithm. In this paper the authors present a new parallel algorithm for mesh smoothing that has a fast parallel runtime both in theory and in practice. The authors present an efficient implementation of the algorithm that uses non-smooth optimization techniques to find the new location of each vertex. Finally, they present experimental results obtained on the IBM SP system demonstrating the efficiency of this approach.
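
For contrast with the optimization-based approach above, the usual Laplacian-smoothing baseline fits in a few lines; the paper replaces the centroid move below with a local non-smooth optimization per vertex and processes an independent set of vertices at a time so the sweeps parallelize:

    def laplacian_smooth(coords, adjacency, interior, sweeps=5):
        """Simple Laplacian smoothing: move each interior vertex to the
        centroid of its neighbours.  coords maps vertex -> coordinate tuple,
        adjacency maps vertex -> list of neighbouring vertices, interior is
        the set of vertices allowed to move.  Illustrative baseline only."""
        for _ in range(sweeps):
            for v in interior:
                nbrs = adjacency[v]
                coords[v] = tuple(
                    sum(coords[u][d] for u in nbrs) / len(nbrs)
                    for d in range(len(coords[v]))
                )
        return coords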

Journal ArticleDOI
TL;DR: An efficient algorithm to find exact (tight) bounds on the separation time of events in an arbitrary process graph without conditional behavior is presented, which will form a basis for exploration of timing-constrained synthesis techniques.
Abstract: Determining the time separation of events is a fundamental problem in the analysis, synthesis, and optimization of concurrent systems. Applications range from logic optimization of asynchronous digital circuits to evaluation of execution times of programs for real-time systems. We present an efficient algorithm to find exact (tight) bounds on the separation time of events in an arbitrary process graph without conditional behavior. This result is more general than the methods presented in several previously published papers as it handles cyclic graphs and yields the tightest possible bounds on event separations. The algorithm is based on a functional decomposition technique that permits the implicit evaluation of an infinitely unfolded process graph. Examples are presented that demonstrate the utility and efficiency of the solution. The algorithm will form a basis for exploration of timing-constrained synthesis techniques.

Proceedings ArticleDOI
23 Oct 1995
TL;DR: It is shown that for degrees of parallelism of typical practical interest, the Gauss-Seidel updates may be computed in parallel with little loss in convergence speed.
Abstract: While Bayesian methods can significantly improve the quality of tomographic reconstructions, they require the solution of large iterative optimization problems. Recent results indicate that the convergence of these optimization problems can be improved by using sequential pixel updates, or Gauss-Seidel iterations. However, Gauss-Seidel iterations may be perceived as less useful when parallel computing architectures are used. We show that for degrees of parallelism of typical practical interest, the Gauss-Seidel updates may be computed in parallel with little loss in convergence speed. In this case, the theoretical speedup of parallel implementations is nearly linear with the number of processors.
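
A generic sketch of the grouped update pattern: pixels within a block see the same stale state (and can therefore be updated concurrently), while blocks are visited sequentially. This is written for a plain linear system rather than the Bayesian tomography objective:

    import numpy as np

    def gauss_seidel_sweep(A, x, b, blocks):
        """One sweep of block-grouped Gauss-Seidel updates for Ax = b.
        Variables inside a block are updated from the same (stale) state,
        so a block can be split across processors; blocks are visited
        sequentially, which preserves most of the Gauss-Seidel convergence
        behaviour.  x is updated in place and returned."""
        for block in blocks:
            x_old = x.copy()                      # state shared by this block
            for i in block:
                r = b[i] - A[i, :] @ x_old + A[i, i] * x_old[i]
                x[i] = r / A[i, i]
        return x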

Journal ArticleDOI
TL;DR: It is proved that, by use of certain randomizations on the input system, the parallel speedup is roughly the number of vectors in the blocks when as many processors are used.
Abstract: By using projections by a block of vectors in place of a single vector it is possible to parallelize the outer loop of iterative methods for solving sparse linear systems. We analyze such a scheme proposed by Coppersmith for Wiedemann's coordinate recurrence algorithm, which is based in part on the Krylov subspace approach. We prove that by use of certain randomizations on the input system the parallel speedup is roughly the number of vectors in the blocks when as many processors are used. Our analysis is valid for fields of entries that have sufficiently large cardinality. Our analysis also deals with an arising subproblem of solving a singular block Toeplitz system by use of the theory of Toeplitz-like matrices.

Journal ArticleDOI
TL;DR: The presented algorithm is able to generate almost optimal packing schemes, and even in its sequential version it is empirically shown to be superior to approaches such as random search or simulated annealing.

Proceedings ArticleDOI
20 Jul 1995
TL;DR: The paper identifies a class of parallel schedules that are provably efficient in both time and space, and describes a scheduler for implementing high-level languages with nested parallelism that generates schedules in this class.
Abstract: Many high-level parallel programming languages allow for fine-grained parallelism. As in the popular work-time framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A common concern in executing such programs is to schedule tasks to processors dynamically so as to minimize not only the execution time, but also the amount of space (memory) needed. Without careful scheduling, the parallel execution on p processors can use a factor of p or larger more space than a sequential implementation of the same program. This paper first identifies a class of parallel schedules that are provably efficient in both time and space. For any computation with w units of work and critical path length d, and for any sequential schedule that takes space s1, we provide a parallel schedule that takes fewer than w/p + d steps on p processors and requires less than s1 + p·d space. This matches the lower bound that we show, and significantly improves upon the best previous bound of s1·p space for the common case where d ≪ s1. The paper then describes a scheduler for implementing high-level languages with nested parallelism that generates schedules in this class. During program execution, as the structure of the computation is revealed, the scheduler keeps track of the active tasks, allocates the tasks to the processors, and performs the necessary task synchronization. The scheduler is itself a parallel algorithm, and incurs at most a constant factor overhead in time and space, even when the scheduling granularity is individual units of work. The algorithm is the first efficient solution to the scheduling problem discussed here, even if space considerations are ignored.

Journal ArticleDOI
01 Feb 1995
TL;DR: An adaptive interacting multiple-model algorithm (AIMM) for use in manoeuvring target tracking that does not need predefined models and can be implemented on parallel machines.
Abstract: The paper describes an adaptive interacting multiple-model algorithm (AIMM) for use in manoeuvring target tracking. The algorithm does not need predefined models. A two-stage Kalman estimator is used to estimate the acceleration of the target. This acceleration value is then fed to the subfilters in an interacting multiple-model (IMM) algorithm, where the subfilters have different acceleration parameters. Results compare the performance of the AIMM algorithm with the IMM algorithm, using simulations of different manoeuvring-target scenarios. Also considered are the relative computational requirements, and the ease with which the algorithms can be implemented on parallel machines.

Journal ArticleDOI
Gilles Bertrand
TL;DR: A new 3D parallel thinning algorithm for medial surfaces that works in cubic grids with the 6-connectivity is proposed, based on a precise definition of end points which are points belonging to surfaces or curves.

Book
01 Jan 1995
TL;DR: Isoefficiency analysis helps us determine the best algorithm/architecture combination for a particular problem without explicitly analyzing all possible combinations under all possible conditions.
Abstract: Isoefficiency analysis helps us determine the best algorithm/architecture combination for a particular problem without explicitly analyzing all possible combinations under all possible conditions.
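
A small numerical sketch of the isoefficiency idea: with sequential work W and total parallel overhead T_o(W, p), efficiency is E = W / (W + T_o), and the isoefficiency function is the rate at which W must grow with p to hold E fixed. The overhead model below (proportional to p log p, as for a hypercube reduction) and its constant are assumptions for illustration:

    import math

    def efficiency(W, p, overhead):
        """Parallel efficiency E = W / (W + T_o(W, p)) for sequential work W
        on p processors with total overhead T_o given by 'overhead'."""
        return W / (W + overhead(W, p))

    def isoefficiency_W(p, overhead, target_E=0.8):
        """Smallest W (up to a factor of 2) keeping efficiency at target_E:
        the isoefficiency function evaluated numerically."""
        W = 1.0
        while efficiency(W, p, overhead) < target_E:
            W *= 2
        return W

    # assumed overhead T_o = 10 * p * log2(p), giving W = Theta(p log p)
    hypercube_reduce = lambda W, p: 10.0 * p * math.log2(p)
    print(isoefficiency_W(256, hypercube_reduce))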

Journal ArticleDOI
01 Oct 1995
TL;DR: This parallel model is functionally equivalent to the National Center for Atmospheric Research's Community Climate Model, CCM2, but is structured to exploit distributed memory multi-computers and incorporates parallel spectral transform, semi-Lagrangian transport, and load balancing algorithms.
Abstract: We describe the design of a parallel global atmospheric circulation model, PCCM2. This parallel model is functionally equivalent to the National Center for Atmospheric Research's Community Climate Model, CCM2, but is structured to exploit distributed memory multi-computers. PCCM2 incorporates parallel spectral transform, semi-Lagrangian transport, and load balancing algorithms. We present detailed performance results on the IBM SP2 and Intel Paragon. These results provide insights into the scalability of the individual parallel algorithms and of the parallel model as a whole.

Proceedings ArticleDOI
04 Jan 1995
TL;DR: This paper describes H-BSP, a general-purpose parallel computing environment for developing transportable algorithms based on the Bulk Synchronous Parallel model, and the role of unbundled compiler technology in facilitating the development of such an environment.
Abstract: A necessary condition for the establishment, on a substantial basis, of a parallel software industry would appear to be the availability of technology for generating transportable software, i.e. architecture independent software which delivers scalable performance for a wide variety of applications on a wide range of multiprocessor computers. This paper describes H-BSP, a general purpose parallel computing environment for developing transportable algorithms. H-BSP is based on the Bulk Synchronous Parallel Model (BSP), in which a computation involves a number of supersteps, each having several parallel computational threads that synchronize at the end of the superstep. The BSP Model deals explicitly with the notion of communication among computational threads and introduces parameters g and L that quantify the ratio of communication throughput to computation throughput, and the synchronization period, respectively. These two parameters, together with the number of processors and the problem size, are used to quantify the performance and, therefore, the transportability of given classes of algorithms across machines having different values for these parameters. This paper describes the role of unbundled compiler technology in facilitating the development of such a parallel computer environment.
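
The g and L parameters mentioned above enter the BSP cost model directly: a superstep with maximum local work w and maximum h-relation h costs w + g·h + L, and a program's predicted time is the sum over its supersteps. A toy evaluation follows (the machine parameters are made-up values for illustration):

    def bsp_cost(supersteps, g, L):
        """Predicted BSP running time: each superstep contributes its maximum
        local work w, plus g times its maximum h-relation (words sent or
        received by any one processor), plus the synchronization cost L.
        'supersteps' is a list of (w, h) pairs; g and L are the machine
        parameters described above."""
        return sum(w + g * h + L for (w, h) in supersteps)

    # e.g. three supersteps on a machine with g = 4 (cycles/word) and L = 100
    print(bsp_cost([(1000, 50), (400, 10), (2000, 0)], g=4, L=100))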

Journal ArticleDOI
01 Aug 1995
TL;DR: The EM reconstruction algorithm for volume acquisition from current generation retracted-septa PET scanners is implemented, and extensive use of EM system matrix (C_ij) symmetries reduces the storage cost by a factor of 188.
Abstract: We have implemented the EM reconstruction algorithm for volume acquisition from current generation retracted-septa PET scanners. Although the software was designed for a GE Advance scanner, it is easily adaptable to other 3D scanners. The reconstruction software was written for an Intel iPSC/860 parallel computer with 128 compute nodes. Running on 32 processors, the algorithm requires approximately 55 minutes per iteration to reconstruct a 128 × 128 × 35 image. No projection data compression schemes or other approximations were used in the implementation. Extensive use of EM system matrix (C_ij) symmetries (including the 8-fold in-plane symmetries, 2-fold axial symmetries, and axial parallel line redundancies) reduces the storage cost by a factor of 188. The parallel algorithm operates on distributed projection data which are decomposed by base-symmetry angles. Symmetry operators copy and index the C_ij chord to the form required for the particular symmetry. The use of asynchronous reads, lookup tables, and optimized image indexing improves computational performance.
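
The update being parallelized is the standard EM (MLEM) iteration; a dense NumPy sketch of one iteration is shown below. The real implementation exploits the C_ij symmetries and distributed projection data described above rather than storing a dense system matrix:

    import numpy as np

    def em_iteration(lam, C, y, eps=1e-12):
        """One MLEM update for emission tomography: lam is the current image
        estimate, C the system matrix (C[i, j] = probability that an emission
        in voxel j is detected in projection bin i), y the measured
        projections.  Dense illustrative version only."""
        proj = C @ lam                              # forward projection
        ratio = y / np.maximum(proj, eps)           # compare with measurements
        sens = C.sum(axis=0)                        # sensitivity image
        return lam * (C.T @ ratio) / np.maximum(sens, eps)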

Journal ArticleDOI
01 Nov 1995
TL;DR: The REFINE multiprocessor is shown to offer a cost-effective alternative to the Boolean n-cube multiprocessor architecture without substantial loss in performance.
Abstract: A reconfigurable interconnection network based on a multi-ring architecture called REFINE is described. REFINE embeds a single 1-factor of the Boolean hypercube in any given configuration. The mathematical properties of the REFINE topology and the hardware for the reconfiguration switch are described. The REFINE topology is scalable in the sense that the number of interprocessor communication links scales linearly with network size whereas the network diameter scales logarithmically with network size. Primitive parallel operations on the REFINE topology are described and analyzed. These primitive operations could be used as building blocks for more complex parallel algorithms. A large class of algorithms for the Boolean n-cube which includes the FFT and the Batcher's bitonic sort is shown to map efficiently on the REFINE topology. The REFINE multiprocessor is shown to offer a cost-effective alternative to the Boolean n-cube multiprocessor architecture without substantial loss in performance.