Showing papers on "Parallel algorithm" published in 1999


Book ChapterDOI
15 Aug 1999
TL;DR: To cluster the increasingly massive data sets common in data and text mining, a parallel implementation of the k-means clustering algorithm based on the message-passing model is proposed; the speedup and scaleup of the algorithm are shown analytically to approach the optimal as the number of data points increases.
Abstract: To cluster increasingly massive data sets that are common today in data and text mining, we propose a parallel implementation of the k-means clustering algorithm based on the message passing model. The proposed algorithm exploits the inherent data-parallelism in the k-means algorithm. We analytically show that the speedup and the scaleup of our algorithm approach the optimal as the number of data points increases. We implemented our algorithm on an IBM POWERparallel SP2 with a maximum of 16 nodes. On typical test data sets, we observe nearly linear relative speedups, for example, 15.62 on 16 nodes, and essentially linear scaleup in the size of the data set and in the number of clusters desired. For a 2 gigabyte test data set, our implementation drives the 16 node SP2 at more than 1.8 gigaflops.
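The data-parallel structure the abstract describes is simple to sketch. Below is a minimal message-passing k-means in the same spirit, assuming mpi4py and NumPy (the paper's own implementation targeted MPI on an SP2; the function name, iteration count, and the rank-0 initialization are illustrative, not the paper's). Each rank owns a block of the points, and one Allreduce per iteration exchanges only O(k·d) data, which is why speedup stays near-linear as the number of points grows.

```python
import numpy as np
from mpi4py import MPI

def parallel_kmeans(local_points, k, iters=20):
    comm = MPI.COMM_WORLD
    d = local_points.shape[1]
    # Rank 0 draws the initial centroids from its own block (assumes it
    # holds at least k points) and broadcasts them to all ranks.
    centroids = None
    if comm.rank == 0:
        idx = np.random.choice(len(local_points), k, replace=False)
        centroids = local_points[idx].copy()
    centroids = comm.bcast(centroids, root=0)
    for _ in range(iters):
        # Assign each local point to its nearest centroid.
        dists = np.linalg.norm(local_points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Accumulate local per-cluster sums and counts.
        sums = np.zeros((k, d))
        counts = np.zeros(k)
        for j in range(k):
            members = local_points[labels == j]
            sums[j] = members.sum(axis=0)
            counts[j] = len(members)
        # One collective per iteration combines all partial results;
        # the cost is independent of the number of data points.
        gsums = np.empty_like(sums)
        gcounts = np.empty_like(counts)
        comm.Allreduce(sums, gsums, op=MPI.SUM)
        comm.Allreduce(counts, gcounts, op=MPI.SUM)
        nonempty = gcounts > 0
        centroids[nonempty] = gsums[nonempty] / gcounts[nonempty, None]
    return centroids, labels
```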

450 citations


Journal ArticleDOI
TL;DR: A key feature of this parallel formulation is that it is able to achieve a high degree of concurrency while maintaining the high quality of the partitions produced by the serial multilevel k-way graph partitioning algorithm.
Abstract: In this paper we present a parallel formulation of a multilevel k-way graph partitioning algorithm. A key feature of this parallel formulation is that it is able to achieve a high degree of concurrency while maintaining the high quality of the partitions produced by the serial multilevel k-way partitioning algorithm. In particular, the time taken by our parallel graph partitioning algorithm is only slightly longer than the time taken for re-arrangement of the graph among processors according to the new partition. Experiments with a variety of finite element graphs show that our parallel formulation produces high-quality partitionings in a short amount of time. For example, a 128-way partitioning of graphs with one million vertices can be computed in a little over two seconds on a 128-processor Cray T3D. Furthermore, the quality of the partitions produced is comparable (edge-cuts within 5%) to those produced by the serial multilevel k-way algorithm. Thus our parallel algorithm makes it feasible to perform frequent repartitioning of graphs in dynamic computations without compromising the partitioning quality.
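For reference, the edge-cut quoted as the quality metric is just the number of graph edges whose endpoints land in different parts. A small sketch of how it is computed, assuming undirected adjacency lists (the data layout and names are illustrative, not from the paper):

```python
def edge_cut(adjacency, part):
    """adjacency: {vertex: iterable of neighbours}; part: {vertex: part id}."""
    cut = 0
    for u, neighbours in adjacency.items():
        for v in neighbours:
            if part[u] != part[v]:
                cut += 1
    return cut // 2  # each undirected edge is seen from both endpoints

# e.g. edge_cut({0: [1, 2], 1: [0], 2: [0]}, {0: 0, 1: 0, 2: 1}) == 1
```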

355 citations


Book
01 Jan 1999
TL;DR: Professor Parhami reviews the circuit model and problem-driven parallel machines, variants of mesh architectures, and composite and hierarchical systems, among other subjects.
Abstract: This original text provides comprehensive coverage of parallel algorithms and architectures, beginning with fundamental concepts and continuing through architectural variations and aspects of implementation. Unlike the authors of similar texts, Professor Parhami reviews the circuit model and problem-driven parallel machines, variants of mesh architectures, and composite and hierarchical systems, among other subjects. With its balanced treatment of theory and practical designs, class-tested lecture material and problems, and helpful case studies, the book is suited to graduate and upper-level undergraduate students of advanced architecture or parallel processing.

335 citations


Journal ArticleDOI
TL;DR: The dR*-tree is introduced, a distributed spatial index structure in which the data is spread among multiple computers and the indexes of the data are replicated on every computer in the ‘shared-nothing’ architecture with multiple computers interconnected through a network.
Abstract: The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we present PDBSCAN, a parallel version of this algorithm. We use the ‘shared-nothing’ architecture with multiple computers interconnected through a network. A fundamental component of a shared-nothing system is its distributed data structure. We introduce the dR*-tree, a distributed spatial index structure in which the data is spread among multiple computers and the indexes of the data are replicated on every computer. We implemented our method using a number of workstations connected via Ethernet (10 Mbit). A performance evaluation shows that PDBSCAN offers nearly linear speedup and has excellent scaleup and sizeup behavior.
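For orientation, here is a compact serial DBSCAN sketch, i.e., the algorithm PDBSCAN parallelizes. The brute-force neighborhood query (quadratic in memory, fine only for small n) stands in for the dR*-tree lookups, and all names are illustrative:

```python
import numpy as np

def dbscan(X, eps, min_pts):
    n = len(X)
    labels = np.full(n, -1)  # -1 = noise / not yet claimed by a cluster
    # Brute-force region queries; the paper replaces these with dR*-tree lookups.
    dists = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    neighbours = [np.flatnonzero(row <= eps) for row in dists]
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbours[i]) < min_pts:
            continue  # already claimed, or not a core point
        # Grow a new cluster from core point i by breadth-first expansion.
        labels[i] = cluster
        frontier = list(neighbours[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbours[j]) >= min_pts:  # j is core: expand further
                    frontier.extend(neighbours[j])
        cluster += 1
    return labels  # points still labelled -1 at the end are noise
```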

296 citations


Journal ArticleDOI
TL;DR: An efficient parallel algorithm that overcomes the difficulty of implementing Gaussian elimination with partial pivoting on parallel machines, using a graph reduction technique and a supernode-panel computational kernel for high single-processor utilization, and scheduling two types of parallel tasks for a high level of concurrency.
Abstract: Although Gaussian elimination with partial pivoting is a robust algorithm to solve unsymmetric sparse linear systems of equations, it is difficult to implement efficiently on parallel machines because of its dynamic and somewhat unpredictable way of generating work and intermediate results at run time. In this paper, we present an efficient parallel algorithm that overcomes this difficulty. The high performance of our algorithm is achieved through (1) using a graph reduction technique and a supernode-panel computational kernel for high single processor utilization, and (2) scheduling two types of parallel tasks for a high level of concurrency. One such task is factoring the independent panels in the disjoint subtrees of the column elimination tree of $A$. Another task is updating a panel by previously computed supernodes. A scheduler assigns tasks to free processors dynamically and facilitates the smooth transition between the two types of parallel tasks. No global synchronization is used in the algorithm. The algorithm is well suited for shared memory machines (SMP) with a modest number of processors. We demonstrate 4- to 7-fold speedups on a range of 8 processor SMPs, and more on larger SMPs. One realistic problem arising from a 3-D flow calculation achieves factorization rates of 1.0, 2.5, 0.8, and 0.8 gigaflops on the 12 processor Power Challenge, 8 processor Cray C90, 16 processor Cray J90, and 8 processor AlphaServer 8400.

265 citations


Proceedings ArticleDOI
31 Aug 1999
TL;DR: The state of the art on PGAs is reviewed and a new taxonomy is proposed, one that also covers a recently developed form of PGA, the dynamic deme model.
Abstract: Genetic algorithms (GAs) are powerful search techniques that are used to solve difficult problems in many disciplines. Unfortunately, they can be very demanding in terms of computation load and memory. Parallel genetic algorithms (PGAs) are parallel implementations of GAs which can provide considerable gains in terms of performance and scalability. PGAs can easily be implemented on networks of heterogeneous computers or on parallel mainframes. We review the state of the art on PGAs and propose a new taxonomy that also includes a recently developed form of PGA, the dynamic deme model.
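As a concrete instance of the coarse-grained (island/deme) class that such taxonomies cover, here is a toy island-model GA on a one-max problem. The population sizes, rates, and ring migration scheme are illustrative assumptions; in a real PGA each island would run on its own processor and migration would be a message exchange:

```python
import random

def evolve_island(pop, fitness, mut_rate=0.01):
    # Tournament selection, one-point crossover, bit-flip mutation.
    nxt = []
    while len(nxt) < len(pop):
        a, b = (max(random.sample(pop, 3), key=fitness) for _ in range(2))
        cut = random.randrange(1, len(a))
        child = a[:cut] + b[cut:]
        child = [bit ^ (random.random() < mut_rate) for bit in child]
        nxt.append(child)
    return nxt

def island_model_ga(n_islands=4, pop_size=30, genome=40, gens=100, interval=10):
    fitness = sum  # one-max: fitness is the number of 1 bits
    islands = [[[random.randint(0, 1) for _ in range(genome)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    for g in range(gens):
        islands = [evolve_island(pop, fitness) for pop in islands]
        if g % interval == 0:
            # Ring migration: each island sends a copy of its best
            # individual to its neighbour, replacing a random member.
            best = [max(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                pop[random.randrange(pop_size)] = best[i - 1][:]
    return max((ind for pop in islands for ind in pop), key=fitness)
```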

232 citations


Journal ArticleDOI
TL;DR: This paper demonstrates that, for PDE problems, the patterns of powers of sparsified matrices (PSMs) can be used a priori as effective approximate inverse patterns, and that the additional effort of adaptive sparsity pattern calculations may not be required.
Abstract: Parallel algorithms for computing sparse approximations to the inverse of a sparse matrix either use a prescribed sparsity pattern for the approximate inverse or attempt to generate a good pattern as part of the algorithm. This paper demonstrates that, for PDE problems, the patterns of powers of sparsified matrices (PSMs) can be used a priori as effective approximate inverse patterns, and that the additional effort of adaptive sparsity pattern calculations may not be required. PSM patterns are related to various other approximate inverse sparsity patterns through matrix graph theory and heuristics concerning the PDE's Green's function. A parallel implementation shows that PSM-patterned approximate inverses are significantly faster to construct than approximate inverses constructed adaptively, while often giving preconditioners of comparable quality.
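A sketch of how a PSM pattern can be formed, assuming SciPy sparse matrices; the global drop tolerance and the power are illustrative choices, not the paper's exact sparsification rule:

```python
import numpy as np
import scipy.sparse as sp

def psm_pattern(A, drop_tol=0.01, power=2):
    """Boolean sparsity pattern of (sparsified A)^power."""
    A = sp.coo_matrix(A)
    # Sparsify: drop entries small relative to the largest-magnitude entry.
    keep = np.abs(A.data) >= drop_tol * np.abs(A.data).max()
    S = sp.csr_matrix((np.ones(keep.sum()), (A.row[keep], A.col[keep])),
                      shape=A.shape)
    # The pattern of S^power is used a priori as the approximate inverse
    # pattern; values are reset each step so only the structure accumulates.
    P = S
    for _ in range(power - 1):
        P = P @ S
        P.data[:] = 1.0
    return P.astype(bool)
```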

203 citations


Journal ArticleDOI
TL;DR: This work builds on the classical greedy sequential set cover algorithm, in the spirit of the primal-dual schema, to obtain simple parallel approximation algorithms for the set cover problem and its generalizations.
Abstract: We build on the classical greedy sequential set cover algorithm, in the spirit of the primal-dual schema, to obtain simple parallel approximation algorithms for the set cover problem and its generalizations. Our algorithms use randomization, and our randomized voting lemmas may be of independent interest. Fast parallel approximation algorithms were known before for set cover, though not for the generalizations considered in this paper.
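The classical greedy algorithm the authors build on picks, at each step, the set that covers the most still-uncovered elements; the parallel primal-dual versions relax exactly this sequential choice. A minimal sketch with illustrative names:

```python
def greedy_set_cover(universe, sets):
    """Sequential greedy set cover: a ln(n)-approximation."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        # Pick the set covering the most still-uncovered elements.
        best = max(sets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            raise ValueError("some elements cannot be covered")
        cover.append(best)
        uncovered -= best
    return cover

# e.g. greedy_set_cover(range(5), [{0, 1, 2}, {1, 3}, {3, 4}])
# returns [{0, 1, 2}, {3, 4}]
```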

180 citations


Journal ArticleDOI
TL;DR: A scalable parallel implementation of the self-organizing map (SOM) suitable for data-mining applications involving clustering or segmentation against large data sets such as those encountered in the analysis of customer spending patterns is described.
Abstract: We describe a scalable parallel implementation of the self-organizing map (SOM) suitable for data-mining applications involving clustering or segmentation against large data sets such as those encountered in the analysis of customer spending patterns. The parallel algorithm is based on the batch SOM formulation in which the neural weights are updated at the end of each pass over the training data. The underlying serial algorithm is enhanced to take advantage of the sparseness often encountered in these data sets. Analysis of a realistic test problem shows that the batch SOM algorithm captures key features observed using the conventional on-line algorithm, with comparable convergence rates. Performance measurements on an SP2 parallel computer are given for two retail data sets and a publicly available set of census data. These results demonstrate essentially linear speedup for the parallel batch SOM algorithm, using both a memory-contained sparse formulation as well as a separate implementation in which the mining data is accessed directly from a parallel file system. We also present visualizations of the census data to illustrate the value of the clustering information obtained via the parallel SOM method.
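The batch formulation that makes the parallelization natural updates every weight exactly once per pass, from neighborhood-weighted sums that can be accumulated independently over data partitions and then combined. A dense single-pass sketch in NumPy (the grid layout and Gaussian neighborhood width are illustrative assumptions, and the sparse enhancements are omitted):

```python
import numpy as np

def batch_som_pass(data, weights, grid, sigma):
    # data: (n, d) inputs; weights: (m, d) map weights; grid: (m, 2) node coords.
    # Best-matching unit (nearest weight vector) for every input vector.
    bmu = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2).argmin(axis=1)
    # Gaussian neighbourhood strength between each input's BMU and every node.
    g2 = ((grid[bmu][:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    h = np.exp(-g2 / (2 * sigma ** 2))            # (n, m)
    num = h.T @ data                              # (m, d) weighted sums
    den = h.sum(axis=0)[:, None]                  # (m, 1) total weights
    # Single update at the end of the pass -- the defining batch-SOM step.
    return np.where(den > 0, num / den, weights)
```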

168 citations


01 Jan 1999
TL;DR: Three parallel algorithms that represent a spectrum of trade-offs between computation, communication, memory usage, synchronization, and the use of problem-specific information are presented.
Abstract: We consider the problem of mining association rules on a shared-nothing multiprocessor. We present three parallel algorithms that represent a spectrum of trade-offs between computation, communication, memory usage, synchronization, and the use of problem-specific information. We describe the implementation of these algorithms on IBM POWERparallel SP2, a shared-nothing machine. Performance measurements from this implementation show that the best algorithm, Count Distribution, scales linearly and has excellent speedup and sizeup behavior. The results from this study, besides being of interest in themselves, provide guidance for the design of parallel algorithms for other data mining tasks.
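The winning Count Distribution scheme is easy to sketch: every process counts the same candidate itemsets over its local transactions, and one global sum of the (small) count tables replaces any exchange of transaction data. A sketch of one Apriori-style pass, assuming mpi4py (whose object-mode allreduce with MPI.SUM adds picklable objects such as Counters) and candidates given as frozensets; names and the fractional support convention are illustrative:

```python
from collections import Counter
from mpi4py import MPI

def count_distribution_pass(local_transactions, candidates, min_support, total_n):
    """One pass: returns the candidates that are globally frequent."""
    comm = MPI.COMM_WORLD
    local = Counter()
    for t in local_transactions:
        items = set(t)
        for c in candidates:          # candidates: iterable of frozensets
            if c <= items:
                local[c] += 1
    # Only the counts cross the network, never the transactions.
    global_counts = comm.allreduce(local, op=MPI.SUM)
    return {c for c, n in global_counts.items() if n / total_n >= min_support}
```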

163 citations


Book
01 Dec 1999
TL;DR: Algorithms that efficiently solve linear equations or compute eigenvalues even when the matrices involved are too large to fit in the main memory of the computer and must be stored on disks are surveyed.
Abstract: This paper surveys algorithms that efficiently solve linear equations or compute eigenvalues even when the matrices involved are too large to fit in the main memory of the computer and must be stored on disks. The paper focuses on scheduling techniques that result in mostly sequential data accesses and in data reuse, and on techniques for transforming algorithms that cannot be effectively scheduled. The survey covers out-of-core algorithms for solving dense systems of linear equations, for the direct and iterative solution of sparse systems, for computing eigenvalues, for fast Fourier transforms, and for N-body computations. The paper also discusses reasonable assumptions on memory size, approaches for the analysis of out-of-core algorithms, and relationships between out-of-core, cache-aware, and parallel algorithms.

Journal ArticleDOI
TL;DR: This paper describes two basic parallel formulations of a classification decision tree learning algorithm based on induction and proposes a hybrid method that combines the good features of both.
Abstract: Classification decision tree algorithms are used extensively for data mining in many domains such as retail target marketing, fraud detection, etc. Highly parallel algorithms for constructing classification decision trees are desirable for dealing with large data sets in a reasonable amount of time. Algorithms for building classification decision trees have a natural concurrency, but are difficult to parallelize due to the inherent dynamic nature of the computation. In this paper, we present parallel formulations of a classification decision tree learning algorithm based on induction. We describe two basic parallel formulations: one is based on the Synchronous Tree Construction Approach and the other on the Partitioned Tree Construction Approach. We discuss the advantages and disadvantages of these methods and propose a hybrid method that combines their good features. We also provide an analysis of the computation and communication costs of the proposed hybrid method. Moreover, experimental results on an IBM SP-2 demonstrate excellent speedups and scalability.

Journal ArticleDOI
TL;DR: A scheduler for implementing high-level languages with nested parallelism that generates schedules in this class; it is the first efficient solution to the scheduling problem discussed here, even if space considerations are ignored.
Abstract: Many high-level parallel programming languages allow for fine-grained parallelism. As in the popular work-time framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A common concern in executing such programs is to schedule tasks to processors dynamically so as to minimize not only the execution time, but also the amount of space (memory) needed. Without careful scheduling, the parallel execution on p processors can use a factor of p or more space than a sequential implementation of the same program. This paper first identifies a class of parallel schedules that are provably efficient in both time and space. For any computation with w units of work and critical path length d, and for any sequential schedule that takes space s1, we provide a parallel schedule that takes fewer than w/p + d steps on p processors and requires less than s1 + p·d space. This matches the lower bound that we show, and significantly improves upon the best previous bound of s1·p space for the common case where d ≪ s1. The paper then describes a scheduler for implementing high-level languages with nested parallelism that generates schedules in this class. During program execution, as the structure of the computation is revealed, the scheduler keeps track of the active tasks, allocates the tasks to the processors, and performs the necessary task synchronization. The scheduler is itself a parallel algorithm, and incurs at most a constant factor overhead in time and space, even when the scheduling granularity is individual units of work. The algorithm is the first efficient solution to the scheduling problem discussed here, even if space considerations are ignored.
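To make the improvement concrete, here is an illustrative instance of the quoted bounds, with numbers invented purely for scale (they are not from the paper):

```latex
% With work w = 10^9, critical path d = 10^3, sequential space s_1 = 10^8,
% and p = 64 processors:
\[
  T \;<\; \frac{w}{p} + d \;=\; \frac{10^{9}}{64} + 10^{3}
    \;\approx\; 1.6 \times 10^{7} \text{ steps},
\]
\[
  S \;<\; s_1 + p\,d \;=\; 10^{8} + 6.4 \times 10^{4} \;\approx\; s_1,
  \qquad \text{versus } s_1 \cdot p = 6.4 \times 10^{9}
  \text{ under the older bound.}
\]
```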

Journal ArticleDOI
TL;DR: A more accurate version of the algorithm is presented, along with the results of numerical accuracy tests that compare both versions with the standard articulated-body algorithm.
Abstract: This paper is the second in a two-part series describing a recursive, divide and conquer algorithm for calculating the forward dynamics of a robot mechanism, or a general rigid body system, on a parallel computer.

Proceedings ArticleDOI
23 Mar 1999
TL;DR: Presents parallel algorithms for building decision-tree classifiers on shared-memory multiprocessor (SMP) systems and shows that the construction of a decision-tree classifier can be effectively parallelized on an SMP machine with good speedup.
Abstract: Presents parallel algorithms for building decision-tree classifiers on shared-memory multiprocessor (SMP) systems. The proposed algorithms span the gamut of data and task parallelism. The data parallelism is based on attribute scheduling among processors. This basic scheme is extended with task pipelining and dynamic load balancing to yield faster implementations. The task-parallel approach uses dynamic subtree partitioning among processors. Our performance evaluation shows that the construction of a decision-tree classifier can be effectively parallelized on an SMP machine with good speedup.
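The attribute-scheduling data parallelism can be sketched directly: each worker evaluates the candidate splits of one attribute, and the globally best split is kept. A toy version using a thread pool and the Gini criterion (both are assumptions for illustration, not the paper's exact scheme; y is taken to be non-negative integer class labels):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def gini_of_split(y_left, y_right):
    def gini(y):
        if len(y) == 0:
            return 0.0
        p = np.bincount(y) / len(y)
        return 1.0 - (p ** 2).sum()
    n = len(y_left) + len(y_right)
    return (len(y_left) * gini(y_left) + len(y_right) * gini(y_right)) / n

def best_split_for_attribute(X, y, a):
    best = (np.inf, a, None)
    for t in np.unique(X[:, a])[:-1]:        # candidate thresholds
        mask = X[:, a] <= t
        best = min(best, (gini_of_split(y[mask], y[~mask]), a, t))
    return best

def parallel_best_split(X, y, n_workers=4):
    # One task per attribute: the essence of attribute scheduling.
    with ThreadPoolExecutor(n_workers) as pool:
        results = pool.map(lambda a: best_split_for_attribute(X, y, a),
                           range(X.shape[1]))
    return min(results)                      # (impurity, attribute, threshold)
```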

Proceedings ArticleDOI
01 Jan 1999
TL;DR: It is shown that significant, scalable speed-ups can be obtained with relatively little effort on the part of the developer, and potential difficulties that might be faced in other efforts to parallelize sequential motion planning methods are identified.
Abstract: In this paper we report on our experience in parallelizing probabilistic roadmap motion planning methods (PRMs). We show that significant, scalable speed-ups can be obtained with relatively little effort on the part of the developer. Our experience is not limited to PRMs. In particular, we outline general techniques for parallelizing types of computations commonly performed in motion planning algorithms, and identify potential difficulties that might be faced in other efforts to parallelize sequential motion planning methods.

Journal ArticleDOI
TL;DR: A new parallel tabu search heuristic for the vehicle routing problem with time window constraints (VRPTW) is described, based on simple customer shifts and allowing infeasible interim solutions to be considered.
Abstract: In this paper, we describe a new parallel tabu search heuristic for the vehicle routing problem with time window constraints (VRPTW). The neighborhood structure we propose is based on simple customer shifts and allows us to consider infeasible interim solutions. Similarly to the column generation approach used in exact algorithms, all routes generated by the tabu search heuristic are collected in a pool. To obtain a new initial solution for the tabu search heuristic, a fast set covering heuristic is periodically applied to the routes in the pool. The parallel heuristic has been implemented on a Multiple-Instruction Multiple-Data computer architecture with eight nodes. Computational results for Solomon's benchmark problems demonstrate that our parallel heuristic can produce high-quality solutions.

Journal ArticleDOI
TL;DR: The design and implementation of a practical parallel algorithm for Delaunay triangulation that works well on general distributions, achieves significantly better speedups over good sequential code, does not assume a uniform distribution of points, and is widely portable due to its use of MPI as a communication mechanism.
Abstract: This paper describes the design and implementation of a practical parallel algorithm for Delaunay triangulation that works well on general distributions. Although there have been many theoretical parallel algorithms for the problem, and some implementations based on bucketing that work well for uniform distributions, there has been little work on implementations for general distributions. We use the well-known reduction of 2D Delaunay triangulation to finding the 3D convex hull of points on a paraboloid. Based on this reduction we developed a variant of the Edelsbrunner and Shi 3D convex hull algorithm, specialized for the case when the point set lies on a paraboloid. This simplification reduces the work required by the algorithm (number of operations) from O(n log² n) to O(n log n). The depth (parallel time) is O(log³ n) on a CREW PRAM. The algorithm is simpler than previous O(n log n)-work parallel algorithms, leading to smaller constants. Initial experiments using a variety of distributions showed that our parallel algorithm was within a factor of 2 in work from the best sequential algorithm. Based on these promising results, the algorithm was implemented using C and an MPI-based toolkit. Compared with previous work, the resulting implementation achieves significantly better speedups over good sequential code, does not assume a uniform distribution of points, and is widely portable due to its use of MPI as a communication mechanism. Results are presented for the IBM SP2, Cray T3D, SGI Power Challenge, and DEC AlphaCluster.
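The reduction the algorithm rests on is worth seeing concretely: lift each 2D point onto the paraboloid z = x² + y², take the 3D convex hull, and keep the downward-facing facets, which project to exactly the Delaunay triangles. A serial SciPy sketch of just this reduction (the paper's contribution is, of course, the parallel hull algorithm itself; at least four non-degenerate points are assumed):

```python
import numpy as np
from scipy.spatial import ConvexHull

def delaunay_via_lifting(pts2d):
    # Lift each (x, y) to (x, y, x^2 + y^2) on the paraboloid.
    z = (pts2d ** 2).sum(axis=1)
    lifted = np.column_stack([pts2d, z])
    hull = ConvexHull(lifted)
    # Facets of the lower hull (outward normal pointing down, i.e. the
    # z-component of the facet normal is negative) project to the
    # Delaunay triangles of the 2D point set.
    lower = hull.equations[:, 2] < 0
    return hull.simplices[lower]   # (k, 3) array of triangle vertex indices
```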

Journal ArticleDOI
TL;DR: In this article, the authors combine Monte Carlo simulation (MCS) with the weighted integral method to obtain an efficient numerical treatment of stochastic finite element analysis for 2D plane stress/strain problems.

Book
01 Jun 1999
TL;DR: This book progresses from theory to computation, exploring the fundamentals of parallelism and the relationship between parallel programming approaches, algorithms, and architectures.
Abstract: From the Publisher: Parallel processing is a fast-growing technology that dominates many areas of computer science and engineering. This book progresses from theory to computation, exploring the fundamentals of parallelism and the relationship between parallel programming approaches, algorithms, and architectures. This book is suitable for advanced undergraduate and first-year graduate students in computer science, as well as researchers in the area.

Journal ArticleDOI
TL;DR: New theories are developed that allow us to devise a parallel algorithm and an efficient elimination algorithm, both of which improve on existing algorithms for the computation of toric ideals.

Journal ArticleDOI
TL;DR: This algorithm yields the first scalable, portable, and numerically stable parallel divide and conquer eigensolver; its performance is compared with that of the QR algorithm and of bisection followed by inverse iteration on an IBM SP2 and a cluster of Pentium IIs.
Abstract: We present a new parallel implementation of a divide and conquer algorithm for computing the spectral decomposition of a symmetric tridiagonal matrix on distributed memory architectures. The implementation we develop differs from other implementations in that we use a two-dimensional block cyclic distribution of the data, we use the Löwner theorem approach to compute orthogonal eigenvectors, and we introduce permutations before the back transformation of each rank-one update in order to make good use of deflation. This algorithm yields the first scalable, portable, and numerically stable parallel divide and conquer eigensolver. Numerical results confirm the effectiveness of our algorithm. We compare performance of the algorithm with that of the QR algorithm and of bisection followed by inverse iteration on an IBM SP2 and a cluster of Pentium IIs.

Journal ArticleDOI
TL;DR: The state of the art in parallel algorithms for solving discrete optimization problems is described, covering heuristic and nonheuristic techniques for searching graphs as well as trees, together with the speed-up anomalies in parallel search caused by the inherent speculative nature of search techniques.
Abstract: Discrete optimization problems arise in a variety of domains, such as VLSI design, transportation, scheduling and management, and design optimization. Very often, these problems are solved using state space search techniques. Due to the high computational requirements and inherent parallel nature of search techniques, there has been a great deal of interest in the development of parallel search methods since the dawn of parallel computing. Significant advances have been made in the use of powerful heuristics and parallel processing to solve large-scale discrete optimization problems. Problem instances that were considered computationally intractable only a few years ago are routinely solved currently on server-class symmetric multiprocessors and small workstation clusters. Parallel game-playing programs are challenging the best human minds at games like chess. In this paper, we describe the state of the art in parallel algorithms used for solving discrete optimization problems. We address heuristic and nonheuristic techniques for searching graphs as well as trees, and speed-up anomalies in parallel search that are caused by the inherent speculative nature of search techniques.

Journal ArticleDOI
TL;DR: Over the past twelve years, online algorithms have received considerable research interest and the term competitive analysis was coined when Sleator and Tarjan suggested comparing an online algorithm to an optimal offline algorithm.
Abstract: Over the past twelve years, online algorithms have received considerable research interest. Online problems had already been investigated in the seventies and early eighties, but an extensive, systematic study started only when Sleator and Tarjan [41] suggested comparing an online algorithm to an optimal offline algorithm and Karlin, Manasse, Rudolph and Sleator [29] coined the term competitive analysis.

Journal ArticleDOI
TL;DR: A mesh-smoothing algorithm based on nonsmooth optimization techniques and a scalable parallel implementation of it; the parallel algorithm is shown to have a provably fast runtime bound and to execute correctly under a parallel random access machine (PRAM) computational model.
Abstract: Maintaining good mesh quality during the generation and refinement of unstructured meshes in finite-element applications is an important aspect in obtaining accurate discretizations and well-conditioned linear systems. In this article, we present a mesh-smoothing algorithm based on nonsmooth optimization techniques and a scalable implementation of this algorithm. We prove that the parallel algorithm has a provably fast runtime bound and executes correctly for a parallel random access machine (PRAM) computational model. We extend the PRAM algorithm to distributed memory computers and report results for two- and three-dimensional simplicial meshes that demonstrate the efficiency and scalability of this approach for a number of different test cases. We also examine the effect of different architectures on the parallel algorithm and present results for the IBM SP supercomputer and an ATM-connected network of SPARC Ultras.

Proceedings ArticleDOI
12 Oct 1999
TL;DR: It is concluded that the distributed environment GA is the fastest way to obtain a good solution for a given population size when the appropriate crossover and mutation rates are uncertain.
Abstract: Introduces an alternative approach that relieves the task of choosing optimal mutation and crossover rates by using a parallel and distributed GA with distributed environments. It is shown that the best mutation and crossover rates depend on the population sizes and the problems, and that they differ between single and multiple populations. The proposed distributed environment GA uses various combinations of the parameters as fixed values in its subpopulations. The excellent performance of the new scheme is confirmed experimentally on a standard test function. It is concluded that the distributed environment GA is the fastest way to obtain a good solution for a given population size when the appropriate crossover and mutation rates are uncertain.

Proceedings ArticleDOI
06 Jul 1999
TL;DR: The paper solves the redundancy allocation problem of a series-parallel system by developing and demonstrating a problem-specific ant system and using an adaptive penalty method to deal with the highly constrained problem.
Abstract: The paper solves the redundancy allocation problem of a series-parallel system by developing and demonstrating a problem-specific ant system. The problem is to select components and redundancy levels to maximize system reliability, given system-level constraints on cost and weight. The ant system algorithm presented in the paper is combined with an adaptive penalty method to deal with the highly constrained problem. An elitist strategy and mutation are introduced into our ant system algorithm: the elitist strategy enhances the magnitude of the trails of good selections of components, while the mutated ants help explore new search areas. Experiments were conducted on a well-known set of sample problems proposed by D.E. Fyffe et al. (1968).

Journal ArticleDOI
TL;DR: In this article, the synchronous computation of the partial sums of the two operands is proposed for the parallel multiplication of two n-bit numbers; this permits an efficient realization of parallel multiplication using iterative arrays.
Abstract: A new algorithm for the multiplication of two n-bit numbers based on the synchronous computation of the partial sums of the two operands is presented. The proposed algorithm permits an efficient realization of parallel multiplication using iterative arrays. At the same time, it permits high-speed operation. Multiplier arrays for positive numbers and numbers in two's complement form based on the proposed technique are implemented. Also, an efficient pipeline form of the proposed multiplication scheme is introduced. All multipliers obtained have low circuit complexity, permitting high-speed operation, and the interconnections of the cells are regular and well suited to VLSI realization.
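As a serial reference point for the partial-sum idea, textbook shift-and-add multiplication accumulates one shifted partial product of the multiplicand per multiplier bit; the paper's contribution is computing such partial sums synchronously in an iterative hardware array, which this sketch does not attempt to model:

```python
def multiply_shift_add(a, b, n=8):
    """Textbook n-bit shift-and-add multiplication via partial sums."""
    assert 0 <= a < 2 ** n and 0 <= b < 2 ** n
    acc = 0
    for i in range(n):
        if (b >> i) & 1:       # the i-th partial product is a shifted copy of a
            acc += a << i      # accumulate it into the running partial sum
    return acc                 # the full 2n-bit product

assert multiply_shift_add(13, 11) == 143
```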

Journal ArticleDOI
TL;DR: A parallel algorithm guided by a systematic partitioning of the task graph performs scheduling using multiple processors; it schedules both tasks and messages, is suitable for graphs with arbitrary computation and communication costs, and is applicable to systems with arbitrary network topologies using homogeneous or heterogeneous processors.
Abstract: Existing heuristics for scheduling a node and edge weighted directed task graph to multiple processors can produce satisfactory solutions but incur high time complexities, which tend to exacerbate in more realistic environments with relaxed assumptions. Consequently, these heuristics do not scale well and cannot handle problems of moderate sizes. A natural approach to reducing complexity, while aiming for a similar or potentially better solution, is to parallelize the scheduling algorithm. This can be done by partitioning the task graphs and concurrently generating partial schedules for the partitioned parts, which are then concatenated to obtain the final schedule. The problem, however, is nontrivial as there exists dependencies among the nodes of a task graph which must be preserved for generating a valid schedule. Moreover, the time clock for scheduling is global for all the processors (that are executing the parallel scheduling algorithm), making the inherent parallelism invisible. In this paper, we introduce a parallel algorithm that is guided by a systematic partitioning of the task graph to perform scheduling using multiple processors. The algorithm schedules both the tasks and messages, and is suitable for graphs with arbitrary computation and communication costs, and is applicable to systems with arbitrary network topologies using homogeneous or heterogeneous processors. We have implemented the algorithm on the Intel Paragon and compared it with three closely related algorithms. The experimental results indicate that our algorithm yields higher quality solutions while using an order of magnitude smaller scheduling times. The algorithm also exhibits an interesting trade-off between the solution quality and speedup while scaling well with the problem size.
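For context, the serial heuristics whose cost the authors attack are variants of list scheduling: take tasks in a topological order and greedily place each on the processor giving the earliest finish time, accounting for message delays between processors. A minimal serial sketch with an illustrative data layout (preds maps each task to its (parent, message-cost) pairs; this is a generic baseline, not the paper's parallel algorithm):

```python
from collections import defaultdict

def topo_order(preds, cost):
    children, indeg = defaultdict(list), {t: 0 for t in cost}
    for t, ps in preds.items():
        for u, _ in ps:
            children[u].append(t)
            indeg[t] += 1
    ready = [t for t in cost if indeg[t] == 0]
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for v in children[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order

def list_schedule(preds, cost, n_procs):
    """preds: {task: [(parent, comm_cost), ...]}; cost: {task: compute_cost}."""
    avail = [0.0] * n_procs            # time at which each processor frees up
    finish, proc = {}, {}
    for t in topo_order(preds, cost):
        # Earliest start of t on processor q, honouring message delays
        # from parents placed on other processors.
        def start_on(q):
            ready = max((finish[u] + (0 if proc[u] == q else c)
                         for u, c in preds.get(t, [])), default=0.0)
            return max(ready, avail[q])
        q = min(range(n_procs), key=lambda q: start_on(q) + cost[t])
        finish[t], proc[t] = start_on(q) + cost[t], q
        avail[q] = finish[t]
    return finish, proc
```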

Journal ArticleDOI
TL;DR: A set of cost measures that can be applied to parallel algorithms to predict their computation, data access, and communication performance, making it possible to compare different parallel implementation strategies for data mining techniques without benchmarking each one.
Abstract: This article presents a set of cost measures that can be applied to parallel algorithms to predict their computation, data access and communication performance. These measures make it possible to compare different parallel implementation strategies for data mining techniques without benchmarking each one.