
Showing papers on "Sequential algorithm published in 2004"


Journal ArticleDOI
TL;DR: The proposed sequential algorithm allows for early detection of a regime shift and subsequent monitoring of changes in its magnitude over time and can be easily used for an automatic calculation of regime shifts in large sets of variables.
Abstract: Empirical studies of climate regime shifts typically use confirmatory statistical techniques with an a priori hypothesis about the timing of the shifts. Although there are methods for automatic detection of discontinuities in a time series, their performance diminishes drastically at the ends of the series. Since all currently available methods require a substantial amount of accumulated data, regime shifts are usually detected long after they actually occurred. The proposed sequential algorithm allows for early detection of a regime shift and subsequent monitoring of changes in its magnitude over time. The algorithm can handle incoming data regardless of whether they are presented as anomalies or absolute values, and it can easily be used for automatic calculation of regime shifts in large sets of variables.
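A minimal sketch of the sequential idea described above (a hypothetical simplification, not the published algorithm): a shift is declared only after incoming values stay beyond a deviation threshold from the current regime mean for a fixed number of consecutive points, so detection can happen near the end of the series. The function name, `threshold`, and `cutoff` are illustrative assumptions.

```python
def detect_shifts(series, threshold, cutoff):
    """Return indices where a regime shift is declared (simplified sketch)."""
    shifts = []
    regime_start = 0
    regime_mean = series[0]
    candidate = None                 # index where a potential new regime began
    for i in range(1, len(series)):
        if abs(series[i] - regime_mean) > threshold:
            if candidate is None:
                candidate = i
            elif i - candidate + 1 >= cutoff:
                shifts.append(candidate)     # deviation persisted: confirm shift
                regime_start = candidate
                regime_mean = sum(series[candidate:i + 1]) / (i - candidate + 1)
                candidate = None
        else:
            candidate = None                 # deviation did not persist
            n = i - regime_start + 1
            regime_mean += (series[i] - regime_mean) / n   # running mean update
    return shifts
```

Because each point is processed as it arrives, the detector needs no a priori hypothesis about the timing of the shift.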

952 citations


Journal ArticleDOI
Xavier Emery1
TL;DR: This work examines the conditions under which the realizations reproduce the input parameters (indicator means and correlograms) and what happens to the other parameters (other two-point or multiple-point statistics).
Abstract: The sequential indicator algorithm is a widespread geostatistical simulation technique that relies on indicator (co)kriging and is applicable to a wide range of datasets. However, the algorithm faces several limitations that are often misunderstood. This work aims to highlight these limitations by examining the conditions under which the realizations reproduce the input parameters (indicator means and correlograms) and what happens to the other parameters (other two-point or multiple-point statistics). Several types of random functions are contemplated, namely: the mosaic model, random sets, models defined by multiple indicators, and isofactorial models. In each case, the conditions for the sequential algorithm to honor the model parameters are sought. Concurrently, the properties of the multivariate distributions are identified and some conceptual impediments are emphasized. In particular, the prior multiple-point statistics are shown to depend on external factors such as the total number of simulated nodes and the number and locations of the samples. As a consequence, common applications such as a flow simulation or a change of support on the realizations may lead to hazardous interpretations.

64 citations


Journal ArticleDOI
Xavier Emery1
TL;DR: The study concludes that, even in a favorable case where the simulated domain is large with respect to the range of the model, the realizations may poorly reproduce the second-order statistics and be inconsistent with the stationarity and ergodicity assumptions.
Abstract: The sequential algorithm is widely used to simulate Gaussian random fields. However, a rigorous application of this algorithm is impractical and some simplifications are required; in particular, a moving neighborhood has to be defined. To examine the effect of such a restriction on the quality of the realizations, a reference case is presented and several parameters are reviewed, mainly the histogram, variogram, indicator variograms, as well as the ergodic fluctuations in the first- and second-order statistics. The study concludes that, even in a favorable case where the simulated domain is large with respect to the range of the model, the realizations may poorly reproduce the second-order statistics and be inconsistent with the stationarity and ergodicity assumptions. Practical tips such as the ‘multiple-grid strategy’ do not overcome these impediments. Finally, extending the original algorithm by using ordinary kriging should be avoided, unless an intrinsic random function model is sought.

62 citations


Proceedings ArticleDOI
26 Apr 2004
TL;DR: Four parallel MST algorithms are designed and implemented for arbitrary sparse graphs that for the first time give speedup when compared with the best sequential algorithm, and also solve the minimum spanning forest problem.
Abstract: Summary form only given. Minimum spanning tree (MST) is one of the most studied combinatorial problems with practical applications in VLSI layout, wireless communication, and distributed networks, recent problems in biology and medicine such as cancer detection, medical imaging, and proteomics, and national security and bioterrorism such as detecting the spread of toxins through populations in the case of biological/chemical warfare. Most of the previous attempts at improving the speed of MST using parallel computing are too complicated to implement or perform well only on special graphs with regular structure. We design and implement four parallel MST algorithms (three variations of Boruvka plus our new approach) for arbitrary sparse graphs that for the first time give speedup when compared with the best sequential algorithm. In fact, our algorithms also solve the minimum spanning forest problem. We provide an experimental study of our algorithms on symmetric multiprocessors such as IBM's p690/Regatta and Sun's Enterprise servers. Our new implementation achieves good speedups over a wide range of input graphs with regular and irregular structures, including the graphs used by previous parallel MST studies. For example, on an arbitrary random graph with 1M vertices and 20M edges, our new approach achieves a speedup of 5 using 8 processors. The source code for these algorithms is freely available from our Web site hpc.ece.unm.edu. This work was supported in part by NSF Grants CAREER ACI-00-93039, ITR ACI-00-81404, DEB-99-10123, ITR EIA-01-21377, Biocomplexity DEB-01-20709, and ITR EF/BIO 03-31654.
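For reference, a sketch of sequential Boruvka's algorithm, the baseline that the paper's parallel variants build on (function and variable names are illustrative; each round, every component picks its cheapest outgoing edge and components merge):

```python
def boruvka_mst(n, edges):
    """edges: list of (weight, u, v). Returns total MST weight (connected graph assumed)."""
    parent = list(range(n))

    def find(x):                          # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, components = 0.0, n
    while components > 1:
        cheapest = [None] * n             # cheapest outgoing edge per component root
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                for r in (ru, rv):
                    if cheapest[r] is None or w < cheapest[r][0]:
                        cheapest[r] = (w, ru, rv)
        for e in cheapest:
            if e is not None:
                w, ru, rv = e
                if find(ru) != find(rv):  # endpoints may have merged this round
                    parent[find(ru)] = find(rv)
                    total += w
                    components -= 1
    return total
```

Each round at least halves the number of components, so the loop runs O(log n) times; the per-round edge scans are the part that parallelizes naturally.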

56 citations


Proceedings ArticleDOI
27 Jun 2004
TL;DR: In the proposed parallel FCM algorithm, dividing the computations among the processors and minimizing the need to access secondary storage enhance the performance and efficiency of the image segmentation task compared to the sequential algorithm.
Abstract: This paper proposes a parallel Fuzzy C-Means (FCM) algorithm for image segmentation. The sequential FCM algorithm is computationally intensive and has significant memory requirements. For many applications, such as medical image segmentation and geographical image analysis, that deal with large images, sequential FCM is very slow. In our parallel FCM algorithm, dividing the computations among the processors and minimizing the need to access secondary storage enhance the performance and efficiency of the image segmentation task compared to the sequential algorithm.
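A hedged sketch of the sequential FCM update step that such a parallel scheme distributes, shown on 1-D data for brevity (the paper operates on image pixels; `m` is the usual fuzzifier, and the function name is illustrative):

```python
def fcm_step(data, centers, m=2.0):
    """One membership/center update of Fuzzy C-Means. Returns (new_centers, memberships)."""
    eps = 1e-12                               # guard against zero distances
    u = []                                    # membership matrix, one row per point
    for x in data:
        d = [abs(x - c) + eps for c in centers]
        row = []
        for j in range(len(centers)):
            # standard FCM membership: inverse-distance ratios raised to 2/(m-1)
            s = sum((d[j] / dk) ** (2.0 / (m - 1.0)) for dk in d)
            row.append(1.0 / s)
        u.append(row)
    new_centers = []
    for j in range(len(centers)):
        num = sum((u[i][j] ** m) * data[i] for i in range(len(data)))
        den = sum(u[i][j] ** m for i in range(len(data)))
        new_centers.append(num / den)
    return new_centers, u
```

The membership loop is independent per data point, which is why partitioning the data among processors (as the paper does) parallelizes cleanly; only the small center sums need to be combined across processors.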

56 citations


01 Jan 2004
TL;DR: In this paper, a Bayesian approach is proposed to find a set of control variables at which the response is insensitive to the value of the environmental variables, a "robust" choice of control variable.
Abstract: This paper is concerned with the design of computer experiments when there are two types of inputs: control variables and environmental variables. Control variables, also called manufacturing variables, are determined by a product designer, while environmental variables, called noise variables in the quality control literature, are uncontrolled in the field but take values that are characterized by a probability distribution. Our goal is to find a set of control variables at which the response is insensitive to the value of the environmental variables, a "robust" choice of control variables. Such a choice ensures that the mean response is as insensitive as possible to perturbations of the nominal environmental variable distribution. We present a sequential strategy to select the inputs at which to observe the response so as to determine a robust setting of the control variables. Our solution is Bayesian; the prior takes the response as a draw from a stationary Gaussian stochastic process. Given the previous information, the sequential algorithm computes for each untested site the "improvement" over the current guess of the optimal robust setting. The design selects the next site to maximize the expected improvement.
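The expected-improvement criterion mentioned above has a closed form under a Gaussian posterior. A minimal sketch for a minimization problem (this is the generic EI formula, not the paper's robust-design variant): `mu` and `sigma` are the posterior mean and standard deviation at an untested site, and `best` is the current best observed value.

```python
import math

def expected_improvement(mu, sigma, best):
    """Closed-form EI for minimization under a Gaussian posterior."""
    if sigma <= 0.0:                 # no posterior uncertainty at this site
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)       # N(0,1) density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))              # N(0,1) CDF
    return (best - mu) * cdf + sigma * pdf
```

The sequential design evaluates this quantity at every candidate site and observes the response where it is largest, balancing predicted improvement against posterior uncertainty.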

55 citations


Proceedings ArticleDOI
10 May 2004
TL;DR: A VLSI K maximum subarrays algorithm with O(K * n) steps and a circuit size of O(n^2), which is cost-optimal in parallelisation of the sequential algorithm.
Abstract: Given an array of positive and negative values, we consider the problem of K maximum sums. When an overlapping property needs to be observed, previous algorithms for the maximum sum are not directly applicable. We designed an O(K * n) algorithm for the K maximum subsequences problem. This was then modified to solve the K maximum subarrays problem in O(K * n^3) time. Finally, we present a VLSI K maximum subarrays algorithm with O(K * n) steps and a circuit size of O(n^2), which is cost-optimal in parallelisation of the sequential algorithm.
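The K = 1 case is the classical maximum-sum subsequence, solvable in one linear scan (Kadane's algorithm); the paper's O(K * n) method generalises this building block to the K best sums. A minimal sketch of the K = 1 scan:

```python
def max_subsequence(a):
    """Return (best_sum, start, end) of the maximum contiguous subsequence."""
    best_sum, best_range = a[0], (0, 0)
    cur_sum, cur_start = a[0], 0
    for i in range(1, len(a)):
        if cur_sum < 0:                  # a negative running prefix never helps:
            cur_sum, cur_start = a[i], i #   restart the candidate subsequence here
        else:
            cur_sum += a[i]
        if cur_sum > best_sum:
            best_sum, best_range = cur_sum, (cur_start, i)
    return best_sum, best_range[0], best_range[1]
```

Note this returns a valid answer even when all values are negative (the least-negative single element), matching the usual convention for non-empty subsequences.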

53 citations


Journal ArticleDOI
TL;DR: A global parallelization approach is adopted that preserves the properties, behavior, and fundamentals of the sequential algorithm.

48 citations


Journal ArticleDOI
TL;DR: In this article, a modified sequential approach is proposed to improve the performance of the sequential function specification method for inverse heat conduction problems (IHCPs) by adding the use of future information in the process of preliminary estimation.

31 citations


Dissertation
01 Jan 2004
TL;DR: A number of algorithmic and technical solutions are introduced which for the first time enable parallel inference of large phylogenetic trees comprising up to 10,000 organisms under the maximum likelihood criterion, based on DNA sequence data.
Abstract: The computation of large phylogenetic (evolutionary) trees from DNA sequence data based on the maximum likelihood criterion is most probably NP-complete. Furthermore, the computation of the likelihood value for one single potential tree topology is computationally intensive. This thesis introduces a number of algorithmic and technical solutions which for the first time enable parallel inference of large phylogenetic trees comprising up to 10,000 organisms with maximum likelihood. The algorithmic part includes a technique to accelerate the computation of likelihood values, as well as novel search-space heuristics which significantly accelerate the tree inference process and yield better final trees at the same time. The technical part covers solutions for the acquisition of the enormous amount of required computational resources, such as parallel MPI-based and distributed seti@home-like implementations of the basic sequential algorithm. Finally, the program has been used to compute a biologically significant initial small "tree of life" containing 10,000 representative organisms from the three domains Bacteria, Eukarya, and Archaea, based on data from the ARB database.

30 citations


Proceedings ArticleDOI
26 Apr 2004
TL;DR: The results show that the distributed steady state GA is an efficient and accurate tool for solving RND that even outperforms existing parallel solutions.
Abstract: Summary form only given. Evolutionary algorithms (EAs) are applied to solve the radio network design problem (RND). The task is to find the best set of transmitter locations in order to cover a given geographical region at an optimal cost. Usually, parallel EAs are needed in order to cope with the high computational requirements of such a problem. Here, we try to develop and evaluate a set of sequential and parallel genetic algorithms (GAs) in order to solve efficiently the RND problem. The results show that our distributed steady state GA is an efficient and accurate tool for solving RND that even outperforms existing parallel solutions. The sequential algorithm performs very efficiently from a numerical point of view, although the distributed version is much faster, with an observed linear speedup.

Journal ArticleDOI
TL;DR: The proposed optimal forgetting algorithm performs as well as the best hand tuned forgetting factor and results in a continuously adaptive compensation technique without the need of any manual adjustment.
Abstract: Mismatch is known to degrade the performance of speech recognition systems. In real life applications we often encounter nonstationary mismatch sources. A general way to compensate for slowly time varying mismatch is by using sequential algorithms with forgetting. The choice of the forgetting factor is usually performed empirically on some development data, and no optimality criterion is used. In this paper we introduce a framework for obtaining optimal forgetting factor. In sequential algorithms, a recursion is usually used to calculate the required parameters so as to optimize a certain performance measure. To obtain optimal forgetting, we develop a recursion to calculate the forgetting factor that optimizes the same performance criterion as done in the original recursion. When combined together the two recursions result in a sequential algorithm that simultaneously optimizes the desired parameters and the forgetting factor. The proposed method is applied in conjunction with a sequential noise estimation algorithm, but the same principle can be extended to a wide range of sequential algorithms. The algorithm is extensively evaluated for different speech recognition tasks: the 5K Wall Street Journal task corrupted by different types of artificially added noise, a command and digit database recorded in a noisy car environment, and a 20K Japanese broadcast news task corrupted by field noise. In all situations it was found that the sequential algorithm performs as well as or better than batch estimation. In addition, the proposed optimal forgetting algorithm performs as well as the best hand tuned forgetting factor. This results in a continuously adaptive compensation technique without the need of any manual adjustment.

Journal Article
TL;DR: A novel combination of emergent algorithmic methods, powerful computational platforms, and supporting infrastructure is used to launch systematic attacks on significant combinatorial problems; maintaining a balanced decomposition of the search space is shown to be critical to achieving scalability.
Abstract: A novel combination of emergent algorithmic methods, powerful computational platforms and supporting infrastructure is described. These complementary tools and technologies are used to launch systematic attacks on combinatorial problems of significance. As a case study, optimal solutions to very large instances of the NP-hard vertex cover problem are computed. To accomplish this, an efficient sequential algorithm and two forms of parallel algorithms are devised and implemented. The importance of maintaining a balanced decomposition of the search space is shown to be critical to achieving scalability. With the synergistic combination of techniques detailed here, it is now possible to solve problem instances that before were widely viewed as hopelessly out of reach. Target problems need only be amenable to reduction and decomposition. Applications are also discussed.

01 Mar 2004
TL;DR: The main principle of the approach is to generate an initial solution, and at different levels of the tree search to determine a new upper bound used with a best-first search strategy.
Abstract: In this paper, we propose an optimal algorithm for the Multiple-choice Multidimensional Knapsack Problem (MMKP). The main principle of the approach is twofold: (i) generate an initial solution, and (ii) at different levels of the tree search, determine a new upper bound used with a best-first search strategy. The developed method was able to solve the MMKP optimally. The performance of the exact algorithm is evaluated on a set of small and medium instances. The algorithm is also parallelizable, which is one of its important features.

Journal Article
TL;DR: The problem of coding labeled trees by means of strings of node labels is considered and a unified approach based on a reduction of both coding and decoding to integer (radix) sorting is presented, solving the problem of optimally computing the second code presented by Neville.
Abstract: We consider the problem of coding labeled trees by means of strings of node labels and we present a unified approach based on a reduction of both coding and decoding to integer (radix) sorting. Applying this approach to four well-known codes introduced by Prufer [18], Neville [17], and Deo and Micikevicius [5], we close some open problems. With respect to coding, our general sequential algorithm requires optimal linear time, thus solving the problem of optimally computing the second code presented by Neville. The algorithm can be parallelized on the EREW PRAM model, so as to work in O(log n) time using O(n) or O(n log n) operations, depending on the code. With respect to decoding, the problem of finding an optimal sequential algorithm for the second Neville code was also open, and our general scheme solves it. Furthermore, in a parallel setting our scheme yields the first efficient decoding algorithms for the codes in [5] and [17].
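As an illustration of the kind of optimal sequential decoding the paper studies, here is the classical linear-time Prufer decoding for labels 0..n-1 (a textbook pointer-scan variant, not the paper's unified radix-sorting framework, which also covers the Neville and Deo-Micikevicius codes):

```python
def prufer_decode(code):
    """Rebuild the n-1 tree edges from a Prufer code over labels 0..n-1, in O(n)."""
    n = len(code) + 2
    degree = [1] * n
    for x in code:
        degree[x] += 1               # degree of v = 1 + occurrences of v in the code
    edges = []
    ptr = 0
    while degree[ptr] != 1:          # locate the smallest-label leaf
        ptr += 1
    leaf = ptr
    for x in code:
        edges.append((leaf, x))
        degree[x] -= 1
        if degree[x] == 1 and x < ptr:
            leaf = x                 # x just became a leaf left of the pointer: reuse it
        else:
            ptr += 1
            while degree[ptr] != 1:  # otherwise advance to the next smallest leaf
                ptr += 1
            leaf = ptr
    edges.append((leaf, n - 1))      # join the last two remaining vertices
    return edges
```

Because the pointer only moves forward and each code symbol is handled in amortised constant time, the whole decode is linear, which is the sequential optimum the abstract refers to.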

Book ChapterDOI
05 Dec 2004
TL;DR: The key expansion optimality of several known algorithms is obtained among the class of all masking-based domain extenders for universal one-way hash functions (UOWHFs) and a new parallel domain extender for UOWHF is presented.
Abstract: We study the class of masking based domain extenders for UOWHFs. Our first contribution is to show that any correct masking based domain extender for UOWHF which invokes the compression UOWHF s times must use at least ⌈log2 s⌉ masks. As a consequence, we obtain the key expansion optimality of several known algorithms among the class of all masking based domain extending algorithms. Our second contribution is to present a new parallel domain extender for UOWHF. The new algorithm achieves asymptotically optimal speed-up over the sequential algorithm and the key expansion is almost everywhere optimal, i.e., it is optimal for almost all possible number of invocations of the compression UOWHF. Our algorithm compares favourably with all previously known masking based domain extending algorithms.

Book ChapterDOI
14 May 2004
TL;DR: This paper shows how to use speculative parallelization techniques to execute in parallel iterative algorithms such as randomized incremental constructions, and shows that the convex hull problem can be automatically executed in parallel, obtaining speedups with as little as four processors, and reaching 5.15x speedup with 28 processors.
Abstract: Finding the fastest algorithm to solve a problem is one of the main issues in Computational Geometry. Focusing only on worst case analysis or asymptotic computations leads to the development of complex data structures or hard to implement algorithms. Randomized algorithms appear in this scenario as a very useful tool for obtaining easier implementations within a good expected time bound. However, parallel implementations of these algorithms are hard to develop and require an in-depth understanding of the language, the compiler, and the underlying parallel computer architecture. In this paper we show how speculative parallelization techniques can be used to execute in parallel iterative algorithms such as randomized incremental constructions. We focus on the convex hull problem and show that, using our speculative parallelization engine, the sequential algorithm can be automatically executed in parallel, obtaining speedups with as few as four processors and reaching a 5.15x speedup with 28 processors.

Book ChapterDOI
05 Apr 2004
TL;DR: In this paper, a unified approach based on a reduction of both coding and decoding to integer (radix) sorting is presented. But it does not consider the problem of coding labeled trees by means of strings of node labels.
Abstract: We consider the problem of coding labeled trees by means of strings of node labels and we present a unified approach based on a reduction of both coding and decoding to integer (radix) sorting. Applying this approach to four well-known codes introduced by Prufer [18], Neville [17], and Deo and Micikevicius [5], we close some open problems. With respect to coding, our general sequential algorithm requires optimal linear time, thus solving the problem of optimally computing the second code presented by Neville. The algorithm can be parallelized on the EREW PRAM model, so as to work in O(log n) time using O(n) or O(n √log n) operations, depending on the code.

Proceedings ArticleDOI
17 May 2004
TL;DR: Using the proposed approach on bandpass filtered speech and music, this work can extract the fine-structured modulations that occur on a micro-time scale, within an analysis frame.
Abstract: We present a new zero-crossing based algorithm for decomposing a bandpass signal into the amplitude modulation (AM) and frequency modulation (FM) components. In this sequential algorithm, the FM component is first estimated using zero-crossing instant information in a k-nearest neighbour (k-NN) framework. The AM component is estimated by coherent demodulation using a time-varying lowpass filter that uses the estimated instantaneous frequency. Simulation results show that the proposed algorithm gives more accurate envelope and frequency estimates compared to the discrete-energy separation algorithm (DESA) which uses the Teager energy operator. Using the proposed approach on bandpass filtered speech and music, we can extract the fine-structured modulations that occur on a micro-time scale, within an analysis frame.
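A hedged sketch of the zero-crossing idea underlying the FM estimate above: successive zero-crossing instants of a narrowband signal are spaced half a period apart, so their spacing yields a local frequency estimate. This crude global version omits the paper's k-NN smoothing and coherent AM demodulation; the function name is illustrative.

```python
import math

def zero_crossing_freq(samples, fs):
    """Crude frequency estimate (Hz) from the mean spacing of zero crossings."""
    crossings = []
    for i in range(1, len(samples)):
        if samples[i - 1] < 0.0 <= samples[i] or samples[i - 1] >= 0.0 > samples[i]:
            # linearly interpolate the crossing instant between the two samples
            t = (i - 1) + samples[i - 1] / (samples[i - 1] - samples[i])
            crossings.append(t / fs)
    gaps = [b - a for a, b in zip(crossings, crossings[1:])]
    half_period = sum(gaps) / len(gaps)      # crossings are half a period apart
    return 1.0 / (2.0 * half_period)
```

For an FM signal, using each individual gap (rather than the mean) gives an instantaneous-frequency track, which is the raw input the paper's k-NN framework then refines.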

Book ChapterDOI
13 Dec 2004
TL;DR: An algorithm is developed for building a library that maps helices in a 3D structure to the 1-dimensional (1D) sequence, using helix length constraints obtained from such partial information.
Abstract: Determining 3-dimensional (3D) structures of proteins is still a challenging problem. Certain experimental techniques can produce partial information about protein structures, yet not enough to solve the structure. In this paper, we investigate the problem of relating such partial information to its protein sequence. We developed an algorithm for building a library that maps helices in a 3D structure to its 1-dimensional (1D) sequence, using the length constraints of helices obtained from such partial information. We present a parallel algorithm for building a mapping tree, using dynamic distributed scheduling for load balancing. The algorithm shows near-linear speedup for up to the 20 processors tested. If the protein secondary structure prediction is good, the library contains a mapping that correctly assigns the majority of the helices in the protein.

Journal ArticleDOI
TL;DR: The genetic algorithm is used successfully to improve the performance over the heuristic algorithms, and a wavelength lower bound estimate on the minimum number of passes required is calculated and compared to the results obtained using heuristic, genetic, and simulated annealing algorithms to show the advantages of the simulated annealing algorithm.
Abstract: Multistage interconnection networks (MINs) are popular in switching and communication applications and have been used in telecommunication and parallel computing systems for many years. Crosstalk, a major problem introduced by an optical MIN, is caused by coupling two signals within a switching element. We focus on an efficient solution for avoiding crosstalk by routing traffic through an N×N optical network using wavelength-division multiplexing (WDM) and a time-division approach, so that no two signals are coupled within any switching element. Under the constraint of avoiding crosstalk, the interest is in realizing a permutation that uses the minimum number of passes for routing. This routing problem is NP-hard. Many heuristic algorithms have been designed by researchers to perform this routing, such as a sequential algorithm and a degree-descending algorithm. The genetic algorithm has been used successfully to improve the performance over the heuristic algorithms; its drawback is its long running time. We use the simulated annealing algorithm to improve the performance of solving the problem and optimizing the result. In addition, a wavelength lower bound estimate on the minimum number of passes required is calculated and compared to the results obtained using heuristic, genetic, and simulated annealing algorithms. Many cases are tested and the results are compared to those of the other algorithms to show the advantages of the simulated annealing algorithm.

Book ChapterDOI
TL;DR: Efficient BSP/CGM parallel algorithms that require a constant number of communication rounds for both the maximum subsequence problem and the analysis of DNA or protein sequences are presented.
Abstract: The maximum subsequence problem finds the contiguous subsequence of n real numbers with the highest sum. This problem appears in the analysis of DNA or protein sequences. It can be solved sequentially in O(n) time. In the 2-D version, given an n × n array A, the maximum subarray of A is the contiguous subarray that has the maximum sum. The sequential algorithm for the maximum subarray problem takes O(n^3) time. We present efficient BSP/CGM parallel algorithms that require a constant number of communication rounds for both problems. In the first algorithm, the sequence stored on each processor is reduced to only five numbers, so that the resulting values can be concentrated on a single processor which runs an adaptation of the sequential algorithm to obtain the result. The parallel algorithm requires O(n/p) computing time. In the second algorithm, the input array is partitioned equally among the processors and we first reduce each subarray to a sequence, and then apply the first algorithm to solve it. The parallel algorithm takes O(n^3/p) computing time. The good performance of the parallel algorithms is confirmed by experimental results run on a 64-node Beowulf parallel computer.
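A hedged sketch of the per-processor reduction behind such BSP/CGM algorithms: each segment collapses to a few summary numbers, and summaries of adjacent segments combine associatively, so one round of communication suffices. (The paper reduces each sequence to five numbers; the four shown here suffice to recover the maximum sum itself, and the names are illustrative.)

```python
def summarize(seg):
    """Collapse a segment to (total, best_prefix, best_suffix, best_sum)."""
    total = sum(seg)
    run, pref = 0.0, float('-inf')
    for x in seg:                        # best sum of a prefix of the segment
        run += x
        pref = max(pref, run)
    run, suf = 0.0, float('-inf')
    for x in reversed(seg):              # best sum of a suffix of the segment
        run += x
        suf = max(suf, run)
    best, cur = float('-inf'), float('-inf')
    for x in seg:                        # Kadane scan for the best inner sum
        cur = max(x, cur + x)
        best = max(best, cur)
    return (total, pref, suf, best)

def combine(a, b):
    """Merge summaries of two adjacent segments (associative)."""
    return (a[0] + b[0],
            max(a[1], a[0] + b[1]),          # prefix may extend into b
            max(b[2], b[0] + a[2]),          # suffix may extend into a
            max(a[3], b[3], a[2] + b[1]))    # or the best sum straddles the cut
```

Because `combine` is associative, p processors can summarize their O(n/p) slices independently and a single processor folds the p summaries, matching the constant-round structure described above.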

Journal ArticleDOI
TL;DR: A simple sequential algorithm for the problem of computing the connected components of the complement of a given graph, which works on the input graph and not on its complement, and which for a graph on n vertices and m edges runs in optimal O(n+m) time.
Abstract: In this paper we consider the problem of computing the connected components of the complement of a given graph. We describe a simple sequential algorithm for this problem, which works on the input graph and not on its complement, and which for a graph on n vertices and m edges runs in optimal O(n+m) time. Moreover, unlike previous linear co-connectivity algorithms, this algorithm admits efficient parallelization, leading to an optimal O(log n)-time and O((n+m)/log n)-processor algorithm on the EREW PRAM model of computation. It is worth noting that, for the related problem of computing the connected components of a graph, no optimal deterministic parallel algorithm is currently available. The co-connectivity algorithms find applications in a number of problems. In fact, we also include a parallel recognition algorithm for weakly triangulated graphs, which takes advantage of the parallel co-connectivity algorithm and achieves an O(log^2 n) time complexity using O((n+m^2)/log n) processors on the EREW PRAM model of computation.
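The standard trick behind co-connectivity algorithms of this kind can be sketched as a BFS on the complement graph that never materialises it: keep the set of still-unvisited vertices and, from each vertex, reach every unvisited vertex that is not an original-graph neighbour. This simple set-based variant conveys the idea but does not match the paper's worst-case O(n+m) bound; names are illustrative.

```python
def complement_components(n, adj):
    """adj: dict vertex -> set of neighbours in the ORIGINAL graph.
    Returns the connected components of the complement graph, sorted."""
    unvisited = set(range(n))
    components = []
    while unvisited:
        start = unvisited.pop()
        comp, queue = [start], [start]
        while queue:
            v = queue.pop()
            # complement-neighbours of v = unvisited vertices NOT adjacent to v
            reach = unvisited - adj.get(v, set())
            unvisited -= reach
            comp.extend(reach)
            queue.extend(reach)
        components.append(sorted(comp))
    return sorted(components)
```

Only original edges are ever inspected, so the complement's up to n(n-1)/2 edges are never stored, which is the point of working "on the input graph and not on its complement".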

Journal ArticleDOI
TL;DR: A sequential algorithm is presented to find all cut-vertices of a trapezoid graph in O(n) time; a parallel implementation runs in O(log n) time using O(n/log n) processors on an EREW PRAM model.
Abstract: In this paper, a sequential algorithm is presented to find all cut-vertices on trapezoid graphs. To every trapezoid graph G there is a corresponding trapezoid representation. If all the 4n corner points of n trapezoids, in a trapezoid representation of a trapezoid graph G with n vertices, are given, then the proposed sequential algorithm runs in O(n) time. Parallel implementation of this algorithm can be done in O(log n) time using O(n/ log n) processors on an EREW PRAM model.

Proceedings ArticleDOI
29 Nov 2004
TL;DR: A sequential algorithm operating on a tree that approximates the a posteriori probabilities with reduced complexity is applied to a list-sequential multiuser detector for coded applications in a turbo scheme.
Abstract: We study a list-sequential (LISS) multiuser detector for coded applications in a turbo scheme. Optimal with respect to the bit error rate within that iterative scheme would be an APP detector that calculates the a posteriori probabilities. Unfortunately, this detector suffers from a prohibitively high computational complexity for a large number of users. Therefore, we apply a sequential algorithm operating on a tree that approximates the a posteriori probabilities with reduced complexity. The tradeoff between performance and computational burden can be controlled by the size of the available memory.

Journal ArticleDOI
TL;DR: Two simple and efficient parallel algorithms for the construction of the Delaunay triangulation in E^2 by randomized incremental insertion are described, and in several cases even super-linear speed-up is observed compared with the reference sequential algorithm.

Proceedings ArticleDOI
17 May 2004
TL;DR: A simple sequential algorithm for deriving initial values for Gaussian mixture parameters used in HMM-based speech recognition provides good speech recognition performance when compared to models obtained with the usual Gaussian splitting procedure.
Abstract: A simple sequential algorithm for deriving initial values for Gaussian mixture parameters used in HMM-based speech recognition is presented. The proposed algorithm sequentially clusters the training frames, in the order in which they are available and according to the density to which they are associated. This frame-density association results from a frame-state alignment of the training data performed with a single-Gaussian model, which is good enough for such a force-alignment task. The models obtained with the proposed sequential clustering procedure provide good speech recognition performance when compared to models obtained with the usual Gaussian splitting procedure.

23 Jun 2004
TL;DR: The aim of this communication is to present a method of data validation for dynamic linear systems, which is able to take into account the uncertainties of the model parameters.
Abstract: The methods of data validation developed in recent years largely rely on the redundancy resulting from models. The case of models with certain parameters (static and/or dynamic) has been analyzed and has received many solutions. However, there is relatively little work concerning data validation in the presence of model uncertainties. The aim of this communication is to present a method of data validation for dynamic linear systems that is able to take into account the uncertainties of the model parameters. First, we represent the dynamic model of the system in a static form by stacking the state and measurement vectors over an observation window. Second, elementary interval operations make it possible to propose a state estimate of the system that takes the parameter uncertainties into account. As the uncertainties are assumed to be bounded, the estimation result is provided in interval form. A sequential algorithm obtains the state estimate by intersecting the estimates resulting from three methods (Gauss elimination, Gauss-Seidel iteration, and Krawczyk iteration). By analyzing this estimate, we can detect and isolate data affected by gross errors such as biases and propose a correction to make these data coherent with the model of the system.

Proceedings ArticleDOI
28 Mar 2004
TL;DR: A grid-based decision tree architecture is presented, and it is hoped that through these definitions, software developers can define clear system processes and differentiate the application scope for software applications.
Abstract: The decision tree is one of the most frequently used methods in data mining for extracting predictive information. Because its characteristics suit parallelism, it has been widely adopted in high-performance computing and developed into various parallel decision tree algorithms to handle huge data and complex computation. Grid computing is regarded as the extension of the PC cluster, and its future research development is highly valued; it represents the third generation of Internet applications, following the traditional Internet and the Web. We present a grid-based decision tree architecture and hope it can be applied to both parallel and sequential decision tree algorithms. Based on the scope and model of data mining in a grid environment, as well as a user-equivalence perspective, grid roles can be categorized into three types. Through these definitions, software developers can define clear system processes and differentiate the application scope for software applications. To realize our architecture, we first apply an existing parallel decision tree algorithm, SPRINT, in the grid environment. Performance and other differences are compared using datasets of different sizes, and the experimental results serve as a reference for further development.

Journal ArticleDOI
TL;DR: The two-fixed-endpoint Hamiltonian path problem on distance-hereditary graphs is solved efficiently in parallel, and a linear-time sequential algorithm is obtained that is faster than the previously known O(|V|^3) sequential algorithm.