Topic
Interval tree
About: Interval tree is a research topic. Over the lifetime, 1951 publications have been published within this topic receiving 70069 citations.
Papers published on a yearly basis
Papers
More filters
••
TL;DR: This work has used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches.
Abstract: The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum- likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page: http://www.lirmm.fr/w3ifa/MAAS/. (Algorithm; computer simulations; maximum likelihood; phylogeny; rbcL; RDPII project.) The size of homologous sequence data sets has in- creased dramatically in recent years, and many of these data sets now involve several hundreds of taxa. More- over, current probabilistic sequence evolution models (Swofford et al., 1996 ; Page and Holmes, 1998 ), notably those including rate variation among sites (Uzzell and Corbin, 1971 ; Jin and Nei, 1990 ; Yang, 1996 ), require an increasing number of calculations. Therefore, the speed of phylogeny reconstruction methods is becoming a sig- nificant requirement and good compromises between speed and accuracy must be found. The maximum likelihood (ML) approach is especially accurate for building molecular phylogenies. Felsenstein (1981) brought this framework to nucleotide-based phy- logenetic inference, and it was later also applied to amino acid sequences (Kishino et al., 1990). Several vari- ants were proposed, most notably the Bayesian meth- ods (Rannala and Yang 1996; and see below), and the discrete Fourier analysis of Hendy et al. (1994), for ex- ample. Numerous computer studies (Huelsenbeck and Hillis, 1993; Kuhner and Felsenstein, 1994; Huelsenbeck, 1995; Rosenberg and Kumar, 2001; Ranwez and Gascuel, 2002) have shown that ML programs can recover the cor- rect tree from simulated data sets more frequently than other methods can. Another important advantage of the ML approach is the ability to compare different trees and evolutionary models within a statistical framework (see Whelan et al., 2001, for a review). However, like all optimality criterion-based phylogenetic reconstruction approaches, ML is hampered by computational difficul- ties, making it impossible to obtain the optimal tree with certainty from even moderate data sets (Swofford et al., 1996). Therefore, all practical methods rely on heuristics that obtain near-optimal trees in reasonable computing time. Moreover, the computation problem is especially difficult with ML, because the tree likelihood not only depends on the tree topology but also on numerical pa- rameters, including branch lengths. Even computing the optimal values of these parameters on a single tree is not an easy task, particularly because of possible local optima (Chor et al., 2000). The usual heuristic method, implemented in the pop- ular PHYLIP (Felsenstein, 1993 ) and PAUP ∗ (Swofford, 1999 ) packages, is based on hill climbing. It combines stepwise insertion of taxa in a growing tree and topolog- ical rearrangement. For each possible insertion position and rearrangement, the branch lengths of the resulting tree are optimized and the tree likelihood is computed. When the rearrangement improves the current tree or when the position insertion is the best among all pos- sible positions, the corresponding tree becomes the new current tree. Simple rearrangements are used during tree growing, namely "nearest neighbor interchanges" (see below), while more intense rearrangements can be used once all taxa have been inserted. The procedure stops when no rearrangement improves the current best tree. Despite significant decreases in computing times, no- tably in fastDNAml (Olsen et al., 1994 ), this heuristic becomes impracticable with several hundreds of taxa. This is mainly due to the two-level strategy, which sepa- rates branch lengths and tree topology optimization. In- deed, most calculations are done to optimize the branch lengths and evaluate the likelihood of trees that are finally rejected. New methods have thus been proposed. Strimmer and von Haeseler (1996) and others have assembled four- taxon (quartet) trees inferred by ML, in order to recon- struct a complete tree. However, the results of this ap- proach have not been very satisfactory to date (Ranwez and Gascuel, 2001 ). Ota and Li (2000, 2001) described
16,261 citations
••
TL;DR: The multidimensional binary search tree (or k-d tree) as a data structure for storage of information to be retrieved by associative searches is developed and it is shown to be quite efficient in its storage requirements.
Abstract: This paper develops the multidimensional binary search tree (or k-d tree, where k is the dimensionality of the search space) as a data structure for storage of information to be retrieved by associative searches. The k-d tree is defined and examples are given. It is shown to be quite efficient in its storage requirements. A significant advantage of this structure is that a single data structure can handle many types of queries very efficiently. Various utility algorithms are developed; their proven average running times in an n record file are: insertion, O(log n); deletion of the root, O(n(k-1)/k); deletion of a random node, O(log n); and optimization (guarantees logarithmic performance of searches), O(n log n). Search algorithms are given for partial match queries with t keys specified [proven maximum running time of O(n(k-t)/k)] and for nearest neighbor queries [empirically observed average running time of O(log n).] These performances far surpass the best currently known algorithms for these tasks. An algorithm is presented to handle any general intersection query. The main focus of this paper is theoretical. It is felt, however, that k-d trees could be quite useful in many applications, and examples of potential uses are given.
7,159 citations
••
TL;DR: Both a mathematical method for computing the number of trees with a given value of topological difference from the NJ tree and a computer algorithm for identifying all the topologies are developed.
Abstract: A simple method for estimating and testing phylogenetic trees under the principle of minimum evolution (ME) is presented. The basic procedure of this method is first to obtain the neighbor-joining (NJ) tree by Saitou and Nei’s method and then to search for a tree with the minimum value of the sum (S) of branch lengths by examining all trees that are closely related to the NJ tree. Once the ME tree is identified, a statistical test is conducted for the difference in S between this tree and other closely related trees. The mathematical method required for conducting this test is developed by using the least-squares approach. Computer simulation has shown that this method identifies the correct tree with a high probability, as long as the number of nucleotides examined is sufficiently large. It has also been shown that the topology of the NJ tree is almost always identical with that of the ME tree. A method for obtaining least-squares estimates (and their standard errors) of branch lengths for a given topology is also presented. This method can be used for testing the reliability of the branching pattern of the ME tree. However, the statistical test of S values is more powerful in rejecting incorrect trees than is the branch-length test or bootstrapping. Furthermore, both a mathematical method for computing the number of trees with a given value of topological difference from the NJ tree and a computer algorithm for identifying all the topologies are developed.
1,385 citations
••
TL;DR: An algorithm is presented which finds a p-median of a tree (for $p > 1$) in time $O(n^2 \cdot p^2 )$.
Abstract: It is shown that the problem of finding a p-median of a network is an $NP$-hard problem even when the network has a simple structure (e.g., planar graph of maximum vertex degree 3). However, results leading to efficient algorithms are presented when the network is a tree: In particular, we first show that a 1-median of a tree is identical to its w-centroid, and obtain Goldman’s $O(n)$ algorithm for finding a 1-median of a tree out of more general considerations. Then, we present an algorithm which finds a p-median of a tree (for $p > 1$) in time $O(n^2 \cdot p^2 )$.
1,333 citations
••
29 May 2006TL;DR: A new framework to combine tree search with Monte-Carlo evaluation, that does not separate between a min-max phase and a Monte- carlo phase is presented, that provides finegrained control of the tree growth, at the level of individual simulations, and allows efficient selectivity.
Abstract: A Monte-Carlo evaluation consists in estimating a position by averaging the outcome of several random continuations. The method can serve as an evaluation function at the leaves of a min-max tree. This paper presents a new framework to combine tree search with Monte-Carlo evaluation, that does not separate between a min-max phase and a Monte-Carlo phase. Instead of backing-up the min-max value close to the root, and the average value at some depth, a more general backup operator is defined that progressively changes from averaging to minmax as the number of simulations grows. This approach provides a finegrained control of the tree growth, at the level of individual simulations, and allows efficient selectivity. The resulting algorithm was implemented in a 9 × 9 Go-playing program, Crazy Stone, that won the 10th KGS computer-Go tournament.
1,273 citations