
Showing papers on "Tree (data structure)" published in 2006


Book
01 Jan 2006
TL;DR: This book introduces fixed-parameter algorithms and parameterized complexity theory, covers algorithmic methods such as data reduction, depth-bounded search trees, dynamic programming, and tree decompositions, and concludes with selected case studies.
Abstract: PART I: FOUNDATIONS 1. Introduction to Fixed-Parameter Algorithms 2. Preliminaries and Agreements 3. Parameterized Complexity Theory - A Primer 4. Vertex Cover - An Illustrative Example 5. The Art of Problem Parameterization 6. Summary and Concluding Remarks PART II: ALGORITHMIC METHODS 7. Data Reduction and Problem Kernels 8. Depth-Bounded Search Trees 9. Dynamic Programming 10. Tree Decompositions of Graphs 11. Further Advanced Techniques 12. Summary and Concluding Remarks PART III: SOME THEORY, SOME CASE STUDIES 13. Parameterized Complexity Theory 14. Connections to Approximation Algorithms 15. Selected Case Studies 16. Zukunftsmusik References Index

1,730 citations
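The depth-bounded search tree method covered in Part II is easiest to see on the book's own running example, Vertex Cover: pick any uncovered edge, branch on which endpoint joins the cover, and stop at depth k. A minimal Python sketch of that idea (illustrative, not from the book):

```python
def vertex_cover(edges, k):
    """Depth-bounded search tree for k-Vertex Cover.

    Branch on an uncovered edge (u, v): one of u, v must be in any
    cover, so the search tree has depth at most k and size O(2^k).
    Returns a cover of size <= k, or None if none exists.
    """
    if not edges:
        return set()          # every edge is covered
    if k == 0:
        return None           # edges remain but no budget left
    u, v = next(iter(edges))
    for pick in (u, v):
        rest = [e for e in edges if pick not in e]
        sub = vertex_cover(rest, k - 1)
        if sub is not None:
            return sub | {pick}
    return None
```

The running time is exponential only in the parameter k, not in the input size, which is the defining feature of a fixed-parameter algorithm.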


Journal ArticleDOI
TL;DR: This paper presents a new method for visualizing compound graphs based on visually bundling the adjacency edges, i.e., non-hierarchical edges, together and discusses the results based on an informal evaluation provided by potential users of such visualizations.
Abstract: A compound graph is a frequently encountered type of data set. Relations are given between items, and a hierarchy is defined on the items as well. We present a new method for visualizing such compound graphs. Our approach is based on visually bundling the adjacency edges, i.e., non-hierarchical edges, together. We realize this as follows. We assume that the hierarchy is shown via a standard tree visualization method. Next, we bend each adjacency edge, modeled as a B-spline curve, toward the polyline defined by the path via the inclusion edges from one node to another. This hierarchical bundling reduces visual clutter and also visualizes implicit adjacency edges between parent nodes that are the result of explicit adjacency edges between their respective child nodes. Furthermore, hierarchical edge bundling is a generic method which can be used in conjunction with existing tree visualization techniques. We illustrate our technique by providing example visualizations and discuss the results based on an informal evaluation provided by potential users of such visualizations.

1,057 citations
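The bending step can be sketched as interpolating each control point of the hierarchy path toward the straight start-to-end line, with a bundling-strength parameter trading straight (unbundled) edges against tight bundles. A simplified sketch in the spirit of the paper; the function name and parameterization are illustrative:

```python
def bundle_control_points(path, beta=0.85):
    """Blend the hierarchy-path polyline toward the straight line.

    `path` is the list of (x, y) points along the inclusion edges
    from the start node to the end node; `beta` in [0, 1] is the
    bundling strength: 1 keeps the full path (tight bundles),
    0 yields a straight edge (no bundling).
    """
    n = len(path)
    (x0, y0), (xn, yn) = path[0], path[-1]
    out = []
    for i, (x, y) in enumerate(path):
        t = i / (n - 1)
        sx = x0 + t * (xn - x0)   # point on the straight line
        sy = y0 + t * (yn - y0)
        out.append((beta * x + (1 - beta) * sx,
                    beta * y + (1 - beta) * sy))
    return out
```

The blended points would then serve as control points of the B-spline that is actually drawn.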


Journal ArticleDOI
TL;DR: TreeDyn is a tree visualization and annotation tool that provides tree manipulation and annotation facilities and uses meta-information, through dynamic graphical operators or scripting, to aid the analysis and annotation of single trees or tree collections.
Abstract: Analyses of biomolecules for biodiversity, phylogeny or structure/function studies often use graphical tree representations. Many powerful tree editors are now available, but existing tree visualization tools make little use of meta-information related to the entities under study such as taxonomic descriptions or gene functions that can hardly be encoded within the tree itself (if using popular tree formats). Consequently, a tedious manual analysis and post-processing of the tree graphics are required if one needs to use external information for displaying or investigating trees. We have developed TreeDyn, a tool using annotations and dynamic graphical methods for editing and analyzing multiple trees. The main features of TreeDyn are 1) the management of multiple windows and multiple trees per window, 2) the export of graphics to several standard file formats with or without HTML encapsulation and a new format called TGF, which enables saving and restoring graphical analysis, 3) the projection of texts or symbols facing leaf labels or linked to nodes, through manual pasting or by using annotation files, 4) the highlight of graphical elements after querying leaf labels (or annotations) or by selection of graphical elements and information extraction, 5) the highlight of targeted trees according to a source tree browsed by the user, 6) powerful scripts for automating repetitive graphical tasks, 7) a command line interpreter enabling the use of TreeDyn through CGI scripts for online building of trees, 8) the inclusion of a library of packages dedicated to specific research fields involving trees. TreeDyn is a tree visualization and annotation tool which includes tools for tree manipulation and annotation and uses meta-information through dynamic graphical operators or scripting to help analyses and annotations of single trees or tree collections.

1,014 citations


Proceedings ArticleDOI
17 Jul 2006
TL;DR: An automatic approach to tree annotation in which basic nonterminal symbols are alternately split and merged to maximize the likelihood of a training treebank is presented.
Abstract: We present an automatic approach to tree annotation in which basic nonterminal symbols are alternately split and merged to maximize the likelihood of a training treebank. Starting with a simple X-bar grammar, we learn a new grammar whose nonterminals are subsymbols of the original nonterminals. In contrast with previous work, we are able to split various nonterminals to different degrees, as appropriate to the actual complexity in the data. Our grammars automatically learn the kinds of linguistic distinctions exhibited in previous work on manual tree annotation. On the other hand, our grammars are much more compact and substantially more accurate than previous work on automatic annotation. Despite its simplicity, our best grammar achieves an F1 of 90.2% on the Penn Treebank, higher than fully lexicalized systems.

957 citations


Journal ArticleDOI
TL;DR: It is found that for any species tree topology with five or more species, there exist branch lengths for which gene tree discordance is so common that the most likely gene tree topology to evolve along the branches of a species tree differs from the species phylogeny.
Abstract: Because of the stochastic way in which lineages sort during speciation, gene trees may differ in topology from each other and from species trees. Surprisingly, assuming that genetic lineages follow a coalescent model of within-species evolution, we find that for any species tree topology with five or more species, there exist branch lengths for which gene tree discordance is so common that the most likely gene tree topology to evolve along the branches of a species tree differs from the species phylogeny. This counterintuitive result implies that in combining data on multiple loci, the straightforward procedure of using the most frequently observed gene tree topology as an estimate of the species tree topology can be asymptotically guaranteed to produce an incorrect estimate. We conclude with suggestions that can aid in overcoming this new obstacle to accurate genomic inference of species phylogenies.

878 citations
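The three-taxon case gives the standard closed form behind this effect: under the multispecies coalescent, each of the two discordant rooted gene-tree topologies arises with probability (1/3)e^(-T), where T is the species tree's internal branch length in coalescent units. This is the textbook building block, not the five-taxon anomaly-zone computation from the paper itself:

```python
import math

def gene_tree_topology_probs(T):
    """Probabilities of the three rooted gene-tree topologies for
    three species, given internal branch length T (in coalescent
    units) of the species tree. Returns (P_concordant,
    P_each_discordant) under the multispecies coalescent.
    """
    p_disc = math.exp(-T) / 3.0   # each of the two discordant trees
    return 1.0 - 2.0 * p_disc, p_disc
```

As T shrinks toward 0, all three topologies approach probability 1/3; with more taxa, short internal branches are what produce the anomalous gene trees the paper describes.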


Proceedings Article
01 May 2006
TL;DR: This work provides a combined engine for tree query (Tregex) and manipulation (Tsurgeon) that can operate on arbitrary tree data structures with no need for preprocessing.
Abstract: With syntactically annotated corpora becoming increasingly available for a variety of languages and grammatical frameworks, tree query tools have proven invaluable to linguists and computer scientists for both data exploration and corpus-based research. We provide a combined engine for tree query (Tregex) and manipulation (Tsurgeon) that can operate on arbitrary tree data structures with no need for preprocessing. Tregex remedies several expressive and implementational limitations of existing query tools, while Tsurgeon is to our knowledge the most expressive tree manipulation utility available.

380 citations
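A toy analog of one Tregex relation, `A << B` (a node labeled A dominates a descendant labeled B), shows the flavor of such queries. This is an illustrative sketch only; real Tregex patterns support many more relations and boolean combinations:

```python
class Node:
    """A minimal ordered, labeled tree node."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def descendants(node):
    """Yield every proper descendant of `node`, depth-first."""
    for c in node.children:
        yield c
        yield from descendants(c)

def matches_dominates(tree, a, b):
    """Return nodes labeled `a` that dominate some node labeled `b`,
    a toy version of the Tregex relation `A << B`."""
    hits = []
    for n in [tree, *descendants(tree)]:
        if n.label == a and any(d.label == b for d in descendants(n)):
            hits.append(n)
    return hits
```

A matching node could then be handed to a Tsurgeon-style operation (relabel, delete, move) to transform the tree in place.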


Journal ArticleDOI
TL;DR: An intuitionistic fuzzy fault-tree analysis algorithm is proposed in this paper to calculate the fault intervals of system components and to find the most critical system component for managerial decision-making, based on some basic definitions.

367 citations


Proceedings ArticleDOI
17 Jul 2006
TL;DR: A novel translation model based on tree-to-string alignment templates (TATs), which describe the alignment between a source parse tree and a target string, significantly outperforms Pharaoh, a state-of-the-art decoder for phrase-based models.
Abstract: We present a novel translation model based on tree-to-string alignment template (TAT) which describes the alignment between a source parse tree and a target string. A TAT is capable of generating both terminals and non-terminals and performing reordering at both low and high levels. The model is linguistically syntax-based because TATs are extracted automatically from word-aligned, source side parsed parallel texts. To translate a source sentence, we first employ a parser to produce a source parse tree and then apply TATs to transform the tree into a target string. Our experiments show that the TAT-based model significantly outperforms Pharaoh, a state-of-the-art decoder for phrase-based models.

350 citations


Proceedings ArticleDOI
15 May 2006
TL;DR: A replanning algorithm for repairing rapidly-exploring random trees when the configuration space changes is used to create a probabilistic analog to the widely used D* family of deterministic algorithms, and its effectiveness is demonstrated in a multirobot planning domain.
Abstract: We present a replanning algorithm for repairing rapidly-exploring random trees when changes are made to the configuration space. Instead of abandoning the current RRT, our algorithm efficiently removes just the newly-invalid parts and maintains the rest. It then grows the resulting tree until a new solution is found. We use this algorithm to create a probabilistic analog to the widely-used D* family of deterministic algorithms, and demonstrate its effectiveness in a multirobot planning domain.

284 citations
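The reuse idea, deleting only the newly-invalid branches while keeping the rest of the tree, can be sketched on an RRT stored as child-to-parent links. This is a simplified sketch of the pruning step only; the paper's algorithm also regrows the tree and reconnects to the goal:

```python
def prune_invalid(parents, invalid_nodes):
    """Remove newly-invalid nodes and all their descendants from an
    RRT stored as a {child: parent} dict, keeping every other edge.
    """
    children = {}
    for child, parent in parents.items():
        children.setdefault(parent, []).append(child)
    # Collect each invalidated node together with its whole subtree.
    stack, doomed = list(invalid_nodes), set()
    while stack:
        n = stack.pop()
        if n in doomed:
            continue
        doomed.add(n)
        stack.extend(children.get(n, []))
    return {c: p for c, p in parents.items()
            if c not in doomed and p not in doomed}
```

After pruning, new samples are drawn and attached exactly as in a standard RRT until the goal is reconnected.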


Journal Article
TL;DR: A kernel-based algorithm is presented for hierarchical text classification in which documents may belong to more than one category at a time; its predictive accuracy was found to be competitive with other recently introduced hierarchical multi-category or multilabel classification learning algorithms.
Abstract: We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic programming based algorithm that computes best update directions in the feasible set. Experiments show that the algorithm can feasibly optimize training sets of thousands of examples and classification hierarchies consisting of hundreds of nodes. Training of the full hierarchical model is as efficient as training independent SVM-light classifiers for each node. The algorithm's predictive accuracy was found to be competitive with other recently introduced hierarchical multi-category or multilabel classification learning algorithms.

281 citations


Journal ArticleDOI
TL;DR: This work presents a dynamic program to find the most parsimonious gene family tree with respect to a macroevolutionary optimization criterion, the weighted sum of the number of gene duplications and losses.
Abstract: Gene family evolution is determined by microevolutionary processes (e.g., point mutations) and macroevolutionary processes (e.g., gene duplication and loss), yet macroevolutionary considerations are rarely incorporated into gene phylogeny reconstruction methods. We present a dynamic program to find the most parsimonious gene family tree with respect to a macroevolutionary optimization criterion, the weighted sum of the number of gene duplications and losses. The existence of a polynomial delay algorithm for duplication/loss phylogeny reconstruction stands in contrast to most formulations of phylogeny reconstruction, which are NP-complete. We next extend this result to obtain a two-phase method for gene tree reconstruction that takes both micro- and macroevolution into account. In the first phase, a gene tree is constructed from sequence data, using any of the previously known algorithms for gene phylogeny construction. In the second phase, the tree is refined by rearranging regions of the tree that do not have strong support in the sequence data to minimize the duplication/loss cost. Components of the tree with strong support are left intact. This hybrid approach incorporates both micro- and macroevolutionary considerations, yet its computational requirements are modest in practice because the two-phase approach constrains the search space. Our hybrid algorithm can also be used to resolve nonbinary nodes in a multifurcating gene tree. We have implemented these algorithms in a software tool, NOTUNG 2.0, that can be used as a unified framework for gene tree reconstruction or as an exploratory analysis tool that can be applied post hoc to any rooted tree with bootstrap values. The NOTUNG 2.0 graphical user interface can be used to visualize alternate duplication/loss histories, root trees according to duplication and loss parsimony, manipulate and annotate gene trees, and estimate gene duplication times.
It also offers a command line option that enables high-throughput analysis of a large number of trees.

Journal ArticleDOI
01 Jun 2006-Wetlands
TL;DR: In this article, the authors used Classification Tree Analysis (CTA) and Stochastic Gradient Boosting (SGB), decision-tree-based classification algorithms, to distinguish wetlands and riparian areas from the rest of the landscape.
Abstract: The location and distribution of wetlands and riparian zones influence the ecological functions present on a landscape. Accurate and easily reproducible land-cover maps enable monitoring of land-management decisions and ultimately a greater understanding of landscape ecology. Multi-season Landsat ETM+ imagery from 2001 combined with ancillary topographic and soils data were used to map wetland and riparian systems in the Gallatin Valley of Southwest Montana, USA. Classification Tree Analysis (CTA) and Stochastic Gradient Boosting (SGB) decision-tree-based classification algorithms were used to distinguish wetlands and riparian areas from the rest of the landscape. CTA creates a single classification tree using a one-step-look-ahead procedure to reduce variance. SGB uses classification errors to refine tree development and incorporates multiple tree results into a single best classification. The SGB classification (86.0% overall accuracy) was more effective than CTA (73.1% overall accuracy) at detecting a variety of wetlands and riparian zones present on this landscape.

Journal ArticleDOI
TL;DR: A novel and effective technique to perform the task of Web data extraction automatically, called DEPTA, which consists of two steps: identifying individual records in a page and aligning and extracting data items from the identified records.
Abstract: This paper studies the problem of structured data extraction from arbitrary Web pages. The objective of the proposed research is to automatically segment data records in a page, extract data items/fields from these records, and store the extracted data in a database. Existing methods addressing the problem can be classified into three categories. Methods in the first category provide some languages to facilitate the construction of data extraction systems. Methods in the second category use machine learning techniques to learn wrappers (which are data extraction programs) from human labeled examples. Manual labeling is time-consuming and is hard to scale to a large number of sites on the Web. Methods in the third category are based on the idea of automatic pattern discovery. However, multiple pages that conform to a common schema are usually needed as the input. In this paper, we propose a novel and effective technique (called DEPTA) to perform the task of Web data extraction automatically. The method consists of two steps: 1) identifying individual records in a page and 2) aligning and extracting data items from the identified records. For step 1, a method based on visual information and tree matching is used to segment data records. For step 2, a novel partial alignment technique is proposed. This method aligns only those data items in a pair of records that can be aligned with certainty, making no commitment on the rest of the items. Experimental results obtained using a large number of Web pages from diverse domains show that the proposed two-step technique is highly effective.
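The tag-tree comparison in step 1 builds on the classic simple-tree-matching recurrence, which aligns the child sequences of two same-labeled nodes with an LCS-style dynamic program. The following is a generic sketch of that recurrence, not DEPTA's own code:

```python
def simple_tree_matching(a, b):
    """Size of the maximum matching between two ordered, labeled
    trees, each given as (label, children). Roots with different
    labels contribute nothing; otherwise the children sequences are
    aligned like a longest common subsequence with subtree scores.
    """
    la, ca = a
    lb, cb = b
    if la != lb:
        return 0
    m, n = len(ca), len(cb)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(
                dp[i - 1][j],
                dp[i][j - 1],
                dp[i - 1][j - 1]
                + simple_tree_matching(ca[i - 1], cb[j - 1]))
    return 1 + dp[m][n]
```

Two HTML regions whose tag trees score highly under this measure are good candidates for being records generated from the same template.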

01 Jan 2006
TL;DR: Culture-independent molecular methods reveal an extraordinary level of bacterial biodiversity in the tree leaf canopy of a tropical Atlantic forest and suggest that each tree species selects for a distinct microbial community.
Abstract: We found an extraordinary level of bacterial biodiversity in the tree leaf canopy of a tropical Atlantic forest by using culture-independent molecular methods. Our survey suggests that each tree species selects for a distinct microbial community. Analysis of the bacterial 16S ribosomal RNA gene sequences revealed that about 97% of the bacteria were unknown species and that the phyllosphere of any one tree species carries at least 95 to 671 bacterial species. The tree canopies of tropical forests likely represent a large reservoir of unexplored microbial diversity.

Journal ArticleDOI
TL;DR: Clearcut implements RNJ as a C program that takes either a set of aligned sequences or a pre-computed distance matrix as input and produces a phylogenetic tree; alternatively, it can reconstruct phylogenies using an extremely fast standard NJ implementation.
Abstract: Summary: Clearcut is an open source implementation for the relaxed neighbor joining (RNJ) algorithm. While traditional neighbor joining (NJ) remains a popular method for distance-based phylogenetic tree reconstruction, it suffers from an O(N^3) time complexity, where N represents the number of taxa in the input. Due to this steep asymptotic time complexity, NJ cannot reasonably handle very large datasets. In contrast, RNJ realizes a typical-case time complexity on the order of O(N^2 log N) without any significant qualitative difference in output. RNJ is particularly useful when inferring a very large tree or a large number of trees. In addition, RNJ retains the desirable property that it will always reconstruct the true tree given a matrix of additive pairwise distances. Clearcut implements RNJ as a C program, which takes either a set of aligned sequences or a pre-computed distance matrix as input and produces a phylogenetic tree. Alternatively, Clearcut can reconstruct phylogenies using an extremely fast standard NJ implementation. Availability: Clearcut source code is available for download at: http://bioinformatics.hungry.com/clearcut Contact: sheneman@hungry.com Supplementary information: http://bioinformatics.hungry.com/clearcut

Proceedings Article
08 May 2006
TL;DR: A new geographic routing algorithm, Greedy Distributed Spanning Tree Routing (GDSTR), that finds shorter routes and generates less maintenance traffic than previous algorithms, and requires an order of magnitude less bandwidth to maintain its trees than CLDP.
Abstract: We present a new geographic routing algorithm, Greedy Distributed Spanning Tree Routing (GDSTR), that finds shorter routes and generates less maintenance traffic than previous algorithms. While geographic routing potentially scales well, it faces the problem of what to do at local dead ends where greedy forwarding fails. Existing geographic routing algorithms handle dead ends by planarizing the node connectivity graph and then using the right-hand rule to route around the resulting faces. GDSTR handles this situation differently by switching instead to routing on a spanning tree until it reaches a point where greedy forwarding can again make progress. In order to choose a direction on the tree that is most likely to make progress towards the destination, each GDSTR node maintains a summary of the area covered by the subtree below each of its tree neighbors. While GDSTR requires only one tree for correctness, it uses two for robustness and to give it an additional forwarding choice. Our simulations show that GDSTR finds shorter routes than geographic face routing algorithms: GDSTR's stretch is up to 20% less than the best existing algorithm in situations where dead ends are common. In addition, we show that GDSTR requires an order of magnitude less bandwidth to maintain its trees than CLDP, the only distributed planarization algorithm that is known to work with practical radio networks.

Proceedings ArticleDOI
12 Nov 2006
TL;DR: This work constitutes the first implementation of a synthesis algorithm for full LTL: all intermediate automata are carefully optimized, and an incremental algorithm is used to compute the emptiness of nondeterministic Büchi tree automata.
Abstract: We present an approach to automatic synthesis of specifications given in linear time logic. The approach is based on a translation through universal co-Büchi tree automata and alternating weak tree automata (O. Kupferman and M. Vardi, 2005). By careful optimization of all intermediate automata, we achieve a major improvement in performance. We present several optimization techniques for alternating tree automata, including a game-based approximation to language emptiness and a simulation-based optimization. Furthermore, we use an incremental algorithm to compute the emptiness of nondeterministic Büchi tree automata. All our optimizations are computed in time polynomial in the size of the automaton on which they are computed. We have applied our implementation to several examples and show a significant improvement over the straightforward implementation. Although our examples are still small, this work constitutes the first implementation of a synthesis algorithm for full LTL. We believe that the optimizations discussed here form an important step towards making LTL synthesis practical.

Journal ArticleDOI
TL;DR: It is proved that even in this simple case the optimization problem is NP-hard; efficient, scalable, and distributed heuristic approximation algorithms are proposed for solving it, and the total transmission cost can be significantly improved over direct transmission or the shortest path tree.
Abstract: We consider the problem of correlated data gathering by a network with a sink node and a tree-based communication structure, where the goal is to minimize the total transmission cost of transporting the information collected by the nodes, to the sink node. For source coding of correlated data, we consider a joint entropy-based coding model with explicit communication where coding is simple and the transmission structure optimization is difficult. We first formulate the optimization problem definition in the general case and then we study further a network setting where the entropy conditioning at nodes does not depend on the amount of side information, but only on its availability. We prove that even in this simple case, the optimization problem is NP-hard. We propose some efficient, scalable, and distributed heuristic approximation algorithms for solving this problem and show by numerical simulations that the total transmission cost can be significantly improved over direct transmission or the shortest path tree. We also present an approximation algorithm that provides a tree transmission structure with total cost within a constant factor from the optimal.

Proceedings ArticleDOI
22 Apr 2006
TL;DR: A novel approach is described for tree visualization using nested circles: sibling nodes at the same level are represented by externally tangent circles, while tree nodes at different levels are displayed using 2D nested circles or 3D nested cylinders.
Abstract: In this paper a novel approach is described for tree visualization using nested circles. Sibling nodes at the same level are represented by externally tangent circles; tree nodes at different levels are displayed by using 2D nested circles or 3D nested cylinders. A new layout algorithm for tree structure is described. It provides a good overview for large data sets and makes it easy to see all the branches and leaves of the tree. The new method has been applied to the visualization of file systems.

Proceedings ArticleDOI
18 Dec 2006
TL;DR: A novel tree structure, called DSTree (Data Stream Tree), captures important data from the streams; by exploiting its nice properties, the DSTree can be easily maintained and mined for frequent itemsets as well as various other patterns like constrained itemsets.
Abstract: With advances in technology, a flood of data can be produced in many applications such as sensor networks and Web click streams. This calls for efficient techniques for extracting useful information from streams of data. In this paper, we propose a novel tree structure, called DSTree (Data Stream Tree), that captures important data from the streams. By exploiting its nice properties, the DSTree can be easily maintained and mined for frequent itemsets as well as various other patterns like constrained itemsets.
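The compression that makes such tree structures attractive comes from inserting each transaction's items in a canonical order, so that transactions with shared prefixes share nodes. A minimal sketch of that insertion step; DSTree itself additionally keeps per-batch counts for a sliding window over the stream, which this omits:

```python
class TrieNode:
    """One node of a prefix tree over itemsets, with a support count."""
    def __init__(self):
        self.count = 0
        self.children = {}

def insert_transaction(root, items):
    """Insert one transaction, walking items in sorted order so that
    transactions sharing a prefix share the same path of nodes."""
    node = root
    for item in sorted(items):
        node = node.children.setdefault(item, TrieNode())
        node.count += 1
```

Frequent itemsets can then be mined from the tree (FP-growth style) without rescanning the raw stream.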

Proceedings ArticleDOI
15 May 2006
TL;DR: A variant of the Rapidly-Exploring Random Tree (RRT) path planning algorithm that is able to explore narrow passages or difficult areas more effectively and shows that both workspace obstacle information and C-space information can be used when deciding which direction to grow.
Abstract: Tree-based path planners have been shown to be well suited to solve various high dimensional motion planning problems. Here we present a variant of the Rapidly-Exploring Random Tree (RRT) path planning algorithm that is able to explore narrow passages or difficult areas more effectively. We show that both workspace obstacle information and C-space information can be used when deciding which direction to grow. The method includes many ways to grow the tree, some taking into account the obstacles in the environment. This planner works best in difficult areas when planning for free flying rigid or articulated robots. Indeed, whereas the standard RRT can face difficulties planning in a narrow passage, the tree based planner presented here works best in these areas

Proceedings Article
01 Jan 2006
TL;DR: Through simulation, it is shown that Chunkyspread can control load to within a few percent of a heterogeneous target load, and how this can be traded off for improvements in latency and tit-for-tat incentives.
Abstract: The latest debate in P2P and overlay multicast systems is whether or not to build trees. The main argument on the anti-tree side is that tree construction is complex, and that trees are fragile. The main counter-argument is that non-tree systems have a lot of overhead. In this paper, we argue that you can have it both ways: that one can build multi-tree systems with simple and scalable algorithms, and can still yield fast convergence and robustness. This paper presents Chunkyspread, a multi-tree, heterogeneous P2P multicast algorithm based on an unstructured overlay. Through simulation, we show that Chunkyspread can control load to within a few percent of a heterogeneous target load, and how this can be traded off for improvements in latency and tit-for-tat incentives.

Journal ArticleDOI
TL;DR: In this paper, the authors provide computational strategies for obtaining full semiparametric inference for mixtures of finite Polya tree models given a standard parameterization, including models that would be troublesome to fit using Dirichlet process mixtures.
Abstract: Mixtures of Polya tree models provide a flexible alternative when a parametric model may only hold approximately. I provide computational strategies for obtaining full semiparametric inference for mixtures of finite Polya tree models given a standard parameterization, including models that would be troublesome to fit using Dirichlet process mixtures. Recommendations are put forth on choosing the level of a finite Polya tree, and model comparison is discussed. Several examples demonstrate the utility of finite Polya tree modeling, including data fit to generalized linear mixed models and several survival models.

Proceedings ArticleDOI
03 Apr 2006
TL;DR: This paper proposes a new Peer-to-Peer framework based on a balanced tree structure overlay, which can support extensible centralized mapping methods and query processing based on a variety of multidimensional tree structures, including R-Tree, X-Tree, SS-Tree, and M-Tree.
Abstract: Multi-dimensional data indexing has received much attention in centralized databases. However, not so much work has been done on this topic in the context of Peer-to-Peer systems. In this paper, we propose a new Peer-to-Peer framework based on a balanced tree structure overlay, which can support extensible centralized mapping methods and query processing based on a variety of multidimensional tree structures, including R-Tree, X-Tree, SS-Tree, and M-Tree. Specifically, in a network with N nodes, our framework guarantees that point queries and range queries can be answered within O(logN) hops. We also provide an effective load balancing strategy to allow nodes to balance their work load efficiently. An experimental assessment validates the practicality of our proposal.

Patent
02 May 2006
TL;DR: In this article, the authors present a system for performing fuzzy search of a tree data structure, where the tree is traversed in response to the search request and nodes of the tree are examined using a function or set of rules to generate a score.
Abstract: The subject disclosure pertains to systems and methods for performing fuzzy searches of a tree data structure. A search request can include a search term or terms and search conditions. The tree is traversed in response to the search request and nodes of the tree are examined using a function or set of rules to generate a score. The score reflects the probability that the current node is a match to the search term and can be used to determine the search results to be returned. Due to the organization of the tree, if the score indicates that the current node is not a possible match, child nodes of the current node will not be possible matches. Therefore, the traversal of the current node and its children can be terminated.
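The pruning property can be sketched directly: if the scoring function never increases from a parent to its children, any below-threshold node cuts off its entire subtree. The node shape and scoring API below are hypothetical, chosen only to illustrate the traversal; the patent does not prescribe them:

```python
def fuzzy_search(node, score_fn, threshold, results=None):
    """Score-pruned depth-first search. Assumes score_fn is
    non-increasing from parent to child, so a node scoring below
    `threshold` lets us skip its whole subtree."""
    if results is None:
        results = []
    s = score_fn(node["key"])
    if s < threshold:
        return results            # children cannot match either
    results.append((node["key"], s))
    for child in node.get("children", []):
        fuzzy_search(child, score_fn, threshold, results)
    return results
```

With keys organized as prefixes of the search term, scoring by prefix compatibility gives exactly this monotone structure.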

Journal ArticleDOI
01 Jan 2006
TL;DR: The problem of constructing accurate decision tree models from data streams is studied with respect to drift, noise, the order of examples, and the initial parameters in different problems; VFDTc extends VFDT with the ability to deal with concept drift.
Abstract: In this paper we study the problem of constructing accurate decision tree models from data streams. Data streams are incremental tasks that require incremental, online, and any-time learning algorithms. One of the most successful algorithms for mining data streams is VFDT. We have extended VFDT in three directions: the ability to deal with continuous data; the use of more powerful classification techniques at tree leaves; and the ability to detect and react to concept drift. The VFDTc system can incorporate and classify new information online, with a single scan of the data, in time constant per example. The most relevant property of our system is the ability to obtain a performance similar to a standard decision tree algorithm even for medium size datasets. This is relevant due to the any-time property. We also extend VFDTc with the ability to deal with concept drift, by continuously monitoring differences between two class-distributions of the examples: the distribution when a node was built and the distribution in a time window of the most recent examples. We study the sensitivity of VFDTc with respect to drift, noise, the order of examples, and the initial parameters in different problems and demonstrate its utility in large and medium data sets.
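VFDT-family learners decide when a leaf has seen enough examples to split using the Hoeffding bound: a split is made once the observed information-gain gap between the two best attributes exceeds epsilon. The bound itself is standard; the surrounding split logic is only sketched in the comment:

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound used by VFDT-style learners: with probability
    at least 1 - delta, the true mean of a random variable with range
    `value_range` lies within epsilon of the mean of n observations.
    A leaf splits when gain(best) - gain(second_best) > epsilon.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))
```

Because epsilon shrinks as n grows, the learner simply waits for more examples whenever the two best attributes are still too close to call.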

Journal ArticleDOI
TL;DR: A fast variant of NJ called relaxed neighbor joining (RNJ) is developed and experiments indicate that RNJ is a reasonable alternative to NJ and that it is especially well suited for uses that involve large numbers of taxa or highly repetitive procedures such as bootstrapping.
Abstract: Our ability to construct very large phylogenetic trees is becoming more important as vast amounts of sequence data are becoming readily available. Neighbor joining (NJ) is a widely used distance-based phylogenetic tree construction method that has historically been considered fast, but it is prohibitively slow for building trees from increasingly large datasets. We developed a fast variant of NJ called relaxed neighbor joining (RNJ) and performed experiments to measure the speed improvement over NJ. Since repeated runs of the RNJ algorithm generate a superset of the trees that repeated NJ runs generate, we also assessed tree quality. RNJ is dramatically faster than NJ, and the quality of resulting trees is very similar for the two algorithms. The results indicate that RNJ is a reasonable alternative to NJ and that it is especially well suited for uses that involve large numbers of taxa or highly repetitive procedures such as bootstrapping.
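The relaxation is in the pair-selection rule: classic NJ joins the globally minimal pair under the Q-criterion every round, while RNJ settles for a pair that no third taxon beats from either side. A sketch of that local test; this naive scan is for illustration only, and Clearcut's implementation avoids it to reach the reported running time:

```python
def q_value(d, n, i, j, row_sums):
    """Standard neighbor-joining Q-criterion for taxa i and j."""
    return (n - 2) * d[i][j] - row_sums[i] - row_sums[j]

def find_relaxed_pair(d):
    """Return a locally minimal pair under the Q-criterion: (i, j)
    such that no k improves on Q(i, j) from either i's or j's side.
    The global minimum is always locally minimal, so a pair exists.
    """
    n = len(d)
    row_sums = [sum(row) for row in d]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            qij = q_value(d, n, i, j, row_sums)
            if all(q_value(d, n, i, k, row_sums) >= qij and
                   q_value(d, n, j, k, row_sums) >= qij
                   for k in range(n) if k not in (i, j)):
                return min(i, j), max(i, j)
```

On an additive distance matrix a locally minimal pair is still a true cherry, which is why RNJ keeps NJ's correctness guarantee.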

Journal ArticleDOI
TL;DR: Indices quantifying spatial forest structure are frequently used to monitor spatial aspects of tree attributes, including biodiversity, in research plots of limited size; the treatment of edge trees is investigated.
Abstract: Indices quantifying spatial forest structure are frequently used to monitor spatial aspects of tree attributes including biodiversity in research plots of limited size. The treatment of edge trees,...

Book ChapterDOI
18 Sep 2006
TL;DR: In this paper, an empirical study of decision tree approaches to hierarchical multilabel classification (HMC) in the area of functional genomics is presented; HMC tree learning turns out to be more robust to overfitting than regular tree learning.
Abstract: Hierarchical multilabel classification (HMC) is a variant of classification where instances may belong to multiple classes organized in a hierarchy. The task is relevant for several application domains. This paper presents an empirical study of decision tree approaches to HMC in the area of functional genomics. We compare learning a single HMC tree (which makes predictions for all classes together) to learning a set of regular classification trees (one for each class). Interestingly, on all 12 datasets we use, the HMC tree wins on all fronts: it is faster to learn and to apply, easier to interpret, and has similar or better predictive performance than the set of regular trees. It turns out that HMC tree learning is more robust to overfitting than regular tree learning.