Posted Content

An Even Faster and More Unifying Algorithm for Comparing Trees via Unbalanced Bipartite Matchings

TL;DR: The authors present an algorithm for comparing trees that are labeled in an arbitrary manner; it is faster than previous algorithms and builds on an efficient algorithm for a new hierarchical bipartite matching problem, which is at the core of their maximum agreement subtree algorithm.
Abstract: A widely used method for determining the similarity of two labeled trees is to compute a maximum agreement subtree of the two trees. Previous work on this similarity measure is only concerned with the comparison of labeled trees of two special kinds, namely, uniformly labeled trees (i.e., trees with all their nodes labeled by the same symbol) and evolutionary trees (i.e., leaf-labeled trees with distinct symbols for distinct leaves). This paper presents an algorithm for comparing trees that are labeled in an arbitrary manner. In addition to this generality, this algorithm is faster than the previous algorithms. Another contribution of this paper is on maximum weight bipartite matchings. We show how to speed up the best known matching algorithms when the input graphs are node-unbalanced or weight-unbalanced. Based on these enhancements, we obtain an efficient algorithm for a new matching problem called the hierarchical bipartite matching problem, which is at the core of our maximum agreement subtree algorithm.
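To make the notion of a node-unbalanced input concrete, here is a minimal Python sketch. It is not the paper's algorithm: it uses exhaustive search plus one standard pruning observation, namely that when the smaller side has s nodes, each of them only needs its s heaviest edges, because at most s - 1 of those endpoints can be taken by the other nodes. The names `max_matching_weight`, `prune` and the toy graph `adj` are hypothetical.

```python
def max_matching_weight(adj):
    """Exhaustive search for the maximum weight of a matching.

    adj maps each node of the smaller side to {larger-side node: weight}.
    """
    nodes = list(adj)

    def solve(i, used):
        if i == len(nodes):
            return 0
        best = solve(i + 1, used)  # option: leave nodes[i] unmatched
        for v, w in adj[nodes[i]].items():
            if v not in used:
                best = max(best, w + solve(i + 1, used | {v}))
        return best

    return solve(0, frozenset())


def prune(adj):
    """Keep, for each smaller-side node, only its len(adj) heaviest edges."""
    s = len(adj)
    return {
        u: dict(sorted(nbrs.items(), key=lambda kv: kv[1], reverse=True)[:s])
        for u, nbrs in adj.items()
    }


# Toy node-unbalanced graph: 2 nodes on one side, 5 on the other.
adj = {
    "x": {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1},
    "y": {"a": 6, "b": 1, "c": 1, "d": 1, "e": 1},
}
print(max_matching_weight(adj), max_matching_weight(prune(adj)))
```

Both calls return the same optimum, while the pruned graph has only 4 of the original 10 edges. This is the kind of saving that makes imbalance worth exploiting.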
Citations
Book Chapter
01 Jan 2008
TL;DR: The maximum agreement subtree problem for k trees (k-MAST) generalizes the corresponding problem for two trees (MAST): it considers a tuple of k rooted leaf-labeled trees $(T_1, T_2, \ldots, T_k)$.
Abstract: The maximum agreement subtree problem for k trees (k-MAST) is a generalization of a similar problem for two trees (MAST). Consider a tuple of k rooted leaf-labeled trees $(T_1, T_2, \ldots, T_k)$. Let $A = \{a_1, a_2, \ldots, a_n\}$ be the set of leaf labels. Any subset $B \subseteq A$ uniquely determines the so-called topological restriction $T|B$ of the tree $T$ to $B$. Namely, $T|B$ is the topological subtree of $T$ spanned by all leaves labeled with elements from $B$ and the lowest common ancestors of all pairs of these leaves. In particular, the ancestor relation in $T|B$ is defined so that it agrees with the ancestor relation in $T$. A subset $B$ of $A$ such that $T_1|B, \ldots, T_k|B$ are isomorphic is called an agreement set.
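The definitions of topological restriction and agreement set can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from the chapter: a tree is encoded as a nested tuple whose leaves are label strings, children are unordered, and the function names `restrict`, `canonical` and `is_agreement_set` are hypothetical.

```python
def restrict(tree, labels):
    """Return the topological restriction of `tree` to `labels`.

    Returns None when no leaf of `tree` carries a label in `labels`.
    """
    if isinstance(tree, str):
        return tree if tree in labels else None
    kept = [r for r in (restrict(child, labels) for child in tree) if r is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # A node with a single surviving child is not a lowest common
        # ancestor of any pair of kept leaves, so it is contracted.
        return kept[0]
    return tuple(kept)


def canonical(tree):
    """Order-independent form: equal forms mean label-preserving isomorphism."""
    if tree is None or isinstance(tree, str):
        return tree
    return tuple(sorted((canonical(child) for child in tree), key=repr))


def is_agreement_set(trees, labels):
    """True if the restrictions of all trees to `labels` are isomorphic."""
    return len({canonical(restrict(t, labels)) for t in trees}) == 1


t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "c"), ("b", "d"))
print(is_agreement_set([t1, t2], {"a", "b"}))
print(is_agreement_set([t1, t2], {"a", "b", "c"}))
```

For these two trees, {a, b} is an agreement set, whereas {a, b, c} is not: restricted to those three labels, `t1` groups a with b and `t2` groups a with c.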

1 citation

Dissertation
08 Dec 2008
TL;DR: This manuscript presents algorithmic work on building and comparing labelled trees, chiefly polynomial, fixed-parameter and approximation algorithms for supertree problems, with some hardness results.
Abstract: The research work presented in this manuscript is algorithmic in nature: it is mainly composed of polynomial, fixed-parameter and approximation algorithms, while hardness results are also mentioned. This work is about building and comparing labelled trees. These objects find application in different areas, notably in phylogenetics, where they represent evolutionary relationships of organisms or sequences. Most of this work can be considered as investigating solutions to so-called supertree problems. Supertrees are large trees built by a dynamic programming approach from smaller trees. For instance, the latter are gene trees from which a comprehensive tree on many living species is to be built, such as the Tree of Life. First, definitions are introduced; then a part of the manuscript is dedicated to quartet tree building methods. The next part details tree comparison methods, mainly variants of the maximum agreement subtree method. Next follows a part on supertree problems in full generality. The manuscript ends with a report of the research plan for the next few years. Several journal papers illustrating the material described in this manuscript are included in the appendix.
Proceedings Article
29 Jul 2013
TL;DR: This work studies anti-unification for unranked terms and hedges, permitting context and hedge variables, and presents a rule-based system in Huet’s style that computes a set of generalizations of the input hedges and records all the differences.
Abstract: In this work we study anti-unification for unranked terms and hedges, permitting context and hedge variables. Hedges are sequences of unranked terms. The anti-unification problem for two hedges is concerned with finding their generalization, a hedge $\tilde{g}$ such that both input hedges are instances of $\tilde{g}$ under some substitutions. Context variables are used to abstract vertical differences in the input hedges, and hedge variables are used to abstract horizontal differences. A rule-based system in Huet’s style will be presented, which computes a set of generalizations of input hedges and records all the differences. The computed generalizations are least general among a certain class of generalizations.
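As a point of reference, the idea of generalizing two terms while recording their differences can be sketched for the much simpler classical case of first-order (ranked) terms. This is not the rule-based hedge system of the paper and has no context or hedge variables. The term encoding, the function name `anti_unify` and the variable names `X0`, `X1`, ... are assumptions made for illustration.

```python
def anti_unify(s, t, table=None):
    """Least general generalization of two first-order terms.

    A term is a string (constant) or a tuple (symbol, arg1, ..., argn).
    `table` records each pair of differing subterms and the variable that
    abstracts it, so the same difference is always mapped to the same variable.
    Assumes no constant is itself named like a generated variable ("X0", ...).
    """
    if table is None:
        table = {}
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        # Same symbol and arity: generalize argument by argument.
        return (s[0],) + tuple(
            anti_unify(a, b, table) for a, b in zip(s[1:], t[1:])
        )
    # The terms differ here: abstract the pair by a (possibly reused) variable.
    if (s, t) not in table:
        table[(s, t)] = f"X{len(table)}"
    return table[(s, t)]


differences = {}
left = ("f", "a", ("g", "a"))
right = ("f", "b", ("g", "b"))
print(anti_unify(left, right, differences), differences)
```

Here f(a, g(a)) and f(b, g(b)) generalize to f(X0, g(X0)), and `differences` holds the single recorded difference between a and b.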
Posted Content
TL;DR: In this paper, the authors compute a largest common embeddable subtree of two labeled trees, rooted or unrooted, from a series of bipartite matching instances, where the sought embedding is maximal with regard to a weight function on pairs of labels.
Abstract: The largest common embeddable subtree problem asks for the largest possible tree embeddable into two input trees and generalizes the classical maximum common subtree problem. Several variants of the problem in labeled and unlabeled rooted trees have been studied, e.g., for the comparison of evolutionary trees. We consider a generalization, where the sought embedding is maximal with regard to a weight function on pairs of labels. We support rooted and unrooted trees with vertex and edge labels as well as distance penalties for skipping vertices. This variant is important for many applications such as the comparison of chemical structures and evolutionary trees. Our algorithm computes the solution from a series of bipartite matching instances, which are solved efficiently by exploiting their structural relation and imbalance. Our analysis shows that our approach improves or matches the running time of the formally best algorithms for several problem variants. Specifically, we obtain a running time of $\mathcal O(|T|\,|T'|\Delta)$ for two rooted or unrooted trees $T$ and $T'$, where $\Delta=\min\{\Delta(T),\Delta(T')\}$ with $\Delta(X)$ the maximum degree of $X$. If the weights are integral and at most $C$, we obtain a running time of $\mathcal O(|T|\,|T'|\sqrt\Delta\log (C\min\{|T|,|T'|\}))$ for rooted trees.
Book Chapter
15 Jul 2019
TL;DR: Plagiarism is prevalent in undergraduate programming courses; common disguises such as renaming variables or adding empty spaces and comments leave the underlying structure of the code unchanged, and this structural similarity can indicate plagiarism.
Abstract: Plagiarism is prevalent in most undergraduate programming courses, including those where more advanced programming is taught. Typical strategies used to avoid detection include changing variable names and adding empty spaces or comments to the code. Although these changes affect the visual components of the source code, the underlying structure of the code remains the same. This similarity in structure can indicate the presence of plagiarism.
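The observation that renaming and reformatting leave the structure intact can be illustrated with Python's standard `ast` module. This is a minimal sketch, not the detection method of the chapter: it compares only the shape of the syntax trees, so identifiers, constants, comments and whitespace are all ignored. The function names `shape` and `same_structure` and the sample programs are hypothetical.

```python
import ast


def shape(node):
    """Node type plus the shapes of its child nodes; names and values are dropped."""
    return (type(node).__name__,
            tuple(shape(child) for child in ast.iter_child_nodes(node)))


def same_structure(src_a, src_b):
    """True if two Python sources have syntax trees of identical shape."""
    return shape(ast.parse(src_a)) == shape(ast.parse(src_b))


original = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

# Same program with renamed identifiers, an added comment and blank lines.
disguised = """
def add_up(values):

    acc = 0   # running sum
    for item in values:
        acc += item

    return acc
"""

# A genuinely different solution to the same task.
different = """
def total(xs):
    return sum(xs)
"""

print(same_structure(original, disguised), same_structure(original, different))
```

Because constants are ignored as well, this check is deliberately coarse. A practical tool would need a similarity score that tolerates reordered or partially edited code instead of an exact match.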
References
Book
01 Jan 1990
TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.
Abstract: From the Publisher: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures. Like the first edition, this text can also be used for self-study by technical professionals since it discusses engineering issues in algorithm design as well as the mathematical aspects. In its new edition, Introduction to Algorithms continues to provide a comprehensive introduction to the modern study of algorithms. The revision has been updated to reflect changes in the years since the book's original publication. New chapters on the role of algorithms in computing and on probabilistic analysis and randomized algorithms have been included. Sections throughout the book have been rewritten for increased clarity, and material has been added wherever a fuller explanation has seemed useful or new information warrants expanded coverage. As in the classic first edition, this new edition of Introduction to Algorithms presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers. Further, the algorithms are presented in pseudocode to make the book easily accessible to students from all programming language backgrounds. Each chapter presents an algorithm, a design technique, an application area, or a related topic. The chapters are not dependent on one another, so the instructor can organize his or her use of the book in the way that best suits the course's needs. Additionally, the new edition offers a 25% increase over the first edition in the number of problems, giving the book 155 problems and over 900 exercises that reinforce the concepts the students are learning.

21,651 citations

Journal Article
TL;DR: This paper presents algorithms for the assignment problem, the transportation problem, and the minimum-cost flow problem of operations research that find a minimum-cost solution, yet run in time close to the best-known bounds for the corresponding problems without costs.
Abstract: This paper presents algorithms for the assignment problem, the transportation problem, and the minimum-cost flow problem of operations research. The algorithms find a minimum-cost solution, yet run in time close to the best-known bounds for the corresponding problems without costs. For example, the assignment problem (equivalently, minimum-cost matching in a bipartite graph) can be solved in $O(\sqrt{n}\, m \log (nN))$ time, where $n$, $m$, and $N$ denote the number of vertices, number of edges, and largest magnitude of a cost; costs are assumed to be integral. The algorithms work by scaling. As in the work of Goldberg and Tarjan, in each scaled problem an approximate optimum solution is found, rather than an exact optimum.

457 citations

Journal Article
TL;DR: This paper presents another approach to the problem of comparing many secondary structures by utilizing a very efficient tree-matching algorithm that will compare two trees in $O(|T_1| \times |T_2| \times L_1 \times L_2)$ in the worst case and very close to $O(|T_1| \times |T_2|)$ for average trees representing secondary structures.
Abstract: In a previous paper, an algorithm was presented for analyzing multiple RNA secondary structures utilizing a multiple string alignment algorithm. In this paper we present another approach to the problem of comparing many secondary structures by utilizing a very efficient tree-matching algorithm that will compare two trees in $O(|T_1| \times |T_2| \times L_1 \times L_2)$ in the worst case and very close to $O(|T_1| \times |T_2|)$ for average trees representing secondary structures. The result of the pairwise comparison algorithm is then used with a cluster algorithm to produce a multiple structure clustering which can be displayed in a taxonomy tree to show related structures.

346 citations

Journal Article
TL;DR: The tree obtained by regrafting branches on to a largest common pruned tree is shown to contain all the classes present in the strict consensus tree.
Abstract: Given two or more dendrograms (rooted tree diagrams) based on the same set of objects, ways are presented of defining and obtaining common pruned trees. Bounds on the size of a largest common pruned tree are introduced, as is a categorization of objects according to whether they belong to all, some, or no largest common pruned trees. Also described is a procedure for regrafting pruned branches, yielding trees for which one can assess the reliability of the depicted relationships. The tree obtained by regrafting branches on to a largest common pruned tree is shown to contain all the classes present in the strict consensus tree. The theory is illustrated by application to two classifications of a set of forty-nine stratigraphical pollen spectra.
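The classes of the strict consensus tree mentioned above are, by the usual definition, exactly the classes (leaf sets of subtrees) that occur in every input dendrogram. A minimal Python sketch of that definition follows. It does not implement the paper's pruning or regrafting procedures, and the nested-tuple encoding and the names `classes` and `strict_consensus_classes` are assumptions made for illustration.

```python
def classes(tree):
    """Return the set of classes (leaf sets of subtrees) of a dendrogram.

    A dendrogram is a nested tuple whose leaves are object names (strings).
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            members = frozenset([node])
        else:
            members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


def strict_consensus_classes(trees):
    """Classes shared by all dendrograms, i.e. those of the strict consensus tree."""
    return set.intersection(*(classes(t) for t in trees))


d1 = ((("a", "b"), "c"), "d")
d2 = ((("a", "b"), "d"), "c")
shared = strict_consensus_classes([d1, d2])
print(sorted(sorted(c) for c in shared))
```

For these two dendrograms, the shared classes are the four single objects, {a, b} and the full set; {a, b, c} and {a, b, d} each occur in only one tree and are therefore absent from the strict consensus.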

221 citations