An Even Faster and More Unifying Algorithm for Comparing Trees via Unbalanced Bipartite Matchings

Posted Content•

Ming-Yang Kao¹, Tak-Wah Lam², Wing-Kin Sung², Hing-Fung Ting²•Institutions (2)

Yale University¹, University of Hong Kong²

14 Jan 2001-arXiv: Computer Vision and Pattern Recognition

TL;DR: The authors present an algorithm for computing a maximum agreement subtree of two arbitrarily labeled trees that is faster than the previous algorithms. At its core is a new hierarchical bipartite matching problem, solved by speeding up maximum weight bipartite matching on node-unbalanced or weight-unbalanced graphs.

Abstract: A widely used method for determining the similarity of two labeled trees is to compute a maximum agreement subtree of the two trees. Previous work on this similarity measure is only concerned with the comparison of labeled trees of two special kinds, namely, uniformly labeled trees (i.e., trees with all their nodes labeled by the same symbol) and evolutionary trees (i.e., leaf-labeled trees with distinct symbols for distinct leaves). This paper presents an algorithm for comparing trees that are labeled in an arbitrary manner. In addition to this generality, this algorithm is faster than the previous algorithms. Another contribution of this paper is on maximum weight bipartite matchings. We show how to speed up the best known matching algorithms when the input graphs are node-unbalanced or weight-unbalanced. Based on these enhancements, we obtain an efficient algorithm for a new matching problem called the hierarchical bipartite matching problem, which is at the core of our maximum agreement subtree algorithm.
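For the evolutionary-tree special case mentioned in the abstract, the size of a maximum agreement subtree of two rooted binary trees can be sketched with a textbook dynamic program. This is not the paper's algorithm; the nested-tuple tree encoding and the names `leaves` and `mast_size` are assumptions made for illustration. The last two options in the recurrence pair up the children of the two roots, which is the step that turns into a weighted bipartite matching once nodes may have more than two children.

```python
from functools import lru_cache

# Hypothetical encoding: a tree is a leaf label (str) or a pair (left, right).
# Leaf labels are assumed to be distinct within each tree.

def leaves(tree):
    """Return the set of leaf labels of a tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])

@lru_cache(maxsize=None)
def mast_size(a, b):
    """Number of leaves in a maximum agreement subtree of rooted binary trees a and b."""
    if isinstance(a, str):
        return 1 if a in leaves(b) else 0
    if isinstance(b, str):
        return 1 if b in leaves(a) else 0
    (al, ar), (bl, br) = a, b
    return max(
        # Drop one child subtree of either root.
        mast_size(al, b), mast_size(ar, b),
        mast_size(a, bl), mast_size(a, br),
        # Match the two roots and pair their children in both possible ways.
        mast_size(al, bl) + mast_size(ar, br),
        mast_size(al, br) + mast_size(ar, bl),
    )

t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
print(mast_size(t1, t2))  # 3
```

The two example trees agree on three leaves (for instance a, b, and d) but not on all four, so the result is 3.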

Citations

Book Chapter•DOI•

Pairwise global alignment of protein interaction networks by matching neighborhood topology

Rohit Singh¹, Jinbo Xu², Bonnie Berger¹•Institutions (2)

Massachusetts Institute of Technology¹, Toyota Technological Institute²

21 Apr 2007

TL;DR: An algorithm for global alignment of two protein-protein interaction (PPI) networks is described, guided by the intuition that a protein should be matched with a protein in the other network if and only if the neighbors of the two proteins can also be well matched, and the results of global alignment are interpreted to identify functional orthologs between yeast and fly.

Abstract: We describe an algorithm, IsoRank, for global alignment of two protein-protein interaction (PPI) networks. IsoRank aims to maximize the overall match between the two networks; in contrast, much of previous work has focused on the local alignment problem-- identifying many possible alignments, each corresponding to a local region of similarity. IsoRank is guided by the intuition that a protein should be matched with a protein in the other network if and only if the neighbors of the two proteins can also be well matched. We encode this intuition as an eigenvalue problem, in a manner analogous to Google's PageRank method. We use IsoRank to compute the first known global alignment between the S. cerevisiae and D. melanogaster PPI networks. The common subgraph has 1420 edges and describes conserved functional components between the two species. Comparisons of our results with those of a well-known algorithm for local network alignment indicate that the globally optimized alignment resolves ambiguity introduced by multiple local alignments. Finally, we interpret the results of global alignment to identify functional orthologs between yeast and fly; our functional ortholog prediction method is much simpler than a recently proposed approach and yet provides results that are more comprehensive.
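The neighbor-matching intuition and its eigenvalue formulation can be sketched with a small power iteration. This is a minimal sketch under an assumed recurrence (the score of a pair is the degree-normalized sum of the scores of its neighbor pairs); the abstract does not spell out the exact formula, and the function name `neighbor_match_scores` and the toy graphs are hypothetical.

```python
def neighbor_match_scores(adj1, adj2, iters=50):
    """Power iteration for pairwise scores between the nodes of two undirected graphs.

    adj1, adj2: dicts mapping each node to a list of its neighbors.
    Assumed update: r[i, j] = sum over neighbors u of i and v of j
                    of r[u, v] / (deg(u) * deg(v)), followed by normalization.
    """
    r = {(i, j): 1.0 for i in adj1 for j in adj2}
    for _ in range(iters):
        new = {
            (i, j): sum(
                r[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                for u in adj1[i]
                for v in adj2[j]
            )
            for i in adj1
            for j in adj2
        }
        total = sum(new.values())
        r = {pair: score / total for pair, score in new.items()}
    return r

# Two toy graphs with the same shape: a triangle with one pendant node.
g1 = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
g2 = {"w": ["x", "y"], "x": ["w", "y"], "y": ["w", "x", "z"], "z": ["y"]}

scores = neighbor_match_scores(g1, g2)
print(max(scores, key=scores.get))  # ('c', 'y')
```

With topology alone, this update favors pairs of well-connected nodes, so the two degree-3 nodes receive the highest score. The toy graphs are deliberately non-bipartite; on bipartite inputs this undamped iteration can oscillate rather than converge.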

338 citations

Journal Article•DOI•

Computing the maximum agreement of phylogenetic networks

Charles Choy¹, Jesper Jansson¹, Kunihiko Sadakane², Wing-Kin Sung¹•Institutions (2)

National University of Singapore¹, Kyushu University²

20 May 2005-Theoretical Computer Science

TL;DR: The maximum agreement phylogenetic subnetwork problem (MASN) is introduced. The problem is proved to be NP-hard even when restricted to three phylogenetic networks, and an $O(n^2)$-time algorithm is given for the special case of two level-1 phylogenetic networks.

87 citations

Journal Article•DOI•

Rooted Maximum Agreement Supertrees

Jesper Jansson¹, Joseph Ng¹, Kunihiko Sadakane², Wing-Kin Sung¹•Institutions (2)

National University of Singapore¹, Kyushu University²

01 Dec 2005-Algorithmica

TL;DR: It is proved that MASP is NP-hard for any fixed $k \geq 3$ when $D$ is unrestricted, and also NP-hard for any fixed $D \geq 2$ when $k$ is unrestricted, even if each input tree is required to contain at most three leaves.

Abstract: Given a set $\mathcal{T}$ of rooted, unordered trees, where each $T_i \in \mathcal{T}$ is distinctly leaf-labeled by a set $\Lambda(T_i)$ and where the sets $\Lambda(T_i)$ may overlap, the maximum agreement supertree problem (MASP) is to construct a distinctly leaf-labeled tree $Q$ with leaf set $\Lambda(Q) \subseteq \bigcup_{T_i \in \mathcal{T}} \Lambda(T_i)$ such that $|\Lambda(Q)|$ is maximized and for each $T_i \in \mathcal{T}$, the topological restriction of $T_i$ to $\Lambda(Q)$ is isomorphic to the topological restriction of $Q$ to $\Lambda(T_i)$. Let $n = \left| \bigcup_{T_i \in \mathcal{T}} \Lambda(T_i) \right|$, $k = |\mathcal{T}|$, and $D = \max_{T_i \in \mathcal{T}} \{\deg(T_i)\}$. We first show that MASP with $k = 2$ can be solved in $O(\sqrt{D}\, n \log (2n/D))$ time, which is $O(n \log n)$ when $D = O(1)$ and $O(n^{1.5})$ when $D$ is unrestricted. We then present an algorithm for MASP with $D = 2$ whose running time is polynomial if $k = O(1)$. On the other hand, we prove that MASP is NP-hard for any fixed $k \geq 3$ when $D$ is unrestricted, and also NP-hard for any fixed $D \geq 2$ when $k$ is unrestricted even if each input tree is required to contain at most three leaves. Finally, we describe a polynomial-time $(n/\log n)$-approximation algorithm for MASP.
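The two special cases of the $k = 2$ running time quoted in the abstract can be checked directly. The short derivation below is a sketch that assumes $1 \le D \le n$ (a node cannot have more children than there are leaf labels) and uses the natural logarithm, which only affects constants.

```latex
% Case D = O(1): the factor \sqrt{D} is a constant and \log(2n/D) \le \log(2n).
\sqrt{D}\, n \log\frac{2n}{D} = O(n \log n).

% Case D unrestricted: substitute x = 2n/D, so x \ge 2 and \sqrt{D} = \sqrt{2n/x}.
\sqrt{D}\, n \log\frac{2n}{D}
  = \sqrt{2}\, n^{1.5} \cdot \frac{\ln x}{\sqrt{x}}
  \le \sqrt{2}\, n^{1.5} \cdot \frac{2}{e}
  = O(n^{1.5}),
% since \ln x / \sqrt{x} attains its maximum value 2/e at x = e^2.
```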

46 citations

Journal Article•

Rooted maximum agreement supertrees

Jesper Jansson¹, Joseph Ng¹, Kunihiko Sadakane², Wing-Kin Sung¹•Institutions (2)

National University of Singapore¹, Kyushu University²

01 Jan 2004-Lecture Notes in Computer Science

TL;DR: The maximum agreement supertree problem (MASP) is proved to be NP-hard for any fixed $k \geq 3$ when $D$ is unrestricted, and also NP-hard for any fixed $D \geq 2$ when $k$ is unrestricted, even if each input tree is required to contain at most three leaves.

Abstract: Given a set $\mathcal{T}$ of rooted, unordered trees, where each $T_i \in \mathcal{T}$ is distinctly leaf-labeled by a set $\Lambda(T_i)$ and where the sets $\Lambda(T_i)$ may overlap, the maximum agreement supertree problem (MASP) is to construct a distinctly leaf-labeled tree $Q$ with leaf set $\Lambda(Q) \subseteq \bigcup_{T_i \in \mathcal{T}} \Lambda(T_i)$ such that $|\Lambda(Q)|$ is maximized and for each $T_i \in \mathcal{T}$, the topological restriction of $T_i$ to $\Lambda(Q)$ is isomorphic to the topological restriction of $Q$ to $\Lambda(T_i)$. Let $n = \left| \bigcup_{T_i \in \mathcal{T}} \Lambda(T_i) \right|$, $k = |\mathcal{T}|$, and $D = \max_{T_i \in \mathcal{T}} \{\deg(T_i)\}$. We first show that MASP with $k = 2$ can be solved in $O(\sqrt{D}\, n \log(2n/D))$ time, which is $O(n \log n)$ when $D = O(1)$ and $O(n^{1.5})$ when $D$ is unrestricted. We then present an algorithm for MASP with $D = 2$ whose running time is polynomial if $k = O(1)$. On the other hand, we prove that MASP is NP-hard for any fixed $k \geq 3$ when $D$ is unrestricted, and also NP-hard for any fixed $D \geq 2$ when $k$ is unrestricted even if each input tree is required to contain at most three leaves. Finally, we describe a polynomial-time $(n/\log n)$-approximation algorithm for MASP.

40 citations

Journal Article•DOI•

Maximum agreement and compatible supertrees

Vincent Berry¹, François Nicolas²•Institutions (2)

University of Montpellier¹, University of Helsinki²

01 Sep 2007-Journal of Discrete Algorithms

TL;DR: This paper proposes extensions of MAST and MCT to the context of supertree inference, where input trees have non-identical leaf sets, and shows that arbitrary instances of SMAST and SMCT can be turned in polynomial time into instances composed of trees with a bounded number of leaves.

39 citations

References

Book•

Introduction to Algorithms

Thomas H. Cormen¹, Charles E. Leiserson¹, Ronald L. Rivest¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 1990

TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.

Abstract: From the Publisher: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures. Like the first edition, this text can also be used for self-study by technical professionals since it discusses engineering issues in algorithm design as well as the mathematical aspects. In its new edition, Introduction to Algorithms continues to provide a comprehensive introduction to the modern study of algorithms. The revision has been updated to reflect changes in the years since the book's original publication. New chapters on the role of algorithms in computing and on probabilistic analysis and randomized algorithms have been included. Sections throughout the book have been rewritten for increased clarity, and material has been added wherever a fuller explanation has seemed useful or new information warrants expanded coverage. As in the classic first edition, this new edition of Introduction to Algorithms presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers. Further, the algorithms are presented in pseudocode to make the book easily accessible to students from all programming language backgrounds. Each chapter presents an algorithm, a design technique, an application area, or a related topic. The chapters are not dependent on one another, so the instructor can organize his or her use of the book in the way that best suits the course's needs. Additionally, the new edition offers a 25% increase over the first edition in the number of problems, giving the book 155 problems and over 900 exercises that reinforce the concepts the students are learning.

21,651 citations

Journal Article•DOI•

Formal methods in the study of language

Jeroen Groenendijk, Theo Janssen, Martin Stokhof

01 Jun 1983-Language

652 citations

Journal Article•DOI•

Faster scaling algorithms for network problems

Harold N. Gabow, Robert E. Tarjan

01 Oct 1989-SIAM Journal on Computing

TL;DR: This paper presents algorithms for the assignment problem, the transportation problem, and the minimum-cost flow problem of operations research that find a minimum-cost solution, yet run in time close to the best-known bounds for the corresponding problems without costs.

Abstract: This paper presents algorithms for the assignment problem, the transportation problem, and the minimum-cost flow problem of operations research. The algorithms find a minimum-cost solution, yet run in time close to the best-known bounds for the corresponding problems without costs. For example, the assignment problem (equivalently, minimum-cost matching in a bipartite graph) can be solved in $O(\sqrt{n}\, m \log (nN))$ time, where $n$, $m$, and $N$ denote the number of vertices, number of edges, and largest magnitude of a cost; costs are assumed to be integral. The algorithms work by scaling. As in the work of Goldberg and Tarjan, in each scaled problem an approximate optimum solution is found, rather than an exact optimum.
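To make the assignment problem concrete, here is a brute-force sketch that tries every one-to-one assignment of rows to columns in a small integer cost matrix. It only illustrates the problem statement and is not the scaling algorithm described in the abstract; the function name `min_cost_assignment` and the sample matrix are hypothetical, and the factorial running time makes it usable only for tiny inputs.

```python
from itertools import permutations

def min_cost_assignment(cost):
    """Return (total_cost, assignment) minimizing the sum of cost[i][assignment[i]].

    cost: square matrix (list of lists) of integer costs.
    Tries all n! permutations, so it is only suitable for very small n.
    """
    n = len(cost)

    def total(perm):
        return sum(cost[i][perm[i]] for i in range(n))

    best = min(permutations(range(n)), key=total)
    return total(best), best

cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
print(min_cost_assignment(cost))  # (5, (1, 0, 2))
```

Row 0 takes column 1, row 1 takes column 0, and row 2 takes column 2, for a total cost of 1 + 2 + 2 = 5.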

457 citations

Journal Article•DOI•

Comparing multiple RNA secondary structures using tree comparisons

Bruce A. Shapiro¹, Kaizhong Zhang²•Institutions (2)

National Institutes of Health¹, University of Cambridge²

01 Oct 1990-Bioinformatics

TL;DR: This paper presents another approach to the problem of comparing many secondary structures by utilizing a very efficient tree-matching algorithm that will compare two trees in $O(|T_1| \times |T_2| \times L_1 \times L_2)$ time in the worst case and very close to $O(|T_1| \times |T_2|)$ for average trees representing secondary structures.

Abstract: In a previous paper, an algorithm was presented for analyzing multiple RNA secondary structures utilizing a multiple string alignment algorithm. In this paper we present another approach to the problem of comparing many secondary structures by utilizing a very efficient tree-matching algorithm that will compare two trees in $O(|T_1| \times |T_2| \times L_1 \times L_2)$ time in the worst case and very close to $O(|T_1| \times |T_2|)$ for average trees representing secondary structures. The result of the pairwise comparison algorithm is then used with a cluster algorithm to produce a multiple structure clustering which can be displayed in a taxonomy tree to show related structures.

346 citations

Journal Article•DOI•

Obtaining common pruned trees

C. R. Finden¹, A. D. Gordon¹•Institutions (1)

University of St Andrews¹

01 Dec 1985-Journal of Classification

TL;DR: The tree obtained by regrafting branches on to a largest common pruned tree is shown to contain all the classes present in the strict consensus tree.

Abstract: Given two or more dendrograms (rooted tree diagrams) based on the same set of objects, ways are presented of defining and obtaining common pruned trees. Bounds on the size of a largest common pruned tree are introduced, as is a categorization of objects according to whether they belong to all, some, or no largest common pruned trees. Also described is a procedure for regrafting pruned branches, yielding trees for which one can assess the reliability of the depicted relationships. The tree obtained by regrafting branches on to a largest common pruned tree is shown to contain all the classes present in the strict consensus tree. The theory is illustrated by application to two classifications of a set of forty-nine stratigraphical pollen spectra.

221 citations