An Even Faster and More Unifying Algorithm for Comparing Trees via Unbalanced Bipartite Matchings

Posted Content•

Ming-Yang Kao¹, Tak-Wah Lam², Wing-Kin Sung², Hing-Fung Ting²•Institutions (2)

Yale University¹, University of Hong Kong²

14 Jan 2001-arXiv: Computer Vision and Pattern Recognition

TL;DR: The authors present an algorithm for comparing trees that are labeled in an arbitrary manner, which is faster than the previous algorithms. At its core is an efficient algorithm for a new hierarchical bipartite matching problem, obtained by speeding up maximum weight bipartite matching on node-unbalanced or weight-unbalanced graphs.

Abstract: A widely used method for determining the similarity of two labeled trees is to compute a maximum agreement subtree of the two trees. Previous work on this similarity measure is only concerned with the comparison of labeled trees of two special kinds, namely, uniformly labeled trees (i.e., trees with all their nodes labeled by the same symbol) and evolutionary trees (i.e., leaf-labeled trees with distinct symbols for distinct leaves). This paper presents an algorithm for comparing trees that are labeled in an arbitrary manner. In addition to this generality, this algorithm is faster than the previous algorithms. Another contribution of this paper is on maximum weight bipartite matchings. We show how to speed up the best known matching algorithms when the input graphs are node-unbalanced or weight-unbalanced. Based on these enhancements, we obtain an efficient algorithm for a new matching problem called the hierarchical bipartite matching problem, which is at the core of our maximum agreement subtree algorithm.

Citations

Book Chapter•DOI•

Pairwise global alignment of protein interaction networks by matching neighborhood topology

Rohit Singh¹, Jinbo Xu², Bonnie Berger¹•Institutions (2)

Massachusetts Institute of Technology¹, Toyota Technological Institute²

21 Apr 2007

TL;DR: An algorithm for global alignment of two protein-protein interaction (PPI) networks is described, guided by the intuition that a protein should be matched with a protein in the other network if and only if the neighbors of the two proteins can also be well matched, and the results of global alignment are interpreted to identify functional orthologs between yeast and fly.

Abstract: We describe an algorithm, IsoRank, for global alignment of two protein-protein interaction (PPI) networks. IsoRank aims to maximize the overall match between the two networks; in contrast, much of previous work has focused on the local alignment problem-- identifying many possible alignments, each corresponding to a local region of similarity. IsoRank is guided by the intuition that a protein should be matched with a protein in the other network if and only if the neighbors of the two proteins can also be well matched. We encode this intuition as an eigenvalue problem, in a manner analogous to Google's PageRank method. We use IsoRank to compute the first known global alignment between the S. cerevisiae and D. melanogaster PPI networks. The common subgraph has 1420 edges and describes conserved functional components between the two species. Comparisons of our results with those of a well-known algorithm for local network alignment indicate that the globally optimized alignment resolves ambiguity introduced by multiple local alignments. Finally, we interpret the results of global alignment to identify functional orthologs between yeast and fly; our functional ortholog prediction method is much simpler than a recently proposed approach and yet provides results that are more comprehensive.
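
The "eigenvalue problem" mentioned in the abstract can be illustrated with a small power iteration. The sketch below is not taken from the paper: the update rule (each pair of nodes receives score from pairs of their neighbours, divided by the neighbours' degrees) is an assumed, simplified formulation, and the function name `isorank_scores` and the dictionary-based graph format are hypothetical. It also assumes that no node is isolated and ignores any sequence-similarity information.

```python
def isorank_scores(adj1, adj2, iterations=50):
    """Power iteration over pairs of nodes from two graphs.

    adj1, adj2: dict mapping each node to the set of its neighbours
    (undirected graphs, no isolated nodes -- an assumption of this sketch).
    Returns a dict mapping (node1, node2) to a score; scores sum to 1.
    """
    pairs = [(i, j) for i in adj1 for j in adj2]
    score = {p: 1.0 / len(pairs) for p in pairs}
    for _ in range(iterations):
        new = {}
        for i, j in pairs:
            total = 0.0
            # A pair (i, j) scores well if its neighbour pairs (u, v) do.
            for u in adj1[i]:
                for v in adj2[j]:
                    total += score[(u, v)] / (len(adj1[u]) * len(adj2[v]))
            new[(i, j)] = total
        norm = sum(new.values())
        score = {p: s / norm for p, s in new.items()}
    return score


# Two copies of the same small graph: a triangle with one pendant node.
g1 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
g2 = {"w": {"x", "y"}, "x": {"w", "y"}, "y": {"w", "x", "z"}, "z": {"y"}}
scores = isorank_scores(g1, g2)
best = max(scores, key=scores.get)  # the pair of degree-3 nodes, ("c", "y")
```

On purely topological input like this, the scores tend to favour pairs of high-degree nodes, which is one reason a practical alignment would combine them with other evidence before extracting a matching.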

338 citations

Journal Article•DOI•

Computing the maximum agreement of phylogenetic networks

Charles Choy¹, Jesper Jansson¹, Kunihiko Sadakane², Wing-Kin Sung¹•Institutions (2)

National University of Singapore¹, Kyushu University²

20 May 2005-Theoretical Computer Science

TL;DR: The maximum agreement phylogenetic subnetwork problem (MASN) is introduced, it is proved that the problem is NP-hard even if restricted to three phylogenetic networks, and an O(n^2)-time algorithm is given for the special case of two level-1 phylogenetic networks.

87 citations

Journal Article•DOI•

Rooted Maximum Agreement Supertrees

Jesper Jansson¹, Joseph Ng¹, Kunihiko Sadakane², Wing-Kin Sung¹•Institutions (2)

National University of Singapore¹, Kyushu University²

01 Dec 2005-Algorithmica

TL;DR: It is proved that MASP is NP-hard for any fixed $k \geq 3$ when $D$ is unrestricted, and also NP-hard for any fixed $D \geq 2$ when $k$ is unrestricted, even if each input tree is required to contain at most three leaves.

Abstract: Given a set $\mathcal{T}$ of rooted, unordered trees, where each $T_i \in \mathcal{T}$ is distinctly leaf-labeled by a set $\Lambda(T_i)$ and where the sets $\Lambda(T_i)$ may overlap, the maximum agreement supertree problem (MASP) is to construct a distinctly leaf-labeled tree $Q$ with leaf set $\Lambda(Q) \subseteq \bigcup_{T_i \in \mathcal{T}} \Lambda(T_i)$ such that $|\Lambda(Q)|$ is maximized and for each $T_i \in \mathcal{T}$, the topological restriction of $T_i$ to $\Lambda(Q)$ is isomorphic to the topological restriction of $Q$ to $\Lambda(T_i)$. Let $n = \left| \bigcup_{T_i \in \mathcal{T}} \Lambda(T_i) \right|$, $k = |\mathcal{T}|$, and $D = \max_{T_i \in \mathcal{T}}\{\deg(T_i)\}$. We first show that MASP with $k = 2$ can be solved in $O(\sqrt{D}\, n \log (2n/D))$ time, which is $O(n \log n)$ when $D = O(1)$ and $O(n^{1.5})$ when $D$ is unrestricted. We then present an algorithm for MASP with $D = 2$ whose running time is polynomial if $k = O(1)$. On the other hand, we prove that MASP is NP-hard for any fixed $k \geq 3$ when $D$ is unrestricted, and also NP-hard for any fixed $D \geq 2$ when $k$ is unrestricted even if each input tree is required to contain at most three leaves. Finally, we describe a polynomial-time $(n/\!\log n)$-approximation algorithm for MASP.
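
The two special cases quoted for the $k = 2$ bound can be checked directly. The following is a short sketch, assuming $1 \leq D \leq n$ (an assumption not stated in the abstract) and writing $x = 2n/D$, so that $x \geq 2$:

$$
\sqrt{D}\, n \log\frac{2n}{D}
  \;=\; n \sqrt{\frac{2n}{x}}\, \log x
  \;=\; \sqrt{2}\; n^{1.5} \cdot \frac{\log x}{\sqrt{x}} .
$$

Since $\log x / \sqrt{x}$ is bounded by a constant for all $x \geq 2$, the right-hand side is $O(n^{1.5})$ whatever the value of $D$. When $D = O(1)$, the factor $\sqrt{D}$ is $O(1)$ and $\log(2n/D) \leq \log(2n) = O(\log n)$, which gives $O(n \log n)$.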

46 citations

Journal Article•

Rooted maximum agreement supertrees

Jesper Jansson¹, Joseph Ng¹, Kunihiko Sadakane², Wing-Kin Sung¹•Institutions (2)

National University of Singapore¹, Kyushu University²

01 Jan 2004-Lecture Notes in Computer Science

TL;DR: The maximum agreement supertree problem (MASP) is proved to be NP-hard for any fixed k ≥ 3 when D is unrestricted, and also NP-hard for any fixed D ≥ 2 when k is unrestricted, even if each input tree is required to contain at most three leaves.

Abstract: Given a set T of rooted, unordered trees, where each T_i ∈ T is distinctly leaf-labeled by a set Λ(T_i) and where the sets Λ(T_i) may overlap, the maximum agreement supertree problem (MASP) is to construct a distinctly leaf-labeled tree Q with leaf set Λ(Q) ⊆ ∪_{T_i ∈ T} Λ(T_i) such that |Λ(Q)| is maximized and for each T_i ∈ T, the topological restriction of T_i to Λ(Q) is isomorphic to the topological restriction of Q to Λ(T_i). Let n = |∪_{T_i ∈ T} Λ(T_i)|, k = |T|, and D = max_{T_i ∈ T} {deg(T_i)}. We first show that MASP with k = 2 can be solved in O(√D n log(2n/D)) time, which is O(n log n) when D = O(1) and O(n^1.5) when D is unrestricted. We then present an algorithm for MASP with D = 2 whose running time is polynomial if k = O(1). On the other hand, we prove that MASP is NP-hard for any fixed k ≥ 3 when D is unrestricted, and also NP-hard for any fixed D ≥ 2 when k is unrestricted even if each input tree is required to contain at most three leaves. Finally, we describe a polynomial-time (n / log n)-approximation algorithm for MASP.

40 citations

Journal Article•DOI•

Maximum agreement and compatible supertrees

Vincent Berry¹, François Nicolas²•Institutions (2)

University of Montpellier¹, University of Helsinki²

01 Sep 2007-Journal of Discrete Algorithms

TL;DR: This paper proposes extensions of MAST and MCT to the context of supertree inference, where input trees have non-identical leaf sets, and shows that arbitrary instances of SMAST and SMCT can be turned in polynomial time into instances composed of trees with a bounded number of leaves.

39 citations

References

Journal Article•DOI•

Improved Algorithms for Bipartite Network Flow

Ravindra K. Ahuja, James B. Orlin, Clifford Stein

01 Oct 1994-SIAM Journal on Computing

TL;DR: It is shown that several maximum flow algorithms can be substantially sped up when applied to unbalanced networks, and ideas are extended to dynamic tree implementations, parametric maximum flows, and minimum-cost flows.

Abstract: In this paper, network flow algorithms for bipartite networks are studied. A network $G=(V,E)$ is called bipartite if its vertex set $V$ can be partitioned into two subsets $V_1$ and $V_2$ such that all edges have one endpoint in $V_1$ and the other in $V_2$. Let $n=|V|$, $n_1 = |V_1|$, $n_2 = |V_2|$, $m=|E|$ and assume without loss of generality that $n_1 \leq n_2$. A bipartite network is called unbalanced if $n_1 \ll n_2$ and balanced otherwise. (This notion is necessarily imprecise.) It is shown that several maximum flow algorithms can be substantially sped up when applied to unbalanced networks. The basic idea in these improvements is a two-edge push rule that allows one to "charge" most computation to vertices in $V_1$, and hence develop algorithms whose running times depend on $n_1$ rather than $n$. For example, it is shown that the two-edge push version of Goldberg and Tarjan's FIFO preflow-push algorithm runs in $O(n_1 m + n_1^3)$ time and that the analogous version of Ahuja and Orlin's excess scaling algorithm runs in $O(n_1 m + n_1^2 \log U)$ time, where $U$ is the largest edge capacity. These ideas are also extended to dynamic tree implementations, parametric maximum flows, and minimum-cost flows.

176 citations

Proceedings Article•DOI•

On views and XML

Serge Abiteboul

01 May 1999

TL;DR: Rather than addressing views in a classical database setting, the paper focuses on XML and informally presents various works of the author in the context of a system of views for XML.

Abstract: The notion of views is essential in databases, see for instance [29, 30, 5]. It allows various users to see data from different viewpoints. In the present paper, we informally present works of the author on the topic. Instead of addressing the issue of views in a classical database setting, the paper focuses on XML [32] and looks at these various works in the context of a system of views for XML. The Web has revolutionized the electronic publication of data. It has relied primarily on HTML that emphasizes a hypertext document approach. More recently, XML, although originally a document mark-up language, is promoting an approach more focused on data exchange. In XML, explicit structuring is enforced and presentation is separated from the data content. For data sources containing information with some structure, it is therefore more appropriate to use XML rather than HTML to export their data to the Web. When data is exported via XML, the problem of views becomes essential. Indeed, views in this setting are even more crucial than in standard database applications because (i) one often has to integrate heterogeneous sources and also (ii) views provide the means to add a structured interface on top of some otherwise (more chaotic) semistructured data. In some sense, a language already allows to define views for XML documents, namely XSL. XSL is the current (still unstable) W3C proposal for expressing stylesheets. Although primarily targeted towards presentation, XSL allows to transform/restructure XML documents using templates rules. We are discussing such restructuring here. However, we will ignore presentation issues and will consider more general views than offered by XSL. A view specification for XML data 1 will primarily

161 citations

Journal Article•DOI•

Kaikoura tree theorems: computing the maximum agreement subtree

Mike Steel¹, Tandy Warnow•Institutions (1)

Massey University¹

08 Nov 1993-Information Processing Letters

TL;DR: Presents an O(n^4.5 log n + V) algorithm to determine the largest agreement subtree of two trees on n leaves, where V is the maximum number of nodes in the trees.

149 citations

Journal Article•DOI•

An O(n log n) Algorithm for the Maximum Agreement Subtree Problem for Binary Trees

Richard Cole, Martin Farach-Colton¹, Ramesh Hariharan², Teresa M. Przytycka³, Mikkel Thorup²•Institutions (3)

Rutgers University¹, Max Planck Society², Johns Hopkins University³

01 May 2000-SIAM Journal on Computing

TL;DR: This work considers the case which occurs frequently in practice, i.e., the case when the trees are binary, and gives an O(n log n) time algorithm for the maximum agreement subtree problem.

Abstract: The maximum agreement subtree problem is the following. Given two rooted trees whose leaves are drawn from the same set of items (e.g., species), find the largest subset of these items so that the portions of the two trees restricted to these items are isomorphic. We consider the case which occurs frequently in practice, i.e., the case when the trees are binary, and give an O(n log n) time algorithm for this problem.
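
The problem definition above can be made concrete with a brute-force sketch. This is not the O(n log n) algorithm of the paper: it simply tries leaf subsets from largest to smallest and takes exponential time, so it is only suitable for tiny examples. The tree encoding (a leaf is a string, an internal node is a tuple of children) and the function names are assumptions made for illustration.

```python
from itertools import combinations


def leaves(tree):
    """Set of leaf labels; a leaf is a str, an internal node a tuple of children."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def restrict(tree, keep):
    """Canonical form of `tree` restricted to the leaves in `keep` (None if empty)."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [r for r in (restrict(c, keep) for c in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]  # suppress a node left with a single child
    # Sorting children makes the form independent of child order, so two
    # restrictions are isomorphic exactly when their canonical forms are equal.
    return tuple(sorted(kids, key=repr))


def mast_leaves(t1, t2):
    """Largest leaf subset on which the two restricted trees are isomorphic."""
    common = sorted(leaves(t1) & leaves(t2))
    for size in range(len(common), 0, -1):
        for subset in combinations(common, size):
            keep = set(subset)
            if restrict(t1, keep) == restrict(t2, keep):
                return keep
    return set()


t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
print(sorted(mast_leaves(t1, t2)))  # ['a', 'b', 'd']
```

The two example trees disagree on all four leaves, but agree once any one of `a`, `b`, `c` is dropped; the sketch returns the first such subset in lexicographic order.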

125 citations

Book•

Information Modelling and Knowledge Bases

Hannu Kangassalo¹, Setsuo Ohsuga², Hannu Jaakkola³•Institutions (3)

University of Tampere¹, University of Tokyo², Tampere University of Technology³

01 Jan 1990

TL;DR: The topics of the articles cover a wide variety of themes in the domain of information modelling, design and specification of information systems and knowledge bases, ranging from foundations and theories to systems construction and application studies.

Abstract: This book is concerned with modelling of information in various ways, and the development and use of models in information systems of different kinds. Interesting applications of modelling can also be found in fields other than information systems, e.g., in conceptual modelling of narrative texts. The topics of the articles cover a wide variety of themes in the domain of information modelling, design and specification of information systems and knowledge bases, ranging from foundations and theories to systems construction and application studies. The contributions in this volume represent the following major themes: models in intelligent activity; concept modelling and conceptual modelling; conceptual modelling and information requirements specification; collections of concepts, knowledge base design and data base design; human-computer interaction and modelling; software engineering and modelling; and applications.

125 citations