Posted Content

An Even Faster and More Unifying Algorithm for Comparing Trees via Unbalanced Bipartite Matchings

TL;DR: The authors present an algorithm for comparing trees that are labeled in an arbitrary manner, which is faster than the previous algorithms. At the core of this maximum agreement subtree algorithm is an efficient algorithm for a new hierarchical bipartite matching problem.
Abstract: A widely used method for determining the similarity of two labeled trees is to compute a maximum agreement subtree of the two trees. Previous work on this similarity measure is only concerned with the comparison of labeled trees of two special kinds, namely, uniformly labeled trees (i.e., trees with all their nodes labeled by the same symbol) and evolutionary trees (i.e., leaf-labeled trees with distinct symbols for distinct leaves). This paper presents an algorithm for comparing trees that are labeled in an arbitrary manner. In addition to this generality, this algorithm is faster than the previous algorithms. Another contribution of this paper is on maximum weight bipartite matchings. We show how to speed up the best known matching algorithms when the input graphs are node-unbalanced or weight-unbalanced. Based on these enhancements, we obtain an efficient algorithm for a new matching problem called the hierarchical bipartite matching problem, which is at the core of our maximum agreement subtree algorithm.
Citations
Book Chapter
21 Apr 2007
TL;DR: An algorithm for global alignment of two protein-protein interaction (PPI) networks is described, guided by the intuition that a protein should be matched with a protein in the other network if and only if the neighbors of the two proteins can also be well matched, and the results of global alignment are interpreted to identify functional orthologs between yeast and fly.
Abstract: We describe an algorithm, IsoRank, for global alignment of two protein-protein interaction (PPI) networks. IsoRank aims to maximize the overall match between the two networks; in contrast, much of previous work has focused on the local alignment problem, that is, identifying many possible alignments, each corresponding to a local region of similarity. IsoRank is guided by the intuition that a protein should be matched with a protein in the other network if and only if the neighbors of the two proteins can also be well matched. We encode this intuition as an eigenvalue problem, in a manner analogous to Google's PageRank method. We use IsoRank to compute the first known global alignment between the S. cerevisiae and D. melanogaster PPI networks. The common subgraph has 1420 edges and describes conserved functional components between the two species. Comparisons of our results with those of a well-known algorithm for local network alignment indicate that the globally optimized alignment resolves ambiguity introduced by multiple local alignments. Finally, we interpret the results of global alignment to identify functional orthologs between yeast and fly; our functional ortholog prediction method is much simpler than a recently proposed approach and yet provides results that are more comprehensive.

338 citations
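
The eigenvalue formulation in the IsoRank abstract above can be illustrated with a small power iteration. This is only a sketch under stated assumptions, not the authors' implementation: it uses the purely topological recurrence R[i,j] = sum over neighbors u of i and v of j of R[u,v] / (|N(u)| |N(v)|), assumes every node has at least one neighbor, and adds a "lazy" averaging step (an assumption made here so the iteration also settles on bipartite inputs). The function name `isorank_scores` and the dictionary-based graph format are hypothetical.

```python
from itertools import product


def isorank_scores(adj1, adj2, iterations=100):
    """Approximate the principal eigenvector of the neighbor-matching recurrence.

    adj1, adj2: dicts mapping each node to a list of its neighbors (undirected,
    every node is assumed to have at least one neighbor).
    Returns a dict mapping node pairs (i, j) to scores that sum to 1.
    """
    pairs = list(product(adj1, adj2))
    # Start from the uniform distribution over all node pairs.
    scores = {pair: 1.0 / len(pairs) for pair in pairs}
    for _ in range(iterations):
        updated = {}
        for i, j in pairs:
            total = 0.0
            # A pair scores highly when its neighbor pairs score highly.
            for u in adj1[i]:
                for v in adj2[j]:
                    total += scores[(u, v)] / (len(adj1[u]) * len(adj2[v]))
            updated[(i, j)] = total
        norm = sum(updated.values())
        # Lazy step (assumption of this sketch): average with the previous
        # vector so that periodic (bipartite) inputs do not oscillate.
        scores = {p: 0.5 * scores[p] + 0.5 * updated[p] / norm for p in pairs}
    return scores
```

The sketch covers only the scoring step described in the abstract; extracting an actual alignment from the scores would need a separate matching step.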

Journal Article
TL;DR: The maximum agreement phylogenetic subnetwork problem (MASN) is introduced; it is proved that the problem is NP-hard even if restricted to three phylogenetic networks, and an O(n^2)-time algorithm is given for the special case of two level-1 phylogenetic networks.

87 citations

Journal Article
TL;DR: It is proved that MASP is NP-hard for any fixed $k \geq 3$ when $D$ is unrestricted, and also NP-hard for any fixed $D \geq 2$ when $k$ is unrestricted even if each input tree is required to contain at most three leaves.
Abstract: Given a set $\mathcal{T}$ of rooted, unordered trees, where each $T_i \in \mathcal{T}$ is distinctly leaf-labeled by a set $\Lambda(T_i)$ and where the sets $\Lambda(T_i)$ may overlap, the maximum agreement supertree problem (MASP) is to construct a distinctly leaf-labeled tree $Q$ with leaf set $\Lambda(Q) \subseteq \bigcup_{T_i \in \mathcal{T}} \Lambda(T_i)$ such that $|\Lambda(Q)|$ is maximized and for each $T_i \in \mathcal{T}$, the topological restriction of $T_i$ to $\Lambda(Q)$ is isomorphic to the topological restriction of $Q$ to $\Lambda(T_i)$. Let $n = \left| \bigcup_{T_i \in \mathcal{T}} \Lambda(T_i)\right|$, $k = |\mathcal{T}|$, and $D = \max_{T_i \in \mathcal{T}}\{\deg(T_i)\}$. We first show that MASP with $k = 2$ can be solved in $O(\sqrt{D} n \log (2n/D))$ time, which is $O(n \log n)$ when $D = O(1)$ and $O(n^{1.5})$ when $D$ is unrestricted. We then present an algorithm for MASP with $D = 2$ whose running time is polynomial if $k = O(1)$. On the other hand, we prove that MASP is NP-hard for any fixed $k \geq 3$ when $D$ is unrestricted, and also NP-hard for any fixed $D \geq 2$ when $k$ is unrestricted even if each input tree is required to contain at most three leaves. Finally, we describe a polynomial-time $(n/\!\log n)$-approximation algorithm for MASP.

46 citations

Journal Article
TL;DR: The maximum agreement supertree problem (MASP) is proved to be NP-hard for any fixed k ≥ 3 when D is unrestricted, and also NP-hard for any fixed D ≥ 2 when k is unrestricted even if each input tree is required to contain at most three leaves.
Abstract: Given a set $\mathcal{T}$ of rooted, unordered trees, where each $T_i \in \mathcal{T}$ is distinctly leaf-labeled by a set $\Lambda(T_i)$ and where the sets $\Lambda(T_i)$ may overlap, the maximum agreement supertree problem (MASP) is to construct a distinctly leaf-labeled tree $Q$ with leaf set $\Lambda(Q) \subseteq \bigcup_{T_i \in \mathcal{T}} \Lambda(T_i)$ such that $|\Lambda(Q)|$ is maximized and for each $T_i \in \mathcal{T}$, the topological restriction of $T_i$ to $\Lambda(Q)$ is isomorphic to the topological restriction of $Q$ to $\Lambda(T_i)$. Let $n = \left| \bigcup_{T_i \in \mathcal{T}} \Lambda(T_i) \right|$, $k = |\mathcal{T}|$, and $D = \max_{T_i \in \mathcal{T}}\{\deg(T_i)\}$. We first show that MASP with $k = 2$ can be solved in $O(\sqrt{D} n \log(2n/D))$ time, which is $O(n \log n)$ when $D = O(1)$ and $O(n^{1.5})$ when $D$ is unrestricted. We then present an algorithm for MASP with $D = 2$ whose running time is polynomial if $k = O(1)$. On the other hand, we prove that MASP is NP-hard for any fixed $k \geq 3$ when $D$ is unrestricted, and also NP-hard for any fixed $D \geq 2$ when $k$ is unrestricted even if each input tree is required to contain at most three leaves. Finally, we describe a polynomial-time $(n/\log n)$-approximation algorithm for MASP.

40 citations

Journal Article
TL;DR: This paper proposes extensions of MAST and MCT to the context of supertree inference, where input trees have non-identical leaf sets, and shows that arbitrary instances of SMAST and SMCT can be turned in polynomial time into instances composed of trees with a bounded number of leaves.

39 citations

References
Journal Article
TL;DR: It is shown that several maximum flow algorithms can be substantially sped up when applied to unbalanced networks, and ideas are extended to dynamic tree implementations, parametric maximum flows, and minimum-cost flows.
Abstract: In this paper, network flow algorithms for bipartite networks are studied. A network $G=(V,E)$ is called bipartite if its vertex set $V$ can be partitioned into two subsets $V_1$ and $V_2$ such that all edges have one endpoint in $V_1$ and the other in $V_2$. Let $n=|V|$, $n_1 = |V_1|$, $n_2 = |V_2|$, $m=|E|$ and assume without loss of generality that $n_1 \leq n_2$. A bipartite network is called unbalanced if $n_1 \ll n_2$ and balanced otherwise. (This notion is necessarily imprecise.) It is shown that several maximum flow algorithms can be substantially sped up when applied to unbalanced networks. The basic idea in these improvements is a two-edge push rule that allows one to "charge" most computation to vertices in $V_1$, and hence develop algorithms whose running times depend on $n_1$ rather than $n$. For example, it is shown that the two-edge push version of Goldberg and Tarjan's FIFO preflow-push algorithm runs in $O(n_1 m + n_1^3)$ time and that the analogous version of Ahuja and Orlin's excess scaling algorithm runs in $O(n_1 m + n_1^2 \log U)$ time, where $U$ is the largest edge capacity. These ideas are also extended to dynamic tree implementations, parametric maximum flows, and minimum-cost flows.

176 citations
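
To make the idea of "charging" work to the smaller side $V_1$ concrete, here is a minimal sketch for the unit-capacity special case, maximum cardinality bipartite matching. It is not the two-edge push rule from the abstract above; it swaps in the classical augmenting-path method (often attributed to Kuhn), which starts one search per vertex of $V_1$ and therefore runs in $O(n_1 m)$ time. The function name and the input format are hypothetical.

```python
def max_bipartite_matching(small_side, adj):
    """Maximum cardinality matching, searching only from the smaller side V1.

    small_side: iterable of the vertices in V1.
    adj: dict mapping each V1 vertex to a list of its neighbors in V2.
    Returns (matching size, dict mapping matched V2 vertex -> V1 vertex).
    """
    match_of = {}  # V2 vertex -> the V1 vertex it is currently matched to

    def try_augment(u, visited):
        # Depth-first search for an augmenting path that starts at u in V1.
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            # v is free, or its current partner can be re-matched elsewhere.
            if v not in match_of or try_augment(match_of[v], visited):
                match_of[v] = u
                return True
        return False

    size = 0
    # One search per V1 vertex: n1 searches, each touching at most m edges.
    for u in small_side:
        if try_augment(u, set()):
            size += 1
    return size, match_of
```

Because the outer loop runs over $V_1$ only, the number of searches is bounded by $n_1$ regardless of how large $V_2$ is, which is the sense in which the running time depends on $n_1$ rather than $n$.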

Proceedings Article
01 May 1999
TL;DR: Rather than addressing views in a classical database setting, the paper informally presents the author's work on views in the context of a system of views for XML.
Abstract: The notion of views is essential in databases, see for instance [29, 30, 5]. It allows various users to see data from different viewpoints. In the present paper, we informally present works of the author on the topic. Instead of addressing the issue of views in a classical database setting, the paper focuses on XML [32] and looks at these various works in the context of a system of views for XML. The Web has revolutionized the electronic publication of data. It has relied primarily on HTML that emphasizes a hypertext document approach. More recently, XML, although originally a document mark-up language, is promoting an approach more focused on data exchange. In XML, explicit structuring is enforced and presentation is separated from the data content. For data sources containing information with some structure, it is therefore more appropriate to use XML rather than HTML to export their data to the Web. When data is exported via XML, the problem of views becomes essential. Indeed, views in this setting are even more crucial than in standard database applications because (i) one often has to integrate heterogeneous sources and also (ii) views provide the means to add a structured interface on top of some otherwise (more chaotic) semistructured data. In some sense, a language already allows to define views for XML documents, namely XSL. XSL is the current (still unstable) W3C proposal for expressing stylesheets. Although primarily targeted towards presentation, XSL allows to transform/restructure XML documents using templates rules. We are discussing such restructuring here. However, we will ignore presentation issues and will consider more general views than offered by XSL. A view specification for XML data 1 will primarily

161 citations

Journal Article
TL;DR: Presents an O(n^{4.5} log n + V) algorithm to determine the largest agreement subtree of two trees on n leaves, where V is the maximum number of nodes in the trees.

149 citations

Journal Article
TL;DR: This work considers the case which occurs frequently in practice, i.e., the case when the trees are binary, and gives an O(n log n) time algorithm for the maximum agreement subtree problem.
Abstract: The maximum agreement subtree problem is the following. Given two rooted trees whose leaves are drawn from the same set of items (e.g., species), find the largest subset of these items so that the portions of the two trees restricted to these items are isomorphic. We consider the case which occurs frequently in practice, i.e., the case when the trees are binary, and give an O(n log n) time algorithm for this problem.

125 citations
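
The problem statement in the abstract above can be made concrete with a brute-force sketch. It is exponential in the number of leaves and is not the O(n log n) algorithm the abstract refers to; it only spells out what "restricted to these items" and "isomorphic" mean. It assumes trees are given as nested tuples whose leaves are distinct strings containing no commas or parentheses; all function names are hypothetical.

```python
from itertools import combinations


def restrict(tree, keep):
    """Canonical string of `tree` restricted to the leaves in `keep`.

    Returns None if no leaf of `tree` is kept. Children are sorted so that
    two unordered rooted trees are isomorphic (with matching leaf labels)
    exactly when their canonical strings are equal.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    parts = sorted(p for p in (restrict(c, keep) for c in tree) if p is not None)
    if not parts:
        return None
    if len(parts) == 1:
        return parts[0]  # suppress an internal node left with a single child
    return "(" + ",".join(parts) + ")"


def leaves(tree):
    """Set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(c) for c in tree))


def brute_force_mast(t1, t2):
    """Largest leaf subset on which the two restricted trees are isomorphic."""
    common = sorted(leaves(t1) & leaves(t2))
    # Try subsets from largest to smallest; the first agreement is maximum.
    for size in range(len(common), 0, -1):
        for subset in combinations(common, size):
            keep = set(subset)
            if restrict(t1, keep) == restrict(t2, keep):
                return keep
    return set()
```

For example, calling `brute_force_mast` on `((("a", "b"), "c"), "d")` and `((("a", "b"), "d"), "c")` compares the two trees on every subset of their four shared leaves, starting with the full set.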

Book
01 Jan 1990
TL;DR: The topics of the articles cover a wide variety of themes in the domain of information modelling, design and specification of information systems and knowledge bases, ranging from foundations and theories to systems construction and application studies.
Abstract: This book is concerned with modelling of information in various ways, and the development and use of models in information systems of different kinds. Interesting applications of modelling can also be found in fields other than information systems, e.g., in conceptual modelling of narrative texts. The topics of the articles cover a wide variety of themes in the domain of information modelling, design and specification of information systems and knowledge bases, ranging from foundations and theories to systems construction and application studies. The contributions in this volume represent the following major themes: models in intelligent activity; concept modelling and conceptual modelling; conceptual modelling and information requirements specification; collections of concepts, knowledge base design and data base design; human-computer interaction and modelling; software engineering and modelling; and applications.

125 citations